Trip Purpose Imputation Using GPS Trajectories with Machine Learning

Qinggang Gao; Joseph Molloy; Kay W. Axhausen

doi:10.3390/ijgi10110775

Trip Purpose Imputation Using GPS Trajectories with Machine Learning

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10110775 ◽

2021 ◽

Vol 10 (11) ◽

pp. 775

Author(s):

Qinggang Gao ◽

Joseph Molloy ◽

Kay W. Axhausen

Keyword(s):

Machine Learning ◽

Data Mining ◽

Classification Accuracy ◽

Personal Information ◽

Research Field ◽

Machine Learning Techniques ◽

Small Subset ◽

Limited Information ◽

Trip Purpose ◽

Gps Trajectories

We studied trip purpose imputation using data mining and machine learning techniques based on a dataset of GPS-based trajectories gathered in Switzerland. With a large number of labeled activities in eight categories, we explored location information using hierarchical clustering and achieved a classification accuracy of 86.7% using a random forest approach as a baseline. The contribution of this study is summarized below. Firstly, using information from GPS trajectories exclusively without personal information shows a negligible decrease in accuracy (0.9%), which indicates the good performance of our data mining steps and the wide applicability of our imputation scheme in case of limited information availability. Secondly, the dependence of model performance on the geographical location, the number of participants, and the duration of the survey is investigated to provide a reference when comparing classification accuracy. Furthermore, we show the ensemble filter to be an excellent tool in this research field not only because of the increased accuracy (93.6%), especially for minority classes, but also the reduced uncertainties in blindly trusting the labeling of activities by participants, which is vulnerable to class noise due to the large survey response burden. Finally, the trip purpose derivation accuracy across participants reaches 74.8%, which is significant and suggests the possibility of effectively applying a model trained on GPS trajectories of a small subset of citizens to a larger GPS trajectory sample.

Download Full-text

A Systematic Review of Deep Learning Approaches to Educational Data Mining

Complexity ◽

10.1155/2019/1306039 ◽

2019 ◽

Vol 2019 ◽

pp. 1-22 ◽

Cited By ~ 15

Author(s):

Antonio Hernández-Blanco ◽

Boris Herrera-Flores ◽

David Tomás ◽

Borja Navarro-Colorado

Keyword(s):

Machine Learning ◽

Data Mining ◽

Deep Learning ◽

Language Processing ◽

Educational Data Mining ◽

Research Field ◽

Machine Learning Techniques ◽

Mining Machine ◽

Learning Approaches ◽

Learning Techniques

Educational Data Mining (EDM) is a research field that focuses on the application of data mining, machine learning, and statistical methods to detect patterns in large collections of educational data. Different machine learning techniques have been applied in this field over the years, but it has been recently that Deep Learning has gained increasing attention in the educational domain. Deep Learning is a machine learning method based on neural network architectures with multiple layers of processing units, which has been successfully applied to a broad set of problems in the areas of image recognition and natural language processing. This paper surveys the research carried out in Deep Learning techniques applied to EDM, from its origins to the present day. The main goals of this study are to identify the EDM tasks that have benefited from Deep Learning and those that are pending to be explored, to describe the main datasets used, to provide an overview of the key concepts, main architectures, and configurations of Deep Learning and its applications to EDM, and to discuss current state-of-the-art and future directions on this area of research.

Download Full-text

Classification of Operational and Financial Variables Affecting the Bullwhip Effect in Indian Sectors: A Machine Learning Approach

Recent Patents on Computer Science ◽

10.2174/2213275911666181012121059 ◽

2019 ◽

Vol 12 (3) ◽

pp. 171-179 ◽

Cited By ~ 6

Author(s):

Sachin Gupta ◽

Anurag Saxena

Keyword(s):

Machine Learning ◽

Data Mining ◽

Supply Chain ◽

Supply Chain Management ◽

Product Life Cycle ◽

Consumer Preference ◽

Bullwhip Effect ◽

Machine Learning Techniques ◽

Chain Management ◽

Financial Variables

Background: The increased variability in production or procurement with respect to less increase of variability in demand or sales is considered as bullwhip effect. Bullwhip effect is considered as an encumbrance in optimization of supply chain as it causes inadequacy in the supply chain. Various operations and supply chain management consultants, managers and researchers are doing a rigorous study to find the causes behind the dynamic nature of the supply chain management and have listed shorter product life cycle, change in technology, change in consumer preference and era of globalization, to name a few. Most of the literature that explored bullwhip effect is found to be based on simulations and mathematical models. Exploring bullwhip effect using machine learning is the novel approach of the present study. Methods: Present study explores the operational and financial variables affecting the bullwhip effect on the basis of secondary data. Data mining and machine learning techniques are used to explore the variables affecting bullwhip effect in Indian sectors. Rapid Miner tool has been used for data mining and 10-fold cross validation has been performed. Weka Alternating Decision Tree (w-ADT) has been built for decision makers to mitigate bullwhip effect after the classification. Results: Out of the 19 selected variables affecting bullwhip effect 7 variables have been selected which have highest accuracy level with minimum deviation. Conclusion: Classification technique using machine learning provides an effective tool and techniques to explore bullwhip effect in supply chain management.

Download Full-text

Data Mining-based Financial Statement Fraud Detection: Systematic Literature Review and Meta-analysis to Estimate Data Sample Mapping of Fraudulent Companies Against Non-fraudulent Companies

Global Business Review ◽

10.1177/0972150920984857 ◽

2021 ◽

pp. 097215092098485

Author(s):

Sonika Gupta ◽

Sushil Kumar Mehta

Keyword(s):

Machine Learning ◽

Data Mining ◽

Literature Review ◽

Systematic Literature Review ◽

Classification Accuracy ◽

Meta Analysis ◽

Financial Statement ◽

Research Articles ◽

Financial Statement Fraud ◽

Data Mining Techniques

Data mining techniques have proven quite effective not only in detecting financial statement frauds but also in discovering other financial crimes, such as credit card frauds, loan and security frauds, corporate frauds, bank and insurance frauds, etc. Classification of data mining techniques, in recent years, has been accepted as one of the most credible methodologies for the detection of symptoms of financial statement frauds through scanning the published financial statements of companies. The retrieved literature that has used data mining classification techniques can be broadly categorized on the basis of the type of technique applied, as statistical techniques and machine learning techniques. The biggest challenge in executing the classification process using data mining techniques lies in collecting the data sample of fraudulent companies and mapping the sample of fraudulent companies against non-fraudulent companies. In this article, a systematic literature review (SLR) of studies from the area of financial statement fraud detection has been conducted. The review has considered research articles published between 1995 and 2020. Further, a meta-analysis has been performed to establish the effect of data sample mapping of fraudulent companies against non-fraudulent companies on the classification methods through comparing the overall classification accuracy reported in the literature. The retrieved literature indicates that a fraudulent sample can either be equally paired with non-fraudulent sample (1:1 data mapping) or be unequally mapped using 1:many ratio to increase the sample size proportionally. Based on the meta-analysis of the research articles, it can be concluded that machine learning approaches, in comparison to statistical approaches, can achieve better classification accuracy, particularly when the availability of sample data is low. High classification accuracy can be obtained with even a 1:1 mapping data set using machine learning classification approaches.

Download Full-text

Leveraging Road Characteristics and Contributor Behaviour for Assessing Road Type Quality in OSM

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10070436 ◽

2021 ◽

Vol 10 (7) ◽

pp. 436

Author(s):

Amerah Alghanim ◽

Musfira Jilani ◽

Michela Bertolotto ◽

Gavin McArdle

Keyword(s):

Machine Learning ◽

Spatial Data ◽

Classification Accuracy ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Set ◽

Semantic Inference ◽

Road Type ◽

The Impact

Volunteered Geographic Information (VGI) is often collected by non-expert users. This raises concerns about the quality and veracity of such data. There has been much effort to understand and quantify the quality of VGI. Extrinsic measures which compare VGI to authoritative data sources such as National Mapping Agencies are common but the cost and slow update frequency of such data hinder the task. On the other hand, intrinsic measures which compare the data to heuristics or models built from the VGI data are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality where they can infer and predict the properties of spatial data. In this article we are interested in assessing the quality of semantic information, such as the road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Specifically, using our proposed novel approach we obtained an average classification accuracy of 84.12%. This result outperforms existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is important. To address this issue we have also developed a new measure for this using direct and indirect characteristics of OSM data such as its edit history along with an assessment of the users who contributed the data. An evaluation of the impact of data determined to be trustworthy within the machine learning model shows that the trusted data collected with the new approach improves the prediction accuracy of our machine learning technique. Specifically, our results demonstrate that the classification accuracy of our developed model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. Consequently, such results can be used to assess the quality of OSM and suggest improvements to the data set.

Download Full-text

A data mining approach based on machine learning techniques to classify biological sequences

Knowledge-Based Systems ◽

10.1016/s0950-7051(01)00143-5 ◽

2002 ◽

Vol 15 (4) ◽

pp. 217-223 ◽

Cited By ~ 15

Author(s):

M. Maddouri ◽

M. Elloumi

Keyword(s):

Machine Learning ◽

Data Mining ◽

Machine Learning Techniques ◽

Biological Sequences ◽

Data Mining Approach ◽

Learning Techniques

Download Full-text

Data Mining and Machine Learning Techniques for Bank Customers Segmentation: A Systematic Mapping Study

Advances in Intelligent Systems and Computing - Intelligent Systems and Applications ◽

10.1007/978-3-030-55187-2_48 ◽

2020 ◽

pp. 666-684

Author(s):

Maricel Monge ◽

Christian Quesada-López ◽

Alexandra Martínez ◽

Marcelo Jenkins

Keyword(s):

Machine Learning ◽

Data Mining ◽

Machine Learning Techniques ◽

Systematic Mapping Study ◽

Mapping Study ◽

Systematic Mapping ◽

Learning Techniques ◽

Customers Segmentation

Download Full-text

Pose Invariant Hand Gesture Recognition using Two Stream Transfer Learning Architecture

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f9058.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 1771-1777

Keyword(s):

Machine Learning ◽

Transfer Learning ◽

Gesture Recognition ◽

Classification Accuracy ◽

Decision Fusion ◽

Hand Gesture Recognition ◽

Machine Learning Techniques ◽

Hand Gesture ◽

Language Recognition ◽

Sign Language Recognition

The hand gesture detection problem is one of the most prominent problems in machine learning and computer vision applications. Many machine learning techniques have been employed to solve the hand gesture recognition. These techniques find applications in sign language recognition, virtual reality, human machine interaction, autonomous vehicles, driver assistive systems etc. In this paper, the goal is to design a system to correctly identify hand gestures from a dataset of hundreds of hand gesture images. In order to incorporate this, decision fusion based system using the transfer learning architectures is proposed to achieve the said task. Two pretrained models namely ‘MobileNet’ and ‘Inception V3’ are used for this purpose. To find the region of interest (ROI) in the image, YOLO (You Only Look Once) architecture is used which also decides the type of model. Edge map images and the spatial images are trained using two separate versions of the MobileNet based transfer learning architecture and then the final probabilities are combined to decide upon the hand sign of the image. The simulation results using classification accuracy indicate the superiority of the approach of this paper against the already researched approaches using different quantitative techniques such as classification accuracy.

Download Full-text

Systematic Methods on Machine Learning Techniques for Clinical Predictive Modelling

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e2138.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 288-297

Keyword(s):

Machine Learning ◽

Data Mining ◽

High Performance ◽

Predictive Modelling ◽

Medical Application ◽

Medical Data ◽

Machine Learning Techniques ◽

Screening Process ◽

Huge Data ◽

System Data

Predictive modelling is a mathematical technique which uses Statistics for prediction, due to the rapid growth of data over the cloud system, data mining plays a significant role. Here, the term data mining is a way of extracting knowledge from huge data sources where it’s increasing the attention in the field of medical application. Specifically, to analyse and extract the knowledge from both known and unknown patterns for effective medical diagnosis, treatment, management, prognosis, monitoring and screening process. But the historical medical data might include noisy, missing, inconsistent, imbalanced and high dimensional data.. This kind of data inconvenience lead to severe bias in predictive modelling and decreased the data mining approach performances. The various pre-processing and machine learning methods and models such as Supervised Learning, Unsupervised Learning and Reinforcement Learning in recent literature has been proposed. Hence the present research focuses on review and analyses the various model, algorithm and machine learning technique for clinical predictive modelling to obtain high performance results from numerous medical data which relates to the patients of multiple diseases.

Download Full-text

Predicting Student Failure in University Examination using Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e2643.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 956-959

Keyword(s):

Machine Learning ◽

Data Mining ◽

Performance Management ◽

Student Performance ◽

Learning Algorithms ◽

Educational Data Mining ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Social Characteristics ◽

Student Failure

Student Performance Management is one of the key pillars of the higher education institutions since it directly impacts the student’s career prospects and college rankings. This paper follows the path of learning analytics and educational data mining by applying machine learning techniques in student data for identifying students who are at the more likely to fail in the university examinations and thus providing needed interventions for improved student performance. The Paper uses data mining approach with 10 fold cross validation to classify students based on predictors which are demographic and social characteristics of the students. This paper compares five popular machine learning algorithms Rep Tree, Jrip, Random Forest, Random Tree, Naive Bayes algorithms based on overall classifier accuracy as well as other class specific indicators i.e. precision, recall, f-measure. Results proved that Rep tree algorithm outperformed other machine learning algorithms in classifying students who are at more likely to fail in the examinations.

Download Full-text

Dr. Phish: Phishing Website Detector

E3S Web of Conferences ◽

10.1051/e3sconf/202129701032 ◽

2021 ◽

Vol 297 ◽

pp. 01032

Author(s):

Harish Kumar ◽

Anshal Prasad ◽

Ninad Rane ◽

Nilay Tamane ◽

Anjali Yeole

Keyword(s):

Machine Learning ◽

Data Mining ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Cyber Crime ◽

Data Mining Algorithms ◽

Learning Techniques ◽

Mining Algorithms ◽

Host Properties ◽

New Strategies

Phishing is a common attack on credulous people by making them disclose their unique information. It is a type of cyber-crime where false sites allure exploited people to give delicate data. This paper deals with methods for detecting phishing websites by analyzing various features of URLs by Machine learning techniques. This experimentation discusses the methods used for detection of phishing websites based on lexical features, host properties and page importance properties. We consider various data mining algorithms for evaluation of the features in order to get a better understanding of the structure of URLs that spread phishing. To protect end users from visiting these sites, we can try to identify the phishing URLs by analyzing their lexical and host-based features.A particular challenge in this domain is that criminals are constantly making new strategies to counter our defense measures. To succeed in this contest, we need Machine Learning algorithms that continually adapt to new examples and features of phishing URLs.

Download Full-text