Method Based on Floating Car Data and Gradient-Boosted Decision Tree Classification for the Detection of Auxiliary Through Lanes at Intersections

Xiaolong Li; Yuzhen Wu; Yongbin Tan; Penggen Cheng; Jing Wu; Yuqian Wang

doi:10.3390/ijgi7080317

ISPRS International Journal of Geo-Information

10.3390/ijgi7080317

2018

Vol 7 (8)

pp. 317

Keyword(s):

Decision Tree

Detection Methods

Tree Model

Lane Changes

The Road

Navigation Data

Floating Car Data

Redundant Data

Boosted Decision Tree

Using Data

The rapid detection of continuously changing auxiliary through lanes at intersections is a major task in updating lane-level navigation data. However, existing lane number detection methods have long update cycles and high computational costs. Therefore, this study proposes a novel method based on floating car data (FCD) for detecting changes in auxiliary through lanes at road intersections. First, roads near intersections are divided into three sections, and the spatial distribution characteristics of the FCD in each section are analyzed. Second, the FCD is preprocessed into a standardized FCD dataset by removing redundant data with an improved amplitude-limiting average filtering method. Third, a basic classifier for the number of lanes is constructed. Fourth, the final number of lanes in each road section is determined by combining the basic classifier with the gradient-boosted decision tree model. Finally, the presence of an auxiliary through lane at the intersection is determined from the change in the number of lanes at the intersection. The method was tested using data for a road in Wuchang District, Wuhan City. Experimental results show that the method can rapidly obtain auxiliary through lane information from FCD and outperforms the other classification methods it was compared with.
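The abstract names an improved amplitude-limiting average filter for removing redundant FCD but does not describe the improvement. As a rough illustration only, the sketch below implements the classic (unimproved) amplitude-limiting average filter on a one-dimensional series. Treating the series as lateral offsets of GPS points from a road centerline, and the parameter names `max_jump` and `window`, are assumptions for illustration, not the authors' design.

```python
from collections import deque
from typing import Iterable, List


def amplitude_limiting_average(samples: Iterable[float],
                               max_jump: float,
                               window: int) -> List[float]:
    """Classic amplitude-limiting average filter (a sketch, not the paper's
    improved variant).

    Step 1 (amplitude limiting): a sample that differs from the last accepted
    value by more than `max_jump` is treated as a spike and replaced by that
    last accepted value.
    Step 2 (averaging): each output is the mean of the most recent `window`
    accepted values.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if max_jump < 0:
        raise ValueError("max_jump must be non-negative")
    recent = deque(maxlen=window)  # sliding window of accepted values
    smoothed: List[float] = []
    last_accepted = None
    for x in samples:
        if last_accepted is not None and abs(x - last_accepted) > max_jump:
            x = last_accepted  # reject the jump and hold the previous value
        last_accepted = x
        recent.append(x)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# Hypothetical lateral offsets (meters) with one GPS spike at index 2.
offsets = [1.0, 1.2, 9.5, 1.1, 1.3]
print(amplitude_limiting_average(offsets, max_jump=2.0, window=3))
```

Holding the previous value (rather than dropping the sample) keeps the output aligned one-to-one with the input points, which is convenient when each point must keep its own record in the dataset.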

Spam Mail Filtering Using Data Mining Approach

Handling Priority Inversion in Time-Constrained Distributed Databases - Advances in Data Mining and Database Management

10.4018/978-1-7998-2491-6.ch015

2020

pp. 253-282

Cited By ~ 3

Author(s):

Ajay Kumar Gupta

Keyword(s):

Decision Tree

Classification Accuracy

Time Complexity

Identification Accuracy

Tree Model

Swarm Optimization

Spam Filter

Data Mining Approach

Lower Complexity

Using Data

This chapter presents an overview of spam email as a serious problem on today's internet and builds a spam filter that addresses the weaknesses of earlier filters, providing better identification accuracy with lower complexity. The J48 decision tree is used as the spam classifier because it is a widely used classification technique with a simple structure, high classification accuracy, and low time complexity. However, with low complexity it becomes difficult to achieve high accuracy when the number of records is large. To overcome this problem, particle swarm optimization is used to optimize the Spambase dataset, which both optimizes the decision tree model and reduces time complexity. Once the records have been standardized, the decision tree is applied again to check the classification accuracy. The chapter also surveys spam-related issues, the filters in use, related work, and the potential scope of spam filtering.
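J48 is Weka's Java implementation of Quinlan's C4.5, which chooses splits by gain ratio (information gain divided by split information). The sketch below computes the gain ratio for a categorical attribute. It omits C4.5's handling of continuous attributes, missing values, and pruning, and the attribute `contains_link` and its toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5-style gain ratio of splitting `labels` by a categorical attribute."""
    n = len(labels)
    if n == 0 or len(attribute) != n:
        raise ValueError("attribute and labels must be non-empty and equal length")
    groups = defaultdict(list)  # attribute value -> labels of rows with that value
    for value, label in zip(attribute, labels):
        groups[value].append(label)
    weights = [len(g) / n for g in groups.values()]
    remainder = sum(w * entropy(g) for w, g in zip(weights, groups.values()))
    info_gain = entropy(labels) - remainder
    split_info = -sum(w * math.log2(w) for w in weights)
    # A single-valued attribute has zero split information and cannot split the data.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: does an email contain a link?
contains_link = ["yes", "yes", "no", "no"]
label = ["spam", "spam", "ham", "ham"]
print(gain_ratio(contains_link, label))
```

Dividing by split information penalizes attributes with many distinct values, which plain information gain would otherwise favor.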

Classification of algal bloom species from remote sensing data using an extreme gradient boosted decision tree model

International Journal of Remote Sensing

10.1080/01431161.2019.1633696

2019

Vol 40 (24)

pp. 9412-9438

Cited By ~ 9

Author(s):

Jayesh Ganpat Ghatkar

Rakesh Kumar Singh

Palanisamy Shanmugam

Keyword(s):

Remote Sensing

Decision Tree

Algal Bloom

Remote Sensing Data

Decision Tree Model

Tree Model

Sensing Data

Boosted Decision Tree

Applying the C4.5 Algorithm to Predict Cash Availability in ATM Machines

JURNAL MEDIA INFORMATIKA BUDIDARMA

10.30865/mib.v5i2.2933

2021

Vol 5 (2)

pp. 556

Author(s):

Firman Syahputra

Hartono Hartono

Rika Rosnelly

Keyword(s):

Data Mining

Decision Tree

Travel Time

Tree Model

Optimal Service

C4.5 Algorithm

Cash Transaction

AUC Value

Using Data

Balance Variable

This study evaluates the availability of money in ATMs using data mining. The C4.5 algorithm is used to predict cash demand, that is, total cash withdrawals at ATMs, in order to determine each ATM's cash requirements from cash transaction data. This forecasting is intended to help the monitoring department decide how much money must be allocated to each ATM. The results are expected to help the ATM management unit optimize and monitor cash availability at each machine so that it can provide optimal service to customers. C4.5 is an algorithm that builds a decision tree, from which new knowledge can then be derived. The test results matched the data on money availability at the ATMs. In the C4.5 results, cash availability at an ATM is explained by the travel time to the ATM location and the remaining balance in the machine. In the resulting decision tree model, the balance variable is the root, travel time is a branch at Level 1 with the values fast, medium, and long, and the bank is a branch at the last level (Level 2). The C4.5 algorithm was then tested with 10-fold cross-validation, giving an accuracy of 85%, a precision of 80%, and a recall of 66.67%. The AUC (area under the curve) is 0.833; the closer the AUC is to 1, the better the classifier performs.
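The reported accuracy, precision, and recall all derive from a binary confusion matrix, and 10-fold cross-validation rotates which tenth of the data is held out for testing. The sketch below shows both calculations. The confusion-matrix counts are hypothetical (the abstract does not give the study's matrix) and were chosen only because they reproduce the reported percentages; the contiguous, unshuffled folds are a simplification, since real pipelines usually shuffle or stratify first.

```python
from typing import Dict, List, Tuple


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; each fold is the test set once."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


# Hypothetical counts (not the study's confusion matrix).
print(classification_metrics(tp=4, fp=1, fn=2, tn=13))
print(len(k_fold_indices(23, k=10)))
```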

A Boosted Decision Tree Model for Predicting Loan Default in P2P Lending Communities

International Journal of Engineering and Advanced Technology - Regular Issue

10.35940/ijeat.a9626.109119

2019

Vol 9 (1)

pp. 1257-1261

Keyword(s):

Small Business

Decision Tree

Decision Tree Classifier

Tree Model

Loan Default

Accuracy Profile

Default Prediction

Tree Classifier

Social Lending

Boosted Decision Tree

Loan default prediction for social lending is an emerging area of research in predictive analytics. The need for large amounts of data and the small number of available studies on current loan default prediction models for social lending suggest that other viable and easily implementable models should be investigated and developed. In view of this, this study developed a data mining model for predicting loan default among social lending patrons, specifically small business owners, using a boosted decision tree model. The publicly available United States Small Business Administration (USBA) loan administration dataset, with 27 features and 899,164 data instances, was split 80:20 for training and testing the model. After data cleaning and feature engineering, 16 features were used as predictors. The gradient boosting decision tree classifier achieved 99% accuracy, compared with 98% for the basic decision tree classifier. The model was further evaluated with (a) the receiver operating characteristic (ROC) curve and the area under the curve (AUC), (b) the cumulative accuracy profile (CAP), and (c) the area under the CAP curve. Each of these evaluation metrics, especially ROC-AUC, describes the relationship between true positives and false positives, and together they indicate that the model is a good fit.
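The abstract compares a gradient boosting decision tree classifier with a single decision tree. A minimal sketch of gradient boosting for binary classification follows, assuming log loss, depth-1 regression trees (stumps), and leaf values equal to the mean residual (a plain gradient step, not the Newton-step leaf values many libraries use). Each round fits a stump to the negative gradient of the loss and adds a shrunken copy of it to the running score. The toy data are hypothetical; this is not the study's model or the USBA dataset.

```python
import math
from typing import List, Sequence, Tuple

# A stump is (feature index, threshold, value if x[j] <= threshold, value otherwise).
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X: Sequence[Sequence[float]], residuals: Sequence[float]) -> Stump:
    """Least-squares regression stump fitted to the current residuals."""
    best, best_err = None, math.inf
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    if best is None:  # every feature is constant: fall back to a constant update
        mean = sum(residuals) / len(residuals)
        best = (0, math.inf, mean, mean)
    return best


def stump_value(stump: Stump, row: Sequence[float]) -> float:
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_gbdt(X, y, n_rounds: int = 50, learning_rate: float = 0.1):
    """Gradient boosting for binary labels (0/1) under log loss."""
    p = sum(y) / len(y)
    if not 0 < p < 1:
        raise ValueError("y must contain both classes")
    f0 = math.log(p / (1 - p))  # initial score: log-odds of the base rate
    scores = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # Negative gradient of log loss w.r.t. the score is y - sigmoid(score).
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, row) for s, row in zip(scores, X)]
    return f0, learning_rate, stumps


def predict_proba(model, row: Sequence[float]) -> float:
    f0, learning_rate, stumps = model
    return sigmoid(f0 + learning_rate * sum(stump_value(s, row) for s in stumps))


# Hypothetical one-feature toy data.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gbdt(X, y)
print(predict_proba(model, [1.0]), predict_proba(model, [6.0]))
```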

An Educational Data Mining Application by Using Multiple Intelligences

Examining Multiple Intelligences and Digital Technologies for Enhanced Learning Opportunities - Advances in Educational Technologies and Instructional Design

10.4018/978-1-7998-0249-5.ch005

2020

pp. 93-110

Author(s):

Esra Aksoy

Serkan Narli

Mehmet Akif Aksoy

Keyword(s):

Data Mining

Decision Tree

Learning Styles

Multiple Intelligences

Gifted Students

Personality Types

Decision Tree Model

Tree Model

Educational Domain

Using Data

The aim of this chapter is to illustrate both the uses of data mining methods and how these methods can be applied in education by using students' multiple intelligences. Data mining is a data analysis methodology that has been used successfully in many areas, including education. In this context, the study illustrates an application of educational data mining (EDM) using multiple intelligences and other variables, such as learning styles and personality types. A decision tree model was built from students' learning styles, multiple intelligences, and personality types to identify gifted students. The sample consisted of 735 middle school students. The resulting decision tree model, with 70% validity, indicated that identifying mathematically gifted students with data mining techniques may be possible if specific characteristics are included.

TNT: An Interpretable Tree-Network-Tree Learning Framework using Knowledge Distillation

Entropy

10.3390/e22111203

2020

Vol 22 (11)

pp. 1203

Author(s):

Jiawei Li

Yiming Li

Xingchun Xiang

Shu-Tao Xia

Siyi Dong

...

Keyword(s):

Decision Making

Decision Tree

Low Frequency

Test Case

Tree Model

Tree Network

High Performing

Learning Framework

Boosted Decision Tree

Network Tree

Deep Neural Networks (DNNs) usually work in an end-to-end manner. This makes trained DNNs easy to use, but their decision process remains opaque for every test case. Unfortunately, the interpretability of decisions is crucial in some scenarios, such as medical or financial data mining and decision-making. In this paper, we propose a Tree-Network-Tree (TNT) learning framework for explainable decision-making, in which knowledge is alternately transferred between the tree model and DNNs. Specifically, the proposed TNT learning framework exploits the advantages of different models at different stages: (1) a novel James–Stein Decision Tree (JSDT) is proposed to generate better knowledge representations for DNNs, especially when the input data are low-frequency or low-quality; (2) the DNNs produce high-performing predictions from the knowledge-embedding inputs and act as a teacher model for the subsequent tree model; and (3) a novel distillable Gradient Boosted Decision Tree (dGBDT) is proposed to learn interpretable trees from the soft labels and make predictions comparable to those of the DNNs. Extensive experiments on various machine learning tasks demonstrated the effectiveness of the proposed method.
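The abstract says the dGBDT learns from the teacher DNN's soft labels but does not say how those labels are produced. A common choice in knowledge distillation is a temperature-scaled softmax over the teacher's logits, with the student trained to minimize cross-entropy against those soft targets; the sketch below assumes that standard recipe and is not the TNT authors' code. The teacher logits are hypothetical.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn teacher logits into a probability vector; higher temperature = softer labels."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(soft_targets: Sequence[float], student_probs: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Cross-entropy of the student's predicted distribution against soft targets."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(soft_targets, student_probs))


# Hypothetical teacher logits for a three-class task.
teacher_logits = [4.0, 1.0, 0.5]
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)
print(sharp)
print(soft)
print(soft_cross_entropy(soft, soft), soft_cross_entropy(soft, [1 / 3, 1 / 3, 1 / 3]))
```

Raising the temperature keeps the teacher's ranking of classes but exposes more of its relative confidence in the non-top classes, which is the extra information a student model can learn from.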

A Decision Tree Model to Analyze the Characteristics of the Elderly with ADL Limitation Using Data Mining

Convergence and Hybrid Information Technology - Lecture Notes in Computer Science

10.1007/978-3-642-32645-5_64

2012

pp. 508-515

Author(s):

Myonghwa Park

Sungjin Kim

Keyword(s):

Data Mining

Decision Tree

The Elderly

Decision Tree Model

Tree Model

Using Data

Analysis and Monitoring of the Traffic Suburban Road Accidents Using Data Mining Techniques; A Case Study of Isfahan Province in Iran

The Open Transportation Journal

10.2174/1874447801408010039

2014

Vol 8 (1)

pp. 39-49

Cited By ~ 3

Author(s):

Mehdi Mansouri

Mohammad Javad Kargar

Keyword(s):

Data Mining

Decision Tree

Developed Countries

Road Accidents

Local Data

The Road

Isfahan Province

Using Data

The Developed Countries

Causes Of Deaths

Driving accidents have always been counted among the most prominent causes of death in today's societies. Statistics and reports indicate that road accidents in Iran occur at several times the rate seen in developed countries. In this paper, the rules and factors influencing road traffic accidents in Iran are extracted, along with a local data model, after collecting data from a variety of sources, aggregating and combining them, cleaning the data, and separating out inappropriate records. This was done with appropriate data mining methods, such as clustering and decision trees. The data covered 10,000 accidents in Isfahan Province, Iran, from 2011 to 2013. The experimental results show that, among the decision tree approaches, the C5.0 algorithm outperforms the other algorithms, with a lower error rate and higher accuracy. The analysis also shows that the three most important attributes for determining the accident type are the type of the at-fault vehicle, the type of the vehicle hit, and the cause of the accident. These findings can provide the authorities with valuable information for reducing road accidents.

Mathematical modelling of evapotranspiration by using remote sensing and data mining

10.21203/rs.3.rs-174771/v1

2021

Author(s):

Lamya Neissi

Mona Golabi

Mohammad Albaji

Abd Ali Naseri

Keyword(s):

Remote Sensing

Data Mining

Decision Tree

Specific Area

Decision Tree Model

Tree Model

Water Index

Model Builder

Using Data

Extended Area

Precise estimation of evapotranspiration over an extended area is crucial for determining water requirements. Remote sensing evapotranspiration algorithms require many climatological variables, and measuring those variables directly would require establishing many climatic stations in the area. By integrating data mining with remote sensing, evapotranspiration can be calculated with high accuracy. The physically based SEBAL evapotranspiration algorithm was modeled with a GIS model builder for ET calculations. Albedo, emissivity, and the Normalized Difference Water Index (NDWI) were used as inputs to an M5 decision tree model. Evapotranspiration was evaluated from 3 April 2020 to 17 September 2020; the equations extracted from the M5 decision tree model were then implemented in GIS with Python scripts for the same period. The results show that the mathematical decision tree model can estimate the evapotranspiration obtained by the physically based SEBAL algorithm with high accuracy.
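In an M5 model tree, each leaf holds a linear regression equation rather than a single constant, which is why equations can be extracted from the tree and re-implemented in GIS with Python. The sketch below evaluates such a tree per pixel. Albedo, emissivity, and NDWI are the inputs named in the abstract; the tree shape, split value, coefficients, and dictionary-per-pixel representation are hypothetical, and the sketch omits M5's pruning and smoothing. A raster workflow would apply the same logic array-wise.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ModelTreeNode:
    """Internal node if `feature` is set; otherwise a leaf holding a linear model."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["ModelTreeNode"] = None
    right: Optional["ModelTreeNode"] = None
    intercept: float = 0.0
    coefs: Dict[str, float] = field(default_factory=dict)


def predict(node: ModelTreeNode, pixel: Dict[str, float]) -> float:
    """Route a pixel down the tree, then evaluate the leaf's linear equation."""
    while node.feature is not None:
        node = node.left if pixel[node.feature] <= node.threshold else node.right
    return node.intercept + sum(c * pixel[name] for name, c in node.coefs.items())


# Hypothetical tree: the split value and coefficients are invented for
# illustration and are NOT the equations extracted in the study.
tree = ModelTreeNode(
    feature="ndwi", threshold=0.1,
    left=ModelTreeNode(intercept=2.0, coefs={"albedo": -5.0}),
    right=ModelTreeNode(intercept=3.0, coefs={"ndwi": 10.0}),
)

pixels = [
    {"albedo": 0.2, "emissivity": 0.95, "ndwi": 0.05},
    {"albedo": 0.15, "emissivity": 0.97, "ndwi": 0.3},
]
print([predict(tree, p) for p in pixels])
```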

A Road Mishaps Analysis using Decision Tree and Random Forest Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue

10.35940/ijitee.f4161.049620

2020

Vol 9 (4)

pp. 2067-2069

Keyword(s):

Random Forest

Decision Tree

Personal Information

Machine Learning Algorithms

Road Accidents

Training Set

Tree Model

Major Drawback

Middle Income

The Road

Machine learning (ML) is the study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying on patterns instead. It is regarded as a subset of artificial intelligence. In this study, the sample data are split into a training set and a test set. Road accidents are a major cause of death worldwide, and most of these deaths occur in middle-income countries. This study identifies the major factors in road accidents using decision trees and random forests. A decision tree is a decision support tool that uses a tree-like model containing only conditional control statements. A random forest corrects the tendency of decision trees to overfit their training set. Here, the decision tree and random forest algorithms are used to determine the severity of road accidents and the factors behind them from drivers' personal information. The results indicate the likely causes of road accidents as identified by the machine learning algorithms.
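A random forest reduces a single tree's overfitting by training each tree on a bootstrap sample of the rows, letting each tree consider only a random subset of features, and combining the trees by majority vote. The sketch below shows those three ingredients. For brevity it uses depth-1 trees (stumps), whereas random forests normally grow deep trees; the feature meanings and the records are hypothetical, not the study's driver data.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

# (feature index, threshold, class if x[j] <= threshold, class otherwise)
Stump = Tuple[int, float, Hashable, Hashable]


def majority(labels: Sequence[Hashable]) -> Hashable:
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features: Sequence[int]) -> Stump:
    """Depth-1 classification tree minimizing misclassifications on the given features."""
    best, best_err = None, None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not right:
                continue
            lc, rc = majority(left), majority(right)
            err = sum(yi != lc for yi in left) + sum(yi != rc for yi in right)
            if best_err is None or err < best_err:
                best, best_err = (j, t, lc, rc), err
    if best is None:  # no usable split in this sample: always predict the majority class
        c = majority(y)
        best = (features[0], float("inf"), c, c)
    return best


def fit_forest(X, y, n_trees: int = 25, max_features: int = 1, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        feats = rng.sample(range(d), max_features)  # random subset of features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest: List[Stump], row: Sequence[float]) -> Hashable:
    votes = [lc if row[j] <= t else rc for j, t, lc, rc in forest]
    return majority(votes)  # majority vote across trees


# Hypothetical driver records: [prior-violation flag, driver age]; label 1 = severe.
X = [[0, 30], [1, 25], [0, 40], [1, 22], [0, 35], [1, 28]]
y = [0, 1, 0, 1, 0, 1]
forest = fit_forest(X, y)
print(predict(forest, [0, 40]), predict(forest, [1, 22]))
```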
