Method Based on Floating Car Data and Gradient-Boosted Decision Tree Classification for the Detection of Auxiliary Through Lanes at Intersections

2018 ◽  
Vol 7 (8) ◽  
pp. 317
Author(s):  
Xiaolong Li ◽  
Yuzhen Wu ◽  
Yongbin Tan ◽  
Penggen Cheng ◽  
Jing Wu ◽  
...  

Rapid detection of changes in auxiliary through lanes at intersections is a major task in updating lane-level navigation data. However, existing lane-number detection methods have long update cycles and high computational costs. This study therefore proposes a novel method based on floating car data (FCD) for detecting auxiliary through lane changes at road intersections. First, roads near intersections are divided into three sections, and the spatial distribution characteristics of the FCD in each section are analyzed. Second, the FCD are preprocessed into a standardized dataset by removing redundant data with an improved amplitude-limiting average filtering method. Third, a basic classifier for the number of lanes is constructed. Fourth, the final number of lanes in each road section is determined by combining the basic classifier with a gradient-boosted decision tree model. Finally, the presence of an auxiliary through lane at the intersection is determined from the change in the number of lanes at the intersection. The method was tested using data for a road in Wuchang District, Wuhan City. Experimental results show that the method can rapidly obtain auxiliary through lane information from the FCD and is superior to other classification methods.
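The preprocessing step can be illustrated with a minimal sketch of the classical amplitude-limiting average filter. This is not the authors' improved variant, whose details the abstract does not give. The function name, the `max_step` and `window` parameters, and the sample offsets are all assumptions made for illustration.

```python
from collections import deque


def amplitude_limited_average(samples, max_step, window):
    """Classical amplitude-limiting average filter (illustrative only).

    A sample that jumps more than `max_step` from the last accepted value
    is treated as an outlier and replaced by that last value. The output
    is the moving average of the most recent `window` accepted values.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # sliding window of accepted values
    smoothed = []
    last = None
    for value in samples:
        if last is not None and abs(value - last) > max_step:
            value = last  # amplitude limiting: reject the jump
        last = value
        recent.append(value)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# Hypothetical lateral offsets (metres) of GPS points from a road centreline.
offsets = [1.0, 1.2, 9.0, 1.1, 0.9]
print(amplitude_limited_average(offsets, max_step=2.0, window=2))
```

In this sketch the 9.0 m reading is replaced by the previous accepted value before averaging, so a single GPS spike does not distort the smoothed track.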

Author(s):  
Ajay Kumar Gupta

This chapter presents an overview of spam email as a serious problem on the internet and develops a spam filter that addresses the weaknesses of earlier filters, providing better identification accuracy with less complexity. Because the J48 decision tree is a widely used classification technique with a simple structure, high classification accuracy, and low time complexity, it is used here as the spam mail classifier. With lower complexity, however, it becomes difficult to achieve high accuracy on a large number of records. To overcome this problem, particle swarm optimization is used to optimize the spam base dataset, thereby optimizing the decision tree model and reducing the time complexity. Once the records have been standardized, the decision tree is used again to check the classification accuracy. The chapter also surveys various spam-related issues, the filters in use, related work, and the potential scope of spam filtering.


2021 ◽  
Vol 5 (2) ◽  
pp. 556
Author(s):  
Firman Syahputra ◽  
Hartono Hartono ◽  
Rika Rosnelly

This study evaluates the availability of cash in ATMs using data mining. Data mining with the C4.5 algorithm is used to predict cash demand, or total cash withdrawals, at ATMs and so to determine the cash each ATM needs based on cash transaction data. It is hoped that this forecasting can help the monitoring department decide how much money must be allocated to each ATM. The results are expected to assist the ATM management unit in optimizing and monitoring the availability of cash at each ATM, so that it can provide optimal service to customers. C4.5 is an algorithm that builds a decision tree, from which new knowledge is then generated. The test results matched the data on the availability of money at the ATMs. The C4.5 model of cash availability is based on the travel time to the ATM location and the remaining balance in the machine. In the resulting decision tree, the balance variable is the root, travel time is the branch at Level 1 with the values fast, medium, and long, and the bank is the branch at the last level (Level 2). The C4.5 algorithm was then tested using 10-fold cross-validation: the accuracy is 85%, the precision is 80%, and the recall is 66.67%. The AUC (area under the curve) is 0.833; the closer the AUC is to 1, the better the classification performance.
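The relationship between the three reported metrics can be shown with a short sketch that computes them from confusion-matrix counts. The counts below are hypothetical: they were chosen only because they reproduce the reported percentages, and they are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "precision": tp / (tp + fp),     # share of predicted positives that are real
        "recall": tp / (tp + fn),        # share of real positives that are found
    }


# Hypothetical counts (an assumption, not the study's confusion matrix).
metrics = classification_metrics(tp=4, fp=1, fn=2, tn=13)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# accuracy: 85.00%
# precision: 80.00%
# recall: 66.67%
```

With these counts, accuracy is high mainly because of the many true negatives, while recall shows that one third of the positive cases are missed. This is why the three figures are usually reported together.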


Loan default prediction for social lending is an emerging area of research in predictive analytics. The need for large amounts of data and the few available studies on current loan default prediction models for social lending suggest that other viable and easily implementable models should be investigated and developed. In view of this, this study developed a data mining model for predicting loan default among social lending patrons, specifically small business owners, using a boosted decision tree model. The United States Small Business Administration (USBA) publicly available loan administration dataset of 27 features and 899164 data instances was split in an 80:20 ratio for training and testing the model. Sixteen data features were finally used as predictors after data cleaning and feature engineering. The gradient boosting decision tree classifier recorded 99% accuracy, compared with 98% for the basic decision tree classifier. The model was further evaluated with (a) receiver operating characteristic (ROC) and area under curve (AUC), (b) cumulative accuracy profile (CAP), and (c) cumulative accuracy profile (CAP) under AUC. Each of these performance evaluation metrics, especially ROC-AUC, showed a relationship between the true positives and false positives that implies the model is a good fit.
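To make the ROC-AUC figure concrete, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The labels and scores are hypothetical and the function name is an assumption. The quadratic pairwise loop is for clarity only and would be too slow for a dataset of the size described above.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = default) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))
```

Here eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. A value of 1 would mean every defaulter is scored above every non-defaulter, and 0.5 corresponds to random ranking.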


Author(s):  
Esra Aksoy ◽  
Serkan Narli ◽  
Mehmet Akif Aksoy

The aim of this chapter is to illustrate both the uses of data mining methods and how these methods can be applied in education, using students' multiple intelligences. Data mining is a data analysis methodology that has been used successfully in many areas, including education. In this context, an application of educational data mining (EDM) is illustrated using multiple intelligences and some other variables (e.g., learning styles and personality types). A decision tree model was built from students' learning styles, multiple intelligences, and personality types to identify gifted students. The sample consisted of 735 middle school students. The constructed decision tree model, with 70% validity, revealed that identifying mathematically gifted students using data mining techniques may be possible if specific characteristics are included.


Entropy ◽  
2020 ◽  
Vol 22 (11) ◽  
pp. 1203
Author(s):  
Jiawei Li ◽  
Yiming Li ◽  
Xingchun Xiang ◽  
Shu-Tao Xia ◽  
Siyi Dong ◽  
...  

Deep Neural Networks (DNNs) usually work in an end-to-end manner. This makes trained DNNs easy to use, but their decision process remains opaque for every test case. Unfortunately, the interpretability of decisions is crucial in some scenarios, such as medical or financial data mining and decision-making. In this paper, we propose a Tree-Network-Tree (TNT) learning framework for explainable decision-making, in which knowledge is alternately transferred between the tree model and DNNs. Specifically, the proposed TNT learning framework exploits the advantages of different models at different stages: (1) a novel James–Stein Decision Tree (JSDT) is proposed to generate better knowledge representations for DNNs, especially when the input data are low-frequency or low-quality; (2) the DNNs output high-performing predictions from the knowledge embedding inputs and act as a teacher model for the following tree model; and (3) a novel distillable Gradient Boosted Decision Tree (dGBDT) is proposed to learn interpretable trees from the soft labels and make predictions comparable to those of the DNNs. Extensive experiments on various machine learning tasks demonstrated the effectiveness of the proposed method.


2014 ◽  
Vol 8 (1) ◽  
pp. 39-49 ◽  
Author(s):  
Mehdi Mansouri ◽  
Mohammad Javad Kargar

Driving accidents have always been counted among the most prominent causes of death in societies today. Statistics and reports indicate that road accidents in Iran are several times more frequent than in developed countries. In this paper, the rules and factors influencing road traffic accidents in Iran are extracted, along with a local data model, after collecting data from a variety of sources, aggregating and combining the data, cleaning them, and removing inappropriate records. This was done with appropriate data mining methods, such as clustering and decision trees. The data cover 10,000 accidents from 2011 to 2013 in Isfahan Province, Iran. The experimental results show that, among the decision tree approaches, the C5.0 algorithm outperforms the other algorithms, with a lower error rate and higher accuracy. Our analysis also shows that the three most important attributes in determining the accident type are the type of the at-fault vehicle, the type of the vehicle hit, and the cause of the accident. These findings can provide the authorities with valuable information for reducing road accidents.


2021 ◽  
Author(s):  
Lamya Neissi ◽  
Mona Golabi ◽  
Mohammad Albaji ◽  
Abd Ali Naseri

Precise evaluation of evapotranspiration over an extended area is crucial for determining water requirements. Remote sensing evapotranspiration algorithms need many climatological variables, and measuring those variables requires establishing many climatic stations in the area. By integrating a data mining method with remote sensing, evapotranspiration can be calculated with high accuracy. A physically based SEBAL evapotranspiration algorithm was modeled with the GIS model builder for the ET calculations. Albedo, emissivity, and the Normalized Difference Water Index (NDWI) were used as inputs to the M5 decision tree model. Evapotranspiration was evaluated for 3 April 2020 to 17 September 2020; the equations extracted from the M5 decision tree model were then implemented in GIS using Python scripts for the same period. The results show that the mathematical decision tree model can estimate, with high accuracy, the evapotranspiration obtained from the physically based SEBAL algorithm.


Machine learning (ML) is the study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying instead on patterns. It is regarded as a subset of artificial intelligence. Here, the sample data are split into a training set and a test set. Road accidents are a major cause of death worldwide, and most of these deaths occur in middle-income countries. This study identifies the major factors in road accidents using decision trees and random forests. A decision tree is a decision support tool with a tree-like model that contains only conditional control statements. A random forest corrects for a decision tree's tendency to overfit its training set. The decision tree and random forest algorithms are used to determine the severity of, and the factors behind, road accidents from drivers' personal information. The results indicate the likelihood of road accidents as estimated by the machine learning algorithms.
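The idea that a decision tree consists only of conditional tests can be illustrated by showing how one such test might be chosen. The sketch below picks a threshold on a single numeric feature by minimising weighted Gini impurity, the criterion used in CART-style trees. The abstract does not say which splitting criterion was used, so this choice is an assumption, as are the function names and the driver-age data.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (weighted impurity, threshold) of the best split on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            best = (score, threshold)
    return best


# Hypothetical driver ages and accident severities.
ages = [19, 22, 25, 41, 47, 53]
severity = ["severe", "severe", "severe", "minor", "minor", "minor"]
impurity, threshold = best_threshold(ages, severity)
print(threshold, impurity)  # 33.0 0.0
```

The resulting rule, "age below 33 implies severe", is a single conditional statement. A full tree repeats this search recursively on each side of the split, and a random forest averages many such trees built on resampled data, which is what reduces overfitting.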

