scholarly journals Machine Learning Based Short-Term Travel Time Prediction: Numerical Results and Comparative Analyses

2021 ◽  
Vol 13 (13) ◽  
pp. 7454
Author(s):  
Bo Qiu ◽  
Wei (David) Fan

Due to the increasing traffic volume in metropolitan areas, short-term travel time prediction (TTP) can be an important and useful tool for both travelers and traffic management. Accurate and reliable short-term travel time prediction can greatly help vehicle routing and congestion mitigation. One of the most challenging tasks in TTP is developing and selecting the most appropriate prediction algorithm using the available data. In this study, the travel time data was provided and collected from the Regional Integrated Transportation Information System (RITIS). Then, the travel times were predicted for short horizons (ranging from 15 to 60 min) on the selected freeway corridors by applying four different machine learning algorithms, which are Decision Trees (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Long Short-Term Memory neural network (LSTM). Many spatial and temporal characteristics that may affect travel time were used when developing the models. The performance of prediction accuracy and reliability are compared. Numerical results suggest that RF can achieve a better prediction performance result than any of the other methods not only in accuracy but also with stability.

Logistics ◽  
2019 ◽  
Vol 4 (1) ◽  
pp. 1 ◽  
Author(s):  
Nikolaos Servos ◽  
Xiaodi Liu ◽  
Michael Teucke ◽  
Michael Freitag

Accurate travel time prediction is of high value for freight transports, as it allows supply chain participants to increase their logistics quality and efficiency. It requires both sufficient input data, which can be generated, e.g., by mobile sensors, and adequate prediction methods. Machine Learning (ML) algorithms are well suited to solve non-linear and complex relationships in the collected tracking data. Despite that, only a minority of recent publications use ML for travel time prediction in multimodal transports. We apply the ML algorithms extremely randomized trees (ExtraTrees), adaptive boosting (AdaBoost), and support vector regression (SVR) to this problem because of their ability to deal with low data volumes and their low processing times. Using different combinations of features derived from the data, we have built several models for travel time prediction. Tracking data from a real-world multimodal container transport relation from Germany to the USA are used for evaluation of the established models. We show that SVR provides the best prediction accuracy, with a mean absolute error of 17 h for a transport time of up to 30 days. We also show that our model performs better than average-based approaches.


2020 ◽  
Author(s):  
Homa Taghipour ◽  
Amir Bahador Parsa ◽  
Abolfazl Mohammadian

Having access to accurate travel time is of great importance for both highway network users and traffic engineers. The travel time which is currently reported on several highways is estimated by employing naïve methods and using limited sources of data. This results in unreliable and inaccurate travel time prediction and could impose delay on travelers. Therefore, the main objective of this study is short-term prediction of travel time for highways using multiple data sources including loop detectors, probe vehicles, weather condition, network, accidents, road works, and special events in order to consider the effect of different factors on travel time. To this end, two machine learning methods, K-Nearest Neighbors and Random Forest, are employed. After applying data cleaning process on datasets and combining them, the models are trained to predict and compare short-term harmonic average speed as a representative of travel time for 5-minute prediction horizons in one hour ahead. The travel time is calculated as the ratio of the length of each link and the harmonic average speed for all reporting vehicles. Hence, a model is trained for each technique to predict travel time 5 minutes ahead, 10 minutes ahead, and all the way down to 60 minutes ahead. The results confirm satisfying performance of both models in short-term travel time prediction with slightly outperformance of Random Forest model. A feature importance and sensitivity analysis also applied for the Random Forest model, and traffic variables are found as the most effective variables in predicting the travel time.


2019 ◽  
Author(s):  
Kasper Van Mens ◽  
Joran Lokkerbol ◽  
Richard Janssen ◽  
Robert de Lange ◽  
Bea Tiemens

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71(0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off of at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data.This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.


Author(s):  
Harsha A K

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detect malware like packet content analysis are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features like packet size, arrival time, source and destination addresses and other such metadata to detect malware. Such information can be used to train machine learning classifiers in order to classify malicious and benign packets. In this paper, we offer an efficient malware detection approach using classification algorithms in machine learning such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyper parameters of the algorithms, in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.


2021 ◽  
pp. 1-29
Author(s):  
Fikrewold H. Bitew ◽  
Corey S. Sparks ◽  
Samuel H. Nyarko

Abstract Objective: Child undernutrition is a global public health problem with serious implications. In this study, estimate predictive algorithms for the determinants of childhood stunting by using various machine learning (ML) algorithms. Design: This study draws on data from the Ethiopian Demographic and Health Survey of 2016. Five machine learning algorithms including eXtreme gradient boosting (xgbTree), k-nearest neighbors (K-NN), random forest (RF), neural network (NNet), and the generalized linear models (GLM) were considered to predict the socio-demographic risk factors for undernutrition in Ethiopia. Setting: Households in Ethiopia. Participants: A total of 9,471 children below five years of age. Results: The descriptive results show substantial regional variations in child stunting, wasting, and underweight in Ethiopia. Also, among the five ML algorithms, xgbTree algorithm shows a better prediction ability than the generalized linear mixed algorithm. The best predicting algorithm (xgbTree) shows diverse important predictors of undernutrition across the three outcomes which include time to water source, anemia history, child age greater than 30 months, small birth size, and maternal underweight, among others. Conclusions: The xgbTree algorithm was a reasonably superior ML algorithm for predicting childhood undernutrition in Ethiopia compared to other ML algorithms considered in this study. The findings support improvement in access to water supply, food security, and fertility regulation among others in the quest to considerably improve childhood nutrition in Ethiopia.


2020 ◽  
Vol 9 (9) ◽  
pp. 507
Author(s):  
Sanjiwana Arjasakusuma ◽  
Sandiaga Swahyu Kusuma ◽  
Stuart Phinn

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection involving high- spatial and spectral dimensionality data, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA) in combination with machine learning algorithms such as multivariate adaptive regression spline (MARS), extra trees (ET), support vector regression (SVR) with radial basis function, and extreme gradient boosting (XGB) with trees (XGbtree and XGBdart) and linear (XGBlin) classifiers were evaluated. The results demonstrated that the combinations of BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height by combining lidar and hyperspectral data, with R2 = 0.53 and RMSE = 1.7 m (18.4% of nRMSE and 0.046 m of bias) for BO-XGBdart and R2 = 0.51 and RMSE = 1.8 m (15.8% of nRMSE and −0.244 m of bias) for BO-SVR. Our study also demonstrated the effectiveness of BO for variables selection; it could reduce 95% of the data to select the 29 most important variables from the initial 516 variables from lidar metrics and hyperspectral data.


Sign in / Sign up

Export Citation Format

Share Document