Forecasting Air Travel Demand for Selected Destinations Using Machine Learning Methods

2021 ◽  
Vol 27 (6) ◽  
pp. 564-581
Author(s):  
Murat Firat ◽  
Derya Yiltas-Kaplan ◽  
Ruya Samli

Over the past decades, air transportation has expanded and the era of big data in transportation has emerged. Accurate travel demand information is an important issue for transportation systems, especially for the airline industry. Therefore, the “optimal seat capacity problem between origin and destination pairs”, which is related to the load factor, must be solved. In this study, a method is introduced for determining the optimal seat capacity that yields the highest load factor for flight operations between any two countries. The machine learning methods Artificial Neural Network (ANN), Linear Regression (LR), Gradient Boosting (GB), and Random Forest (RF) have been applied, and software has been developed to solve the problem. A data set generated from the World Bank database, which contains thousands of features for all countries, has been used, and a case study has been carried out with Turkish Airlines for the period 2014-2019. To the best of our knowledge, this is the first time in the literature that 1,983 features have been used to forecast air travel demand within a model that covers all countries, whereas previous studies cover only a few countries and use far fewer features. Another valuable aspect of this study is its use of the last regular air transportation data before the COVID-19 pandemic. Since many airline companies experienced a decline in air travel operations in 2020 due to the pandemic, this study covers the most recent period (2014-2019) in which flight operations were performed on a regular basis. As a result, the developed model forecast the passenger load factor with an average error rate of 6.741% with GB, 6.763% with RF, 8.161% with ANN, and 9.619% with LR.
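The abstract reports an "average error rate" for load factor forecasts but does not define it here. The sketch below assumes that the passenger load factor is passengers carried divided by seats offered, and that the error rate is a mean absolute percentage error (MAPE); the function names and route figures are hypothetical illustrations, not the authors' method or data.

```python
def load_factor(passengers: float, seats: float) -> float:
    """Share of offered seats that were filled (assumed definition)."""
    if seats <= 0:
        raise ValueError("seats must be positive")
    return passengers / seats


def mean_absolute_percentage_error(actual: list[float], forecast: list[float]) -> float:
    """Mean of |actual - forecast| / |actual|, returned in percent (assumed metric)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs(a - f) / abs(a)
    return 100.0 * total / len(actual)


# Hypothetical route-level figures, not data from the study.
actual_lf = [load_factor(160, 200), load_factor(90, 120)]  # 0.80 and 0.75
forecast_lf = [0.76, 0.81]
print(round(mean_absolute_percentage_error(actual_lf, forecast_lf), 3))  # 6.5
```

Percent errors are scale-free, which is why a single average can be compared across routes with very different seat counts.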

Author(s):  
Pavel Kikin ◽  
Alexey Kolesnikov ◽  
Alexey Portnov ◽  
Denis Grischenko

The state of ecological systems, along with their general characteristics, is almost always described by indicators that vary in space and time, which significantly complicates the construction of mathematical models for predicting the state of such systems. One way to simplify and automate the construction of such models is to use machine learning methods. The article compares traditional and neural-network-based algorithms and machine learning methods for predicting spatio-temporal series representing ecosystem data. The following algorithms and methods were analyzed and compared: logistic regression, random forest, gradient boosting on decision trees, SARIMAX, long short-term memory (LSTM) neural networks, and gated recurrent units (GRU). The study used data sets that have both spatial and temporal components: mosquito counts, the number of dengue infections, the physical condition of trees in a tropical grove, and the water level in the river. The article discusses the preliminary data processing steps required by each algorithm. Kolmogorov complexity was also calculated for the data sets used, as one of the parameters that can help formalize the choice of the optimal algorithm when constructing mathematical models of spatio-temporal data. Based on the results of the analysis, recommendations are given on applying particular methods and specific technical solutions depending on the characteristics of the data set that describes a particular ecosystem.
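Kolmogorov complexity is uncomputable, so in practice it has to be approximated, and this summary does not say which estimator the authors used. One common proxy, assumed here, is the compressed size of a quantized copy of the series; the function `compression_complexity` and its 16-level quantization are hypothetical choices for illustration.

```python
import math
import random
import zlib


def compression_complexity(series: list[float], levels: int = 16) -> float:
    """Approximate the complexity of a numeric series by a compression ratio.

    The series is quantized into `levels` bins (one byte per value) and
    compressed with zlib; the result is compressed_size / raw_size, so lower
    values indicate a more regular, more predictable series.
    """
    if not series:
        raise ValueError("series must be non-empty")
    lo, hi = min(series), max(series)
    span = hi - lo or 1.0  # constant series: avoid division by zero
    symbols = bytes(min(int((x - lo) / span * levels), levels - 1) for x in series)
    return len(zlib.compress(symbols, 9)) / len(symbols)


# Hypothetical comparison: a daily-periodic hourly signal vs. random noise.
periodic = [math.sin(2 * math.pi * t / 24) for t in range(2400)]
rng = random.Random(0)
noise = [rng.random() for _ in range(2400)]
print(compression_complexity(periodic) < compression_complexity(noise))  # True
```

Compressed length only upper-bounds the true Kolmogorov complexity, so such scores are best used to rank data sets against each other rather than read as absolute values.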


Author(s):  
Antônio Diogo Forte Martins ◽  
José Maria Monteiro ◽  
Javam Machado

During the coronavirus pandemic, the problem of misinformation arose once again, quite intensely, through social networks. In Brazil, one of the primary sources of misinformation is the messaging application WhatsApp. However, due to the private nature of WhatsApp messaging, there are still few misinformation detection methods developed specifically for this platform. In this context, automatic misinformation detection (MID) about COVID-19 in Brazilian Portuguese WhatsApp messages becomes a crucial challenge. In this work, we present COVID-19.BR, a data set of WhatsApp messages about the coronavirus in Brazilian Portuguese, collected from Brazilian public groups and manually labeled. We then investigate different machine learning methods in order to build an efficient MID for WhatsApp messages. So far, our best result is an F1 score of 0.774, which is limited by the predominance of short texts. When texts with fewer than 50 words are filtered out, the F1 score rises to 0.85.
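A minimal sketch of the two operations the abstract mentions: scoring binary predictions with F1 and dropping messages shorter than 50 words. It assumes that label 1 marks misinformation, that words are counted by whitespace splitting, and a binary (not macro-averaged) F1; the study may average F1 differently, and the helper names are hypothetical.

```python
def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def drop_short_messages(messages: list[str], labels: list[int], min_words: int = 50):
    """Keep only messages with at least `min_words` whitespace-separated words."""
    kept = [(m, y) for m, y in zip(messages, labels) if len(m.split()) >= min_words]
    return [m for m, _ in kept], [y for _, y in kept]


# Hypothetical labels: 1 = misinformation, 0 = not misinformation.
print(round(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]), 3))  # 0.667
texts, ys = drop_short_messages(["mensagem curta", "palavra " * 60], [0, 1])
print(len(texts), ys)  # 1 [1]
```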


2021 ◽  
Author(s):  
Polash Banerjee

Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the 2019 wildfires in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors have been considered for preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study of the efficiency of machine learning methods, namely Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM), has been performed to identify the best-performing algorithm for wildfire prediction. The study indicates that all of these machine learning methods are good at predicting wildfires; however, RF performed best, followed by GBM. In addition, environmental features such as average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can serve as a decision support tool for preparedness, efficient resource allocation and sensitizing people toward wildfire mitigation in Sikkim.


2019 ◽  
Vol 23 (1) ◽  
pp. 125-142
Author(s):  
Helle Hein ◽  
Ljubov Jaanuska

In this paper, the Haar wavelet discrete transform, artificial neural networks (ANNs), and random forests (RFs) are applied to predict the location and severity of a crack in an Euler–Bernoulli cantilever subjected to transverse free vibration. An extensive investigation of two data sets and machine learning methods showed that the depth of a crack is more difficult to predict than its location. The data set of eight natural frequency parameters produces more accurate predictions of the crack depth, whereas the data set of eight Haar wavelet coefficients produces more precise predictions of the crack location. Furthermore, the analysis of the results showed that an ensemble of 50 ANNs trained by Bayesian regularization and Levenberg–Marquardt algorithms slightly outperforms RF.
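The eight Haar wavelet coefficients used as features can be illustrated with a full multilevel Haar transform of an eight-sample signal. This sketch assumes the orthonormal normalization (factor 1/√2) and the coefficient ordering described in the docstring; the abstract does not state which signal is transformed or how the authors normalize the coefficients.

```python
import math


def haar_dwt(signal: list[float]) -> list[float]:
    """Full multilevel orthonormal Haar transform of a length-2^k signal.

    Output order: [overall approximation, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    coeffs = list(signal)
    s = 1 / math.sqrt(2)
    length = n
    while length > 1:
        half = length // 2
        # Pairwise averages (approximation) and differences (detail), scaled by 1/sqrt(2).
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) * s for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) * s for i in range(half)]
        coeffs[:length] = approx + detail
        length = half  # recurse on the approximation part only
    return coeffs


# Hypothetical eight-point signal (e.g. samples of a vibration mode shape).
coeffs = haar_dwt([4.0, 2.0, 5.0, 5.0, 0.0, 0.0, 1.0, 3.0])
print(len(coeffs))  # 8
print(round(sum(c * c for c in coeffs), 6))  # 80.0, the same energy as the input
```

Because the transform is orthonormal it preserves signal energy, so a localized change such as a crack shows up as a change in a few detail coefficients rather than being spread over all of them.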


2021 ◽  
Vol 3 ◽  
pp. 47-57
Author(s):  
I. N. Myagkova ◽  
V. R. Shirokii ◽  
Yu. S. Shugai ◽  
O. G. Barinov ◽  
...  

Ways of improving the prediction of time series of hourly mean fluxes and daily total fluxes (fluences) of relativistic electrons in the Earth’s outer radiation belt, 1 to 24 hours ahead and 1 to 4 days ahead respectively, are studied. The prediction uses an approximation approach based on several machine learning methods, namely artificial neural networks (ANNs), decision trees (random forest), and gradient boosting. A comparison of the skill scores of short-range forecasts with lead times of 1 to 24 hours showed that ANNs gave the best results. For medium-range forecasting, the accuracy of predicting the fluences of relativistic electrons in the Earth’s outer radiation belt three to four days ahead increases significantly when predicted values of the near-Earth solar wind velocity, obtained from UV images of the Sun taken by the AIA (Atmospheric Imaging Assembly) instrument of the SDO (Solar Dynamics Observatory), are included in the list of input parameters.
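The abstract compares forecasts by skill score without defining it here. A widely used form, assumed in this sketch, is the mean-squared-error skill score against a reference forecast, SS = 1 - MSE_model / MSE_reference, with persistence (the value observed `lead` steps earlier) as the reference; the function names and numbers are hypothetical.

```python
def mse(actual: list[float], predicted: list[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(series: list[float], lead: int) -> list[float]:
    """Reference forecast for series[lead:]: the value observed `lead` steps earlier."""
    if not 0 < lead < len(series):
        raise ValueError("lead must be between 1 and len(series) - 1")
    return series[:-lead]


def skill_score(actual: list[float], predicted: list[float], reference: list[float]) -> float:
    """SS = 1 - MSE(model) / MSE(reference): 1 is perfect, 0 matches the reference."""
    ref_error = mse(actual, reference)
    if ref_error == 0:
        raise ValueError("reference forecast is perfect; skill score undefined")
    return 1.0 - mse(actual, predicted) / ref_error


# Hypothetical hourly log-flux values and a model forecast 1 hour ahead.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lead = 1
targets = series[lead:]                         # [2, 3, 4, 5, 6]
reference = persistence_forecast(series, lead)  # [1, 2, 3, 4, 5]
model = [2.5, 3.0, 4.0, 5.0, 5.5]
print(round(skill_score(targets, model, reference), 3))  # 0.9
```

Persistence is a natural baseline for flux forecasting because the flux changes slowly at short lead times; a positive score means the model beats simply repeating the latest observation.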


2019 ◽  
pp. 089443931988844
Author(s):  
Ranjith Vijayakumar ◽  
Mike W.-L. Cheung

Machine learning methods have become very popular in diverse fields due to their focus on predictive accuracy, but little work has examined how to assess the replicability of their findings. We introduce replication methods advocated in psychology and adapt them to the aims and procedural needs of machine learning research. In Study 1, we illustrate these methods with an empirical data set, assessing the replication success of a predictive accuracy measure, namely R², on the cross-validated and test sets of the samples. We introduce three replication aims. First, tests of inconsistency examine whether single replications have successfully rejected the original study. Rejection is supported if the 95% confidence interval (CI) of the R² difference estimate between replication and original does not contain zero. Second, tests of consistency help support claims of successful replication. A region of equivalence can be decided a priori, within which population values of the difference estimates are considered equivalent for substantive reasons; a 90% CI of the difference estimate lying fully within this region supports replication. Third, we show how to combine replications to construct meta-analytic intervals for more precise estimates of predictive accuracy measures. In Study 2, R² is reduced from the original in a subset of replication studies to examine the ability of the replication procedures to distinguish true replications from nonreplications. We find that when combining studies sampled from the same population to form meta-analytic intervals, random-effects methods perform best for cross-validated measures, while fixed-effects methods work best for test measures. Among machine learning methods, regression was comparable to many complex methods, while support vector machines performed most reliably across a variety of scenarios. Social scientists who use machine learning to model empirical data can use these methods to enhance the reliability of their findings.
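The three replication aims can be sketched with a normal approximation, assuming that each R² difference estimate comes with a known standard error. The equivalence bound `delta` stands in for the a priori region of equivalence, and only the fixed-effects meta-analytic interval is shown (a random-effects version would add a between-study variance term); all names and numbers are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96, for a two-sided 95% CI
Z90 = NormalDist().inv_cdf(0.95)   # about 1.645, for a two-sided 90% CI


def ci(estimate: float, se: float, z: float) -> tuple[float, float]:
    return estimate - z * se, estimate + z * se


def inconsistent(diff: float, se: float) -> bool:
    """Test of inconsistency: the 95% CI of (replication - original) excludes zero."""
    lo, hi = ci(diff, se, Z95)
    return lo > 0 or hi < 0


def equivalent(diff: float, se: float, delta: float) -> bool:
    """Test of consistency: the 90% CI lies fully inside (-delta, delta)."""
    lo, hi = ci(diff, se, Z90)
    return -delta < lo and hi < delta


def fixed_effect_meta(estimates: list[float], ses: list[float]):
    """Inverse-variance pooled estimate and its 95% CI (fixed-effects model)."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, ci(pooled, pooled_se, Z95)


print(inconsistent(-0.02, 0.015))             # False: the 95% CI contains zero
print(equivalent(-0.02, 0.015, delta=0.05))   # True: the 90% CI is inside (-0.05, 0.05)
pooled, (lo, hi) = fixed_effect_meta([0.40, 0.44], [0.02, 0.02])
print(round(pooled, 3), round(lo, 3), round(hi, 3))  # 0.42 0.392 0.448
```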


2021 ◽  
Author(s):  
Qifei Zhao ◽  
Xiaojun Li ◽  
Yunning Cao ◽  
Zhikun Li ◽  
Jixin Fan

Collapsibility of loess is a significant factor affecting engineering construction in loess areas, and testing the collapsibility of loess is costly. In this study, a total of 4,256 loess samples were collected from the northern, eastern, western and central regions of Xining. 70% of the samples were used to generate the training data set and the rest to generate the verification data set, in order to construct and validate the machine learning models. The six most important factors were selected from thirteen candidate factors using grey relational analysis and multicollinearity analysis: burial depth, water content, specific gravity of soil particles, void ratio, geostatic stress and plastic limit. To predict the collapsibility of loess, four machine learning methods, Support Vector Machine (SVM), Random Subspace Based Support Vector Machine (RSSVM), Random Forest (RF) and Naïve Bayes Tree (NBTree), are studied and compared. Receiver operating characteristic (ROC) curve indicators, standard error (SD) and 95% confidence intervals (CI) are used to verify and compare the models in different research areas. The results show that the RF model is the most efficient at predicting the collapsibility of loess in Xining, with an average AUC above 80%, and can be used in engineering practice.
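The 70/30 split and the AUC used to compare the models can be sketched as follows. The shuffling seed, the rounding of the split size, and the rank-based AUC (the probability that a randomly chosen collapsible sample, labeled 1, scores higher than a randomly chosen non-collapsible one, with ties counted as one half) are assumptions for illustration; the pairwise loop is O(n²) and favors clarity over speed.

```python
import random


def train_test_split(items, train_fraction: float = 0.7, seed: int = 0):
    """Shuffle a copy of `items` and split it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for four samples (1 = collapsible).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
train, test = train_test_split(range(4256))
print(len(train), len(test))  # 2979 1277
```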


Author(s):  
Artem Salamatov ◽  
Elena Gafarova ◽  
Vladimir Belevitin ◽  
Maxim Gafarov ◽  
Darya Gordeeva

The relevance of environmental and economic activity requires professional training of specialists and, accordingly, new organizational and pedagogical conditions for effective education. It is also necessary to develop control and measuring materials that have all the required qualities (validity, reliability, consistency, significance and objectivity) in order to obtain the most reliable results when justifying the necessity and sufficiency of the identified conditions. The intensification of information processes in vocational education leads researchers to seek optimal conditions and tools for achieving pedagogical goals. Among these tools are machine learning methods and the mathematical models built on them for quantitatively assessing the quality of vocational training in environmental and economic activities. The qualimetric approach can be used in pedagogy when a sufficient array of observational data is available for a given criterion related to learning conditions, students' personal qualities, etc. An algorithmic model allows one to manipulate conditions in thought experiments and to test hypotheses; since pedagogical research takes a long time, choosing conditions based on the most favorable forecast produced by the model makes it possible to optimize pedagogical resources for achieving the planned results. Rational selection of effective control and measuring materials (CMMs) makes it possible to determine the necessity and sufficiency of organizational and pedagogical conditions, while mathematical modeling allows one to quickly adjust these conditions as a set of options for content, forms, teaching methods, information and communication technologies (ICTs) and CMMs used to achieve the planned educational results in environmental and economic activity.
Interpreting the derived features in the context of the pedagogical research, with a cross-validation accuracy of 72%, revealed the dominant significance of the connections between the disciplines studied by the sample of students in the bachelor's and master's programs 44.03.04 and 44.04.04 "Professional training (by industry)", which are the most significant for forming competence in environmental and economic activities. The Gradient Boosting Classifier model developed in the study makes it possible to predict the studied competency types and to test hypotheses about including or excluding particular significant organizational and pedagogical conditions for the effective implementation of the educational process. A necessary and sufficient organizational and pedagogical condition for effectively forming competence in environmental and economic activity is to ensure continuity between significant disciplines and to actualize interdisciplinary relationships by developing interdisciplinary courses.
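The cross-validation accuracy quoted above is the share of samples classified correctly when each fold is held out in turn. This sketch assumes contiguous, unshuffled folds and uses a hypothetical majority-class predictor in place of the study's Gradient Boosting Classifier, so it illustrates only the evaluation loop, not the authors' model.

```python
from collections import Counter


def k_fold_indices(n: int, k: int) -> list[list[int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(X, y, fit_predict, k: int = 5) -> float:
    """Pooled accuracy: every sample is predicted exactly once, by a model trained without it."""
    correct = 0
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)


def majority_class(X_train, y_train, X_test):
    """Hypothetical stand-in model: always predicts the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


X = list(range(10))
y = [1] * 8 + [0] * 2
print(cross_val_accuracy(X, y, majority_class, k=5))  # 0.8
```

In practice the samples would usually be shuffled (and often stratified by class) before folding, since contiguous folds can concentrate one class in a single fold, as the last fold does here.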

