Forecasting Air Travel Demand for Selected Destinations Using Machine Learning Methods

Murat Firat; Derya Yiltas-Kaplan; Ruya Samli

doi:10.3897/jucs.68185

JUCS - Journal of Universal Computer Science ◽

2021 ◽

Vol 27 (6) ◽

pp. 564-581

Author(s):

Murat Firat ◽

Derya Yiltas-Kaplan ◽

Ruya Samli

Keyword(s):

Machine Learning ◽

Travel Demand ◽

Air Transportation ◽

Air Travel ◽

Gradient Boosting ◽

Load Factor ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods ◽

Air Travel Demand

Over the past decades, air transportation has expanded and the era of big data for transportation has emerged. Accurate travel demand information is important for transportation systems, especially for the airline industry, because it is needed to solve the “optimal seat capacity problem between origin and destination pairs”, which is related to the load factor. In this study, a method has been introduced for determining the optimal seat capacity that yields the highest load factor for flight operations between any two countries. The machine learning methods Artificial Neural Network (ANN), Linear Regression (LR), Gradient Boosting (GB), and Random Forest (RF) have been applied, and software has been developed to solve the problem. A data set generated from The World Bank Database, which contains thousands of features for all countries, has been used, and a case study has been conducted for the period 2014-2019 with Turkish Airlines. To the best of our knowledge, this is the first time in the literature that 1983 features have been used to forecast air travel demand within a model that covers all countries, whereas previous studies cover only a few countries using far fewer features. Another valuable point of this study is its use of the last regular air transportation data before the COVID-19 pandemic: since many airline companies experienced a decline in air travel operations in 2020 due to the pandemic, this study covers the most recent period (2014-2019) in which flight operations were performed on a regular basis. As a result, the developed model forecast the passenger load factor with an average error rate of 6.741% with GB, 6.763% with RF, 8.161% with ANN, and 9.619% with LR.
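The abstract reports accuracy as an "average error rate" on the passenger load factor but does not define the metric. A minimal sketch, assuming the rate is a mean absolute percentage error (MAPE) and that the load factor is passengers carried divided by seats offered; both definitions are assumptions for illustration, and the function names are hypothetical.

```python
from typing import Sequence


def load_factor(passengers: float, seats: float) -> float:
    """Share of offered seats that were occupied (assumed definition)."""
    if seats <= 0:
        raise ValueError("seats must be positive")
    return passengers / seats


def average_error_rate(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (assumed meaning of 'average error rate')."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("actual values must be non-zero for a percentage error")
        total += abs(a - p) / abs(a)
    return 100.0 * total / len(actual)


# Hypothetical example: observed vs. forecast load factors for three routes.
observed = [load_factor(160, 200), load_factor(150, 200), load_factor(180, 200)]
forecast = [0.76, 0.78, 0.90]
print(f"{average_error_rate(observed, forecast):.3f}%")
```

Under this reading, a reported 6.741% would mean the forecasts deviate from the observed load factors by about 6.7% of their value on average.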

Natural language processing systems for data extraction and mapping on the basis of unstructured text blocks

Proceedings of the International conference “InterCarto/InterGIS” ◽

10.35595/2414-9179-2020-3-26-53-61 ◽

2020 ◽

Vol 26 (3) ◽

pp. 53-61

Author(s):

Pavel Kikin ◽

Alexey Kolesnikov ◽

Alexey Portnov ◽

Denis Grischenko

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Mathematical Models ◽

Optimal Algorithm ◽

The State ◽

Gradient Boosting ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods ◽

Spatio Temporal

The state of ecological systems, along with their general characteristics, is almost always described by indicators that vary in space and time, which significantly complicates the construction of mathematical models for predicting the state of such systems. One way to simplify and automate the construction of such models is to use machine learning methods. The article compares traditional and neural-network-based algorithms and machine learning methods for predicting spatio-temporal series representing ecosystem data. The following algorithms and methods were analyzed and compared: logistic regression, random forest, gradient boosting on decision trees, SARIMAX, long short-term memory (LSTM) neural networks, and gated recurrent units (GRU). The study used data sets that have both spatial and temporal components: the number of mosquitoes, the number of dengue infections, the physical condition of tropical grove trees, and the water level in a river. The article discusses the preliminary data processing steps required for each algorithm. Kolmogorov complexity was also calculated as one of the parameters that can help formalize the choice of the optimal algorithm when constructing mathematical models of spatio-temporal data for the sets used. Based on the results of the analysis, recommendations are given on the application of particular methods and specific technical solutions, depending on the characteristics of the data set that describes a given ecosystem.
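Kolmogorov complexity itself is not computable, so in practice it is approximated; the abstract does not say which estimator was used. A minimal sketch, assuming a compression-based proxy: the series is rendered as text and compressed with zlib, and the compressed-to-raw size ratio serves as the complexity estimate. The function name and the rounding to two decimals are illustrative choices, not the article's.

```python
import random
import zlib
from typing import Sequence


def compression_complexity(series: Sequence[float], decimals: int = 2) -> float:
    """Approximate Kolmogorov complexity as a zlib compression ratio (lower = more regular)."""
    if not series:
        raise ValueError("series must be non-empty")
    text = ",".join(f"{x:.{decimals}f}" for x in series).encode("ascii")
    return len(zlib.compress(text, 9)) / len(text)


rng = random.Random(42)
regular = [10.0 + (i % 12) for i in range(365)]        # strictly periodic, e.g. a seasonal index
noisy = [rng.uniform(0.0, 100.0) for _ in range(365)]   # unstructured noise
print(compression_complexity(regular), compression_complexity(noisy))
```

A regular series compresses well and receives a low score; how such a score should steer the choice between, say, SARIMAX and an LSTM is the article's contribution and is not reproduced here.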

Automatic Misinformation Detection About COVID-19 in Brazilian Portuguese WhatsApp Messages

10.5753/sbbd_estendido.2021.18173 ◽

2021 ◽

Author(s):

Antônio Diogo Forte Martins ◽

José Maria Monteiro ◽

Javam Machado

Keyword(s):

Machine Learning ◽

Social Networks ◽

Brazilian Portuguese ◽

Primary Sources ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods

During the coronavirus pandemic, the problem of misinformation arose once again, quite intensely, through social networks. In Brazil, one of the primary sources of misinformation is the messaging application WhatsApp. However, owing to WhatsApp's private messaging nature, there are still few misinformation detection methods developed specifically for this platform. In this context, automatic misinformation detection (MID) about COVID-19 in Brazilian Portuguese WhatsApp messages becomes a crucial challenge. In this work, we present COVID-19.BR, a data set of WhatsApp messages about coronavirus in Brazilian Portuguese, collected from Brazilian public groups and manually labeled. We then investigate different machine learning methods in order to build an efficient MID for WhatsApp messages. So far, our best result is an F1 score of 0.774, which is limited by the predominance of short texts. When texts with fewer than 50 words are filtered out, however, the F1 score rises to 0.85.
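The F1 score quoted above is the harmonic mean of precision and recall for the misinformation class. A minimal sketch of that metric together with a word-count filter like the one described; the 50-word threshold comes from the abstract, while the function names, the whitespace tokenization and the label encoding (1 = misinformation) are assumptions.

```python
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def drop_short_texts(
    messages: Sequence[Tuple[str, int]], min_words: int = 50
) -> List[Tuple[str, int]]:
    """Keep only (text, label) pairs whose whitespace-split length is at least min_words."""
    return [(text, label) for text, label in messages if len(text.split()) >= min_words]


print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In the usage line, two of three flagged messages are correct and two of three true misinformation messages are caught, so precision, recall and F1 all equal 2/3.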

MODIS-FIRMS and ground-truthing based wildfire likelihood mapping of Sikkim Himalaya using machine learning algorithms.

10.21203/rs.3.rs-750123/v1 ◽

2021 ◽

Author(s):

Polash Banerjee

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Tree Cover ◽

Anthropogenic Factors ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Sikkim Himalaya ◽

Environmental Features ◽

Machine Learning Methods

Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the recent 2019 wildfires in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors have been considered for preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study of the efficiency of machine learning methods such as the Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM) has been performed to identify the best-performing algorithm for wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires; however, RF performed best, followed by GBM. Environmental features such as average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can serve as a decision support tool for preparedness, efficient resource allocation and the sensitization of people towards the mitigation of wildfires in Sikkim.
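The abstract ranks environmental features by importance without stating how importance was measured. One common, model-agnostic option is permutation importance: shuffle one feature column and record how much accuracy drops. The sketch below assumes a fitted classifier exposed as a plain `predict(row)` callable; it is an illustration, not the study's procedure, and the toy model in the example is hypothetical.

```python
import random
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Sequence[float]], int],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace feature j with its shuffled values, keep all other features intact.
            permuted = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(row) for row in permuted]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the label depends only on feature 0; feature 1 is irrelevant.
X = [[i % 2, i % 3] for i in range(30)]
y = [row[0] for row in X]
print(permutation_importance(lambda row: row[0], X, y))
```

Random forests also offer a built-in impurity-based importance; permutation importance is shown here because it applies equally to GLM, SVM, RF and GBM, which makes cross-model comparisons like the one in the abstract easier.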

Comparison of machine learning methods for crack localization

Acta et Commentationes Universitatis Tartuensis de Mathematica ◽

10.12697/acutm.2019.23.13 ◽

2019 ◽

Vol 23 (1) ◽

pp. 125-142

Author(s):

Helle Hein ◽

Ljubov Jaanuska

Keyword(s):

Machine Learning ◽

Random Forests ◽

Crack Depth ◽

Haar Wavelet ◽

Extensive Investigation ◽

Learning Methods ◽

Data Set ◽

Crack Location ◽

Machine Learning Methods ◽

Discrete Transform

In this paper, the Haar wavelet discrete transform, artificial neural networks (ANNs), and random forests (RFs) are applied to predict the location and severity of a crack in an Euler–Bernoulli cantilever subjected to transverse free vibration. An extensive investigation of two data collection sets and the machine learning methods showed that the depth of a crack is more difficult to predict than its location. The data set of eight natural frequency parameters produces more accurate predictions of the crack depth, whereas the data set of eight Haar wavelet coefficients produces more precise predictions of the crack location. Furthermore, the analysis of the results showed that an ensemble of 50 ANNs trained with the Bayesian regularization and Levenberg–Marquardt algorithms slightly outperforms RF.
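The abstract uses eight Haar wavelet coefficients as one of its input sets. As a generic illustration only (how the paper derives its coefficients from the beam model is not stated here), the sketch below computes the full orthonormal discrete Haar transform of a length-8 sequence, which likewise yields eight coefficients: one overall average term followed by detail terms at successively finer scales.

```python
import math
from typing import List, Sequence


def haar_dwt(signal: Sequence[float]) -> List[float]:
    """Full orthonormal Haar transform; the input length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(x) for x in signal]
    length = n
    root2 = math.sqrt(2.0)
    while length > 1:
        half = length // 2
        # Pairwise scaled sums (approximation) and differences (detail) of the current level.
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / root2 for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / root2 for i in range(half)]
        coeffs[:length] = approx + detail
        length = half
    return coeffs


print(haar_dwt([4, 2, 5, 5, 0, 1, 3, 7]))
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared input values, so no information is lost when a signal is replaced by its eight coefficients.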

Modelling of diesel engine performance using advanced machine learning methods under scarce and exponential data set

Applied Soft Computing ◽

10.1016/j.asoc.2013.06.006 ◽

2013 ◽

Vol 13 (11) ◽

pp. 4428-4441 ◽

Cited By ~ 25

Author(s):

Ka In Wong ◽

Pak Kin Wong ◽

Chun Shun Cheung ◽

Chi Man Vong

Keyword(s):

Machine Learning ◽

Diesel Engine ◽

Engine Performance ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods

Short- and Medium-range Prediction of Relativistic Electron Flux in the Earth’s Outer Radiation Belt by Machine Learning Methods

Meteorologiya i Gidrologiya ◽

10.52002/0130-2906-2021-3-47-57 ◽

2021 ◽

Vol 3 ◽

pp. 47-57

Author(s):

I. N. Myagkova ◽

V. R. Shirokii ◽

Yu. S. Shugai ◽

O. G. Barinov ◽

...

Keyword(s):

Machine Learning ◽

Radiation Belt ◽

Gradient Boosting ◽

Relativistic Electrons ◽

Learning Methods ◽

Outer Radiation Belt ◽

Machine Learning Methods ◽

The Earth ◽

Skill Scores ◽

Medium Range

Ways to improve the quality of prediction of the time series of hourly mean fluxes and daily total fluxes (fluences) of relativistic electrons in the Earth’s outer radiation belt 1 to 24 hours ahead and 1 to 4 days ahead, respectively, are studied. The prediction uses an approximation approach based on several machine learning methods, namely artificial neural networks (ANNs), decision trees (random forest), and gradient boosting. A comparison of the skill scores of short-range forecasts with lead times of 1 to 24 hours showed that ANNs gave the best results. For medium-range forecasting, the accuracy of predicting the fluences of relativistic electrons in the Earth’s outer radiation belt three to four days ahead increases significantly when the predicted solar wind velocity near the Earth, obtained from UV images of the Sun taken by the AIA (Atmospheric Imaging Assembly) instrument of the SDO (Solar Dynamics Observatory), is included in the list of input parameters.
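The abstract compares models by "skill scores" without defining them. A common choice, assumed here, is the mean-squared-error skill score against a reference forecast, SS = 1 - MSE_model / MSE_reference, with persistence (the value observed `lead` steps earlier) as the reference. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def persistence_pairs(series: Sequence[float], lead: int) -> Tuple[List[float], List[float]]:
    """Return (targets, persistence forecasts): x[t + lead] is forecast as x[t]."""
    if not 0 < lead < len(series):
        raise ValueError("lead must be between 1 and len(series) - 1")
    return list(series[lead:]), list(series[:-lead])


def skill_score(
    observed: Sequence[float], predicted: Sequence[float], reference: Sequence[float]
) -> float:
    """1 = perfect, 0 = no better than the reference, negative = worse than the reference."""
    return 1.0 - mse(observed, predicted) / mse(observed, reference)


targets, naive = persistence_pairs([0.0, 1.0, 2.0, 3.0, 4.0], lead=1)
model = [1.5, 2.5, 3.5, 4.5]  # hypothetical model output for the same targets
print(skill_score(targets, model, naive))
```

The score is undefined when the reference forecast is itself perfect (zero MSE); for flux data that is rarely an issue, but a production version would guard against it.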

Analysis of Cancer Data Set with Statistical and Unsupervised Machine Learning Methods

Smart Intelligent Computing and Applications - Smart Innovation, Systems and Technologies ◽

10.1007/978-981-13-1921-1_27 ◽

2018 ◽

pp. 267-276

Author(s):

T. Panduranga Vital ◽

K. Dileep Kumar ◽

H. V. Bhagya Sri ◽

M. Murali Krishna

Keyword(s):

Machine Learning ◽

Learning Methods ◽

Data Set ◽

Unsupervised Machine Learning ◽

Cancer Data ◽

Machine Learning Methods

Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences

Social Science Computer Review ◽

10.1177/0894439319888445 ◽

2019 ◽

pp. 089443931988844

Author(s):

Ranjith Vijayakumar ◽

Mike W.-L. Cheung

Keyword(s):

Machine Learning ◽

Empirical Data ◽

Fixed Effects ◽

Predictive Accuracy ◽

Support Vector ◽

Learning Methods ◽

Data Set ◽

Replication Studies ◽

Machine Learning Methods ◽

Accuracy Measure

Machine learning methods have become very popular in diverse fields due to their focus on predictive accuracy, but little work has been conducted on how to assess the replicability of their findings. We introduce and adapt replication methods advocated in psychology to the aims and procedural needs of machine learning research. In Study 1, we illustrate these methods with an empirical data set, assessing the replication success of a predictive accuracy measure, namely R², on the cross-validated and test sets of the samples. We introduce three replication aims. First, tests of inconsistency examine whether single replications have successfully rejected the original study. Rejection is supported if the 95% confidence interval (CI) of the R² difference estimate between replication and original does not contain zero. Second, tests of consistency help support claims of successful replication. A region of equivalence can be decided a priori, within which population values of the difference estimates are considered equivalent for substantive reasons; a 90% CI of a difference estimate lying fully within this region supports replication. Third, we show how to combine replications to construct meta-analytic intervals for better precision of predictive accuracy measures. In Study 2, R² is reduced from the original in a subset of replication studies to examine the ability of the replication procedures to distinguish true replications from nonreplications. We find that when combining studies sampled from the same population to form meta-analytic intervals, random-effects methods perform best for cross-validated measures, while fixed-effects methods work best for test measures. Among machine learning methods, regression was comparable to many complex methods, while the support vector machine performed most reliably across a variety of scenarios. Social scientists who use machine learning to model empirical data can use these methods to enhance the reliability of their findings.
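The three replication checks described above can be sketched with a normal-approximation confidence interval. A minimal sketch, assuming the R² difference between a replication and the original and its standard error are already available, and that the equivalence region is symmetric around zero; the paper may construct intervals differently (for example by resampling), and only fixed-effect pooling is shown, not the random-effects variant. All function names are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def confidence_interval(estimate: float, se: float, level: float) -> Tuple[float, float]:
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level 0.95, 1.645 for 0.90
    return estimate - z * se, estimate + z * se


def rejects_original(diff: float, se: float) -> bool:
    """Test of inconsistency: the 95% CI of the R² difference excludes zero."""
    lo, hi = confidence_interval(diff, se, 0.95)
    return lo > 0 or hi < 0


def supports_replication(diff: float, se: float, margin: float) -> bool:
    """Test of consistency: the 90% CI lies inside the equivalence region [-margin, margin]."""
    lo, hi = confidence_interval(diff, se, 0.90)
    return -margin <= lo and hi <= margin


def fixed_effect_pool(estimates: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, (1.0 / sum(weights)) ** 0.5


print(rejects_original(0.10, 0.02), supports_replication(0.01, 0.02, 0.05))
print(fixed_effect_pool([0.30, 0.50], [0.10, 0.10]))
```

The 90% interval in the consistency test mirrors the abstract and corresponds to running two one-sided tests at the 5% level against the equivalence bounds.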

Prediction of Collapsibility of Loess of Construction Sites in Xining Based on Machine Learning Methods

10.21203/rs.3.rs-307514/v1 ◽

2021 ◽

Author(s):

Qifei Zhao ◽

Xiaojun Li ◽

Yunning Cao ◽

Zhikun Li ◽

Jixin Fan

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Training Data ◽

Support Vector ◽

Engineering Practice ◽

Burial Depth ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods ◽

North East

Collapsibility of loess is a significant factor affecting engineering construction in loess areas, and testing the collapsibility of loess is costly. In this study, a total of 4,256 loess samples are collected from the northern, eastern, western and central regions of Xining. Of these, 70% are used to generate the training data set and the rest are used to generate the verification data set, so as to construct and validate the machine learning models. The six most important factors are selected from thirteen factors by using Grey Relational analysis and multicollinearity analysis: burial depth, water content, specific gravity of soil particles, void rate, geostatic stress and plasticity limit. To predict the collapsibility of loess, four machine learning methods are studied and compared: Support Vector Machine (SVM), Random Subspace Based Support Vector Machine (RSSVM), Random Forest (RF) and Naïve Bayes Tree (NBTree). Receiver operating characteristic (ROC) curve indicators, standard error (SD) and the 95% confidence interval (CI) are used to verify and compare the models in different research areas. The results show that the RF model is the most efficient in predicting the collapsibility of loess in Xining, with an average AUC above 80%, and can be used in engineering practice.
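The validation design above (a 70/30 split of 4,256 samples, with models compared by ROC AUC) can be sketched in a few lines. A minimal sketch, assuming a simple random split and AUC computed as the probability that a randomly chosen collapsible sample scores higher than a non-collapsible one (ties count as one half); the paper's exact split procedure and AUC implementation are not stated, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(items: Sequence[T], train_frac: float = 0.7, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the items and cut it into training and verification parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


train, verify = random_split(range(4256))
print(len(train), len(verify))
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

The pairwise loop is quadratic, which is fine for a verification set of about 1,300 samples; a rank-based formula gives the same value faster on larger sets.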

Machine Learning Methods and Qualimetric Approach to Determine the Conditions for Train Students in the Field of Environmental and Economic Activities

International Journal of Emerging Technologies in Learning (iJET) ◽

10.3991/ijet.v16i03.17715 ◽

2021 ◽

Vol 16 (03) ◽

pp. 72

Author(s):

Artem Salamatov ◽

Elena Gafarova ◽

Vladimir Belevitin ◽

Maxim Gafarov ◽

Darya Gordeeva

Keyword(s):

Machine Learning ◽

Economic Activity ◽

Professional Training ◽

Educational Process ◽

Effective Control ◽

Gradient Boosting ◽

Economic Activities ◽

Learning Methods ◽

Machine Learning Methods ◽

Pedagogical Research

The relevance of environmental and economic activity requires professional training of specialists and, accordingly, new organizational and pedagogical conditions for effective education. It is also necessary to develop control and measuring materials that have all the required qualities (validity, reliability, consistency, significance and objectivity) to obtain the most reliable results when justifying the necessity and sufficiency of the identified conditions. The intensification of information processes in vocational education leads researchers to seek optimal conditions and tools for achieving pedagogical goals. Among these tools are machine learning methods and the mathematical models built on them for quantitatively assessing the quality of vocational training in the field of environmental and economic activities. The qualimetric approach can be used in pedagogy when an array of observational data is available for a given criterion related to learning conditions, personal qualities of students, etc. An algorithmic model makes it possible to manipulate conditions in thought experiments and to test hypotheses; since pedagogical research takes a long time, choosing conditions based on the most favorable forecast produced by the model helps optimize pedagogical resources for achieving the planned results. Rational selection of effective control and measuring materials (CMMs) makes it possible to determine the necessity and sufficiency of organizational and pedagogical conditions, while mathematical modeling makes it possible to quickly adjust these conditions as a set of options for content, forms, teaching methods, information and communication technologies (ICTs) and CMMs used to achieve the planned educational results in the sphere of environmental and economic activity.
Interpretation of the derived features in the context of the pedagogical research, performed with a cross-validation accuracy of 72%, revealed the dominant significance of intersubjective connections between the disciplines studied by the sample of bachelor's and master's students, namely in programs 44.03.04 and 44.04.04 "Professional training (by industry)", which are the most significant for forming competence in the field of environmental and economic activities. The designed Gradient Boosting Classifier model makes it possible to predict the studied competency types and to test hypotheses about including or excluding particular organizational and pedagogical conditions for the effective implementation of the educational process. A necessary and sufficient organizational and pedagogical condition for effectively forming competence in the field of environmental and economic activity is to ensure continuity between significant disciplines and to actualize interdisciplinary relationships through the development of interdisciplinary courses.
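The 72% figure above is a cross-validation accuracy. A minimal sketch of how k-fold cross-validation partitions the data, assuming k = 5 and a generic `fit_predict(train_idx, test_idx)` callable standing in for the Gradient Boosting Classifier; both the value of k and the callable are hypothetical, as the abstract does not report them.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 once and return (train, test) index lists; each index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    return [
        ([idx for j, fold in enumerate(folds) if j != i for idx in fold], folds[i])
        for i in range(k)
    ]


def cross_val_accuracy(
    y: Sequence[int],
    fit_predict: Callable[[List[int], List[int]], List[int]],
    k: int = 5,
) -> float:
    """Mean test-fold accuracy; fit_predict trains on train indices and predicts the test indices."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        predictions = fit_predict(train_idx, test_idx)
        correct = sum(p == y[i] for p, i in zip(predictions, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical baseline: always predict the majority class of the training fold.
labels = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]


def majority_baseline(train_idx: List[int], test_idx: List[int]) -> List[int]:
    ones = sum(labels[i] for i in train_idx)
    majority = 1 if 2 * ones >= len(train_idx) else 0
    return [majority] * len(test_idx)


print(cross_val_accuracy(labels, majority_baseline))
```

Comparing a classifier's cross-validation accuracy against such a majority-class baseline is a quick check on whether a figure like 72% reflects learned structure or merely class imbalance.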
