Performance Comparison of Machine Learning Models for Annual Precipitation Prediction Using Different Decomposition Methods

Chao Song; Xiaohong Chen

doi:10.3390/rs13051018

Performance Comparison of Machine Learning Models for Annual Precipitation Prediction Using Different Decomposition Methods

Remote Sensing ◽

10.3390/rs13051018 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1018

Author(s):

Chao Song ◽

Xiaohong Chen

Keyword(s):

Machine Learning ◽

Wavelet Transform ◽

Empirical Mode Decomposition ◽

Prediction Accuracy ◽

Decomposition Methods ◽

Prediction Performance ◽

Learning Models ◽

Precipitation Prediction ◽

Mode Decomposition ◽

Machine Learning Models

It has become increasingly difficult in recent years to predict precipitation scientifically and accurately due to the dual effects of human activities and climatic conditions. This paper focuses on four aspects to improve precipitation prediction accuracy. Five decomposition methods, namely time-varying filter-based empirical mode decomposition (TVF-EMD), robust empirical mode decomposition (REMD), complementary ensemble empirical mode decomposition (CEEMD), wavelet transform (WT), and extreme-point symmetric mode decomposition (ESMD), are each combined with the Elman neural network (ENN) to construct five prediction models, i.e., TVF-EMD-ENN, REMD-ENN, CEEMD-ENN, WT-ENN, and ESMD-ENN. The variance contribution rate (VCR) and Pearson correlation coefficient (PCC) are used to compare the performance of the five decomposition methods. Wavelet transform coherence (WTC) is used to determine the reason for the poor prediction performance of machine learning algorithms in individual years and the relationship with climate indicators. A secondary decomposition of the TVF-EMD is used to improve the prediction accuracy of the models. The proposed methods are used to predict the annual precipitation in Guangzhou. The subcomponents obtained from the TVF-EMD are the most stable among the four decomposition methods, and the North Atlantic Oscillation (NAO) index, the Nino 3.4 index, and sunspots have a smaller influence on the first subcomponent (Sc-1) than on the other subcomponents. The TVF-EMD-ENN model has the best prediction performance and outperforms traditional machine learning models. The secondary decomposition of the Sc-1 of the TVF-EMD model significantly improves the prediction accuracy.
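As a rough illustration of the two comparison measures named in the abstract above, the sketch below computes a variance contribution rate and a Pearson correlation coefficient for the subcomponents of a series. It assumes VCR means each subcomponent's variance divided by the sum of all subcomponent variances, which is a common definition but is not confirmed by the abstract. The toy series and its two-component split are hypothetical and are not produced by any of the decomposition methods listed.

```python
from math import sqrt
from statistics import fmean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def variance_contribution_rates(components):
    """Variance of each component divided by the summed component variances.

    This definition of VCR is an assumption made for the sketch.
    """
    variances = [pvariance(c) for c in components]
    total = sum(variances)
    return [v / total for v in variances]


# Hypothetical toy series with a hand-made split into two subcomponents.
series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
trend = [1.5, 2.0, 3.0, 3.5, 4.5, 5.0]
residual = [s - t for s, t in zip(series, trend)]

vcr = variance_contribution_rates([trend, residual])
pcc = [pearson(c, series) for c in (trend, residual)]

for name, v, p in zip(("trend", "residual"), vcr, pcc):
    print(f"{name}: VCR={v:.3f}, PCC={p:.3f}")
```

A subcomponent with a high VCR and a high PCC carries most of the variability of the original series, which is one plausible way to read the comparison of decomposition methods.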

Online learning behavior analysis based on machine learning

Asian Association of Open Universities Journal ◽

10.1108/aaouj-08-2019-0029 ◽

2019 ◽

Vol 14 (2) ◽

pp. 97-106

Author(s):

Ning Yan ◽

Oliver Tat-Sheung Au

Keyword(s):

Machine Learning ◽

Online Learning ◽

Correlation Analysis ◽

Prediction Accuracy ◽

Classification Models ◽

Limited Data ◽

Learning Models ◽

Learning Behavior ◽

Content Type ◽

Machine Learning Models

Purpose: The purpose of this paper is to analyze the correlation between students’ online learning behavior features and course grade, and to build an effective prediction model based on limited data. Design/methodology/approach: The prediction label in this paper is the course grade of students, and the available features are student age, student gender, connection time, hits count and days of access. The machine learning model used in this paper is the classical three-layer feedforward neural network, trained with the scaled conjugate gradient algorithm. The Pearson correlation analysis method is used to find the relationships between course grade and the student features. Findings: Days of access has the highest correlation with course grade, followed by hits count, and connection time is less relevant to students’ course grade. Student age and gender have the lowest correlation with course grade. Binary classification models have much higher prediction accuracy than multi-class classification models. Data normalization and data discretization can effectively improve the prediction accuracy of machine learning models, such as the ANN model in this paper. Originality/value: This paper may help teachers to identify students with learning difficulties in advance and give timely help through the online learning behavior data. It shows that acceptable prediction models based on machine learning can be built using a small and limited data set. However, introducing external data into machine learning models to improve their prediction accuracy is still a valuable and hard issue.
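The two preprocessing steps mentioned in the findings above, data normalization and data discretization, could look roughly like the sketch below. Min-max scaling and equal-width binning are assumptions chosen for illustration, because the abstract does not say which normalization or discretization scheme was used. The feature values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(scaled_values, n_bins):
    """Equal-width binning of values already scaled to [0, 1]."""
    # The value 1.0 is clamped into the last bin.
    return [min(int(v * n_bins), n_bins - 1) for v in scaled_values]


# Hypothetical "days of access" values for four students.
days_of_access = [2, 10, 18, 34]

scaled = min_max_normalize(days_of_access)
bins = discretize(scaled, n_bins=4)

print(scaled)
print(bins)
```

Scaling puts features with different units, such as connection time and hits count, on a comparable range before they reach the network, and binning turns a continuous grade or feature into a small number of classes.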

A Comparative Assessment of Six Machine Learning Models for Prediction of Bending Force in Hot Strip Rolling Process

Metals ◽

10.3390/met10050685 ◽

2020 ◽

Vol 10 (5) ◽

pp. 685 ◽

Cited By ~ 2

Author(s):

Xu Li ◽

Feng Luan ◽

Yan Wu

Keyword(s):

Prediction Accuracy ◽

Computational Cost ◽

Regression Tree ◽

Prediction Performance ◽

Learning Models ◽

Hot Strip Rolling ◽

Strip Rolling ◽

Bending Force ◽

Hot Strip ◽

Machine Learning Models

In the hot strip rolling (HSR) process, accurate prediction of bending force can improve the control accuracy of the strip crown and flatness, and further improve the strip shape quality. In this paper, six machine learning models, including Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification and Regression Tree (CART), Bagging Regression Tree (BRT), Least Absolute Shrinkage and Selection Operator (LASSO), and Gaussian Process Regression (GPR), were applied to predict the bending force in the HSR process. A comparative experiment was carried out on a real-life dataset, and the prediction performance of the six models was analyzed in terms of prediction accuracy, stability, and computational cost. The prediction performance of the six models was assessed using three evaluation metrics: root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). The results show that the GPR model is the optimal model for bending force prediction, with the best prediction accuracy, better stability, and acceptable computational cost. The prediction accuracy and stability of CART and ANN are slightly lower than those of GPR. Although BRT also shows a good combination of prediction accuracy and computational cost, its stability is the worst among the six models. SVM not only has poor prediction accuracy but also has the highest computational cost, while LASSO showed the worst prediction accuracy.
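For reference, the three evaluation metrics named in the abstract above follow their standard definitions and can be computed as in the sketch below. The target and prediction values are hypothetical and are not bending-force data from the paper.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and model predictions.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 4.0, 5.0, 8.0]

print(f"RMSE={rmse(y_true, y_pred):.4f}")
print(f"MAE={mae(y_true, y_pred):.4f}")
print(f"R2={r2(y_true, y_pred):.4f}")
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a fuller picture of a model's error distribution than either metric alone.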

Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data

Water Research ◽

10.1016/j.watres.2019.115454 ◽

2020 ◽

Vol 171 ◽

pp. 115454 ◽

Cited By ~ 9

Author(s):

Kangyang Chen ◽

Hexia Chen ◽

Chuanlong Zhou ◽

Yichao Huang ◽

Xiangyang Qi ◽

...

Keyword(s):

Machine Learning ◽

Water Quality ◽

Big Data ◽

Surface Water Quality ◽

Prediction Performance ◽

Quality Prediction ◽

Learning Models ◽

Water Parameters ◽

Water Quality Prediction ◽

Machine Learning Models

Machine learning-based prediction model for responses of bDMARDs in patients with rheumatoid arthritis and ankylosing spondylitis

Arthritis Research & Therapy ◽

10.1186/s13075-021-02635-3 ◽

2021 ◽

Vol 23 (1) ◽

Author(s):

Seulkee Lee ◽

Seonyoung Kang ◽

Yeonghee Eun ◽

Hong-Hee Won ◽

Hyungjin Kim ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

Logistic Regression ◽

Ankylosing Spondylitis ◽

Regression Model ◽

Logistic Regression Model ◽

Prediction Performance ◽

Learning Models ◽

Independent Test ◽

Machine Learning Models

Background: Few studies on rheumatoid arthritis (RA) have generated machine learning models to predict responses to biologic disease-modifying antirheumatic drugs (bDMARDs); however, these studies included insufficient analysis of important features. Moreover, machine learning is yet to be used to predict bDMARD responses in ankylosing spondylitis (AS). Thus, in this study, machine learning was used to predict such responses in RA and AS patients. Methods: Data were retrieved from the Korean College of Rheumatology Biologics therapy (KOBIO) registry. The numbers of RA and AS patients in the training dataset were 625 and 611, respectively. We prepared independent test datasets that did not participate in any process of generating machine learning models. Baseline clinical characteristics were used as input features. Responders were defined as those who met the ACR 20% improvement response criteria (ACR20) and ASAS 20% improvement response criteria (ASAS20) in RA and AS, respectively, at the first follow-up. Multiple machine learning methods, including random forest (RF-method), were used to generate models to predict bDMARD responses, and we compared them with the logistic regression model. Results: The RF-method model had superior prediction performance to the logistic regression model (accuracy: 0.726 [95% confidence interval (CI): 0.725–0.730] vs. 0.689 [0.606–0.717], area under curve (AUC) of the receiver operating characteristic curve (ROC) 0.638 [0.576–0.658] vs. 0.565 [0.493–0.605], F1 score 0.841 [0.837–0.843] vs. 0.803 [0.732–0.828], AUC of the precision-recall curve 0.808 [0.763–0.829] vs. 0.754 [0.714–0.789]) with independent test datasets in patients with RA. However, machine learning and logistic regression exhibited similar prediction performance in AS patients. Furthermore, the patient self-reporting scales, which are patient global assessment of disease activity (PtGA) in RA and Bath Ankylosing Spondylitis Functional Index (BASFI) in AS, were revealed as the most important features in both diseases. Conclusions: The RF-method exhibited superior prediction performance for bDMARD responses compared with a conventional statistical method, i.e., logistic regression, in RA patients. In contrast, despite the comparable size of the dataset, machine learning did not outperform logistic regression in AS patients. According to feature importance analysis, the most important features in both diseases were patient self-reporting scales.

An Evaluation of Wearable Inertial Sensor Configuration and Supervised Machine Learning Models for Automatic Punch Classification in Boxing

IoT ◽

10.3390/iot1020021 ◽

2020 ◽

Vol 1 (2) ◽

pp. 360-381

Author(s):

Matthew T. O. Worsey ◽

Hugo G. Espinosa ◽

Jonathan B. Shepherd ◽

David V. Thiel

Keyword(s):

Machine Learning ◽

Prediction Accuracy ◽

Inertial Sensors ◽

Inertial Sensor ◽

Supervised Machine Learning ◽

Learning Models ◽

Significant Statistical Difference ◽

Sensor Configuration ◽

Wearable Inertial Sensors ◽

Machine Learning Models

Machine learning is a powerful tool for data classification and has been used to classify movement data recorded by wearable inertial sensors in general living and sports. Inertial sensors can provide valuable biofeedback in combat sports such as boxing; however, the use of such technology has not had a global uptake. If simple inertial sensor configurations can be used to automatically classify strike type, then cumbersome tasks such as video labelling can be bypassed and the foundation for automated workload monitoring of combat sport athletes is set. This investigation evaluates the classification performance of six different supervised machine learning models (tuned and untuned) when using two simple inertial sensor configurations (configuration 1: inertial sensors worn on both wrists; configuration 2: inertial sensors worn on both wrists and the third thoracic vertebra [T3]). When trained on one athlete, strike prediction accuracy was good using both configurations (sensor configuration 1 mean overall accuracy: 0.90 ± 0.12; sensor configuration 2 mean overall accuracy: 0.87 ± 0.09). There was no statistically significant difference in prediction accuracy between the two configurations or between tuned and untuned models (p > 0.05). Moreover, there was no statistically significant difference in computational training time between tuned and untuned models (p > 0.05). For sensor configuration 1, a support vector machine (SVM) model with a Gaussian RBF kernel performed the best (accuracy = 0.96); for sensor configuration 2, a multi-layered perceptron neural network (MLP-NN) model performed the best (accuracy = 0.98). Wearable inertial sensors can be used to accurately classify strike type in boxing pad work, which means that cumbersome tasks such as video and notational analysis can be bypassed. Additionally, automated workload and performance monitoring of athletes throughout training camp is possible. Future investigations will evaluate the performance of this algorithm on a greater sample size and test the influence of impact window size on prediction accuracy. Additionally, supervised machine learning models should be trained on data collected during sparring to see if high accuracy holds in a competition setting. This can help move closer towards automatic scoring in boxing.

Semi-empirical prediction method for monthly precipitation prediction based on environmental factors and comparison with stochastic and machine learning models

Hydrological Sciences Journal ◽

10.1080/02626667.2020.1784901 ◽

2020 ◽

Vol 65 (11) ◽

pp. 1928-1942 ◽

Cited By ~ 1

Author(s):

Huihui Zhang ◽

Hugo A. Loáiciga ◽

Fu Ren ◽

Qingyun Du ◽

Da Ha

Keyword(s):

Machine Learning ◽

Environmental Factors ◽

Prediction Method ◽

Monthly Precipitation ◽

Learning Models ◽

Empirical Prediction ◽

Precipitation Prediction ◽

Semi Empirical ◽

Machine Learning Models

Predicting unstable software benchmarks using static source code features

Empirical Software Engineering ◽

10.1007/s10664-021-09996-y ◽

2021 ◽

Vol 26 (6) ◽

Author(s):

Christoph Laaber ◽

Mikael Basmaci ◽

Pasquale Salza

Keyword(s):

Machine Learning ◽

Source Code ◽

Prediction Performance ◽

Repeated Measurements ◽

Good Prediction ◽

Testing Time ◽

Learning Models ◽

Actual Performance ◽

Meta Information ◽

Machine Learning Models

Software benchmarks are only as good as the performance measurements they yield. Unstable benchmarks show high variability among repeated measurements, which causes uncertainty about the actual performance and complicates reliable change assessment. However, whether a benchmark is stable or unstable only becomes evident after it has been executed and its results are available. In this paper, we introduce a machine-learning-based approach to predict a benchmark’s stability without having to execute it. Our approach relies on 58 statically-computed source code features, extracted for benchmark code and code called by a benchmark, related to (1) meta information, e.g., lines of code (LOC), (2) programming language elements, e.g., conditionals or loops, and (3) potentially performance-impacting standard library calls, e.g., file and network input/output (I/O). To assess our approach’s effectiveness, we perform a large-scale experiment on 4,461 Go benchmarks coming from 230 open-source software (OSS) projects. First, we assess the prediction performance of our machine learning models using 11 binary classification algorithms. We find that Random Forest performs best with good prediction performance from 0.79 to 0.90, and 0.43 to 0.68, in terms of AUC and MCC, respectively. Second, we perform feature importance analyses for individual features and feature categories. We find that 7 features related to meta-information, slice usage, nested loops, and synchronization application programming interfaces (APIs) are individually important for good predictions; and that the combination of all features of the called source code is paramount for our model, while the combination of features of the benchmark itself is less important. Our results show that although benchmark stability is affected by more than just the source code, we can effectively utilize machine learning models to predict whether a benchmark will be stable or not ahead of execution. This enables spending precious testing time on reliable benchmarks, supporting developers to identify unstable benchmarks during development, allowing unstable benchmarks to be repeated more often, estimating stability in scenarios where repeated benchmark execution is infeasible or impossible, and warning developers if new benchmarks or existing benchmarks executed in new environments will be unstable.
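The abstract above reports results in terms of AUC and MCC. As a brief illustration of the less familiar of the two, the sketch below computes the Matthews correlation coefficient (MCC) for a binary stable/unstable classification from its standard confusion-matrix formula. The labels are hypothetical and the code is not taken from the paper's tooling.

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical labels: 1 = unstable benchmark, 0 = stable benchmark.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

score = mcc(y_true, y_pred)
print(f"MCC={score:.2f}")
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, so it stays informative when stable and unstable benchmarks are unevenly represented.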

A Dynamic Convolutional Neural Network Based Shared-Bike Demand Forecasting Model

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3447988 ◽

2021 ◽

Vol 12 (6) ◽

pp. 1-24

Author(s):

Shaojie Qiao ◽

Nan Han ◽

Jianbin Huang ◽

Kun Yue ◽

Rui Mao ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Convolutional Neural Network ◽

Prediction Accuracy ◽

Demand Forecasting ◽

Forecasting Model ◽

Learning Models ◽

Bike Sharing ◽

Demand Forecasting Model ◽

Machine Learning Models

Bike-sharing systems are becoming popular and generate a large volume of trajectory data. In a bike-sharing system, users can borrow and return bikes at different stations. In particular, a bike-sharing system is affected by weather, the time period, and other dynamic factors, which makes the scheduling of shared bikes challenging. In this article, a new shared-bike demand forecasting model based on dynamic convolutional neural networks, called SDF, is proposed to predict the demand for shared bikes. SDF chooses the most relevant weather features from real weather data by using the Pearson correlation coefficient and transforms them into a two-dimensional dynamic feature matrix, taking into account the states of stations from historical data. The feature information in the matrix is extracted and learned by a newly proposed dynamic convolutional neural network to predict the demand for shared bikes in a dynamic and intelligent fashion. The parameter update phase is optimized from three aspects: the loss function, optimization algorithm, and learning rate. Then, an accurate shared-bike demand forecasting model is designed based on the basic idea of minimizing the loss value. Compared with classical machine learning models, the weight-sharing strategy employed by SDF reduces the complexity of the network, which allows a high prediction accuracy to be achieved within a relatively short period of time. Extensive experiments are conducted on real-world bike-sharing datasets to evaluate SDF. The results show that SDF significantly outperforms classical machine learning models in prediction accuracy and efficiency.

Evaluating three different adaptive decomposition methods for EEG signal seizure detection and classification

10.1101/691055 ◽

2019 ◽

Author(s):

Vinícius R. Carvalho ◽

Márcio F.D. Moraes ◽

Antônio P. Braga ◽

Eduardo M.A.M. Mendes

Keyword(s):

Wavelet Transform ◽

Real Time ◽

Empirical Mode Decomposition ◽

Decomposition Methods ◽

Seizure Detection ◽

Variational Mode Decomposition ◽

Real Time Processing ◽

Mode Decomposition ◽

Empirical Wavelet Transform ◽

Adaptive Decomposition

Signal processing and machine learning methods are valuable tools in epilepsy research, potentially assisting in diagnosis, seizure detection, prediction and real-time event detection during long-term monitoring. Recent approaches involve the decomposition of these signals into different modes or functions in a data-dependent and adaptive way. These approaches may provide advantages over commonly used Fourier-based methods due to their ability to work with nonlinear and non-stationary data. In this work, three adaptive decomposition methods (Empirical Mode Decomposition, Empirical Wavelet Transform and Variational Mode Decomposition) are evaluated for the classification of normal, ictal and inter-ictal EEG signals using a freely available database. We provide a previously unavailable common methodology for comparing the performance of these methods for EEG seizure detection, with the use of the same classifiers, parameters and spectral and time domain features. It is shown that the outcomes using the three methods are quite similar, with maximum accuracies of 97.5% for Empirical Mode Decomposition, 96.7% for Empirical Wavelet Transform and 98.2% for Variational Mode Decomposition. Features were also extracted from the original non-decomposed signals, yielding inferior, but still fairly accurate (95.3%) results. The evaluated decomposition methods are promising approaches for seizure detection, but their use should be judiciously analysed, especially in situations that require real-time processing and where computational power is an issue. An additional methodological contribution of this work is the development of two Python packages, already available at the PyPI repository: one for the Empirical Wavelet Transform (ewtpy) and another for Variational Mode Decomposition (vmdpy).

Machine Learning Method for TOC Prediction: Taking Wufeng and Longmaxi Shales in the Sichuan Basin, Southwest China as an Example

Geofluids ◽

10.1155/2021/6794213 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Jia Rong ◽

Zongyuan Zheng ◽

Xiaorong Luo ◽

Chao Li ◽

Yuping Li ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Shale Gas ◽

Sichuan Basin ◽

Prediction Accuracy ◽

Random Search ◽

Organic Carbon Content ◽

Learning Models ◽

The Sichuan Basin ◽

Machine Learning Models

The total organic carbon content (TOC) is a core indicator for shale gas reservoir evaluations. Machine learning-based models can quickly and accurately predict TOC, which is of great significance for the production of shale gas. Based on conventional logs, the measured TOC values, and other data of 9 typical wells in the Jiaoshiba area of the Sichuan Basin, this paper performed a Bayesian linear regression and applied a random forest machine learning model to predict TOC values of the shale from the Wufeng Formation and the lower part of the Longmaxi Formation. The results showed that the TOC value prediction accuracy was improved by more than 50% by using the well-trained machine learning models compared with the traditional Δ Log R method in an overmature and tight shale. Using the halving random search cross-validation method to optimize hyperparameters can greatly improve the speed of building the model. Furthermore, excluding the factors that affect the log value other than the TOC and taking the corrected data as input data for training could improve the prediction accuracy of the random forest model by approximately 5%. Data can be easily updated with machine learning models, which is of primary importance for improving the efficiency of shale gas exploration and development.
