Modeling Pan Evaporation Using Gaussian Process Regression K-Nearest Neighbors Random Forest and Support Vector Machines; Comparative Analysis

Atmosphere ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 66 ◽  
Author(s):  
Sevda Shabani ◽  
Saeed Samadianfard ◽  
Mohammad Taghi Sattari ◽  
Amir Mosavi ◽  
Shahaboddin Shamshirband ◽  
...  

Evaporation is one of the most critical factors in agricultural, hydrological, and meteorological studies. Because multiple climatic factors interact, evaporation is a complex, nonlinear phenomenon to model, and machine learning methods have therefore gained popularity in this area. In the present study, four machine learning methods, Gaussian Process Regression (GPR), K-Nearest Neighbors (KNN), Random Forest (RF) and Support Vector Regression (SVR), were used to predict pan evaporation (PE). Meteorological data including PE, temperature (T), relative humidity (RH), wind speed (W), and sunny hours (S) were collected from 2011 through 2017. The accuracy of the methods was assessed using the Root Mean Squared Error (RMSE), the correlation coefficient (R) and the Mean Absolute Error (MAE). Taylor diagrams were also used to evaluate the accuracy of the models. At the Gonbad-e Kavus, Gorgan and Bandar Torkman stations, respectively, the most suitable model of each type achieved RMSE values of 1.521, 1.244, and 1.254 mm/day for GPR; 1.991, 1.775, and 1.577 mm/day for KNN; 1.614, 1.337, and 1.316 mm/day for RF; and 1.55, 1.262, and 1.275 mm/day for SVR. GPR with input parameters T, W and S for Gonbad-e Kavus station, and GPR with input parameters T, RH, W and S for Gorgan and Bandar Torkman stations, gave the most accurate predictions and were proposed for precise estimation of PE. The findings indicate that PE values may be accurately estimated from a few easily measured meteorological parameters.
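The three accuracy indices named in the abstract above can be illustrated with a minimal sketch. The functions below use the standard textbook definitions of RMSE, MAE and the Pearson correlation coefficient; the function names and the sample values are assumptions made for illustration and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean Absolute Error between two equal-length sequences."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def pearson_r(observed, predicted):
    """Pearson correlation coefficient R."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical daily pan evaporation values in mm/day (not from the study).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

print("RMSE:", rmse(observed, predicted))
print("MAE: ", mae(observed, predicted))
print("R:   ", pearson_r(observed, predicted))
```

RMSE and MAE are in the units of the target (mm/day here), while R is dimensionless, which is why the abstract reports units only for RMSE.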

2020 ◽  
Vol 44 (4) ◽  
pp. 646-652
Author(s):  
A.A. Borodinov

The paper considers a problem of determining the user preferred stops in a public transport recommender system. The effectiveness of using various machine learning methods to solve this problem in a system of personalized recommendations is compared, including a support vector method, a decision tree, a random forest, AdaBoost, a k-nearest neighbors algorithm, and a multi-layer perceptron. The described traditional methods of machine learning are also compared with the method proposed herein and based on an estimate calculation algorithm. The efficiency and the effectiveness of the proposed method are confirmed in the work.


Animals ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 771
Author(s):  
Toshiya Arakawa

Mammalian behavior is typically monitored by observation. However, direct observation requires substantial effort and time if the number of animals is large or the observation period is long. In this study, machine learning methods, namely hidden Markov models (HMMs), random forests, support vector machines (SVMs), and neural networks, were applied to estimate whether a goat is in estrus based on its behavior, and the adequacy of each method was assessed. Tracking data for the goats were obtained using a video tracking system and used to estimate whether goats in “estrus” or “non-estrus” were in one of two states: “approaching the male” or “standing near the male”. Overall, the percentage concordance (PC) of random forest appears to be the highest. However, its PC for goats whose data were not used in the training data sets is relatively low, which suggests that random forest tends to over-fit the training data. After random forest, the PC of HMMs and SVMs is high. Considering the calculation time and the HMM's advantage as a time series model, the HMM is the better method. The PC of the neural network is low overall; however, if data from more goats were acquired, a neural network could be an adequate method for estimation.


2019 ◽  
Vol 22 (03) ◽  
pp. 1950021 ◽  
Author(s):  
Huei-Wen Teng ◽  
Michael Lee

Machine learning has been successfully applied in financial technology to credit risk management, portfolio management, automatic trading, and fraud detection, to name a few. Formulating and solving these problems adequately and accurately is problem-specific and challenging, particularly given the availability of complex and voluminous data. In credit risk management, one major problem is to predict the default of credit card holders using a real dataset. We review five machine learning methods: k-nearest neighbors, decision trees, boosting, support vector machines, and neural networks, and apply them to the above problem. In addition, we give explicit Python scripts to conduct the analysis using a dataset of 29,999 instances with 23 features collected from a major bank in Taiwan, downloadable from the UC Irvine Machine Learning Repository. We show that the decision tree performs best among the methods in terms of validation curves.


2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Kerry E. Poppenberg ◽  
Vincent M. Tutino ◽  
Lu Li ◽  
Muhammad Waqas ◽  
Armond June ◽  
...  

Background: Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict the presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods: Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results: Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under the ROC curve (AUC) of 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance. Conclusions: We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and to further investigate the effect of covariates.
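The area under the ROC curve reported above can be computed without plotting the curve, because it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counted as one half). The sketch below uses that rank-based definition; the function name, labels and scores are hypothetical, and nothing here reproduces the study's pipeline.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: sequence of 1 (positive, e.g. IA) or 0 (negative, e.g. control).
    scores: model scores, where higher means "more likely positive".
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical test-set labels and classifier scores (not from the study).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print("AUC:", roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a test cohort of a few dozen samples; a sort-based implementation would be preferable for large datasets.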


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8764 ◽  
Author(s):  
Siroj Bakoev ◽  
Lyubov Getmantseva ◽  
Maria Kolosova ◽  
Olga Kostyunina ◽  
Duane R. Chartier ◽  
...  

Industrial pig farming is associated with negative technological pressure on the bodies of pigs. Leg weakness and lameness are the sources of significant economic loss in raising pigs. Therefore, it is important to identify the predictors of limb condition. This work presents assessments of the state of limbs using indicators of growth and meat characteristics of pigs based on machine learning algorithms. We have evaluated and compared the accuracy of prediction for nine ML classification algorithms (Random Forest, K-Nearest Neighbors, Artificial Neural Networks, C50Tree, Support Vector Machines, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) and have identified the Random Forest and K-Nearest Neighbors as the best-performing algorithms for predicting pig leg weakness using a small set of simple measurements that can be taken at an early stage of animal development. Measurements of Muscle Thickness, Back Fat amount, and Average Daily Gain were found to be significant predictors of the conformation of pig limbs. Our work demonstrates the utility and relative ease of using machine learning algorithms to assess the state of limbs in pigs based on growth rate and meat characteristics.


Energies ◽  
2021 ◽  
Vol 14 (18) ◽  
pp. 5782
Author(s):  
Dimitrios Mouchtaris ◽  
Emmanouil Sofianos ◽  
Periklis Gogas ◽  
Theophilos Papadimitriou

The ability to accurately forecast the spot price of natural gas benefits stakeholders and is a valuable tool for all market participants in the competitive gas market. In this paper, we attempt to forecast the natural gas spot price 1, 3, 5, and 10 days ahead using machine learning methods: support vector machines (SVM), regression trees, linear regression, Gaussian process regression (GPR), and ensemble of trees. These models are trained with a set of 21 explanatory variables in a 5-fold cross-validation scheme with 90% of the dataset used for training and the remaining 10% used for testing the out-of-sample generalization ability. The results show that these machine learning methods all have different forecasting accuracy for every time frame when it comes to forecasting natural gas spot prices. However, the bagged trees (belonging to the ensemble of trees method) and the linear SVM models have superior forecasting performance compared to the rest of the models.
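The evaluation scheme described above, a 90%/10% train/test split with 5-fold cross-validation on the training part, can be sketched with plain index arithmetic. The abstract does not say whether the split is chronological or shuffled; the sketch assumes a chronological split (earliest 90% for training) and contiguous folds, and the function names and sample count are hypothetical.

```python
def holdout_split(n_samples, train_fraction=0.9):
    """Split sample indices into a training part and a held-out test part.

    Assumption: samples are ordered in time, so the test set is the tail.
    """
    n_train = round(n_samples * train_fraction)
    return list(range(n_train)), list(range(n_train, n_samples))


def k_fold(indices, k=5):
    """Return k (train, validation) index pairs using contiguous folds."""
    base, extra = divmod(len(indices), k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


# Hypothetical dataset size (not from the study).
train_idx, test_idx = holdout_split(100)
folds = k_fold(train_idx, k=5)

print(len(train_idx), "training samples,", len(test_idx), "test samples")
for i, (train, validation) in enumerate(folds):
    print("fold", i, "train:", len(train), "validation:", len(validation))
```

The held-out 10% is never seen during cross-validation, so it measures out-of-sample generalization as the abstract describes.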


2021 ◽  
Vol 12 (1) ◽  
pp. 26-33
Author(s):  
Stephen Chiang ◽  
Matthew Eschbach ◽  
Robert Knapp ◽  
Brian Holden ◽  
Andrew Miesse ◽  
...  

The incorporation of sensors onto the stapling platform has been investigated to overcome the disconnect in our understanding of tissue handling by surgical staplers. The goal of this study was to explore the feasibility of in vivo porcine tissue differentiation using bioimpedance data and machine learning methods. In vivo electrical impedance measurements were obtained in 7 young domestic pigs, using a logarithmic sweep of 50 points over a frequency range of 100 Hz to 1 MHz. Tissues studied included lung, liver, small bowel, colon, and stomach, which was further segmented into fundus, body, and antrum. The data were then processed with MATLAB's Classification Learner to identify the best algorithm for tissue type differentiation. The most effective classification scheme was found to be cubic support vector machines, with 86.96% accuracy. When fundus, body and antrum were aggregated as stomach, the accuracy improved to 88.03%. Combining stomach, small bowel, and colon as GI tract improved accuracy to 99.79% using fine k-nearest neighbors. The results suggest that bioimpedance data can be effectively used to differentiate tissue types in vivo. This study is one of the first to combine in vivo bioimpedance tissue data across multiple tissue types with machine learning methods.
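A logarithmic sweep like the one described above places the measurement frequencies at equal ratios rather than equal differences. The sketch below generates 50 such points between 100 Hz and 1 MHz; the function name is hypothetical, and the abstract does not state how the instrument itself computes its sweep points.

```python
def log_sweep(f_start, f_stop, n_points):
    """Return n_points frequencies spaced evenly on a logarithmic scale."""
    # Constant ratio between consecutive frequencies.
    ratio = (f_stop / f_start) ** (1.0 / (n_points - 1))
    return [f_start * ratio ** i for i in range(n_points)]


# Sweep parameters taken from the abstract: 50 points, 100 Hz to 1 MHz.
frequencies = log_sweep(100.0, 1.0e6, 50)

print("points:", len(frequencies))
print("first: ", frequencies[0], "Hz")
print("last:  ", frequencies[-1], "Hz")
```

Four decades covered by 50 points gives 12.25 steps per decade, so each frequency is about 1.21 times the previous one.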


Author(s):  
Furkan Bilek ◽  
Ferhat Balgetir ◽  
Caner Feyzi Demir ◽  
Gökhan Alkan ◽  
Seda Arslan-Tuncer

Background and Objective: Multiple sclerosis (MS) is a chronic, progressive, autoimmune disease of the central nervous system (CNS) characterized by inflammation, demyelination, and axonal injury. In patients with newly diagnosed MS (ndMS), ataxia can present as either mild or severe and can be difficult to diagnose in the absence of clinical disability. Such difficulties can be reduced by using decision support systems based on machine learning methods. The present study aimed to achieve early diagnosis of ataxia in ndMS patients by using machine learning methods with spatiotemporal parameters. Materials and Methods: The prospective study included 32 ndMS patients with an Expanded Disability Status Scale (EDSS) score of ≤ 2.0 and 32 healthy volunteers. A total of 14 parameters were obtained using a Win-Track platform. The ndMS patients were differentiated from healthy individuals using multiple classifiers, including Artificial Neural Network (ANN), Support Vector Machine (SVM), the k-nearest neighbors (K-NN) algorithm, and Decision Tree Learning (DTL). To improve classification performance, a Relief-based feature selection algorithm was applied to select the subset that best represented the whole dataset. Performance was evaluated using several criteria, including Accuracy (ACC), Sensitivity (SN), Specificity (SP), and Precision (PREC). Results: ANN had a higher classification performance than the other classifiers: it achieved an accuracy, sensitivity, and specificity of 89%, 87.8%, and 90.3%, respectively, with all parameters, and 93.7%, 96.6%, and 91.1%, respectively, with the parameters selected by the Relief algorithm. Significance: To our knowledge, this is the first study in the literature to investigate the diagnosis of ataxia in ndMS patients by using machine learning methods with spatiotemporal parameters. The proposed method, i.e., the Relief-based ANN method, successfully diagnosed ataxia using fewer parameters than reported in clinical studies, thereby reducing the costs and increasing the performance of the diagnosis. The method also provided higher accuracy, sensitivity, and specificity in the diagnosis of ataxia in ndMS patients than the other methods. Taken together, these findings indicate that the proposed method could be helpful in the diagnosis of ataxia in minimally impaired ndMS patients and could be a pathfinder for future studies.
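The four evaluation criteria listed above all derive from the counts in a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts are hypothetical (chosen only to match a 32-patient, 32-control design) and do not reproduce the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute ACC, SN, SP and PREC from confusion-matrix counts.

    tp: patients correctly classified as patients
    fp: healthy volunteers wrongly classified as patients
    tn: healthy volunteers correctly classified as healthy
    fn: patients wrongly classified as healthy
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }


# Hypothetical counts for 32 patients and 32 controls (not from the study).
metrics = classification_metrics(tp=30, fp=4, tn=28, fn=2)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are each computed within one group, so they do not depend on the ratio of patients to controls, whereas accuracy and precision do.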


2018 ◽  
Vol 8 (10) ◽  
pp. 1927 ◽  
Author(s):  
Zuzana Dankovičová ◽  
Dávid Sovák ◽  
Peter Drotár ◽  
Liberios Vokorokos

This paper addresses the processing of speech data and their use in a decision support system. The main aim of this work is to use machine learning methods to recognize pathological speech, particularly dysphonia. We extracted 1560 speech features and used them to train the classification model. Three state-of-the-art classifiers were used: K-nearest neighbors, random forests, and support vector machine. We analyzed the performance of the classifiers with and without gender taken into account. The experimental results showed that pathological speech can be recognized with a classification accuracy as high as 91.3%.

