Prediction of Relative Humidity in a High Elevated Basin of Western Karakoram by Using Different Machine Learning Models

Mapping Intimacies ◽

10.5772/intechopen.98226 ◽

2021 ◽

Author(s):

Muhammad Adnan ◽

Rana Muhammad Adnan ◽

Shiyin Liu ◽

Muhammad Saifullah ◽

Yasir Latif ◽

...

Keyword(s):

Machine Learning ◽

Relative Humidity ◽

Global Climate ◽

Absolute Error ◽

Coefficient Of Determination ◽

Future Research ◽

Learning Tools ◽

Mars Model ◽

Testing Stage ◽

Adaptive Regression

Accurate and reliable prediction of relative humidity is important in all fields concerned with global climate change. The current study employed Multivariate Adaptive Regression Spline (MARS) and M5 Tree (M5T) models to predict relative humidity in the Hunza River basin, Pakistan. Both models gave their best predictions for input scenario S6 (RHt-1, RHt-2, RHt-3, Tt-1, Tt-2, Tt-3). The statistical analysis showed that the MARS model predicted relative humidity better than M5T at all meteorological stations, especially at Ziarat, followed by Khunjerab and Naltar. During testing of the MARS model, the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2) were 5.98%, 5.43%, and 0.808 for Khunjerab; 6.58%, 5.08%, and 0.806 for Naltar; and 5.86%, 4.97%, and 0.815 for Ziarat. During testing of the M5T model, the corresponding values were 6.14%, 5.56%, and 0.772 for Khunjerab; 6.19%, 5.58%, and 0.762 for Naltar; and 6.08%, 5.46%, and 0.783 for Ziarat. Both models performed slightly better in training than in testing. The study encourages future research in high-altitude basins on predicting other meteorological variables with machine learning tools.
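The lagged input scenario S6 and the three scores named in this abstract can be illustrated with a minimal sketch. It assumes the relative humidity and temperature series are available as equally long NumPy arrays; the function names `make_lagged_inputs` and `regression_scores` are hypothetical, the example series are made up, and R2 is computed here as 1 - SS_res/SS_tot, which may differ from the definition the study used.

```python
import numpy as np

def make_lagged_inputs(rh, temp, n_lags=3):
    """Build an S6-style matrix [RH(t-1..t-n), T(t-1..t-n)] with target RH(t)."""
    rh = np.asarray(rh, dtype=float)
    temp = np.asarray(temp, dtype=float)
    if rh.shape != temp.shape:
        raise ValueError("rh and temp must have the same length")
    n = len(rh)
    # Column k holds the value observed k steps before the target time step.
    cols = [rh[n_lags - k : n - k] for k in range(1, n_lags + 1)]
    cols += [temp[n_lags - k : n - k] for k in range(1, n_lags + 1)]
    X = np.column_stack(cols)
    y = rh[n_lags:]
    return X, y

def regression_scores(y_true, y_pred):
    """Return RMSE, MAE and R2 (assumed here to be 1 - SS_res / SS_tot)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - float(np.sum(err ** 2)) / ss_tot
    return rmse, mae, r2

# Made-up series, only to show the shapes involved.
rh_demo = np.arange(10.0)
temp_demo = 100.0 + np.arange(10.0)
X_demo, y_demo = make_lagged_inputs(rh_demo, temp_demo)
print(X_demo.shape, y_demo.shape)
```

Any regressor could then be fitted on the resulting matrix and target; the MARS and M5T models themselves are not reproduced here.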


Forecasting COVID 19 Confirmed Cases Using Machine Learning: the Case of America

10.20944/preprints202009.0228.v1 ◽

2020 ◽

Author(s):

Mario Jojoa ◽

Begoña Garcia-Zapirain

Keyword(s):

United States ◽

Machine Learning ◽

Open Data ◽

Absolute Error ◽

The United States ◽

Percentage Error ◽

Support Vector ◽

Learning Tools ◽

The European Union ◽

Testing Stage

This paper presents an approach based on Multilayer Perceptron and Support Vector Machine algorithms to predict the number of COVID-19 infections in different countries of America. It is intended to serve as a tool for decision-making and for tackling the pandemic that the world is currently facing. The models were trained and tested using open data from the European Union repository, where a time series of confirmed cases was modeled up to May 25, 2020. Hyperparameters such as the number of neurons per layer were set using a tabu list algorithm. The countries selected for the study were Brazil, Chile, Colombia, Mexico, Peru and the United States. The metrics used are Pearson's correlation coefficient (CP), Mean Absolute Error (MAE), and Mean Percentage Error (MPE). For the testing stage we obtained the following results: Brazil, CP=0.65, MAE=2508, MPE=17%; Chile, CP=0.64, MAE=504, MPE=16%; Colombia, CP=0.83, MAE=76, MPE=9%; Mexico, CP=0.77, MAE=231, MPE=9%; Peru, CP=0.76, MAE=686, MPE=18%; and the United States of America, CP=0.93, MAE=799, MPE=4%. The results indicate that these are powerful machine learning tools, although specific algorithms must be chosen depending on the data and the stage of the country’s pandemic.


Machine learning and Grad-Cam based vascular aging assessment using photoplethysmogram (Preprint)

10.2196/preprints.31709 ◽

2021 ◽

Author(s):

Hangsik Shin

Keyword(s):

Machine Learning ◽

Correlation Coefficient ◽

Age Estimation ◽

Mean Squared Error ◽

Mean Absolute Error ◽

Absolute Error ◽

Coefficient Of Determination ◽

Vascular Aging ◽

Squared Error ◽

Vascular Age

BACKGROUND Arterial stiffness due to vascular aging is a major indicator for evaluating cardiovascular risk. OBJECTIVE In this study, we propose a method of estimating age by applying machine learning to the photoplethysmogram for non-invasive vascular age assessment. METHODS The machine learning-based age estimation model, which consists of three convolutional layers and two fully connected layers, was developed using photoplethysmograms segmented by pulse from a total of 752 adults aged 19–87 years. The performance of the developed model was quantitatively evaluated using mean absolute error, root mean squared error, Pearson’s correlation coefficient, and coefficient of determination. Grad-Cam was used to explain the contribution of photoplethysmogram waveform characteristics to vascular age estimation. RESULTS Ten-fold cross-validation yielded a mean absolute error of 8.03, a root mean squared error of 9.96, a correlation coefficient of 0.62, and a coefficient of determination of 0.38. Grad-Cam, used to determine the weight that the input signal contributes to the result, confirmed that the contribution of the photoplethysmogram segment to the age estimation was high around the systolic peak. CONCLUSIONS The machine learning-based vascular aging analysis method using the PPG waveform showed comparable or superior performance to previous studies in evaluating vascular aging, without complex feature detection. CLINICALTRIAL 2015-0104


Optimization of probiotic therapeutics using machine learning in an artificial human gastrointestinal tract

Scientific Reports ◽

10.1038/s41598-020-79947-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Susan Westfall ◽

Francesca Carracci ◽

Molly Estill ◽

Danyue Zhao ◽

Qing-li Wu ◽

...

Keyword(s):

Machine Learning ◽

Gastrointestinal Tract ◽

Metabolic Activity ◽

Therapeutic Potential ◽

Multivariate Adaptive Regression Splines ◽

Bioactive Metabolites ◽

Learning Tools ◽

Human Gastrointestinal Tract ◽

Therapeutic Benefits ◽

Adaptive Regression

The gut microbiota’s metabolome is composed of bioactive metabolites that confer disease resilience. Probiotics’ therapeutic potential hinges on their ability to alter the metabolome; however, characterizing probiotics’ metabolic activity remains a formidable task. To address this problem, an artificial model of the human gastrointestinal tract, coined the ABIOME (A Bioreactor Imitation of the Microbiota Environment), is introduced and used to predict probiotic formulations’ metabolic activity, and hence their therapeutic potential, with machine learning tools. The ABIOME is a modular yet dynamic system with real-time monitoring of gastrointestinal conditions that support complex cultures representative of the human microbiota and its metabolome. The fecal-inoculated ABIOME was supplemented with a polyphenol-rich prebiotic and combinations of novel probiotics that altered the output of bioactive metabolites previously shown to invoke anti-inflammatory effects. To dissect the synergistic interactions between exogenous probiotics and the autochthonous microbiota, a multivariate adaptive regression splines (MARS) model was implemented to develop optimized probiotic combinations with therapeutic benefits. Using this algorithm, several probiotic combinations were identified that stimulated synergistic production of bioavailable metabolites, each with a different therapeutic capacity. Based on these results, the ABIOME in combination with the MARS algorithm could be used to create probiotic formulations with specific therapeutic applications based on their signature metabolic activity.


Mining Health-Related Issues in Consumer Product Reviews by Using Scalable Text Analytics

Biomedical Informatics Insights ◽

10.4137/bii.s37791 ◽

2016 ◽

Vol 8s1 ◽

pp. BII.S37791 ◽

Cited By ~ 5

Author(s):

Manabu Torii ◽

Sameer S. Tilak ◽

Son Doan ◽

Daniel S. Zisook ◽

Jung-wei Fan

Keyword(s):

Machine Learning ◽

Language Processing ◽

Consumer Product ◽

Future Research ◽

Product Reviews ◽

Learning Tools ◽

Text Analytics ◽

Related Information ◽

Online Product Reviews ◽

Health Related

In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights about population health. Online product reviews present a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined in online product reviews. Leveraging natural language processing and machine learning tools, we were able to mine 1.3 million grocery product reviews for health-related information. The objectives of the study were as follows: (1) conduct quantitative and qualitative analysis on the types of health issues found in consumer product reviews; (2) develop a machine learning classifier to detect reviews that contain health-related issues; and (3) gain insights about the task characteristics and challenges for text analytics to guide future research.


Machine Learning Modelling of the Relationship between Weather and Paddy Yield in Sri Lanka

Journal of Mathematics ◽

10.1155/2021/9941899 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Piyal Ekanayake ◽

Windhya Rankothge ◽

Rukmal Weliwatta ◽

Jeevani W. Jayasinghe

Keyword(s):

Machine Learning ◽

Sri Lanka ◽

Relative Humidity ◽

Mean Squared Error ◽

Absolute Error ◽

Maximum Temperature ◽

Percentage Error ◽

Pairwise Correlation ◽

Geographical Regions ◽

Paddy Yield

This paper presents the development of crop-weather models for the paddy yield in Sri Lanka based on nine weather indices, namely, rainfall, relative humidity (minimum and maximum), temperature (minimum and maximum), wind speed (morning and evening), evaporation, and sunshine hours. The statistics of seven geographical regions, which contribute to about two-thirds of the country’s total paddy production, were used for this study. The significance of the weather indices on the paddy yield was explored by employing Random Forest (RF) and the variable importance of each of them was determined. Pearson’s correlation and Spearman’s correlation were used to identify the behavior of correlation in a positive or negative direction. Further, the pairwise correlation among the weather indices was examined. The results indicate that the minimum relative humidity and the maximum temperature during the paddy cultivation period are the most influential weather indices. Moreover, RF was used to develop a paddy yield prediction model and four more techniques, namely, Power Regression (PR), Multiple Linear Regression (MLR) with stepwise selection, forward (step-up) selection, and backward (step-down) elimination, were used to benchmark the performance of the machine learning technique. Their performances were compared in terms of the Root Mean Squared Error (RMSE), Correlation Coefficient (R), Mean Absolute Error (MAE), and the Mean Absolute Percentage Error (MAPE). As per the results, RF is a reliable and accurate model for the prediction of paddy yield in Sri Lanka, demonstrating a very high R of 0.99 and the least MAPE of 1.4%.
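The use of Pearson’s and Spearman’s correlation to read off the direction of a relationship can be sketched as follows. This is only an illustration under stated assumptions: the helper names `average_ranks`, `pearson` and `spearman` are hypothetical, the two short series are made-up numbers rather than data from the study, and Spearman’s coefficient is computed as Pearson’s coefficient on average ranks.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up weather index and yield values, for illustration only.
index_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
yield_demo = [1.0, 4.0, 9.0, 16.0, 25.0]
print("Pearson:", pearson(index_demo, yield_demo))
print("Spearman:", spearman(index_demo, yield_demo))
```

The sign of either coefficient gives the direction of the association; Spearman’s version responds to any monotonic trend, while Pearson’s measures linear association only.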


The role of ambient parameters on transmission rates of the COVID-19 outbreak: A machine learning model

Work ◽

10.3233/wor-210463 ◽

2021 ◽

pp. 1-9

Author(s):

Amir Jamshidnezhad ◽

Seyed Ahmad Hosseini ◽

Leila Ibrahimi Ghavamabadi ◽

Seyed Mahdi Hossaeini Marashi ◽

Hediye Mousavi ◽

...

Keyword(s):

Machine Learning ◽

Climatic Factors ◽

Ambient Air ◽

Coefficient Of Determination ◽

Ann Model ◽

Machine Learning Model ◽

Testing Stage ◽

Artificial Neural Network Ann ◽

Air Conditioning Systems

BACKGROUND: In recent years the relationship between ambient air temperature and the prevalence of viral infection has been under investigation. OBJECTIVE: The study was aimed at providing the statistical and machine learning-based analysis to investigate the influence of climatic factors on frequency of COVID-19 confirmed cases in Iran. METHOD: The data of confirmed cases of COVID-19 and some climatic factors related to 31 provinces of Iran between 04/03/2020 and 05/05/2020 was gathered from official resources. In order to investigate the important climatic factors on the frequency of confirmed cases of COVID-19 in all studied cities, a model based on an artificial neural network (ANN) was developed. RESULTS: The proposed ANN model showed accuracy rates of 87.25% and 86.4% in the training and testing stage, respectively, for classification of COVID-19 confirmed cases. The results showed that in the city of Ahvaz, despite the increase in temperature, the coefficient of determination R2 has been increasing. CONCLUSION: This study clearly showed that, with increasing outdoor temperature, the use of air conditioning systems to set a comfort zone temperature is unavoidable. Thus, the number of positive cases of COVID-19 increases. Also, this study shows the role of closed-air cycle condition in the indoor environment of tropical cities.


The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation

PeerJ Computer Science ◽

10.7717/peerj-cs.623 ◽

2021 ◽

Vol 7 ◽

pp. e623

Author(s):

Davide Chicco ◽

Matthijs J. Warrens ◽

Giuseppe Jurman

Keyword(s):

Machine Learning ◽

Regression Analysis ◽

Binary Classification ◽

Ground Truth ◽

Absolute Error ◽

Supervised Machine Learning ◽

Coefficient Of Determination ◽

Target Range ◽

Percentage Error ◽

The Mean

Regression analysis makes up a large part of supervised machine learning, and consists of the prediction of a continuous independent target from a set of other predictor variables. The difference between binary classification and regression is in the target range: in binary classification, the target can have only two values (usually encoded as 0 and 1), while in regression the target can have multiple values. Even if regression analysis has been employed in a huge number of machine learning studies, no consensus has been reached on a single, unified, standard metric to assess the results of the regression itself. Many studies employ the mean square error (MSE) and its rooted variant (RMSE), or the mean absolute error (MAE) and its percentage variant (MAPE). Although useful, these rates share a common drawback: since their values can range between zero and +infinity, a single value of them does not say much about the performance of the regression with respect to the distribution of the ground truth elements. In this study, we focus on two rates that actually generate a high score only if the majority of the elements of a ground truth group has been correctly predicted: the coefficient of determination (also known as R-squared or R2) and the symmetric mean absolute percentage error (SMAPE). After showing their mathematical properties, we report a comparison between R2 and SMAPE in several use cases and in two real medical scenarios. Our results demonstrate that the coefficient of determination (R-squared) is more informative and truthful than SMAPE, and does not have the interpretability limitations of MSE, RMSE, MAE and MAPE. We therefore suggest the usage of R-squared as standard metric to evaluate regression analyses in any scientific domain.
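The point about unbounded error rates can be illustrated with a small sketch: rescaling the target changes RMSE but leaves R-squared unchanged. The function names are hypothetical, the numbers are made up, R-squared is taken as 1 - SS_res/SS_tot, and SMAPE is written in one common form (mean of |error| over the mean of the absolute true and predicted values, in percent); the paper’s exact definitions may differ.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error; ranges from 0 to +infinity."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def smape(y_true, y_pred):
    """Symmetric MAPE in percent (one common form; bounded by 0 and 200)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    diff = np.abs(y_true - y_pred)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    # Pairs where both values are zero contribute no error.
    terms = np.divide(diff, denom, out=np.zeros_like(diff), where=denom > 0)
    return float(100.0 * np.mean(terms))

# Made-up values; the same predictions expressed in two different units.
y_true = np.array([2.0, 4.0, 6.0, 8.0])
y_pred = np.array([2.5, 3.5, 6.5, 7.5])
for scale in (1.0, 1000.0):
    print(scale,
          rmse(scale * y_true, scale * y_pred),
          r_squared(scale * y_true, scale * y_pred),
          smape(scale * y_true, scale * y_pred))
```

In this sketch the RMSE grows with the unit of measurement, so a single RMSE value cannot be judged without knowing the spread of the ground truth, whereas R-squared already relates the error to that spread.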


Using Machine Learning to Estimate Concentrations of Non-Targeted Chemicals Without Analytical Standards

10.26434/chemrxiv.10263296 ◽

2019 ◽

Author(s):

Dimitri Abrahamsson ◽

June-Soo Park ◽

Marina Sirota ◽

Tracey Woodruff

Keyword(s):

Neural Network ◽

Machine Learning ◽

Mean Absolute Error ◽

Absolute Error ◽

Coefficient Of Determination ◽

Ionization Mass ◽

Response Factors ◽

Relative Response ◽

Artificial Neural Network Ann ◽

Analytical Standards

We developed two in silico quantification methods for chemicals analyzed with capillary electrophoresis electrospray ionization-mass spectrometry (CE-ESI-MS) using machine learning - a random forest (RF) and an artificial neural network (ANN). The algorithms can be used to predict chemical concentrations based on the chemicals’ relative response factors (RRFs) and their physicochemical properties. The RF and ANN predicted the measured concentrations with a mean absolute error of 0.2 log units and a coefficient of determination (R2) of about 0.85 for the testing set.


The Role of Ambient parameters on Transmission Rates of the COVID-19 Outbreak: A machine learning model

10.21203/rs.3.rs-273519/v1 ◽

2021 ◽

Author(s):

Amir Jamshidnezhad ◽

Seyed Ahmad Hosseini ◽

Seyed Mahdi Hossaeini Marashic ◽

Leila Ibrahimi Ghavamabadi ◽

Hediye Mousavi ◽

...

Keyword(s):

Machine Learning ◽

Climatic Factors ◽

Ambient Air ◽

Coefficient Of Determination ◽

Ann Model ◽

Machine Learning Model ◽

Testing Stage ◽

Artificial Neural Network Ann ◽

Air Conditioning Systems

The relation between ambient air temperature and prevalence of viral infection has been under investigation in recent years. The study was aimed at providing the statistical and machine learning-based analysis to investigate the influence of climatic factors on frequency of COVID-19 confirmed cases in Iran. The data of confirmed cases of COVID-19 and some climatic factors related to 31 provinces of Iran from 04/03/2020 to 05/05/2020 was gathered from the official resources. In order to investigate the important climatic factors on the frequency of confirmed cases of COVID-19 in all studied cities, a model based on an artificial neural network (ANN) was developed. The proposed ANN model showed accuracy rates of 87.25% and 86.4% in the training and testing stage, respectively, for classification of COVID-19 confirmed cases. The results showed that in the city of Ahvaz, despite the increase in temperature, the coefficient of determination R2 has been increasing. This study clearly showed that, with increasing outdoor temperature, the use of air conditioning systems to set a comfort zone temperature is unavoidable; thus, the number of positive cases of COVID-19 increases. Also, this study shows the role of closed-air cycle condition in the indoor environment of tropical cities.


Estimation of regression-based model with bulk noisy data

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i5.pp3649-3656 ◽

2019 ◽

Vol 9 (5) ◽

pp. 3649

Author(s):

Chanintorn Jittawiriyanukoon

Keyword(s):

Machine Learning ◽

Signal To Noise Ratio ◽

Noisy Data ◽

Absolute Error ◽

Average Error ◽

Practical Case ◽

Mean Square ◽

Learning Tools ◽

Signal To Noise ◽

Practical Applications

The bulk noise has been provoking a contributed data due to a communication network with a tremendously low signal to noise ratio. An appreciated method for revising massive noise of individuals through information theory is widely discussed. One of the practical applications of this approach for bulk noise estimation is analyzed using intelligent automation and machine learning tools, dealing with the case of bulk noise existence or nonexistence. A regression-based model is employed for the investigation and experiment. Estimation for the practical case with bulk noisy datasets is proposed. The proposed method applies a slice-and-dice technique to partition a body of datasets into smaller portions so that the estimation can be carried out. The average error, correlation, absolute error and mean square error are computed to validate the estimation. Results from massive online analysis (MOA) will be verified with data collected in the following period. In many cases, the prediction with bulk noisy data through MOA simulation reveals that Random Imputation minimizes the average error.
