Development and Application of a Genetic Algorithm for Variable Optimization and Predictive Modeling of Five-Year Mortality Using Questionnaire Data

Bioinformatics and Biology Insights ◽

10.4137/bbi.s29469 ◽

2015 ◽

Vol 9s3 ◽

pp. BBI.S29469 ◽

Cited By ~ 6

Author(s):

Lucas J. Adams ◽

Ghalib Bello ◽

Gerard G. Dumancas

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Predictive Modeling ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Questionnaire Data ◽

Special Equipment ◽

Learning Techniques ◽

Specific Outcome ◽

Performance Area

The problem of selecting important variables for predictive modeling of a specific outcome of interest using questionnaire data has rarely been addressed in clinical settings. In this study, we implemented a genetic algorithm (GA) technique to select optimal variables from questionnaire data for predicting a five-year mortality. We examined 123 questions (variables) answered by 5,444 individuals in the National Health and Nutrition Examination Survey. The GA iterations selected the top 24 variables, including questions related to stroke, emphysema, and general health problems requiring the use of special equipment, for use in predictive modeling by various parametric and nonparametric machine learning techniques. Using these top 24 variables, gradient boosting yielded the nominally highest performance (area under curve [AUC] = 0.7654), although there were other techniques with lower but not significantly different AUC. This study shows how GA in conjunction with various machine learning techniques could be used to examine questionnaire data to predict a binary outcome.

Download Full-text

A Gradient Boosting Model Optimized by a Genetic Algorithm for Short-term Riverflow Forecast

Revista Mundi Engenharia Tecnologia e Gestão (ISSN 2525-4782) ◽

10.21575/25254782rmetg2019vol4n3845 ◽

2019 ◽

Vol 4 (3) ◽

Author(s):

Tales Lima Fonseca ◽

Yulia Gorodetskaya ◽

Gisele Goulart Tavares ◽

Celso Bandeira de Melo Ribeiro ◽

Leonardo Goliatt da Fonseca

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Predictive Performance ◽

Energy Generation ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Short Term ◽

Learning Techniques ◽

Hydrometeorological Parameters ◽

Paraíba Do Sul River

The short-term streamflow forecast is an important parameter in studies related to energy generation and the prediction of possible floods. Flowing through three Brazilian states, the Paraíba do Sul river is responsible for the supply and energy generation in several municipalities. Machine learning techniques have been studied with the aim of improving these predictions through the use of hydrological and hydrometeorological parameters. Furthermore, the predictive performance of the machine learning techniques are directly related to the quality of the training base and, moreover, to the set of hyperparameters used. The present study explores the combination of the Gradient Boosting technique coupled with a Genetic Algorithm to found the best set of hyperparameter to maximize the predicting performance of the Paraíba do Sul river streamflow.

Download Full-text

Feasibility of Machine Learning Algorithms for Predicting the Deformation of Anodic Titanium Films by Modulating Anodization Processes

Materials ◽

10.3390/ma14051089 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1089

Author(s):

Sung-Hee Kim ◽

Chanyoung Jeong

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Multiclass Classification ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Smart Manufacturing ◽

Gradient Boosting ◽

Experimental Conditions ◽

Learning Techniques ◽

Tio2 Nanostructures

This study aims to demonstrate the feasibility of applying eight machine learning algorithms to predict the classification of the surface characteristics of titanium oxide (TiO2) nanostructures with different anodization processes. We produced a total of 100 samples, and we assessed changes in TiO2 nanostructures’ thicknesses by performing anodization. We successfully grew TiO2 films with different thicknesses by one-step anodization in ethylene glycol containing NH4F and H2O at applied voltage differences ranging from 10 V to 100 V at various anodization durations. We found that the thicknesses of TiO2 nanostructures are dependent on anodization voltages under time differences. Therefore, we tested the feasibility of applying machine learning algorithms to predict the deformation of TiO2. As the characteristics of TiO2 changed based on the different experimental conditions, we classified its surface pore structure into two categories and four groups. For the classification based on granularity, we assessed layer creation, roughness, pore creation, and pore height. We applied eight machine learning techniques to predict classification for binary and multiclass classification. For binary classification, random forest and gradient boosting algorithm had relatively high performance. However, all eight algorithms had scores higher than 0.93, which signifies high prediction on estimating the presence of pore. In contrast, decision tree and three ensemble methods had a relatively higher performance for multiclass classification, with an accuracy rate greater than 0.79. The weakest algorithm used was k-nearest neighbors for both binary and multiclass classifications. We believe that these results show that we can apply machine learning techniques to predict surface quality improvement, leading to smart manufacturing technology to better control color appearance, super-hydrophobicity, super-hydrophilicity or batter efficiency.

Download Full-text

Predictors of outpatients’ no-show: big data analytics using apache spark

Journal Of Big Data ◽

10.1186/s40537-020-00384-9 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Tahani Daghistani ◽

Huda AlGhamdi ◽

Riyad Alshammari ◽

Raed H. AlHazme

Keyword(s):

Machine Learning ◽

Big Data ◽

Negative Impact ◽

Big Data Analytics ◽

Quality Of Healthcare ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Healthcare Organizations ◽

Data Framework ◽

Learning Techniques

AbstractOutpatients who fail to attend their appointments have a negative impact on the healthcare outcome. Thus, healthcare organizations facing new opportunities, one of them is to improve the quality of healthcare. The main challenges is predictive analysis using techniques capable of handle the huge data generated. We propose a big data framework for identifying subject outpatients’ no-show via feature engineering and machine learning (MLlib) in the Spark platform. This study evaluates the performance of five machine learning techniques, using the (2,011,813‬) outpatients’ visits data. Conducting several experiments and using different validation methods, the Gradient Boosting (GB) performed best, resulting in an increase of accuracy and ROC to 79% and 81%, respectively. In addition, we showed that exploring and evaluating the performance of the machine learning models using various evaluation methods is critical as the accuracy of prediction can significantly differ. The aim of this paper is exploring factors that affect no-show rate and can be used to formulate predictions using big data machine learning techniques.

Download Full-text

Learning from Imbalanced Educational Data Using Ensemble Machine Learning Algorithms

Webology ◽

10.14704/web/v18si01/web18053 ◽

2021 ◽

Vol 18 (Special Issue 01) ◽

pp. 183-195

Author(s):

Thingbaijam Lenin ◽

N. Chandrasekaran

Keyword(s):

Machine Learning ◽

Random Forest ◽

Missing Values ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Adaptive Boosting ◽

Stochastic Gradient Boosting ◽

Ensemble Machine Learning ◽

Learning Techniques ◽

Student’S Performance

Student’s academic performance is one of the most important parameters for evaluating the standard of any institute. It has become a paramount importance for any institute to identify the student at risk of underperforming or failing or even drop out from the course. Machine Learning techniques may be used to develop a model for predicting student’s performance as early as at the time of admission. The task however is challenging as the educational data required to explore for modelling are usually imbalanced. We explore ensemble machine learning techniques namely bagging algorithm like random forest (rf) and boosting algorithms like adaptive boosting (adaboost), stochastic gradient boosting (gbm), extreme gradient boosting (xgbTree) in an attempt to develop a model for predicting the student’s performance of a private university at Meghalaya using three categories of data namely demographic, prior academic record, personality. The collected data are found to be highly imbalanced and also consists of missing values. We employ k-nearest neighbor (knn) data imputation technique to tackle the missing values. The models are developed on the imputed data with 10 fold cross validation technique and are evaluated using precision, specificity, recall, kappa metrics. As the data are imbalanced, we avoid using accuracy as the metrics of evaluating the model and instead use balanced accuracy and F-score. We compare the ensemble technique with single classifier C4.5. The best result is provided by random forest and adaboost with F-score of 66.67%, balanced accuracy of 75%, and accuracy of 96.94%.

Download Full-text

Statistical and Machine Learning-Driven Optimization of Mechanical Properties in Designing Durable HDPE Nanobiocomposites

Polymers ◽

10.3390/polym13183100 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3100

Author(s):

Anusha Mairpady ◽

Abdel-Hamid I. Mourad ◽

Mohammad Sayem Mozumder

Keyword(s):

Machine Learning ◽

Mechanical Properties ◽

Genetic Algorithm ◽

Machine Learning Techniques ◽

Hybrid Technique ◽

Learning Techniques ◽

Artificial Neural Network Ann ◽

Major Factors ◽

The Cost ◽

Design Factors

The selection of nanofillers and compatibilizing agents, and their size and concentration, are always considered to be crucial in the design of durable nanobiocomposites with maximized mechanical properties (i.e., fracture strength (FS), yield strength (YS), Young’s modulus (YM), etc). Therefore, the statistical optimization of the key design factors has become extremely important to minimize the experimental runs and the cost involved. In this study, both statistical (i.e., analysis of variance (ANOVA) and response surface methodology (RSM)) and machine learning techniques (i.e., artificial intelligence-based techniques (i.e., artificial neural network (ANN) and genetic algorithm (GA)) were used to optimize the concentrations of nanofillers and compatibilizing agents of the injection-molded HDPE nanocomposites. Initially, through ANOVA, the concentrations of TiO2 and cellulose nanocrystals (CNCs) and their combinations were found to be the major factors in improving the durability of the HDPE nanocomposites. Further, the data were modeled and predicted using RSM, ANN, and their combination with a genetic algorithm (i.e., RSM-GA and ANN-GA). Later, to minimize the risk of local optimization, an ANN-GA hybrid technique was implemented in this study to optimize multiple responses, to develop the nonlinear relationship between the factors (i.e., the concentration of TiO2 and CNCs) and responses (i.e., FS, YS, and YM), with minimum error and with regression values above 95%.

Download Full-text

Metaheuristic based MLP-SARIMAX HybridizationOne Hour Ahead Solar Radiation Forecasting

10.21528/cbic2021-9 ◽

2021 ◽

Author(s):

Hugo Abreu Mendes ◽

João Fausto Lorenzato Oliveira ◽

Paulo Salgado Gomes Mattos Neto ◽

Alex Coutinho Pereira ◽

Eduardo Boudoux Jatoba ◽

...

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Time Series ◽

Solar Radiation ◽

Statistical Models ◽

Clean Energy ◽

Machine Learning Techniques ◽

Exogenous Variables ◽

Learning Techniques ◽

Photovoltaic Plants

Within the context of clean energy generation, solar radiation forecast is applied for photovoltaic plants to increase maintainability and reliability. Statistical models of time series like ARIMA and machine learning techniques help to improve the results. Hybrid Statistical + ML are found in all sorts of time series forecasting applications. This work presents a new way to automate the SARIMAX modeling, nesting PSO and ACO optimization algorithms, differently from R's AutoARIMA, its searches optimal seasonality parameter and combination of the exogenous variables available. This work presents 2 distinct hybrid models that have MLPs as their main elements, optimizing the architecture with Genetic Algorithm. A methodology was used to obtain the results, which were compared to LSTM, CLSTM, MMFF and NARNN-ARMAX topologies found in recent works. The obtained results for the presented models is promising for use in automatic radiation forecasting systems since it outperformed the compared models on at least two metrics.

Download Full-text

Prediction of probable backorder scenarios in the supply chain using Distributed Random Forest and Gradient Boosting Machine learning techniques

Journal Of Big Data ◽

10.1186/s40537-020-00345-2 ◽

2020 ◽

Vol 7 (1) ◽

Cited By ~ 1

Author(s):

Samiul Islam ◽

Saman Hassanzadeh Amin

Keyword(s):

Machine Learning ◽

Supply Chain ◽

Random Forest ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Learning Techniques ◽

Gradient Boosting Machine

Download Full-text

Estimating Warehouse Rental Price using Machine Learning Techniques

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2018.2.3034 ◽

2018 ◽

Vol 13 (2) ◽

pp. 235-250 ◽

Cited By ~ 3

Author(s):

Yixuan Ma ◽

Zhenji Zhang ◽

Alexander Ihler ◽

Baoxiang Pan

Keyword(s):

Machine Learning ◽

Random Forest ◽

Real Estate ◽

Rapid Development ◽

Supply And Demand ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Logistics Industry ◽

Real Estate Price ◽

Learning Techniques

Boosted by the growing logistics industry and digital transformation, the sharing warehouse market is undergoing a rapid development. Both supply and demand sides in the warehouse rental business are faced with market perturbations brought by unprecedented peer competitions and information transparency. A key question faced by the participants is how to price warehouses in the open market. To understand the pricing mechanism, we built a real world warehouse dataset using data collected from the classified advertisements websites. Based on the dataset, we applied machine learning techniques to relate warehouse price with its relevant features, such as warehouse size, location and nearby real estate price. Four candidate models are used here: Linear Regression, Regression Tree, Random Forest Regression and Gradient Boosting Regression Trees. The case study in the Beijing area shows that warehouse rent is closely related to its location and land price. Models considering multiple factors have better skill in estimating warehouse rent, compared to singlefactor estimation. Additionally, tree models have better performance than the linear model, with the best model (Random Forest) achieving correlation coefficient of 0.57 in the test set. Deeper investigation of feature importance illustrates that distance from the city center plays the most important role in determining warehouse price in Beijing, followed by nearby real estate price and warehouse size.

Download Full-text

Machine learning techniques to predict daily rainfall amount

Journal Of Big Data ◽

10.1186/s40537-021-00545-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Chalachew Muluken Liyew ◽

Haileyesus Amsaya Melese

Keyword(s):

Machine Learning ◽

Pearson Correlation ◽

Daily Rainfall ◽

Learning Model ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Correlation Technique ◽

Learning Techniques ◽

Machine Learning Model ◽

Extreme Gradient Boosting

AbstractPredicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several types of research have been conducted using data mining and machine learning techniques of different countries’ environmental datasets. An erratic rainfall distribution in the country affects the agriculture on which the economy of the country depends on. Wise use of rainfall water should be planned and practiced in the country to minimize the problem of the drought and flood occurred in the country. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select relevant environmental variables which were used as an input for the machine learning model. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boost). Root mean squared error and Mean absolute Error methods were used to measure the performance of the machine learning model. The result of the study revealed that the Extreme Gradient Boosting machine learning algorithm performed better than others.

Download Full-text

Breast Cancer Prognosis Using Machine Learning Techniques and Genetic Algorithm: Experiment on Six Different Datasets

Evolutionary Computing and Mobile Sustainable Networks - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-981-15-5258-8_65 ◽

2020 ◽

pp. 703-711

Author(s):

S. Jijitha ◽

Thangavel Amudha

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Genetic Algorithm ◽

Breast Cancer Prognosis ◽

Cancer Prognosis ◽

Machine Learning Techniques ◽

Learning Techniques

Download Full-text