Machine Learning for the Diagnosis of Orthodontic Extractions: A Computational Analysis Using Ensemble Learning

Yasir Suhail; Madhur Upadhyay; Aditya Chhibber; Kshitiz

doi:10.3390/bioengineering7020055

Bioengineering ◽

2020 ◽

Vol 7 (2) ◽

pp. 55

Keyword(s):

Machine Learning ◽

Human Error ◽

Ensemble Methods ◽

Treatment Decision ◽

Learning Models ◽

Error Training ◽

Treatment Plans ◽

Suitable Treatment ◽

Using Data ◽

Machine Learning Models

Extraction of teeth is an important treatment decision in orthodontic practice. An expert system that is able to arrive at suitable treatment decisions can be valuable to clinicians for verifying treatment plans, minimizing human error, training orthodontists, and improving reliability. In this work, we train a number of machine learning models for this prediction task using data for 287 patients, evaluated independently by five different orthodontists. We demonstrate why ensemble methods are particularly suited for this task. We evaluate the performance of the machine learning models and interpret the training behavior. We show that the results for our model are close to the level of agreement between different orthodontists.
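The abstract benchmarks the model against the level of agreement among orthodontists. One simple way to make that comparison concrete is sketched below in Python, assuming plain percent agreement on a binary extraction/non-extraction decision. The paper may use a different agreement measure (for example Cohen's kappa), and the raters and decisions shown are hypothetical toy values.

```python
from itertools import combinations
from statistics import mean


def percent_agreement(a, b):
    """Fraction of patients on which two raters give the same decision."""
    if len(a) != len(b):
        raise ValueError("rating lists must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def inter_rater_agreement(ratings):
    """Mean percent agreement over all pairs of human raters."""
    return mean(percent_agreement(a, b) for a, b in combinations(ratings, 2))


def model_vs_raters(predictions, ratings):
    """Mean percent agreement between the model and each human rater."""
    return mean(percent_agreement(predictions, r) for r in ratings)


# Hypothetical decisions (1 = extraction, 0 = non-extraction), 4 patients, 3 raters.
raters = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
model = [1, 0, 1, 0]

print(inter_rater_agreement(raters))
print(model_vs_raters(model, raters))
```

If the model's agreement with the raters is close to the raters' agreement with each other, the model performs at roughly the level of human consensus, which is the kind of comparison the abstract describes.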

Data-Driven Approach for Predicting and Explaining the Risk of Long-Term Unemployment

E3S Web of Conferences ◽

10.1051/e3sconf/202021401023 ◽

2020 ◽

Vol 214 ◽

pp. 01023

Author(s):

Linan (Frank) Zhao

Keyword(s):

Machine Learning ◽

Age Groups ◽

Learning Models ◽

Public Authorities ◽

Ensemble Machine Learning ◽

European Public ◽

Data Driven Approach ◽

Using Data ◽

Machine Learning Models

Long-term unemployment has a significant societal impact and is of particular concern to policymakers with regard to economic growth and public finances. This paper constructs advanced ensemble machine learning models to predict citizens’ risk of becoming long-term unemployed, using data collected from European public employment-service authorities. The proposed model achieves 81.2% accuracy in identifying citizens at high risk of long-term unemployment. This paper also examines how to dissect black-box machine learning models by offering both local and global explanations with SHAP, a state-of-the-art model-agnostic approach, to explain the factors that contribute to long-term unemployment. Lastly, this paper addresses an under-explored question in applying machine learning in the public domain: the inherent bias in model predictions. The results show that popular models such as gradient boosted trees may produce unfair predictions against senior age groups and immigrants. Overall, this paper sheds light on the recent shift toward governments adopting machine learning models to profile citizens and prioritize employment resources, with the aim of reducing the detrimental effects of long-term unemployment and improving public welfare.
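The abstract reports that some models produce unfair predictions against senior age groups and immigrants but does not name the fairness metric used. A minimal sketch of one common check, comparing the rate of high-risk predictions across groups (demographic parity), is shown below; the group labels and predictions are hypothetical, and this is not presented as the paper's method.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Share of individuals flagged as high-risk (prediction == 1) per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(predictions, groups):
        counts[group][0] += pred
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def parity_gap(rates):
    """Largest difference in positive-prediction rates between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs (1 = high risk) and age-group labels.
preds = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["55+", "55+", "55+", "55+", "<55", "<55", "<55", "<55"]

rates = positive_rate_by_group(preds, groups)
print(rates, parity_gap(rates))
```

A large gap does not by itself prove unfairness (base rates may differ between groups), so checks like this are usually read alongside error-rate comparisons such as per-group false positive rates.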

Benchmarking of Machine Learning Models to Assist the Prognosis of Tuberculosis

10.20944/preprints202103.0284.v2 ◽

2021 ◽

Author(s):

Maicon Herverton Lino Ferreira da Silva Barros ◽

Geovanne Oliveira Alves ◽

Lubnnia Morais Florêncio Souza ◽

Élisson da Silva Rocha ◽

João Fausto Lorenzato de Oliveira ◽

...

Keyword(s):

Machine Learning ◽

Clinical Symptoms ◽

Treatment Decision ◽

Gradient Boosting ◽

Original Form ◽

Learning Models ◽

Data Set ◽

Risk Of Death ◽

Increased Risk ◽

Machine Learning Models

Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision, given the patient's individual conditions and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can lead to unsatisfactory outcomes, including exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed TB cases and TB-related deaths in the State of Amazonas. The goal is to predict the probability of death from TB, thus aiding TB prognosis and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths from TB. To explore how data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result for predicting TB mortality is achieved by the Gradient Boosting (GB) model using the balanced data set, and the ensemble model composed of the Random Forest (RF), GB, and Multi-layer Perceptron (MLP) models is the best model for predicting the cure class.
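The cleaned data set is heavily skewed: 22,876 cured patients against 1,139 deaths, roughly 20 cured records per death. The abstract does not say how the balanced data set was built; the sketch below assumes random undersampling of the majority class, one common option, and uses placeholder records with the reported class sizes.

```python
import random


def undersample(records, label_of, seed=0):
    """Randomly keep, for every class, as many records as the smallest class has
    (random undersampling of the majority class)."""
    by_class = {}
    for r in records:
        by_class.setdefault(label_of(r), []).append(r)
    n_min = min(len(rs) for rs in by_class.values())
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    balanced = []
    for rs in by_class.values():
        balanced.extend(rng.sample(rs, n_min))
    rng.shuffle(balanced)
    return balanced


# Placeholder records with the class sizes reported in the abstract.
records = [("cured", i) for i in range(22876)] + [("death", i) for i in range(1139)]
print(f"imbalance ratio: {22876 / 1139:.1f}:1")

balanced = undersample(records, label_of=lambda r: r[0])
print(len(balanced))
```

Undersampling discards most of the cured records; oversampling the minority class (e.g. SMOTE) is the usual alternative when that loss of data is a concern.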

Machine-Learning Models for Multicenter Prostate Cancer Treatment Plans

Journal of Computational Biology ◽

10.1089/cmb.2020.0188 ◽

2020 ◽

Author(s):

Khajamoinuddin Syed ◽

William Sleeman ◽

Payal Soni ◽

Michael Hagan ◽

Jatinder Palta ◽

...

Keyword(s):

Prostate Cancer ◽

Machine Learning ◽

Cancer Treatment ◽

Learning Models ◽

Prostate Cancer Treatment ◽

Treatment Plans ◽

Machine Learning Models

Machine learning can accurately predict pre-admission baseline hemoglobin and creatinine in intensive care patients

npj Digital Medicine ◽

10.1038/s41746-019-0192-z ◽

2019 ◽

Vol 2 (1) ◽

Cited By ~ 2

Author(s):

Antonin Dauvin ◽

Carolina Donado ◽

Patrik Bachtiger ◽

Ke-Chun Huang ◽

Christopher Martin Sauer ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care Unit ◽

Intensive Care ◽

Impaired Renal Function ◽

Kidney Injury ◽

Learning Models ◽

Intensive Care Patients ◽

Using Data ◽

Machine Learning Models ◽

Baseline Hemoglobin

Patients admitted to the intensive care unit frequently have anemia and impaired renal function, but often lack historical blood results to contextualize the acuteness of these findings. Using data available within two hours of ICU admission, we developed machine learning models that accurately (AUC 0.86–0.89) classify an individual patient’s baseline hemoglobin and creatinine levels. Compared with assuming the baseline to be the same as the admission lab value, machine learning performed significantly better at classifying acute kidney injury regardless of initial creatinine value, and significantly better at predicting baseline hemoglobin value in patients with admission hemoglobin of <10 g/dL.
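The reported AUC values (0.86 to 0.89) summarize how well the models rank patients across all classification thresholds. The sketch below, with hypothetical scores and labels, computes AUC through its pairwise-ranking interpretation (the Mann-Whitney statistic). It illustrates the metric only and is not the authors' evaluation code; its quadratic loop is meant for small examples.

```python
def auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores and true labels (1 = abnormal baseline).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values near 0.9 indicate strong discrimination.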

Benchmarking Machine Learning Models to Assist in the Prognosis of Tuberculosis

Informatics ◽

10.3390/informatics8020027 ◽

2021 ◽

Vol 8 (2) ◽

pp. 27

Author(s):

Maicon Herverton Lino Ferreira da Silva Barros ◽

Geovanne Oliveira Alves ◽

Lubnnia Morais Florêncio Souza ◽

Elisson da Silva Rocha ◽

João Fausto Lorenzato de Oliveira ◽

...

Keyword(s):

Machine Learning ◽

Clinical Symptoms ◽

Treatment Decision ◽

Gradient Boosting ◽

Original Form ◽

Learning Models ◽

Data Set ◽

Risk Of Death ◽

Increased Risk ◽

Machine Learning Models

Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision, given the patient's individual conditions and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can lead to unsatisfactory outcomes, including exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed TB cases and TB-related deaths in the State of Amazonas. The goal is to predict the probability of death from TB, thus aiding TB prognosis and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths from TB. To explore how data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result for predicting TB mortality is achieved by the Gradient Boosting (GB) model using the balanced data set, and the ensemble model composed of the Random Forest (RF), GB, and Multi-Layer Perceptron (MLP) models is the best model for predicting the cure class.
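This abstract reports that an ensemble of RF, GB, and MLP models best predicts the cure class, without stating how the three models' outputs are combined. A minimal sketch assuming soft voting (averaging each model's predicted probability of cure, then thresholding) follows; the probability values are hypothetical.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average each model's predicted probability of the positive class for
    every record, then threshold the mean (soft voting)."""
    n = len(prob_lists[0])
    if any(len(p) != n for p in prob_lists):
        raise ValueError("all models must score the same records")
    mean_probs = [sum(p[i] for p in prob_lists) / len(prob_lists) for i in range(n)]
    return mean_probs, [int(m >= threshold) for m in mean_probs]


# Hypothetical P(cure) for three patients from each base model.
rf = [0.90, 0.40, 0.55]
gb = [0.80, 0.30, 0.35]
mlp = [0.70, 0.60, 0.50]

mean_probs, labels = soft_vote([rf, gb, mlp])
print(mean_probs, labels)
```

With these numbers, a hard majority vote over each model's own 0.5-thresholded label would mark the third patient as cured (two of three models reach 0.5), whereas soft voting does not, because the averaged probability is only about 0.47. Soft voting lets a confident model outweigh two borderline ones.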

Prediction of Employee Attrition Using Machine Learning and Ensemble Methods

International Journal of Machine Learning and Computing ◽

10.18178/ijmlc.2021.11.2.1022 ◽

2021 ◽

Vol 11 (2) ◽

pp. 110-114

Author(s):

Aseel Qutub ◽

Asmaa Al-Mehmadi ◽

Munirah Al-Hssan ◽

Ruyan Aljohani ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Professional Training ◽

Ensemble Methods ◽

Gradient Boosting ◽

Learning Models ◽

Retention Strategies ◽

Employee Attrition ◽

The Cost ◽

Machine Learning Models

Employees are the most valuable resource of any organization. The cost of professional training, the loyalty developed over the years, and the sensitivity of some organizational positions all make it essential to identify who might leave the organization. Many reasons can lead to employee attrition. In this paper, several machine learning models are developed to automatically and accurately predict employee attrition. The IBM attrition dataset is used to train and evaluate the machine learning models, namely Decision Tree, Random Forest Regressor, Logistic Regressor, AdaBoost Model, and Gradient Boosting Classifier models. The ultimate goal is to accurately detect attrition in order to help companies improve retention strategies for crucial employees and boost employee satisfaction.
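AdaBoost, one of the models listed in the abstract, builds its ensemble by reweighting training samples after each weak learner so that later learners concentrate on the samples earlier ones misclassified. The sketch below shows one round of the standard discrete two-class update, assuming labels coded as -1/+1 (for example +1 = leaves, -1 = stays) and sample weights that already sum to 1. The labels are hypothetical and this is not the paper's code.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost update for labels in {-1, +1}.

    Assumes `weights` sums to 1, so `err` is the weighted error rate.
    Returns the weak learner's vote weight alpha and the renormalised
    sample weights; misclassified samples gain weight."""
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0 < err < 0.5:
        raise ValueError("weak learner error must be in (0, 0.5)")
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is +1 when correct (weight shrinks) and -1 when wrong (weight grows).
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]


# Hypothetical round: four employees, equal starting weights, one mistake.
weights = [0.25, 0.25, 0.25, 0.25]
y_true = [+1, -1, -1, +1]
y_pred = [+1, -1, -1, -1]  # last sample is misclassified
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(alpha, new_weights)
```

After the update the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.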

Shear stress distribution prediction in symmetric compound channels using data mining and machine learning models

Frontiers of Structural and Civil Engineering ◽

10.1007/s11709-020-0634-3 ◽

2020 ◽

Vol 14 (5) ◽

pp. 1097-1109

Author(s):

Zohreh Sheikh Khozani ◽

Khabat Khosravi ◽

Mohammadamin Torabi ◽

Amir Mosavi ◽

Bahram Rezaei ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Shear Stress ◽

Stress Distribution ◽

Shear Stress Distribution ◽

Learning Models ◽

Compound Channels ◽

Using Data ◽

Machine Learning Models

S&P 500 Stock Price Prediction Using Technical, Fundamental and Text Data

Statistics Optimization & Information Computing ◽

10.19139/soic-2310-5070-1362 ◽

2021 ◽

Vol 9 (4) ◽

pp. 769-788

Author(s):

Shan Zhong ◽

David Hitchcock

Keyword(s):

Machine Learning ◽

Stock Prices ◽

Stock Price ◽

Language Models ◽

News Item ◽

Learning Models ◽

Stock Price Prediction ◽

Price Prediction ◽

Using Data ◽

Machine Learning Models

We summarized both common and novel predictive models used for stock price prediction and combined them with technical indices, fundamental characteristics, and text-based sentiment data to predict S&P 500 stock prices. By combining different machine learning models, such as Random Forest and LSTM, into state-of-the-art ensemble models, we achieved 66.18% accuracy in S&P 500 index directional prediction and 62.09% accuracy in individual stock directional prediction. The data we use contain weekly historical prices, financial reports, and text information from news items associated with 518 different common stocks issued by current and former S&P 500 large-cap companies, from January 1, 2000 to December 31, 2019. Our study's innovations include using deep language models to categorize and infer the sentiment of financial news items; fusing models built on different combinations of variables and stocks to make joint predictions; and overcoming the insufficient-data problem for machine learning models in time series by using data across different stocks.
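Directional accuracy, as reported above, is read here as the share of periods in which the predicted up/down movement matches the realised one. The sketch below assumes that definition, treats an unchanged price as "not up", and uses hypothetical weekly closing prices; the paper's exact convention may differ.

```python
def directional_accuracy(prices, predicted_up):
    """Fraction of periods whose predicted direction (True = up) matches the
    realised direction of the next price change; a flat price counts as not up."""
    actual_up = [b > a for a, b in zip(prices, prices[1:])]
    if len(predicted_up) != len(actual_up):
        raise ValueError("need one prediction per price change")
    hits = sum(p == a for p, a in zip(predicted_up, actual_up))
    return hits / len(actual_up)


# Hypothetical weekly closes and one up/down prediction per weekly change.
weekly_close = [100.0, 101.5, 100.8, 102.0, 103.1]
predictions = [True, True, True, False]
print(directional_accuracy(weekly_close, predictions))
```

Because a coin flip scores about 50% on this metric, figures such as 66.18% and 62.09% are best read relative to that baseline and to the share of up-weeks in the test period.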

Predicting Next-Day Perceived and Physiological Stress of Pregnant Women Using Machine Learning and Explainability: Algorithm Development and Validation (Preprint)

10.2196/preprints.33850 ◽

2021 ◽

Author(s):

Ada Ng ◽

Boyang Wei ◽

Jaya Jain ◽

Erin Ward ◽

Darius Tandon ◽

...

Keyword(s):

Machine Learning ◽

Data Collection ◽

Pregnant Women ◽

Perceived Stress ◽

Physiological Stress ◽

Behavioral Therapy ◽

Learning Models ◽

Adverse Health Effects ◽

Using Data ◽

Machine Learning Models

BACKGROUND Cognitive behavioral therapy (CBT)-based interventions are effective in reducing prenatal stress, which can have severe adverse health effects on mother and newborn if left unaddressed. Predicting next-day physiologic or perceived stress can inform and enable preemptive interventions for a day likely to be physiologically and/or perceptibly stressful. Machine learning models can be developed to predict next-day physiologic and perceived stress from data collected the previous day. Such models can improve our understanding of the specific factors that predict physiologic and perceived stress, and will also allow researchers to develop systems that collect only selected features for assessment in clinical trials, minimizing the burden of data collection.

OBJECTIVE To build and evaluate a machine-learned model that predicts next-day physiologic and perceived stress using sensor-based, ecological momentary assessment (EMA)-based, and intervention-based features, and to explain the prediction results.

METHODS We enrolled pregnant women in a prospective proof-of-concept study and collected electrocardiography, EMA, and CBT intervention data over 12 weeks. We used the data to train and evaluate six machine learning models to predict next-day physiologic and perceived stress. After selecting the best-performing model, we used SHapley Additive exPlanations (SHAP) to identify the importance and explainability of each feature.

RESULTS A total of 16 pregnant women enrolled in the study. Overall, 4157.18 hours of data were collected, and participants answered 2838 EMAs. After applying feature selection, 8 and 10 features were found to positively predict next-day physiologic and perceived stress, respectively. A random forest classifier performed best in predicting next-day physiologic stress (F1-score 0.84) and next-day perceived stress (F1-score 0.74) using all features. While any subset of sensor-based, EMA-based, and/or intervention-based features could reliably predict next-day physiologic stress, EMA-based features were necessary to predict next-day perceived stress. Analysis of explainability metrics showed that a prolonged duration of physiologic stress was highly predictive of next-day physiologic stress and that physiologic and perceived stress were temporally divergent.

CONCLUSIONS In this study, we built interpretable machine learning models to predict next-day physiologic and perceived stress, and we identified unique features that were highly predictive of next-day stress and can help reduce the burden of data collection.
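The F1-scores reported above combine precision and recall into a single number. A small sketch of the binary F1 computation on hypothetical next-day stress labels follows; the study may have averaged F1 over classes (for example macro or weighted averaging), which this sketch does not attempt.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical next-day labels: 1 = stressed day, 0 = not stressed.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
print(f1_score(y_true, y_pred))
```

Unlike plain accuracy, F1 ignores true negatives, so it stays informative when stressed days are much rarer than unstressed ones.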

Machine Learning Models for the Prediction of Postpartum Depression: Application and Comparison Based on a Cohort Study (Preprint)

10.2196/preprints.15516 ◽

2019 ◽

Author(s):

Weina Zhang ◽

Han Liu ◽

Vincent Michael Bernard Silenzio ◽

Peiyuan Qiu ◽

Wenjie Gong

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Postpartum Depression ◽

Support Vector ◽

Selection Methods ◽

Learning Models ◽

Expert Consultation ◽

Using Data ◽

Machine Learning Models

BACKGROUND Postpartum depression (PPD) is a serious public health problem. Building a predictive model for PPD from data collected during pregnancy can facilitate earlier identification and intervention.

OBJECTIVE The aims of this study are to compare the performance of four different machine learning models that use data collected during pregnancy to predict PPD, and to explore which factors in the models are most important for PPD prediction.

METHODS Information from the pregnancy period for a cohort of 508 women, including demographics, social environmental factors, and mental health, was used as predictors in the models. The Edinburgh Postnatal Depression Scale score within 42 days after delivery was used as the outcome indicator. Using two feature selection methods (expert consultation and random forest-based filter feature selection [FFS-RF]) and two algorithms (support vector machine [SVM] and random forest [RF]), we developed four different machine learning PPD prediction models and compared their predictive performance.

RESULTS There was no significant difference between the two feature selection methods in terms of model prediction performance, but FFS-RF selected 10 fewer factors than the expert consultation method. The model based on SVM and FFS-RF had the best predictive performance (sensitivity=0.69, area under the curve=0.78). In the feature importance ranking output by the RF algorithm, psychological elasticity, depression during the third trimester, and income level were the most important predictors.

CONCLUSIONS Compared with the expert consultation method, FFS-RF achieved greater dimension reduction. When the sample size is small, the SVM algorithm is suitable for predicting PPD. In the prevention of PPD, more attention should be paid to the psychological resilience of mothers.
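FFS-RF is described as a random forest-based filter: features are scored once, independently of the final classifier, and only the top-ranked ones are passed on to the SVM or RF model. The abstract does not give the scoring details or the cut-off, so the sketch below assumes precomputed importance scores and a fixed top-k rule. The three highest-scoring names echo the predictors named in the abstract, but every score and the two remaining features are hypothetical.

```python
def filter_select(importances, k):
    """Filter-style feature selection: keep the k highest-scoring features.

    Scores are computed before, and independently of, the final classifier,
    so no model is retrained per candidate subset (unlike wrapper methods)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical importance scores, e.g. from a previously fitted random forest.
importances = {
    "psychological_elasticity": 0.21,
    "third_trimester_depression": 0.18,
    "income_level": 0.12,
    "age": 0.05,
    "parity": 0.02,
}
selected = filter_select(importances, k=3)
print(selected)
```

The selected subset would then be the only input to the downstream SVM; keeping fewer features is what gives FFS-RF its dimension-reduction advantage over expert consultation.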
