Machine learning approaches to identify and design low thermal conductivity oxides for thermoelectric applications

Data-Centric Engineering ◽

10.1017/dce.2020.7 ◽

2020 ◽

Vol 1 ◽

Cited By ~ 1

Author(s):

Abhishek Tewari ◽

Siddharth Dixit ◽

Niteesh Sahni ◽

Stéphane P.A. Bordas

Keyword(s):

Machine Learning ◽

Thermal Conductivity ◽

Transition Metal ◽

Predictive Accuracy ◽

Mass Density ◽

Learning Approaches ◽

Tree Model ◽

Step Process ◽

Tree Classifier ◽

Boosted Tree

Abstract The search space for new thermoelectric oxides has been limited to the alloys of a few known systems, such as ZnO, SrTiO3, and CaMnO3. Notwithstanding the high power factor, their high thermal conductivity is a roadblock in achieving higher efficiency. In this paper, we apply machine learning (ML) models for discovering novel transition metal oxides with low lattice thermal conductivity ($k_L$). A two-step process is proposed to address the problem of small datasets frequently encountered in material informatics. First, a gradient-boosted tree classifier is trained to categorize unknown compounds into three categories of $k_L$: low, medium, and high. In the second step, we fit regression models on the targeted class (i.e., low $k_L$) to estimate $k_L$ with an $R^2 > 0.9$. A gradient-boosted tree model was also used to identify key material properties influencing classification of $k_L$, namely lattice energy per atom, atom density, band gap, mass density, and the ratio of oxygen to transition metal atoms. Only fundamental materials properties describing the crystal symmetry, compound chemistry, and interatomic bonding were used in the classification process, which can be readily used in the initial phases of materials design. The proposed two-step process addresses the problem of small datasets and improves the predictive accuracy. The ML approach adopted in the present work is generic in nature and can be combined with high-throughput computing for the rapid discovery of new materials for specific applications.
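The two-step structure described in this abstract (classify first, then regress only within the targeted class) can be sketched on toy data. This is a minimal illustration of the control flow, not the authors' implementation: a nearest-centroid classifier and an ordinary least squares line stand in for the gradient-boosted tree models, and the single descriptor, the data values, and all function names are hypothetical.

```python
from statistics import mean


def fit_centroids(xs, labels):
    """Step 1 stand-in: store the mean descriptor value of each class."""
    return {
        c: mean(x for x, lab in zip(xs, labels) if lab == c)
        for c in set(labels)
    }


def classify(x, centroids):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def fit_line(xs, ys):
    """Step 2 stand-in: ordinary least squares fit of y = a * x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


# Hypothetical training data: one descriptor per compound, its k_L, its class.
descriptor = [1.0, 2.0, 3.0, 6.0, 7.0, 11.0, 12.0]
k_l = [1.0, 1.5, 2.0, 8.0, 9.0, 20.0, 25.0]
label = ["low", "low", "low", "medium", "medium", "high", "high"]

centroids = fit_centroids(descriptor, label)

# Fit the regressor on the targeted (low) class only.
low_x = [x for x, lab in zip(descriptor, label) if lab == "low"]
low_y = [y for y, lab in zip(k_l, label) if lab == "low"]
slope, intercept = fit_line(low_x, low_y)


def two_step_predict(x):
    """Return (class, k_L estimate); the estimate is None outside the low class."""
    cls = classify(x, centroids)
    estimate = slope * x + intercept if cls == "low" else None
    return cls, estimate


print(two_step_predict(2.5))
print(two_step_predict(10.0))
```

Restricting the regressor to one class means it only has to model a narrow range of $k_L$, which is the motivation the abstract gives for the split when data are scarce.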


A Machine Learning Prediction Model of Respiratory Failure Within 48 Hours of Patient Admission for COVID-19: Model Development and Validation

Journal of Medical Internet Research ◽

10.2196/24246 ◽

2021 ◽

Vol 23 (2) ◽

pp. e24246 ◽

Cited By ~ 1

Author(s):

Siavash Bolourani ◽

Max Brenner ◽

Ping Wang ◽

Thomas McGinn ◽

Jamie S Hirsch ◽

...

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Respiratory Failure ◽

Early Warning ◽

Clinical Decision Making ◽

Predictive Accuracy ◽

Invasive Mechanical Ventilation ◽

Laboratory Data ◽

Early Warning Score ◽

Learning Approaches

Background Predicting early respiratory failure due to COVID-19 can help triage patients to higher levels of care, allocate scarce resources, and reduce morbidity and mortality by appropriately monitoring and treating the patients at greatest risk for deterioration. Given the complexity of COVID-19, machine learning approaches may support clinical decision making for patients with this disease. Objective Our objective is to derive a machine learning model that predicts respiratory failure within 48 hours of admission based on data from the emergency department. Methods Data were collected from patients with COVID-19 who were admitted to Northwell Health acute care hospitals and were discharged, died, or spent a minimum of 48 hours in the hospital between March 1 and May 11, 2020. Of 11,525 patients, 933 (8.1%) were placed on invasive mechanical ventilation within 48 hours of admission. Variables used by the models included clinical and laboratory data commonly collected in the emergency department. We trained and validated three predictive models (two based on XGBoost and one that used logistic regression) using cross-hospital validation. We compared model performance among all three models as well as an established early warning score (Modified Early Warning Score) using receiver operating characteristic curves, precision-recall curves, and other metrics. Results The XGBoost model had the highest mean accuracy (0.919; area under the curve=0.77), outperforming the other two models as well as the Modified Early Warning Score. Important predictor variables included the type of oxygen delivery used in the emergency department, patient age, Emergency Severity Index level, respiratory rate, serum lactate, and demographic characteristics. Conclusions The XGBoost model had high predictive accuracy, outperforming other early warning scores. The clinical plausibility and predictive ability of XGBoost suggest that the model could be used to predict 48-hour respiratory failure in admitted patients with COVID-19.
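The cross-hospital validation mentioned in the Methods can be pictured as a leave-one-group-out split, in which each hospital in turn is held out for testing while the model is trained on the rest. The abstract does not give the exact protocol, so the following is only a sketch under that assumption; the hospital labels and the function name `leave_one_hospital_out` are hypothetical.

```python
from collections import defaultdict


def leave_one_hospital_out(hospital_ids):
    """Yield (held_out, train_idx, test_idx) with one hospital held out per split.

    hospital_ids[i] is the (hypothetical) hospital label of patient i.
    """
    by_hospital = defaultdict(list)
    for idx, hospital in enumerate(hospital_ids):
        by_hospital[hospital].append(idx)

    for held_out in sorted(by_hospital):
        test_idx = by_hospital[held_out]
        train_idx = [i for i, h in enumerate(hospital_ids) if h != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical labels for six patients across three hospitals.
hospitals = ["A", "A", "B", "C", "B", "C"]
splits = list(leave_one_hospital_out(hospitals))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Splitting by hospital rather than by patient keeps every patient from the held-out site out of training, so the score reflects performance on a site the model has not seen.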


Comparing Statistical and Machine Learning Classifiers: Alternatives for Predictive Modeling in Human Factors Research

Human Factors The Journal of the Human Factors and Ergonomics Society ◽

10.1518/hfes.45.3.408.27248 ◽

2003 ◽

Vol 45 (3) ◽

pp. 408-423 ◽

Cited By ~ 6

Author(s):

Brian Carnahan ◽

Gérard Meyer ◽

Lois-Ann Kuntz

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Discriminant Analysis ◽

Human Factors ◽

Predictive Accuracy ◽

Performance Outcomes ◽

Learning Approaches ◽

Classification Models ◽

Machine Learning Classification ◽

Human Factors Research

Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches - genetic programming and decision tree induction - were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.


Performance Analysis of Microarray Data Classification using Machine Learning Techniques

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2015070104 ◽

2015 ◽

Vol 5 (2) ◽

pp. 43-54

Author(s):

Subhendu Kumar Pani ◽

Bikram Kesari Ratha ◽

Ajay Kumar Mishra

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Microarray Data ◽

Predictive Accuracy ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Data Mining Technique ◽

Single Experiment ◽

Learning Techniques ◽

Microarray Datasets

DNA microarray technology permits simultaneous monitoring and measurement of thousands of gene expression levels in a single experiment. Data mining techniques such as classification are extensively used on microarray data for medical diagnosis and gene analysis. However, the high dimensionality of the data affects the performance of classification and prediction. Consequently, a key issue in microarray data analysis is feature selection and dimensionality reduction in order to achieve better classification and predictive accuracy. There are several machine learning approaches available for feature selection. In this study, the authors use Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) to evaluate the performance of several popular classifiers on a set of microarray datasets. Experimental results show that feature selection affects the performance.


A maximum flow-based network approach for identification of stable noncoding biomarkers associated with the multigenic neurological condition, autism

BioData Mining ◽

10.1186/s13040-021-00262-x ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Maya Varma ◽

Kelley M. Paskov ◽

Brianna S. Chrisman ◽

Min Woo Sun ◽

Jae-Yoon Jung ◽

...

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Disease Risk ◽

Genetic Disorders ◽

Maximum Flow ◽

Autism Spectrum ◽

Whole Genome Sequence ◽

Learning Approaches ◽

Model Stability ◽

Improve Model

Abstract Background Machine learning approaches for predicting disease risk from high-dimensional whole genome sequence (WGS) data often result in unstable models that can be difficult to interpret, limiting the identification of putative sets of biomarkers. Here, we design and validate a graph-based methodology based on maximum flow, which leverages the presence of linkage disequilibrium (LD) to identify stable sets of variants associated with complex multigenic disorders. Results We apply our method to a previously published logistic regression model trained to identify variants in simple repeat sequences associated with autism spectrum disorder (ASD); this L1-regularized model exhibits high predictive accuracy yet demonstrates great variability in the features selected from over 230,000 possible variants. In order to improve model stability, we extract the variants assigned non-zero weights in each of 5 cross-validation folds and then assemble the five sets of features into a flow network subject to LD constraints. The maximum flow formulation allowed us to identify 55 variants, which we show to be more stable than the features identified by the original classifier. Conclusion Our method allows for the creation of machine learning models that can identify predictive variants. Our results help pave the way towards biomarker-based diagnosis methods for complex genetic disorders.
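The instability described in this abstract, where different cross-validation folds select different variants, can be quantified in several ways. The abstract does not say which measure the authors use; one common choice, assumed here purely for illustration, is the mean pairwise Jaccard similarity of the selected feature sets. The variant identifiers below are made up.

```python
from itertools import combinations


def mean_pairwise_jaccard(feature_sets):
    """Average Jaccard similarity over all pairs of selected-feature sets."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    total = 0.0
    for a, b in pairs:
        union = a | b
        # Two empty sets are treated as identical.
        total += len(a & b) / len(union) if union else 1.0
    return total / len(pairs)


# Hypothetical variants with non-zero weight in each of 5 folds.
folds = [
    {"v1", "v2", "v3"},
    {"v1", "v2", "v4"},
    {"v1", "v3", "v5"},
    {"v1", "v2", "v3"},
    {"v2", "v3", "v6"},
]
stability = mean_pairwise_jaccard(folds)
print(round(stability, 3))
```

A value of 1 means every fold selected the same features and 0 means no two folds shared any, so a higher score corresponds to the more stable selection the method aims for.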


Ensemble-AMPPred: Robust AMP Prediction and Recognition Using the Ensemble Learning Method with a New Hybrid Feature for Differentiating AMPs

Genes ◽

10.3390/genes12020137 ◽

2021 ◽

Vol 12 (2) ◽

pp. 137

Author(s):

Supatcha Lertampaiporn ◽

Tayvich Vorapreeda ◽

Apiradee Hongsthong ◽

Chinae Thammarongtham

Keyword(s):

Machine Learning ◽

High Performance ◽

Predictive Accuracy ◽

Antimicrobial Activities ◽

Ensemble Model ◽

Learning Approaches ◽

Ensemble Machine Learning ◽

Screening And Identification ◽

Feature Based ◽

Natural Peptides

Antimicrobial peptides (AMPs) are natural peptides possessing antimicrobial activities. These peptides are important components of the innate immune system. They are found in various organisms. AMP screening and identification by experimental techniques are laborious and time-consuming tasks. Alternatively, computational methods based on machine learning have been developed to screen potential AMP candidates prior to experimental verification. Although various AMP prediction programs are available, there is still a need for improvement to reduce false positives (FPs) and to increase the predictive accuracy. In this work, several well-known single and ensemble machine learning approaches have been explored and evaluated based on balanced training datasets and two large testing datasets. We have demonstrated that the developed program with various predictive models has high performance in differentiating between AMPs and non-AMPs. Thus, we describe the development of a program for the prediction and recognition of AMPs using MaxProbVote, which is an ensemble model. Moreover, to increase prediction efficiency, the ensemble model was integrated with a new hybrid feature based on logistic regression. The ensemble model integrated with the hybrid feature can effectively increase the prediction sensitivity of the developed program called Ensemble-AMPPred, resulting in overall improvements in terms of both sensitivity and specificity compared to those of currently available programs.


Detailed Analysis of Intrusion Detection using Machine Learning Algorithms

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a2127.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1894-1899 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Svm Classifier ◽

Learning Approaches ◽

Decision Tree Classifier ◽

Internet Users ◽

Tree Classifier ◽

Challenging Tasks

The number of internet users has increased exponentially over the years, and intrusive activities have increased significantly as well. Detecting an intrusion attack in a system connected over a network is one of the most challenging tasks in today’s world. A significant number of techniques based on machine learning approaches have been developed to detect these intrusion attacks. Even though these techniques are good, they are not good enough to detect all kinds of attacks. In this paper, different machine learning algorithms are analysed on the NSL-KDD dataset, with pre-processing steps such as one-hot encoding, feature selection and random sampling, to find the best-performing model for detecting these attacks. The attacks in the dataset are classified into four types (Probe, DoS, U2R, and R2L), while non-attack traffic is labelled Normal. The dataset is in two parts: KDD-Train and KDD-Test. The models are trained and tested to measure accuracy and to understand and compare the performance of the different machine learning algorithms. The machine learning algorithms used are Naive Bayes Classifier, Decision Tree Classifier, Random Forest Classifier, KNeighbours Classifier, Logistic Regression, SVM Classifier, and Voting Classifier. These techniques are compared according to their capability to detect the attacks. This comparison will help to find the algorithm that works best for detecting different kinds of intrusion attacks.
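One-hot encoding, one of the pre-processing steps named in this abstract, turns a categorical column into one binary column per category. A minimal pure-Python sketch follows; the three values mirror the `protocol_type` field of NSL-KDD, while the helper name `one_hot` and the sample rows are illustrative only.

```python
def one_hot(values):
    """Encode a list of categorical values as rows of 0/1 indicators.

    Returns (categories, rows), where categories is the sorted list of
    distinct values and rows[i][j] is 1 if values[i] == categories[j].
    """
    categories = sorted(set(values))
    index = {cat: j for j, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


protocols = ["tcp", "udp", "icmp", "tcp"]
categories, encoded = one_hot(protocols)
print(categories)  # ['icmp', 'tcp', 'udp']
print(encoded)     # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

In practice the category list would be derived from the training part only and reused for the test part, so that both share the same column layout.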


Performance and clinical utility of supervised machine-learning approaches in detecting familial hypercholesterolaemia in primary care

npj Digital Medicine ◽

10.1038/s41746-020-00349-5 ◽

2020 ◽

Vol 3 (1) ◽

Author(s):

Ralph K. Akyea ◽

Nadeem Qureshi ◽

Joe Kai ◽

Stephen F. Weng

Keyword(s):

Machine Learning ◽

Primary Care ◽

Logistic Regression ◽

Heart Disease ◽

Ensemble Learning ◽

Clinical Utility ◽

Familial Hypercholesterolaemia ◽

Predictive Accuracy ◽

Gradient Boosting ◽

Learning Approaches

Abstract Familial hypercholesterolaemia (FH) is a common inherited disorder, causing lifelong elevated low-density lipoprotein cholesterol (LDL-C). Most individuals with FH remain undiagnosed, precluding opportunities to prevent premature heart disease and death. Some machine-learning approaches improve detection of FH in electronic health records, though clinical impact is under-explored. We assessed performance of an array of machine-learning approaches for enhancing detection of FH, and their clinical utility, within a large primary care population. A retrospective cohort study was conducted using routine primary care clinical records of 4,027,775 individuals from the United Kingdom with total cholesterol measured from 1 January 1999 to 25 June 2019. Predictive accuracy of five common machine-learning algorithms (logistic regression, random forest, gradient boosting machines, neural networks and ensemble learning) was assessed for detecting FH. Predictive accuracy was assessed by area under the receiver operating characteristic curve (AUC) and expected vs observed calibration slope, with clinical utility assessed by expected case-review workload and likelihood ratios. There were 7928 incident diagnoses of FH. In addition to known clinical features of FH (raised total cholesterol or LDL-C and family history of premature coronary heart disease), machine-learning (ML) algorithms identified features such as raised triglycerides which reduced the likelihood of FH. Apart from logistic regression (AUC, 0.81), all four other ML approaches had similarly high predictive accuracy (AUC > 0.89). Calibration slope ranged from 0.997 for gradient boosting machines to 1.857 for logistic regression. Among those screened, high probability cases requiring clinical review varied from 0.73% using ensemble learning to 10.16% using deep learning, but with positive predictive values of 15.5% and 2.8% respectively. Ensemble learning exhibited a dominant positive likelihood ratio (45.5) compared to all other ML models (7.0–14.4). Machine-learning models show similar high accuracy in detecting FH, offering opportunities to increase diagnosis. However, the clinical case-finding workload required for yield of cases will differ substantially between models.
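The clinical-utility measures quoted in this abstract (positive predictive value and positive likelihood ratio) follow from a confusion matrix by their standard definitions. The sketch below shows the arithmetic; the counts are invented for illustration and are not taken from the study, and the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and LR+ from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # PPV: share of flagged cases that truly have the condition.
    ppv = tp / (tp + fp)
    # LR+ = P(flagged | condition) / P(flagged | no condition).
    lr_positive = sensitivity / (1.0 - specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "lr_positive": lr_positive,
    }


# Hypothetical counts, NOT from the study.
m = diagnostic_metrics(tp=80, fp=420, fn=20, tn=99480)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Because the condition is rare, even a small false-positive rate produces many flagged cases, which is why two models with similar AUC can imply very different case-review workloads.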


Machine learning approaches reveal genomic regions associated with sugarcane brown rust resistance

Scientific Reports ◽

10.1038/s41598-020-77063-5 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Alexandre Hild Aono ◽

Estela Araujo Costa ◽

Hugo Vianna Silva Rody ◽

James Shiniti Nagai ◽

Ricardo José Gonzaga Pimenta ◽

...

Keyword(s):

Machine Learning ◽

Rust Resistance ◽

Predictive Accuracy ◽

Explanatory Power ◽

Genotyping By Sequencing ◽

Brown Rust ◽

Learning Approaches ◽

Molecular Approaches ◽

Genomic Regions ◽

Sugarcane Brown Rust

Abstract Sugarcane is an economically important crop, but its genomic complexity has hindered advances in molecular approaches for genetic breeding. New cultivars are released based on the identification of interesting traits, and for sugarcane, brown rust resistance is a desirable characteristic due to the large economic impact of the disease. Although marker-assisted selection for rust resistance has been successful, the genes involved are still unknown, and the associated regions vary among cultivars, thus restricting methodological generalization. We used genotyping by sequencing of full-sib progeny to relate genomic regions with brown rust phenotypes. We established a pipeline to identify reliable SNPs in complex polyploid data, which were used for phenotypic prediction via machine learning. We identified 14,540 SNPs, which led to a mean prediction accuracy of 50% when using different models. We also tested feature selection algorithms to increase predictive accuracy, resulting in a reduced dataset with more explanatory power for rust phenotypes. As a result of this approach, we achieved an accuracy of up to 95% with a dataset of 131 SNPs related to brown rust QTL regions and auxiliary genes. Therefore, our novel strategy has the potential to assist studies of the genomic organization of brown rust resistance in sugarcane.


Comparing machine learning approaches to identify myocardial scar from the ECG

European Heart Journal ◽

10.1093/ehjci/ehaa946.2048 ◽

2020 ◽

Vol 41 (Supplement_2) ◽

Author(s):

J Tung ◽

A.J Rogers ◽

N Ravi ◽

N.K Bhatia ◽

R.L Shah ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Predictive Accuracy ◽

Support Vector ◽

Funding Source ◽

Learning Approaches ◽

Wave Analysis ◽

Technical Parameters ◽

National Budget ◽

Q Wave

Abstract Background Detection of myocardial infarction (MI) traditionally requires ECG Q waves, which have poor sensitivity, or imaging, which is time consuming. We hypothesized that machine learning (ML) of the ECG could identify prior MI, but its accuracy may depend highly upon the architecture and parameters chosen. Purpose To compare ML architectures that predict prior MI from the ECG. Methods We curated ECGs in 608 patients seen in cardiology clinics at 2 centers. We transformed 12-lead ECGs to median beats in Frank (X, Y, Z) planes (fig. A). We tested 3 architectures: a 1D deep neural network (DNN), a 3D DNN, and a support vector machine (SVM). The 1D DNN used only temporal convolutions (fig. B) while the 3D DNN used a spatial convolution (fig. C) prior to the fully-connected layer (fig. C). Predictive accuracy for history of MI was compared for all architectures (fig. D). Results Patients (61.4±14.5 years, 31.2% female) had a 28.7% (175/608) prevalence of prior MI. Optimized SVM of 6 features provided accuracy of 66.1% for identifying prior MI, similar to ECG Q wave analysis. 1D DNN had accuracy of 63.6% with an area under curve (AUC) of 0.625. 3D DNN outperformed 1D DNN and SVM, providing an accuracy of 71±5% (using k=5-fold cross validation), with an AUC of 0.730. Conclusion ECG machine learning can identify prior MI better than Q wave analysis, but is sensitive to technical parameters and specific computational architecture. It is important to develop a framework to enable robust comparisons of different ML studies and future refinements. Funding Acknowledgement Type of funding source: Public grant(s) – National budget only. Main funding source(s): National Institutes of Health - United States


Integration of machine learning approaches for accelerated discovery of transition-metal dichalcogenides as Hg0 sensing materials

Applied Energy ◽

10.1016/j.apenergy.2019.113651 ◽

2019 ◽

Vol 254 ◽

pp. 113651 ◽

Cited By ~ 4

Author(s):

Haitao Zhao ◽

Collins I. Ezeh ◽

Weijia Ren ◽

Wentao Li ◽

Cheng Heng Pang ◽

...

Keyword(s):

Machine Learning ◽

Transition Metal ◽

Transition Metal Dichalcogenides ◽

Learning Approaches ◽

Metal Dichalcogenides
