Ensemble-AMPPred: Robust AMP Prediction and Recognition Using the Ensemble Learning Method with a New Hybrid Feature for Differentiating AMPs

Supatcha Lertampaiporn; Tayvich Vorapreeda; Apiradee Hongsthong; Chinae Thammarongtham

doi:10.3390/genes12020137

Genes ◽

2021 ◽

Vol 12 (2) ◽

pp. 137

Keyword(s):

Machine Learning ◽

High Performance ◽

Predictive Accuracy ◽

Antimicrobial Activities ◽

Ensemble Model ◽

Learning Approaches ◽

Ensemble Machine Learning ◽

Screening And Identification ◽

Feature Based ◽

Natural Peptides

Antimicrobial peptides (AMPs) are natural peptides with antimicrobial activity. They are important components of the innate immune system and are found in a wide range of organisms. Screening and identifying AMPs experimentally is laborious and time-consuming. Alternatively, computational methods based on machine learning have been developed to screen potential AMP candidates before experimental verification. Although various AMP prediction programs are available, there is still room for improvement in reducing false positives (FPs) and increasing predictive accuracy. In this work, several well-known single and ensemble machine learning approaches were explored and evaluated using balanced training datasets and two large testing datasets. We demonstrate that the developed program, with its various predictive models, performs well in differentiating between AMPs and non-AMPs. Specifically, we describe the development of a program for the prediction and recognition of AMPs using MaxProbVote, an ensemble model. Moreover, to increase prediction efficiency, the ensemble model was integrated with a new hybrid feature based on logistic regression. This integration effectively increases the prediction sensitivity of the developed program, called Ensemble-AMPPred, resulting in overall improvements in both sensitivity and specificity compared with currently available programs.
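
The abstract names MaxProbVote without defining it. As a minimal sketch, assuming MaxProbVote means that each sample receives the class backed by the single most confident base classifier (an interpretation suggested by the name, not confirmed by this abstract), the rule takes a few lines of NumPy. The function name `max_prob_vote` and all probability values are hypothetical.

```python
import numpy as np

def max_prob_vote(prob_stack: np.ndarray) -> np.ndarray:
    """Combine base-classifier outputs by maximum-probability voting (assumed rule).

    prob_stack has shape (n_classifiers, n_samples, n_classes); prob_stack[m]
    holds classifier m's predicted class probabilities. For each sample, the
    returned label is the class that received the single highest probability
    from any classifier.
    """
    if prob_stack.ndim != 3:
        raise ValueError("expected shape (n_classifiers, n_samples, n_classes)")
    # Max over classifiers -> (n_samples, n_classes): best score each class received.
    best_per_class = prob_stack.max(axis=0)
    # Pick the class whose best score is highest.
    return best_per_class.argmax(axis=1)

# Three hypothetical base classifiers, two peptides, classes [non-AMP, AMP].
probs = np.array([
    [[0.60, 0.40], [0.55, 0.45]],  # classifier A
    [[0.30, 0.70], [0.52, 0.48]],  # classifier B
    [[0.58, 0.42], [0.90, 0.10]],  # classifier C
])
print(max_prob_vote(probs))  # -> [1 0]
```

For the first peptide, a plain majority vote would return non-AMP (classifiers A and C both favour class 0), whereas maximum-probability voting follows classifier B's confident AMP call. This is one way such a rule can raise sensitivity, at the cost of trusting a single overconfident model.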

A novel multi-stage ensemble model with multiple K-means-based selective undersampling: An application in credit scoring

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201954 ◽

2021 ◽

Vol 40 (5) ◽

pp. 9471-9484

Author(s):

Yilun Jin ◽

Yanan Liu ◽

Wenyu Zhang ◽

Shuai Zhang ◽

Yu Lou

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Credit Scoring ◽

Imbalanced Data ◽

Ensemble Model ◽

Selective Sampling ◽

Machine Learning Methods ◽

Multi Stage ◽

Proposed Model ◽

New Feature

Advances in machine learning have improved the performance of credit scoring. Ensemble learning, one of the most widely recognized machine learning methods, has demonstrated significant improvements in predictive accuracy over individual machine learning models for credit scoring. This study proposes a novel multi-stage ensemble model with multiple K-means-based selective undersampling for credit scoring. First, a new multiple K-means-based undersampling method is proposed to deal with imbalanced data. Then, a new selective sampling mechanism is proposed to adaptively select the better-performing base classifiers. Finally, a new feature-enhanced stacking method is proposed to construct an effective ensemble model from the shortlisted base classifiers. In the experiments, four datasets and four evaluation indicators are used to evaluate the proposed model, and the results demonstrate its superiority over other benchmark models.
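
A minimal sketch of K-means-based undersampling, assuming a single clustering pass: the majority class is clustered and rows are drawn from every cluster in proportion to its size, so the reduced majority class still covers each region of feature space. The paper's multiple-K-means variant and its selective sampling and stacking stages are not reproduced; `kmeans_undersample` is a hypothetical helper built on scikit-learn's `KMeans`.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_undersample(X, y, majority_label, n_clusters=5, seed=0):
    """Undersample the majority class to the minority size, stratified by K-means cluster.

    Assumes the majority class is at least as large as the minority class.
    """
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    target = len(min_idx)

    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X[maj_idx])
    sizes = np.bincount(labels, minlength=n_clusters)

    # Proportional quotas, rounded down; leftover slots go to the largest clusters
    # so that the quotas sum exactly to the minority-class size.
    quotas = np.floor(target * sizes / sizes.sum()).astype(int)
    for c in np.argsort(-sizes)[: target - quotas.sum()]:
        quotas[c] += 1

    # Draw each cluster's quota without replacement, then append all minority rows.
    kept = np.concatenate([
        rng.choice(maj_idx[labels == c], size=q, replace=False)
        for c, q in enumerate(quotas) if q > 0
    ])
    keep = np.concatenate([kept, min_idx])
    return X[keep], y[keep]

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = np.array([0] * 100 + [1] * 20)  # 0 = repaid (majority), 1 = default (minority)
X_bal, y_bal = kmeans_undersample(X, y, majority_label=0)
print(np.bincount(y_bal))  # -> [20 20]
```

Calling the helper repeatedly with different `n_clusters` or `seed` values yields several distinct balanced subsets, which is one plausible way to supply multiple base classifiers with different training data.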

Chemometrics‐based models hyphenated with ensemble machine learning for retention time simulation of Isoquercitrin in Coriander sativum L. using high performance liquid chromatography

Journal of Separation Science ◽

10.1002/jssc.202000890 ◽

2020 ◽

Author(s):

Abdullahi Garba Usman ◽

Selin Işik ◽

Sani Isah Abba ◽

Filiz Meriçli

Keyword(s):

Machine Learning ◽

High Performance Liquid Chromatography ◽

Liquid Chromatography ◽

Retention Time ◽

High Performance ◽

Ensemble Machine Learning ◽

Time Simulation

An Optimized Stacking Ensemble Model for Phishing Websites Detection

Electronics ◽

10.3390/electronics10111285 ◽

2021 ◽

Vol 10 (11) ◽

pp. 1285

Author(s):

Mohammed Al-Sarem ◽

Faisal Saeed ◽

Zeyad Ghaleb Al-Mekhlafi ◽

Badiea Abdulkarem Mohammed ◽

Tawfik Al-Hadhrami ◽

...

Keyword(s):

Machine Learning ◽

Random Forests ◽

Ensemble Method ◽

Detection Methods ◽

Detection Accuracy ◽

Ensemble Model ◽

Security Attacks ◽

Data Set ◽

Machine Learning Methods ◽

Ensemble Machine Learning

Phishing attacks, which steal users’ information through websites that impersonate legitimate ones, have been increasing. This kind of attack does not only affect individuals’ or organisations’ websites. Although several methods for detecting phishing websites have been proposed using machine learning, deep learning, and other approaches, their detection accuracy still needs to be improved. This paper proposes an optimized stacking ensemble method for phishing website detection. The optimization was carried out using a genetic algorithm (GA) to tune the parameters of several ensemble machine learning methods, including random forests, AdaBoost, XGBoost, Bagging, GradientBoost, and LightGBM. The optimized classifiers were then ranked, and the best three models were chosen as the base classifiers of a stacking ensemble. The experiments were conducted on three datasets containing both phishing and legitimate websites: the Phishing Websites Data Set from UCI (Dataset 1), the Phishing Dataset for Machine Learning from Mendeley (Dataset 2), and the Datasets for Phishing Websites Detection from Mendeley (Dataset 3). The experimental results showed an improvement with the optimized stacking ensemble method, whose detection accuracy reached 97.16%, 98.58%, and 97.39% on Dataset 1, Dataset 2, and Dataset 3, respectively.
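
The rank-then-stack procedure can be sketched with scikit-learn under several assumptions: synthetic data stands in for the phishing datasets, default hyperparameters replace the genetic-algorithm tuning, XGBoost and LightGBM are omitted to avoid third-party dependencies, and logistic regression is assumed as the meta-learner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for a phishing feature table (not the UCI/Mendeley data).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
    "bagging": BaggingClassifier(random_state=0),
    "gradient_boost": GradientBoostingClassifier(random_state=0),
}

# Rank candidates by 5-fold cross-validated accuracy on the training split only.
cv_scores = {name: cross_val_score(est, X_train, y_train, cv=5).mean()
             for name, est in candidates.items()}
top3 = sorted(cv_scores, key=cv_scores.get, reverse=True)[:3]

# The three best-ranked models form the base layer; a logistic regression
# learns how to combine their out-of-fold predictions.
stack = StackingClassifier(
    estimators=[(name, candidates[name]) for name in top3],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
print("base models:", top3)
print(f"stacked test accuracy = {stack.score(X_test, y_test):.3f}")
```

Ranking on the training split keeps the test split untouched until the final score, so model selection does not leak into the reported accuracy.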

Environmental Sound Recognition on Embedded Systems: From FPGAs to TPUs

Electronics ◽

10.3390/electronics10212622 ◽

2021 ◽

Vol 10 (21) ◽

pp. 2622

Author(s):

Jurgen Vandendriessche ◽

Nick Wouters ◽

Bruno da Silva ◽

Mimoun Lamrini ◽

Mohamed Yassin Chkouri ◽

...

Keyword(s):

Machine Learning ◽

High Performance ◽

Machine Learning Techniques ◽

Sound Recognition ◽

Learning Approaches ◽

Environmental Sound ◽

Embedded Devices ◽

Power Efficient ◽

Computationally Intensive ◽

Environmental Sound Recognition

In recent years, Environmental Sound Recognition (ESR) has become a relevant capability for urban monitoring applications. Automated sound recognition techniques often rely on machine learning approaches, which have grown in complexity to achieve higher accuracy. Such techniques, however, often have to be deployed on resource- and power-constrained embedded devices, which has become a challenge with the adoption of deep learning approaches based on Convolutional Neural Networks (CNNs). Field-Programmable Gate Arrays (FPGAs) are power efficient and well suited to computationally intensive algorithms such as CNNs. By fully exploiting their parallelism, they can potentially reduce inference time compared with other embedded devices. Similarly, dedicated Artificial Intelligence (AI) accelerators such as Tensor Processing Units (TPUs) promise to deliver high accuracy with high performance. In this work, we evaluate existing tool flows for deploying CNN models on FPGA and TPU platforms. We propose and adapt several CNN-based sound classifiers to be embedded on these hardware accelerators. The results demonstrate the maturity of the existing tools and show how FPGAs can be exploited to outperform TPUs.

Performance Assessment of Ensemble Learning Model for Prediction of Cardiac Disease Among Smokers Based on HRV Features

International Journal of Biomedical and Clinical Engineering ◽

10.4018/ijbce.2021010102 ◽

2021 ◽

Vol 10 (1) ◽

pp. 19-34

Author(s):

S. R. Rathod ◽

C. Y. Patil

Keyword(s):

Machine Learning ◽

Heart Rate ◽

Cardiac Disease ◽

Kappa Statistics ◽

Cardiac Diseases ◽

Ensemble Model ◽

Single Model ◽

Machine Learning Methods ◽

Ensemble Machine Learning ◽

Boosting Technique

Smoking affects the pattern of heart rate variability (HRV); HRV can therefore act as a predictor of cardiac disease (CD). In this study, ensemble machine learning methods were used to predict CD non-invasively among smokers. A single model was built from an ensemble voting classifier combined with a boosting technique to improve predictive accuracy. The final ensemble model achieves an accuracy of 95.20%, precision of 97.27%, sensitivity of 92.35%, specificity of 98.07%, F1 score of 0.95, AUC of 0.961, MCE of 0.0479, kappa statistic of 0.9041, and MSE of 0.2189. The accuracy obtained with the proposed method is the highest achieved so far for predicting CD among smokers from HRV data.
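
Most of the reported figures derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions, including Cohen's kappa corrected for chance agreement. The counts are hypothetical, and MCE is assumed to mean misclassification error (1 - accuracy), which matches the reported 95.20% accuracy to within rounding.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Chance agreement: product of predicted and actual marginal rates, summed over classes.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
        "misclassification_error": 1 - accuracy,
    }

# Hypothetical counts: 90 TP, 10 FP, 80 TN, 20 FN.
for name, value in binary_metrics(tp=90, fp=10, tn=80, fn=20).items():
    print(f"{name}: {value:.4f}")
```

With these counts the chance agreement is 0.5, so an accuracy of 0.85 corresponds to a kappa of 0.7; kappa penalizes accuracy that a classifier could reach by matching class frequencies alone.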

Classification of Driver Distraction: A Comprehensive Analysis of Feature Generation, Machine Learning, and Input Measures

Human Factors The Journal of the Human Factors and Ergonomics Society ◽

10.1177/0018720819856454 ◽

2019 ◽

Vol 62 (6) ◽

pp. 1019-1035 ◽

Cited By ~ 7

Author(s):

Anthony D. McDonald ◽

Thomas K. Ferris ◽

Tyler A. Wiener

Keyword(s):

Machine Learning ◽

Driving Behavior ◽

Driver Distraction ◽

Machine Learning Algorithms ◽

Physiological Data ◽

Learning Approaches ◽

Feature Generation ◽

Driver Performance ◽

Ensemble Machine Learning ◽

Vehicle Information

Objective: The objective of this study was to analyze a set of driver performance and physiological data using advanced machine learning approaches, including feature generation, to determine the best-performing algorithms for detecting driver distraction and predicting the source of distraction.

Background: Distracted driving is a causal factor in many vehicle crashes, often resulting in injuries and deaths. As mobile devices and in-vehicle information systems become more prevalent, the ability to detect and mitigate driver distraction becomes more important.

Method: This study trained 21 algorithms to identify when drivers were distracted by secondary cognitive and texting tasks. The algorithms used physiological and driving behavior inputs processed with a comprehensive feature generation package, Time Series Feature Extraction based on Scalable Hypothesis tests.

Results: A Random Forest algorithm, trained using only driving behavior measures and excluding driver physiological data, was the highest-performing algorithm for classifying driver distraction. The most important input measures were lane offset, speed, and steering, whereas the most important feature types were standard deviation, quantiles, and nonlinear transforms.

Conclusion: This work suggests that distraction detection algorithms may be improved by considering ensemble machine learning algorithms trained with driving behavior measures and nonstandard features. In addition, the study presents several new indicators of distraction derived from speed and steering measures.

Application: Future development of distraction mitigation systems should focus on driver behavior-based algorithms that use complex feature generation techniques.
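
A toy version of this kind of pipeline, assuming windowed driving signals: hand-computed standard deviation, quantiles, and two simple nonlinear transforms stand in for the far larger feature set of the tsfresh package, and a random forest classifies the windows. The signals, the labels, and the assumption that distraction inflates steering variability are all synthetic; the helper names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def window_features(signal: np.ndarray) -> list:
    """A handful of the feature types named in the abstract, for one channel."""
    q10, q50, q90 = np.quantile(signal, [0.1, 0.5, 0.9])
    return [
        signal.std(),                    # standard deviation
        q10, q50, q90,                   # quantiles
        np.abs(np.diff(signal)).mean(),  # mean absolute change
        np.mean(signal ** 2),            # mean of a squared (nonlinear) transform
    ]

def featurize(windows: np.ndarray) -> np.ndarray:
    """(n_windows, n_channels, n_steps) -> (n_windows, n_channels * 6)."""
    return np.array([np.concatenate([window_features(ch) for ch in w]) for w in windows])

rng = np.random.default_rng(0)
n_windows, n_steps = 200, 100
channels = ["lane_offset", "speed", "steering"]  # hypothetical channel order
y = rng.integers(0, 2, size=n_windows)           # 1 = distracted (synthetic labels)
windows = rng.normal(size=(n_windows, len(channels), n_steps))
windows[y == 1, 2] *= 1.8  # assumption: distraction inflates steering variability

X = featurize(windows)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
cv_acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"{X.shape[1]} features, 5-fold CV accuracy = {cv_acc:.3f}")
```

Because the synthetic signal differs only in steering spread, the random forest's feature importances would concentrate on the steering standard deviation; on real data the importances are what identify informative measures and feature types.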

A Machine Learning Prediction Model of Respiratory Failure Within 48 Hours of Patient Admission for COVID-19: Model Development and Validation

Journal of Medical Internet Research ◽

10.2196/24246 ◽

2021 ◽

Vol 23 (2) ◽

pp. e24246 ◽

Cited By ~ 1

Author(s):

Siavash Bolourani ◽

Max Brenner ◽

Ping Wang ◽

Thomas McGinn ◽

Jamie S Hirsch ◽

...

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Respiratory Failure ◽

Early Warning ◽

Clinical Decision Making ◽

Predictive Accuracy ◽

Invasive Mechanical Ventilation ◽

Laboratory Data ◽

Early Warning Score ◽

Learning Approaches

Background: Predicting early respiratory failure due to COVID-19 can help triage patients to higher levels of care, allocate scarce resources, and reduce morbidity and mortality by appropriately monitoring and treating the patients at greatest risk of deterioration. Given the complexity of COVID-19, machine learning approaches may support clinical decision making for patients with this disease.

Objective: Our objective was to derive a machine learning model that predicts respiratory failure within 48 hours of admission based on data from the emergency department.

Methods: Data were collected from patients with COVID-19 who were admitted to Northwell Health acute care hospitals and who were discharged, died, or spent at least 48 hours in the hospital between March 1 and May 11, 2020. Of 11,525 patients, 933 (8.1%) were placed on invasive mechanical ventilation within 48 hours of admission. The models used clinical and laboratory variables commonly collected in the emergency department. We trained and validated three predictive models (two based on XGBoost and one based on logistic regression) using cross-hospital validation. We compared the performance of all three models and of an established early warning score (the Modified Early Warning Score) using receiver operating characteristic curves, precision-recall curves, and other metrics.

Results: The XGBoost model had the highest mean accuracy (0.919; area under the curve = 0.77), outperforming the other two models as well as the Modified Early Warning Score. Important predictor variables included the type of oxygen delivery used in the emergency department, patient age, Emergency Severity Index level, respiratory rate, serum lactate, and demographic characteristics.

Conclusions: The XGBoost model had high predictive accuracy, outperforming other early warning scores. The clinical plausibility and predictive ability of XGBoost suggest that the model could be used to predict 48-hour respiratory failure in admitted patients with COVID-19.

An Experimental Comparison between Deep Learning and Classical Machine Learning Approaches for Writer Identification in Medieval Documents

Journal of Imaging ◽

10.3390/jimaging6090089 ◽

2020 ◽

Vol 6 (9) ◽

pp. 89

Author(s):

Nicole Dalia Cilia ◽

Claudio De Stefano ◽

Francesco Fontanella ◽

Claudio Marrocco ◽

Mario Molinara ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

High Performance ◽

Ad Hoc ◽

Digital Images ◽

Experimental Comparison ◽

Learning Approaches ◽

Test Bed ◽

Ancient Manuscripts ◽

Ancient Documents

In the framework of palaeography, the availability of both effective image analysis algorithms and high-quality digital images has favored the development of new applications for the study of ancient manuscripts and has provided new tools for decision-making support systems. The quality of the results provided by such applications, however, is strongly influenced by the selection of effective features, which should capture the distinctive aspects in which the palaeography expert is interested. This process is very difficult to generalize because of the enormous variability among ancient documents, produced in different historical periods with different languages and styles. As a result, it is very difficult to define standard techniques general enough to be used effectively in every case, which is why ad hoc systems, generally designed according to palaeographers’ suggestions, have been developed for the analysis of ancient manuscripts. In recent years, there has been growing scientific interest in the use of deep learning (DL) techniques for the automatic processing of ancient documents. This interest stems not only from their capability to build high-performance pattern recognition systems, but also from their ability to extract features automatically from raw data, without any a priori knowledge. Building on these considerations, the aim of this study is to verify whether DL-based approaches can actually represent a general methodology for automatically designing machine learning systems for palaeography applications. To this end, we compared the performance of a DL-based approach with that of a “classical” machine learning one, in a case particularly unfavorable for DL, namely that of highly standardized schools.
The rationale for this choice is to compare the obtainable results even when context information is present and discriminating: this information is ignored by DL approaches, while it is used by classical machine learning methods, making the comparison more significant. The experimental results refer to a large set of digital images extracted from an entire 12th-century Bible, the “Avila Bible”. This manuscript, produced by several scribes who worked in different periods and places, represents a severe test bed for evaluating scribe identification systems.

Comparing Statistical and Machine Learning Classifiers: Alternatives for Predictive Modeling in Human Factors Research

Human Factors The Journal of the Human Factors and Ergonomics Society ◽

10.1518/hfes.45.3.408.27248 ◽

2003 ◽

Vol 45 (3) ◽

pp. 408-423 ◽

Cited By ~ 6

Author(s):

Brian Carnahan ◽

Gérard Meyer ◽

Lois-Ann Kuntz

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Discriminant Analysis ◽

Human Factors ◽

Predictive Accuracy ◽

Performance Outcomes ◽

Learning Approaches ◽

Classification Models ◽

Machine Learning Classification ◽

Human Factors Research

Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer human factors professionals a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches, genetic programming and decision tree induction, were used to construct classification models that predict whether a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam results of 37 student truck drivers who had completed a 320-hr driver training course. The results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include models that predict human performance outcomes more accurately.
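
With only 37 drivers, leave-one-out cross-validation is a natural way to compare classifiers. The sketch below compares discriminant analysis, logistic regression, and a shallow decision tree on synthetic curriculum scores generated by a hypothetical threshold rule. Genetic programming is omitted because scikit-learn has no built-in implementation, and the resulting numbers say nothing about the study's actual data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(70, 10, size=(37, 5))  # hypothetical scores on 5 curriculum units
# Synthetic pass/fail rule with a threshold interaction that trees capture well.
y = ((X[:, 0] > 65) & (X[:, 2] > 60)).astype(int)

models = {
    "discriminant analysis": LinearDiscriminantAnalysis(),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Leave-one-out: each of the 37 drivers is predicted by a model trained on the other 36.
results = {name: cross_val_score(model, X, y, cv=LeaveOneOut()).mean()
           for name, model in models.items()}
for name, acc in results.items():
    print(f"{name}: LOOCV accuracy = {acc:.3f}")
```

The threshold rule is deliberately non-linear, which favours tree induction; with a linear ground truth, the two statistical models would be expected to do at least as well.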

Performance Analysis of Microarray Data Classification using Machine Learning Techniques

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2015070104 ◽

2015 ◽

Vol 5 (2) ◽

pp. 43-54

Author(s):

Subhendu Kumar Pani ◽

Bikram Kesari Ratha ◽

Ajay Kumar Mishra

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Microarray Data ◽

Predictive Accuracy ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Data Mining Technique ◽

Single Experiment ◽

Learning Techniques ◽

Microarray Datasets

DNA microarray technology permits the simultaneous monitoring of the expression levels of thousands of genes in a single experiment. Data mining techniques such as classification are used extensively on microarray data for medical diagnosis and gene analysis. However, the high dimensionality of the data affects classification and prediction performance. Consequently, feature selection and dimensionality reduction are key issues for microarray data if better classification and predictive accuracy are to be achieved. Several machine learning approaches are available for feature selection. In this study, the authors use Particle Swarm Optimization (PSO) and a Genetic Algorithm (GA) to evaluate the performance of several popular classifiers on a set of microarray datasets. The experimental results show that feature selection affects classification performance.
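
A wrapper-style genetic algorithm for feature selection can be sketched as follows, assuming each individual is a boolean mask over genes, fitness is the 3-fold cross-validated accuracy of a k-nearest-neighbours classifier on the selected genes, and the usual tournament selection, one-point crossover, and bit-flip mutation operators apply. The data are synthetic and PSO is not shown; `ga_select` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, clf):
    """Cross-validated accuracy on the selected columns (0 if none selected)."""
    if not mask.any():
        return 0.0
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

def tournament(rng, pop, scores):
    """Return the fitter of two randomly chosen individuals."""
    i, j = rng.choice(len(pop), size=2, replace=False)
    return pop[i] if scores[i] >= scores[j] else pop[j]

def ga_select(X, y, clf, pop_size=20, generations=15, p_mut=0.02, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.1    # sparse random initial masks
    for _ in range(generations):
        scores = np.array([fitness(m, X, y, clf) for m in pop])
        children = [pop[scores.argmax()].copy()]  # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = tournament(rng, pop, scores), tournament(rng, pop, scores)
            cut = rng.integers(1, n_feat)         # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_feat) < p_mut   # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(m, X, y, clf) for m in pop])
    best = scores.argmax()
    return pop[best], scores[best]

# Synthetic "many genes, few samples" data: 100 samples, 200 features.
X, y = make_classification(n_samples=100, n_features=200, n_informative=10,
                           n_redundant=0, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5)
mask, score = ga_select(X, y, clf)
baseline = cross_val_score(clf, X, y, cv=3).mean()
print(f"all {X.shape[1]} features: {baseline:.3f}")
print(f"{mask.sum()} GA-selected features: {score:.3f}")
```

Because the GA optimizes the same cross-validated score it reports, the selected subset's accuracy is optimistically biased; a fair comparison would evaluate the final mask on samples held out from the search.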
