Comprehensive and empirical evaluation of machine learning algorithms for LC retention time prediction

2018 ◽  
Author(s):  
Robbin Bouwmeester ◽  
Lennart Martens ◽  
Sven Degroeve

Abstract Liquid chromatography is a core component of almost all mass spectrometric analyses of (bio)molecules. Because of the high-throughput nature of mass spectrometric analyses, the interpretation of these chromatographic data increasingly relies on informatics solutions that attempt to predict an analyte’s retention time. The key components of such predictive algorithms are the features they are supplied with, and the actual machine learning algorithm used to fit the model parameters. We therefore evaluate the performance of seven machine learning algorithms on 36 distinct metabolomics data sets, using two distinct feature sets. Interestingly, the results show that no single learning algorithm performs optimally for all data sets, with different algorithm types achieving top performance for different types of analytes or different protocols. Our results can thus be used to find an optimal retention time prediction algorithm for specific analytes or protocols. Importantly, however, our results also show that blending different types of models together decreases the error on outliers, indicating that the combination of several approaches holds substantial promise for the development of more generic, high-performing algorithms.
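The blending idea above can be sketched minimally. The per-model retention-time predictions below are invented numbers, not values from the study, and simple averaging stands in for whatever blending scheme the authors actually used:

```python
# Hypothetical sketch of blending retention-time (RT) predictors by simple
# averaging. All numbers are illustrative; they are not from the study.
def blend(predictions):
    """Average per-analyte predictions across several models."""
    return [sum(p) / len(p) for p in zip(*predictions)]

true_rt = [5.2, 7.8, 12.1]   # "true" RTs in minutes (made up)
model_a = [5.0, 8.1, 14.0]   # e.g. a tree model with one large outlier error
model_b = [5.5, 7.6, 11.5]   # e.g. a linear model
model_c = [5.1, 7.9, 12.6]   # e.g. a neural model

blended = blend([model_a, model_b, model_c])
worst_single = max(abs(t - p) for t, p in zip(true_rt, model_a))
worst_blend = max(abs(t - p) for t, p in zip(true_rt, blended))
print(worst_single, worst_blend)  # blending shrinks the worst-case error here
```

Even this crude average illustrates the abstract's point: the blend's largest per-analyte error is smaller than the single model's outlier error.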

Author(s):  
Lakshmi Prayaga ◽  
Krishna Devulapalli ◽  
Chandra Prayaga

Wearable devices are contributing heavily towards the proliferation of data and creating a rich minefield for data analytics. Recent trends in the design of wearable devices include several embedded sensors which also provide useful data for many applications. This research presents results obtained from studying human-activity-related data collected from wearable devices. The activities considered for this study were working at the computer, standing and walking, standing, walking, walking up and down the stairs, and talking while walking. The research entails the use of a portion of the data to train machine learning algorithms and build a model. The rest of the data is used as test data for predicting the activity of an individual. Details of data collection, processing, and presentation are also discussed. After studying the literature and the data sets, a Random Forest machine learning algorithm was determined to be the best applicable algorithm for analyzing data from wearable devices. The software used in this research includes the R statistical package and the SensorLog app.
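As a rough illustration of the preprocessing such sensor studies typically perform before training a classifier (the study itself used R and the SensorLog app; this Python sketch and its readings are hypothetical), a raw accelerometer stream is commonly reduced to per-window summary features:

```python
# Hypothetical sketch: reduce a raw 1-D accelerometer signal to
# (mean, standard deviation) features over fixed, non-overlapping windows.
# The readings below are made-up values, not data from the study.
import statistics

def window_features(samples, size):
    """Return (mean, stdev) for each full window of `size` samples."""
    feats = []
    for i in range(0, len(samples) - size + 1, size):
        w = samples[i:i + size]
        feats.append((statistics.fmean(w), statistics.stdev(w)))
    return feats

accel_x = [0.1, 0.2, 0.1, 0.9, 1.1, 1.0]   # illustrative axis readings
features = window_features(accel_x, 3)
print(features)
```

Feature vectors of this kind, one per window, are what a classifier such as a Random Forest would then be trained on.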


Author(s):  
Sotiris Kotsiantis ◽  
Dimitris Kanellopoulos ◽  
Panayotis Pintelas

In classification learning, the learning scheme is presented with a set of classified examples from which it is expected to learn a way of classifying unseen examples (see Table 1). Formally, the problem can be stated as follows: given training data {(x1, y1), …, (xn, yn)}, produce a classifier h: X → Y that maps an object x ∈ X to its classification label y ∈ Y. A large number of classification techniques have been developed based on artificial intelligence (logic-based techniques, perceptron-based techniques) and statistics (Bayesian networks, instance-based techniques). No single learning algorithm can uniformly outperform other algorithms over all data sets. The concept of combining classifiers is proposed as a new direction for the improvement of the performance of individual machine learning algorithms. Numerous methods have been suggested for the creation of ensembles of classifiers (Dietterich, 2000). Although, or perhaps because, many methods of ensemble creation have been proposed, there is as yet no clear picture of which method is best.
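One of the simplest classifier-combination schemes in the ensemble literature the chapter surveys is majority voting over independently built classifiers. A minimal sketch, where three placeholder threshold rules over a single numeric feature stand in for real trained models:

```python
# Minimal sketch of combining classifiers by majority vote.
# The three "classifiers" are hypothetical threshold rules, for illustration.
from collections import Counter

def clf_a(x): return "pos" if x > 0.4 else "neg"
def clf_b(x): return "pos" if x > 0.6 else "neg"
def clf_c(x): return "pos" if x > 0.5 else "neg"

def ensemble(x, classifiers):
    """Return the label predicted by the most classifiers for input x."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

print(ensemble(0.55, [clf_a, clf_b, clf_c]))  # two of three vote "pos"
```

The appeal is that the ensemble can be right even when some individual members are wrong, which is the intuition behind the combination methods the text goes on to compare.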


Author(s):  
Shahadat Uddin ◽  
Arif Khan ◽  
Md Ekramul Hossain ◽  
Mohammad Ali Moni

Abstract Background Supervised machine learning algorithms have been a dominant method in the data mining field. Disease prediction using health data has recently shown a potential application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction. Methods In this study, extensive research efforts were made to identify those studies that applied more than one supervised machine learning algorithm to single disease prediction. Two databases (i.e., Scopus and PubMed) were searched with different types of search items. In total, we selected 48 articles for the comparison among variants of supervised machine learning algorithms for disease prediction. Results We found that the Support Vector Machine (SVM) algorithm is applied most frequently (in 29 studies), followed by the Naïve Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed comparatively superior accuracy. Of the 17 studies where it was applied, RF showed the highest accuracy in 9 of them, i.e., 53%. This was followed by SVM, which topped in 41% of the studies in which it was considered. Conclusion This study provides a wide overview of the relative performance of different variants of supervised machine learning algorithms for disease prediction. This information on relative performance can be used to aid researchers in the selection of an appropriate supervised machine learning algorithm for their studies.
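The "which algorithm tops the most studies" comparison the review performs can be sketched as a simple tally. The per-study accuracies below are invented for illustration; they are not the review's data:

```python
# Hypothetical sketch of tallying, across studies, how often each algorithm
# achieves the best accuracy among the algorithms that study applied.
# All accuracy values are made up for illustration.
studies = [
    {"SVM": 0.91, "RF": 0.94, "NB": 0.88},
    {"SVM": 0.89, "RF": 0.86},
    {"SVM": 0.93, "RF": 0.95, "NB": 0.90},
]

wins = {}      # times an algorithm had the top accuracy in a study
applied = {}   # times an algorithm was applied at all
for accs in studies:
    best = max(accs, key=accs.get)
    wins[best] = wins.get(best, 0) + 1
    for alg in accs:
        applied[alg] = applied.get(alg, 0) + 1

# fraction of studies (among those applying it) that each algorithm topped
top_rate = {alg: wins.get(alg, 0) / applied[alg] for alg in applied}
print(top_rate)
```

This is the shape of the statistic behind statements like "RF showed the highest accuracy in 9 of the 17 studies where it was applied, i.e., 53%."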


Author(s):  
John Yearwood ◽  
Adil Bagirov ◽  
Andrei V. Kelarev

The applications of machine learning algorithms to the analysis of data sets of DNA sequences are very important. The present chapter is devoted to the experimental investigation of applications of several machine learning algorithms for the analysis of a JLA data set consisting of DNA sequences derived from non-coding segments in the junction of the large single copy region and inverted repeat A of the chloroplast genome in Eucalyptus collected by Australian biologists. Data sets of this sort represent a new situation, where sophisticated alignment scores have to be used as a measure of similarity. The alignment scores do not satisfy properties of the Minkowski metric, and new machine learning approaches have to be investigated. The authors’ experiments show that machine learning algorithms based on local alignment scores achieve very good agreement with known biological classes for this data set. A new machine learning algorithm based on graph partitioning performed best for clustering of the JLA data set. Our novel k-committees algorithm produced most accurate results for classification. Two new examples of synthetic data sets demonstrate that the authors’ k-committees algorithm can outperform both the Nearest Neighbour and k-medoids algorithms simultaneously.


Author(s):  
Sanjay Kumar Singh ◽  
Anjali Goyal

Cervical cancer is the second most prevalent cancer in women all over the world, and the Pap smear is one of the most popular techniques used to diagnose cervical cancer at an early stage. Developing countries like India face the challenge of handling more cases day by day. In this article, various online and offline machine learning algorithms have been applied to benchmark data sets to detect cervical cancer. This article also addresses the problem of segmentation with hybrid techniques and optimizes the number of features using extra tree classifiers. Accuracy, precision score, recall score, and F1 score increase with the proportion of data used for training, reaching up to 100% for some algorithms. An algorithm like logistic regression with L1 regularization has an accuracy of 100%, but it is very costly in terms of CPU time in comparison to some of the algorithms which obtain 99% accuracy with less CPU time. The key finding in this article is the selection of the best machine learning algorithm with the highest accuracy. Cost effectiveness in terms of CPU time is also analysed.
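The four metrics the article reports are standard functions of the binary confusion matrix. A minimal sketch (the labels in the example call are illustrative, not drawn from the cervical-cancer data sets):

```python
# Minimal sketch of accuracy, precision, recall, and F1 for binary labels
# (1 = positive, 0 = negative). The example labels are made up.
def metrics(y_true, y_pred):
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

print(metrics([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))
```

Reporting all four together matters in medical screening because accuracy alone can look high while recall (the fraction of true cases detected) stays low.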


2020 ◽  
Author(s):  
Eman Alanazi ◽  
Alaa Abdou ◽  
Jake Luo

Stroke, a cerebrovascular disease, is one of the major causes of death. It also causes a health burden for both patients and healthcare systems. One important risk factor for stroke is health behavior, which is an increasing focus of prevention. In addition, chronic diseases such as hypertension, diabetes, cardiac diseases, and asthma are potential risk factors for stroke. Many machine learning models have been built using predictors such as lifestyle factors or radiology imaging. However, there are no models built using lab tests. The aim of this study is to fill this gap by building prediction models that predict stroke from lab tests. We utilized the National Health and Nutrition Examination Survey (NHANES) data sets to develop models that predict stroke from patient lab tests. We found that accurate and sensitive machine learning models can be created to predict stroke from lab tests. The results showed that prediction with the best tested algorithm, random forest, could reach the highest accuracy (ACC = 0.96) when all the attributes were used. The proposed model can be integrated with electronic health records to provide real-time prediction of stroke from lab tests. Due to the data, we could not predict whether the type of stroke was hemorrhagic or ischemic. In future studies, we aim to use data that distinguish the types of stroke and explore the data to build a prediction model for each type.


Metabolites ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 237 ◽  
Author(s):  
Bradley C. Naylor ◽  
J. Leon Catrow ◽  
J. Alan Maschek ◽  
James E. Cox

The use of retention time is often critical for the identification of compounds in metabolomic and lipidomic studies. Standards are frequently unavailable for the retention time measurement of many metabolites, thus the ability to predict retention time for these compounds is highly valuable. A number of studies have applied machine learning to predict retention times, but applying a published machine learning model to different lab conditions is difficult. This is due to variation between chromatographic equipment, methods, and columns used for analysis. Recreating a machine learning model is likewise difficult without a dedicated bioinformatician. Herein we present QSRR Automator, a software package to automate retention time prediction model creation and demonstrate its utility by testing data from multiple chromatography columns from previous publications and in-house work. Analysis of these data sets shows similar accuracy to published models, demonstrating the software’s utility in metabolomic and lipidomic studies.


2020 ◽  
pp. 1-11
Author(s):  
Jie Liu ◽  
Lin Lin ◽  
Xiufang Liang

The online English teaching system has certain requirements for the intelligent scoring system, and the most difficult stage of intelligent scoring in the English test is scoring English compositions with an intelligent model. To improve the intelligence of English composition scoring, this study builds on machine learning algorithms, combines them with intelligent image recognition technology, and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, to verify whether the proposed algorithm model meets the requirements, that is, to verify the feasibility of the algorithm, the performance of the proposed model is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as constraints. The research results show that the proposed algorithm has a practical effect and can be applied to English assessment systems and online homework evaluation systems.


2021 ◽  
Author(s):  
Yingxian Liu ◽  
Cunliang Chen ◽  
Hanqing Zhao ◽  
Yu Wang ◽  
Xiaodong Han

Abstract Fluid properties are key factors in predicting single-well productivity, well test interpretation, and oilfield recovery prediction, and they directly affect the success of ODP program design. The most accurate and direct acquisition method is underground sampling. However, not every well has samples, due to technical reasons such as excessive well deviation or high cost during the exploration stage. Therefore, analogies or empirical formulas often have to be adopted to carry out research. But a large number of oilfield developments have shown that the errors caused by these methods are very large. Therefore, how to quickly and accurately obtain fluid physical properties is of great significance. In recent years, with the development and improvement of artificial intelligence and machine learning algorithms, their applications in oilfields have become more and more extensive. This paper proposes a method for predicting crude oil physical properties based on machine learning algorithms. The method uses PVT data from nearly 100 wells in the Bohai Oilfield; 75% of the data is used for training and learning to obtain the prediction model, and the remaining 25% is used for testing. Practice shows that the prediction results of the machine learning algorithm are very close to the actual data, with very small error. Finally, the method was applied to the preliminary plan design of the BZ29 oilfield, a new oilfield; in particular, fluid physical properties were predicted for unsampled sand bodies. The influence of the analogy method on the scheme is also compared, which provides potential and risk analysis for scheme design. This method will be applied to more oil fields in the Bohai Sea in the future and has important promotion value.
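The 75/25 train/test partition described above is a standard hold-out split. A minimal sketch, where the (well index, target) pairs are placeholders rather than real PVT measurements:

```python
# Hypothetical sketch of the 75/25 hold-out split described in the paper.
# The dataset is a dummy stand-in; real rows would be PVT feature/target pairs.
import random

def split(data, train_frac=0.75, seed=42):
    """Shuffle rows reproducibly and split them into train and test sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

wells = [(i, i * 2.0) for i in range(100)]   # ~100 wells, dummy pairs
train, test = split(wells)
print(len(train), len(test))  # 75 25
```

Holding out wells that the model never sees during training is what lets the reported test error stand in for performance on genuinely unsampled sand bodies.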

