Prediction Power on Cardiovascular Disease of Neuroimmune Guidance Cues Expression by Peripheral Blood Monocytes Determined by Machine-Learning Methods

Huayu Zhang; Edwin O. W. Bredewold; Dianne Vreeken; Jacques. M. G. J. Duijs; Hetty C. de Boer; Adriaan O. Kraaijeveld; J. Wouter Jukema; Nico H. Pijls; Johannes Waltenberger; Erik A.L. Biessen; Eric P. van der Veer; Anton Jan van Zonneveld; Janine M. van Gils

doi:10.3390/ijms21176364

Prediction Power on Cardiovascular Disease of Neuroimmune Guidance Cues Expression by Peripheral Blood Monocytes Determined by Machine-Learning Methods

International Journal of Molecular Sciences ◽

10.3390/ijms21176364 ◽

2020 ◽

Vol 21 (17) ◽

pp. 6364

Author(s):

Huayu Zhang ◽

Edwin O. W. Bredewold ◽

Dianne Vreeken ◽

Jacques. M. G. J. Duijs ◽

Hetty C. de Boer ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Disease Outcome ◽

Study Cohort ◽

Learning Methods ◽

Predictive Values ◽

Linear Discriminant ◽

Guidance Cues ◽

Machine Learning Methods ◽

Circulating Cells

Atherosclerosis is the underlying pathology in a major part of cardiovascular disease, the leading cause of mortality in developed countries. The infiltration of monocytes into the vessel walls of large arteries is a key denominator of atherogenesis, making monocytes accountable for the development of atherosclerosis. With the development of high-throughput transcriptome profiling platforms and cytometric methods for circulating cells, it is now feasible to study in-depth the predicted functional change of circulating monocytes reflected by changes of gene expression in certain pathways and correlate the changes to disease outcome. Neuroimmune guidance cues comprise a group of circulating- and cell membrane-associated signaling proteins that are progressively involved in monocyte functions. Here, we employed the CIRCULATING CELLS study cohort to classify cardiovascular disease patients and healthy individuals in relation to their expression of neuroimmune guidance cues in circulating monocytes. To cope with the complexity of human datasets featured by noisy data, nonlinearity and multidimensionality, we assessed various machine-learning methods. Of these, the linear discriminant analysis, Naïve Bayesian model and stochastic gradient boost model yielded perfect or near-perfect sensibility and specificity and revealed that expression levels of the neuroimmune guidance cues SEMA6B, SEMA6D and EPHA2 in circulating monocytes were of predictive values for cardiovascular disease outcome.

Download Full-text

Comparison of machine learning methods for the classification of cardiovascular disease

Informatics in Medicine Unlocked ◽

10.1016/j.imu.2021.100606 ◽

2021 ◽

pp. 100606

Author(s):

Rachael Hagan ◽

Charles J. Gillan ◽

Fiona Mallett

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Learning Methods ◽

Machine Learning Methods

Download Full-text

Accurate Prediction of Coronary Heart Disease for Patients With Hypertension From Electronic Health Records With Big Data and Machine-Learning Methods: Model Development and Performance Evaluation (Preprint)

10.2196/preprints.17257 ◽

2019 ◽

Author(s):

Zhenzhen Du ◽

Yujie Yang ◽

Jing Zheng ◽

Qi Li ◽

Denan Lin ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Coronary Heart Disease ◽

Big Data ◽

Heart Disease ◽

Large Population ◽

High Accuracy ◽

Learning Methods ◽

Health Records ◽

Machine Learning Methods

BACKGROUND Predictions of cardiovascular disease risks based on health records have long attracted broad research interests. Despite extensive efforts, the prediction accuracy has remained unsatisfactory. This raises the question as to whether the data insufficiency, statistical and machine-learning methods, or intrinsic noise have hindered the performance of previous approaches, and how these issues can be alleviated. OBJECTIVE Based on a large population of patients with hypertension in Shenzhen, China, we aimed to establish a high-precision coronary heart disease (CHD) prediction model through big data and machine-learning METHODS Data from a large cohort of 42,676 patients with hypertension, including 20,156 patients with CHD onset, were investigated from electronic health records (EHRs) 1-3 years prior to CHD onset (for CHD-positive cases) or during a disease-free follow-up period of more than 3 years (for CHD-negative cases). The population was divided evenly into independent training and test datasets. Various machine-learning methods were adopted on the training set to achieve high-accuracy prediction models and the results were compared with traditional statistical methods and well-known risk scales. Comparison analyses were performed to investigate the effects of training sample size, factor sets, and modeling approaches on the prediction performance. RESULTS An ensemble method, XGBoost, achieved high accuracy in predicting 3-year CHD onset for the independent test dataset with an area under the receiver operating characteristic curve (AUC) value of 0.943. Comparison analysis showed that nonlinear models (K-nearest neighbor AUC 0.908, random forest AUC 0.938) outperform linear models (logistic regression AUC 0.865) on the same datasets, and machine-learning methods significantly surpassed traditional risk scales or fixed models (eg, Framingham cardiovascular disease risk models). Further analyses revealed that using time-dependent features obtained from multiple records, including both statistical variables and changing-trend variables, helped to improve the performance compared to using only static features. Subpopulation analysis showed that the impact of feature design had a more significant effect on model accuracy than the population size. Marginal effect analysis showed that both traditional and EHR factors exhibited highly nonlinear characteristics with respect to the risk scores. CONCLUSIONS We demonstrated that accurate risk prediction of CHD from EHRs is possible given a sufficiently large population of training data. Sophisticated machine-learning methods played an important role in tackling the heterogeneity and nonlinear nature of disease prediction. Moreover, accumulated EHR data over multiple time points provided additional features that were valuable for risk prediction. Our study highlights the importance of accumulating big data from EHRs for accurate disease predictions.

Download Full-text

Predictions of chalcospinels with composition ABCX4 (X – S or Se)

Perspektivnye Materialy ◽

10.30791/1028-978x-2020-7-5-18 ◽

2020 ◽

pp. 5-18

Author(s):

N. N. Kiselyova ◽

◽

V. A. Dudarev ◽

V. V. Ryazanov ◽

O. V. Sen’ko ◽

...

Keyword(s):

Machine Learning ◽

Crystal Lattice ◽

Prediction Accuracy ◽

Cross Validation ◽

Chemical Elements ◽

Optical Memory ◽

Support Vector ◽

Learning Methods ◽

Linear Discriminant ◽

Machine Learning Methods

New chalcospinels of the most common compositions were predicted: AIBIIICIVX4 (X — S or Se) and AIIBIIICIIIS4 (A, B, and C are various chemical elements). They are promising for the search for new materials for magneto-optical memory elements, sensors and anodes in sodium-ion batteries. The parameter “a” values of their crystal lattice are estimated. When predicting only the values of chemical elements properties were used. The calculations were carried out using machine learning programs that are part of the information-analytical system developed by the authors (various ensembles of algorithms of: the binary decision trees, the linear machine, the search for logical regularities of classes, the support vector machine, Fisher linear discriminant, the k-nearest neighbors, the learning a multilayer perceptron and a neural network), — for predicting chalcospinels not yet obtained, as well as an extensive family of regression methods, presented in the scikit-learn package for the Python language, and multilevel machine learning methods that were proposed by the authors — for estimation of the new chalcospinels lattice parameter value). The prediction accuracy of new chalcospinels according to the results of the cross-validation is not lower than 80%, and the prediction accuracy of the parameter of their crystal lattice (according to the results of calculating the mean absolute error (when cross-validation in the leave-one-out mode)) is ± 0.1 Å. The effectiveness of using multilevel machine learning methods to predict the physical properties of substances was shown.

Download Full-text

Harvesting Brain Signal using Machine Learning Methods

Journal of Engineering and Science in Medical Diagnostics and Therapy ◽

10.1115/1.4053064 ◽

2021 ◽

Author(s):

Kevin Matsuno ◽

Vidya Nandikolla

Keyword(s):

Machine Learning ◽

Error Rate ◽

Learning Algorithm ◽

Relevance Vector Machine ◽

Event Related Potential ◽

Machine Learning Algorithm ◽

Learning Methods ◽

Linear Discriminant ◽

Machine Learning Methods ◽

Physical Task

Abstract Brain computer interface (BCI) systems are developed in biomedical fields to increase the quality of life. The development of a six class BCI controller to operate a semi-autonomous robotic arm is presented. The controller uses the following mental tasks: imagined left/right hand squeeze, imagined left/right foot tap, rest, one physical task, and jaw clench. To design a controller, the locations of active electrodes are verified and an appropriate machine learning algorithm is determined. Three subjects, ages ranging between 22-27, participated in five sessions of motor imagery experiments to record their brainwaves. These recordings were analyzed using event related potential plots and topographical maps to determine active electrodes. BCILAB was used to train two, three, five, and six class BCI controllers using linear discriminant analysis (LDA) and relevance vector machine (RVM) machine learning methods. The subjects' data was used to compare the two-method's performance in terms of error rate percentage. While a two class BCI controller showed the same accuracy for both methods, the three and five class BCI controllers showed the RVM approach having a higher accuracy than the LDA approach. For the five-class controller, error rate percentage was 33.3% for LDA and 29.2% for RVM. The six class BCI controller error rate percentage for both LDA and RVM was 34.5%. While the percentage values are the same, RVM was chosen as the desired machine learning algorithm based on the trend seen in the three and five class controller performances.

Download Full-text

Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods

BMC Medical Research Methodology ◽

10.1186/s12874-021-01441-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Cindy Feng ◽

George Kephart ◽

Elizabeth Juarez-Colunga

Keyword(s):

Machine Learning ◽

Mortality Risk ◽

Predictive Accuracy ◽

Classification Tree ◽

Superior Performance ◽

Gradient Boosting ◽

Learning Methods ◽

Linear Discriminant ◽

Machine Learning Methods ◽

Extreme Gradient Boosting

Abstract Background Coronavirus disease (COVID-19) presents an unprecedented threat to global health worldwide. Accurately predicting the mortality risk among the infected individuals is crucial for prioritizing medical care and mitigating the healthcare system’s burden. The present study aimed to assess the predictive accuracy of machine learning methods to predict the COVID-19 mortality risk. Methods We compared the performance of classification tree, random forest (RF), extreme gradient boosting (XGBoost), logistic regression, generalized additive model (GAM) and linear discriminant analysis (LDA) to predict the mortality risk among 49,216 COVID-19 positive cases in Toronto, Canada, reported from March 1 to December 10, 2020. We used repeated split-sample validation and k-steps-ahead forecasting validation. Predictive models were estimated using training samples, and predictive accuracy of the methods for the testing samples was assessed using the area under the receiver operating characteristic curve, Brier’s score, calibration intercept and calibration slope. Results We found XGBoost is highly discriminative, with an AUC of 0.9669 and has superior performance over conventional tree-based methods, i.e., classification tree or RF methods for predicting COVID-19 mortality risk. Regression-based methods (logistic, GAM and LASSO) had comparable performance to the XGBoost with slightly lower AUCs and higher Brier’s scores. Conclusions XGBoost offers superior performance over conventional tree-based methods and minor improvement over regression-based methods for predicting COVID-19 mortality risk in the study population.

Download Full-text

Determination of the Geographical Origin of Coffee Beans Using Terahertz Spectroscopy Combined With Machine Learning Methods

Frontiers in Nutrition ◽

10.3389/fnut.2021.680627 ◽

2021 ◽

Vol 8 ◽

Author(s):

Si Yang ◽

Chenxi Li ◽

Yang Mei ◽

Wen Liu ◽

Rong Liu ◽

...

Keyword(s):

Machine Learning ◽

Terahertz Spectroscopy ◽

Geographic Origin ◽

Principal Component ◽

Support Vector ◽

Thz Spectroscopy ◽

Learning Methods ◽

Linear Discriminant ◽

Machine Learning Methods ◽

Coffee Beans

Different geographical origins can lead to great variance in coffee quality, taste, and commercial value. Hence, controlling the authenticity of the origin of coffee beans is of great importance for producers and consumers worldwide. In this study, terahertz (THz) spectroscopy, combined with machine learning methods, was investigated as a fast and non-destructive method to classify the geographic origin of coffee beans, comparing it with the popular machine learning methods, including convolutional neural network (CNN), linear discriminant analysis (LDA), and support vector machine (SVM) to obtain the best model. The curse of dimensionality will cause some classification methods which are struggling to train effective models. Thus, principal component analysis (PCA) and genetic algorithm (GA) were applied for LDA and SVM to create a smaller set of features. The first nine principal components (PCs) with an accumulative contribution rate of 99.9% extracted by PCA and 21 variables selected by GA were the inputs of LDA and SVM models. The results demonstrate that the excellent classification (accuracy was 90% in a prediction set) could be achieved using a CNN method. The results also indicate variable selecting as an important step to create an accurate and robust discrimination model. The performances of LDA and SVM algorithms could be improved with spectral features extracted by PCA and GA. The GA-SVM has achieved 75% accuracy in a prediction set, while the SVM and PCA-SVM have achieved 50 and 65% accuracy, respectively. These results demonstrate that THz spectroscopy, together with machine learning methods, is an effective and satisfactory approach for classifying geographical origins of coffee beans, suggesting the techniques to tap the potential application of deep learning in the authenticity of agricultural products while expanding the application of THz spectroscopy.

Download Full-text

Accurate Prediction of Coronary Heart Disease for Patients With Hypertension From Electronic Health Records With Big Data and Machine-Learning Methods: Model Development and Performance Evaluation

JMIR Medical Informatics ◽

10.2196/17257 ◽

2020 ◽

Vol 8 (7) ◽

pp. e17257

Author(s):

Zhenzhen Du ◽

Yujie Yang ◽

Jing Zheng ◽

Qi Li ◽

Denan Lin ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Coronary Heart Disease ◽

Big Data ◽

Heart Disease ◽

Large Population ◽

High Accuracy ◽

Learning Methods ◽

Health Records ◽

Machine Learning Methods

Background Predictions of cardiovascular disease risks based on health records have long attracted broad research interests. Despite extensive efforts, the prediction accuracy has remained unsatisfactory. This raises the question as to whether the data insufficiency, statistical and machine-learning methods, or intrinsic noise have hindered the performance of previous approaches, and how these issues can be alleviated. Objective Based on a large population of patients with hypertension in Shenzhen, China, we aimed to establish a high-precision coronary heart disease (CHD) prediction model through big data and machine-learning Methods Data from a large cohort of 42,676 patients with hypertension, including 20,156 patients with CHD onset, were investigated from electronic health records (EHRs) 1-3 years prior to CHD onset (for CHD-positive cases) or during a disease-free follow-up period of more than 3 years (for CHD-negative cases). The population was divided evenly into independent training and test datasets. Various machine-learning methods were adopted on the training set to achieve high-accuracy prediction models and the results were compared with traditional statistical methods and well-known risk scales. Comparison analyses were performed to investigate the effects of training sample size, factor sets, and modeling approaches on the prediction performance. Results An ensemble method, XGBoost, achieved high accuracy in predicting 3-year CHD onset for the independent test dataset with an area under the receiver operating characteristic curve (AUC) value of 0.943. Comparison analysis showed that nonlinear models (K-nearest neighbor AUC 0.908, random forest AUC 0.938) outperform linear models (logistic regression AUC 0.865) on the same datasets, and machine-learning methods significantly surpassed traditional risk scales or fixed models (eg, Framingham cardiovascular disease risk models). Further analyses revealed that using time-dependent features obtained from multiple records, including both statistical variables and changing-trend variables, helped to improve the performance compared to using only static features. Subpopulation analysis showed that the impact of feature design had a more significant effect on model accuracy than the population size. Marginal effect analysis showed that both traditional and EHR factors exhibited highly nonlinear characteristics with respect to the risk scores. Conclusions We demonstrated that accurate risk prediction of CHD from EHRs is possible given a sufficiently large population of training data. Sophisticated machine-learning methods played an important role in tackling the heterogeneity and nonlinear nature of disease prediction. Moreover, accumulated EHR data over multiple time points provided additional features that were valuable for risk prediction. Our study highlights the importance of accumulating big data from EHRs for accurate disease predictions.

Download Full-text

Phase Angle determinants in patients with cardiovascular disease using machine learning methods

Health and Technology ◽

10.1007/s12553-021-00622-x ◽

2021 ◽

Author(s):

Seyed Amir Tabatabaei Hosseini ◽

Fariborz Rahimi ◽

Mahdad Esmaeili ◽

Mohammad Khalili

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Phase Angle ◽

Learning Methods ◽

Machine Learning Methods

Download Full-text

The Tomatoes and Chilies Type Classifications by Using Machine Learning Methods

Journal of Development Research ◽

10.28926/jdr.v4i1.93 ◽

2020 ◽

Vol 4 (1) ◽

pp. 1-6

Author(s):

Irzal Ahmad Sabilla ◽

Chastine Fatichah

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Support Vector ◽

Staple Food ◽

K Nearest Neighbor ◽

Learning Methods ◽

Linear Discriminant ◽

Machine Learning Methods

Vegetables are ingredients for flavoring, such as tomatoes and chilies. A Both of these ingredients are processed to accompany the people's staple food in the form of sauce and seasoning. In supermarkets, these vegetables can be found easily, but many people do not understand how to choose the type and quality of chilies and tomatoes. This study discusses the classification of types of cayenne, curly, green, red chilies, and tomatoes with good and bad conditions using machine learning and contrast enhancement techniques. The machine learning methods used are Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Linear Discriminant Analysis (LDA), and Random Forest (RF). The results of testing the best method are measured based on the value of accuracy. In addition to the accuracy of this study, it also measures the speed of computation so that the methods used are efficient.

Download Full-text

Construction of environmental risk score beyond standard linear models using machine learning methods: application to metal mixtures, oxidative stress and cardiovascular disease in NHANES

Environmental Health ◽

10.1186/s12940-017-0310-9 ◽

2017 ◽

Vol 16 (1) ◽

Cited By ~ 24

Author(s):

Sung Kyun Park ◽

Zhangchen Zhao ◽

Bhramar Mukherjee

Keyword(s):

Oxidative Stress ◽

Machine Learning ◽

Cardiovascular Disease ◽

Risk Score ◽

Environmental Risk ◽

Linear Models ◽

Learning Methods ◽

Metal Mixtures ◽

Machine Learning Methods ◽

Standard Linear

Download Full-text