Random Forests Highlight the Combined Effect of Environmental Heavy Metals Exposure and Genetic Damages for Cardiovascular Diseases

Alfonso Monaco; Antonio Lacalamita; Nicola Amoroso; Armando D’Orta; Andrea Del Buono; Francesco di Tuoro; Sabina Tangaro; Aldo Innocente Galeandro; Roberto Bellotti

doi:10.3390/app11188405


Applied Sciences ◽

10.3390/app11188405 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8405

Author(s):

Alfonso Monaco ◽

Antonio Lacalamita ◽

Nicola Amoroso ◽

Armando D’Orta ◽

Andrea Del Buono ◽

...

Keyword(s):

Machine Learning ◽

Heavy Metals ◽

Cardiovascular Diseases ◽

Random Forests ◽

Classification Problem ◽

Classification Performance ◽

Cvd Risk ◽

Learning Framework ◽

Clinic Foundation ◽

Highly Correlated

Heavy metals are a dangerous source of pollution due to their toxicity, permanence in the environment and chemical nature. It is well known that long-term exposure to heavy metals is related to several chronic degenerative diseases (cardiovascular diseases, neoplasms, neurodegenerative syndromes, etc.). In this work, we propose a machine learning framework to evaluate the severity of cardiovascular diseases (CVD) from human scalp hair analysis (HSHA) tests and genetic analysis, and to identify a small group of these clinical features most strongly associated with CVD risk. Using a private dataset provided by the DD Clinic foundation in Caserta, Italy, we cross-validated the classification performance of a Random Forests model with 90 subjects affected by CVD. The proposed model reached an AUC of 0.78 ± 0.01 on a three-class classification problem. The robustness of the predictions was assessed by comparison with different cross-validation schemes and two state-of-the-art classifiers, an Artificial Neural Network and a General Linear Model. This is the first work that studies, through a machine learning approach, the tight link between CVD severity, heavy metal concentrations and SNPs. The selected features appear highly correlated with the CVD phenotype, and they could represent targets for future CVD therapies.
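The abstract reports a single AUC for a three-class problem but does not say how the per-class values were combined. One common convention is an unweighted (macro) average of one-vs-rest AUCs. The sketch below illustrates that convention only; the function names and the toy probabilities are hypothetical, and it is not the authors' pipeline.

```python
from typing import Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: Sequence[Sequence[float]], y_true: Sequence[int]) -> float:
    """Unweighted mean of the one-vs-rest AUC over all classes (assumed convention)."""
    n_classes = len(probs[0])
    aucs = []
    for k in range(n_classes):
        class_scores = [row[k] for row in probs]          # score for class k
        class_labels = [1 if y == k else 0 for y in y_true]  # class k vs the rest
        aucs.append(binary_auc(class_scores, class_labels))
    return sum(aucs) / n_classes


# Hypothetical predicted probabilities for six subjects and three severity classes.
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.3, 0.2],  # a class-1 subject that the model scores as class 0
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
]
toy_labels = [0, 0, 1, 1, 2, 2]
toy_auc = macro_ovr_auc(toy_probs, toy_labels)
```

In a cross-validated setting, a value such as `toy_auc` would be computed on each held-out fold and then summarized as a mean with its spread.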


Phospholipids are A Potentially Important Source of Tissue Biomarkers for Hepatocellular Carcinoma: Results of a Pilot Study Involving Targeted Metabolomics

Diagnostics ◽

10.3390/diagnostics9040167 ◽

2019 ◽

Vol 9 (4) ◽

pp. 167

Author(s):

Erin B. Evangelista ◽

Sandi A. Kwee ◽

Miles M. Sato ◽

Lu Wang ◽

Christoph Rettenmeier ◽

...

Keyword(s):

Machine Learning ◽

Hepatocellular Carcinoma ◽

Fatty Acids ◽

Free Fatty Acids ◽

Small Molecules ◽

Bile Acids ◽

Random Forests ◽

Liver Tissue ◽

Classification Performance ◽

Tissue Biomarkers

Background: Hepatocellular carcinoma (HCC) pathogenesis involves the alteration of multiple liver-specific metabolic pathways. We systematically profiled cancer- and liver-related classes of metabolites in HCC and adjacent liver tissues and applied supervised machine learning to compare their potential yield for HCC biomarkers. Methods: Tumor and corresponding liver tissue samples were profiled as follows: bile acids by ultra-performance liquid chromatography (LC) coupled to tandem mass spectrometry (MS), phospholipids by LC-MS/MS, and other small molecules including free fatty acids by gas chromatography time-of-flight MS. The overall classification performance of metabolomic signatures derived by support vector machine (SVM) and random forests machine learning algorithms was then compared across classes of metabolite. Results: For each metabolite class, there was a plateau in classification performance with signatures of 10 metabolites. Phospholipid signatures consistently showed the highest discrimination for HCC, followed by signatures derived from small molecules, free fatty acids, and bile acids, with area under the receiver operating characteristic curve (AUC) values of 0.963, 0.934, 0.895, and 0.695, respectively, for SVM-generated signatures composed of 10 metabolites. Similar classification performance patterns were observed with signatures derived by random forests. Conclusion: Membrane phospholipids are a promising source of tissue biomarkers for discriminating between HCC tumor and liver tissue.


Margin-Based Pareto Ensemble Pruning: An Ensemble Pruning Algorithm That Learns to Search Optimized Ensembles

Computational Intelligence and Neuroscience ◽

10.1155/2019/7560872 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12 ◽

Cited By ~ 2

Author(s):

Ruihan Hu ◽

Songbin Zhou ◽

Yisen Liu ◽

Zhiri Tang

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Classification Performance ◽

Test Set ◽

Pruning Algorithm ◽

Ensemble Pruning ◽

Learning Framework ◽

Classification Tasks ◽

Validation Set ◽

Definition Of

The ensemble pruning system is an effective machine learning framework that combines several learners as experts to classify a test set. Generally, ensemble pruning systems aim to define a region of competence based on the validation set to select the most competent ensembles from the ensemble pool with respect to the test set. However, the size of the ensemble pool is usually fixed, and the performance of an ensemble pool heavily depends on the definition of the region of competence. In this paper, a dynamic pruning framework called margin-based Pareto ensemble pruning is proposed for ensemble pruning systems. The framework explores the optimized ensemble pool size during the overproduction stage and fine-tunes the experts during the pruning stage. The Pareto optimization algorithm is used to explore the size of the overproduction ensemble pool that can result in better performance. Considering the information entropy of the learners in the indecision region, the marginal criterion for each learner in the ensemble pool is calculated using margin criterion pruning, which prunes the experts with respect to the test set. The effectiveness of the proposed method for classification tasks is assessed using datasets. The results show that margin-based Pareto ensemble pruning can achieve smaller ensemble sizes and better classification performance on most datasets when compared with state-of-the-art models.


Improved COVID-19 Serology Test Performance by Integrating Multiple Lateral Flow Assays using Machine Learning

10.1101/2020.07.15.20154773 ◽

2020 ◽

Author(s):

Cody T Mowery ◽

Alexander Marson ◽

Yun S Song ◽

Chun Jimmie Ye

Keyword(s):

Machine Learning ◽

Test Performance ◽

Classification Performance ◽

Lateral Flow ◽

Past Infection ◽

Learning Framework ◽

Lateral Flow Assays ◽

Serology Test ◽

Low Performance ◽

Existing Data

Mitigating transmission of SARS-CoV-2 has been complicated by the inaccessibility and, in some cases, inadequacy of testing options to detect present or past infection. Immunochromatographic lateral flow assays (LFAs) are a cheap and scalable modality for tracking viral transmission by testing for serological immunity, though systematic evaluations have revealed the low performance of some SARS-CoV-2 LFAs. Here, we re-analyzed existing data to present a proof-of-principle machine learning framework that may be used to inform the pairing of LFAs to achieve superior classification performance while enabling tunable False Positive Rates optimized for the estimated seroprevalence of the population being tested.
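The abstract does not describe how assays are paired, so the following is only an illustration of why pairing two binary tests can shift the false positive rate. It combines two hypothetical assay readouts with an "both positive" rule and an "either positive" rule and compares sensitivity and false positive rate; all data and names are invented for the example.

```python
from typing import Callable, List, Sequence, Tuple


def rates(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, false_positive_rate) for binary predictions."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    return tp / n_pos, fp / n_neg


def combine(a: Sequence[int], b: Sequence[int],
            rule: Callable[[int, int], int]) -> List[int]:
    """Apply a pairing rule sample by sample to two assay readouts."""
    return [rule(x, y) for x, y in zip(a, b)]


# Hypothetical ground truth (1 = past infection) and two assay readouts.
truth   = [1, 1, 1, 1, 0, 0, 0, 0]
assay_a = [1, 1, 1, 0, 1, 0, 0, 0]
assay_b = [1, 1, 0, 1, 0, 1, 0, 0]

# "Both positive" lowers false positives at the cost of sensitivity;
# "either positive" does the opposite.
both_positive = combine(assay_a, assay_b, lambda x, y: x & y)
either_positive = combine(assay_a, assay_b, lambda x, y: x | y)

single_rates = rates(assay_a, truth)
and_rates = rates(both_positive, truth)
or_rates = rates(either_positive, truth)
```

Which rule is preferable depends on the expected seroprevalence: when few people in the tested population have been infected, a low false positive rate matters more, which is the kind of tuning the abstract refers to.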


An Integrated Machine Learning Framework for Effective Prediction of Cardiovascular Diseases

IEEE Access ◽

10.1109/access.2021.3098688 ◽

2021 ◽

pp. 1-1

Author(s):

Aqsa Rahim ◽

Yawar Rasheed ◽

Farooque Azam ◽

Muhammad Waseem Anwar ◽

Muhammad Abdul Rahim ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Diseases ◽

Learning Framework


Machine learning for predicting the outcomes and risks of cardiovascular diseases in patients with hypertension: results of ESSE-RF in the Primorsky Krai

Russian Journal of Cardiology ◽

10.15829/1560-4071-2020-3-3751 ◽

2020 ◽

Vol 25 (3) ◽

pp. 3751

Author(s):

V. A. Nevzorova ◽

N. G. Plekhova ◽

L. G. Priseko ◽

I. N. Chernenko ◽

D. Yu. Bogdanov ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Diseases ◽

Exception Handling ◽

Object Oriented Programming ◽

Biochemical Profile ◽

Anthropometric Parameters ◽

Cvd Risk ◽

Software Application ◽

Reactive Protein ◽

High Level

Aim. To assess the prospects of using artificial intelligence technologies in predicting the outcomes and risks of cardiovascular diseases (CVD) in patients with hypertension (HTN). Material and methods. A software application was created for data mining from respondent profiles in a semi-automatic mode; libraries with data preprocessing were analyzed. We analyzed the main and additional parameters (35) of CVD risk factors in 2131 people as part of the ESSE-RF study (2014-2019). To create a forecasting model, the high-level language Python 2.7 was used, with object-oriented programming and exception handling with multithreading support. Using randomization, learning (n=488) and test (n=245) samples were formed, which included data from patients with an established diagnosis of HTN. Results. The prevalence of HTN among subjects was 34.39%. The following factors were significant for predicting CVD: anthropometric parameters, smoking, and biochemical profile (total cholesterol, ApoA, ApoB, glucose, D-dimer, C-reactive protein). Over a 5-year follow-up, CVD was found in 235 people (32.06%) with HTN and 187 people (13.38%) without HTN; mortality rates were 1.27% in subjects with HTN and 1.12% in those without HTN. The absolute mortality risk among participants with HTN (0.037) was significantly higher (p<0.05) than in patients without HTN (0.017). To create a neural network (NN), the basic Sequential model from the Keras library was used. During machine learning, 26 variables important for CVD development were used as input and 9 neurons as output, which corresponded to the number of established cardiovascular events. The created NN had a predictive value of up to 97.9%, which exceeded the SCORE value (34.9%). Conclusion. The data obtained indicate the importance of risk factor phenotyping using anthropometric markers and biochemical profile for determining their significance in the top 20 predictors of CVD. The Python-based machine learning provides CVD prediction according to standard risk assessments.


Predictors of remission from body dysmorphic disorder after internet-delivered cognitive behavior therapy: a machine learning approach

10.31234/osf.io/eqcdx ◽

2019 ◽

Author(s):

Oskar Flygare ◽

Jesper Enander ◽

Erik Andersson ◽

Brjánn Ljótsson ◽

Volen Z Ivanov ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forests ◽

Clinical Utility ◽

Body Dysmorphic Disorder ◽

Prediction Models ◽

Behavioral Therapy ◽

Learning Approach ◽

Learning Approaches ◽

Machine Learning Approach

Background: Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. Methods: This study used a random forests machine learning approach to test if it is possible to reliably predict remission from BDD in a sample of 88 individuals who had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. Results: Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. Conclusions: The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. Trial registration: ClinicalTrials.gov ID: NCT02010619.


Binary Spectrum Feature for Improved Classifier Performance

10.36227/techrxiv.12993122 ◽

2020 ◽

Author(s):

Nalika Ulapane ◽

Karthick Thiyagarajan ◽

Sarath Kodagoda

Keyword(s):

Machine Learning ◽

Classification Performance ◽

Feature Reduction ◽

Sensor Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Svm Classifier ◽

Monitoring Task ◽

Classifier Performance ◽

Spectrum Feature

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.


An Introduction to Machine Learning for Panel Data: Decision Trees, Random Forests, and Other Dendrological Methods

SSRN Electronic Journal ◽

10.2139/ssrn.3717879 ◽

2020 ◽

Author(s):

James Ming Chen

Keyword(s):

Machine Learning ◽

Panel Data ◽

Decision Trees ◽

Random Forests


Does HbA1c Play a Role in the Development of Cardiovascular Diseases?

Current Pharmaceutical Design ◽

10.2174/1381612824666180903121957 ◽

2018 ◽

Vol 24 (24) ◽

pp. 2876-2882 ◽

Cited By ~ 4

Author(s):

Kailash Prasad

Keyword(s):

Risk Factors ◽

Cardiovascular Diseases ◽

Cardiovascular System ◽

Half Life ◽

Serum Levels ◽

Serum Glucose ◽

Cvd Risk ◽

Factors Affecting ◽

Diagnosis And Management ◽

Coronary Artery Atherosclerosis

Cardiovascular diseases (CVD) may be mediated through increases in cardiovascular risk factors. Hemoglobin A1c (HbA1c), also called glycated hemoglobin, is presently used for the diagnosis and management of diabetes. It has adverse effects on the cardiovascular system. This review deals with its synthesis and effects on the cardiovascular system. The serum levels of HbA1c have been reported to be affected by various factors, including the lifespan of erythrocytes, factors affecting erythropoiesis, agents interfering with glycation of Hb, destruction of erythrocytes, drugs that shift the formation of Hb, statins, and drugs interfering with the HbA1c assay. Levels of HbA1c are positively correlated with serum glucose and advanced glycation end products (AGE), but there is no correlation between AGE and serum glucose. AGE cannot replace HbA1c for the diagnosis and management of diabetes because there is no correlation of AGE with serum glucose, and because the half-life of the protein with which glucose combines is only 14-20 days, compared with erythrocytes, which have a half-life of 90-120 days. HbA1c is positively associated with CVD such as carotid and coronary artery atherosclerosis, ischemic heart disease, ischemic stroke and hypertension. HbA1c induces dyslipidemia, hyperhomocysteinemia, and hypertension, and increases C-reactive protein, oxidative stress and blood viscosity, which would contribute to the development of cardiovascular diseases. In conclusion, HbA1c serves as a useful marker for the diagnosis and management of diabetes. AGE cannot replace HbA1c in the diagnosis and management of diabetes. There is an association of HbA1c with CVD, which may be mediated through modulation of CVD risk factors.


Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in terms of polarity classification is very important in everyday life: with polarity, many people can find out whether a given document has positive or negative sentiment, which can help in choosing and making decisions. Sentiment analysis is usually done manually. Therefore, an automatic sentiment analysis classification process is needed. However, it is rare to find studies that discuss which feature extraction methods and which learning models are suitable for unstructured sentiment analysis, using the Amazon food review case. This research explores several feature extraction methods, such as Word Bags, TF-IDF, Word2Vector, and a combination of TF-IDF and Word2Vector, with several machine learning models, such as Random Forest, SVM, KNN and Naïve Bayes, to find a combination of feature extraction and learning model that can help add variety to the analysis of polarity sentiments. With document preparation such as handling HTML tags, punctuation and special characters, and using Snowball stemming, TF-IDF combined with SVM is suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance of 87.3 percent.
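For readers unfamiliar with TF-IDF, a minimal sketch follows. It uses one common textbook variant (term frequency normalized by document length, inverse document frequency as the natural log of N over document frequency); libraries often apply smoothing or other normalization, and the abstract does not state which variant the study used. The tokenized reviews are made up for the example.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute TF-IDF weights per document for pre-tokenized text.

    tf  = count of term in document / number of tokens in document
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weighted = []
    for tokens in docs:
        counts = Counter(tokens)
        weighted.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


# Hypothetical, already tokenized and lowercased reviews.
reviews = [
    "great taste great price".split(),
    "bad taste".split(),
    "great coffee".split(),
]
weights = tf_idf(reviews)
```

Terms that appear in few documents, such as "bad" here, receive a higher weight than terms shared across documents, such as "taste". The resulting vectors are what a classifier like an SVM would be trained on.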
