Random Forests Highlight the Combined Effect of Environmental Heavy Metals Exposure and Genetic Damages for Cardiovascular Diseases

2021 ◽  
Vol 11 (18) ◽  
pp. 8405
Author(s):  
Alfonso Monaco ◽  
Antonio Lacalamita ◽  
Nicola Amoroso ◽  
Armando D’Orta ◽  
Andrea Del Buono ◽  
...  

Heavy metals are a dangerous source of pollution because of their toxicity, persistence in the environment, and chemical nature. Long-term exposure to heavy metals is known to be related to several chronic degenerative diseases (cardiovascular diseases, neoplasms, neurodegenerative syndromes, etc.). In this work, we propose a machine learning framework to evaluate the severity of cardiovascular diseases (CVD) from human scalp hair analysis (HSHA) tests and genetic analysis, and to identify a small group of these clinical features most strongly associated with CVD risk. Using a private dataset provided by the DD Clinic foundation in Caserta, Italy, we cross-validated the classification performance of a Random Forests model on 90 subjects affected by CVD. The proposed model reached an AUC of 0.78 ± 0.01 on a three-class classification problem. The robustness of the predictions was assessed by comparison with different cross-validation schemes and with two state-of-the-art classifiers, an Artificial Neural Network and a General Linear Model. This is the first work that studies, through a machine learning approach, the tight link between CVD severity, heavy metal concentrations, and SNPs. The selected features appear highly correlated with the CVD phenotype and could represent targets for future CVD therapies.
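The abstract reports a single AUC for a three-class problem but does not say how the per-class results were combined. One common convention is the macro-averaged one-vs-rest AUC; the sketch below illustrates that convention only and is an assumption, not the authors' documented procedure. The function names and the toy probabilities are hypothetical.

```python
from typing import List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_ovr_auc(probabilities: List[List[float]],
                  labels: Sequence[int],
                  n_classes: int) -> float:
    """Unweighted mean of the one-vs-rest AUCs over all classes."""
    aucs = []
    for k in range(n_classes):
        class_scores = [row[k] for row in probabilities]
        class_labels = [1 if y == k else 0 for y in labels]
        aucs.append(binary_auc(class_scores, class_labels))
    return sum(aucs) / n_classes


# Hypothetical predicted probabilities for six subjects, three severity classes.
toy_probabilities = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.3, 0.2],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
]
toy_labels = [0, 0, 1, 1, 2, 2]
toy_auc = macro_ovr_auc(toy_probabilities, toy_labels, 3)
```

In a cross-validation setting, this quantity would be computed on each held-out fold and then summarized as a mean and spread across folds.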

Diagnostics ◽  
2019 ◽  
Vol 9 (4) ◽  
pp. 167
Author(s):  
Erin B. Evangelista ◽  
Sandi A. Kwee ◽  
Miles M. Sato ◽  
Lu Wang ◽  
Christoph Rettenmeier ◽  
...  

Background: Hepatocellular carcinoma (HCC) pathogenesis involves the alteration of multiple liver-specific metabolic pathways. We systematically profiled cancer- and liver-related classes of metabolites in HCC and adjacent liver tissues and applied supervised machine learning to compare their potential yield for HCC biomarkers. Methods: Tumor and corresponding liver tissue samples were profiled as follows: bile acids by ultra-performance liquid chromatography (LC) coupled to tandem mass spectrometry (MS), phospholipids by LC-MS/MS, and other small molecules, including free fatty acids, by gas chromatography-time-of-flight MS. The overall classification performance of metabolomic signatures derived by support vector machine (SVM) and random forests machine learning algorithms was then compared across metabolite classes. Results: For each metabolite class, classification performance reached a plateau with signatures of 10 metabolites. Phospholipid signatures consistently showed the highest discrimination for HCC, followed by signatures derived from small molecules, free fatty acids, and bile acids, with area under the receiver operating characteristic curve (AUC) values of 0.963, 0.934, 0.895, and 0.695, respectively, for SVM-generated signatures composed of 10 metabolites. Similar classification performance patterns were observed with signatures derived by random forests. Conclusion: Membrane phospholipids are a promising source of tissue biomarkers for discriminating between HCC tumor and liver tissue.


2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Ruihan Hu ◽  
Songbin Zhou ◽  
Yisen Liu ◽  
Zhiri Tang

The ensemble pruning system is an effective machine learning framework that combines several learners as experts to classify a test set. Generally, ensemble pruning systems aim to define a region of competence based on the validation set to select the most competent ensembles from the ensemble pool with respect to the test set. However, the size of the ensemble pool is usually fixed, and the performance of an ensemble pool depends heavily on the definition of the region of competence. In this paper, a dynamic pruning framework called margin-based Pareto ensemble pruning is proposed for ensemble pruning systems. The framework explores the optimized ensemble pool size during the overproduction stage and fine-tunes the experts during the pruning stage. The Pareto optimization algorithm is used to explore the size of the overproduction ensemble pool that can result in better performance. Considering the information entropy of the learners in the indecision region, the marginal criterion for each learner in the ensemble pool is calculated using margin criterion pruning, which prunes the experts with respect to the test set. The effectiveness of the proposed method for classification tasks is assessed using datasets. The results show that margin-based Pareto ensemble pruning can achieve smaller ensemble sizes and better classification performance on most datasets when compared with state-of-the-art models.


2020 ◽  
Author(s):  
Cody T Mowery ◽  
Alexander Marson ◽  
Yun S Song ◽  
Chun Jimmie Ye

Mitigating transmission of SARS-CoV-2 has been complicated by the inaccessibility and, in some cases, inadequacy of testing options to detect present or past infection. Immunochromatographic lateral flow assays (LFAs) are a cheap and scalable modality for tracking viral transmission by testing for serological immunity, though systematic evaluations have revealed the low performance of some SARS-CoV-2 LFAs. Here, we re-analyzed existing data to present a proof-of-principle machine learning framework that may be used to inform the pairing of LFAs to achieve superior classification performance while enabling tunable False Positive Rates optimized for the estimated seroprevalence of the population being tested.
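To make the idea of pairing assays concrete, the sketch below shows how reading two tests together changes sensitivity, specificity, and positive predictive value at a given seroprevalence. It is a simplified illustration, not the framework described in the abstract: it assumes the two assays err independently given true serostatus, and the function names and all numeric inputs are hypothetical.

```python
from typing import Tuple


def combine_pair(sens_a: float, spec_a: float,
                 sens_b: float, spec_b: float,
                 rule: str) -> Tuple[float, float]:
    """Sensitivity and specificity of two assays read together.

    rule="and": call positive only if both assays are positive.
    rule="or":  call positive if either assay is positive.
    Assumption: the assays err independently given true serostatus.
    """
    if rule == "and":
        sensitivity = sens_a * sens_b
        specificity = 1.0 - (1.0 - spec_a) * (1.0 - spec_b)
    elif rule == "or":
        sensitivity = 1.0 - (1.0 - sens_a) * (1.0 - sens_b)
        specificity = spec_a * spec_b
    else:
        raise ValueError("rule must be 'and' or 'or'")
    return sensitivity, specificity


def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive call is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical assay characteristics and seroprevalence, for illustration only.
SENS, SPEC, PREVALENCE = 0.90, 0.95, 0.05

ppv_single = positive_predictive_value(SENS, SPEC, PREVALENCE)
pair_sens, pair_spec = combine_pair(SENS, SPEC, SENS, SPEC, rule="and")
ppv_pair = positive_predictive_value(pair_sens, pair_spec, PREVALENCE)
```

Under the "and" rule the false positive rate drops at the cost of sensitivity, which matters most when seroprevalence is low; the "or" rule trades in the opposite direction.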


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Aqsa Rahim ◽  
Yawar Rasheed ◽  
Farooque Azam ◽  
Muhammad Waseem Anwar ◽  
Muhammad Abdul Rahim ◽  
...  

2020 ◽  
Vol 25 (3) ◽  
pp. 3751
Author(s):  
V. A. Nevzorova ◽  
N. G. Plekhova ◽  
L. G. Priseko ◽  
I. N. Chernenko ◽  
D. Yu. Bogdanov ◽  
...  

Aim. To assess the prospects of using artificial intelligence technologies in predicting the outcomes and risks of cardiovascular diseases (CVD) in patients with hypertension (HTN). Material and methods. A software application was created for semi-automatic data mining from respondent profiles; libraries with data preprocessing were analyzed. We analyzed the main and additional parameters (35) of CVD risk factors in 2131 people as part of the ESSE-RF study (2014-2019). The forecasting model was written in Python 2.7, using object-oriented programming and exception handling with multithreading support. Using randomization, learning (n=488) and test (n=245) samples were formed, which included data from patients with an established diagnosis of HTN. Results. The prevalence of HTN among subjects was 34.39%. The following factors were significant for predicting CVD: anthropometric parameters, smoking, and biochemical profile (total cholesterol, ApoA, ApoB, glucose, D-dimer, C-reactive protein). Over a 5-year follow-up, CVD was found in 235 people (32.06%) with HTN and 187 people (13.38%) without HTN; mortality rates were 1.27% in subjects with HTN and 1.12% in those without HTN. The absolute mortality risk among participants with HTN (0.037) was significantly higher (p<0.05) than in those without HTN (0.017). To create a neural network (NN), the basic Sequential model from the Keras library was used. During machine learning, 26 variables important for CVD development were used as input and 9 neurons as output, which corresponded to the number of established cardiovascular events. The created NN had a predictive value of up to 97.9%, which exceeded the SCORE value (34.9%). Conclusion. The data obtained indicate the importance of risk factor phenotyping using anthropometric markers and biochemical profile for determining their significance in the top 20 predictors of CVD. The Python-based machine learning provides CVD prediction according to standard risk assessments.


2019 ◽  
Author(s):  
Oskar Flygare ◽  
Jesper Enander ◽  
Erik Andersson ◽  
Brjánn Ljótsson ◽  
Volen Z Ivanov ◽  
...  

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test if it is possible to reliably predict remission from BDD in a sample of 88 individuals that had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.


2020 ◽  
Author(s):  
Nalika Ulapane ◽  
Karthick Thiyagarajan ◽  
Sarath Kodagoda

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.


2018 ◽  
Vol 24 (24) ◽  
pp. 2876-2882 ◽  
Author(s):  
Kailash Prasad

Cardiovascular diseases (CVD) may be mediated through increases in cardiovascular risk factors. Hemoglobin A1c (HbA1c), also called glycated hemoglobin, is presently used for the diagnosis and management of diabetes. It has adverse effects on the cardiovascular system. This review deals with its synthesis and its effects on the cardiovascular system. The serum levels of HbA1c have been reported to be affected by various factors, including the lifespan of erythrocytes, factors affecting erythropoiesis, agents interfering with glycation of Hb, destruction of erythrocytes, drugs that shift the formation of Hb, statins, and drugs interfering with the HbA1c assay. Levels of HbA1c are positively correlated with serum glucose and advanced glycation end products (AGE), but there is no correlation between AGE and serum glucose. AGE cannot replace HbA1c for the diagnosis and management of diabetes because there is no correlation of AGE with serum glucose, and because the half-life of the protein with which glucose combines is only 14-20 days, compared with erythrocytes, which have a half-life of 90-120 days. HbA1c is positively associated with CVD such as carotid and coronary artery atherosclerosis, ischemic heart disease, ischemic stroke, and hypertension. HbA1c induces dyslipidemia, hyperhomocysteinemia, and hypertension, and increases C-reactive protein, oxidative stress, and blood viscosity, which would contribute to the development of cardiovascular diseases. In conclusion, HbA1c serves as a useful marker for the diagnosis and management of diabetes. AGE cannot replace HbA1c in the diagnosis and management of diabetes. There is an association of HbA1c with CVD, which may be mediated through modulation of CVD risk factors.


Author(s):  
Farrikh Alzami ◽  
Erika Devi Udayanti ◽  
Dwi Puji Prabowo ◽  
Rama Aria Megantara

Sentiment analysis in the form of polarity classification is important in everyday life: polarity tells readers whether a given document carries positive or negative sentiment, which helps them choose and make decisions. Sentiment analysis is usually done manually, so an automatic sentiment classification process is needed. However, studies that discuss which feature extraction methods and learning models suit unstructured sentiment analysis in the Amazon food review case are rare. This research explores several feature extraction methods, namely Bag of Words, TF-IDF, Word2Vec, and a combination of TF-IDF and Word2Vec, with several machine learning models, namely Random Forest, SVM, KNN, and Naïve Bayes, to find a combination of feature extraction and learning model that can add variety to polarity sentiment analysis. With document preprocessing (handling of HTML tags, punctuation, and special characters) and Snowball stemming, TF-IDF features combined with SVM proved suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance of 87.3 percent.
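As a reminder of what the TF-IDF features named above measure, the following minimal sketch computes them for a toy corpus. It uses the plain textbook weighting (relative term frequency times log of inverse document frequency) with whitespace tokenization; library implementations typically use smoothed or normalized variants, and nothing here is claimed to match the study's pipeline. The function name and the toy reviews are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical miniature review corpus, for illustration only.
toy_reviews = [
    "great taste great price",
    "bad taste",
    "great coffee",
]
toy_weights = tf_idf(toy_reviews)
```

Terms that occur in fewer documents receive a larger weight per occurrence, so in the first toy review "price" outweighs "taste" even though both appear once. The resulting vectors are what a classifier such as an SVM would take as input.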

