Combination of Plasma-Based Metabolomics and Machine Learning Algorithm Provides a Novel Diagnostic Strategy for Malignant Mesothelioma

Na Li; Chenxi Yang; Sicheng Zhou; Siyu Song; Yuyao Jin; Ding Wang; Junping Liu; Yun Gao; Haining Yang; Weimin Mao; Zhongjian Chen

doi:10.3390/diagnostics11071281

Combination of Plasma-Based Metabolomics and Machine Learning Algorithm Provides a Novel Diagnostic Strategy for Malignant Mesothelioma

Diagnostics ◽

10.3390/diagnostics11071281 ◽

2021 ◽

Vol 11 (7) ◽

pp. 1281

Author(s):

Na Li ◽

Chenxi Yang ◽

Sicheng Zhou ◽

Siyu Song ◽

Yuyao Jin ◽

...

Keyword(s):

Machine Learning ◽

Malignant Mesothelioma ◽

Molecular Mechanisms ◽

Taurocholic Acid ◽

Machine Learning Algorithms ◽

Plasma Metabolites ◽

Large Sample Size ◽

Diagnostic Strategy ◽

Test Set ◽

Tauroursodeoxycholic Acid

Background: Malignant mesothelioma (MM) is an aggressive and incurable carcinoma that is primarily caused by asbestos exposure. However, current diagnostic tools for MM remain underdeveloped. Therefore, the aim of this study is to explore the diagnostic significance of a strategy that combines plasma-based metabolomics with machine learning algorithms for MM. Methods: Plasma samples collected from 25 MM patients and 32 healthy controls (HCs) were randomly divided into a training set and a test set and then analyzed by liquid chromatography-mass spectrometry-based metabolomics. Differential metabolites were screened from the training set samples. Subsequently, metabolite-based diagnostic models, including receiver operating characteristic (ROC) curves and a Random Forest (RF) model, were established, and their prediction accuracies were calculated for the test set samples. Results: Twenty differential plasma metabolites were annotated in the training set; 10 of these metabolites were validated in the test set. The seven most prevalent diagnostic metabolites were taurocholic acid, uracil, biliverdin, histidine, tauroursodeoxycholic acid, pyrroline hydroxycarboxylic acid, and phenylalanine, with reported prediction accuracies of 0.7142 (uracil), 0.7142 (biliverdin), 0.8571 (histidine), 0.5000 (tauroursodeoxycholic acid), 0.8571 (pyrroline hydroxycarboxylic acid), and 0.7857 (phenylalanine). Furthermore, RF based on 20 annotated metabolites showed a prediction accuracy of 0.9286, and its optimized version achieved 1.0000 in the test set. Moreover, the comparison between the samples of peritoneal MM (n = 8) and pleural MM (n = 17) illustrated a significant increase in levels of taurocholic acid and tauroursodeoxycholic acid, as well as an evident decrease in biliverdin. Conclusions: Our results revealed the potential diagnostic value of plasma-based metabolomics combined with machine learning for MM. Further research with a large sample size is worth conducting. Moreover, our data demonstrated dysregulated metabolic pathways in MM, which aids understanding of the molecular mechanisms related to the initiation and development of MM.
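
The prediction accuracy reported in this abstract is the fraction of test set samples that a model classifies correctly. A minimal Python sketch of that calculation follows. The label vectors are hypothetical, and the 14-sample test set size is an assumption chosen only because 13 correct out of 14 rounds to the reported 0.9286; the abstract does not state the test set size.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical test set (not the study's data): 1 = MM, 0 = healthy control.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # one MM sample missed

test_accuracy = accuracy(y_true, y_pred)
print(round(test_accuracy, 4))
```

With so few test samples, each misclassification shifts the accuracy by about 0.07 under this assumption, which is one reason the abstract calls for a larger sample.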


Evaluation and Identification of the Neuroprotective Compounds of Xiaoxuming Decoction by Machine Learning: A Novel Mode to Explore the Combination Rules in Traditional Chinese Medicine Prescription

BioMed Research International ◽

10.1155/2019/6847685 ◽

2019 ◽

Vol 2019 ◽

pp. 1-14

Author(s):

Shilun Yang ◽

Yanjia Shen ◽

Wendan Lu ◽

Yinglin Yang ◽

Haigang Wang ◽

...

Keyword(s):

Machine Learning ◽

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Cross Validation ◽

Bayesian Models ◽

Machine Learning Algorithms ◽

Therapeutic Effects ◽

Test Set ◽

Screening Experiments ◽

Fold Cross Validation

Xiaoxuming decoction (XXMD), a classic traditional Chinese medicine (TCM) prescription, has been used in clinical practice to treat stroke for over 1200 years. However, the pharmacological mechanisms of XXMD have not yet been elucidated. The purpose of this study was to develop neuroprotective models for identifying neuroprotective compounds in XXMD against hypoxia-induced and H2O2-induced brain cell damage. In this study, a phenotype-based classification method was designed by machine learning to identify neuroprotective compounds and to clarify the compatibility of XXMD components. Four different single classifiers (AB, kNN, CT, and RF) and molecular fingerprint descriptors were used to construct stacked naïve Bayesian models. Among them, the RF algorithm had a better performance with an average MCC value of 0.725±0.014 and 0.774±0.042 from 5-fold cross-validation and test set, respectively. The probability values calculated by four models were then integrated into a stacked Bayesian model. In total, two optimal models, s-NB-1-LPFP6 and s-NB-2-LPFP6, were obtained. The two validated optimal models revealed Matthews correlation coefficients (MCC) of 0.968 and 0.993 for 5-fold cross-validation and of 0.874 and 0.959 for the test set, respectively. Furthermore, the two models were used for virtual screening experiments to identify neuroprotective compounds in XXMD. Ten representative compounds with potential therapeutic effects against the two phenotypes were selected for further cell-based assays. Among the selected compounds, two compounds significantly inhibited H2O2-induced and Na2S2O4-induced neurotoxicity simultaneously. Together, our findings suggested that machine learning algorithms such as combined Bayesian models can feasibly predict neuroprotective compounds and preliminarily demonstrate the pharmacological mechanisms of TCM.
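
The Matthews correlation coefficient (MCC) used in this abstract summarizes a binary confusion matrix in a single value between -1 and 1. The sketch below computes it from the standard formula; the counts are hypothetical and are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention when a row or column of the table is empty.
        return 0.0
    return numerator / denominator


# Hypothetical confusion matrix: 90 of 100 compounds classified correctly.
mcc = matthews_corrcoef(tp=45, tn=45, fp=5, fn=5)
print(mcc)
```

Unlike plain accuracy, MCC stays near zero for a classifier that predicts only the majority class, which makes it a reasonable choice when active and inactive compounds are unevenly represented.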


IGRNet: A Deep Learning Model for Non-Invasive, Real-Time Diagnosis of Prediabetes through Electrocardiograms

Sensors ◽

10.3390/s20092556 ◽

2020 ◽

Vol 20 (9) ◽

pp. 2556

Author(s):

Liyang Wang ◽

Yao Mu ◽

Jing Zhao ◽

Xiaoya Wang ◽

Huilian Che

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Real Time ◽

Clinical Symptoms ◽

Characteristic Curve ◽

Learning Model ◽

Machine Learning Algorithms ◽

Test Set ◽

Non Invasive ◽

Deep Learning Model

The clinical symptoms of prediabetes are mild and easy to overlook, but prediabetes may develop into diabetes if early intervention is not performed. In this study, a deep learning model, referred to as IGRNet, is developed to effectively detect and diagnose prediabetes in a non-invasive, real-time manner using a 12-lead electrocardiogram (ECG) lasting 5 s. After searching for an appropriate activation function, we compared two mainstream deep neural networks (AlexNet and GoogLeNet) and three traditional machine learning algorithms to verify the superiority of our method. On the independent test set that included the mixed group, the diagnostic accuracy of IGRNet is 0.781, and the area under the receiver operating characteristic curve (AUC) is 0.777. Furthermore, the accuracy and AUC are 0.856 and 0.825, respectively, in the normal-weight-range test set. The experimental results indicate that IGRNet diagnoses prediabetes with high accuracy using ECGs, outperforming other existing machine learning methods; this suggests its potential for application in clinical practice as a non-invasive prediabetes diagnosis technology.
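
The AUC quoted in this abstract can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the scores are hypothetical, and this brute-force form is an illustration rather than the evaluation code used in the study.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model outputs for prediabetic (positive) and normal (negative) ECGs.
auc = roc_auc(pos_scores=[0.9, 0.8, 0.4], neg_scores=[0.7, 0.3, 0.2])
print(auc)
```

Here eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the sample count; rank-based implementations give the same value more efficiently.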


Reverse-engineering human olfactory perception from chemical features of odor molecules

10.1101/082495 ◽

2016 ◽

Cited By ~ 2

Author(s):

Andreas Keller ◽

Richard C. Gerkin ◽

Yuanfang Guan ◽

Amit Dhurandhar ◽

Gabor Turu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Molecular Mechanisms ◽

Linear Models ◽

Predictive Accuracy ◽

High Accuracy ◽

Machine Learning Algorithms ◽

Olfactory Perception ◽

Theoretical Limit ◽

Reverse Engineer

Abstract: Despite 25 years of progress in understanding the molecular mechanisms of olfaction, it is still not possible to predict whether a given molecule will have a perceived odor, or what olfactory percept it will produce. To address this stimulus-percept problem for olfaction, we organized the crowd-sourced DREAM Olfaction Prediction Challenge. Working from a large olfactory psychophysical dataset, teams developed machine learning algorithms to predict sensory attributes of molecules based on their chemoinformatic features. The resulting models predicted odor intensity and pleasantness with high accuracy, and also successfully predicted eight semantic descriptors (“garlic”, “fish”, “sweet”, “fruit”, “burnt”, “spices”, “flower”, “sour”). Regularized linear models performed nearly as well as random-forest-based approaches, with a predictive accuracy that closely approaches a key theoretical limit. The models presented here make it possible to predict the perceptual qualities of virtually any molecule with an impressive degree of accuracy to reverse-engineer the smell of a molecule. One Sentence Summary: Results of a crowdsourcing competition show that it is possible to accurately predict and reverse-engineer the smell of a molecule.


Early Prediction of Malignant Mesothelioma: An Approach towards Non-invasive Method

Current Bioinformatics ◽

10.2174/1574893616666210616121023 ◽

2021 ◽

Vol 16 ◽

Author(s):

Shakir Shabbir ◽

M. Shahzad Asif ◽

Talha Mahboob Alam ◽

Zeeshan Ramzan

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Malignant Mesothelioma ◽

Information Gain ◽

Imbalanced Data ◽

Machine Learning Algorithms ◽

Diagnostic Methods ◽

Machine Learning Techniques ◽

Diagnostic Model ◽

Diagnostic Measures

Background: Malignant Mesothelioma (MM) is a rare but aggressive tumor that arises in the lungs. Costly imaging and laboratory resources, i.e., X-ray imaging, magnetic resonance imaging (MRI), positron emission tomography (PET) scans, biopsies, and blood tests, are commonly used for the diagnosis of MM. These diagnostic measures are expensive and unavailable in remote areas, and some of them, including biopsy and cytology of pleural fluid, are also very painful for the patient. Objective: In this study, we proposed a diagnostic model for early identification of MM via machine learning techniques. We explored the health records of 324 Turkish patients who showed symptoms related to MM. The patient data included socio-economic, geographical, and clinical features. Methods: Different feature selection methods have been employed for the selection of significant features. To overcome the data imbalance problem, various data-level resampling techniques have been utilized to obtain efficient results. The gradient boosted decision tree (GBDT) method has been used to develop the diagnostic model. The performance of the GBDT model is also compared with traditional machine learning algorithms. Results and Conclusion: Our model outperformed other models on both balanced and imbalanced data. The results clearly show that undersampling techniques outperformed imbalanced data without resampling based on accuracy and receiver operating characteristic (ROC) value. In turn, oversampling techniques outperformed both undersampling and imbalanced data based on accuracy and ROC. All classifiers employed in this study achieved efficient results with three of the feature selection methods (OneR, information gain, and Relief-F), but the results of the other two methods (gain ratio and correlation) were not entirely promising. Finally, when the combination of Synthetic Minority Oversampling Technique (SMOTE) and OneR was applied with GBDT, it gave the most favorable results based on accuracy, F-measure, and ROC. The diagnosis model has also been deployed to assist doctors, patients, medical practitioners, and other healthcare professionals for early diagnosis and better treatment of MM.


Lightweight Modeling Attack-Resistant Multiplexer-Based Multi-PUF (MMPUF) Design on FPGA

Electronics ◽

10.3390/electronics9050815 ◽

2020 ◽

Vol 9 (5) ◽

pp. 815 ◽

Cited By ~ 1

Author(s):

Yijun Cui ◽

Chongyan Gu ◽

Qingqing Ma ◽

Yue Fang ◽

Chenghua Wang ◽

...

Keyword(s):

Machine Learning ◽

Mathematical Model ◽

Sample Size ◽

High Resistance ◽

Hardware Implementation ◽

Machine Learning Algorithms ◽

Experimental Results ◽

Large Sample Size ◽

Prediction Rate ◽

Arbiter Puf

Physical unclonable function (PUF) is a primary hardware security primitive that is suitable for lightweight applications. However, it is found to be vulnerable to modeling attacks using machine learning algorithms. In this paper, multiplexer (MUX)-based Multi-PUF (MMPUF) design is proposed to thwart modeling attacks. The proposed design uses a weak PUF to obfuscate the challenge of a strong PUF. A mathematical model of the proposed design is presented and analyzed. The three most widely used modeling attack techniques are used to evaluate the resistance of the proposed design. Experimental results show that the proposed MMPUF design is more resistant to the machine learning attack than the previously proposed XOR-based Multi-PUF (XMPUF) design. For a large sample size, the prediction rate of the proposed MMPUF is less than the conventional Arbiter PUF (APUF). Compared with existing attack-resistant PUF designs, the proposed MMPUF design demonstrates high resistance. To verify the proposed design, a hardware implementation on Xilinx 7 Series FPGAs is presented. The hardware experimental results show that the proposed MMPUF designs present good results of uniqueness and reliability.


Machine Learning Algorithm Identifies Patients at High Risk for Early Complications After Intracranial Tumor Surgery: Registry-Based Cohort Study

Neurosurgery ◽

10.1093/neuros/nyz145 ◽

2019 ◽

Vol 85 (4) ◽

pp. E756-E764 ◽

Cited By ~ 7

Author(s):

Christiaan H B van Niftrik ◽

Frank van der Wouden ◽

Victor E Staartjes ◽

Jorn Fierstra ◽

Martin N Stienen ◽

...

Keyword(s):

Machine Learning ◽

Cohort Study ◽

Statistical Methods ◽

Learning Algorithms ◽

Intracranial Tumor ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Tumor Surgery ◽

Test Set ◽

Gradient Boosting Machine

Abstract INTRODUCTION Reliable preoperative identification of patients at high risk for early postoperative complications occurring within 24 h (EPC) of intracranial tumor surgery can improve patient safety and postoperative management. Statistical analysis using machine learning algorithms may generate models that predict EPC better than conventional statistical methods. OBJECTIVE To train such a model and to assess its predictive ability. METHODS This cohort study included patients from an ongoing prospective patient registry at a single tertiary care center with an intracranial tumor that underwent elective neurosurgery between June 2015 and May 2017. EPC were categorized based on the Clavien-Dindo classification score. Conventional statistical methods and different machine learning algorithms were used to predict EPC using preoperatively available patient, clinical, and surgery-related variables. The performance of each model was derived from examining classification performance metrics on an out-of-sample test dataset. RESULTS EPC occurred in 174 (26%) of 668 patients included in the analysis. Gradient boosting machine learning algorithms provided the model best predicting the probability of an EPC. The model scored an accuracy of 0.70 (confidence interval [CI] 0.59-0.79) with an area under the curve (AUC) of 0.73 and a sensitivity and specificity of 0.80 (CI 0.58-0.91) and 0.67 (CI 0.53-0.77) on the test set. The conventional statistical model showed inferior predictive power (test set: accuracy: 0.59 (CI 0.47-0.71); AUC: 0.64; sensitivity: 0.76 (CI 0.64-0.85); specificity: 0.53 (CI 0.41-0.64)). CONCLUSION Using gradient boosting machine learning algorithms, it was possible to create a prediction model superior to conventional statistical methods. While conventional statistical methods favor patients’ characteristics, we found the pathology and surgery-related (histology, anatomical localization, surgical access) variables to be better predictors of EPC.
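
The sensitivity and specificity reported in this abstract are the per-class recall values of a binary classifier. The sketch below shows the arithmetic; the counts are hypothetical, chosen only to land near the reported 0.80 and 0.67, and are not the study's confusion matrix.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives that were cleared."""
    return tn / (tn + fp)


# Hypothetical test-set counts for patients with and without an EPC.
sens = sensitivity(tp=8, fn=2)
spec = specificity(tn=6, fp=3)
print(round(sens, 2), round(spec, 2))
```

A model tuned for high sensitivity, as here, accepts more false alarms in exchange for missing fewer patients who go on to have a complication.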


Test Set Optimization by Machine Learning Algorithms

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9377792 ◽

2020 ◽

Author(s):

Kaiming Fu ◽

Yulu Jin ◽

Zhousheng Chen

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Set Optimization ◽

Test Set ◽

Test Set Optimization


Prediction of Progression to Severe Stroke in Initially Diagnosed Anterior Circulation Ischemic Cerebral Infarction

Frontiers in Neurology ◽

10.3389/fneur.2021.652757 ◽

2021 ◽

Vol 12 ◽

Author(s):

Lai Wei ◽

Yidi Cao ◽

Kangwei Zhang ◽

Yun Xu ◽

Xiang Zhou ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Characteristic Curve ◽

Machine Learning Algorithms ◽

Support Vector ◽

Manual Segmentation ◽

Anterior Circulation ◽

Test Set ◽

Severe Stroke ◽

Diffusion Weighted Images

Purpose: Accurate prediction of the progression to severe stroke in initially diagnosed nonsevere patients with acute–subacute anterior circulation nonlacuna ischemic infarction (ASACNLII) is important for clinical decision-making. This study aimed to apply a machine learning method to predict whether initially diagnosed nonsevere patients with ASACNLII would progress to severe stroke by using diffusion-weighted images and clinical information on admission. Methods: This retrospective study enrolled 344 patients with ASACNLII from June 2017 to August 2020 on admission, and 108 cases progressed to severe stroke during hospitalization within 3–21 days. The entire data were randomized into a training set (n = 271) and an independent test set (n = 73). A U-Net neural network was employed for automatic segmentation and volume measurement of the ischemic lesions. Predictive models were developed and used for evaluating the progression to severe stroke using different feature sets (the volume data, the clinical data, and the combination) and machine learning methods (random forest, support vector machine, and logistic regression). Results: The U-Net showed high correlation with manual segmentation in terms of Dice coefficient of 0.806 and R2 value of the volume measurements of 0.960 in the test set. The random forest classifier of the volume + clinical combination achieved the best area under the receiver operating characteristic curve of 0.8358 (95% CI 0.7321–0.9269), and the accuracy, sensitivity, and specificity were 0.7780 (0.7397–0.7945), 0.7695 (0.6102–0.9074), and 0.8686 (0.6923–1.0), respectively. The Shapley additive explanation diagram showed the volume variable as the most important predictor. Conclusion: The U-Net was fully automatic and showed a high correlation with manual segmentation. An integrated approach combining clinical variables and stroke lesion volumes that were derived from the advanced machine learning algorithms had high accuracy in predicting the progression to severe stroke in ASACNLII patients.
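
The Dice coefficient used in this abstract to compare automatic and manual segmentation measures the overlap of two binary masks as 2|A∩B| / (|A| + |B|). The sketch below applies it to flattened masks; the mask values are hypothetical, and treating two empty masks as a perfect match is an assumed convention.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two equal-length binary masks (1 = lesion, 0 = background)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * intersection / total


# Hypothetical flattened masks: automatic versus manual segmentation.
automatic = [1, 1, 1, 0, 0]
manual = [1, 1, 0, 1, 0]

dice = dice_coefficient(automatic, manual)
print(round(dice, 3))
```

Both masks mark three voxels and agree on two of them, so the score is 4/6. Real lesion masks are three-dimensional arrays, but flattening them does not change the result.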


Visual light perceptions caused by medical linear accelerator: Findings of machine-learning algorithms in a prospective questionnaire-based case–control study

PLoS ONE ◽

10.1371/journal.pone.0247597 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0247597

Author(s):

Chao-Yang Kuo ◽

Cheng-Chun Lee ◽

Yuh-Lin Lee ◽

Shueh-Chun Liou ◽

Jia-Cheng Lee ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Area Under The Curve ◽

Radiation Energy ◽

Continuous Variable ◽

Machine Learning Algorithms ◽

Test Set ◽

Fraction Dose ◽

Positive Effect ◽

Visual Light

This study aimed to investigate the possible incidence of visual light perceptions (VLPs) during radiation therapy (RT). We analyzed whether VLPs could be affected by differences in the radiation energy, prescription doses, age, sex, or RT locations, and whether all VLPs were caused by radiation. From November 2016 to August 2018, a total of 101 patients who underwent head-and-neck or brain RT were screened. After receiving RT, questionnaires were completed, and the subjects were interviewed. Random forests (RF), a tree-based machine learning algorithm, and logistic regression (LR) analyses were compared by the area under the curve (AUC), and the algorithm that achieved the highest AUC was selected. The dataset sample was based on treatment with non-human units, and a total of 293 treatment fields from 78 patients were analyzed. VLPs were detected only in 122 of the 293 exposure portals (40.16%). The dataset was randomly divided into 80% and 20% as the training set and test set, respectively. In the test set, RF achieved an AUC of 0.888, whereas LR achieved an AUC of 0.773. In this study, the retina fraction dose was the most important continuous variable and had a positive effect on VLP. Age was the most important categorical variable. In conclusion, the visual light perception phenomenon by the human body during RT is induced by radiation rather than being a self-suggested hallucination or induced by phosphenes.


Predicting suicide attempt or suicide death following a visit to psychiatric specialty care: A machine learning study using Swedish national registry data

PLoS Medicine ◽

10.1371/journal.pmed.1003416 ◽

2020 ◽

Vol 17 (11) ◽

pp. e1003416 ◽

Cited By ~ 1

Author(s):

Qi Chen ◽

Yanli Zhang-James ◽

Eric J. Barnett ◽

Paul Lichtenstein ◽

Jussi Jokinen ◽

...

Keyword(s):

Machine Learning ◽

Suicide Attempt ◽

Suicidal Behavior ◽

Learning Algorithms ◽

National Registry ◽

Specialty Care ◽

Machine Learning Algorithms ◽

Registry Data ◽

Test Set ◽

National Registry Data

Background: Suicide is a major public health concern globally. Accurately predicting suicidal behavior remains challenging. This study aimed to use machine learning approaches to examine the potential of the Swedish national registry data for prediction of suicidal behavior. Methods and findings: The study sample consisted of 541,300 inpatient and outpatient visits by 126,205 Sweden-born patients (54% female and 46% male) aged 18 to 39 (mean age at the visit: 27.3) years to psychiatric specialty care in Sweden between January 1, 2011 and December 31, 2012. The most common psychiatric diagnoses at the visit were anxiety disorders (20.0%), major depressive disorder (16.9%), and substance use disorders (13.6%). A total of 425 candidate predictors covering demographic characteristics, socioeconomic status (SES), electronic medical records, criminality, as well as family history of disease and crime were extracted from the Swedish registry data. The sample was randomly split into an 80% training set containing 433,024 visits and a 20% test set containing 108,276 visits. Models were trained separately for suicide attempt/death within 90 and 30 days following a visit using multiple machine learning algorithms. Model discrimination and calibration were both evaluated. Among all eligible visits, 3.5% (18,682) were followed by a suicide attempt/death within 90 days and 1.7% (9,099) within 30 days. The final models were based on ensemble learning that combined predictions from elastic net penalized logistic regression, random forest, gradient boosting, and a neural network. The area under the receiver operating characteristic (ROC) curves (AUCs) on the test set were 0.88 (95% confidence interval [CI] = 0.87–0.89) and 0.89 (95% CI = 0.88–0.90) for the outcome within 90 days and 30 days, respectively, both being significantly better than chance (i.e., AUC = 0.50) (p < 0.01). Sensitivity, specificity, and predictive values were reported at different risk thresholds. A limitation of our study is that our models have not yet been externally validated, and thus, the generalizability of the models to other populations remains unknown. Conclusions: By combining the ensemble method of multiple machine learning algorithms and high-quality data solely from the Swedish registers, we developed prognostic models to predict short-term suicide attempt/death with good discrimination and calibration. Whether novel predictors can improve predictive performance requires further investigation.
