Predicting Depression from Smartphone Behavioral Markers Using Machine Learning Methods, Hyper-parameter Optimization, and Feature Importance Analysis: An Exploratory Study (Preprint)

Mapping Intimacies ◽

10.2196/preprints.26540 ◽

2020 ◽

Author(s):

Kennedy Opoku Asare ◽

Yannik Terhorst ◽

Julio Vega ◽

Ella Peltonen ◽

Eemil Lagerspetz ◽

...

Keyword(s):

Machine Learning ◽

Age Distribution ◽

Area Under The Curve ◽

Imbalanced Data ◽

Positive Association ◽

Assessment Methods ◽

Supervised Machine Learning ◽

Significant Positive Association ◽

Depression Assessment ◽

Importance Analysis

BACKGROUND Depression is a prevalent mental health challenge. Current depression assessment methods using self-reported and clinician-administered questionnaires have limitations. Instrumenting smartphones to passively and continuously collect moment-by-moment datasets that quantify human behaviours has the potential to augment current depression assessment methods for early diagnosis and for scalable, longitudinal monitoring of depression. OBJECTIVE The objective of this study is to investigate the feasibility of predicting depression with human behaviours quantified from smartphone datasets, and to identify behaviours that can influence depression. METHODS Smartphone datasets and self-reported eight-item Patient Health Questionnaire (PHQ-8) depression assessments were collected from 629 participants in an exploratory longitudinal study over an average of 22.1 days (SD=17.90, min=8, max=86). We quantified 22 regularity, entropy, and standard deviation behavioural markers from the smartphone usage data. We explored the linear relationship between the behavioural features and depression using correlation and bivariate linear mixed models (LMM). We used 5 supervised machine learning (ML) algorithms with hyperparameter optimization, nested cross-validation, and imbalanced data handling to predict depression. Finally, with the Permutation Importance method, we identified the behavioural markers most influential in predicting depression. RESULTS Of the 629 participants from at least 56 countries, 10.96% were female, 86.80% male, and 2.22% non-binary. By age group, 11.61% of participants were 18–24 years, 32.43% were 25–34, 24.80% were 35–44, 26.39% were 45–64, and 4.77% were 65 years and over. Of the 1374 PHQ-8 assessments, 83.19% were classified as non-depressed and 16.81% as depressed, based on the PHQ-8 cut-off. A significant positive Pearson’s correlation was found between screen status normalised entropy and depression (r=0.14, P<.001). 
The LMM showed an intraclass correlation of 0.7584 and a significant positive association between screen status normalised entropy and depression (beta=.48, P=.03). The best ML algorithms obtained precision (85.55%–92.50%), recall (92.19%–94.38%), F1 (88.73%–93.41%), area under the receiver operating characteristic curve (AUC; 94.68%–98.83%), Cohen’s kappa (86.61%–92.21%), and accuracy (96.44%–97.97%). Including age group and gender as predictors improved ML performance. Screen and Internet connectivity features were the most influential in predicting depression. CONCLUSIONS Our findings demonstrate that behavioural markers indicative of depression can be unobtrusively identified from smartphone sensors’ data. Traditional assessment of depression can be augmented with behavioural markers from smartphones for depression diagnosis and monitoring.
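
The Permutation Importance method named in the abstract above can be illustrated with a minimal sketch: shuffle one feature column at a time and record how much a fitted model's score drops. The toy data, the `toy_model` stand-in classifier, and the use of accuracy as the score are assumptions made for illustration; none of them come from the study.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other columns stay untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (assumption): feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_model(row):
    """Hypothetical stand-in for a fitted classifier; it only reads feature 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(toy_model, X, y)
print(importances)
```

A feature the model never uses gets an importance of zero, because shuffling it cannot change any prediction. A feature the model relies on shows a clear drop in score.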

Can sonographic features of microcalcification predict thyroid nodule malignancy? a prospective observational study

Egyptian Journal of Radiology and Nuclear Medicine ◽

10.1186/s43055-021-00498-x ◽

2021 ◽

Vol 52 (1) ◽

Author(s):

Mehrdad Nabahati ◽

Rahele Mehraeen ◽

Zoleika Moazezi ◽

Naser Ghaemian

Keyword(s):

Thyroid Nodule ◽

Roc Analysis ◽

Thyroid Nodules ◽

Area Under The Curve ◽

Positive Association ◽

Needle Aspiration ◽

Northern Iran ◽

Significant Positive Association ◽

Irregular Margin ◽

Benign Nodules

Abstract Background The aim of this study was to investigate the diagnostic accuracy of microcalcification, as well as its associated sonographic features, for prediction of thyroid nodule malignancy. We prospectively assessed patients with thyroid nodules who underwent ultrasound-guided fine-needle aspiration during 2017–2020 in Babol, northern Iran. The ultrasonographic characteristics of the nodules, as well as their cytological results, were recorded. We used regression analysis to evaluate the relation between sonographic findings and nodule malignancy. A receiver operating characteristic (ROC) analysis was also used to assess how well the sonographic features predicted malignancy, as measured by the area under the curve (AUC). Results Overall, 1129 thyroid nodules were finally included in the study, of which 452 (40%) had microcalcification. A significant positive association was found between nodule malignancy and microcalcification in both univariate (OR=3.626, 95% CI 2.258–5.822) and multivariable regression analyses (OR=1.878, 95% CI 1.095–3.219). In the nodules with microcalcification, significant positive relations were seen between malignancy and hypoechogenicity (OR=3.833, 95% CI 1.032–14.238), >5 microcalcifications (OR=3.045, 95% CI 1.328–6.982), irregular margin (OR=3.341, 95% CI 1.078–10.352), and lobulated margin (OR=5.727, 95% CI 1.934–16.959). The ROC analysis indicated that the AUCs for hypoechogenicity, >5 microcalcifications, irregular margin, and lobulated margin were 60%, 62%, 55%, and 60%, respectively, in predicting malignant thyroid nodules. Conclusion The findings indicated that microcalcification can be a potential predictor of thyroid nodule malignancy. Also, the presence of irregular or lobulated margins, multiple intranodular microcalcifications (>5 microcalcifications), and/or hypoechogenicity can improve the ability of microcalcification to distinguish malignant from benign nodules.
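
As a rough illustration of the AUC figures reported above, the sketch below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The labels and the binary `feature_present` values are made-up toy data, not values from the study. For a binary predictor this quantity equals the mean of sensitivity and specificity, which is one reason single yes/no features tend to give modest AUCs.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 1 = malignant, 0 = benign; the feature is binary.
malignant = [1, 1, 1, 0, 0, 0, 0, 0]
feature_present = [1, 1, 0, 1, 0, 0, 1, 0]

auc = roc_auc(feature_present, malignant)

# For a binary feature, AUC equals (sensitivity + specificity) / 2.
sensitivity = 2 / 3  # 2 of 3 malignant nodules show the feature
specificity = 3 / 5  # 3 of 5 benign nodules lack the feature
print(auc, (sensitivity + specificity) / 2)
```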

Clinical Score and Machine Learning-Based Model to Predict Diagnosis of Primary Aldosteronism in Arterial Hypertension

Hypertension ◽

10.1161/hypertensionaha.121.17444 ◽

2021 ◽

Vol 78 (5) ◽

pp. 1595-1604

Author(s):

Fabrizio Buffolo ◽

Jacopo Burrello ◽

Alessio Burrello ◽

Daniel Heinrich ◽

Christian Adolf ◽

...

Keyword(s):

Machine Learning ◽

Arterial Hypertension ◽

Primary Aldosteronism ◽

Learning Algorithm ◽

Area Under The Curve ◽

Clinical Score ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Individual Risk ◽

The Individual

Primary aldosteronism (PA) is the cause of arterial hypertension in 4% to 6% of patients, and 30% of patients with PA are affected by unilateral and surgically curable forms. Current guidelines recommend screening for PA ≈50% of patients with hypertension on the basis of individual factors, while some experts suggest screening all patients with hypertension. To define the risk of PA and tailor the diagnostic workup to the individual risk of each patient, we developed a conventional scoring system and supervised machine learning algorithms using a retrospective cohort of 4059 patients with hypertension. On the basis of 6 widely available parameters, we developed a numerical score and 308 machine learning-based models, selecting the one with the highest diagnostic performance. After validation, we obtained high predictive performance with our score (optimized sensitivity of 90.7% for PA and 92.3% for unilateral PA [UPA]). The machine learning-based model provided the highest performance, with an area under the curve of 0.834 for PA and 0.905 for diagnosis of UPA, with optimized sensitivity of 96.6% for PA, and 100.0% for UPA, at validation. The application of the predicting tools allowed the identification of a subgroup of patients with very low risk of PA (0.6% for both models) and null probability of having UPA. In conclusion, this score and the machine learning algorithm can accurately predict the individual pretest probability of PA in patients with hypertension and circumvent screening in up to 32.7% of patients using a machine learning-based model, without omitting patients with surgically curable UPA.

The Comparison and Interpretation of Machine-Learning Models in Post-Stroke Functional Outcome Prediction

Diagnostics ◽

10.3390/diagnostics11101784 ◽

2021 ◽

Vol 11 (10) ◽

pp. 1784

Author(s):

Shih-Chieh Chang ◽

Chan-Lin Chu ◽

Chih-Kuang Chen ◽

Hsiang-Ning Chang ◽

Alice M. K. Wong ◽

...

Keyword(s):

Machine Learning ◽

Area Under The Curve ◽

Superior Performance ◽

Support Vector ◽

Balance Test ◽

Post Stroke ◽

Feature Importance ◽

Value Range ◽

Importance Analysis ◽

Partial Dependence

Prediction of post-stroke functional outcomes is crucial for allocating medical resources. In this study, a total of 577 patients were enrolled in the Post-Acute Care-Cerebrovascular Disease (PAC-CVD) program, and 77 predictors were collected at admission. The outcome was whether a patient could achieve a Barthel Index (BI) score of >60 upon discharge. Eight machine-learning (ML) methods were applied, and their results were integrated by a stacking method. The area under the curve (AUC) of the eight ML models ranged from 0.83 to 0.887, with random forest, stacking, logistic regression, and support vector machine demonstrating superior performance. The feature importance analysis indicated that the initial Berg Balance Test (BBS-I), initial BI (BI-I), and initial Concise Chinese Aphasia Test (CCAT-I) were the top three predictors of BI scores at discharge. The partial dependence plot (PDP) and individual conditional expectation (ICE) plot indicated that the predictors’ ability to predict outcomes was the most pronounced within a specific value range (e.g., BBS-I < 40 and BI-I < 60). BI at discharge could be predicted from information collected at admission with the aid of various ML models, and the PDP and ICE plots indicated that the predictors were informative mainly within a certain value range.
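
The PDP and ICE plots mentioned above rest on a simple computation, sketched here under stated assumptions. An ICE curve forces one feature to each value on a grid for a single row and records the model's prediction. The partial dependence curve is the average of the ICE curves over all rows. The `toy_model` function, its saturation point, and the two-row dataset are hypothetical and chosen only so that the curve flattens beyond a certain value; they do not reproduce the study's models.

```python
def ice_curves(model, X, feature, grid):
    """One curve per row: predictions as `feature` is forced to each grid value."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)  # copy so the original row is not mutated
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(model, X, feature, grid):
    """Average of the ICE curves at each grid value."""
    curves = ice_curves(model, X, feature, grid)
    return [sum(col) / len(col) for col in zip(*curves)]


def toy_model(row):
    """Hypothetical score: rises with feature 0 up to 40, then stays flat."""
    return min(row[0], 40) / 40 * 0.5 + (0.5 if row[1] >= 60 else 0.0)


X = [[10, 30], [50, 70]]   # toy rows (assumption)
grid = [0, 20, 40, 56]     # values at which feature 0 is evaluated

ice = ice_curves(toy_model, X, feature=0, grid=grid)
pdp = partial_dependence(toy_model, X, feature=0, grid=grid)
print(ice)
print(pdp)
```

In this toy case the curve stops rising once feature 0 reaches 40, which is the kind of pattern a PDP makes visible: the predictor separates outcomes only within part of its range.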

Prediction of Adverse Events in Stable Non-Variceal Gastrointestinal Bleeding Using Machine Learning

Journal of Clinical Medicine ◽

10.3390/jcm9082603 ◽

2020 ◽

Vol 9 (8) ◽

pp. 2603 ◽

Cited By ~ 1

Author(s):

Dong-Woo Seo ◽

Hahn Yi ◽

Beomhee Park ◽

Youn-Jung Kim ◽

Dae Ho Jung ◽

...

Keyword(s):

Machine Learning ◽

High Risk ◽

Adverse Events ◽

Gastrointestinal Bleeding ◽

Area Under The Curve ◽

Scoring Systems ◽

Hemodynamic Instability ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Importance Analysis

Clinical risk-scoring systems are important for identifying patients with upper gastrointestinal bleeding (UGIB) who are at a high risk of hemodynamic instability. We developed an algorithm that predicts adverse events in patients with initially stable non-variceal UGIB using machine learning (ML). Using a prospective observational registry, 1439 out of 3363 consecutive patients were enrolled. Primary outcomes included adverse events such as mortality, hypotension, and rebleeding within 7 days. Four machine learning algorithms, namely, logistic regression with regularization (LR), random forest classifier (RF), gradient boosting classifier (GB), and voting classifier (VC), were compared with the Glasgow–Blatchford score (GBS) and Rockall scores. The RF model showed the highest accuracy and a significant improvement over conventional methods for predicting mortality (area under the curve: RF 0.917 vs. GBS 0.710), but the VC model performed best for hypotension (VC 0.757 vs. GBS 0.668) and rebleeding within 7 days (VC 0.733 vs. GBS 0.694). Clinically significant variables including blood urea nitrogen, albumin, hemoglobin, platelet, prothrombin time, age, and lactate were identified by the global feature importance analysis. These results suggest that ML models will be useful early predictive tools for identifying high-risk patients with initially stable non-variceal UGIB admitted at an emergency department.
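
The abstract does not say whether its voting classifier used hard (majority) or soft (averaged probability) voting, so the sketch below shows both as generic illustrations. The per-model predictions and probabilities are made-up toy values, and the 0.5 threshold is an assumption.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority label per sample; `predictions` holds one label list per model."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]


def soft_vote(probabilities, threshold=0.5):
    """Average the per-model positive-class probabilities, then threshold."""
    return [int(sum(col) / len(col) >= threshold) for col in zip(*probabilities)]


# Toy outputs from three hypothetical models on three patients.
labels_by_model = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
]
probs_by_model = [
    [0.9, 0.4, 0.2],
    [0.6, 0.7, 0.1],
    [0.3, 0.8, 0.3],
]

print(hard_vote(labels_by_model))
print(soft_vote(probs_by_model))
```

With an odd number of binary models, hard voting never ties. Soft voting lets one confident model outweigh two uncertain ones, which is why the two schemes can disagree on borderline cases.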

Identification of Core Suppliers Based on E-Invoice Data Using Supervised Machine Learning

Journal of Risk and Financial Management ◽

10.3390/jrfm11040070 ◽

2018 ◽

Vol 11 (4) ◽

pp. 70 ◽

Cited By ~ 1

Author(s):

Jung-sik Hong ◽

Hyeongyu Yeo ◽

Nam-Wook Cho ◽

Taeuk Ahn

Keyword(s):

Machine Learning ◽

Random Forests ◽

Area Under The Curve ◽

High Accuracy ◽

Supervised Machine Learning ◽

Machine Learning Method ◽

Learning Method ◽

Machine Learning Technique ◽

Novel Approach ◽

Learning Technique

Since not all suppliers are to be managed in the same way, a purchasing strategy requires proper supplier segmentation so that the most suitable strategies can be used for different segments. Most existing methods for supplier segmentation, however, either depend on subjective judgements or require significant effort. To overcome these limitations, this paper proposes a novel approach for supplier segmentation. The objective of this paper is to develop an automated and effective way to identify core suppliers, whose profit impact on a buyer is significant. To achieve this objective, the application of a supervised machine learning technique, Random Forests (RF), to e-invoice data is proposed. To validate its effectiveness, the proposed method has been applied to real e-invoice data obtained from an automobile parts manufacturer. The high accuracy and area under the curve (AUC) obtained attest to the applicability of our approach. Our method is envisioned to be of value for automating the identification of core suppliers. The main benefits of the proposed approach include the enhanced efficiency of supplier segmentation procedures. In addition, by applying a machine learning method to e-invoice data, our method results in more reliable segmentation in terms of selecting and weighting variables.

c-Cbl expression as a novel predictive marker of survival in patients with metastatic colorectal cancer.

Journal of Clinical Oncology ◽

10.1200/jco.2017.35.15_suppl.e15090 ◽

2017 ◽

Vol 35 (15_suppl) ◽

pp. e15090-e15090

Author(s):

Shin Yin Lee ◽

Vijaya B. Kolachalama ◽

Umit Tapan ◽

Janice Weinberg ◽

Jean M. Francis ◽

...

Keyword(s):

Colorectal Cancer ◽

Machine Learning ◽

Medical Center ◽

Area Under The Curve ◽

Predictive Marker ◽

Learning Model ◽

Negative Regulator ◽

Supervised Machine Learning ◽

Molecular Features ◽

Machine Learning Model

e15090 Background: Aberrant hyperactive Wnt/β-catenin signaling is critical in colorectal cancer (CRC) tumorigenesis. Casitas B-lineage Lymphoma (c-Cbl) is a negative regulator of Wnt signaling, and functions as a tumor suppressor. The objective of this study was to evaluate c-Cbl expression as a predictive marker of survival in patients with metastatic CRC (mCRC). Methods: Patients with mCRC treated at Boston University Medical Center between 2004 and 2014 were analyzed. c-Cbl and nuclear β-catenin expression was quantified in explanted biopsies using a customized color-based image segmentation pipeline. Quantification was normalized to the total tumor area in an image, and deemed ‘low’ or ‘high’ according to the mean normalized values of the cohort. A supervised machine-learning model based on bootstrap aggregating was constructed with c-Cbl expression as the input feature and 3-year survival as output. Results: Of the 72 subjects with mCRC, 52.78% had high and 47.22% had low c-Cbl expression. Patients with high c-Cbl had significantly better median overall survival than those with low c-Cbl expression (3.7 years vs. 1.8 years; p = 0.0026), and experienced superior 3-year survival (47.37% vs 20.59%; p = 0.017). Intriguingly, nuclear β-catenin expression did not correlate with survival. No significant differences were detected between high and low c-Cbl groups in baseline characteristics (demographics, comorbidities), tumor-related parameters (primary tumor location, number of metastases, molecular features) or therapy received (surgery, chemotherapy regimen). A 5-fold cross-validated machine-learning model associated with 3-year survival demonstrated an area under the curve of 0.729, supporting c-Cbl expression as a predictor of mCRC survival. Conclusions: Our results show that c-Cbl expression is associated with and predicts mCRC survival. 
Demonstration of these findings despite the small cohort size underscores the power of quantitative histology and machine-learning application. While further work is needed to validate c-Cbl as a novel biomarker of mCRC survival, this study supports c-Cbl as a regulator of Wnt/β-catenin signaling and a suppressor of other oncogenes in CRC tumorigenesis.

Prediction of mosquito species and population age structure using mid-infrared spectroscopy and supervised machine learning

Wellcome Open Research ◽

10.12688/wellcomeopenres.15201.3 ◽

2019 ◽

Vol 4 ◽

pp. 76 ◽

Cited By ~ 11

Author(s):

Mario González Jiménez ◽

Simon A. Babayan ◽

Pegah Khazaeli ◽

Margaret Doyle ◽

Finlay Walton ◽

...

Keyword(s):

Machine Learning ◽

Infrared Spectroscopy ◽

Malaria Vector ◽

Mosquito Species ◽

Age Distribution ◽

Supervised Machine Learning ◽

Malaria Vector Control ◽

Promising Alternative ◽

Mid Infrared ◽

Mid Infrared Spectroscopy

Despite the global efforts made in the fight against malaria, the disease is resurging. One of the main causes is the resistance that Anopheles mosquitoes, vectors of the disease, have developed to insecticides. Anopheles must survive for at least 10 days to possibly transmit malaria. Therefore, to evaluate and improve malaria vector control interventions, it is imperative to monitor and accurately estimate the age distribution of mosquito populations as well as their population sizes. Here, we demonstrate a machine-learning based approach that uses mid-infrared spectra of mosquitoes to characterise simultaneously both age and species identity of females of the African malaria vector species Anopheles gambiae and An. arabiensis, using laboratory colonies. Mid-infrared spectroscopy-based prediction of mosquito age structures was statistically indistinguishable from true modelled distributions. The accuracy of classifying mosquitoes by species was 82.6%. The method has a negligible cost per mosquito, does not require highly trained personnel, is rapid, and so can be easily applied in both laboratory and field settings. Our results indicate this method is a promising alternative to current mosquito species and age-grading approaches, with further improvements to accuracy and expansion for use with wild mosquito vectors possible through collection of larger mid-infrared spectroscopy data sets.

Predicting flood responses from spatial rainfall variability and basin morphology through machine learning

10.5194/egusphere-egu2020-22179 ◽

2020 ◽

Author(s):

Jorge Duarte ◽

Pierre E. Kirstetter ◽

Manabendra Saharia ◽

Jonathan J. Gourley ◽

Humberto Vergara ◽

...

Keyword(s):

Machine Learning ◽

Flash Flood ◽

Rainfall Variability ◽

Explanatory Power ◽

Model Performance ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Storm Event ◽

Basin Scale ◽

Importance Analysis

Predicting flash floods at short time scales as well as their impacts is of vital interest to forecasters, emergency managers and community members alike. In particular, characteristics such as location, timing, and duration are crucial for decision-making processes for the protection of lives, property and infrastructure. Even though these characteristics are primarily driven by the causative rainfall and basin geomorphology, untangling the complex interactions between precipitation and hydrological processes becomes challenging due to the lack of observational datasets which capture diverse conditions. This work follows up on previous efforts to incorporate spatial rainfall moments as viable predictors for flash flood event characteristics such as lag time and the exceedance of flood stage thresholds at gauged locations over the Conterminous United States (CONUS). These variables were modeled by applying various supervised machine learning techniques over a database of flood events. The data included morphological, climatological, streamflow and precipitation data from over 21,000 flood-producing rainfall events that occurred over 900+ different basins throughout the CONUS between 2002 and 2011. This dataset included basin parameters and indices derived from radar-based precipitation, which represented sub-basin scale rainfall spatial variability for each storm event. Both classification and regression models were constructed, and variable importance analysis was performed in order to determine the relevant factors reflecting hydrometeorological processes. In this iteration, a closer look at model performance consistency and variable selection aims to further explore rainfall moments’ explanatory power of flood characteristics.

Assessment of Corneal Pachymetry Distribution and Morphologic Changes in Subclinical Keratoconus with Normal Biomechanics

BioMed Research International ◽

10.1155/2019/1748579 ◽

2019 ◽

Vol 2019 ◽

pp. 1-7 ◽

Cited By ~ 1

Author(s):

Peng Song ◽

Kaili Yang ◽

Pei Li ◽

Yu Liu ◽

Dengfeng Liang ◽

...

Keyword(s):

Operating Characteristic ◽

Characteristic Curve ◽

Area Under The Curve ◽

Positive Association ◽

Significant Positive Association ◽

Central Cornea ◽

Elevation Difference ◽

Retrospective Comparative Study ◽

Eyes Detection ◽

Morphologic Changes

Purpose. To investigate the pachymetry distribution of central cornea and morphologic changes in subclinical keratoconus with normal biomechanics and determine their potential benefit for the screening of very early keratoconus. Methods. This retrospective comparative study was performed in 33 clinically unaffected eyes with normal topography and biomechanics from 33 keratoconus patients with very asymmetric ectasia (VAE-NTB; Corvis Biomechanical Index defined) and 70 truly normal eyes from 70 age-matched subjects. Corneal topographic, tomographic, and biomechanical metrics were measured using Pentacam and Corvis ST. The distance and pachymetry difference between the corneal thinnest point and the apex were defined as DTCP-Apex and DPTCP-Apex, respectively, to evaluate the pachymetry distribution within the central cornea. The discriminatory power of metrics was analysed via the receiver operating characteristic curve. A logistic regression analysis was used to establish predictive models. Results. The parameters, DTCP-Apex and DPTCP-Apex, were significantly higher in VAE-NTB than those in normal eyes. For differentiating normal and VAE-NTB eyes, the Belin-Ambrósio deviation (BAD-D) showed the largest area under the curve (AUC; 0.799), followed by ARTmax (0.798), DTCP-Apex (0.771), tomography and biomechanical index (0.760), maximum pachymetry progression index (PPImax, 0.756), DPTCP-Apex (0.753), and back eccentricity (B_Ecc, 0.707) with no statistically significant differences among these AUCs. In the VAE-NTB group, the parameter B_Ecc was significantly and positively correlated with DTCP-Apex (P=0.011) and DPTCP-Apex (P=0.035), whereas the posterior elevation difference had a significant positive association with DPTCP-Apex (P=0.042). A model using the indices DTCP-Apex, B_Ecc, PPImax, and index of height asymmetry demonstrated the highest AUC of 0.846 with 91.43% specificity. Conclusions. 
Abnormal pachymetry distribution within the central cornea and subtle morphologic changes are detectable in subclinical keratoconus with normal biomechanics. This may improve the detection of VAE-NTB eyes.

A machine learning model for the prediction of survival and tumor subtype in pancreatic ductal adenocarcinoma from preoperative diffusion-weighted imaging

European Radiology Experimental ◽

10.1186/s41747-019-0119-0 ◽

2019 ◽

Vol 3 (1) ◽

Cited By ~ 13

Author(s):

Georgios Kaissis ◽

Sebastian Ziegelmayer ◽

Fabian Lohöfer ◽

Hana Algül ◽

Matthias Eiber ◽

...

Keyword(s):

Machine Learning ◽

Pancreatic Ductal Adenocarcinoma ◽

Diffusion Weighted Imaging ◽

Area Under The Curve ◽

Supervised Machine Learning ◽

Ductal Adenocarcinoma ◽

Recursive Feature Elimination ◽

Training Cohort ◽

Diffusion Weighted ◽

Histopathological Subtype

Abstract Background To develop a supervised machine learning (ML) algorithm predicting above- versus below-median overall survival (OS) from diffusion-weighted imaging-derived radiomic features in patients with pancreatic ductal adenocarcinoma (PDAC). Methods One hundred two patients with histopathologically proven PDAC were retrospectively assessed as the training cohort, and 30 prospectively accrued and retrospectively enrolled patients served as the independent validation cohort (IVC). Tumors were segmented on preoperative apparent diffusion coefficient (ADC) maps, and radiomic features were extracted. A random forest ML algorithm was fit to the training cohort and tested in the IVC. Histopathological subtype of tumor samples was assessed by immunohistochemistry in 21 IVC patients. Individual radiomic feature importance was evaluated by assessment of tree node Gini impurity decrease and recursive feature elimination. Fisher’s exact test, 95% confidence intervals (CI), and receiver operating characteristic area under the curve (ROC-AUC) were used. Results The ML algorithm achieved 87% sensitivity (95% CI 67.3–92.7), 80% specificity (95% CI 74.0–86.7), and ROC-AUC 90% for the prediction of above- versus below-median OS in the IVC. Heterogeneity-related features were highly ranked by the model. Of the 21 patients with determined histopathological subtype, 8/9 patients predicted to experience below-median OS exhibited the quasi-mesenchymal subtype, whilst 11/12 patients predicted to experience above-median OS exhibited a non-quasi-mesenchymal subtype (p < 0.001). Conclusion ML application to ADC radiomics allowed OS prediction with a high diagnostic accuracy in an IVC. The high overlap of clinically relevant histopathological subtypes with model predictions underlines the potential of quantitative imaging in PDAC pre-operative subtyping and prognosis.
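
For readers less familiar with the metrics reported above, here is a minimal sketch of how sensitivity and specificity follow from binary predictions on a validation set. The labels and predictions are toy values chosen for illustration, not data from the study, and the sketch does not attempt the study's confidence intervals.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy validation set (assumption): 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```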
