Machine Learning Prediction of Smoking Risk from Repeated Sampling of Environmental Images (Preprint)

2021
Author(s):  
Matthew M Engelhard ◽  
Joshua D'Arcy ◽  
Jason A Oliver ◽  
Rachel Kozink ◽  
F Joseph McClernon

BACKGROUND Viewing their habitual smoking environments increases smokers’ craving and smoking behaviors in laboratory settings. A deep learning approach can differentiate between habitual smoking versus nonsmoking environments, but its ability to predict smoking risk associated with the broader range of environments smokers encounter in their daily lives is unknown. OBJECTIVE To predict environment-associated smoking risk from continuously acquired images of smokers’ daily environments. METHODS Smokers from the Durham, NC area completed ecological momentary assessments both immediately after smoking and at randomly selected times throughout the day, for two weeks. At each assessment, participants took a picture of their current environment and completed a questionnaire on smoking, craving, and the environmental setting. A convolutional neural network (CNN)-based model was trained to predict smoking, craving, whether smoking was allowed, and whether the participant was outside based on images of participants’ daily environments, the time since their last cigarette, and baseline data on daily smoking habits. Prediction performance, quantified using the area under the receiver operating characteristic curve (AUC) and average precision (AP), was assessed for (a) out-of-sample prediction, and (b) personalized models trained on images from days 1-10. Models were optimized for mobile devices and implemented as a smartphone app. RESULTS Forty-eight participants completed the study, and 8008 images were acquired. The personalized models were highly effective in predicting smoking risk (AUC=0.827; AP=0.882), craving (AUC=0.837; AP=0.798), whether smoking was allowed in the current environment (AUC=0.932, AP=0.981), and whether the participant was outside (AUC=0.977, AP=0.956).
The out-of-sample models were also effective in predicting smoking risk (AUC=0.723, AP=0.785), whether smoking was allowed in the current environment (AUC=0.815, AP=0.937), and whether the participant was outside (AUC=0.949, AP=0.922); but were not effective in predicting craving (AUC=0.522, AP=0.427). Omitting image features reduced performance (p<0.05) when predicting all outcomes except craving (p>0.05). Smoking prediction was more effective in participants whose self-reported location type was more variable (Spearman’s ρ=0.48, p=0.001). CONCLUSIONS Images of daily environments can be used to effectively predict smoking risk. Model personalization, achieved by incorporating information about daily smoking habits and training on participant-specific images, further improves prediction performance. Environment-associated smoking risk can be assessed in real time on a mobile device, and could be incorporated in device-based smoking cessation interventions.
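The abstract quantifies performance with AUC and average precision (AP). As a minimal illustration of how these two metrics are computed (scikit-learn, with synthetic labels and scores; not the authors' data or model):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Synthetic binary outcome (e.g., "smoked at this assessment") and model
# risk scores; a useful model assigns higher scores to positives.
y_true = rng.integers(0, 2, size=200)
y_score = y_true * 0.6 + rng.normal(0, 0.4, size=200)

auc = roc_auc_score(y_true, y_score)           # ranking quality across all thresholds
ap = average_precision_score(y_true, y_score)  # area under the precision-recall curve

print(f"AUC={auc:.3f}, AP={ap:.3f}")
```

AUC summarizes discrimination regardless of class balance, while AP is more sensitive to performance on the positive class, which is why the paper reports both.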

2021
pp. jnnp-2021-327211
Author(s):  
Anna K Bonkhoff ◽  
Tom Hope ◽  
Danilo Bzdok ◽  
Adrian G Guggisberg ◽  
Rachel L Hawe ◽  
...  

Introduction Stroke causes different levels of impairment and the degree of recovery varies greatly between patients. The majority of recovery studies are biased towards patients with mild-to-moderate impairments, challenging a unified recovery process framework. Our aim was to develop a statistical framework to analyse recovery patterns in patients with severe and non-severe initial impairment and concurrently investigate whether they recovered differently. Methods We designed a Bayesian hierarchical model to estimate 3–6 months upper limb Fugl-Meyer (FM) scores after stroke. When focusing on the explanation of recovery patterns, we addressed confounds affecting previous recovery studies and considered patients with FM-initial scores <45 only. We systematically explored different FM-breakpoints between severe/non-severe patients (FM-initial=5–30). In model comparisons, we evaluated whether impairment-level-specific recovery patterns indeed existed. Finally, we estimated the out-of-sample prediction performance for patients across the entire initial impairment range. Results Recovery data were assembled from eight patient cohorts (n=489). Data were best modelled by incorporating two subgroups (breakpoint: FM-initial=10). Both subgroups recovered a comparable constant amount, but with different proportional components: severely affected patients recovered more the smaller their impairment, while non-severely affected patients recovered more the larger their initial impairment. Prediction of 3–6 months outcomes could be done with an R2=63.5% (95% CI=51.4% to 75.5%). Conclusions Our work highlights the benefit of simultaneously modelling recovery of severely-to-non-severely impaired patients and demonstrates both shared and distinct recovery patterns. Our findings provide evidence that the severe/non-severe subdivision in recovery modelling is not an artefact of previous confounds.
The presented out-of-sample prediction performance may serve as a benchmark to evaluate promising biomarkers of stroke recovery.
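The reported pattern, a shared constant recovery amount plus subgroup-specific proportional components on either side of the FM-initial=10 breakpoint, can be sketched on synthetic data. Ordinary least squares stands in for the authors' Bayesian hierarchical model, and the 0.8/0.6 proportionality constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
BREAK = 10  # assumed breakpoint between severe (<10) and non-severe initial FM

# Synthetic initial FM scores (0-66 scale, restricted to <45 as in the study)
# and recovery generated with subgroup-specific proportional components:
# severe patients recover more the smaller their impairment (higher FM-initial),
# non-severe patients recover more the larger their initial impairment.
fm0 = rng.uniform(0, 45, 300)
severe = fm0 < BREAK
recovery = np.where(severe, 0.8 * fm0, 0.6 * (66 - fm0)) + rng.normal(0, 3, 300)

def fit_slope(x, y):
    """OLS slope of recovery on initial FM within one subgroup."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_sev = fit_slope(fm0[severe], recovery[severe])
b_non = fit_slope(fm0[~severe], recovery[~severe])
print("severe slope:", b_sev[1], "non-severe slope:", b_non[1])
```

The two slopes come out with opposite signs, which is the qualitative signature the model comparison in the paper detects.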


Author(s):  
Seong D. Yun ◽  
Benjamin M. Gramig

Abstract This study scrutinizes spatial econometric models and specifications of crop yield response functions to provide a robust evaluation of empirical alternatives available to researchers. We specify 14 competing panel regression models of crop yield response to weather and site characteristics. Using county corn yields in the US, this study implements in-sample, out-of-sample, and bootstrapped out-of-sample prediction performance comparisons. Descriptive propositions and empirical results demonstrate the importance of spatial correlation and empirically support the fixed effects model with spatially dependent error structures. This study also emphasizes the importance of extensive model specification testing and evaluation of selection criteria for prediction.
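A minimal sketch of a bootstrapped out-of-sample prediction comparison between two yield-response specifications (synthetic data; a linear versus quadratic temperature response stands in for the paper's 14 spatial panel models, whose spatial error structures are omitted here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Synthetic county yields with a quadratic temperature response.
temp = rng.uniform(15, 35, 400)
yields = 120 + 4 * temp - 0.12 * temp**2 + rng.normal(0, 2, 400)

specs = {
    "linear": temp.reshape(-1, 1),
    "quadratic": np.column_stack([temp, temp**2]),
}

def boot_oos_rmse(X, y, n_boot=200, seed=0):
    """Bootstrapped out-of-sample RMSE: fit on a with-replacement resample,
    evaluate on the out-of-bag observations."""
    rs = np.random.default_rng(seed)
    n = len(y)
    rmses = []
    for _ in range(n_boot):
        idx = rs.integers(0, n, n)
        oob = np.setdiff1d(np.arange(n), idx)
        model = LinearRegression().fit(X[idx], y[idx])
        resid = y[oob] - model.predict(X[oob])
        rmses.append(np.sqrt(np.mean(resid**2)))
    return float(np.mean(rmses))

rmse = {name: boot_oos_rmse(X, yields) for name, X in specs.items()}
print(rmse)
```

The correctly specified model wins out-of-sample, which is the kind of evidence the study uses to adjudicate between its competing specifications.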


Nutrients
2021
Vol 13 (11)
pp. 4009
Author(s):  
Kotomi Sakai ◽  
Stuart Gilmour ◽  
Eri Hoshino ◽  
Enri Nakayama ◽  
Ryo Momosaki ◽  
...  

Background: Sarcopenic dysphagia, a swallowing disorder caused by sarcopenia, is prevalent in older patients and can cause malnutrition and aspiration pneumonia. This study aimed to develop a simple screening test using image recognition with a low risk of droplet transmission for sarcopenic dysphagia. Methods: Older patients admitted to a post-acute care hospital were enrolled in this cross-sectional study. As a main variable for the development of a screening test, we photographed the anterior neck to analyze the image features of sarcopenic dysphagia. The studied image features included the pixel values and the number of feature points. We constructed screening models using the image features, age, sex, and body mass index. The prediction performance of each model was investigated. Results: A total of 308 patients participated, including 175 (56.82%) patients without dysphagia and 133 (43.18%) with sarcopenic dysphagia. The area under the receiver operating characteristic curve (ROC-AUC), sensitivity, specificity, positive predictive value, negative predictive value, and area under the precision-recall curve (PR-AUC) values of the best model were 0.877, 87.50%, 76.67%, 66.67%, 92.00%, and 0.838, respectively. The model with image features alone showed an ROC-AUC of 0.814 and PR-AUC of 0.726. Conclusions: The screening test for sarcopenic dysphagia using image recognition of neck appearance had high prediction performance.
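A hedged sketch of the screening-model setup: a logistic classifier over synthetic stand-ins for the image features (pixel values, feature-point counts) plus age, sex, and body mass index, evaluated with ROC-AUC and PR-AUC as in the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 300

# Synthetic, standardized stand-ins for the predictors; the coefficients
# below are illustrative assumptions, not estimates from the study.
pixel_mean = rng.normal(0, 1, n)   # stand-in for image pixel values
n_keypoints = rng.normal(0, 1, n)  # stand-in for feature-point counts
age = rng.normal(0, 1, n)
sex = rng.integers(0, 2, n)
bmi = rng.normal(0, 1, n)

logit = 0.9 * pixel_mean - 0.7 * n_keypoints + 0.5 * age - 0.4 * bmi
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([pixel_mean, n_keypoints, age, sex, bmi])
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(Xtr, ytr)
score = model.predict_proba(Xte)[:, 1]
roc_auc = roc_auc_score(yte, score)
pr_auc = average_precision_score(yte, score)
print(f"ROC-AUC={roc_auc:.3f}, PR-AUC={pr_auc:.3f}")
```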


Author(s):  
Renzhe Xu ◽  
Yudong Chen ◽  
Tenglong Xiao ◽  
Jingli Wang ◽  
Xiong Wang

As an important tool for measuring the state of the whole stock market, the stock index has always been a focus of researchers, especially its prediction. This paper uses trend types, obtained by clustering price series at multiple time scales, combined with the day-of-the-week effect to construct a categorical feature combination. Based on the historical data of six Chinese stock indexes, the CatBoost model is used for training and prediction. Experimental results show that the out-of-sample prediction accuracy is 0.55, and the long–short trading strategy obtains an average annualized return of 34.43%, a great improvement over other classical classification algorithms. Under rolling back-testing, the model consistently obtains stable returns in each period from 2012 to 2020. Among them, the SSESC long–short strategy performs best, with an annualized return of 40.85% and a Sharpe ratio of 1.53. Therefore, the CatBoost model can learn the multi-time-scale trend features constructed through feature engineering well, which is useful for predicting stock index trends.
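A sketch of the feature pipeline described above: trend types from clustering z-scored price windows at two time scales, plus a day-of-week feature, fed to a boosted-tree classifier. `GradientBoostingClassifier` stands in for CatBoost, and the synthetic random-walk prices carry no real predictive signal, so accuracy stays near chance; the point is the feature construction:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(4)

# Synthetic daily index levels stand in for the six Chinese stock indexes.
prices = 100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, 1200)))

def trend_labels(prices, window, k=3, seed=0):
    """Cluster z-scored price windows into k discrete 'trend types'."""
    segs = np.lib.stride_tricks.sliding_window_view(prices, window)
    segs = (segs - segs.mean(axis=1, keepdims=True)) / segs.std(axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(segs)

short = trend_labels(prices, 5)    # label i describes days i..i+4
long_ = trend_labels(prices, 20)   # label i describes days i..i+19

# Features of day t: trend types of the short/long windows ending at t,
# plus a day-of-week index; target: whether the index rises on day t+1.
days = np.arange(19, len(prices) - 1)
X = np.column_stack([short[days - 4], long_[days - 19], days % 5])
y = (prices[days + 1] > prices[days]).astype(int)

split = int(0.8 * len(days))  # chronological train/test split
clf = GradientBoostingClassifier(random_state=0).fit(X[:split], y[:split])
acc = clf.score(X[split:], y[split:])
print("out-of-sample accuracy:", acc)
```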


2018
Vol 35 (2)
pp. 208-217
Author(s):  
Maurits Kaptein

Purpose This paper aims to examine whether estimates of psychological traits obtained using meta-judgmental measures (as commonly present in customer relationship management database systems) or operative measures are most useful in predicting customer behavior. Design/methodology/approach Using an online experiment (N = 283), the study collects meta-judgmental and operative measures of customers. Subsequently, it compares the out-of-sample prediction error of responses to persuasive messages. Findings The study shows that operative measures – derived directly from measures of customer behavior – are more informative than meta-judgmental measures. Practical implications Using interactive media, it is possible to actively elicit operative measures. This study shows that practitioners seeking to customize their marketing communication should focus on obtaining such psychographic observations. Originality/value While currently both meta-judgmental measures and operative measures are used for customization in interactive marketing, this study directly compares their utility for the prediction of future responses to persuasive messages.


Author(s):  
Linden Parkes ◽  
Tyler M. Moore ◽  
Monica E. Calkins ◽  
Matthew Cieslak ◽  
David R. Roalf ◽  
...  

ABSTRACT Background The psychosis spectrum is associated with structural dysconnectivity concentrated in transmodal association cortex. However, understanding of this pathophysiology has been limited by an exclusive focus on the direct connections to a region. Using Network Control Theory, we measured variation in both direct and indirect structural connections to a region to gain new insights into the pathophysiology of the psychosis spectrum. Methods We used psychosis symptom data and structural connectivity in 1,068 youths aged 8 to 22 years from the Philadelphia Neurodevelopmental Cohort. Applying a Network Control Theory metric called average controllability, we estimated each brain region’s capacity to leverage its direct and indirect structural connections to control linear brain dynamics. Next, using non-linear regression, we determined the accuracy with which average controllability could predict negative and positive psychosis spectrum symptoms in out-of-sample testing. We also compared prediction performance for average controllability versus strength, which indexes only direct connections to a region. Finally, we assessed how the prediction performance for psychosis spectrum symptoms varied over the functional hierarchy spanning unimodal to transmodal cortex. Results Average controllability outperformed strength at predicting positive psychosis spectrum symptoms, demonstrating that indexing indirect structural connections to a region improved prediction performance. Critically, improved prediction was concentrated in association cortex for average controllability, whereas prediction performance for strength was uniform across the cortex, suggesting that indexing indirect connections is crucial in association cortex. Conclusions Examining inter-individual variation in direct and indirect structural connections to association cortex is crucial for accurate prediction of positive psychosis spectrum symptoms.
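Average controllability can be sketched as the trace of the infinite-horizon controllability Gramian when a single region receives input. The normalization scheme below is an assumption in the spirit of the network-control literature, applied to a random symmetric matrix rather than a real connectome:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(5)

# Random symmetric weighted matrix standing in for a structural connectome.
n = 20
A = rng.uniform(0, 1, (n, n))
A = (A + A.T) / 2
np.fill_diagonal(A, 0)

# Scale so the spectral radius is below 1, keeping the discrete-time
# linear dynamics x(t+1) = A x(t) stable (assumed normalization).
A = A / (1 + np.max(np.abs(np.linalg.eigvalsh(A))))

def average_controllability(A, i):
    """Trace of the infinite-horizon controllability Gramian when only
    region i receives input: W = sum_t A^t B B^T (A^T)^t, with B = e_i.
    solve_discrete_lyapunov solves W = A W A^T + B B^T directly."""
    B = np.zeros((len(A), 1))
    B[i] = 1.0
    W = solve_discrete_lyapunov(A, B @ B.T)
    return float(np.trace(W))

ac = np.array([average_controllability(A, i) for i in range(n)])
print(ac)
```

Because the Gramian accumulates powers of A, the metric reflects indirect as well as direct connections, which is the contrast with node strength drawn in the abstract.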


2019
pp. 1-11
Author(s):  
Kien Wei Siah ◽  
Sean Khozin ◽  
Chi Heem Wong ◽  
Andrew W. Lo

PURPOSE The prediction of clinical outcomes for patients with cancer is central to precision medicine and the design of clinical trials. We developed and validated machine-learning models for three important clinical end points in patients with advanced non–small-cell lung cancer (NSCLC)—objective response (OR), progression-free survival (PFS), and overall survival (OS)—using routinely collected patient and disease variables. METHODS We aggregated patient-level data from 17 randomized clinical trials recently submitted to the US Food and Drug Administration evaluating molecularly targeted therapy and immunotherapy in patients with advanced NSCLC. To our knowledge, this is one of the largest studies of NSCLC to consider biomarker and inhibitor therapy as candidate predictive variables. We developed a stochastic tumor growth model to predict tumor response and explored the performance of a range of machine-learning algorithms and survival models. Models were evaluated on out-of-sample data using the standard area under the receiver operating characteristic curve and concordance index (C-index) performance metrics. RESULTS Our models achieved promising out-of-sample predictive performances of 0.79 area under the receiver operating characteristic curve (95% CI, 0.77 to 0.81), 0.67 C-index (95% CI, 0.66 to 0.69), and 0.73 C-index (95% CI, 0.72 to 0.74) for OR, PFS, and OS, respectively. The calibration plots for PFS and OS suggested good agreement between actual and predicted survival probabilities. In addition, the Kaplan-Meier survival curves showed that the difference in survival between the low- and high-risk groups was significant (log-rank test P < .001) for both PFS and OS. CONCLUSION Biomarker status was the strongest predictor of OR, PFS, and OS in patients with advanced NSCLC treated with immune checkpoint inhibitors and targeted therapies. However, single biomarkers have limited predictive value, especially for programmed death-ligand 1 immunotherapy. 
To advance beyond the results achieved in this study, more comprehensive data on composite multiomic signatures is required.
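The C-index used above can be computed directly as the fraction of comparable pairs ranked concordantly by the model's risk score. A self-contained sketch on synthetic survival data (censoring is handled only by skipping censored subjects as index cases):

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic survival times and risk scores: higher risk means shorter
# expected survival, so a good score ranks earlier failures higher.
n = 150
risk = rng.normal(0, 1, n)
time = rng.exponential(np.exp(-risk))  # scale shrinks as risk grows
event = rng.uniform(size=n) < 0.8      # ~20% right-censored

def c_index(time, event, risk):
    """Concordance index: among comparable pairs, the fraction where the
    higher-risk subject fails earlier (risk ties count as 0.5)."""
    num = den = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue                   # index case must be an observed event
        for j in range(len(time)):
            if time[j] > time[i]:      # j outlived i's event time
                den += 1
                if risk[i] > risk[j]:
                    num += 1
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den

ci = c_index(time, event, risk)
print("C-index:", ci)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the paper's 0.67 and 0.73 values in context.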


2020
Vol 8 (1)
pp. e001315
Author(s):  
Samuel H Gunther ◽  
Chin Meng Khoo ◽  
E-Shyong Tai ◽  
Xueling Sim ◽  
Jean-Paul Kovalik ◽  
...  

Introduction We evaluated whether concentrations of serum acylcarnitines and amino acids are associated with risk of type 2 diabetes and can improve predictive diabetes models in an Asian population. Research design and methods We used data from 3313 male and female participants from the Singapore Prospective Study Program cohort who were diabetes-free at baseline. The average age at baseline was 48.0 years (SD: 11.9 years), and participants were of Chinese, Malay, and Indian ethnicity. Diabetes cases were identified through self-reported physician diagnosis, fasting glucose and glycated hemoglobin concentrations, and linkage to national disease registries. We measured fasting serum concentrations of 45 acylcarnitines and 14 amino acids. The association between metabolites and incident diabetes was modeled using Cox proportional hazards regression with adjustment for age, sex, ethnicity, height, and parental history of diabetes, and correction for multiple testing. Metabolites were added to the Atherosclerosis Risk in Communities (ARIC) predictive diabetes risk model to assess whether they could increase the area under the receiver operating characteristic curve (AUC). Results Participants were followed up for an average of 8.4 years (SD: 2.1 years), during which time 314 developed diabetes. Branched-chain amino acids (HR: 1.477 per SD; 95% CI 1.325 to 1.647) and the alanine to glycine ratio (HR: 1.572; 95% CI 1.426 to 1.733) were most strongly associated with diabetes risk. Additionally, the acylcarnitines C4 and C16-OH, and the amino acids alanine, combined glutamate/glutamine, ornithine, phenylalanine, proline, and tyrosine were significantly associated with higher diabetes risk, and the acylcarnitine C8-DC and amino acids glycine and serine with lower risk.
Adding selected metabolites to the ARIC model resulted in a significant increase in AUC from 0.836 to 0.846. Conclusions We identified acylcarnitines and amino acids associated with risk of type 2 diabetes in an Asian population. A subset of these modestly improved the prediction of diabetes when added to an established diabetes risk model.
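The AUC gain from adding metabolites to an established risk model can be illustrated with a nested-model comparison. Logistic regression on synthetic data stands in for the Cox/ARIC setup here, and the two "metabolite" columns and all coefficients are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 2000

# Synthetic stand-ins: three clinical risk factors plus two metabolite
# markers (e.g., branched-chain amino acids, alanine/glycine ratio).
clinical = rng.normal(0, 1, (n, 3))
bcaa = rng.normal(0, 1, n)
ala_gly = rng.normal(0, 1, n)

logit = clinical @ np.array([0.8, 0.5, 0.3]) + 0.5 * bcaa + 0.5 * ala_gly - 2.0
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X_base = clinical
X_full = np.column_stack([clinical, bcaa, ala_gly])
idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.3, random_state=0)

def oos_auc(X):
    """Out-of-sample AUC for a logistic model on one feature set."""
    m = LogisticRegression().fit(X[idx_tr], y[idx_tr])
    return roc_auc_score(y[idx_te], m.predict_proba(X[idx_te])[:, 1])

auc_base, auc_full = oos_auc(X_base), oos_auc(X_full)
print(f"base AUC={auc_base:.3f}, base+metabolites AUC={auc_full:.3f}")
```

As in the study, the incremental AUC from the extra markers is real but small relative to the base model's discrimination.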


Author(s):  
David Easley ◽  
Marcos López de Prado ◽  
Maureen O’Hara ◽  
Zhibai Zhang

Abstract Understanding modern market microstructure phenomena requires large amounts of data and advanced mathematical tools. We demonstrate how machine learning can be applied to microstructural research. We find that microstructure measures continue to provide insights into the price process in current complex markets. Some microstructure features with high explanatory power exhibit low predictive power, while others with less explanatory power have more predictive power. We find that some microstructure-based measures are useful for out-of-sample prediction of various market statistics, leading to questions about market efficiency. We also show how microstructure measures can have important cross-asset effects. Our results are derived using 87 liquid futures contracts across all asset classes.


2017
Vol 11 (2)
pp. 390-411
Author(s):  
Feng Liu ◽  
David Pitt

Abstract In this paper we analyse insurance claim frequency data using the bivariate negative binomial regression (BNBR) model. We use general insurance data on claims from simple third-party liability insurance and comprehensive insurance. We find that bivariate regression, with its capacity for modelling correlation between the two observed claim counts, provides both a superior fit and superior out-of-sample prediction compared with the more common practice of fitting univariate negative binomial regression models separately to each claim type. Noting the complexity of BNBR models and their potential for a large number of parameters, we explore the use of model shrinkage methodology, namely the least absolute shrinkage and selection operator (Lasso) and ridge regression. We find that models estimated using shrinkage methods outperform the ordinary likelihood-based models when used to make out-of-sample predictions. We find that the Lasso performs better than ridge regression as a method of shrinkage.
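The shrinkage comparison can be sketched with scikit-learn. A Gaussian linear model with a sparse coefficient vector stands in for the bivariate negative binomial setting (scikit-learn has no L1-penalized negative binomial regression), and the penalty strengths are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)

# Many candidate rating factors, few truly relevant -- the sparse setting
# where the Lasso's variable selection tends to pay off out-of-sample.
n, p = 200, 40
X = rng.normal(0, 1, (n, p))
beta = np.zeros(p)
beta[:4] = [1.5, -1.0, 0.8, 0.6]  # only four real effects
y = X @ beta + rng.normal(0, 1.0, n)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)

def oos_mse(model):
    """Out-of-sample mean squared error for one estimator."""
    return mean_squared_error(yte, model.fit(Xtr, ytr).predict(Xte))

mse_ols = oos_mse(LinearRegression())      # unpenalized likelihood analogue
mse_ridge = oos_mse(Ridge(alpha=5.0))      # L2 shrinkage
mse_lasso = oos_mse(Lasso(alpha=0.05))     # L1 shrinkage + selection
print(mse_ols, mse_ridge, mse_lasso)
```

With many weak or null predictors, shrinkage trades a little bias for a large variance reduction, which is the mechanism behind the paper's finding that penalized fits predict better out-of-sample.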

