Prediction of the 1-Year Risk of Incident Lung Cancer: Prospective Study Using Electronic Health Records from the State of Maine (Preprint)

2018 ◽  
Author(s):  
Xiaofang Wang ◽  
Yan Zhang ◽  
Shiying Hao ◽  
Le Zheng ◽  
Jiayu Liao ◽  
...  

BACKGROUND Lung cancer is the leading cause of cancer death worldwide. Early detection of individuals at risk of lung cancer is critical to reduce the mortality rate. OBJECTIVE The aim of this study was to develop and validate a prospective risk prediction model to identify patients at risk of new incident lung cancer within the next 1 year in the general population. METHODS Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. The study population consisted of patients with at least one EHR between April 1, 2016, and March 31, 2018, who had no history of lung cancer. A retrospective cohort (N=873,598) and a prospective cohort (N=836,659) were formed for model construction and validation. An Extreme Gradient Boosting (XGBoost) algorithm was adopted to build the model. It assigned a score to each individual to quantify the probability of a new incident lung cancer diagnosis from October 1, 2016, to September 30, 2017. The model was trained with the clinical profile in the retrospective cohort from the preceding 6 months and validated with the prospective cohort to predict the risk of incident lung cancer from April 1, 2017, to March 31, 2018. RESULTS The model had an area under the curve (AUC) of 0.881 (95% CI 0.873-0.889) in the prospective cohort. Two thresholds of 0.0045 and 0.01 were applied to the predictive scores to stratify the population into low-, medium-, and high-risk categories. The incidence of lung cancer in the high-risk category (579/53,922, 1.07%) was 7.7 times higher than that in the overall cohort (1167/836,659, 0.14%). Age, a history of pulmonary diseases and other chronic diseases, medications for mental disorders, and social disparities were found to be associated with new incident lung cancer. CONCLUSIONS We retrospectively developed and prospectively validated an accurate risk prediction model of new incident lung cancer occurring in the next 1 year. 
Through statistical learning from the statewide EHR data in the preceding 6 months, our model was able to identify statewide high-risk patients, which will benefit the population health through establishment of preventive interventions or more intensive surveillance.
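
The stratification and the 7.7-fold figure in this abstract can be checked with a short calculation. The following is a minimal sketch, not the authors' code: the helper names `risk_category` and `incidence` are hypothetical, and assigning a score that falls exactly on a threshold to the higher category is an assumption. Only the two thresholds and the case counts come from the abstract.

```python
from bisect import bisect_right

# Thresholds reported in the abstract. How scores exactly equal to a
# threshold are handled is NOT stated there; here they go to the higher band.
THRESHOLDS = [0.0045, 0.01]
LABELS = ["low", "medium", "high"]


def risk_category(score: float) -> str:
    """Map a predictive score to a risk category (hypothetical helper)."""
    return LABELS[bisect_right(THRESHOLDS, score)]


def incidence(cases: int, population: int) -> float:
    """Fraction of the population with a new diagnosis."""
    return cases / population


# Counts taken from the RESULTS section of the abstract.
high_risk = incidence(579, 53_922)
overall = incidence(1_167, 836_659)
ratio = high_risk / overall

print(risk_category(0.002), risk_category(0.007), risk_category(0.02))
print(f"high-risk {high_risk:.2%}, overall {overall:.2%}, ratio {ratio:.1f}")
```

Note that the ratio is computed from the unrounded incidences; dividing the rounded percentages (1.07 / 0.14) gives a slightly different value.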

2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Xifeng Wu ◽  
Chi Pang Wen ◽  
Yuanqing Ye ◽  
MinKwang Tsai ◽  
Christopher Wen ◽  
...  

Abstract The objective of this study was to develop markedly improved risk prediction models for lung cancer using a prospective cohort of 395,875 participants in Taiwan. Discriminatory accuracy was measured by generating receiver operating characteristic (ROC) curves and estimating the area under the curve (AUC). In multivariate Cox regression analysis, age, gender, smoking pack-years, family history of lung cancer, personal cancer history, BMI, lung function test, and serum biomarkers such as carcinoembryonic antigen (CEA), bilirubin, alpha fetoprotein (AFP), and C-reactive protein (CRP) were identified and included in an integrative risk prediction model. The AUC in the overall population was 0.851 (95% CI = 0.840–0.862), with never smokers 0.806 (95% CI = 0.790–0.819), light smokers 0.847 (95% CI = 0.824–0.871), and heavy smokers 0.732 (95% CI = 0.708–0.752). By integrating risk factors such as family history of lung cancer, CEA, and AFP for light smokers, and lung function test (Maximum Mid-Expiratory Flow, MMEF25–75%), AFP, and CEA for never smokers, light and never smokers with cancer risks as high as those of heavy smokers could be identified. The risk model for heavy smokers allows heavy smokers to be stratified into subgroups with distinct risks, which, if applied to low-dose computed tomography (LDCT) screening, may greatly reduce false positives.
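
For readers unfamiliar with the AUC figures quoted above: the AUC equals the probability that a randomly chosen case receives a higher score than a randomly chosen non-case, with ties counted as one half. The sketch below illustrates that definition on invented toy data. It is not the study's procedure (the abstract says only that ROC curves were generated), the function name `auc` is hypothetical, and the pairwise loop is quadratic, so it suits small examples only.

```python
def auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration (not from the study).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

print(auc(toy_scores, toy_labels))
```

In the toy data, seven of the nine case/non-case pairs are ranked correctly, so the function returns 7/9.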


2021 ◽  
Vol 12 ◽  
Author(s):  
Carolina Varela Rodríguez ◽  
Francisco Arias Horcajadas ◽  
Cristina Martín-Arriscado Arroba ◽  
Carolina Combarro Ripoll ◽  
Alba Juanes Gonzalez ◽  
...  

Patients with an alcohol abuse disorder exhibit several medical characteristics and social determinants that suggest a greater vulnerability to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and a worse course of coronavirus disease 2019 (COVID-19) once infected. During the first wave of COVID-19, most countries registered an increase in alcohol consumption. However, studies on the impact of alcohol addiction on the risk of COVID-19 infection are very scarce and inconclusive. This research offers a descriptive observational retrospective cohort study using real-world data obtained from Electronic Health Records. We found that patients with a personal history of alcohol abuse were 8% more likely to extend their hospitalization length of stay by 1 day (95% CI = 1.04–1.12) and 15% more likely to extend their Intensive Care Unit (ICU) length of stay (95% CI = 1.01–1.30). They were also 5.47 times more at risk of needing an ICU admission (95% CI = 1.61–18.57) and 3.54 times (95% CI = 1.51–8.30) more at risk of needing a respirator. Regarding COVID-19 symptoms, patients with a personal history of alcohol abuse were 91% more likely to exhibit dyspnea (95% CI = 1.03–3.55) and 3.15 times more at risk of showing at least one neuropsychiatric symptom (95% CI = 1.61–6.17). In addition, they showed statistically significant differences in the number of neuropsychiatric symptoms developed during the COVID-19 infection. Therefore, we strongly recommend warning of the negative consequences of alcohol abuse on COVID-19 complications. For this purpose, clinicians should systematically assess history of alcohol issues and drinking habits in all patients, especially those who seek medical advice regarding COVID-19 infection, in order to predict the severity of symptoms and potential complications. 
Moreover, this information should be recorded in a structured field of the Electronic Health Record to facilitate automatic, real-time extraction of data useful for evaluating the decision-making process in a dynamic context.
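
The abstract mixes "X% more likely" and "X times more at risk" phrasing while giving every interval on the ratio scale. As a reading aid, the sketch below relates the two. It assumes the intervals are symmetric on the log scale, as Wald-type intervals for odds or hazard ratios are; the abstract does not state how they were computed. Under that assumption the point estimate is the geometric mean of the two bounds. The function names and dictionary labels are hypothetical.

```python
import math


def point_estimate(lower: float, upper: float) -> float:
    """Geometric mean of CI bounds (assumes a log-symmetric interval)."""
    return math.sqrt(lower * upper)


def percent_more_likely(ratio: float) -> float:
    """Express a ratio as a percentage increase, e.g. 1.08 -> 8%."""
    return (ratio - 1.0) * 100.0


# 95% CI bounds copied from the abstract; labels are shorthand.
reported = {
    "hospital stay": (1.04, 1.12),
    "ICU stay": (1.01, 1.30),
    "ICU admission": (1.61, 18.57),
    "respirator": (1.51, 8.30),
    "dyspnea": (1.03, 3.55),
    "neuropsychiatric symptom": (1.61, 6.17),
}

for name, (lower, upper) in reported.items():
    est = point_estimate(lower, upper)
    print(f"{name}: ratio {est:.2f} ({percent_more_likely(est):+.0f}%)")
```

The recovered ratios (1.08, 1.15, 5.47, 3.54, 1.91, 3.15) match the effect sizes quoted in the text, which is consistent with the log-symmetry assumption but does not confirm the underlying method.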


2017 ◽  
Author(s):  
Chengyin Ye ◽  
Tianyun Fu ◽  
Shiying Hao ◽  
Yan Zhang ◽  
Oliver Wang ◽  
...  

BACKGROUND As a high-prevalence health condition, hypertension is clinically costly, difficult to manage, and often leads to severe and life-threatening diseases such as cardiovascular disease (CVD) and stroke. OBJECTIVE The aim of this study was to develop and validate prospectively a risk prediction model of incident essential hypertension within the following year. METHODS Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. Retrospective (N=823,627, calendar year 2013) and prospective (N=680,810, calendar year 2014) cohorts were formed. A machine learning algorithm, XGBoost, was adopted in the process of feature selection and model building. It generated an ensemble of classification trees and assigned a final predictive risk score to each individual. RESULTS The 1-year incident hypertension risk model attained areas under the curve (AUCs) of 0.917 and 0.870 in the retrospective and prospective cohorts, respectively. Risk scores were calculated and stratified into five risk categories, with 4526 out of 381,544 patients (1.19%) in the lowest risk category (score 0-0.05) and 21,050 out of 41,329 patients (50.93%) in the highest risk category (score 0.4-1) receiving a diagnosis of incident hypertension in the following 1 year. Type 2 diabetes, lipid disorders, CVDs, mental illness, clinical utilization indicators, and socioeconomic determinants were recognized as driving or associated features of incident essential hypertension. The very high risk population mainly comprised elderly (age>50 years) individuals with multiple chronic conditions, especially those receiving medications for mental disorders. Disparities were also found in social determinants, including some community-level factors associated with higher risk and others that were protective against hypertension. 
CONCLUSIONS With statewide EHR datasets, our study prospectively validated an accurate 1-year risk prediction model for incident essential hypertension. Our real-time predictive analytic model has been deployed in the state of Maine, providing implications in interventions for hypertension and related diseases and hopefully enhancing hypertension care.
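
The two incidence figures quoted for the extreme risk categories follow directly from the reported counts. The sketch below recomputes them; the `RiskBand` class is a hypothetical container. Only the lowest and highest of the five categories are included, because the abstract gives neither the score boundaries nor the counts for the three intermediate ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskBand:
    """One risk category with its reported counts (hypothetical container)."""
    label: str
    score_range: tuple  # (lower, upper) score bounds as given in the abstract
    cases: int          # patients diagnosed within the following year
    patients: int       # all patients assigned to this category

    @property
    def incidence(self) -> float:
        return self.cases / self.patients


# Counts and score ranges copied from the RESULTS section.
lowest = RiskBand("lowest", (0.0, 0.05), 4_526, 381_544)
highest = RiskBand("highest", (0.4, 1.0), 21_050, 41_329)

for band in (lowest, highest):
    print(f"{band.label} {band.score_range}: {band.incidence:.2%}")
```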


Author(s):  
Sunitha .T ◽  
Shyamala .J ◽  
Annie Jesus Suganthi Rani.A

Data mining offers an innovative way to build models that predict patients' health risks. The large volume of Electronic Health Records (EHRs) collected over the years provides a rich base for risk analysis and prediction. An EHR contains digitally stored healthcare information about an individual, such as observations, laboratory tests, diagnostic reports, medications, procedures, patient-identifying information, and allergies. A special type of EHR is the Health Examination Record (HER) from annual general health check-ups. Identifying participants at risk based on their current and past HERs is important for early warning and preventive intervention. By "risk", we mean unwanted outcomes such as mortality and morbidity. This approach is limited by the classification problem and is consequently not informative about the specific disease area in which a person is at risk. The limited amount of data extracted from a health record is not sufficient for accurate risk prediction. The main aim of this project is risk prediction that classifies a progressively developing condition when the majority of the data are unlabeled.


CHEST Journal ◽  
2019 ◽  
Vol 156 (1) ◽  
pp. 112-119 ◽  
Author(s):  
Heber MacMahon ◽  
Feng Li ◽  
Yulei Jiang ◽  
Samuel G. Armato
