Federated Learning on Clinical Benchmark Data: Performance Assessment

10.2196/20891 ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. e20891
Author(s):  
Geun Hyeong Lee ◽  
Soo-Yong Shin

Background Federated learning (FL) is a newly proposed machine-learning method that uses a decentralized dataset. Since data transfer is not necessary for the learning process in FL, there is a significant advantage in protecting personal privacy. Therefore, many studies are being actively conducted in the applications of FL for diverse areas. Objective The aim of this study was to evaluate the reliability and performance of FL using three benchmark datasets, including a clinical benchmark dataset. Methods To evaluate FL in a realistic setting, we implemented FL using a client-server architecture with Python. The implemented client-server version of the FL software was deployed to Amazon Web Services. Modified National Institute of Standards and Technology (MNIST), Medical Information Mart for Intensive Care-III (MIMIC-III), and electrocardiogram (ECG) datasets were used to evaluate the performance of FL. To test FL in a realistic setting, the MNIST dataset was split into 10 different clients, with one digit for each client. In addition, we conducted four different experiments according to basic, imbalanced, skewed, and a combination of imbalanced and skewed data distributions. We also compared the performance of FL to that of the state-of-the-art method with respect to in-hospital mortality using the MIMIC-III dataset. Likewise, we conducted experiments comparing basic and imbalanced data distributions using MIMIC-III and ECG data. Results FL on the basic MNIST dataset with 10 clients achieved an area under the receiver operating characteristic curve (AUROC) of 0.997 and an F1-score of 0.946. The experiment with the imbalanced MNIST dataset achieved an AUROC of 0.995 and an F1-score of 0.921. The experiment with the skewed MNIST dataset achieved an AUROC of 0.992 and an F1-score of 0.905. Finally, the combined imbalanced and skewed experiment achieved an AUROC of 0.990 and an F1-score of 0.891. 
The basic FL on in-hospital mortality using MIMIC-III data achieved an AUROC of 0.850 and an F1-score of 0.944, while the experiment with the imbalanced MIMIC-III dataset achieved an AUROC of 0.850 and an F1-score of 0.943. For ECG classification, the basic FL achieved an AUROC of 0.938 and an F1-score of 0.807, and the imbalanced ECG dataset achieved an AUROC of 0.943 and an F1-score of 0.807. Conclusions FL demonstrated comparable performance on different benchmark datasets. In addition, FL demonstrated reliable performance in cases where the distribution was imbalanced, skewed, and extreme, reflecting the real-life scenario in which data distributions from various hospitals are different. FL can achieve high performance while maintaining privacy protection because there is no requirement to centralize the data.
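
The abstract does not say how the server combines client updates. As a minimal sketch, assuming the common federated averaging (FedAvg) rule, the server could average each client's parameters weighted by its local sample count, which is one way imbalanced client sizes enter the aggregation. The function name `federated_average` and the toy weights are hypothetical and are not taken from the authors' software.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average per-client parameter lists, weighted by local sample counts.

    client_weights: one list of NumPy arrays (one array per layer) per client.
    client_sizes: number of local training samples held by each client.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        # Weighted sum of this layer's parameters across all clients.
        layer_sum = sum(
            (size / total) * weights[layer]
            for weights, size in zip(client_weights, client_sizes)
        )
        averaged.append(layer_sum)
    return averaged

# Toy example (assumed values): three clients with imbalanced data sizes,
# each holding a single 2x2 parameter matrix.
clients = [[np.full((2, 2), 1.0)], [np.full((2, 2), 2.0)], [np.full((2, 2), 4.0)]]
sizes = [100, 300, 600]
global_weights = federated_average(clients, sizes)
```

Only parameters and sample counts cross the network in this scheme; the raw records stay with each client, which is the privacy property the abstract emphasizes.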


2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Yinlong Ren ◽  
Luming Zhang ◽  
Fengshuo Xu ◽  
Didi Han ◽  
Shuai Zheng ◽  
...  

Abstract Background Lung infection is a common cause of sepsis, and patients with sepsis and lung infection are more ill and have a higher mortality rate than sepsis patients without lung infection. We constructed a nomogram prediction model to accurately evaluate the prognosis of and provide treatment advice for patients with sepsis and lung infection. Methods Data were retrospectively extracted from the Medical Information Mart for Intensive Care (MIMIC-III) open-source clinical database. The definition of Sepsis 3.0 [10] was used, which includes patients with life-threatening organ dysfunction caused by an uncontrolled host response to infection, and SOFA score ≥ 2. The nomogram prediction model was constructed from the training set using logistic regression analysis, and was then internally validated and underwent sensitivity analysis. Results The risk factors of age, lactate, temperature, oxygenation index, BUN, Glasgow Coma Score (GCS), liver disease, cancer, organ transplantation, troponin T (TnT), neutrophil-to-lymphocyte ratio (NLR), and CRRT, MV, and vasopressor use were included in the nomogram. Compared with the Sequential Organ Failure Assessment (SOFA) score and Simplified Acute Physiology Score II (SAPSII), the nomogram had better discrimination ability, with areas under the receiver operating characteristic curve (AUROC) of 0.743 (95% CI: 0.713–0.773) and 0.746 (95% CI: 0.699–0.790) in the training and validation sets, respectively. The calibration plot indicated that the nomogram was adequate for predicting the in-hospital mortality risk in both sets. The decision-curve analysis (DCA) of the nomogram revealed that it provided net benefits for clinical use over using the SOFA score and SAPSII in both sets. Conclusion Our new nomogram is a convenient tool for accurate predictions of in-hospital mortality among ICU patients with sepsis and lung infection. 
Treatment strategies that improve the factors considered relevant in the model could increase in-hospital survival for these ICU patients.
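
The AUROC values quoted in these abstracts can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The sketch below computes that quantity directly from pairs, counting ties as one half. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data from any of the studies.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the outcome of interest (e.g., in-hospital death), else 0.
    scores: predicted risk for each patient; higher means higher risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Toy data (assumed): two deaths and three survivors.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]
result = auroc(labels, scores)
```

The pairwise loop is quadratic in the number of patients, so it suits small illustrations; rank-based formulas give the same value more efficiently on large cohorts.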


2021 ◽  
Author(s):  
Aiming Zhou ◽  
Shanshan Wu ◽  
Qin Chen ◽  
Lili Chen ◽  
Jingye Pan

Abstract Thrombocytopenia is common among sepsis patients. Platelet transfusion is frequently administered to increase platelet counts, but its clinical impact remains unclear in sepsis-induced thrombocytopenia. The goal of this study was to explore the association between platelet transfusion and mortality in patients with sepsis-induced thrombocytopenia based on the Medical Information Mart for Intensive Care (MIMIC) III database. In this study, we included 1733 patients with sepsis-induced thrombocytopenia, and these patients were divided into two groups: platelet transfusion group (PT group) and no platelet transfusion group (NPT group). Propensity-score matching was used to reduce the imbalance between the groups. We found that patients in the PT group had higher in-hospital mortality than those in the NPT group. Furthermore, in the subgroups defined by age (>60 years), gender (female), sequential organ failure assessment score (≤8), simplified acute physiology score (≤47), platelet count (>27/nL), and congestive heart failure, platelet transfusion was associated with increased in-hospital mortality. However, there was no significant difference in 90-day mortality or length of ICU stay (LOS-ICU) between the two groups. All these results remained stable after adjustment for confounders and in the comparisons after propensity score matching. In conclusion, platelet transfusion was associated with increased in-hospital mortality in patients with sepsis-induced thrombocytopenia.
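
The abstract does not describe the matching procedure in detail. One common variant, shown here only as a sketch under assumptions, is greedy 1:1 nearest-neighbor matching without replacement on a precomputed propensity score, with a caliper that rejects poor matches. The function name `match_on_propensity`, the caliper of 0.05, and the patient identifiers are hypothetical.

```python
def match_on_propensity(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping patient id -> propensity score.
    Returns a list of (treated_id, control_id) pairs within the caliper.
    """
    available = dict(controls)  # copy, so the caller's dict is not modified
    pairs = []
    # Process treated patients in order of increasing propensity score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs

# Toy scores (assumed): T3 has no control within the caliper and stays unmatched.
treated = {"T1": 0.62, "T2": 0.35, "T3": 0.90}
controls = {"C1": 0.60, "C2": 0.33, "C3": 0.10, "C4": 0.58}
pairs = match_on_propensity(treated, controls)
```

Outcomes such as in-hospital mortality would then be compared only within the matched pairs, which is why unmatched patients like T3 drop out of the post-matching analysis.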


2021 ◽  
Author(s):  
Yinlong Ren ◽  
Luming Zhang ◽  
Fengshuo Xu ◽  
Didi Han ◽  
Shuai Zheng ◽  
...  

Abstract Background Lung infection is a common cause of sepsis, and patients with sepsis and lung infection are more ill and have a higher mortality rate than sepsis patients without lung infection. We constructed a nomogram prediction model to accurately evaluate the prognosis of and provide treatment advice for patients with sepsis and lung infection. Methods Data were retrospectively extracted from the Medical Information Mart for Intensive Care (MIMIC-III) open-source clinical database. The definition of Sepsis 3.0 [10] was used, which includes patients with life-threatening organ dysfunction caused by an uncontrolled host response to infection, and SOFA score ≥ 2. The nomogram prediction model was constructed from the training set using logistic regression analysis, and was then internally validated and underwent sensitivity analysis. Results The risk factors of age, lactate, temperature, oxygenation index, BUN, Glasgow Coma Score (GCS), liver disease, cancer, organ transplantation, troponin T (TnT), neutrophil-to-lymphocyte ratio (NLR), and CRRT, MV, and vasopressor use were included in the nomogram. Compared with the Sequential Organ Failure Assessment (SOFA) score and Simplified Acute Physiology Score II (SAPSII), the nomogram had better discrimination ability, with areas under the receiver operating characteristic curve (AUROC) of 0.743 (95% confidence interval [95% CI]=0.713–0.773, p < 0.001) and 0.746 (95% CI=0.699–0.790, p < 0.001) in the training and validation sets, respectively. The calibration plot indicated that the nomogram was adequate for predicting the in-hospital mortality risk in both sets. The decision-curve analysis (DCA) of the nomogram revealed that it provided net benefits for clinical use over using the SOFA score and SAPSII in both sets. Conclusion Our new nomogram is a convenient tool for accurate predictions of in-hospital mortality among patients with sepsis and lung infection. 
Treatment strategies that improve the factors considered relevant in the model could increase in-hospital survival for these ICU patients.


Author(s):  
Xihua Huang ◽  
Zhenyu Liang ◽  
Tang Li ◽  
Yu Lingna ◽  
Wei Zhu ◽  
...  

Abstract Background To explore the influencing factors for in-hospital mortality in the neonatal intensive care unit (NICU) and to establish a predictive nomogram. Methods Neonatal data were extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) database. Both univariate and multivariate logit binomial general linear models were used to analyse the factors influencing neonatal death. The area under the receiver operating characteristics (ROC) curve was used to assess the predictive model, which was visualized by a nomogram. Results A total of 1258 neonates from the NICU in the MIMIC-III database were eligible for the study, including 1194 surviving patients and 64 deaths. Multivariate analysis showed that red cell distribution width (RDW) (odds ratio [OR] 0.813, p=0.003) and total bilirubin (TBIL; OR 0.644, p<0.001) had protective effects on neonatal in-hospital death, while lymphocytes (OR 1.205, p=0.025), arterial partial pressure of carbon dioxide (PaCO2; OR 1.294, p=0.016) and sequential organ failure assessment (SOFA) score (OR 1.483, p<0.001) were its independent risk factors. Based on this, the area under the curve of this predictive model was up to 0.865 (95% confidence interval 0.813 to 0.917), which was also confirmed by a nomogram. Conclusions The nomogram constructed suggests that RDW, TBIL, lymphocytes, PaCO2 and SOFA score are all significant predictors for in-hospital mortality in the NICU.


2020 ◽  
Author(s):  
Joo Heung Yoon ◽  
Vincent Jeanselme ◽  
Artur Dubrawski ◽  
Marilyn Hravnak ◽  
Michael R. Pinsky ◽  
...  

Abstract Background. Even brief hypotension is associated with increased morbidity and mortality. We developed a machine learning model to predict the initial hypotension event among intensive care unit (ICU) patients, and designed an alert system for bedside implementation. Materials and Methods. Minute-by-minute vital signs were extracted from the Medical Information Mart for Intensive Care III (MIMIC-3) dataset. A hypotension event was defined as at least 5 measurements within a 10-minute period of systolic blood pressure ≤ 90 mmHg and mean arterial pressure ≤ 60 mmHg. A random forest (RF) classifier was used to predict hypotension, and performance was measured with area under the receiver operating characteristic curve (AUROC) and area under the precision recall curve (AUPRC). Hypotension alerts were generated using risk score thresholds, then a stacked RF model and a lock-out time were applied for real-life implementation. Results. We identified 1307 subjects (1580 ICU stays) as the case (hypotension) group and 1619 subjects (2279 ICU stays) as the control group. The RF model showed AUROC of 0.93 and 0.88 at 15 and 60 minutes respectively before hypotension, and AUPRC of 0.77 at 60 minutes before. Risk score trajectories revealed 80% and > 60% of cases predicted at 15 and 60 minutes before the hypotension, respectively. The stacked model with 15-minute lock-out produced on average 0.79 alerts/subject/hour (sensitivity 92.4%). Conclusion. Clinically significant hypotension events in the ICU can be predicted at least 1 hour before the initial hypotension episode. Developing a highly sensitive and reliable practical alert system is feasible, with a low rate of alerts.
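
The hypotension definition above lends itself to a sliding-window check. The sketch below assumes one measurement per minute and that both thresholds must hold for the same measurement; neither detail is spelled out in the abstract, and the function name `first_hypotension_onset` and the toy readings are hypothetical.

```python
def first_hypotension_onset(sbp, map_, window=10, min_hits=5):
    """Return the minute index at which the first qualifying window completes.

    sbp, map_: per-minute systolic and mean arterial pressures (mmHg).
    A measurement is "low" when SBP <= 90 and MAP <= 60 at the same minute.
    An event needs at least `min_hits` low measurements within `window` minutes.
    Returns None when no event occurs.
    """
    low = [s <= 90 and m <= 60 for s, m in zip(sbp, map_)]
    hits = 0
    for i, flag in enumerate(low):
        hits += flag
        if i >= window:
            # Drop the measurement that just left the trailing window.
            hits -= low[i - window]
        if hits >= min_hits:
            return i
    return None

# Toy readings (assumed): low measurements at minutes 2, 3, 5, 7, and 8.
sbp = [110, 108, 89, 88, 95, 87, 86, 85, 84, 102]
map_ = [75, 74, 59, 58, 66, 57, 61, 55, 54, 72]
onset = first_hypotension_onset(sbp, map_)
```

Minute 6 in the toy data shows why the conjunction matters: the systolic value is below 90 mmHg but the mean arterial pressure is 61 mmHg, so that measurement does not count toward the event.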


2020 ◽  
Vol 27 (3) ◽  
pp. 407-418 ◽  
Author(s):  
Hannah L Weeks ◽  
Cole Beck ◽  
Elizabeth McNeer ◽  
Michael L Williams ◽  
Cosmin A Bejan ◽  
...  

Abstract Objective We developed medExtractR, a natural language processing system to extract medication information from clinical notes. Using a targeted approach, medExtractR focuses on individual drugs to facilitate creation of medication-specific research datasets from electronic health records. Materials and Methods Written using the R programming language, medExtractR combines lexicon dictionaries and regular expressions to identify relevant medication entities (eg, drug name, strength, frequency). MedExtractR was developed on notes from Vanderbilt University Medical Center, using medications prescribed with varying complexity. We evaluated medExtractR and compared it with 3 existing systems: MedEx, MedXN, and CLAMP (Clinical Language Annotation, Modeling, and Processing). We also demonstrated how medExtractR can be easily tuned for better performance on an outside dataset using the MIMIC-III (Medical Information Mart for Intensive Care III) database. Results On 50 test notes per development drug and 110 test notes for an additional drug, medExtractR achieved high overall performance (F-measures >0.95), exceeding performance of the 3 existing systems across all drugs. MedExtractR achieved the highest F-measure for each individual entity, except drug name and dose amount for allopurinol. With tuning and customization, medExtractR achieved F-measures >0.90 in the MIMIC-III dataset. Discussion The medExtractR system successfully extracted entities for medications of interest. High performance in entity-level extraction provides a strong foundation for developing robust research datasets for pharmacological research. When working with new datasets, medExtractR should be tuned on a small sample of notes before being broadly applied. 
Conclusions The medExtractR system achieved high performance extracting specific medications from clinical text, leading to higher-quality research datasets for drug-related studies than some existing general-purpose medication extraction tools.
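
To illustrate the general idea of combining a drug-name lexicon with regular expressions, here is a small sketch. It is written in Python for consistency with the other examples on this page and is not medExtractR's R implementation or API; the lexicon, the patterns, the 60-character search window, and the function name `extract_medication` are all assumptions made for illustration.

```python
import re

# Hypothetical one-drug lexicon; a targeted extractor would list the drug of interest.
DRUG_NAMES = ["allopurinol"]
# Illustrative patterns only; real notes need far richer expressions.
STRENGTH = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|mcg|g)\b", re.IGNORECASE)
FREQUENCY = re.compile(r"\b(once daily|twice daily|bid|tid|qd|daily)\b", re.IGNORECASE)

def extract_medication(note, drug_names=DRUG_NAMES, window=60):
    """Find each drug mention and look for strength and frequency just after it."""
    results = []
    for name in drug_names:
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, note, re.IGNORECASE):
            # Search only a short span after the drug name to stay on target.
            context = note[match.end(): match.end() + window]
            strength = STRENGTH.search(context)
            frequency = FREQUENCY.search(context)
            results.append({
                "drug": match.group(0).lower(),
                "strength": strength.group(0) if strength else None,
                "frequency": frequency.group(0).lower() if frequency else None,
            })
    return results

# Made-up note text for demonstration.
note = "Continue Allopurinol 300 mg daily. Patient denies missed doses."
entities = extract_medication(note)
```

Restricting the search to a window after the drug mention reflects the targeted approach described in the abstract, and the window size is the kind of parameter one would expect to tune on a small sample of notes from a new dataset.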


2020 ◽  
Author(s):  
Yangjing Xue ◽  
Jinsheng Wang ◽  
Yangpei Peng ◽  
Kaiyu Huang ◽  
Lu Qian ◽  
...  

Abstract Background Although milrinone has been widely used in daily clinical practice, its effect on survival in patients with cardiogenic shock (CS) is not known. The primary purpose of this study was to evaluate the effect of milrinone on in-hospital mortality in a large critical care cohort of patients with CS of various etiologies. Methods Patients with CS were identified from the Medical Information Mart for Intensive Care III (MIMIC-III) database. Propensity score matching (PSM) was used to account for baseline differences in the probability of receiving milrinone. A multivariate Cox regression model was employed to adjust for imbalance by including parameters and potential confounders. Results A total of 1068 critically ill patients with CS were enrolled for this analysis, including 161 in the milrinone group and 907 in the non-milrinone group. The multivariate Cox regression model found that milrinone was associated with significantly decreased in-hospital mortality in critically ill patients with CS (HR 0.61, 95% CI 0.45-0.83; P=0.001). The survival benefit of milrinone persisted in patients without ACS, while it was not statistically significant in the subgroup with ACS (HR 0.66, 95% CI 0.40-1.07; P=0.093). Similar results were replicated after PSM. Conclusions Our study observed that milrinone was associated with improved survival in patients with CS, but it was not associated with improved outcome in patients with ACS. The results need to be verified in randomized controlled trials.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Didi Han ◽  
Fengshuo Xu ◽  
Chengzhuo Li ◽  
Luming Zhang ◽  
Rui Yang ◽  
...  

Background. Severe acute pancreatitis (SAP) can cause various complications. Septic shock is a relatively common and serious complication that causes uncontrolled systemic inflammatory response syndrome, which is one of the main causes of death. This study aimed to develop a nomogram for predicting the overall survival of SAP patients during the initial 24 hours following admission. Materials and Methods. All the data utilized in this study were obtained from the MIMIC-III (Medical Information Mart for Intensive Care III) database. The data were analyzed using multivariate Cox regression, and the performance of the proposed nomogram was evaluated based on Harrell’s concordance index (C-index) and the area under the receiver operating characteristic curve (AUC). The clinical value of the prediction model was tested using decision-curve analysis (DCA). The primary outcomes were 28-day, 60-day, and 90-day mortality rates. Results. The 850 patients included in the analysis comprised 595 in the training cohort and 255 in the validation cohort. The training cohort consisted of 353 (59.3%) males and 242 (40.7%) females with SAP. Multivariate Cox regression showed that weight, sex, insurance status, explicit sepsis, SAPSII score, Elixhauser score, bilirubin, anion gap, creatinine, hematocrit, hemoglobin, RDW, SPO2, and respiratory rate were independent prognostic factors for the survival of SAP patients admitted to an intensive care unit. The predicted values were compared using C-indexes, calibration plots, integrated discrimination improvement, net reclassification improvement, and DCA. Conclusions. We have identified some important demographic and laboratory parameters related to the prognosis of patients with SAP and have used them to establish a more accurate and convenient nomogram for evaluating their 28-day, 60-day, and 90-day mortality rates.
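
Harrell's concordance index, used above to evaluate the nomogram, measures how often the patient with the higher predicted risk is the one who dies earlier. The sketch below is a simplified version: it ignores tied survival times and treats a pair as comparable only when the patient with the shorter follow-up had an observed event. The function name `harrell_c_index` and the toy cohort are assumptions for illustration.

```python
def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data.

    times: follow-up time per patient.
    events: 1 if death was observed at that time, 0 if censored.
    risk_scores: model output; higher means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: patient i died before patient j's follow-up ended.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort (assumed): the second patient is censored at day 10.
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risk = [0.9, 0.4, 0.1, 0.2]
c_index = harrell_c_index(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, the same scale as the AUC that the abstract reports alongside it.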


2021 ◽  
Vol 11 (10) ◽  
pp. 1004
Author(s):  
Hong-Jie Jhou ◽  
Po-Huang Chen ◽  
Li-Yu Yang ◽  
Shu-Hao Chang ◽  
Cho-Hao Lee

We aimed to investigate the association between the plasma anion gap (AG) and in-hospital mortality among patients with acute ischemic stroke (AIS). In total, 1236 AIS patients were enrolled using the Medical Information Mart for Intensive Care Database IV. The primary outcome was in-hospital mortality. The patients were divided into four groups according to AG category. The mean age and Charlson comorbidity index increased as the AG category increased. The fourth AG category was most strongly related to in-hospital mortality (hazard ratio (HR), 95% confidence interval (CI): 2.77, 1.60–4.71), even after adjusting for possible confounding variables (Model 1: HR, 95% CI: 3.37, 1.81–6.09; Model 2: HR, 95% CI: 3.57, 1.91–6.69). Moreover, intensive care unit mortality (p = 0.008) was higher in the highest AG category, but intracranial hemorrhage (p = 0.071) was not associated with the plasma AG. The plasma AG had a satisfactory predictive ability for in-hospital mortality among AIS patients (area under the receiver operating characteristic curve: 0.631). The plasma AG is an independent risk factor that can satisfactorily predict in-hospital mortality among AIS patients.

