Multicenter validation of a machine learning algorithm for 48 hour all-cause mortality prediction
Abstract

Purpose: This study evaluates a machine-learning-based mortality prediction tool.

Materials and Methods: We conducted a retrospective study with data drawn from three academic health centers. Inpatients aged 18 years or older with at least one observation of each vital sign were included. Predictions were made at 12, 24, and 48 hours before death. Models fit to training data from each institution were evaluated on hold-out test data from the same institution and on data from the remaining institutions. Predictions were compared to those of qSOFA and MEWS using area under the receiver operating characteristic curve (AUROC).

Results: For training and testing on data from a single institution, machine learning predictions averaged AUROCs of 0.97, 0.96, and 0.95 across institutional test sets for 12-, 24-, and 48-hour predictions, respectively. When trained and tested on data from different hospitals, the algorithm achieved AUROCs of up to 0.95, 0.93, and 0.91 for 12-, 24-, and 48-hour predictions, respectively. MEWS and qSOFA had average 48-hour AUROCs of 0.86 and 0.82, respectively.

Conclusion: This algorithm may help identify patients in need of increased levels of clinical care.
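As an illustration of the evaluation metric named in the abstract, the following minimal sketch computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and scores a patient with the standard qSOFA criteria (respiratory rate of at least 22 breaths/min, systolic blood pressure of at most 100 mmHg, Glasgow Coma Scale below 15). It is not the study's code: the function names `auroc` and `qsofa` and the toy labels and scores are assumptions made for this example only.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive case has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def qsofa(resp_rate, systolic_bp, gcs):
    """qSOFA score (0 to 3): one point per criterion met."""
    return int(resp_rate >= 22) + int(systolic_bp <= 100) + int(gcs < 15)


# Toy data, invented for illustration (1 = died within the prediction window).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

The pairwise form is quadratic in the number of patients, which is fine for a small illustration; for data sets of realistic size, a rank-based computation or an established library routine would be the more practical choice.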