Application of Machine Learning Algorithms to Depression Screening and Attempt at Pattern Extraction of Patient-Reported Outcomes that Negatively Affect Classification Accuracy (Preprint)
BACKGROUND Smartphone applications have recently been adopted as a breakthrough technology for monitoring mental health conditions in cancer outpatient settings. However, collecting electronic patient-reported outcomes (ePROs) on mental conditions through smartphone applications raises new concerns, including the accuracy of depression screening. Research on improving depression-screening performance is therefore needed.

OBJECTIVE This study aims (1) to test whether deep-learning-based algorithms can overcome the limitations of traditional statistical methods in depression-screening accuracy and (2) to explore ePRO patterns that adversely affect depression-screening accuracy.

METHODS A feedforward neural network was used as the deep-learning-based algorithm, and a random intercept logistic regression was used as the traditional statistical method. To identify ePRO patterns that reduce model accuracy, the effects of mental fluctuations, missing data, and the compounding effects of mental fluctuations and missing data were tested. Algorithm performance and the effects of the ePRO patterns were assessed with a receiver operating characteristic (ROC) comparison test.

RESULTS The deep-learning-based models outperformed the traditional statistical approach. Mental fluctuations significantly reduced the accuracy of the depression-screening models, whereas only a weak association was found between ePRO omissions and screening accuracy. The compounding effects of mental fluctuations and missing data also had a statistically significant negative effect on depression-screening accuracy.

CONCLUSIONS Although well-trained deep-learning-based models exhibit excellent performance, they still have limitations. Data quality is therefore critical when predicting health outcomes from data that are difficult to quantify, such as mental conditions.
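The abstract does not name the specific ROC comparison test. A common choice when two models are scored on the same patients is DeLong's test for correlated areas under the curve (AUCs), so the Python sketch below assumes that test. The function names (`delong_test`, `_placements`, `_cov`, `_psi`) and the toy scores are hypothetical. The sketch also treats every observation as independent, which may not hold for repeated ePRO entries from the same patient (the within-patient clustering that a random intercept model is meant to address).

```python
import math
from typing import List, Sequence, Tuple

# Assumption: DeLong's test stands in for the unnamed "ROC comparison test".
# Assumption: every observation is treated as independent (no patient clustering).


def _psi(pos_score: float, neg_score: float) -> float:
    """Mann-Whitney kernel: 1 if the positive case outranks the negative, 0.5 on a tie."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def _placements(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """DeLong structural components: V10 (one per positive case) and V01 (one per negative case)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two positive and two negative cases")
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Unbiased sample covariance (divides by n - 1)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def delong_test(scores_a: Sequence[float], scores_b: Sequence[float],
                labels: Sequence[int]) -> Tuple[float, float, float, float]:
    """Two-sided DeLong test of H0: AUC_A == AUC_B on the same cases.

    Returns (auc_a, auc_b, z, p_value).
    """
    v10_a, v01_a = _placements(scores_a, labels)
    v10_b, v01_b = _placements(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    auc_a = sum(v10_a) / m  # the AUC equals the mean placement value
    auc_b = sum(v10_b) / m
    # Var(AUC_A - AUC_B) = Var_A + Var_B - 2 Cov_AB, summed over positive and negative components
    var_diff = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n
    )
    if var_diff <= 0:
        if auc_a == auc_b:
            return auc_a, auc_b, 0.0, 1.0
        raise ValueError("degenerate variance; the test is undefined for these scores")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return auc_a, auc_b, z, p_value


# Hypothetical toy data: 1 = depressed, 0 = not depressed.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
nn_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # e.g. feedforward network outputs
lr_scores = [0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.1]  # e.g. logistic regression outputs
auc_nn, auc_lr, z, p = delong_test(nn_scores, lr_scores, labels)
print(f"AUC NN={auc_nn:.3f}, AUC LR={auc_lr:.3f}, z={z:.2f}, p={p:.3f}")
```

Because both models score the same cases, their AUC estimates are correlated. The covariance term is what separates this paired comparison from two independent AUC tests. With only eight toy cases, even a large AUC gap may not reach conventional significance.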