Exploring Patient Needs in Online Health Communities Using Text Mining--Taking Diabetes and Depression as Examples (Preprint)
BACKGROUND An online health community (OHC) is a forum where patients, their family members, doctors, and caregivers communicate with each other. Patients who participate in OHCs can obtain benefits for disease treatment and health management, so identifying the categories of patient needs and how those needs are satisfied is important for developing theories of patient demand and for building such communities.
OBJECTIVE This study aimed to (1) explore the needs of patients in the Internet environment; (2) identify the similarities and differences in patient needs among OHCs of different types and concerning different diseases; and (3) propose a method for automatically identifying patient demands in Internet environments.
METHODS This study combined manual annotation with computer-aided methods to analyze 9936 posts collected from four OHCs in China. First, to address the first two objectives, we recruited 7 medical experts in diabetes or depression to label the posts according to a theoretical framework, which yielded a theory of patient needs in Internet environments. Second, to address the third objective, we used Natural Language Processing (NLP) and Machine Learning (ML) to train a model on the manually annotated corpus for automatically identifying patient demands.
RESULTS Approximately 91% of the posts in OHCs were related to patient needs. The three most prevalent categories were Socialization (44%), Information (28%), and Emotional Support (18%). However, when OHCs were divided according to user composition and disease type, patient needs varied: Socialization was the chief demand in Patient Interaction OHCs (65%), Diabetes OHCs (50%), and Depression OHCs (69%), whereas Information (96%) was the chief demand in Patient-Doctor Interaction OHCs. A model trained to identify patient needs, with Linguistic Features (LF) and Category Keyword Features (CKF) as input and Random Forest as the classifier, achieved an F1 value higher than 0.80 on the test set.
CONCLUSIONS Patient needs in the Internet environment mainly comprise Emotional Support, Information, and Socialization needs. Differences in community type and disease type can lead to different patient needs in OHCs. Computer-aided methods offer a practical way to identify patient needs in OHCs automatically.
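To make the Category Keyword Features and the F1 value mentioned in the results more concrete, the following is a minimal sketch and not the study's implementation. The keyword lists, the function names `keyword_features` and `f1_score`, and the whitespace tokenization of English text are all assumptions made for illustration. The posts in the study came from OHCs in China and would presumably need a word segmenter instead. The Random Forest classifier itself is not reproduced here.

```python
# Hypothetical keyword lists per need category (illustrative only; the
# keywords actually used in the study are not given in the abstract).
CATEGORY_KEYWORDS = {
    "Information": {"dose", "insulin", "symptom", "medication"},
    "Emotional Support": {"afraid", "hopeless", "encourage", "lonely"},
    "Socialization": {"hello", "anyone", "share", "chat"},
}


def keyword_features(post):
    """Count, for each category, how many tokens of the post are keywords."""
    # Assumption: lowercase whitespace tokenization with basic punctuation stripping.
    tokens = [tok.strip(".,!?") for tok in post.lower().split()]
    return {
        category: sum(tok in keywords for tok in tokens)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }


def f1_score(y_true, y_pred, positive):
    """F1 for one category: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(keyword_features("What insulin dose should I take?"))
    print(f1_score(["A", "A", "B", "B"], ["A", "B", "B", "B"], positive="A"))
```

In a full pipeline, count vectors like these would be concatenated with the linguistic features and passed to a classifier, and the per-category F1 values would be computed on a held-out test set.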