Using text mining techniques to identify healthcare providers at risk: an exploratory study (Preprint)
BACKGROUND Regulatory bodies such as healthcare inspectorates can identify risks at healthcare providers by analyzing patient complaints. Text mining techniques (automatic text analysis based on machine learning) might help by identifying specific patterns and signals of quality and safety risks.
OBJECTIVE The aim of this study was to explore whether text mining techniques might be used to identify healthcare providers at risk.
METHODS We performed an exploratory study on a complaints database of the Dutch Health and Youth Care Inspectorate containing more than 22,000 written complaints. We studied a range of supervised machine learning techniques to automatically determine the severity of incoming complaints. We investigated several features based on the complaints' content, including sentiment analysis, to decide which were helpful for severity prediction. Finally, we took the list of healthcare providers and their organization-specific complaints to determine the average severity of complaints per organization. We performed a keyword analysis to give the Inspectorate insight into the patterns and severity per organization.
RESULTS Data preparation and preprocessing were time-consuming one-off costs, mainly because we had to create a safe and efficient digital research environment. A straightforward text classification approach using a bag-of-words feature representation worked best for severity prediction. Sentiment analysis was not helpful for severity prediction. Finally, we produced a list of n-grams for the healthcare providers with the most complaints, to inform the Inspectorate about the word combinations specific to these organizations.
CONCLUSIONS Text mining techniques can support inspectorates with fully automatic analysis of complaints. They can provide insight into patterns, detect possible blind spots, or support the prioritization of follow-up supervision activities by sorting complaints by severity per organization or per sector. An appropriate data science and ICT infrastructure is indispensable for applied text mining.
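As a rough illustration of the bag-of-words representation, the per-organization severity averaging, and the n-gram keyword analysis mentioned above, the following Python sketch may help. It is not the study's implementation: the toy records, the organization names, the 1 to 3 severity scale, and all function names are assumptions made for illustration only, and the supervised severity classifier itself is not reproduced here.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy records: (organization, complaint text, severity label).
# The 1-3 severity scale and every value below are invented for illustration.
complaints = [
    ("org_a", "Patient received the wrong medication dose", 3),
    ("org_a", "Wrong medication dose given again at night", 3),
    ("org_b", "Long waiting time at the reception desk", 1),
    ("org_b", "Waiting time was long and staff were rude", 2),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Bag-of-words representation: token -> count, word order ignored."""
    return Counter(tokenize(text))


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def average_severity(records):
    """Mean severity of the complaints filed against each organization."""
    totals = defaultdict(lambda: [0, 0])  # organization -> [sum, count]
    for org, _text, severity in records:
        totals[org][0] += severity
        totals[org][1] += 1
    return {org: total / count for org, (total, count) in totals.items()}


def top_ngrams(records, n=2, k=3):
    """The k most frequent n-grams in each organization's complaints."""
    counts = defaultdict(Counter)
    for org, text, _severity in records:
        counts[org].update(ngrams(tokenize(text), n))
    return {org: counter.most_common(k) for org, counter in counts.items()}


if __name__ == "__main__":
    averages = average_severity(complaints)
    # Rank organizations from highest to lowest average severity.
    for org in sorted(averages, key=averages.get, reverse=True):
        print(org, averages[org], top_ngrams(complaints)[org])
```

In a real setting the severity labels would come from the Inspectorate's annotations or from a trained classifier's predictions rather than from hand-written values, and the bag-of-words counts would serve as that classifier's input features.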