Clinical Term Normalization Using Learned Edit Patterns and Subconcept Matching (Preprint)
BACKGROUND Because of linguistic and stylistic variation, clinical terms mentioned in clinical text often differ from the standardized forms listed in clinical terminologies. Many downstream automated applications, however, require clinical terms to be mapped to their corresponding concepts in those terminologies, which makes clinical term normalization a necessary task.
OBJECTIVE This paper presents a system for clinical term normalization that uses edit patterns to convert clinical terms into their normalized forms.
METHODS The edit patterns are learned automatically from the Unified Medical Language System (UMLS) and from the given training data. They are generalized sequences of edits derived from edit distance computations. The patterns are both character-based and word-based, and they are learned separately for different semantic types. In addition to these edit patterns, the system normalizes clinical terms through the subconcepts mentioned in them.
RESULTS The system was evaluated on the MCN corpus as part of the 2019 n2c2 Track 3 shared task on clinical term normalization. It obtained 80.79% accuracy on the standard test data. The paper includes ablation studies that evaluate the contributions of the system's individual components. A challenging part of the task was disambiguation when a clinical term could be normalized to more than one concept.
CONCLUSIONS The learned edit patterns led the system to perform well on the normalization task. Because the system is based on patterns, it is human-interpretable and can also give insight into the common ways in which clinical terms in clinical text vary from their standardized forms.
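The idea of deriving a generalized edit pattern from an edit distance computation can be illustrated with a minimal sketch. This is not the system's implementation: the function names `edit_script` and `generalize`, the wildcard convention, and the example term pair are all assumptions made for illustration. The sketch computes a standard Levenshtein alignment between two token sequences, recovers the edit operations by backtracing, and then collapses unchanged runs into a wildcard so that only the changes remain.

```python
from typing import List, Tuple, Union

# (operation, source token, target token); hypothetical representation
Edit = Tuple[str, str, str]


def edit_script(source: List[str], target: List[str]) -> List[Edit]:
    """Return one minimal Levenshtein edit script turning source into target."""
    m, n = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i
    for j in range(1, n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a source token
                dist[i][j - 1] + 1,         # insert a target token
                dist[i - 1][j - 1] + cost,  # keep or substitute
            )

    # Backtrace from the bottom-right corner to recover the operations.
    edits: List[Edit] = []
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and source[i - 1] == target[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            edits.append(("keep", source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            edits.append(("substitute", source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("delete", source[i - 1], ""))
            i -= 1
        else:
            edits.append(("insert", "", target[j - 1]))
            j -= 1
    edits.reverse()
    return edits


def generalize(edits: List[Edit]) -> List[Union[Edit, str]]:
    """Collapse each run of kept tokens into "*" so only the changes remain."""
    pattern: List[Union[Edit, str]] = []
    for edit in edits:
        if edit[0] == "keep":
            if not pattern or pattern[-1] != "*":
                pattern.append("*")
        else:
            pattern.append(edit)
    return pattern


# Word-based example (illustrative pair, not taken from the paper).
word_edits = edit_script("ca of breast".split(), "cancer of breast".split())
print(generalize(word_edits))

# Character-based example: the same functions applied to lists of characters.
char_edits = edit_script(list("tumour"), list("tumor"))
print(generalize(char_edits))
```

The same routine serves both granularities because it operates on token lists: passing words gives a word-based pattern, passing characters gives a character-based one. Learning patterns separately per semantic type, as the abstract describes, could then amount to counting generalized patterns within each type, although how the system actually does this is not specified here.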