Development of a Natural Language Processing Pipeline for Calculating Colonoscopy Quality Indicators: Comparison of Manual Review and Natural Language Processing (Preprint)
BACKGROUND Manual data extraction for colonoscopy quality indicators is time- and labor-intensive. Natural language processing (NLP), a computational linguistics technique, can automate the extraction of important clinical information from unstructured free-text reports. Applications of NLP-based information extraction include identifying clinical information such as adverse events and optimizing clinical work such as quality control and patient management.
OBJECTIVE We developed an NLP pipeline to handle Korean–English colonoscopy reports and evaluated its performance in automatically assessing adenoma detection rate (ADR), sessile serrated lesion detection rate (SDR), and surveillance interval (SI).
METHODS The NLP tool was developed using 2000 screening colonoscopy records (1425 pathology reports) from Seoul National University Hospital Gangnam Center. It was tested on another 1000 colonoscopy records, comparing a manual review (MR) by five human annotators with the NLP pipeline. Additionally, data from 54,562 colonoscopies of 12,264 patients (aged ≥50 years) performed from 2010 to 2019 were analyzed using the NLP pipeline to calculate colonoscopy quality indicators.
RESULTS The overall accuracy on the test dataset was 95.8% (958/1000) for NLP vs. 93.1% (931/1000) for MR (P=.008). The mean total ADR in the test set was 46.8% (468/1000) with NLP vs. 47.2% (472/1000) with MR. The mean total SDR was 6.4% (64/1000) with NLP vs. 6.5% (65/1000) with MR. SI calculation showed similar performance for the two methods. The mean ADR and SDR of the 25 endoscopists in the 10-year dataset were 42.0% (881/2098) and 3.3% (69/2098), respectively, with wide individual variability: ADR ranged from 16.3% (263/1615) to 56.2% (1014/1936), and SDR ranged from 0.4% (6/1615) to 6.6% (124/1876). The SI recommendations suggested large differences in ADR and SDR according to endoscopist performance.
CONCLUSIONS The NLP pipeline can accurately and automatically calculate ADR, SDR, and SI from multilingual colonoscopy reports. It could be an important tool for improving colonoscopy quality and supporting clinical decisions.
CLINICALTRIAL This study was approved by the Institutional Review Board of Seoul National University Hospital (SNUH; IRB 1909-093-670).
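The detection rates reported above are proportions: the number of colonoscopies with at least one lesion of the given type, divided by the total number of colonoscopies. The following Python code is a minimal sketch of that arithmetic, overall and per endoscopist. It assumes the NLP step has already reduced each report to two boolean flags; the `Colonoscopy` record, its field names, and the helper functions are hypothetical and do not come from the study's pipeline, and the sample records are invented for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Colonoscopy:
    """Hypothetical per-procedure record produced by an upstream NLP step."""
    endoscopist: str    # illustrative identifier, not from the study
    has_adenoma: bool   # at least one adenoma confirmed on pathology
    has_ssl: bool       # at least one sessile serrated lesion confirmed


def detection_rate(records, flag):
    """Return the fraction (0..1) of records in which the named flag is True."""
    records = list(records)
    if not records:
        raise ValueError("cannot compute a rate from zero colonoscopies")
    return sum(getattr(r, flag) for r in records) / len(records)


def rates_by_endoscopist(records):
    """Return {endoscopist: (ADR, SDR)} computed over each endoscopist's records."""
    groups = defaultdict(list)
    for r in records:
        groups[r.endoscopist].append(r)
    return {
        name: (detection_rate(rs, "has_adenoma"), detection_rate(rs, "has_ssl"))
        for name, rs in groups.items()
    }


# Invented toy data, used only to exercise the functions.
records = [
    Colonoscopy("A", has_adenoma=True, has_ssl=False),
    Colonoscopy("A", has_adenoma=False, has_ssl=False),
    Colonoscopy("A", has_adenoma=True, has_ssl=True),
    Colonoscopy("B", has_adenoma=False, has_ssl=False),
]

overall_adr = detection_rate(records, "has_adenoma")
overall_sdr = detection_rate(records, "has_ssl")
per_endoscopist = rates_by_endoscopist(records)
```

Counting each colonoscopy once, regardless of how many lesions it contains, is what makes these per-procedure rates rather than lesion counts. For example, 468 adenoma-positive procedures out of 1000 gives the 46.8% ADR reported for NLP on the test set.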