Development of a Natural Language Processing Pipeline for Calculating Colonoscopy Quality Indicators: Comparison of Manual Review and Natural Language Processing (Preprint)

2021 ◽  
Author(s):  
Jung Ho Bae ◽  
Hyun Wook Han ◽  
Sun Young Yang ◽  
Gyuseon Song ◽  
Soonok Sa ◽  
...  

BACKGROUND Manual data extraction for colonoscopy quality indicators is time- and labor-intensive. Natural language processing (NLP), a computer-based linguistics and technique, offers the automation of reporting from unstructured free text reports to extract important clinical information. The application of information extraction using NLP includes identification of clinical information such as adverse events and clinical work optimization such as quality control and patient management. OBJECTIVE We developed a natural language processing pipeline to manage Korean–English colonoscopy reports and evaluated its performance on automatically assessing adenoma detection rate (ADR), sessile serrated lesion detection rate (SDR), and surveillance interval (SI). METHODS The NLP tool was developed using 2000 screening colonoscopy records (1425 pathology reports) at Seoul National University Hospital Gangnam Center. Tests were performed on another 1,000 colonoscopy records to compare a manual review (MR) by five human annotators and the NLP pipeline. Additionally, data from 54,562 colonoscopies of 12,264 patients (aged ≥50 years) from 2010 to 2019 were analyzed using the NLP pipeline for colonoscopy quality indicators. RESULTS The overall accuracy of the test dataset was 95.8% (958/1000) for NLP vs. 93.1% (931/1000) for MR (P=.008). The mean total ADR in the test set was 46.8% (468/1000) with NLP vs. 47.2% (472/1000) with MR. The mean total SDR was 6.4% (64/1000) with NLP vs. 6.5% (65/1000) with MR. Calculating the SI revealed a similar performance between both methods. The mean ADR and SDR of the 25 endoscopists in the 10-year dataset were 42.0% (881/2098) and 3.3% (69/2098), respectively, indicating wide individual variability (16.3% (263/1615)–56.2% (1014/1936) in ADR and 0.4% (6/1615)–6.6% (124/1876) in SDR). The SI recommendation suggested a large difference in ADR and SDR based on the endoscopist’s performance. CONCLUSIONS The NLP pipeline can accurately and automatically calculate ADR, SDR, and SI from a multi-language colonoscopy report. It could be an important tool for improving colonoscopy quality and clinical decision support. CLINICALTRIAL This study was approved by the Institutional Review Board of SNUH (IRB 1909-093-670).

2021 ◽  
Author(s):  
Abul Hasan ◽  
Mark Levene ◽  
David Weston ◽  
Renate Fromson ◽  
Nicolas Koslover ◽  
...  

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources, in order to assist decision makers. Social media is important in this respect, however, to make sense of the textual information it provides and be able to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report that Macro- and Micro-averaged F_{1\ }scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19, when the models are trained on human labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.


2000 ◽  
Vol 33 (1) ◽  
pp. 1-10 ◽  
Author(s):  
Jacob S. Elkins ◽  
Carol Friedman ◽  
Bernadette Boden-Albala ◽  
Ralph L. Sacco ◽  
George Hripcsak

2020 ◽  
Author(s):  
Carlos R Oliveira ◽  
Patrick Niccolai ◽  
Anette Michelle Ortiz ◽  
Sangini S Sheth ◽  
Eugene D Shapiro ◽  
...  

BACKGROUND Accurate identification of new diagnoses of human papillomavirus–associated cancers and precancers is an important step toward the development of strategies that optimize the use of human papillomavirus vaccines. The diagnosis of human papillomavirus cancers hinges on a histopathologic report, which is typically stored in electronic medical records as free-form, or unstructured, narrative text. Previous efforts to perform surveillance for human papillomavirus cancers have relied on the manual review of pathology reports to extract diagnostic information, a process that is both labor- and resource-intensive. Natural language processing can be used to automate the structuring and extraction of clinical data from unstructured narrative text in medical records and may provide a practical and effective method for identifying patients with vaccine-preventable human papillomavirus disease for surveillance and research. OBJECTIVE This study's objective was to develop and assess the accuracy of a natural language processing algorithm for the identification of individuals with cancer or precancer of the cervix and anus. METHODS A pipeline-based natural language processing algorithm was developed, which incorporated machine learning and rule-based methods to extract diagnostic elements from the narrative pathology reports. To test the algorithm’s classification accuracy, we used a split-validation study design. Full-length cervical and anal pathology reports were randomly selected from 4 clinical pathology laboratories. Two study team members, blinded to the classifications produced by the natural language processing algorithm, manually and independently reviewed all reports and classified them at the document level according to 2 domains (diagnosis and human papillomavirus testing results). Using the manual review as the gold standard, the algorithm’s performance was evaluated using standard measurements of accuracy, recall, precision, and F-measure. RESULTS The natural language processing algorithm’s performance was validated on 949 pathology reports. The algorithm demonstrated accurate identification of abnormal cytology, histology, and positive human papillomavirus tests with accuracies greater than 0.91. Precision was lowest for anal histology reports (0.87, 95% CI 0.59-0.98) and highest for cervical cytology (0.98, 95% CI 0.95-0.99). The natural language processing algorithm missed 2 out of the 15 abnormal anal histology reports, which led to a relatively low recall (0.68, 95% CI 0.43-0.87). CONCLUSIONS This study outlines the development and validation of a freely available and easily implementable natural language processing algorithm that can automate the extraction and classification of clinical data from cervical and anal cytology and histology.


Radiology ◽  
2002 ◽  
Vol 224 (1) ◽  
pp. 157-163 ◽  
Author(s):  
George Hripcsak ◽  
John H. M. Austin ◽  
Philip O. Alderson ◽  
Carol Friedman

2014 ◽  
Vol 79 (5) ◽  
pp. AB116-AB117 ◽  
Author(s):  
Gottumukkala S. Raju ◽  
William a. Ross ◽  
Phillip Lum ◽  
Patrick M. Lynch ◽  
Rebecca S. Slack ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document