Development of a Natural Language Processing Pipeline for Calculating Colonoscopy Quality Indicators: Comparison of Manual Review and Natural Language Processing (Preprint)

Mapping Intimacies ◽

10.2196/preprints.35257 ◽

2021 ◽

Author(s):

Jung Ho Bae ◽

Hyun Wook Han ◽

Sun Young Yang ◽

Gyuseon Song ◽

Soonok Sa ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Quality Indicators ◽

Language Processing ◽

Detection Rate ◽

Clinical Information ◽

Processing Pipeline ◽

The Mean ◽

Colonoscopy Quality ◽

Manual Review

BACKGROUND Manual data extraction for colonoscopy quality indicators is time- and labor-intensive. Natural language processing (NLP), a computer-based linguistic technique, can automate reporting by extracting important clinical information from unstructured free-text reports. Applications of NLP-based information extraction include identifying clinical information such as adverse events and optimizing clinical work such as quality control and patient management.

OBJECTIVE We developed an NLP pipeline to manage Korean–English colonoscopy reports and evaluated its performance in automatically assessing adenoma detection rate (ADR), sessile serrated lesion detection rate (SDR), and surveillance interval (SI).

METHODS The NLP tool was developed using 2000 screening colonoscopy records (1425 pathology reports) at Seoul National University Hospital Gangnam Center. Tests were performed on another 1000 colonoscopy records to compare a manual review (MR) by five human annotators with the NLP pipeline. Additionally, data from 54,562 colonoscopies of 12,264 patients (aged ≥50 years) from 2010 to 2019 were analyzed using the NLP pipeline for colonoscopy quality indicators.

RESULTS The overall accuracy on the test dataset was 95.8% (958/1000) for NLP vs. 93.1% (931/1000) for MR (P=.008). The mean total ADR in the test set was 46.8% (468/1000) with NLP vs. 47.2% (472/1000) with MR. The mean total SDR was 6.4% (64/1000) with NLP vs. 6.5% (65/1000) with MR. Calculating the SI revealed a similar performance between both methods. The mean ADR and SDR of the 25 endoscopists in the 10-year dataset were 42.0% (881/2098) and 3.3% (69/2098), respectively, indicating wide individual variability (16.3% (263/1615)–56.2% (1014/1936) in ADR and 0.4% (6/1615)–6.6% (124/1876) in SDR). The SI recommendation suggested a large difference in ADR and SDR based on the endoscopist’s performance.

CONCLUSIONS The NLP pipeline can accurately and automatically calculate ADR, SDR, and SI from a multi-language colonoscopy report. It could be an important tool for improving colonoscopy quality and clinical decision support.

CLINICALTRIAL This study was approved by the Institutional Review Board of SNUH (IRB 1909-093-670).
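
The detection rates quoted in this abstract are simple proportions of colonoscopies with at least one qualifying lesion. The sketch below only reproduces that arithmetic from the counts given in the abstract; the function name `detection_rate` is hypothetical and is not part of the authors' pipeline, which the abstract does not describe at code level.

```python
def detection_rate(positive: int, total: int) -> float:
    """Return the percentage of procedures counted as positive."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    return 100.0 * positive / total


# Counts taken from the abstract's 1000-record test set.
adr_nlp = detection_rate(468, 1000)  # adenoma detection rate, NLP
adr_mr = detection_rate(472, 1000)   # adenoma detection rate, manual review
sdr_nlp = detection_rate(64, 1000)   # sessile serrated lesion detection rate, NLP
sdr_mr = detection_rate(65, 1000)    # sessile serrated lesion detection rate, manual review

for name, value in [("ADR (NLP)", adr_nlp), ("ADR (MR)", adr_mr),
                    ("SDR (NLP)", sdr_nlp), ("SDR (MR)", sdr_mr)]:
    print(f"{name}: {value:.1f}%")
```

The same function applies to per-endoscopist counts, which is how the reported range across endoscopists could be tabulated once positive and total procedures are known for each.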


Development of a generalizable natural language processing pipeline to extract physician-reported pain from clinical reports: Generated using publicly-available datasets and tested on institutional clinical reports for cancer patients with bone metastases

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2021.103864 ◽

2021 ◽

pp. 103864

Author(s):

Hossein Naseri ◽

Kamran Kafi ◽

Sonia Skamene ◽

Marwan Tolba ◽

Mame Daro Faye ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Bone Metastases ◽

Cancer Patients ◽

Language Processing ◽

Processing Pipeline


Mo1110 Quality Improvement Natural Language Processing Colonoscopy Evaluation Tool (QUINCE): A Flexible, Portable Tool to Extract Pathology Results for Colonoscopy Quality Reporting

Gastroenterology ◽

10.1016/s0016-5085(16)32187-4 ◽

2016 ◽

Vol 150 (4) ◽

pp. S637 ◽

Cited By ~ 2

Author(s):

Andrew J. Gawron ◽

Jennifer A. Pacheco ◽

Bill Scuba ◽

Wendy Chapman ◽

Tonya Kaltenbach ◽

...

Keyword(s):

Quality Improvement ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Evaluation Tool ◽

Quality Reporting ◽

Colonoscopy Quality


Triage and diagnosis of COVID-19 from medical social media (Preprint)

10.2196/preprints.30397 ◽

2021 ◽

Author(s):

Abul Hasan ◽

Mark Levene ◽

David Weston ◽

Renate Fromson ◽

Nicolas Koslover ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Models ◽

Rule Based ◽

Additional Information ◽

Processing Pipeline ◽

Machine Learning Models

BACKGROUND The COVID-19 pandemic has created a pressing need to integrate information from disparate sources in order to assist decision makers. Social media is important in this respect; however, natural language processing methods are needed to make sense of the textual information it provides and to automate the processing of large amounts of data. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect.

OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity, and prevalence of the disease.

METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19.

RESULTS We report macro- and micro-averaged F1 scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19 when the models are trained on human-labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. We also highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones.

CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.
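
The results above are reported as macro- and micro-averaged F1 scores. As a reading aid, the following minimal sketch shows how the two averages differ for a three-class task. The class labels (`low`, `medium`, `high`) and the toy data are assumptions made for illustration; the abstract does not name the triage categories, and this is not the authors' evaluation code.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(gold, pred):
    """Return (macro F1, micro F1) for single-label multi-class predictions."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # true class gains a false negative
    # Macro: average the per-class F1 scores, so every class weighs equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical labels, not data from the study.
gold = ["low", "low", "low", "low", "medium", "high"]
pred = ["low", "low", "low", "medium", "medium", "low"]
macro, micro = macro_micro_f1(gold, pred)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

In this toy case the rare `high` class is never predicted, which pulls the macro average well below the micro average; this is why the two figures can diverge on imbalanced data.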


Coding Neuroradiology Reports for the Northern Manhattan Stroke Study: A Comparison of Natural Language Processing and Manual Review

Computers and Biomedical Research ◽

10.1006/cbmr.1999.1535 ◽

2000 ◽

Vol 33 (1) ◽

pp. 1-10 ◽

Cited By ~ 40

Author(s):

Jacob S. Elkins ◽

Carol Friedman ◽

Bernadette Boden-Albala ◽

Ralph L. Sacco ◽

George Hripcsak

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Stroke Study ◽

Manual Review


Natural Language Processing for Surveillance of Cervical and Anal Cancer and Precancer: Algorithm Development and Split-Validation Study (Preprint)

10.2196/preprints.20826 ◽

2020 ◽

Author(s):

Carlos R Oliveira ◽

Patrick Niccolai ◽

Anette Michelle Ortiz ◽

Sangini S Sheth ◽

Eugene D Shapiro ◽

...

Keyword(s):

Natural Language Processing ◽

Human Papillomavirus ◽

Natural Language ◽

Language Processing ◽

Medical Records ◽

Processing Algorithm ◽

Accurate Identification ◽

Pathology Reports ◽

Manual Review ◽

Natural Language Processing Algorithm

BACKGROUND Accurate identification of new diagnoses of human papillomavirus–associated cancers and precancers is an important step toward the development of strategies that optimize the use of human papillomavirus vaccines. The diagnosis of human papillomavirus cancers hinges on a histopathologic report, which is typically stored in electronic medical records as free-form, or unstructured, narrative text. Previous efforts to perform surveillance for human papillomavirus cancers have relied on the manual review of pathology reports to extract diagnostic information, a process that is both labor- and resource-intensive. Natural language processing can be used to automate the structuring and extraction of clinical data from unstructured narrative text in medical records and may provide a practical and effective method for identifying patients with vaccine-preventable human papillomavirus disease for surveillance and research.

OBJECTIVE This study's objective was to develop and assess the accuracy of a natural language processing algorithm for the identification of individuals with cancer or precancer of the cervix and anus.

METHODS A pipeline-based natural language processing algorithm was developed, which incorporated machine learning and rule-based methods to extract diagnostic elements from the narrative pathology reports. To test the algorithm’s classification accuracy, we used a split-validation study design. Full-length cervical and anal pathology reports were randomly selected from 4 clinical pathology laboratories. Two study team members, blinded to the classifications produced by the natural language processing algorithm, manually and independently reviewed all reports and classified them at the document level according to 2 domains (diagnosis and human papillomavirus testing results). Using the manual review as the gold standard, the algorithm’s performance was evaluated using standard measurements of accuracy, recall, precision, and F-measure.

RESULTS The natural language processing algorithm’s performance was validated on 949 pathology reports. The algorithm demonstrated accurate identification of abnormal cytology, histology, and positive human papillomavirus tests with accuracies greater than 0.91. Precision was lowest for anal histology reports (0.87, 95% CI 0.59-0.98) and highest for cervical cytology (0.98, 95% CI 0.95-0.99). The natural language processing algorithm missed 2 out of the 15 abnormal anal histology reports, which led to a relatively low recall (0.68, 95% CI 0.43-0.87).

CONCLUSIONS This study outlines the development and validation of a freely available and easily implementable natural language processing algorithm that can automate the extraction and classification of clinical data from cervical and anal cytology and histology.
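
The methods above score the algorithm against manual review as the gold standard using accuracy, recall, precision, and F-measure. A minimal sketch of that document-level comparison follows, assuming binary labels (abnormal vs. not abnormal) per report. The function name `evaluate` and the toy labels are hypothetical, and the confidence intervals quoted in the abstract are not reproduced here because the abstract does not state how they were computed.

```python
def evaluate(gold, predicted):
    """Compare predicted document labels with gold-standard labels.

    Both arguments are equal-length sequences of booleans, where True
    means the report was classified as abnormal.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    tn = sum(1 for g, p in pairs if not g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical labels for eight reports, not data from the study.
gold = [True, True, True, False, False, False, False, True]
pred = [True, True, False, False, False, True, False, True]
scores = evaluate(gold, pred)
print(scores)
```

With small numbers of positive reports, a single missed document shifts recall noticeably, which is consistent with the abstract's remark about anal histology.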


A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes

Journal of Biomedical Informatics X ◽

10.1016/j.yjbinx.2020.100076 ◽

2020 ◽

Vol 8 ◽

pp. 100076 ◽

Cited By ~ 1

Author(s):

Ari Z. Klein ◽

Haitao Cai ◽

Davy Weissenbacher ◽

Lisa D. Levine ◽

Graciela Gonzalez-Hernandez

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Pregnancy Outcomes ◽

Adverse Pregnancy Outcomes ◽

Processing Pipeline ◽

Twitter Data ◽

Digital Epidemiology ◽

Adverse Pregnancy


Natural Language Processing to Assess End-of-Life Quality Indicators in Breast Cancer Patients with Leptomeningeal Disease (SA528C)

Journal of Pain and Symptom Management ◽

10.1016/j.jpainsymman.2018.12.206 ◽

2019 ◽

Vol 57 (2) ◽

pp. 454-455

Author(s):

Kate Brizzi ◽

Charlotta Lindvall ◽

Sophia Zupanc

Keyword(s):

Breast Cancer ◽

Natural Language Processing ◽

Natural Language ◽

End Of Life ◽

Cancer Patients ◽

Quality Indicators ◽

Language Processing ◽

Life Quality ◽

Leptomeningeal Disease ◽

Breast Cancer Patients


Use of Natural Language Processing to Translate Clinical Information from a Database of 889,921 Chest Radiographic Reports

Radiology ◽

10.1148/radiol.2241011118 ◽

2002 ◽

Vol 224 (1) ◽

pp. 157-163 ◽

Cited By ~ 115

Author(s):

George Hripcsak ◽

John H. M. Austin ◽

Philip O. Alderson ◽

Carol Friedman

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Information


Extracting clinical information from free-text of pathology and operation notes via Chinese natural language processing

2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW) ◽

10.1109/bibmw.2010.5703867 ◽

2010 ◽

Cited By ~ 1

Author(s):

Qiang Zeng ◽

Xiaoyan Zhang ◽

Zuofeng Li ◽

Lei Liu ◽

Weide Zhang

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Information ◽

Free Text


166 Natural Language Processing (NLP) As an Alternative to Manual Reporting of Colonoscopy Quality Metrics

Gastrointestinal Endoscopy ◽

10.1016/j.gie.2014.02.048 ◽

2014 ◽

Vol 79 (5) ◽

pp. AB116-AB117 ◽

Cited By ~ 2

Author(s):

Gottumukkala S. Raju ◽

William A. Ross ◽

Phillip Lum ◽

Patrick M. Lynch ◽

Rebecca S. Slack ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Quality Metrics ◽

Colonoscopy Quality ◽

Manual Reporting
