A rule-based grapheme-phone converter and stress determination for Brazilian Portuguese natural language processing

Natural language processing versus rule-based text analysis: Comparing BERT score and readability indices to predict crowdfunding outcomes

Journal of Business Venturing Insights ◽

10.1016/j.jbvi.2021.e00276 ◽

2021 ◽

Vol 16 ◽

pp. e00276

Author(s):

C.S. Richard Chan ◽

Charuta Pethe ◽

Steven Skiena

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Analysis ◽

Rule Based


Triage and diagnosis of COVID-19 from medical social media (Preprint)

10.2196/preprints.30397 ◽

2021 ◽

Author(s):

Abul Hasan ◽

Mark Levene ◽

David Weston ◽

Renate Fromson ◽

Nicolas Koslover ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Models ◽

Rule Based ◽

Additional Information ◽

Processing Pipeline ◽

Machine Learning Models

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources in order to assist decision makers. Social media is important in this respect; however, natural language processing methods are needed to make sense of the textual information it provides and to automate the processing of large amounts of data. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report macro- and micro-averaged F1 scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19 when the models are trained on human-labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. We also highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.
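
The abstract above reports both macro- and micro-averaged F1 scores. As a reading aid, the following is a minimal sketch of how the two averages differ. The labels `"a"` and `"b"` and the function names are hypothetical and are not taken from the paper; the study's actual evaluation code is not shown in the source.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    print(macro_micro_f1(["a", "a", "a", "b"], ["a", "a", "b", "b"]))
```

Macro averaging gives rare classes the same weight as frequent ones, which is one reason the two ranges in an evaluation can differ noticeably on imbalanced data.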


Description of a Rule-based System for the i2b2 Challenge in Natural Language Processing for Clinical Data

Journal of the American Medical Informatics Association ◽

10.1197/jamia.m3083 ◽

2009 ◽

Vol 16 (4) ◽

pp. 571-575 ◽

Cited By ~ 10

Author(s):

L. C. Childs ◽

R. Enelow ◽

L. Simonsen ◽

N. H. Heintzelman ◽

K. M. Kowalski ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Data ◽

Rule Based ◽

Rule Based System


Clinical trial cohort selection based on multi-level rule-based natural language processing system

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz109 ◽

2019 ◽

Vol 26 (11) ◽

pp. 1218-1226 ◽

Cited By ~ 7

Author(s):

Long Chen ◽

Yu Gu ◽

Xin Ji ◽

Chao Lou ◽

Zhiyong Sun ◽

...

Keyword(s):

Clinical Trials ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Rule Based ◽

Language System ◽

Unified Medical Language System ◽

Rule Based System ◽

Medical Language ◽

Cohort Selection

Abstract Objective Identifying patients who meet selection criteria for clinical trials is typically challenging and time-consuming. In this article, we describe our clinical natural language processing (NLP) system to automatically assess patients’ eligibility based on their longitudinal medical records. This work was part of the 2018 National NLP Clinical Challenges (n2c2) Shared-Task and Workshop on Cohort Selection for Clinical Trials. Materials and Methods The authors developed an integrated rule-based clinical NLP system which employs a generic rule-based framework plugged in with lexical-, syntactic- and meta-level, task-specific knowledge inputs. In addition, the authors implemented and evaluated a general clinical NLP (cNLP) system built with the Unified Medical Language System and Unstructured Information Management Architecture. Results and Discussion The systems were evaluated as part of the 2018 n2c2-1 challenge. The authors’ rule-based system obtained an F-measure of 0.9028, ranking fourth in the challenge with less than 1% difference from the best system. While the general cNLP system did not perform as well as the rule-based system, it showed its own advantages and potential in extracting clinical concepts. Conclusion Our results indicate that a well-designed rule-based clinical NLP system is capable of achieving good performance on cohort selection even with a small training data set. In addition, the investigation of a Unified Medical Language System-based general cNLP system suggests that a hybrid system combining these 2 approaches is a promising way to surpass state-of-the-art performance.


An approach to natural language processing in the rule-based expert system

Proceedings of the 1990 ACM annual conference on Cooperation - CSC '90 ◽

10.1145/100348.100381 ◽

1990 ◽

Cited By ~ 1

Author(s):

Jan Kazimierczak

Keyword(s):

Natural Language Processing ◽

Expert System ◽

Natural Language ◽

Language Processing ◽

Rule Based


Assessment of Natural Language Processing Methods for Ascertaining the Expanded Disability Status Scale Score From the Electronic Health Records of Patients With Multiple Sclerosis: Algorithm Development and Validation Study

JMIR Medical Informatics ◽

10.2196/25157 ◽

2022 ◽

Vol 10 (1) ◽

pp. e25157

Author(s):

Zhen Yang ◽

Chloé Pou-Prom ◽

Ashley Jones ◽

Michaelia Banning ◽

David Dai ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Expanded Disability Status Scale ◽

Health Records ◽

Rule Based ◽

Disability Status ◽

Edss Score ◽

Electronic Health

Background The Expanded Disability Status Scale (EDSS) score is a widely used measure to monitor disability progression in people with multiple sclerosis (MS). However, extracting and deriving the EDSS score from unstructured electronic health records can be time-consuming. Objective We aimed to compare rule-based and deep learning natural language processing algorithms for detecting and predicting the total EDSS score and EDSS functional system subscores from the electronic health records of patients with MS. Methods We studied 17,452 electronic health records of 4906 MS patients followed at one of Canada’s largest MS clinics between June 2015 and July 2019. We randomly divided the records into training (80%) and test (20%) data sets, and compared the performance characteristics of 3 natural language processing models. First, we applied a rule-based approach, extracting the EDSS score from sentences containing the keyword “EDSS.” Next, we trained a convolutional neural network (CNN) model to predict the 19 half-step increments of the EDSS score. Finally, we used a combined rule-based–CNN model. For each approach, we determined the accuracy, precision, recall, and F-score compared with the reference standard, which was manually labeled EDSS scores in the clinic database. Results Overall, the combined keyword-CNN model demonstrated the best performance, with accuracy, precision, recall, and F-score of 0.90, 0.83, 0.83, and 0.83, respectively. The corresponding figures were 0.57, 0.91, 0.65, and 0.70 for the rule-based model and 0.86, 0.70, 0.70, and 0.70 for the CNN model. Because of missing data, the model performance for EDSS subscores was lower than that for the total EDSS score. Performance improved when considering notes with known values of the EDSS subscores. Conclusions A combined keyword-CNN natural language processing model can extract and accurately predict EDSS scores from patient records. This approach can be automated for efficient information extraction in clinical and research settings.
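
The rule-based step described above extracts the score from sentences containing the keyword "EDSS". The sketch below illustrates that general idea only. The regular expressions, the 30-character gap, the 0 to 10 half-step validity check, and the function name `extract_edss` are all assumptions made for illustration; the study's actual rules are not given in the abstract.

```python
import re

# Split on whitespace that follows sentence-ending punctuation, so that
# decimals such as "3.5" are not broken apart.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")

# The keyword, a short run of non-digits, then a number like "3", "3.5", "10".
EDSS_PATTERN = re.compile(r"\bEDSS\b\D{0,30}?(\d{1,2}(?:\.\d)?)", re.IGNORECASE)


def extract_edss(note):
    """Return the first plausible EDSS score found in the note, else None."""
    for sentence in SENTENCE_SPLIT.split(note):
        match = EDSS_PATTERN.search(sentence)
        if not match:
            continue
        value = float(match.group(1))
        # Assumed validity check: between 0 and 10, in half-point steps.
        if 0.0 <= value <= 10.0 and (value * 2).is_integer():
            return value
    return None


if __name__ == "__main__":
    print(extract_edss("Patient seen in clinic. EDSS score is 3.5. Follow up in 6 months."))
```

A rule of this kind can only return a score that is written explicitly next to the keyword, which fits the abstract's pattern of high precision and lower recall for the rule-based model.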


Machine translation using natural language processing

MATEC Web of Conferences ◽

10.1051/matecconf/201927702004 ◽

2019 ◽

Vol 277 ◽

pp. 02004

Author(s):

Middi Venkata Sai Rishita ◽

Middi Appala Raju ◽

Tanvir Ahmed Harris

Keyword(s):

Neural Network ◽

Neural Networks ◽

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Deep Neural Network ◽

English Text ◽

French Translation ◽

Rule Based

Machine translation is the translation of text or speech by a computer with no human involvement. It is a popular research topic, and several methods have been developed, including rule-based, statistical and example-based machine translation. Neural networks have brought a leap forward in machine translation. This paper discusses the building of a deep neural network that functions as part of an end-to-end translation pipeline. The completed pipeline would accept English text as input and return the French translation. The project has three main parts: preprocessing, creation of the models, and running the model on English text.


Recent advances in processing negation

Natural Language Engineering ◽

10.1017/s1351324920000534 ◽

2020 ◽

pp. 1-10

Author(s):

Roser Morante ◽

Eduardo Blanco

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Future Directions ◽

Rule Based ◽

Computational Approaches ◽

Recent Advances ◽

Linguistic Phenomenon

Abstract Negation is a complex linguistic phenomenon present in all human languages. It can be seen as an operator that transforms an expression into another expression whose meaning is in some way opposed to the original expression. In this article, we survey previous work on negation with an emphasis on computational approaches. We start by defining negation and two important concepts: scope and focus of negation. Then, we survey work in natural language processing that considers negation primarily as a means to improve the results in some task. We also provide information about corpora containing negation annotations in English and other languages, which usually include a combination of annotations of negation cues, scopes, foci, and negated events. We continue the survey with a description of automated approaches to process negation, ranging from early rule-based systems to systems built with traditional machine learning and neural networks. Finally, we conclude with some reflections on current progress and future directions.
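
To make the notions of negation cue and scope concrete, here is a deliberately simplified sketch in the spirit of the early rule-based systems the survey mentions. It is not taken from the article: the cue list, the five-token window, and the "stop at punctuation" heuristic are assumptions, and real systems handle many more cases, including focus.

```python
import re

# Assumed, very small cue lexicon for illustration.
NEGATION_CUES = {"no", "not", "never", "without"}


def find_negation_scopes(text, window=5):
    """Return (cue, scope_tokens) pairs using a fixed-window heuristic.

    The scope is taken to be the tokens after the cue, up to `window`
    tokens or the next punctuation mark, whichever comes first.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    results = []
    for i, token in enumerate(tokens):
        if token not in NEGATION_CUES:
            continue
        scope = []
        for following in tokens[i + 1 : i + 1 + window]:
            if not re.fullmatch(r"\w+", following):
                break  # punctuation closes the scope
            scope.append(following)
        results.append((token, scope))
    return results


if __name__ == "__main__":
    print(find_negation_scopes("The patient has no fever or cough, but reports fatigue."))
```

In the example sentence, the heuristic places "fever or cough" inside the scope of "no" and leaves "fatigue" outside it, because the comma ends the scope.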


Penggunaan Natural Language Processing Pada Chatbot Untuk Media Informasi Pertanian (The Use of Natural Language Processing in a Chatbot for Agricultural Information Media)

Indonesian Journal of Applied Informatics ◽

10.20961/ijai.v4i2.38688 ◽

2020 ◽

Vol 4 (2) ◽

pp. 55

Author(s):

Rifa Khoirunisa

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Rule Based

Agriculture is the use of resources by humans to produce raw materials for food and to manage the surrounding environment. With the development of Industry 4.0, many fields now use artificial intelligence, including agriculture. This study aims to help farmers find important agriculture-related information according to the relevant time frame. The study uses the concept of natural language processing (NLP). NLP is the computerized analysis of text; in this study, it is used to find the base words in the sentence entered by the user. A user's question is answered by first parsing the sentence, then applying lemmatization to find the base words, and finally using a rule-based method to find the answer that matches the question based on those base words. The result of this study is a prototype system that farmers can use to obtain agricultural information according to the relevant time frame, for example: seed prices, fertilizer application, harvest areas, and harvest prices. In testing, the application could parse sentences in 86.12% of cases, could answer with the time relevance requested by the user in 70% of cases, and could display an answer matching the user's question in 73.33% of cases.
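
The three steps named in the abstract (parsing the sentence, lemmatizing to base words, and rule-based answer lookup) can be sketched roughly as follows. This is an illustrative toy in English, not the paper's system: the lemma table, the rules, the topic names, and the function `answer_topic` are all hypothetical, and the sketch returns a topic key instead of real agricultural data.

```python
import re

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {
    "prices": "price",
    "seeds": "seed",
    "fertilizers": "fertilizer",
    "harvests": "harvest",
    "areas": "area",
}

# Hypothetical rules: if all required base words occur, pick this topic.
# More specific rules are listed first.
RULES = [
    ({"seed", "price"}, "seed_price"),
    ({"harvest", "price"}, "harvest_price"),
    ({"harvest", "area"}, "harvest_area"),
    ({"fertilizer"}, "fertilizer_application"),
]


def lemmatize(token):
    """Map a token to its base word; unknown tokens are returned unchanged."""
    return LEMMAS.get(token, token)


def answer_topic(question):
    """Parse the question, lemmatize it, and return the first matching topic."""
    tokens = re.findall(r"\w+", question.lower())  # step 1: split into words
    lemmas = {lemmatize(t) for t in tokens}        # step 2: base words
    for required, topic in RULES:                  # step 3: rule lookup
        if required <= lemmas:
            return topic
    return None


if __name__ == "__main__":
    print(answer_topic("What are the seed prices this month?"))
```

A production system would replace the lemma table with a proper lemmatizer for the target language and attach time-dependent answers to each topic.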
