Emerging Technologies of Natural Language-Enabled Chatbots: A Review and Trend Forecast Using Intelligent Ontology Extraction and Patent Analytics

Complexity ◽

10.1155/2021/5511866 ◽

2021 ◽

Vol 2021 ◽

pp. 1-26

Author(s):

Min-Hua Chao ◽

Amy J. C. Trappey ◽

Chun-Ting Wu

Keyword(s):

Natural Language ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Emerging Technologies ◽

Digital Transformation ◽

Clustering Method ◽

Critical Part ◽

Ontology Extraction ◽

Key Topics ◽

User Friendly

Natural language processing (NLP) is a critical part of the digital transformation. NLP enables user-friendly interactions between machine and human by making computers understand human languages. Intelligent chatbot is an essential application of NLP to allow understanding of users’ utterance and responding in understandable sentences for specific applications simulating human-to-human conversations and interactions for problem solving or Q&As. This research studies emerging technologies for NLP-enabled intelligent chatbot development using a systematic patent analytic approach. Some intelligent text-mining techniques are applied, including document term frequency analysis for key terminology extractions, clustering method for identifying the subdomains, and Latent Dirichlet Allocation for finding the key topics of patent set. This research utilizes the Derwent Innovation database as the main source for global intelligent chatbot patent retrievals.

Download Full-text

NLP-based personal learning assistant for school education

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i5.pp4522-4530 ◽

2021 ◽

Vol 11 (5) ◽

pp. 4522

Author(s):

Ann Neethu Mathew ◽

Rohini V. ◽

Joy Paulose

Keyword(s):

Natural Language ◽

Language Processing ◽

Intelligent Tutoring System ◽

Digital Transformation ◽

Educational Systems ◽

Retrieval Performance ◽

Language Understanding ◽

Personal Learning ◽

Computer Based ◽

The Subject

Computer-based knowledge and computation systems are becoming major sources of leverage for multiple industry segments. Hence, educational systems and learning processes across the world are on the cusp of a major digital transformation. This paper seeks to explore the concept of an artificial intelligence and natural language processing (NLP) based intelligent tutoring system (ITS) in the context of computer education in primary and secondary schools. One of the components of an ITS is a learning assistant, which can enable students to seek assistance as and when they need, wherever they are. As part of this research, a pilot prototype chatbot was developed, to serve as a learning assistant for the subject Scratch (Scratch is a graphical utility used to teach school children the concepts of programming). By the use of an open source natural language understanding (NLU) or NLP library, and a slackbased UI, student queries were input to the chatbot, to get the sought explanation as the answer. Through a two-stage testing process, the chatbot’s NLP extraction and information retrieval performance were evaluated. The testing results showed that the ontology modelling for such a learning assistant was done relatively accurately, and shows its potential to be pursued as a cloud-based solution in future.

Download Full-text

Topic Modeling for Keyword Extraction: using Natural Language Processing methods for keyword extraction in Portal Min@s

Revista de Estudos da Linguagem ◽

10.17851/2237-2083.23.3.695-726 ◽

2015 ◽

Vol 23 (3) ◽

pp. 695 ◽

Cited By ~ 1

Author(s):

Arnaldo Candido Junior ◽

Célia Magalhães ◽

Helena Caseli ◽

Régis Zangirolami

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Keyword Extraction ◽

Processing Methods ◽

Dirichlet Allocation

Este artigo tem o objetivo da avaliar a aplicação de dois métodos automáticos eficientes na extração de palavras-chave, usados pelas comunidades da Linguística de Corpus e do Processamento da Língua Natural para gerar palavras-chave de textos literários: o WordSmith Tools e o Latent Dirichlet Allocation (LDA). As duas ferramentas escolhidas para este trabalho têm suas especificidades e técnicas diferentes de extração, o que nos levou a uma análise orientada para a sua performance. Objetivamos entender, então, como cada método funciona e avaliar sua aplicação em textos literários. Para esse fim, usamos análise humana, com conhecimento do campo dos textos usados. O método LDA foi usado para extrair palavras-chave por meio de sua integração com o Portal Min@s: Corpora de Fala e Escrita, um sistema geral de processamento de corpora, concebido para diferentes pesquisas de Linguística de Corpus. Os resultados do experimento confirmam a eficácia do WordSmith Tools e do LDA na extração de palavras-chave de um corpus literário, além de apontar que é necessária a análise humana das listas em um estágio anterior aos experimentos para complementar a lista gerada automaticamente, cruzando os resultados do WordSmith Tools e do LDA. Também indicam que a intuição linguística do analista humano sobre as listas geradas separadamente pelos dois métodos usados neste estudo foi mais favorável ao uso da lista de palavras-chave do WordSmith Tools.

Download Full-text

Latent Dirichlet Allocation in predicting clinical trial terminations

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-019-0973-y ◽

2019 ◽

Vol 19 (1) ◽

Author(s):

Simon Geletta ◽

Lendie Follett ◽

Marcia Laugerman

Keyword(s):

Clinical Trial ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Structured Data ◽

Unstructured Data ◽

Future Research ◽

Funding Agencies ◽

Dirichlet Allocation

Abstract Background This study used natural language processing (NLP) and machine learning (ML) techniques to identify reliable patterns from within research narrative documents to distinguish studies that complete successfully, from the ones that terminate. Recent research findings have reported that at least 10 % of all studies that are funded by major research funding agencies terminate without yielding useful results. Since it is well-known that scientific studies that receive funding from major funding agencies are carefully planned, and rigorously vetted through the peer-review process, it was somewhat daunting to us that study-terminations are this prevalent. Moreover, our review of the literature about study terminations suggested that the reasons for study terminations are not well understood. We therefore aimed to address that knowledge gap, by seeking to identify the factors that contribute to study failures. Method We used data from the clinicialTrials.gov repository, from which we extracted both structured data (study characteristics), and unstructured data (the narrative description of the studies). We applied natural language processing techniques to the unstructured data to quantify the risk of termination by identifying distinctive topics that are more frequently associated with trials that are terminated and trials that are completed. We used the Latent Dirichlet Allocation (LDA) technique to derive 25 “topics” with corresponding sets of probabilities, which we then used to predict study-termination by utilizing random forest modeling. We fit two distinct models – one using only structured data as predictors and another model with both structured data and the 25 text topics derived from the unstructured data. Results In this paper, we demonstrate the interpretive and predictive value of LDA as it relates to predicting clinical trial failure. The results also demonstrate that the combined modeling approach yields robust predictive probabilities in terms of both sensitivity and specificity, relative to a model that utilizes the structured data alone. Conclusions Our study demonstrated that the use of topic modeling using LDA significantly raises the utility of unstructured data in better predicating the completion vs. termination of studies. This study sets the direction for future research to evaluate the viability of the designs of health studies.

Download Full-text

Exploring the Non-Medical impacts of Covid-19 using Natural Language Processing

10.20944/preprints202011.0056.v1 ◽

2020 ◽

Author(s):

Amol Agade ◽

Samta Balpande

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Global Economy ◽

Latent Dirichlet Allocation ◽

Topic Modelling ◽

Financial Industry ◽

Distance Map ◽

The Media ◽

Non Negative Matrix Factorization

Ongoing COVID-19 Pandemic has resulted into massive damage to various platforms of global economy which has caused disruption to human livelihood. Natural Language Processing has been extensively used in different organizations to categorize sentiments, perform recommendation, summarizing information and topic modelling. This research aims to understand the non-medical impact of COVID-19 on global economy by leveraging the natural language processing methodology. This methodology comprises of text classification which includes topic modelling on unstructured COVID-19 media articles dataset provided by Anacode. Like other Natural Language Processing algorithms, Latent Dirichlet allocation (LDA) and Non-negative matrix factorization (NMF) has been proposed to classify the media articles dataset in order to analyze COVID-19 pandemic impacts in the different sectors of global economy. Model Accuracy was examined based on the coherence and perplexity score which came out to be 0.51 and -10.90 using LDA algorithm. Both the LDA and NMF algorithm identified similar prevalent topics that was impacted by COVID-19 pandemic in multiple sectors of economy. Through intertopic distance map visualization produced by LDA algorithm, it can be reciprocated that general industries which includes children schooling, parental care, and family gatherings had the major impact followed by business sector and the financial industry.

Download Full-text

FindThatQuote: A Question-Answering Web-based System to Locate Quotes using Deep Learning and Natural-Language Processing

10.5121/csit.2021.110909 ◽

2021 ◽

Author(s):

Nathan Ji ◽

Yu Sun

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Success Rate ◽

Language Processing ◽

Question Answering ◽

Specific Information ◽

Web Based ◽

User Friendly ◽

Web Based System

The digital age gives us access to a multitude of both information and mediums in which we can interpret information. A majority of the time, many people find interpreting such information difficult as the medium may not be as user friendly as possible. This project has examined the inquiry of how one can identify specific information in a given text based on a question. This inquiry is intended to streamline one's ability to determine the relevance of a given text relative to his objective. The project has an overall 80% success rate given 10 articles with three questions asked per article. This success rate indicates that this project is likely applicable to those who are asking for content level questions within an article.

Download Full-text

Capturing and analyzing social representations. A first application of Natural Language Processing techniques to reader’s comments in COVID-19 news. Argentina, 2020

10.31235/osf.io/3pcdu ◽

2020 ◽

Author(s):

German Rosati ◽

Laia Domenech ◽

Adriana Chazarreta ◽

Tomas Maguire

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Social Representations ◽

Web Crawler ◽

First Approximation ◽

Processing Techniques ◽

Dirichlet Allocation

We present a first approximation to the quantification of social representations about the COVID-19, using news comments. A web crawler was developed for constructing the dataset of reader’s comments. We detect relevant topics in the dataset using Latent Dirichlet Allocation, and analyze its evolution during time. Finally, we show a first prototype to the prediction of the majority topics, using FastText.

Download Full-text

Latent Dirichlet Allocation in Predicting Clinical Trial Terminations

10.21203/rs.2.12904/v4 ◽

2019 ◽

Author(s):

Simon Geletta ◽

Lendie Follett ◽

Marcia R Laugerman

Keyword(s):

Clinical Trial ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Structured Data ◽

Unstructured Data ◽

Future Research ◽

Funding Agencies ◽

Dirichlet Allocation

Abstract This study used natural language processing (NLP) and machine learning (ML) techniques to identify reliable patterns from within research narrative documents to distinguish studies that complete successfully, from the ones that terminate. Recent research findings have reported that at least ten percent of all studies that are funded by major research funding agencies terminate without yielding useful results. Since it is well-known that scientific studies that receive funding from major funding agencies are carefully planned, and rigorously vetted through the peer-review process, it was somewhat daunting to us that study-terminations are this prevalent. Moreover, our review of the literature about study terminations suggested that the reasons for study terminations are not well understood. We therefore aimed to address that knowledge gap, by seeking to identify the factors that contribute to study failures.Method: We used data from the clinicialTrials.gov repository, from which we extracted both structured data (study characteristics), and unstructured data (the narrative description of the studies). We applied natural language processing techniques to the unstructured data to quantify the risk of termination by identifying distinctive topics that are more frequently associated with trials that are terminated and trials that are completed. We used the Latent Dirichlet Allocation (LDA) technique to derive 25 “topics” with corresponding sets of probabilities, which we then used to predict study-termination by utilizing random forest modeling. We fit two distinct models – one using only structured data as predictors and another model with both structured data and the 25 text topics derived from the unstructured data.Results: In this paper, we demonstrate the interpretive and predictive value of LDA as it relates to predicting clinical trial failure. The results also demonstrate that the combined modeling approach yields robust predictive probabilities in terms of both sensitivity and specificity, relative to a model that utilizes the structured data alone.Conclusions: Our study demonstrated that the use of topic modeling using LDA significantly raises the utility of unstructured data in better predicating the completion vs. termination of studies. This study sets the direction for future research to evaluate the viability of the designs of health studies.

Download Full-text

Klasifikasi Laporan Keluhan Pelayanan Publik Berdasarkan Instansi Menggunakan Metode LDA-SVM

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2021863768 ◽

2021 ◽

Vol 8 (6) ◽

pp. 1265

Author(s):

Muhammad Alkaff ◽

Andreyan Rizky Baskara ◽

Irham Maulani

Keyword(s):

Support Vector Machine ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Confusion Matrix ◽

Support Vector ◽

Topic Distribution ◽

The Government ◽

Dirichlet Allocation

Sebuah sistem layanan untuk menyampaikan aspirasi dan keluhan masyarakat terhadap layanan pemerintah Indonesia, bernama Lapor! Pemerintah sudah lama memanfaatkan sistem tersebut untuk menjawab permasalahan masyarakat Indonesia terkait permasalahan birokrasi. Namun, peningkatan volume laporan dan pemilahan laporan yang dilakukan oleh operator dengan membaca setiap keluhan yang masuk melalui sistem menyebabkan sering terjadi kesalahan dimana operator meneruskan laporan tersebut ke instansi yang salah. Oleh karena itu, diperlukan suatu solusi yang dapat menentukan konteks laporan secara otomatis dengan menggunakan teknik Natural Language Processing. Penelitian ini bertujuan untuk membangun klasifikasi laporan secara otomatis berdasarkan topik laporan yang ditujukan kepada instansi yang berwenang dengan menggabungkan metode Latent Dirichlet Allocation (LDA) dan Support Vector Machine (SVM). Proses pemodelan topik untuk setiap laporan dilakukan dengan menggunakan metode LDA. Metode ini mengekstrak laporan untuk menemukan pola tertentu dalam dokumen yang akan menghasilkan keluaran dalam nilai distribusi topik. Selanjutnya, proses klasifikasi untuk menentukan laporan agensi tujuan dilakukan dengan menggunakan SVM berdasarkan nilai topik yang diekstraksi dengan metode LDA. Performa model LDA-SVM diukur dengan menggunakan confusion matrix dengan menghitung nilai akurasi, presisi, recall, dan F1 Score. Hasil pengujian menggunakan teknik split train-test dengan skor 70:30 menunjukkan bahwa model menghasilkan kinerja yang baik dengan akurasi 79,85%, presisi 79,98%, recall 72,37%, dan Skor F1 74,67%. AbstractA service system to convey aspirations and complaints from the public against Indonesia's government services, named Lapor! The Government has used the Government for a long time to answer the problems of the Indonesian people related to bureaucratic problems. However, the increasing volume of reports and the sorting of reports carried out by operators by reading every complaint that comes through the system cause frequent errors where operators forward the reports to the wrong agencies. Therefore, we need a solution that can automatically determine the report's context using Natural Language Processing techniques. This study aims to build automatic report classifications based on report topics addressed to authorized agencies by combining Latent Dirichlet Allocation (LDA) and Support Vector Machine (SVM). The topic-modeling process for each report was carried out using the LDA method. This method extracts reports to find specific patterns in documents that will produce output in topic distribution values. Furthermore, the classification process to determine the report's destination agency carried out using the SVM based on the value of the topics extracted by the LDA method. The LDA-SVM model's performance is measured using a confusion matrix by calculating the value of accuracy, precision, recall, and F1 Score. The test results using the train-test split technique with a 70:30 show that the model produces good performance with 79.85% accuracy, 79.98% precision, 72.37% recall, and 74.67% F1 Score

Download Full-text

Latent Dirichlet Allocation in Predicting Clinical Trial Terminations

10.21203/rs.2.12904/v3 ◽

2019 ◽

Author(s):

Simon Geletta ◽

Lendie Follett ◽

Marcia R Laugerman

Keyword(s):

Clinical Trial ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Structured Data ◽

Unstructured Data ◽

Future Research ◽

Funding Agencies ◽

Dirichlet Allocation

Abstract Background: This study used natural language processing (NLP) and machine learning (ML) techniques to identify reliable patterns from within research narrative documents to distinguish studies that complete successfully, from the ones that terminate. Recent research findings have reported that at least ten percent of all studies that are funded by major research funding agencies terminate without yielding useful results. Since it is well-known that scientific studies that receive funding from major funding agencies are carefully planned, and rigorously vetted through the peer-review process, it was somewhat daunting to us that study-terminations are this prevalent. Moreover, our review of the literature about study terminations suggested that the reasons for study terminations are not well understood. We therefore aimed to address that knowledge gap, by seeking to identify the factors that contribute to study failures.Method: We used data from the clinicialTrials.gov repository, from which we extracted both structured data (study characteristics), and unstructured data (the narrative description of the studies). We applied natural language processing techniques to the unstructured data to quantify the risk of termination by identifying distinctive topics that are more frequently associated with trials that are terminated and trials that are completed. We used the Latent Dirichlet Allocation (LDA) technique to derive 25 “topics” with corresponding sets of probabilities, which we then used to predict study-termination by utilizing random forest modeling. We fit two distinct models – one using only structured data as predictors and another model with both structured data and the 25 text topics derived from the unstructured data.Results: In this paper, we demonstrate the interpretive and predictive value of LDA as it relates to predicting clinical trial failure. The results also demonstrate that the combined modeling approach yields robust predictive probabilities in terms of both sensitivity and specificity, relative to a model that utilizes the structured data alone.Conclusions: Our study demonstrated that the use of topic modeling using LDA significantly raises the utility of unstructured data in better predicating the completion vs. termination of studies. This study sets the direction for future research to evaluate the viability of the designs of health studies.

Download Full-text

Understanding Communication in an Online Cancer Forum: Content Analysis Study

JMIR Cancer ◽

10.2196/29555 ◽

2021 ◽

Vol 7 (3) ◽

pp. e29555

Author(s):

Anietie Andy ◽

Uduak Andy

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Family Member ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Family Members ◽

Support Needs ◽

Online Forums ◽

Linguistic Inquiry ◽

Hospital Visits

Background Cancer affects individuals, their family members, and friends, and increasingly, some of these individuals are turning to online cancer forums to express their thoughts/feelings and seek support such as asking cancer-related questions. The thoughts/feelings expressed and the support needed from these online forums may differ depending on if (1) an individual has or had cancer or (2) an individual is a family member or friend of an individual who has or had cancer; the language used in posts in these forums may reflect these differences. Objective Using natural language processing methods, we aim to determine the differences in the support needs and concerns expressed in posts published on an online cancer forum by (1) users who self-declare to have or had cancer compared with (2) users who self-declare to be family members or friends of individuals with or that had cancer. Methods Using latent Dirichlet allocation (LDA), which is a natural language processing algorithm and Linguistic Inquiry and Word Count (LIWC), a psycholinguistic dictionary, we analyzed posts published on an online cancer forum with the aim to delineate the language features associated with users in these different groups. Results Users who self-declare to have or had cancer were more likely to post about LDA topics related to hospital visits (Cohen d=0.671) and use words associated with LIWC categories related to health (Cohen d=0.635) and anxiety (Cohen d=0.126). By contrast, users who declared to be family members or friends tend to post about LDA topics related to losing a family member (Cohen d=0.702) and LIWC categories focusing on the past (Cohen d=0.465) and death (Cohen d=0.181) were more associated with these users. Conclusions Using LDA and LIWC, we show that there are differences in the support needs and concerns expressed in posts published on an online cancer forum by users with cancer compared with family members or friends of those with cancer. Hence, responders to online cancer forums need to be cognizant of these differences in support needs and concerns and tailor their responses based on these findings.

Download Full-text