Performing literature review using text mining, Part II: Expanding domain knowledge with abbreviation identification

Investigating Diseases and Chemicals in COVID-19 Literature with Text Mining (Preprint)

10.2196/preprints.21503

2020

Author(s):

Amir Karami

Brandon Bookstaver

Melissa Nolan

Keyword(s):

Text Mining

Literature Review

Topic Modeling

Large Scale

Clinical Manifestations

International Health

Research Papers

Strategic Plans

Funding Agencies

The Relationship

BACKGROUND: The COVID-19 pandemic has impacted nearly all aspects of life and has posed significant threats to international health and the economy. Given the rapidly unfolding nature of the current pandemic, there is an urgent need to streamline synthesis of the growing body of scientific research in order to elucidate targeted solutions. While traditional systematic literature reviews provide valuable insights, they have limitations: they analyze a limited number of papers, are subject to various biases, are time-consuming and labor-intensive, focus on a few topics, cannot support trend analysis, and lack data-driven tools.

OBJECTIVE: This study addresses these limitations in the literature and in practice by using text mining methods to analyze two biomedical concepts, clinical manifestations of disease and therapeutic chemical compounds, in a corpus of COVID-19 research papers, and to find associations between the two concepts.

METHODS: We collected COVID-19 preprints and peer-reviewed research papers published in 2020. We used frequency analysis to find the most frequent manifestations and therapeutic chemicals, as an indicator of the importance of each of the two biomedical concepts. We also applied topic modeling to find relationships between the two concepts.

RESULTS: We analyzed 9,298 research papers published through May 5, 2020, and found 3,645 disease-related and 2,434 chemical-related articles. The most frequent clinical-manifestation terms included COVID-19, SARS, cancer, pneumonia, fever, and cough. The most frequent chemical-related terms included Lopinavir, Ritonavir, Oxygen, Chloroquine, Remdesivir, and water. Topic modeling produced 25 categories showing relationships between our two overarching categories. These categories represent statistically significant associations between multiple aspects of each category, some of which were novel and had not previously been identified by the scientific community.

CONCLUSIONS: Appreciating this context is vital given the lack of a systematic large-scale literature review survey and the importance of fast literature review for developing treatments during the current COVID-19 pandemic. This study is beneficial to researchers for obtaining a macro-level picture of the literature, to educators for understanding the scope of the literature, to journals for exploring the most discussed disease symptoms and pharmaceutical targets, and to policymakers and funding agencies for creating scientific strategic plans regarding COVID-19.
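
The frequency analysis named in METHODS can be illustrated with a minimal sketch. It assumes each paper has already been reduced to a list of recognized disease or chemical terms (the entity-recognition step is not described in the abstract and is not shown), and it counts document frequency, i.e. the number of papers mentioning a term; the paper's exact counting scheme is not stated. The function name `document_frequency` and the toy corpus are hypothetical.

```python
from collections import Counter


def document_frequency(papers):
    """Count how many papers mention each term.

    `papers` is a list of term lists, one per paper. Terms are case-folded
    and de-duplicated per paper, so each paper contributes at most 1 per term.
    """
    counts = Counter()
    for terms in papers:
        counts.update({term.casefold() for term in terms})
    return counts


if __name__ == "__main__":
    # Hypothetical toy corpus; real input would come from an entity tagger.
    papers = [
        ["COVID-19", "fever", "cough", "Remdesivir"],
        ["COVID-19", "pneumonia", "Chloroquine", "fever"],
        ["SARS", "fever", "Lopinavir", "Ritonavir"],
    ]
    for term, n in document_frequency(papers).most_common(2):
        print(term, n)  # prints "fever 3" then "covid-19 2"
```

Counting each term once per paper keeps a single paper that repeats "fever" many times from dominating the ranking; raw token counts would be the alternative if within-paper emphasis matters.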

Text Mining for Literature Review and Knowledge Discovery in Cancer Risk Assessment and Research

PLoS ONE

10.1371/journal.pone.0033427

2012

Vol 7 (4)

pp. e33427

Cited By ~ 38

Author(s):

Anna Korhonen

Diarmuid Ó Séaghdha

Ilona Silins

Lin Sun

Johan Högberg

...

Keyword(s):

Risk Assessment

Text Mining

Cancer Risk

Literature Review

Knowledge Discovery

Cancer Risk Assessment

Service Quality Evaluation Using Text Mining: A Systematic Literature Review

Lecture Notes in Business Information Processing - Perspectives in Business Informatics Research

10.1007/978-3-030-61140-8_11

2020

pp. 159-173

Author(s):

Filip Vencovský

Keyword(s):

Text Mining

Literature Review

Service Quality

Systematic Literature Review

Quality Evaluation

Service Quality Evaluation

Twitter and Research: A Systematic Literature Review Through Text Mining

IEEE Access

10.1109/access.2020.2983656

2020

Vol 8

pp. 67698-67717

Cited By ~ 11

Author(s):

Amir Karami

Morgan Lundy

Frank Webb

Yogesh K. Dwivedi

Keyword(s):

Text Mining

Literature Review

Systematic Literature Review

Incremental Ontology Population and Enrichment through Semantic-based Text Mining

International Journal on Semantic Web and Information Systems

10.4018/ijswis.2015070103

2015

Vol 11 (3)

pp. 44-66

Cited By ~ 9

Author(s):

Saira Gillani

Andrea Ko

Keyword(s):

Text Mining

Domain Knowledge

Learning System

Domain Specific

Automatic Categorization

New Concepts

Domain Specific Knowledge

E Learning

Ontology Population

IT Audit

Higher education and professional training programs often apply innovative e-learning systems, in which ontologies are used to structure domain knowledge. To provide up-to-date knowledge to students, the ontology has to be maintained regularly. This is especially true for the IT audit and security domain, because technology changes fast. However, manual ontology population and enrichment is a complex task that requires professional experience and considerable effort. The authors' paper deals with the challenges of, and possible solutions for, semi-automatic ontology enrichment and population. Their solution, ProMine, has two main contributions: a semantic-based text mining approach for automatically identifying domain-specific knowledge elements, and the automatic categorization of these extracted knowledge elements using Wiktionary. The ProMine ontology enrichment solution was applied in the IT audit domain of an e-learning system. After ten cycles of applying ProMine, the number of automatically identified new concepts tripled, and ProMine categorized the new concepts with high precision and recall.

Crowdsourcing: a systematic review of the literature using text mining

Industrial Management & Data Systems

10.1108/imds-08-2020-0474

2020

Vol 120 (11)

pp. 2041-2065

Author(s):

Ioanna Pavlidou

Savvas Papagiannidis

Eric Tsui

Keyword(s):

Text Mining

Literature Review

Systematic Literature Review

Relevant Literature

Reliability And Validity

Future Research

Content Type

Main Research

Critical Elements

A Value

Purpose: This study is a systematic literature review of crowdsourcing that aims to present the research evidence so far on the extent to which crowdsourcing can contribute to organisational performance and produce innovations, and to provide insights on how organisations can operationalise it successfully.

Design/methodology/approach: The systematic literature review revolved around a text mining methodology analysing 106 papers.

Findings: The themes identified are performance, innovation, operational aspects and motivations. The review revealed a few potential directions for future research in each of the themes considered.

Practical implications: This study helps researchers to consider the recent themes on crowdsourcing and identify potential areas for research. At the same time, it provides practitioners with an understanding of the usefulness and process of crowdsourcing and insights into the critical elements of a successful crowdsourcing project.

Originality/value: This study employed quantitative content analysis in order to identify the main research themes with higher reliability and validity. It is also the first review on crowdsourcing that incorporates the relevant literature on crowdfunding as a value-creation tool.

Evidence-based software engineering: systematic literature review process based on visual text mining

10.11606/t.55.2012.tde-18072012-102032

2012

Cited By ~ 1

Author(s):

Katia Romero Felizardo Scannavino

Keyword(s):

Software Engineering

Text Mining

Literature Review

Systematic Literature Review

Review Process

Evidence Based

Visual Text

Performing literature review using text mining, Part I: Retrieving technology infrastructure using Google Scholar and APIs

2017 IEEE International Conference on Big Data (Big Data)

10.1109/bigdata.2017.8258313

2017

Cited By ~ 2

Author(s):

Dazhi Yang

Allan N. Zhang

Wenjing Yan

Keyword(s):

Text Mining

Literature Review

Google Scholar

Technology Infrastructure

Text Mining Literature Review on Indonesian Social Media

Jurnal Edukasi dan Penelitian Informatika (JEPIN)

10.26418/jp.v7i2.47975

2021

Vol 7 (2)

pp. 226

Author(s):

Angelina Pramana Thenata

Keyword(s):

Social Media

Text Mining

Literature Review

Systematic Literature Review

Naive Bayes

Confusion Matrix

Naïve Bayes

Today, the amount of news spreading quickly across various social media, together with the public's need to consume news from various sources, can affect people's lives. As a result, this widely distributed data can be collected and used by governments, entrepreneurs, analysts, or researchers to identify trends, develop businesses, predict customer behavior, and so on. News data from social media can be collected and processed with text mining, using algorithms such as Naive Bayes, K-NN, and SVM. However, applying an algorithm to an unsuitable case study can yield suboptimal results. Therefore, this study analyzes text mining algorithms implemented on Indonesian-language social media using the systematic literature review method. The method begins with a planning stage that defines the research questions, search terms, digital literature sources, and literature criteria. This is followed by a conducting stage that selects literature, checks it against the criteria, and extracts data. Finally, a reporting stage analyzes the extracted data to uncover information and knowledge. The benchmarks used for comparison are confusion-matrix measures: accuracy, precision, and recall. The results show that Naive Bayes gives stable but less optimal results when applied to Indonesian-language social media case studies. In contrast, K-NN and SVM were found to give optimal results in such case studies, as shown by accuracy (50%-98.13%), precision (58.22%-98.48%), and recall (21.05%-98%).
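
The accuracy, precision, and recall used as benchmarks above are all derived from a confusion matrix. A minimal sketch for the binary case follows, assuming a single positive class label; the reviewed studies may use multi-class or averaged variants, which are not shown. The function names `confusion_counts` and `metrics` and the example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN) for one positive class label."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(y_true, y_pred, positive):
    """Accuracy, precision and recall; 0.0 when a denominator is zero."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Hypothetical labels for five social-media posts.
    y_true = ["pos", "pos", "neg", "neg", "pos"]
    y_pred = ["pos", "neg", "neg", "pos", "pos"]
    print(metrics(y_true, y_pred, "pos"))
```

For these labels there are 2 true positives, 1 false positive, 1 false negative and 1 true negative, giving accuracy 0.6 and precision and recall of 2/3 each. The wide recall range reported in the review (21.05%-98%) reflects that recall ignores false positives, so it can diverge sharply from accuracy.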

Machine Learning Techniques in the Energy Consumption of Buildings: A Systematic Literature Review Using Text Mining and Bibliometric Analysis

Energies

10.3390/en14227810

2021

Vol 14 (22)

pp. 7810

Author(s):

Ahmed Abdelaziz

Vitor Santos

Miguel Sales Dias

Keyword(s):

Energy Consumption

Text Mining

Literature Review

Systematic Literature Review

Electricity Consumption

Machine Learning Techniques

Intelligent Computing

Factors Affecting

Energy Domain

High Level

The high level of energy consumption of buildings is significantly influencing occupant behavior changes towards improved energy efficiency. This paper presents a systematic literature review with two objectives: to understand the most relevant factors affecting the energy consumption of buildings, and to find the best intelligent computing (IC) methods capable of classifying and predicting the energy consumption of different types of buildings. Adopting the PRISMA method, the paper analyzed 822 manuscripts from 2013 to 2020 and focused on 106, selected through title and abstract screening and on whether the manuscripts reported experiments. A text mining process and a bibliometric mapping tool (VOSviewer) were used to find the most used terms and their relationships in the energy and IC domains. Our approach shows that the terms “consumption,” “residential,” and “electricity” are the most relevant terms in the energy domain in terms of the ratio of important terms (TITs), whereas “cluster” is the most commonly used term in the IC domain. The paper also shows strong relations between “Residential Energy Consumption” and “Electricity Consumption,” and between “Heating” and “Climate.” Finally, we checked and analyzed 41 manuscripts in detail, summarized their major contributions, and identified several research gaps that provide hints for further research.
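
Relationships between terms such as those above are typically derived from term co-occurrence counts; VOSviewer, for instance, can build term maps from co-occurrence data. The following is a minimal sketch of the counting step only, assuming each manuscript has already been reduced to a list of extracted terms. It does not reproduce VOSviewer's normalization, clustering, or layout, nor the paper's TIT ratio, which the abstract does not define. The function name `cooccurrence` and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count, for each unordered term pair, how many documents contain both."""
    pairs = Counter()
    for terms in docs:
        # Sort unique terms so ("a", "b") and ("b", "a") map to the same key.
        unique = sorted(set(terms))
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    # Hypothetical extracted terms for three manuscripts.
    docs = [
        ["heating", "climate", "residential"],
        ["heating", "climate"],
        ["electricity", "residential"],
    ]
    print(cooccurrence(docs).most_common(1))  # [(('climate', 'heating'), 2)]
```

Using sorted, de-duplicated terms per document makes each pair key canonical and counts a pair at most once per manuscript; a mapping tool would then typically normalize these raw counts before drawing link strengths.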