Performing literature review using text mining, Part II: Expanding domain knowledge with abbreviation identification

Author(s):  
Dazhi Yang ◽  
Jihoon Hong
2020 ◽  
Author(s):  
Amir Karami ◽  
Brandon Bookstaver ◽  
Melissa Nolan

BACKGROUND The COVID-19 pandemic has impacted nearly all aspects of life and has posed significant threats to international health and the economy. Given the rapidly unfolding nature of the current pandemic, there is an urgent need to streamline literature synthesis of the growing scientific research to elucidate targeted solutions. While traditional systematic literature reviews provide valuable insights, they have limitations: they analyze a limited number of papers, are subject to various biases, are time-consuming and labor-intensive, focus on a few topics, cannot support trend analysis, and lack data-driven tools. OBJECTIVE This study addresses these limitations by analyzing two biomedical concepts, clinical manifestations of disease and therapeutic chemical compounds, with text mining methods in a corpus of COVID-19 research papers, and by finding associations between the two concepts. METHODS We collected papers representing COVID-19 pre-prints and peer-reviewed research published in 2020. We used frequency analysis to find highly frequent manifestations and therapeutic chemicals, taking frequency as an indicator of the importance of the two biomedical concepts. We also applied topic modeling to find relationships between the two concepts. RESULTS We analyzed 9,298 research papers published through May 5, 2020 and found 3,645 disease-related and 2,434 chemical-related articles. The most frequent disease-related terms included COVID-19, SARS, cancer, pneumonia, fever, and cough. The most frequent chemical-related terms included Lopinavir, Ritonavir, Oxygen, Chloroquine, Remdesivir, and water. Topic modeling provided 25 categories showing relationships between our two overarching categories. These categories represent statistically significant associations between multiple aspects of each category, some of which were novel and not previously identified by the scientific community. CONCLUSIONS Appreciation of this context is vital due to the lack of a systematic large-scale literature review survey and the importance of fast literature review for developing treatments during the current COVID-19 pandemic. This study is beneficial to researchers for obtaining a macro-level picture of the literature, to educators for knowing the scope of the literature, to journals for exploring the most discussed disease symptoms and pharmaceutical targets, and to policymakers and funding agencies for creating scientific strategic plans regarding COVID-19.
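The frequency analysis described in the abstract above can be illustrated with a minimal sketch. The abstract does not state which tools or counting scheme the authors used, so the approach here (counting how many documents mention each term from a fixed vocabulary) is an assumption, and the function name `document_frequency`, the sample documents, and the vocabulary are all hypothetical.

```python
from collections import Counter
import re


def document_frequency(docs, vocabulary):
    """Count how many documents mention each vocabulary term (case-insensitive).

    Assumption: each term is a single token; multi-word terms would need
    phrase matching instead of the token lookup used here.
    """
    counts = Counter()
    for doc in docs:
        # Tokenize into lowercase alphanumeric tokens (hyphens kept, e.g. "covid-19").
        tokens = set(re.findall(r"[a-z0-9\-]+", doc.lower()))
        for term in vocabulary:
            if term.lower() in tokens:
                counts[term] += 1
    return counts


# Hypothetical mini-corpus, invented for illustration only.
docs = [
    "Patients presented with fever and cough; remdesivir was administered.",
    "Chloroquine showed no benefit in patients with pneumonia and fever.",
    "Cough and pneumonia were common.",
]
vocab = ["fever", "cough", "pneumonia", "remdesivir", "chloroquine"]

freq = document_frequency(docs, vocab)
for term, n in freq.most_common():
    print(f"{term}: {n} document(s)")
```

Counting documents rather than raw occurrences keeps a single long paper from dominating the ranking; whether the original study made the same choice is not stated.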


PLoS ONE ◽  
2012 ◽  
Vol 7 (4) ◽  
pp. e33427 ◽  
Author(s):  
Anna Korhonen ◽  
Diarmuid Ó Séaghdha ◽  
Ilona Silins ◽  
Lin Sun ◽  
Johan Högberg ◽  
...  

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 67698-67717 ◽  
Author(s):  
Amir Karami ◽  
Morgan Lundy ◽  
Frank Webb ◽  
Yogesh K. Dwivedi

Author(s):  
Saira Gillani ◽  
Andrea Ko

Higher education and professional training often apply innovative e-learning systems in which ontologies are used for structuring domain knowledge. To provide up-to-date knowledge for students, the ontology has to be maintained regularly. This is especially true for the IT audit and security domain, because technology is changing fast. However, manual ontology population and enrichment is a complex task that requires professional experience and a lot of effort. The authors' paper deals with the challenges of, and possible solutions for, semi-automatic ontology enrichment and population. ProMine makes two main contributions: one is a semantic-based text mining approach for automatically identifying domain-specific knowledge elements; the other is the automatic categorization of these extracted knowledge elements using Wiktionary. The ProMine ontology enrichment solution was applied in the IT audit domain of an e-learning system. After ten cycles of applying ProMine, the number of automatically identified new concepts tripled, and ProMine categorized new concepts with high precision and recall.


2020 ◽  
Vol 120 (11) ◽  
pp. 2041-2065
Author(s):  
Ioanna Pavlidou ◽  
Savvas Papagiannidis ◽  
Eric Tsui

Purpose: This study is a systematic literature review of crowdsourcing that aims to present the research evidence so far regarding the extent to which it can contribute to organisational performance and produce innovations and provide insights on how organisations can operationalise it successfully. Design/methodology/approach: The systematic literature review revolved around a text mining methodology analysing 106 papers. Findings: The themes identified are performance, innovation, operational aspects and motivations. The review revealed a few potential directions for future research in each of the themes considered. Practical implications: This study helps researchers to consider the recent themes on crowdsourcing and identify potential areas for research. At the same time, it provides practitioners with an understanding of the usefulness and process of crowdsourcing and insights on what the critical elements are in order to organise a successful crowdsourcing project. Originality/value: This study employed quantitative content analysis in order to identify the main research themes with higher reliability and validity. It is also the first review on crowdsourcing that incorporates the relevant literature on crowdfunding as a value-creation tool.


2021 ◽  
Vol 7 (2) ◽  
pp. 226
Author(s):  
Angelina Pramana Thenata

Nowadays, the volume of news spread across social media in a short time, together with the public's need to consume news from various sources, can affect people's lives. As a result, the circulated data can be collected and used by governments, businesses, analysts, or researchers to identify trends, develop businesses, predict customer behavior, and so on. News data from social media can be collected using text mining, which involves algorithms such as Naive Bayes, K-NN, and SVM. However, applying an algorithm to an unsuitable case study can give suboptimal results. Therefore, this study analyzes text mining algorithms implemented on Indonesian-language social media using the systematic literature review method. The method begins with a planning stage, which defines the research questions, search terms, digital literature sources, and literature standards. It continues with a conducting stage, which selects and matches literature against the standards and extracts data. The reporting stage then analyzes the extracted data to discover information and knowledge. The benchmarks used for comparison are confusion matrix measures: accuracy, precision, and recall. The study found that the Naive Bayes algorithm gives stable but less-than-optimal results when applied to Indonesian-language social media case studies, whereas the K-NN and SVM algorithms were found to give optimal results on such case studies, as evidenced by accuracy (50%-98.13%), precision (58.22%-98.48%), and recall (21.05%-98%).
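For readers unfamiliar with the three comparison measures named in the abstract above (accuracy, precision, and recall), the following is a minimal sketch of how they are derived from binary confusion matrix counts. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the reviewed studies.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from binary confusion matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. A zero denominator yields 0.0 rather than an error.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
acc, prec, rec = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(f"accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%}")
```

With these counts, accuracy is 70/100, precision is 40/50, and recall is 40/60, which shows how a classifier can score well on one measure while lagging on another, as in the wide recall range reported above.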


Energies ◽  
2021 ◽  
Vol 14 (22) ◽  
pp. 7810
Author(s):  
Ahmed Abdelaziz ◽  
Vitor Santos ◽  
Miguel Sales Dias

The high level of energy consumption of buildings is significantly influencing occupant behavior changes towards improved energy efficiency. This paper introduces a systematic literature review with two objectives: to understand the most relevant factors affecting energy consumption of buildings and to find the best intelligent computing (IC) methods capable of classifying and predicting energy consumption of different types of buildings. Adopting the PRISMA method, the paper analyzed 822 manuscripts from 2013 to 2020 and focused on 106, based on title and abstract screening and on manuscripts with experiments. A text mining process and a bibliometric map tool (VOSviewer) were adopted to find the most used terms and their relationships in the energy and IC domains. Our approach shows that the terms “consumption,” “residential,” and “electricity” are the most relevant terms in the energy domain, in terms of the ratio of important terms (TITs), whereas “cluster” is the most commonly used term in the IC domain. The paper also shows that there are strong relations between “Residential Energy Consumption” and “Electricity Consumption,” “Heating” and “Climate.” Finally, we checked and analyzed 41 manuscripts in detail, summarized their major contributions, and identified several research gaps that provide hints for further research.

