The Use of Visual Text Mining to Support the Study Selection Activity in Systematic Literature Reviews: A Replication Study

Author(s):  
Katia Romero Felizardo ◽  
Simone R.S. Souza ◽  
Jose Carlos Maldonado
Author(s):  
Katia Romero Felizardo ◽  
Ellen Francine Barbosa ◽  
Rafael Messias Martins ◽  
Pedro Henrique Dias Valle ◽  
José Carlos Maldonado

One of the activities in the Systematic Literature Review (SLR) process is the selection of primary studies. When a researcher faces a large volume of primary studies to analyze, the selection process can be arduous. In a previous experiment, we conducted a pilot test comparing the performance and accuracy of PhD students performing the study selection activity manually and using Visual Text Mining (VTM) techniques. The goal of this paper is to describe a replication study involving PhD and Master's students. The replication uses the same experimental design and materials as the original experiment. This study also investigates whether the researcher's level of experience, both with conducting SLRs and with research in general, affects the outcome of the primary study selection step of the SLR process. The replication results confirmed the outcomes of the original experiment, i.e., VTM is promising and can improve the performance of the selection of primary studies. We also observed that both accuracy and performance increase as a function of the researcher's experience in conducting SLRs. The use of VTM can indeed be beneficial during the study selection activity.
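VTM approaches of the kind studied here typically rest on representing each abstract as a term-weight vector and placing studies on a visual map so that textually similar ones land close together. A minimal sketch of that underlying representation, using TF-IDF weighting and cosine similarity, is shown below; the corpus, tokenization, and function names are illustrative assumptions, not material from the study:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for tokenized documents."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

abstracts = [
    "visual text mining supports study selection".split(),
    "text mining techniques for study selection".split(),
    "machine learning for protein folding".split(),
]
vecs = tfidf_vectors(abstracts)
# Topically related abstracts score higher, so a projection of these
# vectors would place the first two studies closer together on the map.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # → True
```

In a full VTM tool these vectors would then be projected to 2D (e.g., via multidimensional scaling) to produce the document map the reviewer inspects.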


Author(s):  
Katia R. Felizardo ◽  
Norsaremah Salleh ◽  
Rafael M. Martins ◽  
Emilia Mendes ◽  
Stephen G. MacDonell ◽  
...  

2020 ◽  
Vol 9 (1) ◽  
Author(s):  
E. Popoff ◽  
M. Besada ◽  
J. P. Jansen ◽  
S. Cope ◽  
S. Kanters

Abstract
Background: Despite existing research on text mining and machine learning for title and abstract screening, the role of machine learning (ML) within systematic literature reviews (SLRs) for health technology assessment (HTA) remains unclear, given the lack of extensive testing and of guidance from HTA agencies. We sought to address two knowledge gaps: to extend ML algorithms to provide a reason for exclusion, in line with current practice, and to determine optimal parameter settings for feature-set generation and ML algorithms.
Methods: We used abstract and full-text selection data from five large SLRs (n = 3089 to 12,769 abstracts) across a variety of disease areas. Each SLR was split into training and test sets. We developed a multi-step algorithm to categorize each citation as included, excluded for a specific PICOS criterion, or unclassified. We used a bag-of-words approach for feature-set generation and compared ML algorithms using support vector machines (SVMs), naïve Bayes (NB), and bagged classification and regression trees (CART) for classification. We also compared alternative training-set strategies: using the full data versus downsampling (i.e., reducing excludes to balance includes and excludes, since ML algorithms perform better with balanced data), and using inclusion/exclusion decisions from abstract versus full-text screening. Performance was compared in terms of specificity, sensitivity, accuracy, and matching the reason for exclusion.
Results: The best-fitting model (optimized for sensitivity and specificity) was based on the SVM algorithm, trained on full-text decisions with downsampling and with words occurring fewer than five times excluded. Its sensitivity and specificity ranged from 94% to 100% and from 54% to 89%, respectively, across the five SLRs. On average, 75% of excluded citations were excluded with a reason, and 83% of these matched the reviewers' original reason for exclusion. Sensitivity significantly improved when both downsampling and abstract decisions were used.
Conclusions: ML algorithms can improve the efficiency of the SLR process, and the proposed algorithms could reduce the workload of a second reviewer by identifying exclusions with a relevant PICOS reason, thus aligning with HTA guidance. Downsampling can be used to improve study selection, and improvements from using full-text exclusions have implications for a learn-as-you-go approach.
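As a sketch of the screening setup described above, the fragment below implements a tiny multinomial naïve Bayes classifier over bag-of-words counts, preceded by a downsampling step that balances includes and excludes. The training citations, labels, and helper names are invented for illustration; the study's actual pipeline, feature thresholds, and PICOS-reason categories are not reproduced here:

```python
import math
import random
from collections import Counter

def downsample(data, seed=0):
    """Balance the classes by randomly dropping excess 'exclude' citations."""
    inc = [d for d in data if d[1] == "include"]
    exc = [d for d in data if d[1] == "exclude"]
    random.Random(seed).shuffle(exc)
    return inc + exc[: len(inc)]

def train_nb(data):
    """Collect per-class word counts and document counts for naive Bayes."""
    counts = {"include": Counter(), "exclude": Counter()}
    docs = Counter()
    for words, label in data:
        counts[label].update(words)
        docs[label] += 1
    vocab = set(counts["include"]) | set(counts["exclude"])
    return counts, docs, vocab

def classify(model, words):
    """Pick the class with the highest log-posterior (Laplace smoothing)."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    best, best_lp = None, -math.inf
    for label in counts:
        lp = math.log(docs[label] / total_docs)        # class prior
        total = sum(counts[label].values())
        for w in words:
            lp += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

train = [
    ("randomized trial of drug efficacy".split(), "include"),
    ("controlled trial drug outcomes".split(), "include"),
    ("editorial opinion on policy".split(), "exclude"),
    ("letter to the editor policy".split(), "exclude"),
    ("conference poster abstract only".split(), "exclude"),
    ("narrative review of policy".split(), "exclude"),
]
model = train_nb(downsample(train))
print(classify(model, "randomized controlled trial".split()))  # → include
```

A production pipeline would add the study's other ingredients on top of this skeleton: rare-word pruning, separate SVM/CART models for comparison, and a per-PICOS-criterion exclusion step rather than a single include/exclude decision.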


Author(s):  
Katia Romero Felizardo ◽  
Elisa Yumi Nakagawa ◽  
Daniel R.C. Feitosa ◽  
Rosane Minghim ◽  
José Carlos Maldonado

2009 ◽  
pp. 1164-1181
Author(s):  
Richard S. Segall ◽  
Qingyu Zhang

This chapter presents background on text mining, along with comparisons and summaries of seven selected text mining software packages: Compare Suite by AKS-Labs, SAS Text Miner, Megaputer TextAnalyst, VisualText by Text Analysis International, Inc. (TextAI), Megaputer PolyAnalyst, WordStat by Provalis Research, and SPSS Clementine. The chapter not only discusses unique features of these packages but also compares the features each offers in the key steps of analyzing unstructured qualitative data: data preparation, data analysis, and result reporting. A brief discussion of Web mining and its software is also presented, along with conclusions and future trends.


2007 ◽  
Vol 31 (3) ◽  
pp. 316-326 ◽  
Author(s):  
A.A. Lopes ◽  
R. Pinho ◽  
F.V. Paulovich ◽  
R. Minghim
