A principled approach to reproducible research: a comparative review towards scientific integrity in computational research

Author(s):  
Rajesh Sinha ◽  
Prem Sewak Sudhish
2018 ◽  
Vol 7 (2.21) ◽  
pp. 319


Author(s):  
Saini Jacob Soman ◽  
P Swaminathan ◽  
R Anandan ◽  
K Kalaivani

With the growing use of online media for sharing views, sentiments and opinions about products, services, organizations and people, microblogging and social networking sites have gained huge popularity. Twitter, one of the largest social media sites, is used by many people to share their life events, views and opinions about different areas and concepts. Sentiment analysis is the computational study of reviews, opinions, attitudes, views and people’s emotions about different products, services, firms and topics, categorizing them as positive or negative. Sentiment analysis of tweets is a challenging task. This paper presents a critical review comparing the challenges of sentiment analysis of tweets in English versus Indian regional languages. Five Indian languages, namely Tamil, Malayalam, Telugu, Hindi and Bengali, are considered in this research; the challenges associated with analyzing Twitter sentiment in these languages have been identified through a systematic review and conceptualized in the form of a framework.  


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Christoph Kämpf ◽  
Michael Specht ◽  
Alexander Scholz ◽  
Sven-Holger Puppel ◽  
Gero Doose ◽  
...  

Abstract Background A lack of reproducibility has been repeatedly criticized in computational research. High-throughput sequencing (HTS) data analysis is a complex multi-step process. For most of the steps a range of bioinformatic tools is available, and for most tools manifold parameters need to be set. Due to this complexity, HTS data analysis is particularly prone to reproducibility and consistency issues. We have defined four criteria that in our opinion ensure a minimal degree of reproducible research for HTS data analysis. A number of workflow management systems are available for assisting complex multi-step data analyses. However, to the best of our knowledge, none of the currently available workflow management systems satisfies all four criteria for reproducible HTS analysis. Results Here we present uap, a workflow management system dedicated to robust, consistent, and reproducible HTS data analysis. uap is optimized for application to omics data, but can easily be extended to other complex analyses. It is available under the GNU GPL v3 license at https://github.com/yigbt/uap. Conclusions uap is a freely available tool that enables researchers to easily adhere to reproducible research principles for HTS data analyses.


2018 ◽  
Author(s):  
Daniel Nüst ◽  
Carlos Granell ◽  
Barbara Hofer ◽  
Markus Konkol ◽  
Frank O Ostermann ◽  
...  

The demand for reproducibility of research is on the rise in disciplines concerned with data analysis and computational methods. In this work, existing recommendations for reproducible research are reviewed and translated into criteria for assessing the reproducibility of articles in the field of geographic information science (GIScience). Using a sample of GIScience research from the Association of Geographic Information Laboratories in Europe (AGILE) conference series, we assess the current state of reproducibility of publications in this field. Feedback on the assessment was collected by surveying the authors of the sample papers. The results show that reproducibility levels are low: although authors support the ideals, the incentives to realize them are too small. Therefore, we propose concrete actions for individual researchers and the AGILE conference series to improve transparency and reproducibility, such as imparting data and software skills, an award, paper badges, author guidelines for computational research, and Open Access publications.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5072 ◽  
Author(s):  
Daniel Nüst ◽  
Carlos Granell ◽  
Barbara Hofer ◽  
Markus Konkol ◽  
Frank O. Ostermann ◽  
...  

The demand for reproducible research is on the rise in disciplines concerned with data analysis and computational methods. Therefore, we reviewed current recommendations for reproducible research and translated them into criteria for assessing the reproducibility of articles in the field of geographic information science (GIScience). Using these criteria, we assessed a sample of GIScience studies from the Association of Geographic Information Laboratories in Europe (AGILE) conference series, and we collected feedback about the assessment from the study authors. Results from the author feedback indicate that although authors support the concept of performing reproducible research, the incentives for doing this in practice are too small. Therefore, we propose concrete actions for individual researchers and the GIScience conference series to improve transparency and reproducibility. For example, to support researchers in producing reproducible work, the GIScience conference series could offer awards and paper badges, provide author guidelines for computational research, and publish articles in Open Access formats.


Author(s):  
Betsy Rolland ◽  
Elizabeth S. Burnside ◽  
Corrine I. Voils ◽  
Manish N. Shah ◽  
Allan R. Brasier

Abstract The pervasive problem of irreproducibility in preclinical research represents a substantial threat to the translation of CTSA-generated health interventions. Key stakeholders in the research process have proposed solutions to this challenge to encourage research practices that improve reproducibility. However, these proposals have had minimal impact, because they either (1) take place too late in the research process, (2) focus exclusively on the products of research instead of the processes of research, and/or (3) fail to take into account the driving incentives of the research enterprise. Because so much clinical and translational science is team-based, CTSA hubs have a unique opportunity to leverage Science of Team Science research to implement and support innovative, evidence-based, team-focused, reproducibility-enhancing activities at a project’s start and across its evolution. Here, we describe the impact of irreproducibility on clinical and translational science, review its origins, and then describe stakeholders’ efforts to improve reproducibility and why those efforts may not have the desired effect. Based on team-science best practices and principles of scientific integrity, we then propose ways for Translational Teams to build reproducible behaviors. We end with suggestions for how CTSAs can leverage team-based best practices and identify observable behaviors that indicate a culture of reproducible research.


2017 ◽  
Author(s):  
Sven Vlaeminck ◽  
Ralf Toepfer

Replications are pivotal for the credibility of empirical economics. It is widely recognized in economics that replication studies are a necessary condition for scientific integrity. Alarmingly, several studies indicate that a significant share of empirical economics research cannot be replicated. At the same time, awareness among researchers that empirical research often rests on shaky ground has increased in recent years. It has become increasingly evident that more replications are needed in economics to regain trust in, and the credibility of, empirical economics research.

Though established scholarly journals have adopted replication policies in recent years, replication activities have increased only slightly. Against this background, our talk investigates if and how journals in economics foster replicable research. For this purpose, we address two aspects:

Journals’ data policies and their effective enforcement in economics: The first part of our talk presents the findings of a new study in which we evaluated almost 600 articles published in 37 well-regarded journals with a data availability policy. First, we highlight the share of articles that fall under the data policy. Subsequently, the talk contrasts how many of these data-based articles had replication files available in the journals’ data archives and/or the supplemental information section of the article. Moreover, the exact requirements of the journals’ data policies were contrasted with the replication files available on the journals’ web pages (respectively, in their data archives). We developed a ‘compliance rate’ for each journal in our study: the higher the compliance rate, the more strictly a journal enforces its data policy. In the first part of our talk, we also discuss whether voluntary data policies are effective in fostering replicable research. For this purpose, we compare the compliance rates of journals with a voluntary data policy to those of their mandatory counterparts.

Journals as publication outlets for replication studies: Though researchers agree that replication studies are needed to regain trust and credibility in empirical economic research, replication activities have increased only slightly in recent years. One reason may be that, in the current system, replicating other people’s results does not advance a researcher’s career. Another is the paucity of publication outlets for such replication studies. In this part of our talk, we also discuss whether established journals should implement replication sections, or whether a journal entirely dedicated to replication would be a better way to foster the publication of replication studies. As a showcase, we briefly introduce the newly founded “International Journal for Re-Views in Empirical Economics” (IREE).

To conclude, we sketch current and potential future developments in economics with regard to reproducible research.
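The ‘compliance rate’ described in the talk can be sketched as a small computation: the share of data-based articles for which replication files were actually available. The function name, data layout, and sample values below are illustrative assumptions, not taken from the study itself.

```python
# Hypothetical sketch of a journal "compliance rate": the fraction of
# articles falling under the data policy that actually provide
# replication files. All names and numbers here are invented.

def compliance_rate(articles):
    """articles: list of dicts with boolean keys 'data_based' and
    'files_available'. Returns the fraction of data-based articles
    with replication files, or None if no article falls under the
    data policy."""
    data_based = [a for a in articles if a["data_based"]]
    if not data_based:
        return None
    compliant = sum(1 for a in data_based if a["files_available"])
    return compliant / len(data_based)

sample = [
    {"data_based": True,  "files_available": True},
    {"data_based": True,  "files_available": False},
    {"data_based": False, "files_available": False},  # e.g. a theory paper, exempt
    {"data_based": True,  "files_available": True},
]
print(compliance_rate(sample))  # → 0.6666666666666666 (2 of 3 data-based articles)
```

Comparing this rate between journals with voluntary and mandatory policies, as the talk proposes, then reduces to computing it over each journal’s sample of articles.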


Author(s):  
Markus Konkol ◽  
Daniel Nüst ◽  
Laura Goulier

<p>Many research papers include results based on data that is analyzed using a computational analysis implemented, e.g., in R. Publishing these materials is perceived as good scientific practice and as essential for scientific progress. For these reasons, funding organizations increasingly require applicants to outline data and software management plans as part of their research proposals. Furthermore, author guidelines for paper submissions more often include a section on data availability, and some reviewers reject submissions that do not, without good reason, provide the underlying materials [1]. This trend towards open and reproducible research puts pressure on authors to make the source code and data used to produce the computational results in their scientific papers accessible. Despite these developments, publishing reproducible manuscripts is difficult and time-consuming. Moreover, simply providing access to code scripts and data files does not guarantee computational reproducibility [2]. Fortunately, several projects are working on applications to assist authors in publishing executable analyses alongside papers, considering the requirements of the aforementioned stakeholders. The chief contribution of this poster is a review of software solutions designed to solve the problem of publishing executable computational research results [3]. We compare the applications with respect to aspects that are relevant for the involved stakeholders, e.g., provided features and deployment options, and critically discuss trends and limitations. This comparison can serve as decision support for publishers who want to comply with reproducibility principles, for editors and program committees who would like to add reproducibility requirements to their author guidelines, for applicants of research proposals in the process of creating data and software management plans, and for authors looking for ways to distribute their work in a verifiable and reusable manner. We also include properties related to preservation, relevant for librarians dealing with the long-term accessibility of research materials.</p><p> </p><p>References:</p><p>1) Stark, P. B. (2018). Before reproducibility must come preproducibility. Nature, 557(7706), 613-614.</p><p>2) Konkol, M., Kray, C., & Pfeiffer, M. (2019). Computational reproducibility in geoscientific papers: Insights from a series of studies with geoscientists and a reproduction study. International Journal of Geographical Information Science, 33(2), 408-429.</p><p>3) Konkol, M., Nüst, D., & Goulier, L. (2020). Publishing computational research - A review of infrastructures for reproducible and transparent scholarly communication. arXiv preprint arXiv:2001.00484.</p>


Author(s):  
M. S. Krafczyk ◽  
A. Shi ◽  
A. Bhaskar ◽  
D. Marinov ◽  
V. Stodden

We carry out efforts to reproduce computational results for seven published articles and identify barriers to computational reproducibility. We then derive three principles to guide the practice and dissemination of reproducible computational research: (i) provide transparency regarding how computational results are produced; (ii) when writing and releasing research software, aim for ease of (re-)executability; (iii) make any code upon which the results rely as deterministic as possible. We then exemplify these three principles with 12 specific guidelines for their implementation in practice. We illustrate the three principles of reproducible research with a series of vignettes from our experimental reproducibility work. We define a novel Reproduction Package, a formalism that specifies a structured way to share computational research artifacts and implements the guidelines generated from our reproduction efforts, allowing others to build, reproduce and extend computational science. We make our reproduction efforts in this paper publicly available as exemplar Reproduction Packages. This article is part of the theme issue ‘Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification in silico’.
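Principle (iii) above, determinism, can be illustrated with a minimal sketch: pin every source of randomness to a fixed seed so that reruns of an analysis yield bit-identical results. The function name, seed value, and noise model below are illustrative assumptions, not drawn from the article’s Reproduction Packages.

```python
# Minimal sketch of making a computation deterministic (principle iii):
# use an explicitly seeded, local random number generator instead of
# hidden global state, so repeated runs produce identical output.
import random

def noisy_values(values, seed=42):
    """Add reproducible simulated 'measurement noise' to a list of values.
    With a fixed seed, every run returns exactly the same list."""
    rng = random.Random(seed)  # local RNG: no reliance on global state
    return [v + rng.uniform(-0.1, 0.1) for v in values]

run1 = noisy_values([1.0, 2.0, 3.0])
run2 = noisy_values([1.0, 2.0, 3.0])
print(run1 == run2)  # → True: the computation is deterministic
```

The same idea extends to any stochastic component of an analysis (sampling, shuffling, stochastic solvers): record the seed alongside the code and data so a Reproduction Package can regenerate the results exactly.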


