Reproducible Research Training at Scipy 2014

Statistics is a fundamental component of the scientific toolbox, but learning the basics of this area of mathematics is one of the most challenging parts of a research training. This book gives an up-to-date introduction to the classical techniques and modern extensions of linear-model analysis—one of the most useful approaches in the analysis of scientific data in the life and environmental sciences. The book emphasizes an estimation-based approach that takes account of recent criticisms of overuse of probability values and introduces the alternative approach using information criteria. The book is based on the use of the open-source R programming language for statistics and graphics, which is rapidly becoming the lingua franca in many areas of science. This second edition adds new chapters, including one discussing some of the complexities of linear-model analysis and another introducing reproducible research documents using the R Markdown package. Statistics is introduced through worked analyses performed in R using interesting data sets from ecology, evolutionary biology, and environmental science. The data sets and R scripts are available as supporting material.

Download Full-text

GemPy 1.0: open-source stochastic geological modeling and inversion

Geoscientific Model Development ◽

10.5194/gmd-12-1-2019 ◽

2019 ◽

Vol 12 (1) ◽

pp. 1-32 ◽

Cited By ~ 11

Author(s):

Miguel de la Varga ◽

Alexander Schaaf ◽

Florian Wellmann

Keyword(s):

Open Source ◽

Code Generation ◽

Raw Material ◽

Reproducible Research ◽

Geological Modeling ◽

Fault Surface ◽

Wide Range ◽

Geological Models ◽

Density Values ◽

Efficient Code

Abstract. The representation of subsurface structures is an essential aspect of a wide variety of geoscientific investigations and applications, ranging from geofluid reservoir studies, over raw material investigations, to geosequestration, as well as many branches of geoscientific research and applications in geological surveys. A wide range of methods exist to generate geological models. However, the powerful methods are behind a paywall in expensive commercial packages. We present here a full open-source geomodeling method, based on an implicit potential-field interpolation approach. The interpolation algorithm is comparable to implementations in commercial packages and capable of constructing complex full 3-D geological models, including fault networks, fault–surface interactions, unconformities and dome structures. This algorithm is implemented in the programming language Python, making use of a highly efficient underlying library for efficient code generation (Theano) that enables a direct execution on GPUs. The functionality can be separated into the core aspects required to generate 3-D geological models and additional assets for advanced scientific investigations. These assets provide the full power behind our approach, as they enable the link to machine-learning and Bayesian inference frameworks and thus a path to stochastic geological modeling and inversions. In addition, we provide methods to analyze model topology and to compute gravity fields on the basis of the geological models and assigned density values. In summary, we provide a basis for open scientific research using geological models, with the aim to foster reproducible research in the field of geomodeling.

Download Full-text

Reproducible research in computational economics: guidelines, integrated approaches, and open source software

Computational Economics ◽

10.1007/s10614-007-9084-4 ◽

2007 ◽

Vol 30 (1) ◽

pp. 19-40 ◽

Cited By ~ 11

Author(s):

Giovanni Baiocchi

Keyword(s):

Open Source ◽

Open Source Software ◽

Computational Economics ◽

Reproducible Research ◽

Integrated Approaches

Download Full-text

Creating and sharing reproducible research code the workflowr way

F1000Research ◽

10.12688/f1000research.20843.1 ◽

2019 ◽

Vol 8 ◽

pp. 1749 ◽

Cited By ~ 7

Author(s):

John D. Blischak ◽

Peter Carbonetto ◽

Matthew Stephens

Keyword(s):

Open Source ◽

Source Code ◽

R Package ◽

Version Control ◽

Regular Part ◽

Reproducible Research ◽

Careful Attention ◽

Research Practice ◽

Development History ◽

Novice Users

Making scientific analyses reproducible, well documented, and easily shareable is crucial to maximizing their impact and ensuring that others can build on them. However, accomplishing these goals is not easy, requiring careful attention to organization, workflow, and familiarity with tools that are not a regular part of every scientist's toolbox. We have developed an R package, workflowr, to help all scientists, regardless of background, overcome these challenges. Workflowr aims to instill a particular "workflow" — a sequence of steps to be repeated and integrated into research practice — that helps make projects more reproducible and accessible.This workflow integrates four key elements: (1) version control (via Git); (2) literate programming (via R Markdown); (3) automatic checks and safeguards that improve code reproducibility; and (4) sharing code and results via a browsable website. These features exploit powerful existing tools, whose mastery would take considerable study. However, the workflowr interface is simple enough that novice users can quickly enjoy its many benefits. By simply following the workflowr "workflow", R users can create projects whose results, figures, and development history are easily accessible on a static website — thereby conveniently shareable with collaborators by sending them a URL — and accompanied by source code and reproducibility safeguards. The workflowr R package is open source and available on CRAN, with full documentation and source code available at https://github.com/jdblischak/workflowr.

Download Full-text

Reviewing qualitative GIS research—Toward a wider usage of open‐source GIS and reproducible research practices

Geography Compass ◽

10.1111/gec3.12441 ◽

2019 ◽

Vol 13 (6) ◽

Cited By ~ 5

Author(s):

Jannes Muenchow ◽

Susann Schäfer ◽

Eric Krüger

Keyword(s):

Open Source ◽

Reproducible Research ◽

Research Practices ◽

Qualitative Gis ◽

Open Source Gis

Download Full-text

Reproducibility in plant pathology: where do we stand and a way forward.

10.31220/agrirxiv.2021.00082 ◽

2021 ◽

Author(s):

Adam H. Sparks ◽

Emerson del Ponte ◽

Kaique S. Alves ◽

Zachary S. L. Foster ◽

Niklaus J. Grünwald

Keyword(s):

Plant Pathology ◽

Open Source ◽

Open Science ◽

Scientific Study ◽

Reproducible Research ◽

Research Practices ◽

Open Research ◽

Computational Code ◽

Share Data ◽

Scientific Results

Abstract Open research practices have been highlighted extensively during the last ten years in many fields of scientific study as essential standards needed to promote transparency and reproducibility of scientific results. Scientific claims can only be evaluated based on how protocols, materials, equipment and methods were described; data were collected and prepared; and, analyses were conducted. Openly sharing protocols, data and computational code is central for current scholarly dissemination and communication, but in many fields, including plant pathology, adoption of these practices has been slow. We randomly selected 300 articles published from 2012 to 2018 across 21 journals representative of the plant pathology discipline and assigned them scores reflecting their openness and reproducibility. We found that most of the articles were not following protocols for open science, and were failing to share data or code in a reproducible way. We also propose that use of open-source tools facilitates reproducible work and analyses benefitting not just readers, but the authors as well. Finally, we also provide ideas and tools to promote open, reproducible research practices among plant pathologists.

Download Full-text

Current Protocols: Open and Reproducible Research on OSF

10.31219/osf.io/ztvu9 ◽

2019 ◽

Author(s):

Ian Sullivan ◽

Alexander Carl DeHaven ◽

David Thomas Mellor

Keyword(s):

Data Management ◽

Open Source ◽

Scientific Community ◽

Management Plan ◽

Version Control ◽

Reproducible Research ◽

Research Practices ◽

Share Data

By implementing more transparent research practices, authors have the opportunity to stand out and showcase work that is more reproducible, easier to build upon, and more credible. The scientist gains by making work easier to share and maintain within their own lab, and the scientific community gains by making underlying data or research materials more available for confirmation or making new discoveries. The following protocol gives the author step by step instructions for using the free and open source OSF to create a data management plan, preregister their study, use version control, share data and other research materials, or post a preprint for quick and easy dissemination.

Download Full-text

Using R in hydrology: a review of recent developments and future directions

10.5194/hess-2019-50 ◽

2019 ◽

Cited By ~ 1

Author(s):

Louise J. Slater ◽

Guillaume Thirel ◽

Shaun Harrigan ◽

Olivier Delaigue ◽

Alexander Hurley ◽

...

Keyword(s):

Open Source ◽

Data Science ◽

Meteorological Data ◽

Central Place ◽

Open Science ◽

Reproducible Research ◽

Data Archives ◽

Integrated Development Environments ◽

Recent Developments ◽

Short Courses

Abstract. The open-source programming language R has gained a central place in the hydrological sciences over the last decade, driven by the availability of diverse hydro-meteorological data archives and the development of open-source computational tools. The growth of R's usage in hydrology is reflected in the number of newly published hydrological packages, the strengthening of online user communities, and the popularity of training courses and events. In this paper, we explore the benefits and advantages of R's usage in hydrology, such as: the democratization of data science and numerical literacy, the enhancement of reproducible research and open science, the access to statistical tools, the ease of connecting R to and from other languages, and the support provided by a growing community. This paper provides an overview of important packages at every step of the hydrological workflow, from the retrieval of hydro-meteorological data, to spatial analysis and cartography, hydrological modelling, statistics, and the design of static and dynamic visualizations, presentations and documents. We discuss some of the challenges that arise when using R in hydrology and useful tools to overcome them, including the use of hydrological libraries, documentation and vignettes (long-form guides that illustrate how to use packages); the role of Integrated Development Environments (IDEs); and the challenges of Big Data and parallel computing in hydrology. Last, this paper provides a roadmap for R's future within hydrology, with R packages as a driver of progress in the hydrological sciences, Application Programming Interfaces (APIs) providing new avenues for data acquisition and provision, enhanced teaching of hydrology in R, and the continued growth of the community via short courses and events.

Download Full-text

NeuroPycon: An open-source Python toolbox for fast multi-modal and reproducible brain connectivity pipelines

10.1101/789842 ◽

2019 ◽

Author(s):

David Meunier ◽

Annalisa Pascarella ◽

Dmitrii Altukhov ◽

Mainak Jas ◽

Etienne Combrisson ◽

...

Keyword(s):

Open Source ◽

Brain Connectivity ◽

Reproducible Research ◽

Link Type ◽

Large Sets ◽

Current Implementation ◽

Wide Range ◽

Neuroimaging Software ◽

Brain Data ◽

Automatic Removal

AbstractRecent years have witnessed a massive push towards reproducible research in neuroscience. Unfortunately, this endeavor is often challenged by the large diversity of tools used, project-specific custom code and the difficulty to track all user-defined parameters. NeuroPycon is an open-source multi-modal brain data analysis toolkit which provides Python-based template pipelines for advanced multi-processing of MEG, EEG, functional and anatomical MRI data, with a focus on connectivity and graph theoretical analyses. Importantly, it provides shareable parameter files to facilitate replication of all analysis steps. NeuroPycon is based on the NiPype framework which facilitates data analyses by wrapping many commonly-used neuroimaging software tools into a common Python environment. In other words, rather than being a brain imaging software with is own implementation of standard algorithms for brain signal processing, NeuroPycon seamlessly integrates existing packages (coded in python, Matlab or other languages) into a unified python framework. Importantly, thanks to the multi-threaded processing and computational efficiency afforded by NiPype, NeuroPycon provides an easy option for fast parallel processing, which critical when handling large sets of multi-dimensional brain data. Moreover, its flexible design allows users to easily configure analysis pipelines by connecting distinct nodes to each other. Each node can be a Python-wrapped module, a user-defined function or a well-established tool (e.g. MNE-Python for MEG analysis, Radatools for graph theoretical metrics, etc.). Last but not least, the ability to use NeuroPycon parameter files to fully describe any pipeline is an important feature for reproducibility, as they can be shared and used for easy replication by others. The current implementation of NeuroPycon contains two complementary packages: The first, called ephypype, includes pipelines for electrophysiology analysis and a command-line interface for on the fly pipeline creation. Current implementations allow for MEG/EEG data import, pre-processing and cleaning by automatic removal of ocular and cardiac artefacts, in addition to sensor or source-level connectivity analyses. The second package, called graphpype, is designed to investigate functional connectivity via a wide range of graph-theoretical metrics, including modular partitions. The present article describes the philosophy, architecture, and functionalities of the toolkit and provides illustrative examples through interactive notebooks. NeuroPycon is available for download via github (https://github.com/neuropycon) and the two principal packages are documented online (https://neuropycon.github.io/ephypype/index.html. and https://neuropycon.github.io/graphpype/index.html). Future developments include fusion of multi-modal data (eg. MEG and fMRI or intracranial EEG and fMRI). We hope that the release of NeuroPycon will attract many users and new contributors, and facilitate the efforts of our community towards open source tool sharing and development, as well as scientific reproducibility.

Download Full-text