The AEMON‐J “Hacking Limnology” Workshop Series & Virtual Summit: Incorporating Data Science and Open Science in Aquatic Research

Open Code is not enough: Towards a replicable future for geographic data science

10.31235/osf.io/3hbnt ◽

2019 ◽

Author(s):

Levi John Wolf ◽

Sergio J. Rey ◽

Taylor M. Oshan

Keyword(s):

Spatial Data ◽

Data Science ◽

Current Model ◽

Open Science ◽

Social Changes ◽

Working Definition ◽

Geospatial Cyberinfrastructure ◽

Geographic Data ◽

Definition Of ◽

Healthy Part

Open science practices are a large and healthy part of computational geography and the burgeoning field of spatial data science. In many forms, open geospatial cyberinfrastructure adheres to a varying and informal set of practices and codes that empower levels of collaboration that are impossible otherwise. Pathbreaking work in geographical sciences has explicitly brought these concepts into focus for our current model of open science in geography. In practice, however, these blend together into a somewhat ill-advised but easy-to-use working definition of open science: you know open science when you see it (on GitHub). However, open science lags far behind the needs revealed by this level of collaboration. In this paper, we describe the concerns of open geographic data science, in terms of replicability and open science. We discuss the practical techniques that engender community-building in open science communities, and discuss the impacts that these kinds of social changes have on the technological architecture of scientific infrastructure.

Teaching Computational Social Science Skills to Psychology Students: An Undergraduate Research Lab Case Study

Scholarship and Practice of Undergraduate Research ◽

10.18833/spur/4/1/5 ◽

2020 ◽

Vol 4 (1) ◽

pp. 5-14

Author(s):

Brian A. Eiler ◽

Patrick C. Doyle ◽

Rosemary L. Al-Kire ◽

Heidi A. Wayment ◽

...

Keyword(s):

Social Science ◽

Data Science ◽

Undergraduate Research ◽

Psychology Students ◽

Psychological Research ◽

Open Science ◽

Computational Social Science ◽

Research Experience ◽

Science Practices

This article provides a case study of a student-focused research experience that introduced basic data science skills and their utility for psychological research, providing practical learning experiences for students interested in learning computational social science skills. Skills included programming; acquiring, visualizing, and managing data; performing specialized analyses; and building knowledge about open-science practices.

FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units

Publications ◽

10.3390/publications8020021 ◽

2020 ◽

Vol 8 (2) ◽

pp. 21 ◽

Cited By ~ 2

Author(s):

Koenraad De Smedt ◽

Dimitris Koureas ◽

Peter Wittenburg

Keyword(s):

Data Science ◽

Open Science ◽

Research Data ◽

Use Cases ◽

Digital Object ◽

Data Interoperability ◽

Actionable Knowledge ◽

The Past ◽

Research Communities ◽

Digital Objects

Data science is facing the following major challenges: (1) developing scalable cross-disciplinary capabilities, (2) dealing with the increasing data volumes and their inherent complexity, (3) building tools that help to build trust, (4) creating mechanisms to efficiently operate in the domain of scientific assertions, (5) turning data into actionable knowledge units and (6) promoting data interoperability. As a way to overcome these challenges, we further develop the proposals by early Internet pioneers for Digital Objects as encapsulations of data and metadata made accessible by persistent identifiers. In the past decade, this concept was revisited by various groups within the Research Data Alliance and put in the context of the FAIR Guiding Principles for findable, accessible, interoperable and reusable data. The basic components of a FAIR Digital Object (FDO) as a self-contained, typed, machine-actionable data package are explained. A survey of use cases has indicated the growing interest of research communities in FDO solutions. We conclude that the FDO concept has the potential to act as the interoperable federative core of a hyperinfrastructure initiative such as the European Open Science Cloud (EOSC).

Abordagens de reúso e a questão da reusabilidade dos dados científicos | Approaches for data reuse and the issue of scientific data reusability

Liinc em Revista ◽

10.18617/liinc.v15i2.4777 ◽

2019 ◽

Vol 15 (2) ◽

Author(s):

Renata Curty

Keyword(s):

Data Sharing ◽

Data Science ◽

Meta Analysis ◽

Science Research ◽

Open Science ◽

Scientific Data ◽

Data Reuse ◽

Data Repositories ◽

Documentation Quality ◽

Data Documentation

ABSTRACT Government and institutional data sharing policies and mandates for publicly funded research have driven the rapid expansion of digital data repositories, making these scientific assets available for reuse, often for purposes not anticipated by the researchers who produced or collected them. Paradoxically, although the argument for data sharing rests heavily on the potential for reuse and its contributions to scientific advancement, the subject remains peripheral to discussions of data science and open science. This narrative review takes a closer look at data reuse in order to better conceptualize the term, and proposes an initial classification of five distinct approaches to research data reuse (repurposing, aggregation, integration, meta-analysis and reanalysis), based on hypothetical situations accompanied by cases of data reuse published in the scientific literature. It also explores the determinants of reusable data, relating reusability to the quality of the documentation that accompanies the data. It discusses the challenges of data documentation and points to initiatives and recommendations for overcoming them. The arguments presented are expected to contribute not only to the conceptual advancement of data reuse and reusability, but also to prompt actions on data documentation that increase the reuse potential of these scientific assets.
Keywords: Data Reuse; Scientific Reproducibility; Reusability; Open Science; Research Data.

Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing

Metabolomics ◽

10.1007/s11306-019-1588-0 ◽

2019 ◽

Vol 15 (10) ◽

Cited By ~ 7

Author(s):

Kevin M. Mendez ◽

Leighton Pritchard ◽

Stacey N. Reinke ◽

David I. Broadhurst

Keyword(s):

Cloud Computing ◽

Web Application ◽

Data Science ◽

Open Data ◽

Open Science ◽

Data Repository ◽

Data Repositories ◽

Fully Integrated ◽

Computing Platform ◽

Novices And Experts

Background: A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to reproduction and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility. The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases. Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work. To ensure broad use within the community, such a framework also needs to be inclusive and intuitive for both computational novices and experts alike.
Aim of Review: To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science.
Key Scientific Concepts of Review: This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.

Open science: a revolution in sight?

Interlending & Document Supply ◽

10.1108/ilds-06-2016-0020 ◽

2016 ◽

Vol 44 (4) ◽

pp. 155-160 ◽

Cited By ~ 3

Author(s):

Bernard Rentier

Keyword(s):

Open Access ◽

Data Science ◽

Electronic Publishing ◽

Open Science ◽

Free Access ◽

New Paradigm ◽

Content Type ◽

Access Policy ◽

Access To Knowledge ◽

The University

Purpose: This paper aims to describe the evolution of scientific communication, largely represented by the publication process. It notes the disappearance of the traditional publication on paper and its progressive replacement by electronic publishing, a new paradigm implying radical changes in the whole mechanism. It also aims to warn the scientific community about the dangers of some new avenues and to explain why, rather than subcontracting an essential part of its work, it must take back full control of its production.
Design/methodology/approach: The paper reviews the emerging concepts in scholarly publication and aims to answer frequently asked questions concerning free access to scientific literature as well as to data, science and knowledge in general.
Findings: The paper provides new observations concerning the level of compliance with institutional open access mandates and the poor relevance of journal prestige for quality evaluation of research and researchers. The results of introducing an open access policy at the University of Liège are noted.
Social implications: Open access is, for the first time in human history, an opportunity to provide free access to knowledge universally, regardless of either the wealth or the social status of the potentially interested readers. It is an essential breakthrough for developing countries.
Originality/value: Open access and Open Science in general must be considered as common values that should be shared freely. Free access to publicly generated knowledge should be explicitly included in universal human rights. There are still a number of obstacles hampering this goal, mostly the greed of intermediaries who persuade researchers to give their work for free, in exchange for prestige. The worldwide cause of Open Knowledge is thus a major universal issue for the twenty-first century.

Social Media, Open Science, and Data Science Are Inextricably Linked

Neuron ◽

10.1016/j.neuron.2017.11.015 ◽

2017 ◽

Vol 96 (6) ◽

pp. 1219-1222 ◽

Cited By ~ 4

Author(s):

Bradley Voytek

Keyword(s):

Social Media ◽

Data Science ◽

Open Science

Virtual European Solar & Planetary Access (VESPA): Progress and prospects

10.5194/epsc2020-190 ◽

2020 ◽

Author(s):

Stéphane Erard ◽

Baptiste Cecconi ◽

Pierre Le Sidaner ◽

Angelo Pio Rossi ◽

Carlos Brandt ◽

...

Keyword(s):

Distribution System ◽

Data Science ◽

Large Data ◽

Open Science ◽

Planetary Science ◽

Science Data ◽

Horizon 2020 ◽

Data Services ◽

Related Data ◽

Research And Innovation

The H2020 Europlanet-2020 programme, which ended on Aug 31st, 2019, included an activity called VESPA (Virtual European Solar and Planetary Access), which focused on adapting Virtual Observatory (VO) techniques to handle Planetary Science data [1] [2]. The outcome of this activity is a contributive data distribution system where data services are located and maintained in research institutes, declared in a registry, and accessed by several clients based on a specific access protocol. During Europlanet-2020, 52 data services were installed, including the complete ESA Planetary Science Archive and the outcome of several EU-funded projects. Data are described using the EPN-TAP protocol, whose parameters describe acquisition and observing conditions as well as data characteristics (physical quantity, data type, etc.). A main search portal, which queries all services together, has been developed to optimize the user experience. Compliance with VO standards ensures that existing tools can be used as well, either to access or visualize the data. In addition, a bridge linking the VO and Geographic Information Systems (GIS) has been installed to address formats and tools used to study planetary surfaces; several large data infrastructures were also installed or upgraded (SSHADE for lab spectroscopy, PVOL for amateur images, AMDA for plasma-related data).
In the framework of the starting Europlanet-2024 programme, the VESPA activity will extend this system further: 30-50 new data services will be installed, focusing on derived data and experimental data produced in other Work Packages of Europlanet-2024; connections between PDS4 and EPN-TAP dictionaries will make PDS metadata searchable from the VESPA portal and vice versa; Solar System data present in astronomical VO catalogues will be made accessible, e.g. from the VizieR database. The search system will be connected with more powerful display and analysis tools: a run-on-demand platform will be installed, as well as Machine Learning capacities to process the available content. Finally, long-term sustainability will be improved by setting up VESPA hubs to assist data providers in maintaining their services, and by using the new EU-funded European Open Science Cloud (EOSC). In addition to favoring data exploitation, VESPA will provide a handy and economical solution to Open Science challenges in the field.
The Europlanet 2020 & 2024 Research Infrastructure projects have received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No 654208 & 871149.
[1] Erard et al. 2018, Planet. Space Sci. 150, 65-85. doi: 10.1016/j.pss.2017.05.013. ArXiv 1705.09727
[2] Erard et al. 2020, Data Science Journal 19, 22. doi: 10.5334/dsj-2020-022.

Methods for Open and Reproducible Materials Science

10.31235/osf.io/ag8zu ◽

2019 ◽

Author(s):

Sara L Wilson ◽

Micah Altman ◽

Rafael Jaramillo

Keyword(s):

Data Management ◽

Data Science ◽

Management Practices ◽

Materials Science ◽

Critical Role ◽

Structured Interview ◽

Open Science ◽

Manuscript Submission ◽

Attitudes And Practices ◽

Data Stewardship

Data stewardship in experimental materials science is increasingly complex and important. Progress in data science and inverse-design of materials gives reason for optimism that advances can be made if appropriate data resources are made available. Data stewardship also plays a critical role in maintaining broad support for research in the face of well-publicized replication failures (in different fields) and frequently changing attitudes, norms, and sponsor requirements for open science. The present-day data management practices and attitudes in materials science are not well understood. In this article, we collect information on the practices of a selection of materials scientists at two leading universities, using a semi-structured interview instrument. An analysis of these interviews reveals that although data management is universally seen as important, data management practices vary widely. Based on this analysis, we conjecture that broad adoption of basic file-level data sharing at the time of manuscript submission would benefit the field without imposing substantial burdens on researchers. More comprehensive solutions for lifecycle open research in materials science will have to overcome substantial differences in attitudes and practices.

The Medical Library Association Data Services Competency: a framework for data science and open science skills development

Journal of the Medical Library Association JMLA ◽

10.5195/jmla.2020.909 ◽

2020 ◽

Vol 108 (2) ◽

Cited By ~ 2

Author(s):

Lisa Federer ◽

Erin Diane Foster ◽

Ann Glusker ◽

Margaret Henderson ◽

Kevin Read ◽

...

Keyword(s):

Data Science ◽

Open Science ◽

Skills Development ◽

Data Services ◽

Medical Library ◽

Key Skills ◽

Association Data ◽

Course Of Study

Increasingly, users of health and biomedical libraries need assistance with challenges they face in working with their own and others’ data. Librarians have a unique opportunity to provide valuable support and assistance in data science and open science but may need to add to their expertise and skill set to have the most impact. This article describes the rationale for and development of the Medical Library Association Data Services Competency, which outlines a set of five key skills for data services and provides a course of study for gaining these skills.
