Adding Support for Theory in Open Science Big Data

Author(s):  
John A. Miller ◽  
Hao Peng ◽  
Michael E. Cotterell
Keyword(s):  
Big Data ◽  
2019 ◽  
Vol 11 (3) ◽  
pp. 255-273 ◽  
Author(s):  
Vicki Xafis ◽  
Markus K. Labude

Abstract There is a growing expectation, or even requirement, for researchers to deposit a variety of research data in data repositories as a condition of funding or publication. This expectation recognizes the enormous benefits of data collected and created for research purposes being made available for secondary uses, as open science gains increasing support. This is particularly so in the context of big data, especially where health data is involved. There are, however, also challenges relating to the collection, storage, and re-use of research data. This paper gives a brief overview of the landscape of data sharing via data repositories and discusses some of the key ethical issues raised by the sharing of health-related research data, including expectations of privacy and confidentiality, the transparency of repository governance structures, access restrictions, as well as data ownership and the fair attribution of credit. To consider these issues and the values that are pertinent, the paper applies the deliberative balancing approach articulated in the Ethics Framework for Big Data in Health and Research (Xafis et al. 2019) to the domain of Openness in Big Data and Data Repositories. Please refer to that article for more information on how this framework is to be used, including a full explanation of the key values involved and the balancing approach used in the case study at the end.


2020 ◽  
Vol 54 (4) ◽  
pp. 409-435
Author(s):  
Paolo Manghi ◽  
Claudio Atzori ◽  
Michele De Bonis ◽  
Alessia Bardi

Purpose: Several online services offer functionalities to access information from “big research graphs” (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate scholarly/scientific communication entities such as publications, authors, datasets, organizations, projects, funders, etc. Depending on the target users, access can vary from searching and browsing content to the consumption of statistics for monitoring and provision of feedback. Such graphs are populated over time as aggregations of multiple sources and therefore suffer from major entity-duplication problems. Although deduplication of graphs is a known and current problem, existing solutions are dedicated to specific scenarios, operate on flat collections, address local topology-driven challenges, and therefore cannot be re-used in other contexts. Design/methodology/approach: This work presents GDup, an integrated, scalable, general-purpose system that can be customized to address deduplication over arbitrarily large information graphs. The paper presents its high-level architecture and its implementation as a service used within the OpenAIRE infrastructure system, and reports numbers from real-case experiments. Findings: GDup provides the functionalities required to deliver a fully-fledged entity deduplication workflow over a generic input graph. The system offers out-of-the-box Ground Truth management, acquisition of feedback from data curators, and algorithms for identifying and merging duplicates, to obtain an output disambiguated graph. Originality/value: To our knowledge GDup is the only system in the literature that offers an integrated and general-purpose solution for the deduplication of graphs, while targeting big data scalability issues. GDup is today one of the key modules of the OpenAIRE infrastructure production system, which monitors Open Science trends on behalf of the European Commission, National funders and institutions.


2017 ◽  
Author(s):  
Michael P. Milham ◽  
R. Cameron Craddock ◽  
Arno Klein

Abstract Despite decades of research, visions of transforming neuropsychiatry through the development of brain imaging-based ‘growth charts’ or ‘lab tests’ have remained out of reach. In recent years, there is renewed enthusiasm about the prospect of achieving clinically useful tools capable of aiding the diagnosis and management of neuropsychiatric disorders. The present work explores the basis for this enthusiasm. We assert that there is no single advance that currently has the potential to drive the field of clinical brain imaging forward. Instead, there has been a constellation of advances that, if combined, could lead to the identification of objective brain imaging-based markers of illness. In particular, we focus on advances that are helping to: 1) elucidate the research agenda for biological psychiatry (e.g., neuroscience focus, precision medicine), 2) shift research models for clinical brain imaging (e.g., big data exploration, standardization), 3) break down research silos (e.g., open science, calls for reproducibility and transparency), and 4) improve imaging technologies and methods. While an arduous road remains ahead, these advances are repositioning the brain imaging community for long-term success.


Author(s):  
Mercè Crosas ◽  
Gary King ◽  
James Honaker ◽  
Latanya Sweeney

The vast majority of social science research uses small (megabyte- or gigabyte-scale) datasets. These fixed-scale datasets are commonly downloaded to the researcher’s computer where the analysis is performed. The data can be shared, archived, and cited with well-established technologies, such as the Dataverse Project, to support the published results. The trend toward big data—including large-scale streaming data—is starting to transform research and has the potential to impact policymaking as well as our understanding of the social, economic, and political problems that affect human societies. However, big data research poses new challenges to the execution of the analysis, archiving and reuse of the data, and reproduction of the results. Downloading these datasets to a researcher’s computer is impractical, leading to analyses taking place in the cloud, and requiring unusual expertise, collaboration, and tool development. The increased amount of information in these large datasets is an advantage, but at the same time it poses an increased risk of revealing personally identifiable sensitive information. In this article, we discuss solutions to these new challenges so that the social sciences can realize the potential of big data.


Author(s):  
Aïda Bafeta ◽  
Jason Bobe ◽  
Jon Clucas ◽  
Pattie Pramila Gonsalves ◽  
Célya Gruson-Daniel ◽  
...  

We are witnessing a dramatic transformation in the way we do science. In recent years, significant flaws with existing scientific methods have come to light, including lack of transparency, insufficient involvement of stakeholders, disconnection from the public, and limited reproducibility of research findings. These concerns have sparked a global movement to revolutionize scientific practice and the emergence of Open Science. This new approach to science extends principles of openness to the entire research cycle, from hypothesis generation to data collection, analysis, replication, and translation from research to practice. Open Science seeks to remove all barriers to conducting high quality, rigorous, and impactful scientific research by ensuring that the data, methods, and opportunities for collaboration are open to all. Emerging digital technologies and "big data" (see "Ten simple rules for responsible big data research") have further accelerated the Open Science movement by affording new approaches to data sharing, connecting researcher networks, and facilitating the dissemination of research findings. Open scientific practices are also having a profound impact on the health sciences and medical research, and specifically how we conduct clinical research with human participants. Human health research necessitates careful considerations for practicing science in an ethical manner. There is also a particular urgency to human health research since the goal is to help people, so doing good science takes on a different meaning than simply doing science well. It also implores the scientist to reassess the conventional view of human health research as a pursuit conducted by scientists on human subjects, and lays a greater emphasis on inclusive and ethical practices to ensure that the research takes into account the interests of those who would be most impacted by the research. 
Openness in the context of human health research also raises greater concerns about privacy and security and presents more opportunities for people, including participants of research studies, to contribute in every capacity. At the core of open health research, scientific discoveries are not only the product of collaboration across disciplines, but must also be owned by the community that is inclusive of researchers, health workers, and patients and their families. To guide successful open health research practices, it is essential to carefully consider and delineate its guiding principles. This editorial is aimed at individuals participating in health science in any capacity, including but not limited to people living with medical conditions, health professionals, study participants, and researchers spanning all types of disciplines. We present ten simple rules that, while not comprehensive, offer guidance for conducting health research with human participants in an open, ethical, and rigorous manner. These rules can be difficult, resource-intensive, and can conflict with one another. They are aspirational and are intended to accelerate and improve the quality of human health research. Work that fails to follow these rules is not necessarily an indication of poor quality research, especially if the reasons for breaking the rules are considered and articulated (see rule 6: document everything). While most of the responsibility of following these rules falls on researchers, anyone involved in human health research in any capacity can apply them.


2014 ◽  
Vol 10 (2) ◽  
Author(s):  
Robin Mansell

ABSTRACT This paper examines the potential for collaboration between formal science professionals and loosely connected online groups that employ crowdsourcing to generate digital information resources. What are the differences between scientists’ and other online groups’ preferred modes of governing knowledge creation? A distinction is drawn between constituted and adaptive modes of governance, and similarities and differences between the two groups’ understandings of information curation, verification, and openness are considered. It is suggested that open science will need to become more flexible if it is to build collaborations with loosely connected groups on equitable terms that respect their respective values and in ways that maximise the contributions of these groups to social problem solving. Keywords: Open Science; Crowdsourcing; Digital Information; Big Data; Governance; Authority; Curation.


2017 ◽  
Vol 16 (02) ◽  
pp. C01 ◽  
Author(s):  
Nico Pitrelli

Computational social science represents an interdisciplinary approach to the study of reality based on advanced computer tools. From economics to political science, from journalism to sociology, digital approaches and techniques for the analysis and management of large quantities of data have now been adopted in several disciplines. The papers in this JCOM commentary focus on the use of such approaches and techniques in the research on science communication. As the papers point out, the most significant advantages of a computational approach in this sector include the chance to open up a range of new research opportunities: from the study of technical and scientific controversies to citizen science, from the definition of new norms and practices for science journalism to open science issues. On the other hand, difficulties are shared with other areas of application. The main risk is that the large quantity of data available can overwhelm the importance of theory. Instead, as the papers in this commentary demonstrate, big data should push scientists to pursue a deeper epistemological and methodological reflection also in the research on science communication.


Patterns ◽  
2021 ◽  
Vol 2 (10) ◽  
pp. 100347
Author(s):  
Natalia Norori ◽  
Qiyang Hu ◽  
Florence Marcelle Aellen ◽  
Francesca Dalia Faraci ◽  
Athina Tzovara
Keyword(s):  
Big Data ◽  
