Guidelines for a Standardized Filesystem Layout for Scientific Data

Data ◽  
2020 ◽  
Vol 5 (2) ◽  
pp. 43
Author(s):  
Florian Spreckelsen ◽  
Baltasar Rüchardt ◽  
Jan Lebert ◽  
Stefan Luther ◽  
Ulrich Parlitz ◽  
...  

Storing scientific data on the filesystem in a meaningful and transparent way is no trivial task. In particular, when the data have to be accessed after their originator has left the lab, the importance of a standardized filesystem layout cannot be overstated. It is desirable to have a structure that allows for the unique categorization of all kinds of data, from experimental results to publications. To find widespread acceptance, the data have to be accessible to a broad variety of workflows, e.g., via graphical user interface as well as via command line. Furthermore, the inclusion of already existing data has to be as simple as possible. We propose a three-level layout to organize and store scientific data that incorporates the full chain of scientific data management, from data acquisition to analysis to publications. Metadata are saved in a standardized way and connect original data to analyses and publications as well as to their originators. A simple software tool to check a file structure for compliance with the proposed structure is presented.


2015 ◽  
Vol 14 ◽  
pp. CIN.S26470 ◽  
Author(s):  
Richard P. Finney ◽  
Qing-Rong Chen ◽  
Cu V. Nguyen ◽  
Chih Hao Hsu ◽  
Chunhua Yan ◽  
...  

The name Alview is a contraction of the term Alignment Viewer. Alview is a software tool, compiled to native architecture, for visualizing the alignment of sequencing data. Inputs are files of short-read sequences aligned to a reference genome in the SAM/BAM format and files containing reference genome data. Outputs are visualizations of these aligned short reads. Alview is written in portable C with optional graphical user interface (GUI) code written in C, C++, and Objective-C. The application can run in three different ways: as a web server, as a command line tool, or as a native GUI program. Alview is compatible with Microsoft Windows, Linux, and Apple OS X. It is available as a web demo at https://cgwb.nci.nih.gov/cgi-bin/alview. The source code and Windows/Mac/Linux executables are available via https://github.com/NCIP/alview.


2019 ◽  
Vol 15 (2) ◽  
Author(s):  
Patrícia Rocha Bello Bertin ◽  
Juliana Meireles Fortaleza ◽  
Adriana Cristina Da Silva ◽  
Massayuki Franco Okawachi ◽  
Márcia De Oliveira Cardoso

ABSTRACT The Big Data phenomenon and the fourth paradigm of science, e-Science, demand that science and technology institutions properly manage and preserve research data, so as to enable access to, use of, and sharing of the original data and thus achieve sustainability and competitiveness in the modern scientific and technological system. This paper comments on and analyzes Embrapa's Data, Information and Knowledge Governance Policy, focusing on issues related to research data management. It is hoped that this Policy can be instrumental to other organizations in the national S&T system in developing their own standards.
Keywords: Scientific Data; Data Intensive Science; Access; Sharing; Preservation; Management.


2018 ◽  
Author(s):  
Christian Trachsel ◽  
Christian Panse ◽  
Tobias Kockmann ◽  
Witold E. Wolski ◽  
Jonas Grossmann ◽  
...  

Optimizing methods for liquid chromatography coupled to mass spectrometry (LC-MS) is a non-trivial task. Here we present rawDiag, a software tool supporting rational method optimization by providing MS operator-tailored diagnostic plots of scan-level metadata. rawDiag is implemented as an R package and can be executed on the command line or, for less experienced users, through a graphical user interface (GUI). The code is platform independent and, as our benchmark shows, can process a hundred raw files in less than three minutes on current consumer hardware. To demonstrate the functionality of our package, we include a real-world example taken from our daily core facility business.


2017 ◽  
Vol 26 (01) ◽  
pp. 212-213

Agarwal V, Podchiyska T, Banda JM, Goel V, Leung TI, Minty EP, Sweeney TE, Gyang E, Shah NH. Learning statistical models of phenotypes using noisy labeled training data. J Am Med Inform Assoc 2016;23(6):1166-73 https://academic.oup.com/jamia/article-lookup/doi/10.1093/jamia/ocw028

Harmanci A, Gerstein M. Quantification of private information leakage from phenotype-genotype data: linking attacks. Nat Methods 2016;13(3):251-6 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4834871/

Pfiffner PB, Pinyol I, Natter MD, Mandl KD. C3-PRO: Connecting ResearchKit to the Health System Using i2b2 and FHIR. PloS One 2016;11(3):e0152722 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4816293/

Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJ, Groth P, Goble C, Grethe JS, Heringa J, 't Hoen PA, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone SA, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016;3:160018 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175/

Springer DB, Tarassenko L, Clifford GD. Logistic regression-HSMM-based heart sound segmentation. IEEE Trans Biomed Eng 2016 Apr;63(4):822-32


Author(s):  
Hans-Peter Kriegel ◽  
Peer Kröger ◽  
Christiaan Hendrikus van der Meijden ◽  
Henriette Obermaier ◽  
Joris Peters ◽  
...  
