The Rise of “Digital Biology”: We need not only open, FAIR but also sustainable data!

Author(s):  
Philippe Grandcolas

Biology has already experienced great divides that decreased its global coherence and its ability to answer important scientific and societal concerns. For example, in the 20th century, the so-called “Life Sciences” developed remarkably in comparison to Natural History sciences. As a result, approaches based on model organisms dominated or prevented other approaches from being carried out on more diverse organisms, which may have given a misleading impression of generality to the results obtained. Another great divide is at risk of developing now with the rise of what could be called “Digital Biology,” which separates from other “material-based” approaches in its tendency to consider digital data only. Some biologists adopt a somewhat essentialist view of species and DNA, considering that enough knowledge has now been accumulated and that species records can be kept and saved as digital data only (Grandcolas 2017). Examples include occurrence records without specimens or auxiliary documents, taxonomic descriptions based on photographs, DNA sequences without vouchers, and, lastly, DNA sequences without taxonomic names. This tendency puts at risk the sustainability, growth, and coherence of biological knowledge, which is organized in a system wherein all data and notions are connected via specimens, with names and sequences being a means of retrieval (Troudet et al. 2018). This tendency also ignores the robust foundation of biology, the data of which are linked to collections, vouchers, and stocks. This foundation of physical specimens exists for data concerning any living beings, be they rare wild species or selected lines of model organisms. There are now many calls for open and FAIR science, with results, methods, tools, and data not only findable, accessible, and interoperable but also re-usable. More than FAIR and digitally re-usable, data need to be sustainable. Their meaning and significance must remain open to re-analysis and re-interpretation by going back as far as possible to material vouchers. We therefore urge scientists to consider this question by providing all the material elements necessary to make open and FAIR data sustainable as well.

2019 ◽  
Vol 15 (01) ◽  
pp. 1-8
Author(s):  
Ashish C Patel ◽  
C G Joshi

Current data storage technologies can no longer keep pace with the exponentially growing amounts of data generated through the extensive use of social networking photos, media, etc. The “digital world,” which held 4.4 zettabytes in 2013, was predicted to reach 44 zettabytes by 2020. Over the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a medium that is dense and long-lasting, and have found DNA to be the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance other than being kept in a cool, dark place. DNA is small and has a high storage density; just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases, viz., A, T, G, and C, while CDs, hard disks and other devices store information as 0s and 1s on spiral tracks. In DNA-based storage, the digital file is first converted into binary code, after which encoding and decoding are the key steps. Once the digital file is encoded, the next step is to synthesize arbitrary single-stranded DNA sequences, which can be stored in a deep freezer until use. When the information needs to be recovered, this can be done using DNA sequencing. Next-generation sequencing (NGS) is capable of producing sequences with very high throughput at a much lower cost (less than 0.1 USD per MB of data) than the first sequencing technologies. Post-sequencing processing includes alignment of all reads using multiple sequence alignment (MSA) algorithms to obtain different consensus sequences. The consensus sequence is decoded by reversing the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire amount of stored digital information with no random access, but it has now become possible to extract selected files (e.g., retrieving only a required image from a collection) from a DNA pool using PCR-based random access. Various scientists have successfully stored up to 110 zettabytes of data in one gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
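
The encode, sequence, consensus, and decode steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' encoding scheme: the two-bits-per-base mapping (`BITS_TO_BASE`), the function names, and the simple column-wise majority vote standing in for MSA-based consensus calling are all assumptions made for illustration. Practical schemes also add error correction and avoid long runs of the same base.

```python
from collections import Counter
from typing import List

# Hypothetical mapping of two bits to one base, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Binarize the data, then map each pair of bits to one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse the encoding: bases back to bits, bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def consensus(reads: List[str]) -> str:
    """Column-wise majority vote over reads.

    Assumes the reads are already aligned and of equal length, which a
    real pipeline would obtain from a multiple sequence alignment step.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


strand = encode(b"DNA")
# Three simulated reads; the second carries one substitution error at index 3.
reads = [strand, strand[:3] + "T" + strand[4:], strand]
recovered = decode(consensus(reads))
```

In this toy run, the single substitution error is outvoted by the two clean reads, so `recovered` matches the original bytes. With only four bases per byte and no redundancy inside the strand, the sketch relies entirely on read depth for error tolerance.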


2021 ◽  
Author(s):  
Pragya Topal ◽  
Divita Garg ◽  
Rajendra S. Fartyal

As drosophilids are versatile, low-maintenance and harmless model organisms, they can easily be used in all fields of the life sciences, such as genetics, biotechnology, cancer biology, genomics, reproductive biology, developmental biology, microchemical studies, ecology and many more. To use such a model organism, we need to learn how to capture the flies and how to rear and culture their progeny, along with basic identification and differentiation between males and females. This chapter emphasizes different and effective techniques for capturing these flies. In addition, the most species-specific baits are discussed in order to increase yield. Culture food media, prepared from set measurements of different ingredients, are used to rear the collected samples. The reasons for using each ingredient are also discussed in this chapter. Finally, this chapter highlights basic clues for identifying different species in the field and in the lab, and for distinguishing males from females easily and effectively.


2021 ◽  
Author(s):  
Manuela Mejía Estrada ◽  
Luz Fernanda Jiménez-Segura ◽  
Iván Soto Calderón

DNA barcoding was proposed in response to the mismatch between the low number of taxonomists and the large number of species. The method requires the construction of reference collections of DNA sequences that represent existing biodiversity. Freshwater fishes are key indicators for understanding biogeography around the world. Colombia, with 1610 species of freshwater fishes, is the second richest country in the world in this group. However, genetic information on these species continues to be limited. This contribution to a reference library of DNA barcodes for Colombian freshwater fishes highlights the importance of biological collections and seeks to strengthen the inventories and taxonomy of such collections in future studies. This dataset contributes to the knowledge of the DNA barcodes and occurrence records of 96 species of freshwater fishes from Colombia. The species represented in this dataset add 39 species to the BOLD public databases. Forty-nine specimens were collected in the Atrato basin and 708 in the Magdalena-Cauca basin between 2010 and 2020. Two species (Loricariichthys brunneus and Poecilia sphenops) are considered exotic to the Atrato, Cauca and Magdalena basins, and four species (Oncorhynchus mykiss, Oreochromis niloticus, Parachromis friedrichsthalii and Xiphophorus helleri) are exotic to Colombian hydrogeographic regions. All specimens are deposited in the CIUA collection at the University of Antioquia, and their DNA barcodes are publicly available in the Barcode of Life Data System (BOLD) online database. The distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).


2008 ◽  
Vol 2 (2) ◽  
pp. 105-110
Author(s):  
Graham Pryor

The Digital Curation Centre’s promotion of expertise and good practice in digital data curation is no mere exercise in theory. Through its new eScience Liaison initiative the DCC has kept a close eye on its founding principle, that the necessity for the physical and life sciences to share access to digital research resources is due mainly to issues characteristic of eScience. This article describes some of the principal liaison activities that have been addressed within that community since the summer of 2007.


2019 ◽  
Author(s):  
Melanie E. F. LaCava ◽  
Ellen O. Aikens ◽  
Libby C. Megna ◽  
Gregg Randolph ◽  
Charley Hubbard ◽  
...  

Abstract Advances in DNA sequencing have made it feasible to gather genomic data for non-model organisms and large sets of individuals, often using methods for sequencing subsets of the genome. Several of these methods sequence DNA associated with endonuclease restriction sites (various RAD and GBS methods). For use in taxa without a reference genome, these methods rely on de novo assembly of fragments in the sequencing library. Many of the software options available for this application were originally developed for other assembly types and we do not know their accuracy for reduced representation libraries. To address this important knowledge gap, we simulated data from the Arabidopsis thaliana and Homo sapiens genomes and compared de novo assemblies by six software programs that are commonly used or promising for this purpose (ABySS, CD-HIT, Stacks, Stacks2, Velvet and VSEARCH). We simulated different mutation rates and types of mutations, and then applied the six assemblers to the simulated datasets, varying assembly parameters. We found substantial variation in software performance across simulations and parameter settings. ABySS failed to recover any true genome fragments, and Velvet and VSEARCH performed poorly for most simulations. Stacks and Stacks2 produced accurate assemblies of simulations containing SNPs, but the addition of insertion and deletion mutations decreased their performance. CD-HIT was the only assembler that consistently recovered a high proportion of true genome fragments. Here, we demonstrate the substantial difference in the accuracy of assemblies from different software programs and the importance of comparing assemblies that result from different parameter settings.


2020 ◽  
Author(s):  
Stefan Hans ◽  
Daniela Zöller ◽  
Juliane Hammer ◽  
Johanna Stucke ◽  
Sandra Spieß ◽  
...  

Abstract Conditional gene inactivation is a powerful tool to determine gene function when constitutive mutations result in detrimental effects. The most commonly used technique to achieve conditional gene inactivation employs the Cre/loxP system and its ability to delete DNA sequences flanked by two loxP sites. However, targeting critical exons or an entire gene with two loxP sites is time and labor consuming. To circumvent these issues, we developed Cre-Controlled CRISPR (3C) mutagenesis. 3C mutagenesis is simple, fast and allows gene inactivation in a Cre-dependent manner. In contrast to loxP-flanked alleles, the recombined cells become fluorescently visible enabling the isolation of these cells and their subjection to various omics techniques. Moreover, 3C will be scalable and will enable the conditional inactivation of multiple genes simultaneously. Hence, 3C mutagenesis provides a valuable alternative to the production of loxP-flanked alleles and should be applicable to all model organisms amenable to single integration transgenesis.


2021 ◽  
Vol 13 ◽  
Author(s):  
Daria Laptinskaya ◽  
Olivia Caroline Küster ◽  
Patrick Fissler ◽  
Franka Thurm ◽  
Christine A. F. Von Arnim ◽  
...  

An active lifestyle as well as cognitive and physical training (PT) may benefit cognition by increasing cognitive reserve, but the underlying neurobiological mechanisms of this reserve capacity are not well understood. To investigate these mechanisms of cognitive reserve, we focused on electrophysiological correlates of cognitive performance, namely on an event-related measure of auditory memory and on a measure of global coherence. Both measures have been shown to be sensitive markers for cognition and might therefore be suitable to investigate potential training- and lifestyle-related changes. Here, we report on the results of an electrophysiological sub-study that correspond to previously published behavioral findings. Altogether, 65 older adults with subjective or objective cognitive impairment and aged 60–88 years were assigned to a 10-week cognitive training group (n = 19), a 10-week PT group (n = 21), or a passive control group (n = 25). In addition, self-reported lifestyle was assessed at baseline. We did not find an effect of either training on electroencephalography (EEG) measures of auditory memory decay or global coherence (ps ≥ 0.29), and a more active lifestyle was not associated with improved global coherence (p = 0.38). Results suggest that a 10-week unimodal cognitive or PT and an active lifestyle in older adults at risk for dementia are not strongly related to improvements in electrophysiological correlates of cognition.


2015 ◽  
Vol 9 ◽  
pp. BBI.S12467 ◽  
Author(s):  
Xiaoxi Dong ◽  
Anatoly Yambartsev ◽  
Stephen A. Ramsey ◽  
Lina D Thomas ◽  
Natalia Shulzhenko ◽  
...  

Omics technologies enable unbiased investigation of biological systems through massively parallel sequence acquisition or molecular measurements, bringing the life sciences into the era of Big Data. A central challenge posed by such omics datasets is how to transform these data into biological knowledge, for example, how to use these data to answer questions such as: Which functional pathways are involved in cell differentiation? Which genes should we target to stop cancer? Network analysis is a powerful and general approach to solve this problem consisting of two fundamental stages, network reconstruction, and network interrogation. Here we provide an overview of network analysis including a step-by-step guide on how to perform and use this approach to investigate a biological question. In this guide, we also include the software packages that we and others employ for each of the steps of a network analysis workflow.
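
The two stages named in this abstract, network reconstruction and network interrogation, can be pictured with a small Python sketch. It does not reproduce the guide's workflow or any of the software packages it lists. The sketch assumes a correlation-threshold network for reconstruction and node degree for interrogation. The gene names, the measurements, and the 0.9 threshold are made-up toy values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def reconstruct(profiles, threshold=0.9):
    """Stage 1: connect two genes when |correlation| meets the threshold."""
    edges = set()
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[gene_a], profiles[gene_b])) >= threshold:
            edges.add((gene_a, gene_b))
    return edges


def degrees(profiles, edges):
    """Stage 2: interrogate the network by counting each gene's neighbours."""
    degree = {gene: 0 for gene in profiles}
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    return degree


# Toy, made-up measurements across four samples (illustration only).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, -1.0, -1.0, 1.0],
}
edges = reconstruct(profiles)
degree = degrees(profiles, edges)
```

In this toy case, `geneA`, `geneB`, and `geneC` are perfectly correlated or anti-correlated and end up mutually connected, while `geneD` is uncorrelated with the rest and stays isolated. Real analyses would typically use many more samples, multiple-testing control, and richer interrogation measures than degree.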


Genetics ◽  
2002 ◽  
Vol 162 (2) ◽  
pp. 579-589 ◽  
Author(s):  
Saumitri Bhattacharyya ◽  
Michael L Rolfsmeier ◽  
Michael J Dixon ◽  
Kara Wagoner ◽  
Robert S Lahue

Abstract Trinucleotide repeats (TNRs) undergo frequent mutations in families affected by TNR diseases and in model organisms. Much of the instability is conferred in cis by the sequence and length of the triplet tract. Trans-acting factors also modulate TNR instability risk, on the basis of such evidence as parent-of-origin effects. To help identify trans-acting modifiers, a screen was performed to find yeast mutants with altered CTG·CAG repeat mutation frequencies. The RTG2 gene was identified as one such modifier. In rtg2 mutants, expansions of CTG·CAG repeats show a modest increase in rate, depending on the starting tract length. Surprisingly, contractions were suppressed in an rtg2 background. This creates a situation in a model system where expansions outnumber contractions, as in humans. The rtg2 phenotype was apparently specific for CTG·CAG repeat instability, since no changes in mutation rate were observed for dinucleotide repeats or at the CAN1 reporter gene. This feature sets rtg2 mutants apart from most other mutants that affect genetic stability both for TNRs and at other DNA sequences. It was also found that RTG2 acts independently of its normal partners RTG1 and RTG3, suggesting a novel function of RTG2 that helps modify CTG·CAG repeat mutation risk.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Stefan Hans ◽  
Daniela Zöller ◽  
Juliane Hammer ◽  
Johanna Stucke ◽  
Sandra Spieß ◽  
...  

Abstract Conditional gene inactivation is a powerful tool to determine gene function when constitutive mutations result in detrimental effects. The most commonly used technique to achieve conditional gene inactivation employs the Cre/loxP system and its ability to delete DNA sequences flanked by two loxP sites. However, targeting a gene with two loxP sites is time and labor consuming. Here, we show Cre-Controlled CRISPR (3C) mutagenesis to circumvent these issues. 3C relies on gRNA and Cre-dependent Cas9-GFP expression from the same transgene. Exogenous or transgenic supply of Cre results in Cas9-GFP expression and subsequent mutagenesis of the gene of interest. The recombined cells become fluorescently visible enabling their isolation and subjection to various omics techniques. Hence, 3C mutagenesis provides a valuable alternative to the production of loxP-flanked alleles. It might even enable the conditional inactivation of multiple genes simultaneously and should be applicable to other model organisms amenable to single integration transgenesis.

