From single nuclei to whole genome assemblies

Mapping Intimacies ◽

10.1101/625814 ◽

2019 ◽

Cited By ~ 3

Author(s):

Merce Montoliu-Nerin ◽

Marisol Sánchez-García ◽

Claudia Bergin ◽

Manfred Grabherr ◽

Barbara Ellis ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Genomic Data ◽

Life Cycles ◽

Genomic Research ◽

Metagenomic Data ◽

Model Organisms ◽

Genomic Study ◽

And Function ◽

Genome Assemblies

Summary A large proportion of Earth's biodiversity consists of organisms that cannot be cultured, have cryptic life-cycles and/or live submerged within their substrates [1–4]. Genomic data are key to unraveling both their identity and function [5]. The development of metagenomic methods [6,7] and the advent of single cell sequencing [8–10] have revolutionized the study of life and function of cryptic organisms by upending the need for large and pure biological material, and allowing generation of genomic data from complex or limited environmental samples. Genome assemblies from metagenomic data have so far been restricted to organisms with small genomes, such as bacteria [11], archaea [12] and certain eukaryotes [13]. On the other hand, single cell technologies have allowed the targeting of unicellular organisms, attaining a better resolution than metagenomics [8,9,14–16]; moreover, they have allowed the genomic study of cells from complex organisms one cell at a time [17,18]. However, single cell genomics is not easily applied to multicellular organisms formed by consortia of diverse taxa, and the generation of specific workflows for sequencing and data analysis is needed to expand genomic research to the entire tree of life, including sponges [19], lichens [3,20], intracellular parasites [21,22], and plant endophytes [23,24]. Among the most important plant endophytes are the obligate mutualistic symbionts, arbuscular mycorrhizal (AM) fungi, which pose an additional challenge with their multinucleate coenocytic mycelia [25]. Here, the development of a novel single nuclei sequencing and assembly workflow is reported. This workflow allows, for the first time, the generation of reference genome assemblies from large scale, unbiased sorted, and sequenced AM fungal nuclei, circumventing tedious, and often impossible, culturing efforts. This method opens infinite possibilities for studies of evolution and adaptation in these important plant symbionts and demonstrates that reference genomes can be generated from complex non-model organisms by isolating only a handful of their nuclei.

MIC-Drop: A platform for large-scale in vivo CRISPR screens

Science ◽

10.1126/science.abi8870 ◽

2021 ◽

pp. eabi8870

Author(s):

Saba Parvez ◽

Chelsea Herdman ◽

Manu Beerens ◽

Korak Chakraborti ◽

Zachary P. Harmer ◽

...

Keyword(s):

Large Scale ◽

Cultured Cells ◽

Cardiac Development ◽

Droplet Microfluidics ◽

Model Organisms ◽

Genetic Screens ◽

Large Numbers ◽

And Function ◽

Genome Scale

CRISPR-Cas9 can be scaled up for large-scale screens in cultured cells, but CRISPR screens in animals have been challenging because generating, validating, and keeping track of large numbers of mutant animals is prohibitive. Here, we report Multiplexed Intermixed CRISPR Droplets (MIC-Drop), a platform combining droplet microfluidics, single-needle en masse CRISPR ribonucleoprotein injections, and DNA barcoding to enable large-scale functional genetic screens in zebrafish. The platform can efficiently identify genes responsible for morphological or behavioral phenotypes. In one application, we show MIC-Drop can identify small molecule targets. Furthermore, in a MIC-Drop screen of 188 poorly characterized genes, we discover several genes important for cardiac development and function. With the potential to scale to thousands of genes, MIC-Drop enables genome-scale reverse-genetic screens in model organisms.

iProteinDB: an integrative database of Drosophila post-translational modifications

10.1101/386268 ◽

2018 ◽

Cited By ~ 2

Author(s):

Yanhui Hu ◽

Richelle Sopko ◽

Verena Chung ◽

Romain A. Studer ◽

Sean D. Landry ◽

...

Keyword(s):

Protein Interactions ◽

Protein Function ◽

Large Scale ◽

Model Organisms ◽

General Strategy ◽

Post Translational Modification ◽

Post Translational Modifications ◽

Functional Sites ◽

Evolutionarily Conserved ◽

And Function

Abstract Post-translational modification (PTM) serves as a regulatory mechanism for protein function, influencing stability, protein interactions, activity and localization, and is critical in many signaling pathways. The best characterized PTM is phosphorylation, whereby a phosphate is added to an acceptor residue, commonly serine, threonine and tyrosine. As proteins are often phosphorylated at multiple sites, identifying those sites that are important for function is a challenging problem. Considering that many phosphorylation sites may be non-functional, prioritizing evolutionarily conserved phosphosites provides a general strategy to identify the putative functional sites with regard to regulation and function. To facilitate the identification of conserved phosphosites, we generated a large-scale phosphoproteomics dataset from Drosophila embryos collected from six closely-related species. We built iProteinDB (https://www.flyrnai.org/tools/iproteindb/), a resource integrating these data with other high-throughput PTM datasets, including vertebrates, and manually curated information for Drosophila. At iProteinDB, scientists can view the PTM landscape for any Drosophila protein and identify predicted functional phosphosites based on a comparative analysis of data from closely-related Drosophila species. Further, iProteinDB enables comparison of PTM data from Drosophila to that of orthologous proteins from other model organisms, including human, mouse, rat, Xenopus laevis, Danio rerio, and Caenorhabditis elegans.

Prediction of condition-specific regulatory genes using machine learning

Nucleic Acids Research ◽

10.1093/nar/gkaa264 ◽

2020 ◽

Vol 48 (11) ◽

pp. e62-e62 ◽

Cited By ~ 2

Author(s):

Qi Song ◽

Jiyoung Lee ◽

Shamima Akter ◽

Matthew Rogers ◽

Ruth Grene ◽

...

Keyword(s):

Machine Learning ◽

Transcription Factors ◽

Single Cell ◽

Control Cell ◽

Genomic Data ◽

Regulatory Genes ◽

Genomic Research ◽

Open Chromatin ◽

Data Set ◽

Better Than

Abstract Recent advances in genomic technologies have generated data on large-scale protein–DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5–25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants.
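
The auROC reported in the abstract above is a ranking metric: it is the probability that a randomly chosen true regulator is scored higher than a randomly chosen non-regulator. The following is a minimal sketch of how such a value could be computed from a list of scores and binary labels using the rank-sum identity. It is not ConSReg's code; the function name `auroc` and the toy scores are assumptions made for illustration only.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: predicted scores, higher means "more likely a true regulator".
    labels: 1 for a true regulator, 0 otherwise (hypothetical toy labels).
    """
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    # Assign 1-based ranks, giving tied scores their average rank.
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label)
    # U statistic of the positives, normalised by the number of pos/neg pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy example: three of the four positive/negative pairs are ranked correctly.
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives the scale on which the reported average of 0.84 should be read.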

Transcriptome Analysis in Domesticated Species: Challenges and Strategies

Bioinformatics and Biology Insights ◽

10.4137/bbi.s29334 ◽

2015 ◽

Vol 9S4 ◽

pp. BBI.S29334 ◽

Cited By ~ 4

Author(s):

Jessica P. Hekman ◽

Jennifer L Johnson ◽

Anna V. Kukekova

Keyword(s):

Complex Traits ◽

Gene Networks ◽

Association Studies ◽

Cultural Value ◽

Genomic Research ◽

Model Organisms ◽

Genome Wide Association Studies ◽

Rna Seq ◽

Genome Wide ◽

Genome Assemblies

Domesticated species occupy a special place in the human world due to their economic and cultural value. In the era of genomic research, domesticated species provide unique advantages for investigation of diseases and complex phenotypes. RNA sequencing, or RNA-seq, has recently emerged as a new approach for studying transcriptional activity of the whole genome, changing the focus from individual genes to gene networks. RNA-seq analysis in domesticated species may complement genome-wide association studies of complex traits with economic importance or direct relevance to biomedical research. However, RNA-seq studies are more challenging in domesticated species than in model organisms. These challenges are at least in part associated with the lack of quality genome assemblies for some domesticated species and the absence of genome assemblies for others. In this review, we discuss strategies for analyzing RNA-seq data, focusing particularly on questions and examples relevant to domesticated species.

Construction of whole genomes from scaffolds using single cell strand-seq data

10.1101/271510 ◽

2018 ◽

Cited By ~ 4

Author(s):

Mark Hills ◽

Ester Falconer ◽

Kieran O’Neill ◽

Ashley D. Sanders ◽

Kerstin Howe ◽

...

Keyword(s):

Single Cell ◽

Sequence Data ◽

Model Organisms ◽

Tasmanian Devil ◽

Template Strand ◽

Modern Molecular Biology ◽

Whole Genomes ◽

Dna Strand ◽

Genome Assemblies

Accurate reference genome sequences provide the foundation for modern molecular biology and genomics as the interpretation of sequence data to study evolution, gene expression and epigenetics depends heavily on the quality of the genome assembly used for its alignment. Correctly organising sequenced fragments such as contigs and scaffolds in relation to each other is a critical and often challenging step in the construction of robust genome references. We previously identified misoriented regions in the mouse and human reference assemblies using Strand-seq, a single cell sequencing technique that preserves DNA directionality [1, 2]. Here we demonstrate the ability of Strand-seq to build and correct full-length chromosomes, by identifying which scaffolds belong to the same chromosome and determining their correct order and orientation, without the need for overlapping sequences. We demonstrate that Strand-seq exquisitely maps assembly fragments into large related groups and chromosome-sized clusters without using new assembly data. Using template strand inheritance as a bi-allelic marker, we employ genetic mapping principles to cluster scaffolds that are derived from the same chromosome and order them within the chromosome based solely on directionality of DNA strand inheritance. We prove the utility of our approach by generating improved genome assemblies for several model organisms including the ferret, pig, Xenopus, zebrafish, Tasmanian devil and the Guinea pig.
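
The clustering idea in the abstract above can be illustrated with a toy sketch. It assumes, for illustration only, that each scaffold has one strand state per single cell, encoded as +1 (both template strands Watson), -1 (both Crick) or 0 (mixed, treated here as uninformative), and that scaffolds from the same chromosome agree across cells, or disagree in every cell if one of them is misoriented. The encoding, the function names, the greedy grouping and the 0.9 threshold are all assumptions and do not reproduce the authors' software.

```python
def same_state_fraction(a, b):
    """Fraction of informative cells in which two scaffolds share a strand state.

    Cells where either scaffold has state 0 are skipped as uninformative.
    Returns None when no cell is informative for both scaffolds.
    """
    informative = [(x, y) for x, y in zip(a, b) if x != 0 and y != 0]
    if not informative:
        return None
    return sum(x == y for x, y in informative) / len(informative)


def cluster_scaffolds(states, threshold=0.9):
    """Greedily group scaffolds by strand-state concordance across cells.

    states: dict mapping scaffold name -> list of per-cell states (+1, -1, 0).
    Each cluster records the orientation of every member relative to the
    cluster's first scaffold: "+" for concordant, "-" for inverted.
    """
    clusters = []
    for name, vector in states.items():
        for cluster in clusters:
            same = same_state_fraction(vector, states[cluster["rep"]])
            if same is None:
                continue
            if same >= threshold:
                cluster["members"][name] = "+"
                break
            if 1 - same >= threshold:
                # Opposite state in nearly every cell: same chromosome, flipped.
                cluster["members"][name] = "-"
                break
        else:
            clusters.append({"rep": name, "members": {name: "+"}})
    return clusters


if __name__ == "__main__":
    # Hypothetical strand states for four scaffolds across six single cells.
    toy_states = {
        "s1": [1, -1, 1, 0, -1, 1],
        "s2": [1, -1, 1, 1, -1, 1],
        "s3": [-1, 1, -1, 0, 1, -1],   # inverted copy of s1's pattern
        "s4": [1, 1, -1, -1, 1, -1],   # unrelated pattern
    }
    for cluster in cluster_scaffolds(toy_states):
        print(cluster["members"])
```

In this toy input, `s1`, `s2` and `s3` fall into one group with `s3` flagged as inverted, while `s4` forms its own group. Ordering scaffolds within a chromosome, which the abstract also describes, is not covered by this sketch.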

Building de novo reference genome assemblies of complex eukaryotic microorganisms from single nuclei

Scientific Reports ◽

10.1038/s41598-020-58025-3 ◽

2020 ◽

Vol 10 (1) ◽

Cited By ~ 3

Author(s):

Merce Montoliu-Nerin ◽

Marisol Sánchez-García ◽

Claudia Bergin ◽

Manfred Grabherr ◽

Barbara Ellis ◽

...

Keyword(s):

Large Scale ◽

Method Development ◽

De Novo ◽

Sequence Data ◽

Arbuscular Mycorrhizal ◽

Am Fungi ◽

Life Cycles ◽

Suitable Model ◽

Eukaryotic Microorganisms ◽

Genome Assemblies

Abstract The advent of novel sequencing techniques has unraveled a tremendous diversity on Earth. Genomic data allow us to understand ecology and function of organisms that we would not otherwise know existed. However, major methodological challenges remain, in particular for multicellular organisms with large genomes. Arbuscular mycorrhizal (AM) fungi are important plant symbionts with cryptic and complex multicellular life cycles, thus representing a suitable model system for method development. Here, we report a novel method for large scale, unbiased nuclear sorting, sequencing, and de novo assembling of AM fungal genomes. After comparative analyses of three assembly workflows we discuss how sequence data from single nuclei can best be used for different downstream analyses such as phylogenomics and comparative genomics of single nuclei. Based on analysis of completeness, we conclude that comprehensive de novo genome assemblies can be produced from six to seven nuclei. The method is highly applicable for a broad range of taxa, and will greatly improve our ability to study multicellular eukaryotes with complex life cycles.

Construction of Whole Genomes from Scaffolds Using Single Cell Strand-Seq Data

International Journal of Molecular Sciences ◽

10.3390/ijms22073617 ◽

2021 ◽

Vol 22 (7) ◽

pp. 3617

Author(s):

Mark Hills ◽

Ester Falconer ◽

Kieran O’Neill ◽

Ashley D. Sanders ◽

Kerstin Howe ◽

...

Keyword(s):

Single Cell ◽

Sequence Data ◽

Model Organisms ◽

Tasmanian Devil ◽

Template Strand ◽

Modern Molecular Biology ◽

Whole Genomes ◽

Dna Strand ◽

Genome Assemblies

Accurate reference genome sequences provide the foundation for modern molecular biology and genomics as the interpretation of sequence data to study evolution, gene expression, and epigenetics depends heavily on the quality of the genome assembly used for its alignment. Correctly organising sequenced fragments such as contigs and scaffolds in relation to each other is a critical and often challenging step in the construction of robust genome references. We previously identified misoriented regions in the mouse and human reference assemblies using Strand-seq, a single cell sequencing technique that preserves DNA directionality. Here we demonstrate the ability of Strand-seq to build and correct full-length chromosomes by identifying which scaffolds belong to the same chromosome and determining their correct order and orientation, without the need for overlapping sequences. We demonstrate that Strand-seq exquisitely maps assembly fragments into large related groups and chromosome-sized clusters without using new assembly data. Using template strand inheritance as a bi-allelic marker, we employ genetic mapping principles to cluster scaffolds that are derived from the same chromosome and order them within the chromosome based solely on directionality of DNA strand inheritance. We prove the utility of our approach by generating improved genome assemblies for several model organisms including the ferret, pig, Xenopus, zebrafish, Tasmanian devil and the Guinea pig.

A Chromosome-Scale Assembly of the Enormous (32 Gb) Axolotl Genome

10.1101/373548 ◽

2018 ◽

Cited By ~ 3

Author(s):

Jeramiah J. Smith ◽

Nataliya Timoshevskaya ◽

Vladimir A. Timoshevskiy ◽

Melissa C. Keinath ◽

Drew Hardy ◽

...

Keyword(s):

Large Scale ◽

Genome Structure ◽

Large Deletion ◽

Ambystoma Mexicanum ◽

Biological Research ◽

Genome Wide ◽

Genetic Stocks ◽

Gene Structures ◽

And Function ◽

Genome Assemblies

Abstract The axolotl (Ambystoma mexicanum) provides critical models for studying regeneration, evolution and development. However, its large genome (~32 gigabases) presents a formidable barrier to genetic analyses. Recent efforts have yielded genome assemblies consisting of thousands of unordered scaffolds that resolve gene structures, but do not yet permit large scale analyses of genome structure and function. We adapted an established mapping approach to leverage dense SNP typing information and for the first time assemble the axolotl genome into 14 chromosomes. Moreover, we used fluorescence in situ hybridization to verify the structure of these 14 scaffolds and assign each to its corresponding physical chromosome. This new assembly covers 27.3 gigabases and encompasses 94% of annotated gene models on chromosomal scaffolds. We show the assembly’s utility by resolving genome-wide orthologies between the axolotl and other vertebrates, identifying the footprints of historical introgression events that occurred during the development of axolotl genetic stocks, and precisely mapping several phenotypes including a large deletion underlying the cardiac mutant. This chromosome-scale assembly will greatly facilitate studies of the axolotl in biological research.

Stable Soil Microbial Functional Structure Responding to Biodiversity Loss Based on Metagenomic Evidences

Frontiers in Microbiology ◽

10.3389/fmicb.2021.716764 ◽

2021 ◽

Vol 12 ◽

Author(s):

Huaihai Chen ◽

Kayan Ma ◽

Yu Huang ◽

Zhiyuan Yao ◽

Chengjin Chu

Keyword(s):

Large Scale ◽

Global Climate ◽

Biodiversity Loss ◽

Functional Structure ◽

Metagenomic Data ◽

Ecosystem Functions ◽

Soil Microbial ◽

Microbial Species ◽

The Stability ◽

And Function

Anthropogenic disturbances and global climate change are causing large-scale biodiversity loss and threatening ecosystem functions. However, due to the lack of knowledge on microbial species loss, our understanding on how functional profiles of soil microbes respond to diversity decline is still limited. Here, we evaluated the biotic homogenization of global soil metagenomic data to examine whether microbial functional structure is resilient to significant diversity reduction. Our results showed that although biodiversity loss caused a decrease in taxonomic species by 72%, the changes in the relative abundance of diverse functional categories were limited. The stability of functional structures associated with microbial species richness decline in terrestrial systems suggests a decoupling of taxonomy and function. The changes in functional profile with biodiversity loss were function-specific, with broad-scale metabolism functions decreasing and typical nutrient-cycling functions increasing. Our results imply high levels of microbial physiological versatility in the face of significant biodiversity decline, which, however, does not necessarily mean that a loss in total functional abundance, such as microbial activity, can be overlooked in the background of unprecedented species extinction.

Computational methods for the integrative analysis of single-cell data

Briefings in Bioinformatics ◽

10.1093/bib/bbaa042 ◽

2020 ◽

Cited By ~ 2

Author(s):

Mattia Forcato ◽

Oriana Romano ◽

Silvio Bicciato

Keyword(s):

Single Cell ◽

Computational Methods ◽

Genomic Data ◽

Integrative Analysis ◽

Joint Analysis ◽

New Wave ◽

Multimodal Signals ◽

And Function ◽

Genomic Signals ◽

Molecular Layers

Abstract Recent advances in single-cell technologies are providing exciting opportunities for dissecting tissue heterogeneity and investigating cell identity, fate and function. This is a pristine, exploding field that is flooding biologists with a new wave of data, each with its own specificities in terms of complexity and information content. The integrative analysis of genomic data, collected at different molecular layers from diverse cell populations, holds promise to address the full-scale complexity of biological systems. However, the combination of different single-cell genomic signals is computationally challenging, as these data are intrinsically heterogeneous for experimental, technical and biological reasons. Here, we describe the computational methods for the integrative analysis of single-cell genomic data, with a focus on the integration of single-cell RNA sequencing datasets and on the joint analysis of multimodal signals from individual cells.
