Contrasting evolutionary genome dynamics between domesticated and wild yeasts

Nature Genetics ◽

10.1038/ng.3847 ◽

2017 ◽

Vol 49 (6) ◽

pp. 913-924 ◽

Cited By ~ 152

Author(s):

Jia-Xing Yue ◽

Jing Li ◽

Louise Aigrain ◽

Johan Hallin ◽

Karl Persson ◽

...

Keyword(s):

Evolutionary Dynamics ◽

Phenotypic Diversity ◽

Population Level ◽

Structural Rearrangements ◽

Reciprocal Translocations ◽

Genome Dynamics ◽

Long Read ◽

Wild Yeasts ◽

Definition Of ◽

Genome Assemblies

Abstract Structural rearrangements have long been recognized as an important source of genetic variation, with implications in phenotypic diversity and disease, yet their detailed evolutionary dynamics remain elusive. Here we use long-read sequencing to generate end-to-end genome assemblies for 12 strains representing major subpopulations of the partially domesticated yeast Saccharomyces cerevisiae and its wild relative Saccharomyces paradoxus. These population-level high-quality genomes with comprehensive annotation enable precise definition of chromosomal boundaries between cores and subtelomeres and a high-resolution view of evolutionary genome dynamics. In chromosomal cores, S. paradoxus shows faster accumulation of balanced rearrangements (inversions, reciprocal translocations and transpositions), whereas S. cerevisiae accumulates unbalanced rearrangements (novel insertions, deletions and duplications) more rapidly. In subtelomeres, both species show extensive interchromosomal reshuffling, with a higher tempo in S. cerevisiae. Such striking contrasts between wild and domesticated yeasts are likely to reflect the influence of human activities on structural genome evolution.

Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies

10.1101/2020.03.15.992941 ◽

2020 ◽

Cited By ~ 15

Author(s):

Arang Rhie ◽

Brian P. Walenz ◽

Sergey Koren ◽

Adam M. Phillippy

Keyword(s):

De Novo ◽

High Accuracy ◽

Link Type ◽

Base Level ◽

Project Home Page ◽

Set Operations ◽

Assembly Evaluation ◽

Long Read ◽

Genome Assemblies ◽

Reference Genomes

Abstract Recent long-read assemblies often exceed the quality and completeness of available reference genomes, making validation challenging. Here we present Merqury, a novel tool for reference-free assembly evaluation based on efficient k-mer set operations. By comparing k-mers in a de novo assembly to those found in unassembled high-accuracy reads, Merqury estimates base-level accuracy and completeness. For trios, Merqury can also evaluate haplotype-specific accuracy, completeness, phase block continuity, and switch errors. Multiple visualizations, such as k-mer spectrum plots, can be generated for evaluation. We demonstrate on both human and plant genomes that Merqury is a fast and robust method for assembly validation.

Availability of data and material

Project name: Merqury
Project home page: https://github.com/marbl/merqury, https://github.com/marbl/meryl
Archived version: https://github.com/marbl/merqury/releases/tag/v1.0
Operating system(s): Platform independent
Programming language: C++, Java, Perl
Other requirements: gcc 4.8 or higher, java 1.6 or higher
License: Public domain (see https://github.com/marbl/merqury/blob/master/README.license)
Any restrictions to use by non-academics: No restrictions applied

Towards complete and error-free genome assemblies of all vertebrate species

Nature ◽

10.1038/s41586-021-03451-0 ◽

2021 ◽

Vol 592 (7856) ◽

pp. 737-746 ◽

Cited By ~ 1

Author(s):

Arang Rhie ◽

Shane A. McCarthy ◽

Olivier Fedrigo ◽

Joana Damas ◽

Giulio Formenti ◽

...

Keyword(s):

Cost Effective ◽

Lessons Learned ◽

Vertebrate Species ◽

High Quality ◽

Protein Coding ◽

Sequencing Technologies ◽

Long Read ◽

Genome Assemblies ◽

Assembly Error ◽

Reference Genomes

Abstract High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

Nanopore sequencing and Hi-C scaffolding provide insight into the evolutionary dynamics of transposable elements and piRNA production in wild strains of Drosophila melanogaster

Nucleic Acids Research ◽

10.1093/nar/gkz1080 ◽

2019 ◽

Vol 48 (1) ◽

pp. 290-303 ◽

Cited By ~ 7

Author(s):

Christopher E Ellison ◽

Weihuan Cao

Keyword(s):

Drosophila Melanogaster ◽

Evolutionary Dynamics ◽

De Novo ◽

Population Level ◽

Chromosome Length ◽

Nanopore Sequencing ◽

Long Reads ◽

Wild Strains ◽

Genome Assemblies ◽

Insight Into

Abstract Illumina sequencing has allowed for population-level surveys of transposable element (TE) polymorphism via split alignment approaches, which has provided important insight into the population dynamics of TEs. However, such approaches are not able to identify insertions of uncharacterized TEs, nor can they assemble the full sequence of inserted elements. Here, we use nanopore sequencing and Hi-C scaffolding to produce de novo genome assemblies for two wild strains of Drosophila melanogaster from the Drosophila Genetic Reference Panel (DGRP). Ovarian piRNA populations and Illumina split-read TE insertion profiles have been previously produced for both strains. We find that nanopore sequencing with Hi-C scaffolding produces highly contiguous, chromosome-length scaffolds, and we identify hundreds of TE insertions that were missed by Illumina-based methods, including a novel micropia-like element that has recently invaded the DGRP population. We also find hundreds of piRNA-producing loci that are specific to each strain. Some of these loci are created by strain-specific TE insertions, while others appear to be epigenetically controlled. Our results suggest that Illumina approaches reveal only a portion of the repetitive sequence landscape of eukaryotic genomes and that population-level resequencing using long reads is likely to provide novel insight into the evolutionary dynamics of repetitive elements.

Towards complete and error-free genome assemblies of all vertebrate species

10.1101/2020.05.22.110833 ◽

2020 ◽

Cited By ~ 16

Author(s):

Arang Rhie ◽

Shane A. McCarthy ◽

Olivier Fedrigo ◽

Joana Damas ◽

Giulio Formenti ◽

...

Keyword(s):

Cost Effective ◽

Lessons Learned ◽

Vertebrate Species ◽

High Quality ◽

New Era ◽

Sequencing Technologies ◽

Long Read ◽

Cartilaginous Fishes ◽

Genome Assemblies ◽

Reference Genomes

Abstract High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are only available for a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling the most accurate and complete reference genomes to date. Here we summarize these developments, introduce a set of quality standards, and present lessons learned from sequencing and assembling 16 species representing major vertebrate lineages (mammals, birds, reptiles, amphibians, teleost fishes and cartilaginous fishes). We confirm that long-read sequencing technologies are essential for maximizing genome quality and that unresolved complex repeats and haplotype heterozygosity are major sources of error in assemblies. Our new assemblies identify and correct substantial errors in some of the best historical reference genomes. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an effort to generate high-quality, complete reference genomes for all ~70,000 extant vertebrate species and help enable a new era of discovery across the life sciences.

Trycycler: consensus long-read assemblies for bacterial genomes

10.1101/2021.07.04.451066 ◽

2021 ◽

Author(s):

Ryan R Wick ◽

Louise M Judd ◽

Louise T Cerdeira ◽

Jane Hawkey ◽

Guillaume Meric ◽

...

Keyword(s):

Bacterial Genome ◽

Bacterial Genomes ◽

Manual Intervention ◽

Long Reads ◽

Oxford Nanopore ◽

Multiple Input ◽

Long Read ◽

Complete Bacterial Genome ◽

Genome Assemblies ◽

Reference Genomes

Assembly of bacterial genomes from long-read data (generated by Oxford Nanopore or Pacific Biosciences platforms) can often be complete: a single contig for each chromosome or plasmid in the genome. However, even complete bacterial genome assemblies constructed solely from long reads still contain a variety of errors, and different assemblies of the same genome often contain different errors. Here, we present Trycycler, a tool which produces a consensus assembly from multiple input assemblies of the same genome. Benchmarking using both simulated and real sequencing reads showed that Trycycler consensus assemblies contained fewer errors than any of those constructed with a single long-read assembler. Post-assembly polishing with Medaka and Pilon further reduced errors and yielded the most accurate genome assemblies in our study. As Trycycler can require human judgement and manual intervention, its output is not deterministic, and different users can produce different Trycycler assemblies from the same input data. However, we demonstrated that multiple users with minimal training converge on similar assemblies that are consistently more accurate than those produced by automated assembly tools. We therefore recommend Trycycler+Medaka+Pilon as an ideal approach for generating high-quality bacterial reference genomes.

Rate, spectrum, and evolutionary dynamics of spontaneous epimutations

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1424254112 ◽

2015 ◽

Vol 112 (21) ◽

pp. 6676-6681 ◽

Cited By ~ 123

Author(s):

Adriaan van der Graaf ◽

René Wardenaar ◽

Drexel A. Neumann ◽

Aaron Taudt ◽

Ruth G. Shaw ◽

...

Keyword(s):

Cytosine Methylation ◽

Evolutionary Dynamics ◽

Phenotypic Diversity ◽

Natural Populations ◽

Population Level ◽

Genomic Context ◽

Term Selection ◽

Comprehensive Picture ◽

Dynamic Interplay

Stochastic changes in cytosine methylation are a source of heritable epigenetic and phenotypic diversity in plants. Using the model plant Arabidopsis thaliana, we derive robust estimates of the rate at which methylation is spontaneously gained (forward epimutation) or lost (backward epimutation) at individual cytosines and construct a comprehensive picture of the epimutation landscape in this species. We demonstrate that the dynamic interplay between forward and backward epimutations is modulated by genomic context and show that subtle contextual differences have profoundly shaped patterns of methylation diversity in A. thaliana natural populations over evolutionary timescales. Theoretical arguments indicate that the epimutation rates reported here are high enough to rapidly uncouple genetic from epigenetic variation, but low enough for new epialleles to sustain long-term selection responses. Our results provide new insights into methylome evolution and its population-level consequences.

GENOME REPORT: High-quality genome assemblies of 15 Drosophila species generated using Nanopore sequencing

10.1101/267393 ◽

2018 ◽

Cited By ~ 6

Author(s):

Danny E. Miller ◽

Cynthia Staber ◽

Julia Zeitlinger ◽

R. Scott Hawley

Keyword(s):

Single Molecule ◽

Drosophila Species ◽

High Quality ◽

Additional Species ◽

Oxford Nanopore ◽

Wide Range ◽

Long Read ◽

Genome Assemblies ◽

High Quality Genome ◽

Reference Genomes

Abstract The Drosophila genus is a unique group containing a wide range of species that occupy diverse ecosystems. In addition to the most widely studied species, Drosophila melanogaster, many other members in this genus also possess a well-developed set of genetic tools. Indeed, high-quality genomes exist for several species within the genus, facilitating studies of the function and evolution of cis-regulatory regions and proteins by allowing comparisons across at least 50 million years of evolution. Yet, the available genomes still fail to capture much of the substantial genetic diversity within the Drosophila genus. We have therefore tested protocols to rapidly and inexpensively sequence and assemble the genome from any Drosophila species using single-molecule sequencing technology from Oxford Nanopore. Here, we use this technology to present high-quality genome assemblies of 15 Drosophila species: 10 of the 12 originally sequenced Drosophila species (ananassae, erecta, mojavensis, persimilis, pseudoobscura, sechellia, simulans, virilis, willistoni, and yakuba), four additional species that had previously reported assemblies (biarmipes, bipectinata, eugracilis, and mauritiana), and one novel assembly (triauraria). Genomes were generated from an average of 29x depth-of-coverage data that after assembly resulted in an average contig N50 of 4.4 Mb. Subsequent alignment of contigs from the published reference genomes demonstrates that our assemblies could be used to close over 60% of the gaps present in the currently published reference genomes. Importantly, the materials and reagents cost for each genome was approximately $1,000 (USD). This study demonstrates the power and cost-effectiveness of long-read sequencing for genome assembly in Drosophila and provides a framework for the affordable sequencing and assembly of additional Drosophila genomes.
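
The contig N50 quoted in this abstract is a standard contiguity statistic: the length of the contig at which the running total, taken over contigs sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch follows, assuming the input is a plain list of contig lengths in base pairs; the function name `n50` and the example values are illustrative and do not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running sum first reaches half the total length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled integers to avoid floating-point division.
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one contig with non-zero length")


# Hypothetical contig lengths, in base pairs.
print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, a high value indicates contiguity but says nothing about whether the contigs are correct.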


Genotyping of structural variation using PacBio high-fidelity sequencing

10.1101/2021.10.28.466362 ◽

2021 ◽

Author(s):

Zhiliang Zhang ◽

Jijin Zhang ◽

Lipeng Kang ◽

Xuebing Qiu ◽

Beirui Niu ◽

...

Keyword(s):

Comprehensive Evaluation ◽

Phenotypic Diversity ◽

Population Level ◽

High Fidelity ◽

Structural Variations ◽

Plant Genomes ◽

Sequencing Technologies ◽

Long Read ◽

Low Coverage ◽

The Impact

Background: Structural variations (SVs) pervade the genome and contribute substantially to the phenotypic diversity of species. However, most SVs have been assayed ineffectively because of the complexity of plant genomes and the limitations of sequencing technologies. Recent advances in third-generation sequencing technologies, particularly PacBio high-fidelity (HiFi) sequencing, which generates reads that are both long and highly accurate, offer an unprecedented opportunity to characterize SVs and reveal their functionality. Since HiFi sequencing is new, it is crucial to evaluate HiFi reads in SV detection before applying the technology at scale. Results: We sequenced wheat genomes using HiFi, then conducted a comprehensive evaluation of SV detection using mainstream long-read aligners and SV callers. The results showed that the accuracy of SV discovery depends more on the aligner than on the caller. Among aligners, pbmm2 and NGMLR provided the most accurate results for detecting deletions and insertions, respectively. Likewise, cuteSV and SVIM achieved the best performance among the SV callers. We demonstrated that the combination of the aligners and callers mentioned above is optimal for SV detection. Furthermore, we evaluated the impact of sequencing depth on the accuracy of SV detection. The results showed that low-coverage HiFi sequencing is capable of generating high-quality SV genotypes. Conclusions: This study provides a robust benchmark of SV discovery with HiFi reads, showing the remarkable potential of long-read sequencing to investigate structural variations in plant genomes. The high accuracy of SV discovery from low-coverage HiFi sequencing indicates that skim HiFi sequencing is an ideal approach to study structural variations at the population level.

Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies

Genome Biology ◽

10.1186/s13059-020-02134-9 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Arang Rhie ◽

Brian P. Walenz ◽

Sergey Koren ◽

Adam M. Phillippy

Keyword(s):

De Novo ◽

High Accuracy ◽

Robust Method ◽

Plant Genomes ◽

Base Level ◽

Set Operations ◽

Assembly Evaluation ◽

Long Read ◽

Genome Assemblies ◽

Reference Genomes

Abstract Recent long-read assemblies often exceed the quality and completeness of available reference genomes, making validation challenging. Here we present Merqury, a novel tool for reference-free assembly evaluation based on efficient k-mer set operations. By comparing k-mers in a de novo assembly to those found in unassembled high-accuracy reads, Merqury estimates base-level accuracy and completeness. For trios, Merqury can also evaluate haplotype-specific accuracy, completeness, phase block continuity, and switch errors. Multiple visualizations, such as k-mer spectrum plots, can be generated for evaluation. We demonstrate on both human and plant genomes that Merqury is a fast and robust method for assembly validation.
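
The k-mer comparison described in this abstract can be illustrated with a toy example. The sketch below is not Merqury's implementation: it assumes short in-memory strings, uses sets of distinct k-mers, ignores reverse complements, and skips the filtering of low-frequency (likely erroneous) read k-mers that a real tool would need. The function names `kmers` and `kmer_metrics` are hypothetical. The consensus quality value is derived on the assumption that a k-mer is error-free only if all k of its bases are correct, so the per-base accuracy is the k-th root of the fraction of assembly k-mers that are supported by the reads.

```python
import math


def kmers(seq, k):
    """Return the set of distinct k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_metrics(assembly, reads, k):
    """Toy reference-free evaluation of an assembly against its reads.

    Returns (completeness, qv):
      completeness: fraction of read k-mers that also occur in the assembly
      qv: Phred-scaled base accuracy estimated from shared k-mers
    Assumes the reads contain at least one k-mer.
    """
    asm = kmers(assembly, k)
    read_set = set()
    for read in reads:
        read_set |= kmers(read, k)

    shared = asm & read_set
    completeness = len(shared) / len(read_set)

    # Assumption: a k-mer is supported only if all k bases are correct,
    # so per-base accuracy is the k-th root of the supported fraction.
    p_correct = (len(shared) / len(asm)) ** (1 / k)
    error = 1 - p_correct
    qv = math.inf if error == 0 else -10 * math.log10(error)
    return completeness, qv


# Hypothetical data: the assembly ends in a k-mer (GTAA) absent from the reads.
completeness, qv = kmer_metrics("ACGTACGTAA", ["ACGTACGT", "CGTACGTA"], k=4)
print(f"completeness={completeness:.2f}, QV={qv:.1f}")
```

Working on sets keeps the example short; at genome scale, k-mer counting and set operations require a dedicated k-mer counter such as the meryl project listed on the Merqury home page.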
