Long-read assemblies reveal structural diversity in genomes of organelles – an example with Acacia pycnantha

2020. DOI: 10.1101/2020.12.22.423164

Author(s): Anna E. Syme, Todd G.B. McLay, Frank Udovicic, David J. Cantrill, Daniel J. Murphy

Keyword(s): Mitochondrial Genome, Chloroplast Genome, De Novo, Genomic Structure, Structural Diversity, Mitochondrial Genomes, Long Reads, Organelle Genomes, Long Read, Assembly Algorithms

Abstract: Although organelle genomes are typically represented as single, static, circular molecules, there is evidence that the chloroplast genome exists in two structural haplotypes and that the mitochondrial genome can display multiple circular, linear or branching forms. We sequenced and assembled the chloroplast and mitochondrial genomes of the Golden Wattle, Acacia pycnantha, using long reads, iterative baiting to extract organelle-only reads, and several assembly algorithms to explore genomic structure. Using a de novo assembly approach agnostic to previous hypotheses about structure, we found that different assemblies revealed contrasting arrangements of genomic segments, a hypothesis supported by mapped reads spanning alternate paths.
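
The abstract does not say how the iterative baiting was implemented. As a rough illustration only, the Python sketch below shows one simple k-mer-based way to bait organelle reads: start from the k-mers of a seed sequence (a hypothetical stand-in for a related organelle genome), recruit reads that share at least `min_shared` k-mers with the bait set, add the recruited reads' k-mers to the bait, and repeat until no new reads are recruited. The function names, `k`, `min_shared`, and the toy reads are assumptions, not the authors' tools, settings, or data.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return all k-mers of seq (forward strand only, to keep the sketch short)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def iterative_bait(seed: str, reads: dict[str, str], k: int = 15,
                   min_shared: int = 3, max_rounds: int = 10) -> set[str]:
    """Hypothetical iterative baiting: grow the bait from recruited reads until stable."""
    bait = kmers(seed, k)
    selected: set[str] = set()
    for _ in range(max_rounds):
        # Recruit reads not yet selected that share enough k-mers with the current bait.
        newly = {
            rid for rid, seq in reads.items()
            if rid not in selected and len(kmers(seq, k) & bait) >= min_shared
        }
        if not newly:  # converged: no further reads overlap the bait
            break
        selected |= newly
        for rid in newly:  # extend the bait with the new reads' k-mers
            bait |= kmers(reads[rid], k)
    return selected


# Toy usage: r2 never touches the seed, but is reached through r1 in a second round.
reads = {"r1": "GCAAGCTTCGAG", "r2": "TTCGAGGATCC", "r3": "GGGGGGGGGGGG"}
print(sorted(iterative_bait("ACGTTGCAAGCT", reads, k=5, min_shared=2)))  # ['r1', 'r2']
```

The threshold matters: a permissive `min_shared` lets the bait drift into nuclear reads that carry organelle-derived insertions, while a strict one can stall before the whole organelle genome is covered. This sketch also ignores reverse complements and sequencing errors.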

Transposable element accumulation drives size differences among polymorphic Y chromosomes in Drosophila

2021. DOI: 10.1101/2021.06.02.446622

Author(s): Alison Nguyen, Doris Bachtrog

Keyword(s): Transposable Element, Y Chromosome, Structural Diversity, Size Difference, Drosophila pseudoobscura, Nucleotide Variation, Y Chromosomes, Long Read, Low Levels, Element Accumulation

Y chromosomes of many species are gene poor and show low levels of nucleotide variation, yet often display high amounts of structural diversity. Dobzhansky cataloged several morphologically distinct Y chromosomes in Drosophila pseudoobscura that differ in size and shape, but the molecular causes of their dramatic size differences are unclear. Here we use cytogenetics and long-read sequencing to study the sequence content of polymorphic Y chromosomes in D. pseudoobscura. We show that Y chromosomes differ by almost 2-fold in size, ranging from 30 to 60 Mb. Most of this size difference is caused by a handful of active transposable elements (TEs) that have recently expanded on the largest Y chromosome, with different elements responsible for Y expansion on differently sized D. pseudoobscura Ys. We show that Y chromosomes differ in their heterochromatin enrichment and in their expression of Y-enriched TEs, and that they also influence the expression of dozens of autosomal and X-linked genes. Intriguingly, the same helitron element that showed the most drastic amplification on the largest Y in D. pseudoobscura independently amplified on a polymorphic large Y chromosome in D. affinis, suggesting that some TEs are inherently more prone to become deregulated on Y chromosomes.

Long-read Data Revealed Structural Diversity in Human Centromere Sequences

2019. DOI: 10.1101/784785

Author(s): Yuta Suzuki, Gene Myers, Shinichi Morishita

Keyword(s): DNA Sequences, Structural Diversity, Higher Order, Human Populations, Sequence Evolution, Single Nucleotide Variants, True Nature, Human Centromere, Long Read, Novel Variant

Abstract: Centromeres invariably serve as the loci of kinetochore assembly in all eukaryotic cells, but their underlying DNA sequences evolve rapidly. Human centromeres are characterized by extremely repetitive structures, namely higher-order repeats, which make the region one of the most difficult parts of the genome to assess. Consequently, our understanding of centromere sequence variation across human populations is limited. Here, we analyzed chromosomes 11, 17, and X using long sequencing reads from two European and two Asian genomes. Our results show that human centromere sequences exhibit substantial structural diversity, harboring many novel, individual-specific variant higher-order repeats, whereas frequent single-nucleotide variants are largely conserved. Our findings add another dimension to our knowledge of centromeres, challenging the notion of stable human centromeres. The discovery of such diversity prompts further deep sequencing of human populations to understand the true nature of sequence evolution in human centromeres.

Reduced subgenomic RNA expression is a molecular indicator of asymptomatic SARS-CoV-2 infection

Communications Medicine, 2021, Vol 1 (1). DOI: 10.1038/s43856-021-00034-y

Author(s): Chee Hong Wong, Chew Yee Ngan, Rachel L. Goldfeder, Jennifer Idol, Chris Kuhlberg, ...

Keyword(s): Vaccine Development, Structural Diversity, Clinical Severity, Host Responses, Symptomatic Infection, Genomic Signatures, Asymptomatic Patients, Long Read, Highly Correlated, Asymptomatic Individuals

Abstract

Background: It is estimated that up to 80% of infections caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are asymptomatic, and asymptomatic patients can still effectively transmit the virus and cause disease. While much effort has been placed on decoding single-nucleotide variation in SARS-CoV-2 genomes, considerably less is known about their transcript variation and its correlation with clinical severity in human hosts, defined here by the presence or absence of symptoms.

Methods: To assess viral genomic signatures of disease severity, we systematically characterized SARS-CoV-2 transcripts and genetic variants in 81 clinical specimens collected from symptomatic and asymptomatic individuals, using multi-scale transcriptomic analyses including amplicon-seq, short-read metatranscriptome sequencing and long-read Iso-seq.

Results: Here we show a highly coordinated and consistent pattern of subgenomic RNA (sgRNA) expression in individuals with robust symptomatic SARS-CoV-2 infection, whereas sgRNA expression is significantly repressed in asymptomatic infections. We also observe widespread inter- and intra-patient variants in viral RNAs, known as quasispecies, which are frequently found in many RNA viruses. We identify unique sets of deletions found preferentially in symptomatic individuals, many of which are likely to alter SARS-CoV-2 virulence and host responses. Moreover, these frequently occurring structural variants in SARS-CoV-2 genomes serve as a mechanism that further increases SARS-CoV-2 proteome complexity.

Conclusions: Our results indicate that differential sgRNA expression and structural mutational burden are highly correlated with the clinical severity of SARS-CoV-2 infection. Longitudinal monitoring of sgRNA expression and structural diversity could further guide treatment responses, testing strategies, and vaccine development.

Subgenomic RNAs as molecular indicators of asymptomatic SARS-CoV-2 infection

2021. DOI: 10.1101/2021.02.06.430041

Author(s): Chee Hong Wong, Chew Yee Ngan, Rachel L. Goldfeder, Jennifer Idol, Chris Kuhlberg, ...

Keyword(s): Vaccine Development, Structural Diversity, Clinical Severity, Host Responses, Subgenomic RNAs, Sequencing Technologies, Long Read, Highly Correlated, Host Infection, Asymptomatic Individuals

Summary: In the Coronaviridae, which include SARS-CoV-2, subgenomic RNAs (sgRNAs) are replicative intermediates; their abundance and structures could therefore indicate viral replication activity and the severity of host infection. Here, we systematically characterized sgRNA expression and structural variation in 81 clinical specimens collected from symptomatic and asymptomatic individuals, with the goal of assessing viral genomic signatures of disease severity. We demonstrated highly coordinated and consistent expression of sgRNAs in individuals with robust infections that result in symptoms, and found that sgRNA expression is significantly repressed in asymptomatic infections, indicating that the ratio of sgRNAs to genomic RNA (sgRNA/gRNA) is highly correlated with disease severity. Using long-read sequencing technologies to characterize full-length sgRNA structures, we also observed widespread deletions in viral RNAs and identified unique sets of deletions found preferentially in symptomatic individuals, many of which are likely to alter SARS-CoV-2 virulence and host responses. Furthermore, the sgRNA structures indicate that the frequently occurring structural variants in SARS-CoV-2 genomes serve as a mechanism that further increases SARS-CoV-2 proteome complexity. Taken together, our results show that differential sgRNA expression and structural mutational burden both appear to be correlated with the clinical severity of SARS-CoV-2 infection. Longitudinal monitoring of sgRNA expression and structural diversity could further guide treatment responses, testing strategies, and vaccine development.
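
The summary's key quantity is the sgRNA/gRNA ratio. As a minimal sketch, assuming reads have already been assigned per sample to individual sgRNA species and to genomic RNA (the summary does not say how the authors did this), the ratio is the total sgRNA read count divided by the gRNA read count, and groups can then be compared by their median ratio. The function names and the dictionary fields `group`, `sg_counts` and `g_count` are hypothetical.

```python
from statistics import median


def sg_to_g_ratio(sg_counts: dict[str, int], g_count: int) -> float:
    """Total sgRNA reads (summed over sgRNA species) divided by gRNA reads."""
    if g_count <= 0:
        raise ValueError("gRNA count must be positive to form a ratio")
    return sum(sg_counts.values()) / g_count


def group_medians(samples: list[dict]) -> dict[str, float]:
    """Median sgRNA/gRNA ratio for each group label (e.g. 'symptomatic')."""
    by_group: dict[str, list[float]] = {}
    for s in samples:
        ratio = sg_to_g_ratio(s["sg_counts"], s["g_count"])
        by_group.setdefault(s["group"], []).append(ratio)
    return {group: median(ratios) for group, ratios in by_group.items()}
```

Using a ratio rather than raw sgRNA counts partly controls for differences in overall viral RNA amount and sequencing depth between specimens, since both numerator and denominator scale with them.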

The alternative reality of plant mitochondrial DNA

2019. DOI: 10.1101/564278

Cited By ~ 5

Author(s): Alexander Kozik, Beth A. Rowan, Dean Lavelle, Lidija Berke, M. Eric Schranz, ...

Keyword(s): Mitochondrial DNA, Genome Structure, Structural Diversity, Data Availability, Mitochondrial Genomes, Physical Analysis, Sequencing Data, True Nature, Plant Mitochondrial DNA, Long Read

Abstract: Plant mitochondrial genomes are usually assembled and displayed as circular maps based on the widely held assumption that circular genome molecules are the primary form of mitochondrial DNA, despite evidence to the contrary. Many plant mitochondrial genomes have one or more pairs of large repeats that can act as sites for inter- or intramolecular recombination, leading to multiple alternative genomic arrangements (isoforms). Most mitochondrial genomes have been assembled using methods that were unable to capture the complete spectrum of isoforms within a species, leading to an incomplete inference of their structure and recombinational activity. To document structural diversity in plant mitochondrial DNA and investigate its underlying causes, we used long-read (PacBio) and short-read (Illumina) sequencing data to assemble and compare mitochondrial genomes of domesticated (Lactuca sativa) and wild (L. saligna and L. serriola) lettuce species. This allowed us to characterize a comprehensive, complex set of isoforms within each species and to compare genome structures between species. Physical analysis of L. sativa mtDNA molecules by fluorescence microscopy revealed a variety of linear, branched linear, and circular structures. The mitochondrial genomes of L. sativa and L. serriola were identical in sequence and arrangement and differed substantially from that of L. saligna, indicating that mitochondrial genome structure did not change during domestication. From the isoforms evident in our data, we inferred that recombination occurs at repeats of all sizes at variable frequencies. The differences in genome structure between L. saligna and the two other lettuce species can be largely explained by rare recombination events that rearrange the structure. Our data demonstrate that representations of plant mitochondrial DNA as simple, genome-sized circular molecules are not accurate descriptions of their true nature and that in reality plant mitochondrial DNA is a complex, dynamic mixture of forms.

Data Availability: BioProject: Organellar genomes of cultivated and wild lettuce (Lactuca) varieties, PRJNA508811, https://www.ncbi.nlm.nih.gov/bioproject/508811, and other accessions as indicated in the text and supplemental data.

Funding: NSF grant MCB-1413152 to ACC and support from UC Davis to RWM.
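
The isoform argument rests on a standard rule: recombination between two copies of a repeat in inverted orientation inverts the segment between them, whereas recombination between copies in direct orientation splits one circle into two smaller circles, each keeping one repeat copy. The toy model below encodes a circular molecule as a list of (segment, strand) pairs and applies that rule. It is an illustrative assumption, not the authors' method: segment names are hypothetical, and the model ignores the linear and branched molecules reported above.

```python
Segment = tuple[str, int]  # (segment name, +1 forward / -1 reverse)


def invert(segments: list[Segment]) -> list[Segment]:
    """Reverse-complement a run of segments: reverse the order and flip each strand."""
    return [(name, -strand) for name, strand in reversed(segments)]


def recombine(circle: list[Segment], i: int, j: int) -> list[list[Segment]]:
    """Recombine across the two repeat copies at positions i < j of a circular molecule.

    Inverted copies (opposite strands): one isoform with the intervening run inverted.
    Direct copies (same strand): two subcircles, each retaining one repeat copy.
    """
    (name_i, strand_i), (name_j, strand_j) = circle[i], circle[j]
    if name_i != name_j or not i < j:
        raise ValueError("positions must be two copies of the same repeat, with i < j")
    if strand_i != strand_j:  # inverted repeat pair
        return [circle[:i + 1] + invert(circle[i + 1:j]) + circle[j:]]
    return [circle[i:j], circle[:i] + circle[j:]]  # direct repeat pair
```

Applied to a chloroplast-like layout `[("LSC", 1), ("IR", 1), ("SSC", 1), ("IR", -1)]`, one recombination reverses the small single-copy region, the usual explanation for the two chloroplast structural haplotypes, and a second recombination restores the original arrangement. Applied to a direct repeat pair, the same function yields two subgenomic circles.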

Detection of simple and complex de novo mutations without, with, or with multiple reference sequences

2019. DOI: 10.1101/698910

Cited By ~ 1

Author(s): Kiran V Garimella, Zamin Iqbal, Michael A. Krause, Susana Campino, Mihir Kekre, ...

Keyword(s): De Novo, Structural Diversity, Low Complexity, Graph Connectivity, Data Sources, Whole Genome Sequencing Data, Sequencing Data, De Novo Mutations, Long Read, Multiple Reference

Abstract: The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, where short reads do not capture the long-range context required for resolution, and mapping approaches, where improper alignment of reads to a reference genome that is highly diverged from that of the sample can lead to false or partial calls. Long-read technologies can potentially solve such problems but are currently infeasible to use at scale. Here we present Corticall, a graph-based method that combines the advantages of multiple technologies and prior data sources to detect arbitrary classes of genetic variant. We construct multi-sample, coloured de Bruijn graphs from short-read data for all samples, align long-read-derived haplotypes and multiple reference data sources to restore graph connectivity information, and call variants using graph path-finding algorithms and a model for simultaneous alignment and recombination. We validate and evaluate the approach using extensive simulations and use it to characterize the rate and spectrum of de novo mutation events in 119 progeny from four Plasmodium falciparum experimental crosses, using long-read data on the parents to inform reconstructions of the progeny and to detect several known and novel non-allelic homologous recombination events.
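
A coloured de Bruijn graph stores each k-mer once and records which samples ("colours") contain it, so parents and progeny can be compared on one graph. The sketch below builds a toy version from error-free reads and lists k-mers seen in a child but in neither parent. It is a hypothetical illustration of the data structure, not Corticall's implementation: function names and sample labels are assumptions, and it omits reverse complements, error filtering, and the long-read and reference threading described above.

```python
from collections import defaultdict


def build_coloured_dbg(samples: dict[str, list[str]], k: int):
    """Toy coloured de Bruijn graph (forward strand only, no error correction).

    colours[kmer] is the set of samples whose reads contain kmer; edges[kmer] holds
    the k-mers that follow it in some read (i.e. overlap it by k - 1 bases).
    """
    colours: dict[str, set[str]] = defaultdict(set)
    edges: dict[str, set[str]] = defaultdict(set)
    for sample, reads in samples.items():
        for read in reads:
            kms = [read[i:i + k] for i in range(len(read) - k + 1)]
            for km in kms:
                colours[km].add(sample)
            for a, b in zip(kms, kms[1:]):  # consecutive k-mers share k - 1 bases
                edges[a].add(b)
    return colours, edges


def private_kmers(colours: dict[str, set[str]], child: str, parents: set[str]) -> set[str]:
    """k-mers present in the child's colour but in none of the parents' colours."""
    return {km for km, cs in colours.items() if child in cs and not (cs & parents)}


samples = {"P1": ["ACGTACGGA"], "P2": ["ACGTACGGA"], "C": ["ACGTTCGGA"]}
colours, edges = build_coloured_dbg(samples, k=4)
print(sorted(private_kmers(colours, "C", {"P1", "P2"})))  # ['CGTT', 'GTTC', 'TCGG', 'TTCG']
```

In this toy cross, a single substitution in the child produces k novel k-mers (one for each k-mer overlapping the changed base), and the shared k-mer `ACGT` just before the change gains a second outgoing edge, marking where the child's path leaves the parental one.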

Rapid and ongoing evolution of repetitive sequence structures in human centromeres

Science Advances, 2020, Vol 6 (50), pp. eabd9230. DOI: 10.1126/sciadv.abd9230

Author(s): Yuta Suzuki, Eugene W. Myers, Shinichi Morishita

Keyword(s): Sequence Variation, Structural Diversity, Higher Order, Human Populations, Sequence Evolution, Diverse Populations, Sequencing Data, Single Nucleotide Variants, Single Nucleotide, Long Read

Our understanding of centromere sequence variation across human populations is limited by the extremely long, nested repeat structures of centromeres, called higher-order repeats, which are challenging to sequence. Here, we analyzed chromosomes 11, 17, and X using long-read sequencing data for 36 individuals from diverse populations, including a Han Chinese trio and 21 Japanese individuals. We revealed substantial structural diversity, with many previously unidentified, individual-specific variant higher-order repeats that characterize rapid, haplotype-specific evolution of human centromeric arrays, whereas frequent single-nucleotide variants are largely conserved. We found a characteristic pattern shared among prevalent variants in human and chimpanzee. Our findings pave the way for studying sequence evolution in human and primate centromeres.