SIGAR: Inferring Features of Genome Architecture and DNA Rearrangements by Split-Read Mapping

Yi Feng; Leslie Y Beh; Wei-Jen Chang; Laura F Landweber

doi:10.1093/gbe/evaa147

SIGAR: Inferring Features of Genome Architecture and DNA Rearrangements by Split-Read Mapping

Genome Biology and Evolution ◽

10.1093/gbe/evaa147 ◽

2020 ◽

Vol 12 (10) ◽

pp. 1711-1718

Author(s):

Yi Feng ◽

Leslie Y Beh ◽

Wei-Jen Chang ◽

Laura F Landweber

Keyword(s):

Genome Assembly ◽

Repetitive Sequences ◽

Genome Architecture ◽

DNA Rearrangements ◽

High Quality ◽

Microbial Eukaryotes ◽

Ciliate Species ◽

Split Read ◽

High Level ◽

Genome Assemblies

Abstract Ciliates are microbial eukaryotes with distinct somatic and germline genomes. Postzygotic development involves extensive remodeling of the germline genome to form somatic chromosomes. Ciliates therefore offer a valuable model for studying the architecture and evolution of programmed genome rearrangements. Current studies usually focus on a few model species, where rearrangement features are annotated by aligning reference germline and somatic genomes. Although many high-quality somatic genomes have been assembled, a high-quality germline genome assembly is difficult to obtain due to its smaller DNA content and abundance of repetitive sequences. To overcome these hurdles, we propose a new pipeline, SIGAR (Split-read Inference of Genome Architecture and Rearrangements), to infer germline genome architecture and rearrangement features without a germline genome assembly, requiring only short DNA sequencing reads. As a proof of principle, 93% of rearrangement junctions identified by SIGAR in the ciliate Oxytricha trifallax were validated by the existing germline assembly. We then applied SIGAR to six diverse ciliate species without germline genome assemblies, including Ichthyophthirius multifiliis, a fish pathogen. Despite the high level of somatic DNA contamination in each sample, SIGAR successfully inferred rearrangement junctions, short eliminated sequences, and potential scrambled genes in each species. This pipeline enables pilot surveys or exploration of DNA rearrangements in species with limited access to DNA material, thereby providing new insights into the evolution of chromosome rearrangements.
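The idea of locating rearrangement junctions from split reads can be illustrated with a toy sketch. This is not SIGAR's implementation: it assumes germline reads have already been aligned to a somatic assembly, that each alignment is available as a hypothetical `(contig, pos, cigar)` tuple in SAM conventions (1-based leftmost position, CIGAR string), and that a soft-clipped end marks a candidate junction. The function names and the thresholds `min_clip` and `min_reads` are illustrative choices, not parameters taken from the paper.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM convention).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def clip_junctions(pos, cigar, min_clip=20):
    """Return reference coordinates where a soft clip meets aligned sequence.

    pos is the 1-based leftmost aligned reference position. Hard clips are
    not handled in this toy version.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    junctions = []
    # Left soft clip: the junction sits at the first aligned base.
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        junctions.append(pos)
    # Reference span covered by the alignment (M, D, N, =, X consume reference).
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    # Right soft clip: the junction sits at the last aligned base.
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        junctions.append(pos + ref_len - 1)
    return junctions

def count_junctions(alignments, min_clip=20, min_reads=2):
    """Count candidate junctions and keep those supported by enough reads."""
    counts = Counter()
    for contig, pos, cigar in alignments:
        for j in clip_junctions(pos, cigar, min_clip):
            counts[(contig, j)] += 1
    return {k: v for k, v in counts.items() if v >= min_reads}

# Hypothetical alignments, invented for illustration only.
toy = [
    ("contig1", 101, "60M40S"),  # aligned 101..160, clipped on the right
    ("contig1", 121, "40M60S"),  # aligned 121..160, clipped on the right
    ("contig1", 161, "30S70M"),  # clipped on the left, aligned from 161
    ("contig1", 500, "100M"),    # fully aligned, no junction evidence
]
supported = count_junctions(toy)
print(supported)
```

In this toy input, only position 160 on `contig1` is supported by two reads; the single left-clipped read at 161 falls below the assumed `min_reads` threshold.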

SIGAR: Inferring features of genome architecture and DNA rearrangements by split read mapping

10.1101/2020.05.05.079426 ◽

2020 ◽

Author(s):

Yi Feng ◽

Leslie Y. Beh ◽

Wei-Jen Chang ◽

Laura F. Landweber

Keyword(s):

Genome Assembly ◽

Repetitive Sequences ◽

Genome Architecture ◽

DNA Rearrangements ◽

High Quality ◽

Microbial Eukaryotes ◽

Ciliate Species ◽

Split Read ◽

High Level ◽

Genome Assemblies

Abstract Ciliates are microbial eukaryotes with distinct somatic and germline genomes. Post-zygotic development involves extensive remodeling of the germline genome to form somatic chromosomes. Ciliates therefore offer a valuable model for studying the architecture and evolution of programmed genome rearrangements. Current studies usually focus on a few model species, where rearrangement features are annotated by aligning reference germline and somatic genomes. While many high-quality somatic genomes have been assembled, a high-quality germline genome assembly is difficult to obtain due to its smaller DNA content and abundance of repetitive sequences. To overcome these hurdles, we propose a new pipeline, SIGAR (Split-read Inference of Genome Architecture and Rearrangements), to infer germline genome architecture and rearrangement features without a germline genome assembly, requiring only short germline DNA sequencing reads. As a proof of principle, 93% of rearrangement junctions identified by SIGAR in the ciliate Oxytricha trifallax were validated by the existing germline assembly. We then applied SIGAR to six diverse ciliate species without germline genome assemblies, including Ichthyophthirius multifiliis, a fish pathogen. Despite the high level of somatic DNA contamination in each sample, SIGAR successfully inferred rearrangement junctions, short eliminated sequences and potential scrambled genes in each species. This pipeline enables pilot surveys or exploration of DNA rearrangements in species with limited access to DNA material, thereby providing new insights into the evolution of chromosome rearrangements.

A high-quality, chromosome-level genome assembly of the Black Soldier Fly (Hermetia illucens L.)

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab085 ◽

2021 ◽

Author(s):

Tomas N Generalovic ◽

Shane A McCarthy ◽

Ian A Warren ◽

Jonathan M D Wood ◽

James Torrance ◽

...

Keyword(s):

Genome Assembly ◽

Animal Feed ◽

Repetitive Sequences ◽

Genomic Variation ◽

Runs Of Homozygosity ◽

High Quality ◽

Black Soldier Fly ◽

Hermetia Illucens ◽

Chromosome Conformation ◽

Important Species

Abstract Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF), is an increasingly important species for bioconversion of organic material into animal feed. We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Biosciences, 10X Genomics linked-read and high-throughput chromosome conformation capture sequencing technology. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudochromosomes representing seven chromosomes, with 16.01 Mb contig and 180.46 Mb scaffold N50 values. The highly complete genome obtained a BUSCO completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 16,478 protein-coding genes using the BRAKER2 pipeline. We analysed an established lab population to investigate the genomic variation and architecture of the BSF, revealing six autosomes and an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of a lab population by assessing runs of homozygosity. This provided evidence for inbreeding events including long runs of homozygosity on chromosome five. Release of this novel chromosome-scale BSF genome assembly will provide an improved resource for further genomic studies, functional characterisation of genes of interest and genetic modification of this economically important species.
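Several abstracts in this listing report contig and scaffold N50 values. As a reminder of what that statistic measures, here is a minimal sketch of the common definition: the length of the sequence at which the running total, taken over sequences sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the example lengths are illustrative and are not taken from any of the listed assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length of
    the sequence at which the cumulative sum first reaches at least half of
    the total length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one sequence length")

# Toy contig lengths (arbitrary units), invented for illustration.
example = [2, 3, 4, 5, 6]
print(n50(example))
```

For the toy list, the total is 20; the two longest contigs (6 and 5) sum to 11, which passes the halfway mark, so the N50 is 5.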

Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise

10.1101/2019.12.19.882399 ◽

2019 ◽

Cited By ~ 5

Author(s):

Valentina Peona ◽

Mozes P.K. Blom ◽

Luohao Xu ◽

Reto Burri ◽

Shawn Sullivan ◽

...

Keyword(s):

Dark Matter ◽

Genome Assembly ◽

Sex Chromosome ◽

De Novo ◽

Model Organism ◽

Technology Choice ◽

High Quality ◽

Sequencing Technologies ◽

Downstream Analysis ◽

Genome Assemblies

Abstract Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies have opened up a whole new world of genomic biodiversity. Although these technologies generate high-quality genome assemblies, there are still genomic regions that are difficult to assemble, like repetitive elements and GC-rich regions (genomic “dark matter”). In this study, we compare the efficiency of currently used sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter starting from the same sample. By adopting different de novo assembly strategies, we were able to compare each individual draft assembly to a curated multiplatform one and identify the nature of the previously missing dark matter with a particular focus on transposable elements, multi-copy MHC genes, and GC-rich regions. Thanks to this multiplatform approach, we demonstrate the feasibility of producing a high-quality chromosome-level assembly for a non-model organism (paradise crow) for which only suboptimal samples are available. Our approach was able to reconstruct complex chromosomes like the repeat-rich W sex chromosome and several GC-rich microchromosomes. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects around the completeness of both the coding and non-coding parts of the genomes.

A high-quality genome assembly from a single, field-collected spotted lanternfly (Lycorma delicatula) using the PacBio Sequel II system

GigaScience ◽

10.1093/gigascience/giz122 ◽

2019 ◽

Vol 8 (10) ◽

Cited By ~ 12

Author(s):

Sarah B Kingan ◽

Julie Urban ◽

Christine C Lambert ◽

Primo Baybayan ◽

Anna K Childers ◽

...

Keyword(s):

Invasive Species ◽

Genome Assembly ◽

De Novo ◽

Fragment Size ◽

High Quality ◽

De Novo Genome Assembly ◽

Lycorma Delicatula ◽

Long Read ◽

Genome Assemblies ◽

High Quality Genome

Abstract Background: A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. Results: The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ∼20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ∼36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. Conclusions: We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
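The ∼36× coverage figure in this abstract follows from simple arithmetic on the numbers it reports: sequenced bases from unique molecules divided by genome size. The sketch below reproduces that check; the function name `fold_coverage` is an illustrative choice, and the calculation assumes that the 2.3 Gb assembly size is a reasonable stand-in for the genome size.

```python
def fold_coverage(sequenced_bases, genome_size):
    """Return average fold coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sequenced_bases / genome_size

GB = 1e9  # one gigabase, in bases

# Values quoted in the abstract above.
unique_bases = 82 * GB    # long-read bases from unique library molecules
genome_size = 2.3 * GB    # assembly size, used here as the genome size

coverage = fold_coverage(unique_bases, genome_size)
print(f"{coverage:.1f}x")
```

The result is a little under 36, consistent with the approximate value given in the abstract.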

A high-quality de novo genome assembly from a single parasitoid wasp

10.1101/2020.07.13.200725 ◽

2020 ◽

Cited By ~ 1

Author(s):

Xinhai Ye ◽

Yi Yang ◽

Zhaoyang Tian ◽

Le Xu ◽

Kaili Yu ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Parasitoid Wasp ◽

Genome Comparison ◽

Single Individual ◽

High Quality ◽

De Novo Genome Assembly ◽

Low Input ◽

A Genome ◽

High Level

Abstract Sequencing and assembling a genome from a single individual has several advantages, such as lower heterozygosity and easier sample preparation. However, the amount of genomic DNA from some small-sized organisms might not meet the standard DNA input requirement for current sequencing pipelines. Although a few studies have sequenced a single small insect with about 100 ng DNA as input, it may still be challenging to obtain such an amount of DNA from a single individual of many small organisms. Here, we use 20 ng DNA as input, and present a high-quality genome assembly for a single haploid male parasitoid wasp (Habrobracon hebetor) using Nanopore and Illumina. Because of the low input DNA, a whole genome amplification (WGA) method is used before sequencing. The assembled genome size is 131.6 Mb with a contig N50 of 1.63 Mb. A total of 99% of Benchmarking Universal Single-Copy Orthologs are detected, suggesting a high level of completeness of the genome assembly. Genome comparison between H. hebetor and its relative Bracon brevicornis shows high-level genome synteny, indicating that the genome of H. hebetor is highly accurate and contiguous. Our study provides an example of de novo assembly of a genome from ultra-low input DNA, and will be useful for sequencing projects of small-sized species and rare samples, haploid genomics, as well as population genetics of small-sized species.

Chicago and Dovetail Hi-C proximity ligation yield chromosome length scaffolds of Ixodes scapularis genome

10.1101/392126 ◽

2018 ◽

Cited By ~ 3

Author(s):

Andrew B. Nuss ◽

Arvind Sharma ◽

Monika Gulia-Nuss

Keyword(s):

Molecular Level ◽

Ixodes Scapularis ◽

Repetitive Sequences ◽

Chromosome Length ◽

Genome Architecture ◽

High Quality ◽

Proximity Ligation ◽

Sequencing Technologies ◽

Functional Gene Analysis ◽

High Quality Genome

Abstract A high-quality genome sequence is essential for understanding an organism at the molecular level. However, larger genomes with substantial repetitive sequences are challenging to assemble with current sequencing technologies. The Hi-C technique is changing the genome architecture landscape by providing links across a variety of length scales, spanning even whole chromosomes. The Ixodes scapularis haploid genome is 2.1 Gbp, and the current assembly consists of 369,495 scaffolds representing 57% of the genome. The fragmented genome poses challenges for functional gene analysis, and an improved assembly is needed. We therefore used the Hi-C technique to achieve a chromosome-level assembly of the tick genome. With Chicago and Dovetail Hi-C assemblies, we were able to obtain 28 sequences longer than 10 Mb that correspond to 28 chromosomes in I. scapularis.

The de novo genome of the “Spanish” slug Arion vulgaris Moquin-Tandon, 1855 (Gastropoda: Panpulmonata): massive expansion of transposable elements in a major pest species

10.1101/2020.11.30.403303 ◽

2020 ◽

Author(s):

Zeyuan Chen ◽

Özgül Doğan ◽

Nadège Guiglielmoni ◽

Anne Guichard ◽

Michael Schrödl

Keyword(s):

Transposable Elements ◽

Genome Assembly ◽

De Novo ◽

Repetitive Sequences ◽

Land Snails ◽

Whole Genome ◽

Pest Species ◽

High Quality ◽

Genome Duplication Event ◽

Arion Vulgaris

Abstract Background: The “Spanish” slug, Arion vulgaris Moquin-Tandon, 1855, is considered to be among the 100 worst pest species in Europe. It is common and invasive in at least northern and eastern parts of Europe, probably benefitting from climate change and the modern human lifestyle. The origin and expansion of this species, the mechanisms behind its outstanding adaptive success and its ability to outcompete other land slugs are worth exploring at the genomic level. However, a high-quality chromosome-level genome is still lacking. Findings: The final assembly of A. vulgaris was obtained by combining short reads, linked reads, Nanopore long reads, and Hi-C data. The genome assembly size is 1.54 Gb with a contig N50 length of 8.6 Mb. We found a recent expansion of transposable elements (TEs) which results in repetitive sequences accounting for more than 75% of the A. vulgaris genome, which is the highest among all known gastropod species. We identified 32,518 protein-coding genes, and 2,763 species-specific genes were functionally enriched in response to stimuli, nervous system and reproduction. With 1,237 single-copy orthologs from A. vulgaris and other related mollusks with whole-genome data available, we reconstructed the phylogenetic relationships of gastropods and estimated the divergence time of stylommatophoran land snails (Achatina) and Arion slugs at around 126 million years ago, and confirmed the whole genome duplication event shared by them. Conclusions: To our knowledge, the A. vulgaris genome is the first land slug genome assembly published to date. The high-quality genomic data will provide valuable genetic resources for further phylogeographic studies of A. vulgaris origin and expansion, invasiveness, as well as molluscan aquatic-land transition and shell formation.

Chromosome-Level Genome Assembly Reveals Significant Gene Expansion in the Toll and IMD Signaling Pathways of Dendrolimus kikuchii

Frontiers in Genetics ◽

10.3389/fgene.2021.728418 ◽

2021 ◽

Vol 12 ◽

Author(s):

Jielong Zhou ◽

Peifu Wu ◽

Zhongping Xiong ◽

Naiyong Liu ◽

Ning Zhao ◽

...

Keyword(s):

Genome Assembly ◽

Phylogenetic Analyses ◽

Repetitive Sequences ◽

Gene Families ◽

Thaumetopoea Pityocampa ◽

High Quality ◽

Protein Coding ◽

Peptidoglycan Recognition Protein ◽

Recognition Protein ◽

Chromosome Level

A high-quality genome is of significant value when seeking to control forest pests such as Dendrolimus kikuchii, a destructive member of the order Lepidoptera that is widespread in China. Herein, a high-quality, chromosome-level reference genome for D. kikuchii based on Nanopore, PacBio HiFi sequencing and the Hi-C capture system is presented. Overall, a final genome assembly of 705.51 Mb with contig and scaffold N50 values of 20.89 and 24.73 Mb, respectively, was obtained. Of these contigs, 95.89% had unique locations on 29 chromosomes. In silico analysis revealed that the genome contained 15,323 protein-coding genes and 63.44% repetitive sequences. Phylogenetic analyses indicated that D. kikuchii may have diverged from the common ancestor of Thaumetopoea pityocampa, Thaumetopoea ni, Heliothis virescens, Hyphantria armigera, Spodoptera frugiperda, and Spodoptera litura approximately 122.05 million years ago. Many gene families were expanded in the D. kikuchii genome, particularly those of the Toll and IMD signaling pathways, which included 10 genes in peptidoglycan recognition protein, 19 genes in MODSP, and 11 genes in Toll. The findings from this study will help to elucidate the mechanisms involved in protection of D. kikuchii against foreign substances and pathogens, and may highlight a potential channel to control this pest.

A high-quality chromosome-level genome assembly reveals genetics for important traits in eggplant

Horticulture Research ◽

10.1038/s41438-020-00391-0 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Qingzhen Wei ◽

Jinglei Wang ◽

Wuhong Wang ◽

Tianhua Hu ◽

Haijiao Hu ◽

...

Keyword(s):

Genome Assembly ◽

Reference Genome ◽

Repetitive Sequences ◽

Gene Families ◽

Specific Gene ◽

High Quality ◽

Total Size ◽

Protein Coding ◽

Fruit Length ◽

Protein Coding Genes

Abstract Eggplant (Solanum melongena L.) is an economically important vegetable crop in the Solanaceae family, with extensive diversity among landraces and close relatives. Here, we report a high-quality reference genome for the eggplant inbred line HQ-1315 (S. melongena-HQ) using a combination of Illumina, Nanopore and 10X Genomics sequencing technologies and Hi-C technology for genome assembly. The assembled genome has a total size of ~1.17 Gb and 12 chromosomes, with a contig N50 of 5.26 Mb, consisting of 36,582 protein-coding genes. Repetitive sequences comprise 70.09% (811.14 Mb) of the eggplant genome, most of which are long terminal repeat (LTR) retrotransposons (65.80%), followed by long interspersed nuclear elements (LINEs, 1.54%) and DNA transposons (0.85%). The S. melongena-HQ eggplant genome carries a total of 563 accession-specific gene families containing 1009 genes. In total, 73 expanded gene families (892 genes) and 34 contracted gene families (114 genes) were functionally annotated. Comparative analysis of different eggplant genomes identified three types of variations, including single-nucleotide polymorphisms (SNPs), insertions/deletions (indels) and structural variants (SVs). Asymmetric SV accumulation was found in potential regulatory regions of protein-coding genes among the different eggplant genomes. Furthermore, we performed QTL-seq for eggplant fruit length using the S. melongena-HQ reference genome and detected a QTL interval of 71.29–78.26 Mb on chromosome E03. The gene Smechr0301963, which belongs to the SUN gene family, is predicted to be a key candidate gene for eggplant fruit length regulation. Moreover, we anchored a total of 210 linkage markers associated with 71 traits to the eggplant chromosomes and finally obtained 26 QTL hotspots. The eggplant HQ-1315 genome assembly can be accessed at http://eggplant-hq.cn.
In conclusion, the eggplant genome presented herein provides a global view of genomic divergence at the whole-genome level and powerful tools for the identification of candidate genes for important traits in eggplant.

A high-quality, chromosome-level genome assembly of the Black Soldier Fly (Hermetia illucens L.)

10.1101/2020.11.13.381889 ◽

2020 ◽

Author(s):

Tomas N. Generalovic ◽

Shane A. McCarthy ◽

Ian A. Warren ◽

Jonathan M.D. Wood ◽

James Torrance ◽

...

Keyword(s):

Genome Assembly ◽

Population Genomics ◽

Animal Feed ◽

Repetitive Sequences ◽

Genomic Variation ◽

Reference Sequence ◽

Runs Of Homozygosity ◽

High Quality ◽

Black Soldier Fly ◽

Hermetia Illucens

Abstract Background: Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF), is an increasingly important mass-reared entomological resource for bioconversion of organic material into animal feed. Results: We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Biosciences, 10X Genomics linked-read and high-throughput chromosome conformation capture sequencing technology. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudo-chromosomes representing seven chromosomes, with 16.01 Mb contig and 180.46 Mb scaffold N50 values. The highly complete genome obtained a BUSCO completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 17,664 protein-coding genes using the BRAKER2 pipeline. We analysed an established lab population to investigate the genomic variation and architecture of the BSF, revealing six autosomes and identifying an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of a lab population by assessing runs of homozygosity. This revealed a plethora of inbreeding events including recent long runs of homozygosity on chromosome five. Conclusions: Release of this novel chromosome-scale BSF genome assembly will provide an improved platform for further genomic studies and functional characterisation of candidate regions of artificial selection. This reference sequence will provide an essential tool for future genetic modifications, functional and population genomics.
