Nanopore Sequencing Significantly Improves Genome Assembly of the Protozoan Parasite Trypanosoma cruzi

Florencia Díaz-Viraqué; Sebastián Pita; Gonzalo Greif; Rita de Cássia Moreira de Souza; Gregorio Iraola; Carlos Robello

doi:10.1093/gbe/evz129

Nanopore Sequencing Significantly Improves Genome Assembly of the Protozoan Parasite Trypanosoma cruzi

Genome Biology and Evolution ◽

10.1093/gbe/evz129 ◽

2019 ◽

Vol 11 (7) ◽

pp. 1952-1957 ◽

Cited By ~ 12

Author(s):

Florencia Díaz-Viraqué ◽

Sebastián Pita ◽

Gonzalo Greif ◽

Rita de Cássia Moreira de Souza ◽

Gregorio Iraola ◽

...

Keyword(s):

Trypanosoma Cruzi ◽

Genome Assembly ◽

De Novo ◽

Repetitive Sequences ◽

Protozoan Parasite ◽

Single Copy ◽

Nanopore Sequencing ◽

Short Reads ◽

Comparative Analyses ◽

Clade C

Chagas disease was described by Carlos Chagas, who first identified the parasite Trypanosoma cruzi from a 2-year-old girl called Berenice. Many T. cruzi sequencing projects based on short reads have demonstrated that genome assembly and downstream comparative analyses are extremely challenging in this species, given that half of its genome is composed of repetitive sequences. Here, we report de novo assemblies, annotation, and comparative analyses of the Berenice strain using a combination of Illumina short reads and MinION long reads. Our work demonstrates that Nanopore sequencing improves T. cruzi assembly contiguity and increases the assembly size by ∼16 Mb. Specifically, we found that assembly improvement also refines the completeness of coding regions for both single-copy genes and repetitive transposable elements. Beyond its historical and epidemiological importance, Berenice is a fundamental resource because it now provides a high-quality assembly for TcII (clade C), a prevalent lineage causing human infections in South America. The availability of the Berenice genome expands the known genetic diversity of these parasites and reinforces the idea that T. cruzi is intraspecifically divided into three main clades. Finally, this work represents the introduction of Nanopore technology to resolve complex protozoan genomes, supporting its subsequent application for improving trypanosomatid and other highly repetitive genomes.


Nanopore sequencing significantly improves genome assembly of the eukaryotic protozoan parasite Trypanosoma cruzi

10.1101/489534 ◽

2018 ◽

Cited By ~ 1

Author(s):

Florencia Diaz-Viraque ◽

Sebastian Pita ◽

Gonzalo Greif ◽

Rita de Cassia Moreira de Souza ◽

Gregorio Iraola ◽

...

Keyword(s):

Trypanosoma Cruzi ◽

Genome Assembly ◽

De Novo ◽

Repetitive Sequences ◽

Protozoan Parasite ◽

Single Copy ◽

Nanopore Sequencing ◽

Short Reads ◽

Comparative Analyses ◽

Long Reads

Chagas disease was described by Carlos Chagas, who first identified the parasite Trypanosoma cruzi from a two-year-old girl called Berenice. Many T. cruzi sequencing projects based on short reads have demonstrated that genome assembly and downstream comparative analyses are extremely challenging in this species, given that half of its genome is composed of repetitive sequences. Here, we report de novo assemblies, annotation and comparative analyses of the Berenice strain using a combination of Illumina short reads and MinION long reads. Our work demonstrates that Nanopore sequencing improves T. cruzi assembly contiguity and increases the assembly size by ~16 Mb. Specifically, we found that assembly improvement also refines the completeness of coding regions for both single-copy genes and repetitive transposable elements. Beyond its historical and epidemiological importance, Berenice constitutes a fundamental resource since it now represents the best-quality assembly available for TcII, a highly prevalent lineage causing human infections in South America. The availability of the Berenice genome expands the known genetic diversity of T. cruzi and facilitates more comprehensive evolutionary inferences. Our work represents the first report of Nanopore technology used to resolve complex protozoan genomes, supporting its subsequent application for improving trypanosomatid and other highly repetitive genomes.


Draft genome of a porcupinefish, Diodon holocanthus

10.1101/775387 ◽

2019 ◽

Author(s):

Mengyang Xu ◽

Xiaoshan Su ◽

Mengqi Zhang ◽

Ming Li ◽

Xiaoyun Huang ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Repetitive Sequences ◽

Draft Genome ◽

Single Copy ◽

Single Individual ◽

Protein Coding ◽

Long Read ◽

Phylogeny And Evolution ◽

Downstream Analysis

The long-spine porcupinefish, Diodon holocanthus (Diodontidae, Tetraodontiformes, Actinopterygii), also known as the freckled porcupinefish, is of great ecological and economic interest. Its distinctive characteristics, including the inflation reaction, spiny skin, and tetrodotoxin, have not been fully studied, however, for lack of a complete genome assembly. In this study, the whole genome of a single individual was sequenced using single tube-Long Fragment Read co-barcode reads, generating 154.3 Gb of paired-end data (219.8× depth). Gaps were further filled using a small Oxford Nanopore MinION long-read dataset (11.4 Gb, 15.9× depth). Making full use of long-, medium-, and short-range genome assembly information, the final assembly had a total length of 650.02 Mb, with contig and scaffold N50 sizes of 2.15 Mb and 8.13 Mb, respectively, despite high repetitive content. To assess completeness, Benchmarking Universal Single-Copy Orthologs analysis was used and captured 95.7% (2,474) of core genes. In addition, 206.5 Mb (32.10%) of repetitive sequences were identified, and 20,840 protein-coding genes were annotated, among which 18,281 (87.72%) proteins were assigned possible functions. This is the first de novo genome of the porcupinefish, which will benefit downstream analysis of ontogeny, phylogeny, and evolution, and improve the exploration of its unique defensive mechanism.


De Novo Assembly of a High-Quality Reference Genome for the Horned Lark (Eremophila alpestris)

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400846 ◽

2019 ◽

Vol 10 (2) ◽

pp. 475-478 ◽

Cited By ~ 3

Author(s):

Nicholas A. Mason ◽

Paulo Pulgarin ◽

Carlos Daniel Cadena ◽

Irby J. Lovette

Keyword(s):

Genome Assembly ◽

De Novo ◽

Single Copy ◽

Single Copy Gene ◽

High Quality ◽

Data Set ◽

Copy Gene ◽

Assembly Pipeline ◽

Horned Lark ◽

Gene Orthologs

The Horned Lark (Eremophila alpestris) is a small songbird that exhibits remarkable geographic variation in appearance and habitat across an expansive distribution. While E. alpestris has been the focus of many ecological and evolutionary studies, we still lack a highly contiguous genome assembly for the Horned Lark and related taxa (Alaudidae). Here, we present CLO_EAlp_1.0, a highly contiguous assembly for E. alpestris generated from a blood sample of a wild, male bird captured in the Altiplano Cundiboyacense of Colombia. By combining short-insert and mate-pair libraries with the ALLPATHS-LG genome assembly pipeline, we generated a 1.04 Gb assembly comprised of 2713 scaffolds, with a largest scaffold size of 31.81 Mb, a scaffold N50 of 9.42 Mb, and a scaffold L50 of 30. These scaffolds were assembled from 23685 contigs, with a largest contig size of 1.69 Mb, a contig N50 of 193.81 kb, and a contig L50 of 1429. Our assembly pipeline also produced a single mitochondrial DNA contig of 14.00 kb. After polishing the genome, we identified 94.5% of single-copy gene orthologs from an Aves data set and 97.7% of single-copy gene orthologs from a vertebrata data set, which further demonstrates the high quality of our assembly. We anticipate that this genomic resource will be useful to the broader ornithological community and those interested in studying the evolutionary history and ecological interactions of larks, which comprise a widespread, yet understudied lineage of songbirds.
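The abstract above reports contiguity as N50 and L50 values. As a minimal sketch of how these statistics are conventionally defined (not the authors' pipeline), the function below sorts sequence lengths from longest to shortest and finds the point at which the running total reaches half of the assembly size. The function name `n50_l50` and the toy lengths are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def n50_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    N50 is the length of the sequence at which the cumulative sum of the
    length-sorted sequences first reaches half of the total assembly size.
    L50 is the number of sequences needed to reach that point.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable for non-empty input")


# Hypothetical toy assembly (arbitrary units), not data from any paper here.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)  # prints "70 2": 80 + 70 = 150 reaches half of the 300 total
```

Under this definition, a higher N50 together with a lower L50 indicates that fewer, longer sequences account for half of the assembly.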


De Novo Sequencing and Hybrid Assembly of the Biofuel Crop Jatropha curcas L.: Identification of Quantitative Trait Loci for Geminivirus Resistance

Genes ◽

10.3390/genes10010069 ◽

2019 ◽

Vol 10 (1) ◽

pp. 69 ◽

Cited By ~ 9

Author(s):

Nagesh Kancharla ◽

Saakshi Jalali ◽

J. Narasimham ◽

Vinod Nair ◽

Vijay Yepuri ◽

...

Keyword(s):

Ssr Markers ◽

Genome Assembly ◽

Jatropha Curcas ◽

Quantitative Trait ◽

De Novo ◽

Mapping Population ◽

Single Copy ◽

Sequencing Data ◽

De Novo Genome Assembly ◽

Sequencing Technologies

Jatropha curcas is an important perennial, drought-tolerant plant that has been identified as a potential biodiesel crop. We report here the hybrid de novo genome assembly of J. curcas generated using Illumina and PacBio sequencing technologies, and the identification of quantitative trait loci for Jatropha Mosaic Virus (JMV) resistance. In this study, we generated scaffolds of 265.7 Mbp in length, which correspond to 84.8% of the gene space according to Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis. Additionally, 96.4% of predicted protein-coding genes were captured in RNA sequencing data, which reconfirms the accuracy of the assembled genome. The genome was utilized to identify 12,103 dinucleotide simple sequence repeat (SSR) markers, which were exploited in genetic diversity analysis to identify genetically distinct lines. A total of 207 polymorphic SSR markers were employed to construct a genetic linkage map for JMV resistance, using an interspecific F2 mapping population involving susceptible J. curcas and resistant Jatropha integerrima as parents. Quantitative trait locus (QTL) analysis led to the identification of three minor QTLs for JMV resistance, which were validated in an alternate F2 mapping population. These validated QTLs were utilized in marker-assisted breeding for JMV resistance. Comparative genomics of oil-producing genes across selected oil-producing species revealed 27 conserved genes and 2986 orthologous protein clusters in Jatropha. This reference genome assembly gives insight into the complex genetic structure of Jatropha, and serves as a source for the development of agronomically improved virus-resistant and oil-producing lines.


Whole-Genome Sequencing of the Giant Devil Catfish, Bagarius yarrelli

Genome Biology and Evolution ◽

10.1093/gbe/evz143 ◽

2019 ◽

Vol 11 (8) ◽

pp. 2071-2077 ◽

Cited By ~ 6

Author(s):

Wansheng Jiang ◽

Yunyun Lv ◽

Le Cheng ◽

Kunfeng Yang ◽

Chao Bian ◽

...

Keyword(s):

Body Size ◽

Ictalurus Punctatus ◽

Genome Assembly ◽

De Novo ◽

Large Body ◽

Single Copy ◽

The Body ◽

Flesh Color ◽

Fish Muscles ◽

Freshwater Aquaculture

As an economically important fish in the southeastern Himalayas, the giant devil catfish (Bagarius yarrelli) is known for its extraordinarily large body size. It can grow up to 2 m, whereas the non-Bagarius sisorids only reach 10–30 cm. Another outstanding characteristic of Bagarius species is the salmonid-like reddish flesh color. Both body size and flesh color are interesting scientific questions and also valuable features in aquaculture that are worthy of in-depth investigation. Bagarius species are therefore ideal material for studying body size evolution and color deposition in fish muscles, and also potential organisms for extensive utilization in Asian freshwater aquaculture. Using a combination of Illumina and PacBio sequencing technologies, we de novo assembled a 571-Mb genome for the giant devil catfish from a total of 153.4 Gb of clean reads. The scaffold and contig N50 values are 3.1 and 1.6 Mb, respectively. This genome assembly was evaluated as having 93.4% Benchmarking Universal Single-Copy Orthologs completeness and 98% transcript coverage, and as being highly homologous with a chromosome-level genome of channel catfish (Ictalurus punctatus). We detected that 35.26% of the genome assembly is composed of repetitive elements. Employing homology, de novo, and transcriptome-based annotations, we annotated a total of 19,027 protein-coding genes for further use. In summary, we generated the first high-quality genome assembly of the giant devil catfish, which provides an important genomic resource for future studies, such as those of body size and flesh color, and also for facilitating the conservation and utilization of this valuable catfish.


Pacific Biosciences assembly with Hi-C mapping generates an improved, chromosome-level goose genome

GigaScience ◽

10.1093/gigascience/giaa114 ◽

2020 ◽

Vol 9 (10) ◽

Cited By ~ 1

Author(s):

Yan Li ◽

Guangliang Gao ◽

Yu Lin ◽

Silu Hu ◽

Yi Luo ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

Spatial Organization ◽

De Novo ◽

Single Copy ◽

Interaction Patterns ◽

Anser Anser ◽

Pacific Biosciences ◽

Eukaryotic Genes ◽

Chromosome Level

Background: The domestic goose is an economically important and scientifically valuable waterfowl; however, a lack of high-quality genomic data has hindered research concerning its genome, genetics, and breeding. As domestic goose breeds derive from both the swan goose (Anser cygnoides) and the graylag goose (Anser anser), we selected a female Tianfu goose for genome sequencing. We generated a chromosome-level goose genome assembly by adopting a hybrid de novo assembly approach that combined Pacific Biosciences single-molecule real-time sequencing, high-throughput chromatin conformation capture mapping, and Illumina short-read sequencing. Findings: We generated a 1.11-Gb goose genome with contig and scaffold N50 values of 1.85 and 33.12 Mb, respectively. The assembly contains 39 pseudo-chromosomes (2n = 78) accounting for ∼88.36% of the goose genome. Compared with previous goose assemblies, our assembly is more contiguous, complete, and accurate; the annotation of core eukaryotic genes and universal single-copy orthologs has also been improved. We have identified 17,568 protein-coding genes and a repeat content of 8.67% (96.57 Mb) in this genome assembly. We also explored the spatial organization of chromatin and gene expression in goose liver tissue, in terms of inter-pseudo-chromosomal interaction patterns, compartments, topologically associating domains, and promoter-enhancer interactions. Conclusions: We present the first chromosome-level assembly of the goose genome. This will be a valuable resource for future genetic and genomic studies on geese.


De novo Genome Assembly of the indica Rice Variety IR64 Using Linked-Read Sequencing and Nanopore Sequencing

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400871 ◽

2020 ◽

Vol 10 (5) ◽

pp. 1495-1501 ◽

Cited By ~ 1

Author(s):

Tsuyoshi Tanaka ◽

Ryo Nishijima ◽

Shota Teramoto ◽

Yuka Kitomi ◽

Takeshi Hayashi ◽

...

Keyword(s):

Functional Genomics ◽

Genome Assembly ◽

De Novo ◽

Rice Variety ◽

Rice Genome ◽

High Yield ◽

Nanopore Sequencing ◽

Long Reads ◽

A Genome ◽

Modern Varieties

IR64 is a high-yield rice variety that has been widely cultivated around the world. IR64 has been replaced by modern varieties in most growing areas. Given that modern varieties are mostly progenies or relatives of IR64, genetic analysis of IR64 is valuable for rice functional genomics. However, chromosome-level genome sequences of IR64 have not previously been available. Here, we sequenced the IR64 genome using synthetic long reads obtained by linked-read sequencing and ultra-long reads obtained by nanopore sequencing. We integrated these data and generated a de novo assembly of the IR64 genome of 367 Mb, equivalent to 99% of the estimated size. Continuity of the IR64 genome assembly was improved compared with that of a publicly available IR64 genome assembly generated from short reads only. We annotated 41,458 protein-coding genes, including 657 IR64-specific genes that are missing in other high-quality rice genome assemblies, namely IRGSP-1.0 of japonica cultivar Nipponbare or R498 of indica cultivar Shuhui498. The IR64 genome assembly will serve as a genome resource for rice functional genomics as well as genomics-driven and/or molecular breeding.


Genome Report: De novo assembly of a high-quality reference genome for the Horned Lark (Eremophila alpestris)

10.1101/811745 ◽

2019 ◽

Cited By ~ 1

Author(s):

Nicholas A. Mason ◽

Paulo Pulgarin ◽

Carlos Daniel Cadena ◽

Irby J. Lovette

Keyword(s):

Genome Assembly ◽

De Novo ◽

Single Copy ◽

Single Copy Gene ◽

Data Set ◽

Copy Gene ◽

Assembly Pipeline ◽

Horned Lark ◽

Genomic Resource ◽

Gene Orthologs

The Horned Lark (Eremophila alpestris) is a species of small songbird that exhibits remarkable geographic variation in appearance and habitat across an expansive distribution. While E. alpestris and related species have been the focus of many ecological and evolutionary studies, we still lack a highly contiguous genome assembly for horned larks and related taxa (Alaudidae). Here, we present CLO_EAlp_1.0, a highly contiguous assembly for horned larks generated from blood samples of a wild, male bird captured in the Altiplano Cundiboyacense of Colombia. By combining short-insert and mate-pair libraries with the ALLPATHS-LG genome assembly pipeline, we generated a 1.04 Gb assembly comprised of 2708 contigs with an N50 of 10.58 Mb and an L50 of 29. After polishing the genome, we were able to identify 94.5% of single-copy gene orthologs from an Aves data set and 97.7% of single-copy gene orthologs from a vertebrata data set, indicating that our de novo assembly is near complete. We anticipate that this genomic resource will be useful to the broader ornithological community and those interested in studying the evolutionary history and ecological interactions of a widespread, yet understudied lineage of songbirds.


The de novo genome of the “Spanish” slug Arion vulgaris Moquin-Tandon, 1855 (Gastropoda: Panpulmonata): massive expansion of transposable elements in a major pest species

10.1101/2020.11.30.403303 ◽

2020 ◽

Author(s):

Zeyuan Chen ◽

Özgül Doğan ◽

Nadège Guiglielmoni ◽

Anne Guichard ◽

Michael Schrödl

Keyword(s):

Transposable Elements ◽

Genome Assembly ◽

De Novo ◽

Repetitive Sequences ◽

Land Snails ◽

Whole Genome ◽

Pest Species ◽

High Quality ◽

Genome Duplication Event ◽

Arion Vulgaris

Background: The “Spanish” slug, Arion vulgaris Moquin-Tandon, 1855, is considered to be among the 100 worst pest species in Europe. It is common and invasive in at least the northern and eastern parts of Europe, probably benefitting from climate change and the modern human lifestyle. The origin and expansion of this species, and the mechanisms behind its outstanding adaptive success and ability to outcompete other land slugs, are worth exploring at the genomic level. However, a high-quality chromosome-level genome is still lacking. Findings: The final assembly of A. vulgaris was obtained by combining short reads, linked reads, Nanopore long reads, and Hi-C data. The genome assembly size is 1.54 Gb with a contig N50 length of 8.6 Mb. We found a recent expansion of transposable elements (TEs), which results in repetitive sequences accounting for more than 75% of the A. vulgaris genome, the highest proportion among all known gastropod species. We identified 32,518 protein-coding genes, and 2,763 species-specific genes were functionally enriched in response to stimuli, nervous system, and reproduction. With 1,237 single-copy orthologs from A. vulgaris and other related mollusks with whole-genome data available, we reconstructed the phylogenetic relationships of gastropods, estimated the divergence time of stylommatophoran land snails (Achatina) and Arion slugs at around 126 million years ago, and confirmed the whole-genome duplication event shared by them. Conclusions: To our knowledge, the A. vulgaris genome is the first land slug genome assembly published to date. The high-quality genomic data will provide valuable genetic resources for further phylogeographic studies of A. vulgaris origin and expansion, invasiveness, as well as molluscan aquatic-land transition and shell formation.


De novo sequencing, assembly and functional annotation of Armillaria borealis genome

BMC Genomics ◽

10.1186/s12864-020-06964-6 ◽

2020 ◽

Vol 21 (S7) ◽

Author(s):

Vasilina S. Akulova ◽

Vadim V. Sharov ◽

Anastasiya I. Aksyonova ◽

Yuliya A. Putintseva ◽

Natalya V. Oreshkova ◽

...

Keyword(s):

Comparative Analysis ◽

Genome Assembly ◽

Functional Annotation ◽

De Novo ◽

Fundamental Problem ◽

Repetitive Sequences ◽

Far East ◽

White Rot ◽

Climatic Effects ◽

De Novo Genome Assembly

Background: Massive forest decline has been observed almost everywhere as a result of negative anthropogenic and climatic effects, which can interact with pests, fungi and other phytopathogens and aggravate their effects. Climatic changes can weaken trees and make fungi such as Armillaria more destructive. Armillaria borealis (Marxm. & Korhonen) is a fungus from the Physalacriaceae family (Basidiomycota) widely distributed in Eurasia, including Siberia and the Far East. Species from this genus cause the root white rot disease that weakens and often kills woody plants. However, little is known about the ecological behavior and genetics of A. borealis. According to field research data, A. borealis is less pathogenic than A. ostoyae, and its aggressive behavior is quite rare. A. borealis mainly behaves as a secondary pathogen, killing trees already weakened by other factors. However, a changing environment might cause unpredictable effects on the behavior of the fungus. Results: The de novo genome assembly and annotation were performed for A. borealis for the first time and are presented in this study. The A. borealis genome assembly contained ~68 Mbp and was comparable with ~60 and ~79.5 Mbp for the A. ostoyae and A. mellea genomes, respectively. The N50 for contigs equaled 50,544 bp. Functional annotation analysis revealed 21,969 protein-coding genes and provided data for further comparative analysis. Repetitive sequences were also identified. The main focus for further study and comparative analysis will be on the enzymes and regulatory factors associated with pathogenicity. Conclusions: Pathogenic fungi such as Armillaria are currently one of the main problems in forest conservation. A comprehensive study of these species and their pathogenicity is of great importance and needs good genomic resources. The assembled genome of A. borealis presented in this study is of sufficiently good quality for further detailed comparative study of the composition of enzymes in other Armillaria species. There is also a fundamental problem with the identification and classification of species of the Armillaria genus, where the study of repetitive sequences in the genomes of basidiomycetes and their comparative analysis will help us identify the taxonomy of these species more accurately and reveal their evolutionary relationships.
