De Novo Sequencing and Hybrid Assembly of the Biofuel Crop Jatropha curcas L.: Identification of Quantitative Trait Loci for Geminivirus Resistance

Nagesh Kancharla; Saakshi Jalali; J. Narasimham; Vinod Nair; Vijay Yepuri; Bijal Thakkar; VB Reddy; Boney Kuriakose; Neeta Madan; Arockiasamy S

doi:10.3390/genes10010069

Genes ◽

10.3390/genes10010069 ◽

2019 ◽

Vol 10 (1) ◽

pp. 69 ◽

Cited By ~ 9

Author(s):

Nagesh Kancharla ◽

Saakshi Jalali ◽

J. Narasimham ◽

Vinod Nair ◽

Vijay Yepuri ◽

...

Keyword(s):

Ssr Markers ◽

Genome Assembly ◽

Jatropha Curcas ◽

Quantitative Trait ◽

De Novo ◽

Mapping Population ◽

Single Copy ◽

Sequencing Data ◽

De Novo Genome Assembly ◽

Sequencing Technologies

Jatropha curcas is an important perennial, drought-tolerant plant that has been identified as a potential biodiesel crop. We report here the hybrid de novo genome assembly of J. curcas generated using Illumina and PacBio sequencing technologies, and the identification of quantitative trait loci for Jatropha Mosaic Virus (JMV) resistance. In this study, we generated scaffolds of 265.7 Mbp in length, which correspond to 84.8% of the gene space according to Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis. Additionally, 96.4% of predicted protein-coding genes were captured in RNA sequencing data, which reconfirms the accuracy of the assembled genome. The genome was utilized to identify 12,103 dinucleotide simple sequence repeat (SSR) markers, which were exploited in genetic diversity analysis to identify genetically distinct lines. A total of 207 polymorphic SSR markers were employed to construct a genetic linkage map for JMV resistance, using an interspecific F2 mapping population involving susceptible J. curcas and resistant Jatropha integerrima as parents. Quantitative trait locus (QTL) analysis led to the identification of three minor QTLs for JMV resistance, which were validated in an alternate F2 mapping population. These validated QTLs were utilized in marker-assisted breeding for JMV resistance. Comparative genomics of oil-producing genes across selected oil-producing species revealed 27 conserved genes and 2986 orthologous protein clusters in Jatropha. This reference genome assembly gives insight into the complex genetic structure of Jatropha, and serves as a source for the development of agronomically improved virus-resistant and oil-producing lines.
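As an illustration of what a dinucleotide SSR is, the following minimal Python sketch scans a sequence for tandem runs of a two-base motif. It is not the pipeline used in the paper: the function name `find_dinucleotide_ssrs` and the default threshold of six repeats are assumptions chosen for this example.

```python
import re


def find_dinucleotide_ssrs(sequence, min_repeats=6):
    """Return (start, motif, repeat_count) for each dinucleotide SSR run.

    min_repeats is an assumed threshold, not a value taken from the paper.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    # A two-base unit whose bases differ (so homopolymers such as AAAA are
    # skipped), followed by at least min_repeats - 1 further copies of it.
    pattern = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // 2))
    return hits


if __name__ == "__main__":
    # Toy sequence: two flanking bases, then AC repeated six times.
    print(find_dinucleotide_ssrs("TTACACACACACACGG"))
```

Start positions are 0-based. A production SSR search would also handle ambiguous bases, compound repeats, and primer design around each locus, none of which this sketch attempts.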

De Novo Genome Assembly of Next-Generation Sequencing Data

Compendium of Plant Genomes - The Brassica rapa Genome ◽

10.1007/978-3-662-47901-8_4 ◽

2015 ◽

pp. 41-51

Author(s):

Min Liu ◽

Dongyuan Liu ◽

Hongkun Zheng

Keyword(s):

Next Generation Sequencing ◽

Genome Assembly ◽

De Novo ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

De Novo Genome Assembly ◽

Generation Sequencing

Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads

Nature Biotechnology ◽

10.1038/s41587-020-0719-5 ◽

2020 ◽

Author(s):

David Porubsky ◽

Peter Ebert ◽

Peter A. Audano ◽

Mitchell R. Vollger ◽

...

Keyword(s):

Single Cell ◽

Genome Assembly ◽

De Novo ◽

Error Rates ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

De Novo Genome Assembly ◽

Parental Data ◽

Human Genome Assembly ◽

Long Read

Abstract: Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.

De novo genome assembly and Hi-C analysis reveal an association between chromatin architecture alterations and sex differentiation in the woody plant Jatropha curcas

GigaScience ◽

10.1093/gigascience/giaa009 ◽

2020 ◽

Vol 9 (2) ◽

Cited By ~ 2

Author(s):

Mao-Sheng Chen ◽

Longjian Niu ◽

Mei-Li Zhao ◽

Chuanjia Xu ◽

Bang-Zhen Pan ◽

...

Keyword(s):

Gene Transcription ◽

Genome Assembly ◽

Jatropha Curcas ◽

Woody Plants ◽

Woody Plant ◽

De Novo ◽

Sex Differentiation ◽

Chromatin Organization ◽

De Novo Genome Assembly ◽

Chromatin Architecture

Abstract. Background: Chromatin architecture is an essential factor regulating gene transcription in different cell types and developmental phases. However, studies on chromatin architecture in perennial woody plants and on the function of chromatin organization in sex determination have not been reported. Results: Here, we produced a chromosome-scale de novo genome assembly of the woody plant Jatropha curcas with a total length of 379.5 Mb and a scaffold N50 of 30.7 Mb using Pacific Biosciences long reads combined with genome-wide chromosome conformation capture (Hi-C) technology. Based on this high-quality reference genome, we detected chromatin architecture differences between monoecious and gynoecious inflorescence buds of Jatropha. Differentially expressed genes were significantly enriched in the changed A/B compartments and topologically associated domain regions and occurred preferentially in differential contact regions between monoecious and gynoecious inflorescence buds. Twelve differentially expressed genes related to flower development or hormone synthesis displayed significantly different genomic interaction patterns in monoecious and gynoecious inflorescence buds. These results demonstrate that chromatin organization participates in the regulation of gene transcription during the process of sex differentiation in Jatropha. Conclusions: We have revealed the features of chromatin architecture in perennial woody plants and investigated the possible function of chromatin organization in Jatropha sex differentiation. These findings will facilitate understanding of the regulatory mechanisms of sex determination in higher plants.

A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies

PLoS ONE ◽

10.1371/journal.pone.0017915 ◽

2011 ◽

Vol 6 (3) ◽

pp. e17915 ◽

Cited By ~ 144

Author(s):

Wenyu Zhang ◽

Jiajia Chen ◽

Yang Yang ◽

Yifei Tang ◽

Jing Shang ◽

...

Keyword(s):

Next Generation Sequencing ◽

Genome Assembly ◽

De Novo ◽

Software Tools ◽

Next Generation ◽

De Novo Genome Assembly ◽

Sequencing Technologies ◽

Generation Sequencing ◽

Assembly Software

Hybrid de novo Genome Assembly of Erwinia sp. E602 and Bioinformatic Analysis Characterized a New Plasmid-Borne lac Operon Under Positive Selection

Frontiers in Microbiology ◽

10.3389/fmicb.2021.783195 ◽

2021 ◽

Vol 12 ◽

Author(s):

Yu Xia ◽

Zhi-Yuan Wei ◽

Rui He ◽

Jia-Huan Li ◽

Zhi-Xin Wang ◽

...

Keyword(s):

Positive Selection ◽

Genome Assembly ◽

De Novo ◽

Bioinformatic Analysis ◽

Lac Operon ◽

Pacbio Sequencing ◽

Metabolic Pathway Analysis ◽

De Novo Genome Assembly ◽

Sequencing Technologies ◽

Lactose Metabolism

Our previous study identified a new β-galactosidase in Erwinia sp. E602. To further understand lactose metabolism in this strain, de novo genome assembly was conducted using a strategy combining Illumina and PacBio sequencing technologies. The whole genome of Erwinia sp. E602 includes a 4.8 Mb chromosome and a large 326 kb plasmid. A total of 4,739 genes, including 4,543 protein-coding genes, 25 rRNAs, 82 tRNAs and 7 other ncRNA genes, were annotated. The plasmid is the largest characterized in the genus Erwinia so far, and it contains a number of genes and pathways responsible for lactose metabolism and regulation. Moreover, a new plasmid-borne lac operon that lacks a typical β-galactoside transacetylase (lacA) gene was identified in the strain. Phylogenetic analysis showed that the genes lacY and lacZ in the operon were under positive selection, indicating the adaptation of lactose metabolism to the environment in Erwinia sp. E602. Our current study demonstrated that hybrid de novo genome assembly using Illumina and PacBio sequencing technologies, together with metabolic pathway analysis, provides a useful strategy for better understanding the evolution of undiscovered microbial species or strains.

De Novo genome assembly for third generation sequencing data

Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018 ◽

10.1117/12.2501543 ◽

2018 ◽

Author(s):

Robert M. Nowak ◽

Mateusz Forc ◽

Wiktor Kuśmirek

Keyword(s):

Genome Assembly ◽

De Novo ◽

Third Generation ◽

Sequencing Data ◽

De Novo Genome Assembly ◽

Third Generation Sequencing ◽

Generation Sequencing

De Novo Genome Assembly of Populus simonii Further Supports That Populus simonii and Populus trichocarpa Belong to Different Sections

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400913 ◽

2019 ◽

Vol 10 (2) ◽

pp. 455-466

Author(s):

Hainan Wu ◽

Dan Yao ◽

Yuhua Chen ◽

Wenguo Yang ◽

Wei Zhao ◽

...

Keyword(s):

Genome Sequence ◽

Genome Assembly ◽

De Novo ◽

Populus Trichocarpa ◽

Hybrid Population ◽

Linkage Maps ◽

Sequencing Data ◽

De Novo Genome Assembly ◽

Populus Simonii ◽

Integrity Assessment

Populus simonii is an important tree in the genus Populus, widely distributed in the Northern Hemisphere and with a long cultivation history. Although this species is ecologically and economically important, its genome sequence is currently not available, hindering the development of new varieties with wider adaptive and commercial traits. Here, we report a chromosome-level genome assembly of P. simonii using PacBio long-read sequencing data aided by Illumina paired-end reads and related genetic linkage maps. The assembly is 441.38 Mb in length and contains 686 contigs with a contig N50 of 1.94 Mb. With the linkage maps, 336 contigs were successfully anchored into 19 pseudochromosomes, accounting for 90.2% of the assembled genome size. Genomic integrity assessment showed that 1,347 (97.9%) of the 1,375 genes conserved among all embryophytes can be found in the P. simonii assembly. Genomic repeat analysis revealed that 41.47% of the P. simonii genome is composed of repetitive elements, of which 40.17% contained interspersed repeats. A total of 45,459 genes were predicted from the P. simonii genome sequence and 39,833 (87.6%) of the genes were annotated with one or more related functions. Phylogenetic analysis indicated that P. simonii and Populus trichocarpa should be placed in different sections, contrary to the previous classification according to morphology. The genome assembly not only provides an important genetic resource for the comparative and functional genomics of different Populus species, but also furnishes one of the closest reference sequences for identifying genomic variants in an F1 hybrid population derived by crossing P. simonii with other Populus species.
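The contig N50 quoted above is a standard contiguity statistic: the length of the contig at which, going from longest to shortest, the running total first reaches half of the assembly length. A minimal Python sketch follows; the function name `n50` and the toy contig lengths are assumptions for illustration, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, stopping once half the assembly
    # length is covered; that contig's length is the N50.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy lengths: total is 20, and 6 + 5 = 11 first covers half of it.
    print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it says nothing about correctness; this is why assemblies are usually reported alongside completeness measures such as the conserved-gene counts given in the abstract.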

Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly

PLoS ONE ◽

10.1371/journal.pone.0062856 ◽

2013 ◽

Vol 8 (4) ◽

pp. e62856 ◽

Cited By ~ 121

Author(s):

Yen-Chun Chen ◽

Tsunglin Liu ◽

Chun-Hui Yu ◽

Tzen-Yuh Chiang ◽

Chi-Chuan Hwang

Keyword(s):

Next Generation Sequencing ◽

Genome Assembly ◽

De Novo ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

De Novo Genome Assembly ◽

Gc Bias ◽

Generation Sequencing

De novo Nanopore read quality improvement using deep learning

BMC Bioinformatics ◽

10.1186/s12859-019-3103-z ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 4

Author(s):

Nathan LaPierre ◽

Rob Egan ◽

Wei Wang ◽

Zhong Wang

Keyword(s):

Error Correction ◽

Genome Assembly ◽

Large Scale ◽

De Novo ◽

Error Rates ◽

De Novo Genome Assembly ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Long Read ◽

Read Error Correction

Abstract. Background: Long read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area. Results: Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for identification and subsequent “scrubbing” (removal) of low-quality Nanopore read segments to minimize their interference in downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real world control datasets under several different parameters, we show that it robustly improves read quality, and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors. Conclusions: MiniScrub is able to robustly improve read quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub.

Benchmarking metagenomic classification tools for long-read sequencing data

10.1101/2020.11.25.397729 ◽

2020 ◽

Author(s):

Josip Marić ◽

Krešimir Križanović ◽

Sylvain Riondet ◽

Niranjan Nagarajan ◽

Mile Šikić

Keyword(s):

De Novo ◽

Real Life ◽

Metagenomic Analysis ◽

Sequencing Data ◽

De Novo Genome Assembly ◽

Sequencing Technologies ◽

Long Reads ◽

Species Abundances ◽

Long Read ◽

Eukaryotic Genomes

Abstract: In recent years, both long-read sequencing and metagenomic analysis have advanced significantly. Although long-read sequencing technologies have been primarily used for de novo genome assembly, they are rapidly maturing for widespread use in other applications. In particular, long reads could potentially lead to more precise taxonomic identification, which has sparked an interest in using them for metagenomic analysis. Here we present a benchmark of several state-of-the-art tools for metagenomic taxonomic classification, tested on in-silico datasets constructed using real long reads from isolate sequencing. We compare tools that were either newly developed or modified to work with long reads, including k-mer based tools Kraken2, Centrifuge and CLARK, and mapping-based tools MetaMaps and MEGAN-LR. The test datasets were constructed with varying numbers of bacterial and eukaryotic genomes to simulate different real-life metagenomic applications. The tools were tested on their ability to detect species accurately and to estimate species abundances in the samples precisely. Our analysis shows that all tested classifiers provide useful results, and that the composition of the database used strongly influences performance. Using the same database, the tested tools achieve comparable results except for MetaMaps, which slightly outperforms the others in most metrics but is significantly slower than k-mer based tools. We deem there is significant room for improvement for all tested tools, especially in lowering the number of false-positive detections.
