De novo genome assembly and single nucleotide variations for Soybean yellow common mosaic virus using soybean flower bud transcriptome data

Yeonhwa Jo; Hoseong Choi; Sang-Min Kim; Bong Choon Lee; Won Kyong Cho

doi:10.3839/jabc.2020.026

De novo Genome Assembly and Single Nucleotide Variations for Soybean Mosaic Virus Using Soybean Seed Transcriptome Data

The Plant Pathology Journal ◽

10.5423/ppj.oa.03.2017.0060 ◽

2017 ◽

Vol 33 (5) ◽

pp. 478-487 ◽

Cited By ~ 2

Author(s):

Yeonhwa Jo ◽

Hoseong Choi ◽

Miah Bae ◽

Sang-Min Kim ◽

Sun-Lim Kim ◽

...

Keyword(s):

Mosaic Virus ◽

Genome Assembly ◽

De Novo ◽

Soybean Mosaic Virus ◽

Soybean Seed ◽

Transcriptome Data ◽

De Novo Genome Assembly ◽

Single Nucleotide ◽

Single Nucleotide Variations

To Trim or Not to Trim: Effects of Read Trimming on the De Novo Genome Assembly of a Widespread East Asian Passerine, the Rufous-Capped Babbler (Cyanoderma ruficeps Blyth)

Genes ◽

10.3390/genes10100737 ◽

2019 ◽

Vol 10 (10) ◽

pp. 737 ◽

Cited By ~ 4

Author(s):

Shang-Fang Yang ◽

Chia-Wei Lu ◽

Cheng-Te Yao ◽

Chih-Ming Hung

Keyword(s):

Genome Assembly ◽

Sex Chromosome ◽

De Novo ◽

Gene Annotation ◽

East Asian ◽

Computational Time ◽

Nucleotide Polymorphisms ◽

De Novo Genome Assembly ◽

Single Nucleotide ◽

Pros And Cons

Trimming low-quality bases from sequencing reads is considered a routine procedure for genome assembly; however, we know little about its pros and cons. Here, we used empirical data to examine how read trimming affects assembled genome quality and computational time for a widespread East Asian passerine, the rufous-capped babbler (Cyanoderma ruficeps Blyth). We found that scaffolds assembled from raw reads were always longer than those from trimmed ones, whereas computational times for the former were sometimes much longer than for the latter. Nevertheless, assembly completeness showed little difference among the trimming strategies. One should determine the optimal trimming strategy based on what the assembled genome will be used for. For example, to identify single nucleotide polymorphisms (SNPs) associated with phenotypic evolution, applying PLATANUS to gently trim reads would yield a reference genome with a slightly shorter scaffold length (N50 = 15.64 vs. 16.89 Mb) than the raw reads, but would save 75% of computational time. We also found that chromosomes Z, W, and 4A of the rufous-capped babbler were poorly assembled, likely due to a recently fused, neo-sex chromosome. The rufous-capped babbler genome with long scaffolds and quality gene annotation can provide a good system to study avian ecological adaptation in East Asia.
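The abstract above compares assemblies by scaffold N50. As a minimal sketch of how that metric is commonly computed (the function name `n50` and the toy lengths are illustrative assumptions, not taken from the paper), N50 is the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembly length:

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Hypothetical helper for illustration; not the authors' code.
    """
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2            # half of the assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("need at least one scaffold length")


if __name__ == "__main__":
    # Toy lengths in arbitrary units, chosen only to exercise the function.
    print(n50([80, 70, 50, 40, 30, 20]))
```

Because N50 depends only on the length distribution, a higher value indicates longer scaffolds but says nothing about correctness, which is why the abstract also reports assembly completeness.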

De Novo Genome Assembly of Ryegrass Mosaic Virus from a Ryegrass Transcriptome

Genome Announcements ◽

10.1128/genomea.00497-15 ◽

2015 ◽

Vol 3 (3) ◽

Author(s):

Yeonhwa Jo ◽

Hoseong Choi ◽

Won Kyong Cho

Keyword(s):

Mosaic Virus ◽

Genome Assembly ◽

De Novo ◽

De Novo Genome Assembly

Improved hybrid de novo genome assembly of domesticated apple (Malus x domestica)

GigaScience ◽

10.1186/s13742-016-0139-0 ◽

2016 ◽

Vol 5 (1) ◽

Cited By ~ 28

Author(s):

Xuewei Li ◽

Ling Kui ◽

Jing Zhang ◽

Yinpeng Xie ◽

Liping Wang ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Malus x domestica ◽

De Novo Genome Assembly

Meraculous: De Novo Genome Assembly with Short Paired-End Reads

PLoS ONE ◽

10.1371/journal.pone.0023501 ◽

2011 ◽

Vol 6 (8) ◽

pp. e23501 ◽

Cited By ~ 107

Author(s):

Jarrod A. Chapman ◽

Isaac Ho ◽

Sirisha Sunkara ◽

Shujun Luo ◽

Gary P. Schroth ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

De Novo Genome Assembly

Ultra Efficient Acceleration for De Novo Genome Assembly via Near-Memory Computing

10.1109/pact52795.2021.00022 ◽

2021 ◽

Author(s):

Minxuan Zhou ◽

Lingxi Wu ◽

Muzhou Li ◽

Niema Moshiri ◽

Kevin Skadron ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

De Novo Genome Assembly

De novo Genome Assembly from Next-Generation Sequencing (NGS) Reads

Next-Generation Sequencing Data Analysis ◽

10.1201/b19532-11 ◽

2016 ◽

pp. 144-155

Keyword(s):

Next Generation Sequencing ◽

Genome Assembly ◽

De Novo ◽

Next Generation ◽

De Novo Genome Assembly ◽

Next Generation Sequencing NGS ◽

Generation Sequencing

Optimizing de novo genome assembly from PCR-amplified metagenomes

PeerJ ◽

10.7717/peerj.6902 ◽

2019 ◽

Vol 7 ◽

pp. e6902 ◽

Cited By ~ 9

Author(s):

Simon Roux ◽

Gareth Trubl ◽

Danielle Goudeau ◽

Nandita Nath ◽

Estelle Couradeau ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

PCR Amplification ◽

Error Rates ◽

De Novo Genome Assembly ◽

Low Input ◽

Assembly Algorithm ◽

Coverage Bias ◽

Size Number ◽

Assembly Pipeline

Background: Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes.

Methods: Here we evaluate de novo assembly methods of 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to evaluate which are the best suited for PCR-amplified metagenomes.

Results: Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets versus unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥ 10 kb by 10 to 100-fold for low input metagenomes.

Conclusions: PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to an improved de novo genome assembly from PCR-amplified datasets, and enables a better genome recovery from low input metagenomes.
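The abstract above measures assembly size as the number of bp in contigs ≥ 10 kb and reports fold improvements between pipelines. A minimal sketch of that bookkeeping follows; the function names and the toy contig lengths are assumptions for illustration only and do not come from the paper:

```python
def assembly_size(contig_lengths, min_length=10_000):
    """Sum the lengths (bp) of contigs at or above min_length.

    Hypothetical helper; the 10 kb default mirrors the threshold
    quoted in the abstract.
    """
    return sum(length for length in contig_lengths if length >= min_length)


def fold_improvement(baseline_lengths, modified_lengths, min_length=10_000):
    """Ratio of assembly size between a modified and a baseline pipeline."""
    baseline = assembly_size(baseline_lengths, min_length)
    if baseline == 0:
        raise ValueError("baseline assembly has no contigs above the threshold")
    return assembly_size(modified_lengths, min_length) / baseline


if __name__ == "__main__":
    # Invented contig lengths, used only to show the calculation.
    standard = [4_000, 9_500, 12_000]
    modified = [11_000, 25_000, 48_000, 36_000]
    print(assembly_size(standard), assembly_size(modified))
    print(fold_improvement(standard, modified))
```

Contigs below the threshold contribute nothing to the total, so a pipeline that produces many short contigs can score lower than one with fewer but longer contigs.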

Accurate long-read de novo assembly evaluation with Inspector

Genome Biology ◽

10.1186/s13059-021-02527-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yu Chen ◽

Yixin Zhang ◽

Amy Y. Wang ◽

Min Gao ◽

Zechen Chong

Keyword(s):

Genome Assembly ◽

De Novo Assembly ◽

In Silico ◽

Large Scale ◽

De Novo ◽

Small Scale ◽

De Novo Genome Assembly ◽

Consensus Sequences ◽

Assembly Evaluation ◽

Long Read

Long-read de novo genome assembly continues to advance rapidly. However, there is a lack of effective tools to accurately evaluate the assembly results, especially for structural errors. We present Inspector, a reference-free long-read de novo assembly evaluator which faithfully reports types of errors and their precise locations. Notably, Inspector can correct the assembly errors based on consensus sequences derived from raw reads covering erroneous regions. Based on in silico and long-read assembly results from multiple long-read data and assemblers, we demonstrate that in addition to providing generic metrics, Inspector can accurately identify both large-scale and small-scale assembly errors.

Implications of Genetic Distance to Reference and De Novo Genome Assembly for Clinical Genomics in Africans

10.1101/2020.09.25.20201780 ◽

2020 ◽

Author(s):

Daniel Shriner ◽

Adebowale Adeyemo ◽

Charles Rotimi

Keyword(s):

Genetic Distance ◽

De Novo ◽

Reference Sequence ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

De Novo Genome Assembly ◽

Single Nucleotide ◽

Clinical Genomics ◽

Advantages And Disadvantages ◽

False Discovery

In clinical genomics, variant calling from short-read sequencing data typically relies on a pan-genomic, universal human reference sequence. A major limitation of this approach is that the number of reads that incorrectly map or fail to map increases as the reads diverge from the reference sequence. In the context of genome sequencing of genetically diverse Africans, we investigate the advantages and disadvantages of using a de novo assembly of the read data as the reference sequence in single sample calling. Conditional on sufficient read depth, the alignment-based and assembly-based approaches yielded comparable sensitivity and false discovery rates for single nucleotide variants when benchmarked against a gold standard call set. The alignment-based approach yielded coverage of an additional 270.8 Mb over which sensitivity was lower and the false discovery rate was higher. Although both approaches detected and missed clinically relevant variants, the assembly-based approach identified more such variants than the alignment-based approach. Of particular relevance to individuals of African descent, the assembly-based approach identified four heterozygous genotypes containing the sickle allele whereas the alignment-based approach identified no occurrences of the sickle allele. Variant annotation using dbSNP and gnomAD identified systematic biases in these databases due to underrepresentation of Africans. Using the counts of homozygous alternate genotypes from the alignment-based approach as a measure of genetic distance to the reference sequence GRCh38.p12, we found that the numbers of misassemblies, total variant sites, potentially novel single nucleotide variants (SNVs), and certain variant classes (e.g., splice acceptor variants, stop loss variants, missense variants, synonymous variants, and variants absent from gnomAD) were significantly correlated with genetic distance. In contrast, genomic coverage and other variant classes (e.g., ClinVar pathogenic or likely pathogenic variants, start loss variants, stop gain variants, splice donor variants, incomplete terminal codons, variants with CADD score ≥ 20) were not correlated with genetic distance. With improvement in coverage, the assembly-based approach can offer a viable alternative to the alignment-based approach, with the advantage that it can obviate the need to generate diverse human reference sequences or collections of alternate scaffolds.
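The abstract above benchmarks call sets against a gold standard using sensitivity and false discovery rate. A minimal sketch of those two standard definitions follows, assuming each variant is represented as a hashable tuple such as (chromosome, position, ref, alt); the function name and the toy variants are hypothetical and are not drawn from the study:

```python
def benchmark_calls(called, truth):
    """Return (sensitivity, false_discovery_rate) for a variant call set.

    sensitivity = TP / (TP + FN); false discovery rate = FP / (TP + FP).
    Hypothetical helper for illustration; not the authors' pipeline.
    """
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)    # called and in the gold standard
    false_pos = len(called - truth)   # called but absent from the gold standard
    false_neg = len(truth - called)   # in the gold standard but missed
    if true_pos + false_neg == 0 or true_pos + false_pos == 0:
        raise ValueError("both the call set and the truth set must be non-empty")
    sensitivity = true_pos / (true_pos + false_neg)
    fdr = false_pos / (true_pos + false_pos)
    return sensitivity, fdr


if __name__ == "__main__":
    # Invented variants, used only to show the calculation.
    truth = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")]
    called = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
              ("chr2", 50, "G", "A"), ("chr3", 10, "A", "C")]
    print(benchmark_calls(called, truth))
```

Exact tuple matching is a simplification: real benchmarking usually normalizes variant representation first, since the same variant can be written in more than one equivalent way.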
