The effect of variant interference on de novo assembly for viral deep sequencing

Mapping Intimacies ◽

10.1101/815480 ◽

2019 ◽

Cited By ~ 1

Author(s):

Christina J. Castro ◽

Rachel L. Marine ◽

Edward Ramos ◽

Terry Fei Fan Ng

Keyword(s):

Deep Sequencing ◽

De Novo ◽

Gc Content ◽

Read Length ◽

Viral Genomes ◽

Minor Variant ◽

Main Driver ◽

Next Generation Sequencing Ngs ◽

Viral Sequences ◽

Generation Sequencing

Abstract Viruses have high mutation rates and generally exist as a mixture of variants in biological samples. The next-generation sequencing (NGS) approach has surpassed Sanger sequencing for generating long viral sequences, yet how variants affect NGS de novo assembly remains largely unexplored. Our results from >15,000 simulated experiments showed that the presence of variants can turn an assembly of one genome into tens to thousands of contigs. This “variant interference” (VI) is highly consistent and reproducible by the ten most used de novo assemblers, and occurs independently of genome length, read length, and GC content. The main driver of VI is the pairwise identity between viral variants. These findings were further supported by in silico simulations, where selective removal of minor variant reads from clinical datasets allows the “rescue” of full viral genomes from fragmented contigs. These results call for careful interpretation of contigs and contig numbers from de novo assembly in viral deep sequencing.
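
The abstract above names pairwise identity between variants as the main driver of variant interference. As a rough illustration only (not the authors' method), pairwise identity for two pre-aligned sequences could be computed as below; the function name `pairwise_identity` and the toy sequences are assumptions made for this sketch.

```python
def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, gap-free columns in which both sequences agree.

    Assumes the two sequences are already aligned to the same length,
    with '-' marking gaps (hypothetical input convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return matches / compared


# Toy major and minor variants differing at a single position.
major = "ACGTACGTAC"
minor = "ACGTTCGTAC"
print(f"pairwise identity: {pairwise_identity(major, minor):.2%}")
```

Skipping gap columns is one of several possible conventions; counting gaps as mismatches would give a lower identity for the same alignment.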

HAPHPIPE: Haplotype Reconstruction and Phylodynamics for Deep Sequencing of Intra-Host Viral Populations

Molecular Biology and Evolution ◽

10.1093/molbev/msaa315 ◽

2020 ◽

Author(s):

Matthew L Bendall ◽

Keylie M Gibson ◽

Margaret C Steiner ◽

Uzma Rentia ◽

Marcos Pérez-Losada ◽

...

Keyword(s):

Deep Sequencing ◽

De Novo ◽

Consensus Sequence ◽

Haplotype Reconstruction ◽

Consensus Sequences ◽

Genome Wide ◽

Genomic Regions ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

Abstract Deep sequencing of viral populations using next generation sequencing (NGS) offers opportunities to understand and investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about the intra-host viral molecular epidemiology. Furthermore, existing analytical pipelines may only analyze genomic regions involved in drug resistance, thus are not suited for full viral genome analysis. Here we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated from NGS platforms and provide quality output properly formatted for downstream evolutionary analyses.
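
To make concrete why collapsing a sample into a single consensus sequence discards intra-host information, here is a minimal sketch of a majority-vote consensus over pre-aligned reads. It is not HAPHPIPE code; the function name `majority_consensus` and the toy reads are assumptions for illustration.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Return the most frequent base at each column of equal-length reads."""
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        # most_common(1) yields the (base, count) pair with the highest count
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Three reads from a major variant and one read from a minor variant.
reads = ["ACGT", "ACGT", "ACGT", "ATGT"]
print(majority_consensus(reads))
```

In this toy input the minor variant's `T` at the second position leaves no trace in the consensus, which is the loss of information that haplotype reconstruction aims to avoid.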

Comparing fragment size with read-length and their effect on HLA genotyping by next-generation sequencing (NGS)

Human Immunology ◽

10.1016/j.humimm.2015.07.072 ◽

2015 ◽

Vol 76 ◽

pp. 52

Author(s):

Tracie Profaizer ◽

Eszter Lazar-Molnar ◽

Julio C. Delgado ◽

Attila Kumanovics

Keyword(s):

Next Generation Sequencing ◽

Fragment Size ◽

Read Length ◽

Next Generation ◽

Hla Genotyping ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

De novo Genome Assembly from Next-Generation Sequencing (NGS) Reads

Next-Generation Sequencing Data Analysis ◽

10.1201/b19532-11 ◽

2016 ◽

pp. 144-155

Keyword(s):

Next Generation Sequencing ◽

Genome Assembly ◽

De Novo ◽

Next Generation ◽

De Novo Genome Assembly ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Deep Sequencing of Recombinant Virus Populations in Transgenic and Nontransgenic Plants Infected with Cucumber mosaic virus

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-02-13-0057-r ◽

2013 ◽

Vol 26 (7) ◽

pp. 801-811 ◽

Cited By ~ 6

Author(s):

Marco Morroni ◽

Mireille Jacquemond ◽

Mark Tepfer

Keyword(s):

Transgenic Plants ◽

Cucumber Mosaic Virus ◽

Mosaic Virus ◽

De Novo ◽

Recombinant Viruses ◽

Viral Genomes ◽

Large Deletions ◽

Cp Gene ◽

Viral Sequences ◽

Virus Variability

Recombination is a major source of virus variability, and the question of whether novel recombinant viruses would emerge in transgenic plants expressing viral sequences has been a biosafety issue. We describe the results of pyrosequencing the recombinant viral RNAs appearing in transgenic plants expressing the coat protein (CP) gene and 3′ noncoding region of Cucumber mosaic virus RNA3, as well as in nontransgenic controls. The populations of recombinants in both transgenic and nontransgenic plants were similar to those previously described from Sanger sequencing but many more recombinant types were observed, including a novel class of large deletions removing all or nearly the entire CP gene. These results show that populations of recombinant viral genomes arising de novo can be characterized in detail by pyrosequencing, and confirm that the transgenic plants did not harbor novel recombinants of biosafety concern.

Genome size and identification of abundant repetitive sequences in Vallisneria spinulosa

PeerJ ◽

10.7717/peerj.3982 ◽

2017 ◽

Vol 5 ◽

pp. e3982 ◽

Cited By ~ 3

Author(s):

RuiJuan Feng ◽

Xin Wang ◽

Min Tao ◽

Guanchao Du ◽

Qishuo Wang

Keyword(s):

Genome Size ◽

Aquatic Plant ◽

Nuclear Dna ◽

De Novo ◽

Repetitive Sequences ◽

Nuclear Dna Content ◽

Ltr Retrotransposons ◽

Sequencing Data ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Vallisneria spinulosa is a freshwater aquatic plant of ecological and economic importance. However, there is limited cytogenetic and genomics information on Vallisneria. In this study, we measured the nuclear DNA content of Vallisneria spinulosa by flow cytometry, performed a de novo assembly, and annotated repetitive sequences by using a combination of next-generation sequencing (NGS) and bioinformatics tools. The genome size of Vallisneria spinulosa is approximately 3,595 Mbp, in which nearly 60% of the genome consists of repetitive sequences. The majority of the repetitive sequences are LTR-retrotransposons comprising 43% of the genome. Although the amount of sequencing data used in this study was not sufficient for a whole-genome assembly, it could generate an overview of representative elements in the genome. These results will lay a new foundation for further studies on various species that belong to the Vallisneria genus.

High-Quality Genome Resource of the Pathogen of Botryosphaeria dothidea Causing Kiwifruit Soft Rot

PhytoFrontiers™ ◽

10.1094/phytofr-07-20-0006-a ◽

2021 ◽

pp. PHYTOFR-07-20-0

Author(s):

Kuan Liang ◽

Jianbin Lan ◽

Baoquan Wang ◽

Yuanyuan Liu ◽

Qi Lu ◽

...

Keyword(s):

De Novo ◽

Gc Content ◽

Soft Rot ◽

Read Length ◽

Comparative Genomic ◽

Secretory Proteins ◽

Botryosphaeria Dothidea ◽

High Quality ◽

Total Size ◽

High Quality Genome

Kiwifruit soft rot caused by the fungal pathogen Botryosphaeria dothidea is a serious disease in kiwifruit-growing regions worldwide. In this study, we reported the high-quality genome sequence of the highly virulent B. dothidea strain PTZ1 using PacBio Sequel techniques. In total, 100.87 million clean reads with mean read length of 9,871 bp were obtained. De novo assembly resulted in 28 contigs with a total size of 44.45 Mb. The GC content of the genome was 54.59%. Furthermore, genes related to specific virulence of the strain were identified, including 259 fungal cytochrome P450s, 550 carbohydrate-active enzymes, 860 secretory proteins, and 1,182 pathogen–host interactions related proteins. The genome is a useful resource to serve as a reference to facilitate the analysis of B. dothidea isolates and comparative genomic studies of the necrotroph pathogens. Copyright © 2021 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.

The Complete Chloroplast Genome of Critically Endangered Chimonobambusa hirtinoda (Poaceae: Chimonobambusa) and Phylogenetic Analysis

10.21203/rs.3.rs-1019626/v1 ◽

2021 ◽

Author(s):

Yanjiang Liu ◽

Xiao Zhu ◽

Mingli Wu ◽

Xue Xu ◽

Zhaoxia Dai ◽

...

Keyword(s):

Phylogenetic Analysis ◽

Gc Content ◽

Trna Genes ◽

Protein Coding ◽

Complete Chloroplast Genome ◽

Usage Frequency ◽

Cp Genome ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing ◽

Simple Sequence

Abstract Chimonobambusa hirtinoda is a threatened species that is naturally distributed only in Doupeng Mountain, Duyun, Guizhou, China. Next-generation sequencing (NGS) was used to obtain the complete chloroplast (cp) genome sequence of C. hirtinoda, and the sequence was then assembled and analyzed for phylogenetic and evolutionary study. We also compared the cp genome with previously published cp genomes of other Chimonobambusa species. The complete cp genome of C. hirtinoda has a total length of 139,561 bp and a GC content of 38.90%. A total of 130 genes were found in the cp genome, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Some genes are missing and introns have been lost in the cp genome of C. hirtinoda. A total of 48 simple sequence repeats (SSRs) were detected, and measuring the codon usage frequency of amino acids revealed an A/U preference at the third nucleotide position in the cp genome of C. hirtinoda. Furthermore, phylogenetic analysis using complete cp sequences and the matK gene revealed genetic relationships within the Chimonobambusa genus.
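
GC content, reported above as 38.90% for this cp genome, is the percentage of G and C bases among the unambiguous bases of a sequence. The sketch below shows one common way to compute it; the function name `gc_content` and the choice to ignore non-ACGT characters are assumptions for illustration, not details taken from the study.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among the A, C, G, T bases of a sequence.

    Ambiguous symbols such as 'N' are ignored (an assumption of this sketch).
    """
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in bases if base in "GC")
    return 100.0 * gc / len(bases)


# Toy sequence: 4 of the 8 unambiguous bases are G or C.
print(f"{gc_content('GGCCNNAATT'):.2f}%")
```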

High Genomic Variability in Equine Infectious Anemia Virus Obtained from Naturally Infected Horses in Pantanal, Brazil: An Endemic Region Case

Viruses ◽

10.3390/v12020207 ◽

2020 ◽

Vol 12 (2) ◽

pp. 207 ◽

Cited By ~ 2

Author(s):

Camila Dantas Malossi ◽

Eduardo Gorzoni Fioratti ◽

Jedson Ferreira Cardoso ◽

Angelo Jose Magro ◽

Erna Geessien Kroon ◽

...

Keyword(s):

Amino Acid ◽

Equine Infectious Anemia Virus ◽

Genomic Sequences ◽

Endemic Region ◽

Viral Genomes ◽

Equine Infectious Anemia ◽

Anemia Virus ◽

Next Generation Sequencing Ngs ◽

First Time ◽

Generation Sequencing

Equine infectious anemia virus (EIAV) is a persistent lentivirus that causes equine infectious anemia (EIA). In Brazil, EIAV is endemic in the Pantanal region, and euthanasia is not mandatory in this area. All of the complete genomic sequences from field viruses are from North America, Asia, and Europe, and only proviral genomic sequences are available. Sequences from Brazilian EIAV are currently available only for gag and LTR regions. Thus, the present study aimed for the first time to sequence the entire EIAV genomic RNA in naturally infected horses from an endemic area in Brazil. RNA in plasma from naturally infected horses was used for next-generation sequencing (NGS), and gaps were filled using Sanger sequencing methodology. Complete viral genomes of EIAV from two horses were obtained and annotated (Access Number: MN560970 and MN560971). Putative genes were analyzed and compared with previously described genes, showing conservation in gag and pol genes and high variations in LTR and env sequences. Amino acid changes were identified in the p26 protein, one of the most common targets used for diagnosis, and p26 molecular modelling showed surface amino acid alterations in some epitopes. Brazilian genome sequences presented 88.6% nucleotide identity with one another and 75.8 to 77.3% with main field strains, such as EIAV Liaoning, Wyoming, Ireland, and Italy isolates. Furthermore, phylogenetic analysis suggested that this Brazilian strain comprises a separate monophyletic group. These results may help to better characterize EIAV and to overcome the challenges of diagnosing and controlling EIA in endemic regions.

Species-specific transcriptional profiles of the gut and gut microbiome of Ceratitis quilicii and Ceratitis rosa sensu stricto

Scientific Reports ◽

10.1038/s41598-019-54989-z ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 1

Author(s):

Fathiya M. Khamis ◽

Paul O. Mireji ◽

Fidelis L. O. Ombura ◽

Anna R. Malacrida ◽

Erick O. Awuoche ◽

...

Keyword(s):

Sibling Species ◽

Gut Microbiome ◽

De Novo ◽

Differential Expression Analysis ◽

Fruit Fly ◽

Sensu Stricto ◽

Transcriptional Profiles ◽

Species Specific ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract The fruit fly species, Ceratitis rosa sensu stricto and Ceratitis quilicii, are sibling species restricted to the lowland and highland regions, respectively. Until recently, these sibling species were considered as allopatric populations of C. rosa with distinct bionomics. We used deep Next Generation Sequencing (NGS) technology on intact guts of individuals from the two sibling species to compare their transcriptional profiles and simultaneously understand gut microbiome and host molecular processes and identify distinguishing genetic differences between the two species. Since the genomes of both species had not been published previously, the transcriptomes were assembled de novo into transcripts. Microbe-specific transcript orthologs were separated from the assembly by filtering searches of the transcripts against microbe databases using OrthoMCL. We then used differential expression analysis of host-specific transcripts (i.e. those remaining after the microbe-specific transcripts had been removed) and microbe-specific transcripts from the two sibling species to identify defining species-specific transcripts that were present in only one fruit fly species or the other, but not in both. In C. quilicii females, bacterial transcripts of Pectobacterium spp., Enterobacterium buttiauxella, Enterobacter cloacae and Klebsiella variicola were upregulated compared to the C. rosa s.s. females. Comparison of expression levels of the host transcripts revealed a heavier investment by C. quilicii (compared with C. rosa s.s.) in: immunity; energy production; cell proliferation; insecticide resistance; reproduction and proliferation; and redox reactions that are usually associated with responses to stress and degradation of fruit metabolites.

A Streamlined and High-Throughput Error-Corrected Next-Generation Sequencing Method for Low Variant Allele Frequency Quantitation

Toxicological Sciences ◽

10.1093/toxsci/kfz221 ◽

2019 ◽

Author(s):

Page B McKinzie ◽

Michelle E Bishop

Keyword(s):

Next Generation Sequencing ◽

Suppressor Gene ◽

Tumor Formation ◽

Read Length ◽

Next Generation ◽

Sequencing Method ◽

Proportional Increase ◽

Duplex Sequencing ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract Quantifying mutant or variable allele frequencies (VAFs) of ≤10⁻³ using next-generation sequencing (NGS) has utility in both clinical and nonclinical settings. Two common approaches for quantifying VAFs using NGS are tagged single-strand sequencing and duplex sequencing. While duplex sequencing is reported to have sensitivity up to 10⁻⁸ VAF, it is not a quick, easy, or inexpensive method. We report a method for quantifying VAFs that are ≥10⁻⁴ that is as easy and quick for processing samples as standard sequencing kits, yet less expensive than the kits. The method was developed using PCR fragment-based VAFs of Kras codon 12 in log10 increments from 10⁻⁵ to 10⁻¹, then applied and tested on native genomic DNA. For both sources of DNA, there is a proportional increase in the observed VAF to input VAF from 10⁻⁴ to 100% mutant samples. Variability of quantitation was evaluated within experimental replicates and shown to be consistent across sample preparations. The error at each successive base read was evaluated to determine if there is a limit of read length for quantitation of ≥10⁻⁴, and it was determined that read lengths up to 70 bases are reliable for quantitation. The method described here is adaptable to various oncogene or tumor suppressor gene targets, with the potential to implement multiplexing at the initial tagging step. While easy to perform manually, it is also suited for robotic handling and batch processing of samples, facilitating detection and quantitation of genetic carcinogenic biomarkers before tumor formation or in normal-appearing tissue.
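
The allele frequencies discussed above are ratios of variant-supporting reads to all reads covering a site, and the abstract reports them on a log10 scale. The following sketch shows that arithmetic only; it does not reproduce the error-correction method of the paper, and the function names and read counts are hypothetical.

```python
import math


def variant_allele_frequency(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must lie between 0 and total_reads")
    return variant_reads / total_reads


def log10_vaf(vaf: float) -> float:
    """Base-10 logarithm of a non-zero allele frequency."""
    if vaf <= 0:
        raise ValueError("vaf must be positive to take a logarithm")
    return math.log10(vaf)


# Hypothetical counts: 5 variant reads out of 10,000 reads at the site.
vaf = variant_allele_frequency(5, 10_000)
print(f"VAF = {vaf:.1e}, log10(VAF) = {log10_vaf(vaf):.2f}")
```

Raw ratios like this are limited by sequencing error, which is why methods such as the one described rely on error correction before frequencies near 10⁻⁴ can be trusted.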
