The effect of variant interference on de novo assembly for viral deep sequencing

2019 ◽  
Author(s):  
Christina J. Castro ◽  
Rachel L. Marine ◽  
Edward Ramos ◽  
Terry Fei Fan Ng

Abstract Viruses have high mutation rates and generally exist as a mixture of variants in biological samples. Next-generation sequencing (NGS) has surpassed Sanger sequencing for generating long viral sequences, yet how variants affect NGS de novo assembly remains largely unexplored. Our results from >15,000 simulated experiments showed that the presence of variants can turn an assembly of one genome into tens to thousands of contigs. This “variant interference” (VI) is highly consistent and reproducible across the ten most used de novo assemblers, and occurs independent of genome length, read length, and GC content. The main driver of VI is the pairwise identity between viral variants. These findings were further supported by in silico simulations, where selective removal of minor variant reads from clinical datasets allowed the “rescue” of full viral genomes from fragmented contigs. These results call for careful interpretation of contigs and contig numbers from de novo assembly in viral deep sequencing.
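To make the notion of pairwise identity between variants concrete, here is a minimal Python sketch. It assumes the two variant sequences have already been aligned to the same length, with `-` marking gaps. The function name `pairwise_identity` and the toy sequences are hypothetical illustrations, not the authors' method or data.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Fraction of aligned, gap-free columns in which both sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared


# Toy example (hypothetical sequences): one mismatch in ten columns.
major = "ACGTACGTAC"
minor = "ACGTTCGTAC"
identity = pairwise_identity(major, minor)
print(f"pairwise identity: {identity:.2%}")
```

Ignoring gapped columns is only one of several common conventions for identity; counting gaps as mismatches would give a lower value for the same alignment.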

Author(s):  
Matthew L Bendall ◽  
Keylie M Gibson ◽  
Margaret C Steiner ◽  
Uzma Rentia ◽  
Marcos Pérez-Losada ◽  
...  

Abstract Deep sequencing of viral populations using next generation sequencing (NGS) offers opportunities to understand and investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about the intra-host viral molecular epidemiology. Furthermore, existing analytical pipelines may only analyze genomic regions involved in drug resistance, thus are not suited for full viral genome analysis. Here we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated from NGS platforms and provide quality output properly formatted for downstream evolutionary analyses.


2013 ◽  
Vol 26 (7) ◽  
pp. 801-811 ◽  
Author(s):  
Marco Morroni ◽  
Mireille Jacquemond ◽  
Mark Tepfer

Recombination is a major source of virus variability, and the question of whether novel recombinant viruses would emerge in transgenic plants expressing viral sequences has been a biosafety issue. We describe the results of pyrosequencing the recombinant viral RNAs appearing in transgenic plants expressing the coat protein (CP) gene and 3′ noncoding region of Cucumber mosaic virus RNA3, as well as in nontransgenic controls. The populations of recombinants in both transgenic and nontransgenic plants were similar to those previously described from Sanger sequencing but many more recombinant types were observed, including a novel class of large deletions removing all or nearly the entire CP gene. These results show that populations of recombinant viral genomes arising de novo can be characterized in detail by pyrosequencing, and confirm that the transgenic plants did not harbor novel recombinants of biosafety concern.


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3982 ◽  
Author(s):  
RuiJuan Feng ◽  
Xin Wang ◽  
Min Tao ◽  
Guanchao Du ◽  
Qishuo Wang

Vallisneria spinulosa is a freshwater aquatic plant of ecological and economic importance. However, there is limited cytogenetic and genomics information on Vallisneria. In this study, we measured the nuclear DNA content of Vallisneria spinulosa by flow cytometry, performed a de novo assembly, and annotated repetitive sequences by using a combination of next-generation sequencing (NGS) and bioinformatics tools. The genome size of Vallisneria spinulosa is approximately 3,595 Mbp, in which nearly 60% of the genome consists of repetitive sequences. The majority of the repetitive sequences are LTR-retrotransposons comprising 43% of the genome. Although the amount of sequencing data used in this study was not sufficient for a whole-genome assembly, it could generate an overview of representative elements in the genome. These results will lay a new foundation for further studies on various species that belong to the Vallisneria genus.


2021 ◽  
pp. PHYTOFR-07-20-0
Author(s):  
Kuan Liang ◽  
Jianbin Lan ◽  
Baoquan Wang ◽  
Yuanyuan Liu ◽  
Qi Lu ◽  
...  

Kiwifruit soft rot caused by the fungal pathogen Botryosphaeria dothidea is a serious disease in kiwifruit-growing regions worldwide. In this study, we report the high-quality genome sequence of the highly virulent B. dothidea strain PTZ1 using PacBio Sequel techniques. In total, 100.87 million clean reads with a mean read length of 9,871 bp were obtained. De novo assembly resulted in 28 contigs with a total size of 44.45 Mb. The GC content of the genome was 54.59%. Furthermore, genes related to specific virulence of the strain were identified, including 259 fungal cytochrome P450s, 550 carbohydrate-active enzymes, 860 secretory proteins, and 1,182 pathogen–host interaction-related proteins. The genome is a useful resource to serve as a reference to facilitate the analysis of B. dothidea isolates and comparative genomic studies of necrotrophic pathogens. Copyright © 2021 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
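The assembly statistics quoted above (contig count, total size, GC content) can be computed from a set of contigs roughly as follows. This is a minimal sketch assuming the contigs are available as plain Python strings; the function name `assembly_summary` and the toy contigs are hypothetical and unrelated to the PTZ1 assembly.

```python
def assembly_summary(contigs):
    """Return contig count, total length in bp, and GC percentage."""
    total_bp = sum(len(c) for c in contigs)
    if total_bp == 0:
        raise ValueError("assembly is empty")
    # Count G and C bases across all contigs, case-insensitively.
    gc_bp = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return {
        "n_contigs": len(contigs),
        "total_bp": total_bp,
        "gc_percent": 100.0 * gc_bp / total_bp,
    }


# Toy example (hypothetical contigs).
summary = assembly_summary(["GGCCAT", "ATGC"])
print(summary)
```

This sketch counts every character toward the total length; a real assembly report might exclude ambiguous bases such as N from the GC denominator.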


2021 ◽  
Author(s):  
Yanjiang Liu ◽  
Xiao Zhu ◽  
Mingli Wu ◽  
Xue Xu ◽  
Zhaoxia Dai ◽  
...  

Abstract Chimonobambusa hirtinoda is a threatened species that is naturally distributed only in Doupeng Mountain, Duyun, Guizhou, China. Next-generation sequencing (NGS) was used to obtain the complete chloroplast (cp) genome sequence of C. hirtinoda, and the sequence was then assembled and analyzed for phylogenetic and evolutionary study. We also compared the cp genome with previously published cp genomes of other Chimonobambusa species. The complete cp genome of C. hirtinoda has a total length of 139,561 bp and a GC content of 38.90%. A total of 130 genes were found in the cp genome, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Some genes are missing and intron loss has occurred in the cp genome of C. hirtinoda. A total of 48 simple sequence repeats (SSRs) were detected, and by measuring the codon usage frequency of amino acids, an A/U preference at the third codon position in the cp genome of C. hirtinoda was identified. Furthermore, phylogenetic analysis using complete cp sequences and the matK gene revealed the genetic relationships within the Chimonobambusa genus.


Viruses ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 207 ◽  
Author(s):  
Camila Dantas Malossi ◽  
Eduardo Gorzoni Fioratti ◽  
Jedson Ferreira Cardoso ◽  
Angelo Jose Magro ◽  
Erna Geessien Kroon ◽  
...  

Equine infectious anemia virus (EIAV) is a persistent lentivirus that causes equine infectious anemia (EIA). In Brazil, EIAV is endemic in the Pantanal region, and euthanasia is not mandatory in this area. All of the complete genomic sequences from field viruses are from North America, Asia, and Europe, and only proviral genomic sequences are available. Sequences from Brazilian EIAV are currently available only for gag and LTR regions. Thus, the present study aimed for the first time to sequence the entire EIAV genomic RNA in naturally infected horses from an endemic area in Brazil. RNA in plasma from naturally infected horses was used for next-generation sequencing (NGS), and gaps were filled using Sanger sequencing methodology. Complete viral genomes of EIAV from two horses were obtained and annotated (accession numbers MN560970 and MN560971). Putative genes were analyzed and compared with previously described genes, showing conservation in gag and pol genes and high variation in LTR and env sequences. Amino acid changes were identified in the p26 protein, one of the most common targets used for diagnosis, and p26 molecular modelling showed surface amino acid alterations in some epitopes. Brazilian genome sequences presented 88.6% nucleotide identity with one another and 75.8 to 77.3% with main field strains, such as EIAV Liaoning, Wyoming, Ireland, and Italy isolates. Furthermore, phylogenetic analysis suggested that this Brazilian strain comprises a separate monophyletic group. These results may help to better characterize EIAV and to overcome the challenges of diagnosing and controlling EIA in endemic regions.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Fathiya M. Khamis ◽  
Paul O. Mireji ◽  
Fidelis L. O. Ombura ◽  
Anna R. Malacrida ◽  
Erick O. Awuoche ◽  
...  

Abstract The fruit fly species, Ceratitis rosa sensu stricto and Ceratitis quilicii, are sibling species restricted to the lowland and highland regions, respectively. Until recently, these sibling species were considered as allopatric populations of C. rosa with distinct bionomics. We used deep Next Generation Sequencing (NGS) technology on intact guts of individuals from the two sibling species to compare their transcriptional profiles and simultaneously understand gut microbiome and host molecular processes and identify distinguishing genetic differences between the two species. Since the genomes of both species had not been published previously, the transcriptomes were assembled de novo into transcripts. Microbe-specific transcript orthologs were separated from the assembly by filtering searches of the transcripts against microbe databases using OrthoMCL. We then used differential expression analysis of host-specific transcripts (i.e. those remaining after the microbe-specific transcripts had been removed) and microbe-specific transcripts from the two sibling species to identify defining species-specific transcripts that were present in only one fruit fly species or the other, but not in both. In C. quilicii females, bacterial transcripts of Pectobacterium spp., Enterobacterium buttiauxella, Enterobacter cloacae and Klebsiella variicola were upregulated compared to the C. rosa s.s. females. Comparison of expression levels of the host transcripts revealed a heavier investment by C. quilicii (compared with C. rosa s.s.) in: immunity; energy production; cell proliferation; insecticide resistance; reproduction and proliferation; and redox reactions that are usually associated with responses to stress and degradation of fruit metabolites.


2019 ◽  
Author(s):  
Page B McKinzie ◽  
Michelle E Bishop

Abstract Quantifying mutant or variable allele frequencies (VAFs) of ≤10⁻³ using next-generation sequencing (NGS) has utility in both clinical and nonclinical settings. Two common approaches for quantifying VAFs using NGS are tagged single-strand sequencing and duplex sequencing. While duplex sequencing is reported to have sensitivity up to 10⁻⁸ VAF, it is not a quick, easy, or inexpensive method. We report a method for quantifying VAFs that are ≥10⁻⁴ that is as easy and quick for processing samples as standard sequencing kits, yet less expensive than the kits. The method was developed using PCR fragment-based VAFs of Kras codon 12 in log₁₀ increments from 10⁻⁵ to 10⁻¹, then applied and tested on native genomic DNA. For both sources of DNA, there is a proportional increase in the observed VAF to input VAF from 10⁻⁴ to 100% mutant samples. Variability of quantitation was evaluated within experimental replicates and shown to be consistent across sample preparations. The error at each successive base read was evaluated to determine if there is a limit of read length for quantitation of ≥10⁻⁴, and it was determined that read lengths up to 70 bases are reliable for quantitation. The method described here is adaptable to various oncogene or tumor suppressor gene targets, with the potential to implement multiplexing at the initial tagging step. While easy to perform manually, it is also suited for robotic handling and batch processing of samples, facilitating detection and quantitation of genetic carcinogenic biomarkers before tumor formation or in normal-appearing tissue.
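As a rough illustration of the quantity being measured, the sketch below computes an observed VAF as mutant reads divided by total reads and builds the log₁₀ dilution series mentioned above. It assumes read counts have already been tallied for the target codon. The function name `variant_allele_frequency` and the read counts are hypothetical, and the sketch does not reproduce the tagging or error-correction steps of the reported method.

```python
def variant_allele_frequency(mutant_reads: int, total_reads: int) -> float:
    """Observed VAF: fraction of reads at the target site carrying the mutation."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mutant_reads <= total_reads:
        raise ValueError("mutant_reads must lie between 0 and total_reads")
    return mutant_reads / total_reads


# Input VAFs in log10 increments from 10^-5 to 10^-1, as in the abstract.
input_vafs = [10.0 ** exponent for exponent in range(-5, 0)]

# Hypothetical counts: 25 mutant reads among 250,000 total reads.
observed = variant_allele_frequency(25, 250_000)
print(f"observed VAF: {observed:.1e}")
```

With counts this small, sampling noise is substantial, which is one reason deep coverage is needed to quantify frequencies near 10⁻⁴.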

