VIGA: a sensitive, precise and automatic de novo VIral Genome Annotator

2018
Author(s):
Enrique González-Tortuero
Thomas David Sean Sutton
Vimalkumar Velayudhan
Andrey Nikolaevich Shkoporov
Lorraine Anne Draper
...

Abstract: Viral (meta)genomics is a rapidly growing field of study that is hampered by an inability to annotate the majority of viral sequences; therefore, the development of new bioinformatic approaches is very important. Here, we present a new automatic de novo genome annotation pipeline, called VIGA, to annotate prokaryotic and eukaryotic viral sequences from (meta)genomic studies. VIGA was benchmarked on a database of known viral genomes and a viral metagenomics case study. VIGA generated the most accurate outputs according to the number of coding sequences and their coordinates, and its outputs also contained fewer non-informative annotations than those of other programs.

1980
Vol 210 (1180)
pp. 423-435

We have cloned and propagated in prokaryotic vectors the viral DNA sequences that are integrated in a variety of cells transformed by adenovirus 2 or SV40. Analysis of the clones reveals that the viral DNA sequences sometimes are arranged in a simple fashion, collinear with the viral genome; in other cell lines there are complex arrangements of viral sequences in which tracts of the viral genome are inverted with respect to each other. In several cases the nucleotide sequences at the joints between cell and viral sequences have been determined: usually there is a sharp transition between cellular and viral DNAs. The viral sequences are integrated at different locations within the genomes of different cell lines; likewise there is no specific site on the viral genomes at which integration occurs. Sometimes the viral sequences are integrated within repetitive cellular DNA, and sometimes within unique sequences. In some cases there is evidence that the viral sequences along with the flanking cell DNA have been amplified after integration. The sequences that flank the viral insertion in the line of SV40-transformed rat cells known as 14B have been used as probes to isolate, from untransformed rat cells, clones that carry the region of the chromosome in which integration occurred. Analysis of the structure of these clones by restriction endonuclease digestion and heteroduplex formation shows that a rearrangement of cellular sequences has occurred, presumably as a consequence of integration.


Viruses
2019
Vol 11 (11)
pp. 979
Author(s):
Ping Liu
Wu Chen
Jin-Ping Chen

Pangolins are endangered animals in urgent need of protection. Identifying and cataloguing the viruses carried by pangolins is a logical approach to evaluate the range of potential pathogens and help with conservation. This study provides insight into the viral communities of Malayan pangolins (Manis javanica) as well as the molecular epidemiology of dominant pathogenic viruses between the Malayan pangolin and other hosts. A total of 62,508 de novo assembled contigs were constructed, and a BLAST search revealed that 3600 of them (≥300 nt) were related to viral sequences, of which 68 contigs had a high level of sequence similarity to known viruses; the dominant viruses were Sendai virus and coronavirus. This is the first report on the viral diversity of pangolins, expanding our understanding of the virome in endangered species, and providing insight into the overall diversity of viruses that may be capable of directly or indirectly crossing over into other mammals.


2013
Vol 26 (7)
pp. 801-811
Author(s):
Marco Morroni
Mireille Jacquemond
Mark Tepfer

Recombination is a major source of virus variability, and the question of whether novel recombinant viruses would emerge in transgenic plants expressing viral sequences has been a biosafety issue. We describe the results of pyrosequencing the recombinant viral RNAs appearing in transgenic plants expressing the coat protein (CP) gene and 3′ noncoding region of Cucumber mosaic virus RNA3, as well as in nontransgenic controls. The populations of recombinants in both transgenic and nontransgenic plants were similar to those previously described from Sanger sequencing but many more recombinant types were observed, including a novel class of large deletions removing all or nearly the entire CP gene. These results show that populations of recombinant viral genomes arising de novo can be characterized in detail by pyrosequencing, and confirm that the transgenic plants did not harbor novel recombinants of biosafety concern.


2019
Author(s):
Christina J. Castro
Rachel L. Marine
Edward Ramos
Terry Fei Fan Ng

Abstract: Viruses have high mutation rates and generally exist as a mixture of variants in biological samples. The next-generation sequencing (NGS) approach has surpassed Sanger sequencing for generating long viral sequences, yet how variants affect NGS de novo assembly remains largely unexplored. Our results from >15,000 simulated experiments showed that the presence of variants can turn an assembly of one genome into tens to thousands of contigs. This “variant interference” (VI) is highly consistent and reproducible across the ten most used de novo assemblers, and occurs independently of genome length, read length, and GC content. The main driver of VI is the pairwise identity between viral variants. These findings were further supported by in silico simulations, where selective removal of minor variant reads from clinical datasets allows the “rescue” of full viral genomes from fragmented contigs. These results call for careful interpretation of contigs and contig numbers from de novo assembly in viral deep sequencing.
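Since the abstract names pairwise identity between variants as the main driver of VI, a minimal sketch of that quantity may help. It assumes the two variant sequences are already aligned to the same length; the function name `pairwise_identity` is hypothetical and is not taken from the paper, which may define identity differently.

```python
def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences.

    Assumption: both sequences have already been aligned to equal length.
    A gap character facing a base simply counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare column by column, ignoring letter case.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return matches / len(seq_a)


if __name__ == "__main__":
    # Two toy variants that differ at one of ten positions.
    print(pairwise_identity("ACGTACGTAC", "ACGTACGTAA"))
```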


2019
Author(s):
J. Pace
K. Youens-Clark
C. Freeman
B. Hurwitz
K. Van Doorslaer

Abstract: High-throughput sequencing technologies provide unprecedented power to identify novel viruses from a wide variety of (environmental) samples. The field of ‘viral metagenomics’ has dramatically expanded our understanding of viral diversity. Viral metagenomic approaches imply that many novel viruses will not be described by researchers who are experts on the genomic organization of that virus. There is a need to develop analytical approaches to reconstruct, annotate, and classify viral genomes. We have developed the papillomavirus annotation tool (PuMA) to provide researchers with a convenient and reproducible method to annotate novel papillomaviruses. PuMA provides an accessible method for automated papillomavirus genome annotation. PuMA currently has a 98% accuracy when benchmarked against the 481 reference genomes in the papillomavirus episteme (PaVE). Finally, PuMA was used to annotate 168 newly isolated papillomaviruses, and successfully annotated 1424 viral features. To demonstrate its general applicability, we developed a version of PuMA that can annotate polyomaviruses. PuMA is available on GitHub (https://github.com/KVD-lab/puma) and through the iMicrobe online environment (https://www.imicrobe.us/#/apps/puma).


2021
Author(s):
Enrique González-Tortuero
Revathy Krishnamurthi
Heather E. Allison
Ian B. Goodhead
Chloe E. James

The number of newly available viral genomes and metagenomes has increased exponentially since the development of high throughput sequencing platforms and genome analysis tools. Bioinformatic annotation pipelines are largely based on open reading frame (ORF) calling software, which identifies genes independently of the taxonomic background of the sequence. Although ORF-calling programs provide a rapid genome annotation, they can misidentify ORFs and start codons, errors that might be perpetuated and propagated over time. This study evaluated the performance of multiple ORF-calling programs for viral genome annotation against the complete RefSeq viral database. Program outputs varied when considering the viral nucleic acid type versus the viral host. According to the number of ORFs, Prodigal and Metaprodigal were the most accurate programs for DNA viruses, while FragGeneScan and Prodigal generated the most accurate outputs for RNA viruses. Similarly, Prodigal outperformed the benchmark for viruses infecting prokaryotes, and GLIMMER and GeneMarkS produced the most accurate annotations for viruses infecting eukaryotes. When the coordinates of the ORFs were considered, Prodigal scored high for all scenarios except for RNA viruses, where GeneMarkS generated the most reliable results. Overall, the quality of the coordinates predicted for RNA viruses was poorer than for DNA viruses, suggesting the need for improved ORF-calling programs to deal with RNA viruses. Moreover, none of the ORF-calling programs reached 90% accuracy for annotation of DNA viruses. Any automatic annotation can still be improved by manual curation, especially when the presence of ORFs is validated with wet-lab experiments. However, our evaluation of the current ORF-calling programs is expected to be useful for the improvement of viral genome annotation pipelines and highlights the need for more expression data to improve the rigor of reference genomes.
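To illustrate why start codons in particular are easy to misidentify, here is a deliberately naive ORF scan. It is only a sketch and does not reproduce how Prodigal, GLIMMER, GeneMarkS, or FragGeneScan work: it assumes the standard stop codons, considers only ATG starts on the forward strand, and always picks the most upstream in-frame ATG. The name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) coordinates of forward-strand ORFs.

    Coordinates are 0-based and half-open; the stop codon is included.
    For each stop codon only the most upstream in-frame ATG is reported,
    so any internal ATG (an alternative start) is silently ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame codon by codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    # The toy sequence has ATG codons at positions 2 and 8 in the same frame.
    print(find_orfs("CCATGAAAATGCCCTAAGG"))
```

In the toy sequence both ATG codons are in frame with the same stop codon, so a caller must choose one of them; a different rule (for example, preferring the downstream ATG) would yield different coordinates for the same gene.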


2019
Author(s):
Asher Leeks
Rafael Sanjuán
Stuart A. West

Viruses frequently spread among cells or hosts in groups, with multiple viral genomes inside the same infectious unit. These collective infectious units can consist of multiple viral genomes inside the same virion, or multiple virions inside a larger structure such as a vesicle. Collective infectious units deliver multiple viral genomes to the same cell simultaneously, which can have important implications for viral pathogenesis, antiviral resistance, and social evolution. However, little is known about why some viruses transmit in collective infectious units, whereas others do not. We used a simple evolutionary approach to model the potential costs and benefits of transmitting in a collective infectious unit. We found that collective infectious units could be favoured if cells infected by multiple viral genomes were significantly more productive than cells infected by just one viral genome, and especially if there were also efficiency benefits to packaging multiple viral genomes inside the same infectious unit. We also found that if some viral sequences are defective, then collective infectious units could evolve to become very large, but that if these defective sequences interfered with wild-type virus replication, then collective infectious units were disfavoured.


2020
Vol 49 (D1)
pp. D764-D775
Author(s):
Simon Roux
David Páez-Espino
I-Min A Chen
Krishna Palaniappan
Anna Ratner
...

Abstract: Viruses are integral components of all ecosystems and microbiomes on Earth. Through pervasive infections of their cellular hosts, viruses can reshape microbial community structure and drive global nutrient cycling. Over the past decade, viral sequences identified from genomes and metagenomes have provided an unprecedented view of viral genome diversity in nature. Since 2016, the IMG/VR database has provided access to the largest collection of viral sequences obtained from (meta)genomes. Here, we present the third version of IMG/VR, composed of 18 373 cultivated and 2 314 329 uncultivated viral genomes (UViGs), nearly tripling the total number of sequences compared to the previous version. These clustered into 935 362 viral Operational Taxonomic Units (vOTUs), including 188 930 with two or more members. UViGs in IMG/VR are now reported as single viral contigs, integrated proviruses or genome bins, and are annotated with a new standardized pipeline including genome quality estimation using CheckV, taxonomic classification reflecting the latest ICTV update, and expanded host taxonomy prediction. The new IMG/VR interface enables users to efficiently browse, search, and select UViGs based on genome features and/or sequence similarity. IMG/VR v3 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.


Author(s):
Stephen Nayfach
Antonio Pedro Camargo
Emiley Eloe-Fadrosh
Simon Roux
Nikos Kyrpides

Abstract: Over the last several years, metagenomics has enabled the assembly of millions of new viral sequences that have vastly expanded our knowledge of Earth’s viral diversity. However, these sequences range from small fragments to complete genomes and no tools currently exist for estimating their quality. To address this problem, we developed CheckV, an automated pipeline for estimating the completeness of viral genomes and for identifying and removing non-viral regions found on integrated proviruses. After validating the approach on mock datasets, CheckV was applied to large and diverse viral genome collections, including IMG/VR and the Global Ocean Virome, revealing that the majority of viral sequences were small fragments, with just 3.6% classified as high-quality (i.e. >90% completeness) or complete genomes. Additionally, we found that removal of host contamination significantly improved identification of auxiliary metabolic genes and interpretation of viral-encoded functions. We expect CheckV will be broadly useful for all researchers studying and reporting viral genomes assembled from metagenomes. CheckV is freely available at: http://bitbucket.org/berkeleylab/CheckV.
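The ">90% completeness" criterion can be made concrete with a small sketch. The abstract does not describe how CheckV derives completeness, so the code below simply assumes an expected genome length is already known and takes completeness as the ratio of contig length to that length. Both function names are hypothetical and are not part of CheckV.

```python
def estimate_completeness(contig_length: int, expected_genome_length: int) -> float:
    """Completeness in percent, capped at 100.

    Assumption: the expected genome length is supplied by the caller;
    how to obtain it is outside the scope of this sketch.
    """
    if contig_length < 0 or expected_genome_length <= 0:
        raise ValueError("lengths must be positive")
    return min(100.0, 100.0 * contig_length / expected_genome_length)


def is_high_quality(completeness_pct: float) -> bool:
    """Apply the >90% completeness threshold quoted in the abstract."""
    return completeness_pct > 90.0


if __name__ == "__main__":
    pct = estimate_completeness(47_500, 50_000)
    print(pct, is_high_quality(pct))
```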

