PuMA: A papillomavirus genome annotation tool

Josh Pace; Ken Youens-Clark; Cordell Freeman; Bonnie Hurwitz; Koenraad Van Doorslaer

doi:10.1093/ve/veaa068


Virus Evolution ◽

10.1093/ve/veaa068 ◽

2020 ◽

Vol 6 (2) ◽

Author(s):

Josh Pace ◽

Ken Youens-Clark ◽

Cordell Freeman ◽

Bonnie Hurwitz ◽

Koenraad Van Doorslaer

Keyword(s):

High Throughput Sequencing ◽

Viral Metagenomics ◽

Annotation Tool ◽

General Applicability ◽

Virus Family ◽

Sequencing Technologies ◽

Preliminary Version ◽

Reproducible Method ◽

Reference Genomes ◽

Viral Annotation

Abstract High-throughput sequencing technologies provide unprecedented power to identify novel viruses from a wide variety of (environmental) samples. The field of ‘viral metagenomics’ has dramatically expanded our understanding of viral diversity. Viral metagenomic approaches imply that many novel viruses will not be described by researchers who are experts on (the genomic organization of) that virus family. We have developed the papillomavirus annotation tool (PuMA) to provide researchers with a convenient and reproducible method to annotate and report novel papillomaviruses. PuMA currently correctly annotates 99% of the papillomavirus genes when benchmarked against the 655 reference genomes in the papillomavirus episteme. Compared to another viral annotation pipeline, PuMA annotates more viral features while being more accurate. To demonstrate its general applicability, we also developed a preliminary version of PuMA that can annotate polyomaviruses. PuMA is available on GitHub (https://github.com/KVD-lab/puma) and through the iMicrobe online environment (https://www.imicrobe.us/#/apps/puma).


PuMA: a papillomavirus genome annotation tool

10.1101/736991 ◽

2019 ◽

Author(s):

J. Pace ◽

K. Youens-Clark ◽

C. Freeman ◽

B. Hurwitz ◽

K. Van Doorslaer

Keyword(s):

Genome Annotation ◽

High Throughput Sequencing ◽

Viral Metagenomics ◽

Annotation Tool ◽

General Applicability ◽

Viral Genomes ◽

Link Type ◽

Sequencing Technologies ◽

Reproducible Method ◽

Analytical Approaches

ABSTRACT High-throughput sequencing technologies provide unprecedented power to identify novel viruses from a wide variety of (environmental) samples. The field of ‘viral metagenomics’ has dramatically expanded our understanding of viral diversity. Viral metagenomic approaches imply that many novel viruses will not be described by researchers who are experts on the genomic organization of that virus. There is a need to develop analytical approaches to reconstruct, annotate, and classify viral genomes. We have developed the papillomavirus annotation tool (PuMA) to provide researchers with a convenient and reproducible method to annotate novel papillomaviruses. PuMA provides an accessible method for automated papillomavirus genome annotation. PuMA currently has a 98% accuracy when benchmarked against the 481 reference genomes in the papillomavirus episteme (PaVE). Finally, PuMA was used to annotate 168 newly isolated papillomaviruses, and successfully annotated 1424 viral features. To demonstrate its general applicability, we developed a version of PuMA that can annotate polyomaviruses. PuMA is available on GitHub (https://github.com/KVD-lab/puma) and through the iMicrobe online environment (https://www.imicrobe.us/#/apps/puma).


BrumiR: A toolkit for de novo discovery of microRNAs from sRNA-seq data

10.1101/2020.08.07.240689 ◽

2020 ◽

Author(s):

Carol Moraga ◽

Evelyn Sanchez ◽

Mariana Galvão Ferrarini ◽

Rodrigo A. Gutierrez ◽

Elena A. Vidal ◽

...

Keyword(s):

High Throughput Sequencing ◽

Reference Genome ◽

De Novo ◽

Regulation Of Gene Expression ◽

Additional Information ◽

Sequencing Technologies ◽

Mapping Tool ◽

Biological Insight ◽

Non Coding Rnas ◽

Reference Genomes

Abstract MicroRNAs (miRNAs) are small non-coding RNAs that are key players in the regulation of gene expression. In the last decade, with the increasing accessibility of high-throughput sequencing technologies, different methods have been developed to identify miRNAs, most of which rely on pre-existing reference genomes. However, when a reference genome is absent or is not of high quality, such identification becomes more difficult. In this context, we developed BrumiR, an algorithm that is able to discover miRNAs directly and exclusively from sRNA-seq data. We benchmarked BrumiR with datasets encompassing animal and plant species using real and simulated sRNA-seq experiments. The results demonstrate that BrumiR reaches the highest recall for miRNA discovery, while at the same time being much faster and more efficient than the state-of-the-art tools evaluated. The latter allows BrumiR to analyze a large number of sRNA-seq experiments, from plant or animal species. Moreover, BrumiR detects additional information regarding other expressed sequences (sRNAs, isomiRs, etc.), thus maximizing the biological insight gained from sRNA-seq experiments. Finally, when a reference genome is available, BrumiR provides a new mapping tool (BrumiR2ref) that performs an a posteriori exhaustive search to identify the precursor sequences. The code of BrumiR is freely available at https://github.com/camoragaq/BrumiR.


Reassortment of Genome Segments Creates Stable Lineages Among Strains of Orchid Fleck Virus Infecting Citrus in Mexico

Phytopathology ◽

10.1094/phyto-07-19-0253-fi ◽

2020 ◽

Vol 110 (1) ◽

pp. 106-120 ◽

Cited By ~ 1

Author(s):

Avijit Roy ◽

Andrew L. Stone ◽

Gabriel Otero-Colina ◽

Gang Wei ◽

Ronald H. Brlansky ◽

...

Keyword(s):

High Throughput Sequencing ◽

Sensu Stricto ◽

Genome Segment ◽

Rt Pcr ◽

Sequence Comparisons ◽

Orchid Fleck Virus ◽

Reverse Transcription Pcr ◽

Sequencing Technologies ◽

Negative Sense

The genus Dichorhavirus contains viruses with bipartite, negative-sense, single-stranded RNA genomes that are transmitted by flat mites to hosts that include orchids, coffee, the genus Clerodendrum, and citrus. A dichorhavirus infecting citrus in Mexico is classified as a citrus strain of orchid fleck virus (OFV-Cit). We previously used RNA sequencing technologies on OFV-Cit samples from Mexico to develop an OFV-Cit–specific reverse transcription PCR (RT-PCR) assay. During assay validation, OFV-Cit–specific RT-PCR failed to produce an amplicon from some samples with clear symptoms of OFV-Cit. Characterization of this virus revealed that dichorhavirus-like particles were found in the nucleus. High-throughput sequencing of small RNAs from these citrus plants revealed a novel citrus strain of OFV, OFV-Cit2. Sequence comparisons with known orchid and citrus strains of OFV showed variation in the protein products encoded by genome segment 1 (RNA1). Strains of OFV clustered together based on host of origin, whether orchid or citrus, and were clearly separated from other dichorhaviruses described from infected citrus in Brazil. The variation in RNA1 between the original (now OFV-Cit1) and the new (OFV-Cit2) strain was not observed with genome segment 2 (RNA2), but instead, a common RNA2 molecule was shared among strains of OFV-Cit1 and -Cit2, a situation strikingly similar to OFV infecting orchids. We also collected mites at the affected groves, identified them as Brevipalpus californicus sensu stricto, and confirmed that they were infected by OFV-Cit1 or with both OFV-Cit1 and -Cit2. OFV-Cit1 and -Cit2 have coexisted at the same site in Toliman, Queretaro, Mexico since 2012. OFV strain-specific diagnostic tests were developed.


Application of Oxford Nanopore Technology to Plant Virus Detection

Viruses ◽

10.3390/v13081424 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1424

Author(s):

Lia W. Liefting ◽

David W. Waite ◽

Jeremy R. Thompson

Keyword(s):

Plant Virus ◽

High Throughput Sequencing ◽

Virus Detection ◽

Diagnostic Methods ◽

Plant Virus Detection ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Virus Diagnostics ◽

Post Entry ◽

Read Accuracy

The adoption of Oxford Nanopore Technologies (ONT) sequencing as a tool in plant virology has been relatively slow despite its promise in more recent years to yield large quantities of long nucleotide sequences in real time without the need for prior amplification. The portability of the MinION and Flongle platforms combined with lowering costs and continued improvements in read accuracy make ONT an attractive method for both low- and high-scale virus diagnostics. Here, we provide a detailed step-by-step protocol using the ONT Flongle platform that we have developed for the routine application on a range of symptomatic post-entry quarantine and domestic surveillance plant samples. The aim of this methods paper is to highlight ONT’s feasibility as a valuable component to the diagnostician’s toolkit and to hopefully stimulate other laboratories towards the eventual goal of integrating high-throughput sequencing technologies as validated plant virus diagnostic methods in their own right.


Assessing genotyping errors in mammalian museum study skins using high-throughput genotyping-by-sequencing

Conservation Genetics Resources ◽

10.1007/s12686-021-01213-8 ◽

2021 ◽

Author(s):

Stella C. Yuan ◽

Eric Malekos ◽

Melissa T. R. Hawkins

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Massively Parallel Sequencing ◽

Massively Parallel ◽

Museum Specimens ◽

Museum Specimen ◽

Genotyping Errors ◽

Allelic Dropout ◽

Parallel Sequencing ◽

Sequencing Technologies

Abstract The use of museum specimens held in natural history repositories for population and conservation genetic research is increasing in tandem with the use of massively parallel sequencing technologies. Short Tandem Repeats (STRs), or microsatellite loci, are commonly used genetic markers in wildlife and population genetic studies. However, they traditionally suffered from a host of issues including length homoplasy, high costs, low throughput, and difficulties in reproducibility across laboratories. Massively parallel sequencing technologies can address these problems, but the incorporation of museum specimen derived DNA suffers from significant fragmentation and exogenous DNA contamination. Combatting these issues requires extra measures of stringency in the lab and during data analysis, yet there have not been any high-throughput sequencing studies evaluating microsatellite allelic dropout from museum specimen extracted DNA. In this study, we evaluate genotyping errors derived from mammalian museum skin DNA extracts for previously characterized microsatellites across PCR replicates utilizing high-throughput sequencing. We found it useful to classify samples based on DNA concentration, which determined the rate by which genotypes were accurately recovered. Longer microsatellites performed worse in all museum specimens. Allelic dropout rates across loci were dependent on sample quantity, with high concentration museum specimens performing as well and recovering quality metrics nearly as high as the frozen tissue sample. Based on our results, we provide a set of best practices for quality assurance and incorporation of reliable genotypes from museum specimens.


Multi-omics approach to precision medicine for immune-mediated diseases

Inflammation and Regeneration ◽

10.1186/s41232-021-00173-8 ◽

2021 ◽

Vol 41 (1) ◽

Author(s):

Mineto Ota ◽

Keishi Fujio

Keyword(s):

Treatment Response ◽

High Throughput Sequencing ◽

Disease Risk ◽

Clinical Information ◽

Clinical Settings ◽

Social Significance ◽

Sequencing Technologies ◽

Immune Mediated ◽

Recent Innovation ◽

Future Direction

Abstract Recent innovation in high-throughput sequencing technologies has drastically empowered scientific research. Consequently, it is now possible to capture comprehensive profiles of samples at multiple levels, including genome, epigenome, and transcriptome, at a time. Applying this kind of rich information to clinical settings is of great social significance. For some traits such as cardiovascular diseases, attempts to apply omics datasets in clinical practice for the prediction of the disease risk have already shown promising results, although such efforts are still under way for immune-mediated diseases. Multiple studies have tried to predict treatment response in immune-mediated diseases using genomic, transcriptomic, or clinical information, showing various possible indicators. For better prediction of treatment response or disease outcome in immune-mediated diseases, combining multi-layer information may increase the power. In addition, in order to efficiently pick up meaningful information from the massive data, high-quality annotation of genomic functions is also crucial. In this review, we discuss the achievements so far and the future direction of the multi-omics approach to immune-mediated diseases.


Profiling DNA Methylation Based on Next-Generation Sequencing Approaches: New Insights and Clinical Applications

Genes ◽

10.3390/genes9090429 ◽

2018 ◽

Vol 9 (9) ◽

pp. 429 ◽

Cited By ~ 38

Author(s):

Daniela Barros-Silva ◽

C. Marques ◽

Rui Henrique ◽

Carmen Jerónimo

Keyword(s):

Dna Methylation ◽

Next Generation Sequencing ◽

High Throughput Sequencing ◽

Epigenetic Modification ◽

Response To Therapy ◽

Next Generation ◽

Sequencing Technologies ◽

Prognosis And Prediction ◽

Novel Biomarkers ◽

Generation Sequencing

DNA methylation is an epigenetic modification that plays a pivotal role in regulating gene expression and, consequently, influences a wide variety of biological processes and diseases. The advances in next-generation sequencing technologies allow for genome-wide profiling of methyl marks both at a single-nucleotide and at a single-cell resolution. These profiling approaches vary in many aspects, such as DNA input, resolution, coverage, and bioinformatics analysis. Thus, the selection of the most feasible method according to the project’s purpose requires in-depth knowledge of those techniques. Currently, high-throughput sequencing techniques are intensively used in epigenomics profiling, which ultimately aims to find novel biomarkers for detection, diagnosis, prognosis, and prediction of response to therapy, as well as to discover new targets for personalized treatments. Here, we present, in brief, a portrayal of next-generation sequencing methodologies’ evolution for profiling DNA methylation, highlighting its potential for translational medicine and presenting significant findings in several diseases.


Detection of novel allelic variations in soybean mutant population using Tilling by Sequencing

10.1101/711440 ◽

2019 ◽

Author(s):

Reneth Millas ◽

Mary Espina ◽

CM Sabbir Ahmed ◽

Angelina Bernardini ◽

Ekundayo Adeleke ◽

...

Keyword(s):

Fatty Acid ◽

High Throughput ◽

Reverse Genetics ◽

High Throughput Sequencing ◽

Fatty Acid Biosynthesis ◽

Induced Mutations ◽

Mutant Population ◽

Sequencing Technologies ◽

Allelic Variations ◽

Tilling By Sequencing

ABSTRACT One of the most important tools in genetic improvement is mutagenesis, which is a useful tool to induce genetic and phenotypic variation for trait improvement and discovery of novel genes. A JTN-5203 (MG V) mutant population was generated using induced ethyl methane sulfonate (EMS) mutagenesis and was used for detection of induced mutations in FAD2-1A and FAD2-1B genes using a reverse genetics approach. An optimum concentration of EMS was used to treat 15,000 bulk JTN-5203 seeds, producing an M2 population of 1,820 individuals. DNA was extracted, normalized, and pooled from these individuals. Specific primers were designed from FAD2-1A and FAD2-1B genes that are involved in the fatty acid biosynthesis pathway for further analysis using next-generation sequencing. High-throughput mutation discovery through a TILLING-by-Sequencing approach was used to detect novel allelic variations in this population. Several mutations and allelic variations with high impacts were detected for FAD2-1A and FAD2-1B. These include GC to AT transition mutations in FAD2-1A (20%) and FAD2-1B (69%). Mutation density for this population is estimated to be about 1/136kb. Through mutagenesis and high-throughput sequencing technologies, novel alleles underlying the mutations observed in mutants with reduced polyunsaturated fatty acids will be identified, and these mutants can be further used in breeding soybean lines with an improved fatty acid profile, thereby developing heart-healthy soybeans.


A New Paralog Removal Pipeline Resolves Conflict between RAD-seq and Enrichment

10.1101/2020.10.26.355248 ◽

2020 ◽

Author(s):

Wenbin Zhou ◽

John Soghigian ◽

Qiu-yun (Jenny) Xiang

Keyword(s):

High Throughput Sequencing ◽

Sequence Similarity ◽

Phylogenetic Analyses ◽

Disjunct Distribution ◽

Divergence Times ◽

Target Enrichment ◽

Sequencing Technologies ◽

Duplication Events ◽

The Witch ◽

Phylogenomic Analyses

ABSTRACT Target enrichment and RAD-seq are well-established high-throughput sequencing technologies that have been increasingly used for phylogenomic studies, and the choice between methods is a practical issue for plant systematists studying the evolutionary histories of biodiversity of relatively recent origins. However, few studies have compared the congruence and conflict between results from the two methods within the same group of organisms, especially in plants, where extensive genome duplication events may complicate phylogenomic analyses. Unfortunately, currently widely used pipelines for target enrichment data analysis do not have a vigorous procedure to remove paralogs in Hyb-Seq data. In this study, we employed RAD-seq and Hyb-Seq of Angiosperm 353 genes in phylogenomic and biogeographic studies of Hamamelis (the witch-hazels) and Castanea (chestnuts), two classic examples exhibiting the well-known eastern Asian-eastern North American disjunct distribution. We compared these two methods side by side and developed a new pipeline (PPD) with a more vigorous removal of putative paralogs from Hyb-Seq data. The new pipeline considers both sequence similarity and heterozygous sites at each locus in the identification of paralogs. We used our pipeline to construct robust datasets for comparison between methods and downstream analyses on the two genera. Our results demonstrated that the PPD identified many more putative paralogs than the popular method HybPiper. Comparisons of tree topologies and divergence times showed significant differences between data from HybPiper and data from our new PPD pipeline, likely due to the error signals from the paralogous genes undetected by HybPiper, but trimmed by PPD. We found that phylogenies and divergence times estimated from our RAD-seq and Hyb-Seq-PPD were largely congruent. We highlight the importance of removing paralogs in enrichment data, and discuss the merits of RAD-seq and Hyb-Seq. Finally, phylogenetic analyses of RAD-seq and Hyb-Seq resulted in well-resolved species relationships, and revealed ancient introgression in both genera. Biogeographic analyses including fossil data revealed a complicated history of each genus involving multiple intercontinental dispersals and local extinctions in areas outside of the taxa’s modern ranges in both the Paleogene and Neogene. Our study demonstrates the value of additional steps for filtering paralogous gene content from Angiosperm 353 data, such as our new PPD pipeline described in this study. [RAD-seq, Hyb-Seq, paralogs, Castanea, Hamamelis, eastern Asia-eastern North America disjunction, biogeography, ancient introgression]


Utilizing the VirIdAl Pipeline to Search for Viruses in the Metagenomic Data of Bat Samples

Viruses ◽

10.3390/v13102006 ◽

2021 ◽

Vol 13 (10) ◽

pp. 2006

Author(s):

Anna Y Budkina ◽

Elena V Korneenko ◽

Ivan A Kotov ◽

Daniil A Kiselev ◽

Ilya V Artyushin ◽

...

Keyword(s):

Large Scale ◽

High Throughput Sequencing ◽

Metagenomic Data ◽

Sequencing Data ◽

Viral Pathogens ◽

Genomic Databases ◽

Bioinformatic Pipeline ◽

Viral Genomes ◽

Sequencing Technologies ◽

Viral Screening

According to various estimates, only a small percentage of existing viruses have been discovered, naturally much less being represented in the genomic databases. High-throughput sequencing technologies develop rapidly, empowering large-scale screening of various biological samples for the presence of pathogen-associated nucleotide sequences, but many organisms are yet to be attributed specific loci for identification. This problem particularly impedes viral screening, due to vast heterogeneity in viral genomes. In this paper, we present a new bioinformatic pipeline, VirIdAl, for detecting and identifying viral pathogens in sequencing data. We also demonstrate the utility of the new software by applying it to viral screening of the feces of bats collected in the Moscow region, which revealed a significant variety of viruses associated with bats, insects, plants, and protozoa. The presence of alpha and beta coronavirus reads, including the MERS-like bat virus, deserves a special mention, as it once again indicates that bats are indeed reservoirs for many viral pathogens. In addition, it was shown that alignment-based methods were unable to identify the taxon for a large proportion of reads, and we additionally applied other approaches, showing that they can further reveal the presence of viral agents in sequencing data. However, the incompleteness of viral databases remains a significant problem in the studies of viral diversity, and therefore necessitates the use of combined approaches, including those based on machine learning methods.
