A Complex of Badnavirus Species Infecting Cacao Reveals Mixed Infections, Extensive Genomic Variability, and Interspecific Recombination

Roberto Ramos-Sobrinho; Nomatter Chingandu; Osman A. Gutierrez; Jean-Philippe Marelli; Judith K. Brown

doi:10.3390/v12040443

Viruses ◽

10.3390/v12040443 ◽

2020 ◽

Vol 12 (4) ◽

pp. 443

Author(s):

Roberto Ramos-Sobrinho ◽

Nomatter Chingandu ◽

Osman A. Gutierrez ◽

Jean-Philippe Marelli ◽

Judith K. Brown

Keyword(s):

Phylogenetic Trees ◽

High Throughput Sequencing ◽

Purifying Selection ◽

Rnase H ◽

Genomic Diversity ◽

Viral Genomes ◽

Coding Regions ◽

B Virus ◽

Interspecific Recombination ◽

Theobroma Cacao L

The incidence of cacao swollen shoot disease (CSSD) in cacao (Theobroma cacao L.) has increased in West Africa since ~2000. To investigate the genomic and species diversity of the CSSD-badnaviruses infecting cacao in Côte d’Ivoire and Ghana, symptomatic leaves were subjected to high-throughput sequencing. Among the 30 newly determined genomes, three badnaviruses were identified: Cacao swollen shoot Togo B virus (CSSTBV), Cacao swollen shoot CD virus, and Cacao swollen shoot CE virus (CSSCEV). The phylogenetic trees reconstructed for the reverse transcriptase (RT) and ribonuclease H (RNase H) sequences were incongruent with those reconstructed from the complete viral genomes, which had the most robust statistical support. Recombination seems to be involved in the CSSD-badnavirus diversification. The genomic diversity varied among different CSSD-badnaviruses, with CSSTBV showing the lowest nucleotide diversity (π = 0.06236), and CSSCEV exhibiting the greatest variability (π = 0.21911). Evidence of strong purifying selection was found in the coding regions of the CSSTBV isolates.
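The nucleotide diversity (π) values quoted above are, in the usual definition, the average proportion of differing sites over all pairs of aligned sequences. The sketch below illustrates that definition only; it is not the software or settings used by the authors, which the abstract does not name. The function name `nucleotide_diversity` and the choice to skip gaps and ambiguous bases pair by pair are assumptions made for illustration.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise proportion of differing sites (pi).

    Assumes `seqs` are already aligned (equal length). Sites where either
    sequence has a gap or a non-ACGT symbol are skipped for that pair,
    which is one common convention and an assumption here.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    total = 0.0
    pairs = 0
    for a, b in combinations(seqs, 2):
        compared = 0
        diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in "ACGT" and y in "ACGT":
                compared += 1
                if x != y:
                    diffs += 1
        if compared:  # ignore pairs with no comparable sites
            total += diffs / compared
            pairs += 1
    if pairs == 0:
        raise ValueError("no comparable sites in any pair")
    return total / pairs
```

For the toy alignment `["ACGT", "ACGA", "ACGT"]`, two of the three pairs differ at one of four sites, so the function returns 1/6 (about 0.167).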

One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008678 ◽

2021 ◽

Vol 17 (1) ◽

pp. e1008678

Author(s):

Carlos Valiente-Mullor ◽

Beatriz Beamud ◽

Iván Ansari ◽

Carlos Francés-Cuesta ◽

Neris García-González ◽

...

Keyword(s):

Legionella Pneumophila ◽

Phylogenetic Trees ◽

High Throughput Sequencing ◽

Reference Genome ◽

Sequence Data ◽

Genetic Distances ◽

Genomic Diversity ◽

Nucleotide Polymorphisms ◽

Recombination Rates ◽

Almost All

Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were evaluated depending on the mapping reference. The choice of different reference genomes proved to have an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications such as including/excluding isolates of particular clades and the estimation of genetic distances. These findings suggest that the single reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended.

Genomic Diversity in a Population of Spodoptera frugiperda Nucleopolyhedrovirus

10.1101/2020.10.27.358317 ◽

2020 ◽

Author(s):

Tomás Masson ◽

María Laura Fabre ◽

Matias Luis Pidre ◽

José María Niz ◽

Marcelo Facundo Berretta ◽

...

Keyword(s):

Spodoptera Frugiperda ◽

High Throughput Sequencing ◽

Biological Control Agent ◽

Fall Armyworm ◽

Control Agent ◽

Genomic Diversity ◽

List Type ◽

Structural Variants ◽

Single Nucleotide Variants ◽

Coding Regions

Spodoptera frugiperda multiple nucleopolyhedrovirus (SfMNPV) represents a strong candidate to develop environmentally friendly pesticides against the fall armyworm (Spodoptera frugiperda), a widespread pest that poses a severe threat to different crops around the world. However, little is known regarding the genomic diversity present inside SfMNPV isolates and how it shapes the interactions between virus and host. Here, the genomic diversity present inside an isolate of SfMNPV was explored using high-throughput sequencing for the first time. We identified 704 intrahost single nucleotide variants, of which 184 are nonsynonymous mutations distributed among 82 different coding sequences. We detected several structural variants affecting the SfMNPV genome, including two previously reported deletions inside the egt region. A comparative analysis between polymorphisms present in different SfMNPV isolates and our intraisolate diversity data suggests that coding regions with higher genetic diversity are associated with oral infectivity or unknown functions. In this context, through molecular evolution studies we provide evidence of diversifying selection acting on sf29, a putative collagenase which could contribute to the oral infectivity of SfMNPV. Overall, our results deepen our understanding of the coevolution between SfMNPV and the fall armyworm and will be useful to improve the applicability of this virus as a biological control agent.

Highlights: We characterized the genomic diversity within a population of SfMNPV. Coding regions with higher genetic diversity are associated with oral infectivity or unknown functions. Several structural variants contribute to the genomic diversity of SfMNPV. Sf29, a putative collagenase, shows signs of adaptive evolution.

One is not enough: on the effects of reference genome for the mapping and subsequent analyses of short-reads

10.1101/2020.04.14.041004 ◽

2020 ◽

Author(s):

Carlos Valiente-Mullor ◽

Beatriz Beamud ◽

Iván Ansari ◽

Carlos Francés-Cuesta ◽

Neris García-González ◽

...

Keyword(s):

High Throughput ◽

Legionella Pneumophila ◽

Phylogenetic Trees ◽

High Throughput Sequencing ◽

Reference Genome ◽

Bacterial Species ◽

Genomic Diversity ◽

Reference Sequence ◽

The Impact ◽

Reference Genomes

Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were evaluated depending on the mapping reference. The choice of different reference genomes proved to have an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications such as including/excluding isolates of particular clades and the estimation of genetic distances. These findings suggest that the single reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended.

Author summary: Mapping consists of the alignment of reads (i.e., DNA fragments) obtained through high-throughput genome sequencing to a previously assembled reference sequence. It is common practice in genomic studies to use a single reference for mapping, usually the ‘reference genome’ of a species (a high-quality assembly). However, the selection of an optimal reference is hindered by intrinsic intra-species genetic variability, particularly in bacteria. Biases and errors due to reference choice for mapping in bacteria have been identified. These originate mainly from alignment errors due to genetic differences between the reference genome and the read sequences. Eventually, they could lead to misidentification of variants and biased reconstruction of phylogenetic trees (which reflect ancestry between different bacterial lineages). However, a systematic study of the effects of reference choice in different bacterial species is still missing, particularly regarding its impact on phylogenies. This work intended to fill that gap. The impact of reference choice has proved to be pervasive in the five bacterial species that we have studied and, in some cases, alterations in phylogenetic trees could lead to incorrect epidemiological inferences. Hence, the use of different reference genomes may be advisable to assess the potential biases of mapping.

Phylogeographic Genetic Diversity in the White Sucker Hepatitis B Virus across the Great Lakes Region and Alberta, Canada

Viruses ◽

10.3390/v13020285 ◽

2021 ◽

Vol 13 (2) ◽

pp. 285

Author(s):

Cynthia R. Adams ◽

Vicki S. Blazer ◽

Jim Sherry ◽

Robert Scott Cornman ◽

Luke R. Iwanowicz

Keyword(s):

Genetic Diversity ◽

Hepatitis B Virus ◽

Hepatitis B ◽

Lake Michigan ◽

Illumina Miseq ◽

Genomic Variation ◽

Genomic Diversity ◽

White Sucker ◽

Fish Health ◽

B Virus

Hepatitis B viruses belong to a family of circular, double-stranded DNA viruses that infect a range of organisms, with host responses that vary from mild infection to chronic infection and cancer. The white sucker hepatitis B virus (WSHBV) was first described in the white sucker (Catostomus commersonii), a freshwater teleost, and belongs to the genus Parahepadnavirus. At present, the host range of WSHBV and its impact on fish health are unknown, and neither genetic diversity nor association with fish health have been studied in any parahepadnavirus. Given the relevance of genomic diversity to disease outcome for the orthohepadnaviruses, we sought to characterize genomic variation in WSHBV and determine how it is structured among watersheds. We identified WSHBV-positive white sucker inhabiting tributaries of Lake Michigan, Lake Superior, Lake Erie (USA), and Lake Athabasca (Canada). Copy number in plasma and in liver tissue was estimated via qPCR. Templates from 27 virus-positive fish were amplified and sequenced using a primer-specific, circular long-range amplification method coupled with amplicon sequencing on the Illumina MiSeq. Phylogenetic analysis of the WSHBV genome identified phylogeographical clustering reminiscent of that observed with human hepatitis B virus genotypes. Notably, most non-synonymous substitutions were found to cluster in the pre-S/spacer overlap region, which is relevant for both viral entry and replication. The observed predominance of p1/s3 mutations in this region is indicative of adaptive change in the polymerase open reading frame (ORF), while, at the same time, the surface ORF is under purifying selection. Although the levels of variation we observed do not meet the criteria used to define sub/genotypes of human and avian hepadnaviruses, we identified geographically associated genome variation in the pre-S and spacer domain sufficient to define five WSHBV haplotypes. This study of WSHBV genetic diversity should facilitate the development of molecular markers for future identification of genotypes and provide evidence in future investigations of possible differential disease outcomes.

Illuminating the Plant Rhabdovirus Landscape through Metatranscriptomics Data

Viruses ◽

10.3390/v13071304 ◽

2021 ◽

Vol 13 (7) ◽

pp. 1304

Author(s):

Nicolás Bejerman ◽

Ralf G. Dietzgen ◽

Humberto Debat

Keyword(s):

Plant Species ◽

High Throughput Sequencing ◽

Plant Viruses ◽

The Novel ◽

Coding Regions ◽

Public Data ◽

Invaluable Tool ◽

Sequencing Platforms ◽

Viral Sequences ◽

Plant Rhabdovirus

Rhabdoviruses infect a large number of plant species and cause significant crop diseases. They have a negative-sense, single-stranded unsegmented or bisegmented RNA genome. The number of plant-associated rhabdovirid sequences has grown in the last few years in concert with the extensive use of high-throughput sequencing platforms. Here, we report the discovery of 27 novel rhabdovirus genomes associated with 25 different host plant species and one insect, which were hidden in public databases. These viral sequences were identified through homology searches in more than 3000 plant and insect transcriptomes from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) using known plant rhabdovirus sequences as the query. The identification, assembly and curation of raw SRA reads resulted in sixteen viral genome sequences with full-length coding regions and ten partial genomes. Highlights of the obtained sequences include viruses with unique and novel genome organizations among known plant rhabdoviruses. Phylogenetic analysis showed that thirteen of the novel viruses were related to cytorhabdoviruses, one to alphanucleorhabdoviruses, five to betanucleorhabdoviruses, one to dichorhaviruses and seven to varicosaviruses. These findings resulted in the most complete phylogeny of plant rhabdoviruses to date and shed new light on the phylogenetic relationships and evolutionary landscape of this group of plant viruses. Furthermore, this study provided additional evidence for the complexity and diversity of plant rhabdovirus genomes and demonstrated that analyzing SRA public data provides an invaluable tool to accelerate virus discovery, gain evolutionary insights and refine virus taxonomy.

Unravelling the hidden inter and intra-varietal diversity of durum wheat commercial varieties used in Portugal

Plant Genetic Resources ◽

10.1017/s1479262119000133 ◽

2019 ◽

Vol 17 (04) ◽

pp. 386-389

Author(s):

Miguel Bento ◽

Sónia Gomes Pereira ◽

Wanda Viegas ◽

Manuela Silva

Keyword(s):

Durum Wheat ◽

Repetitive Sequences ◽

Inter Simple Sequence Repeat ◽

Wheat Breeding ◽

Genomic Diversity ◽

Varietal Diversity ◽

Wheat Varieties ◽

Coding Sequences ◽

Coding Regions ◽

High Level

Assessing durum wheat genomic diversity is crucial in a changing environment, particularly in the Mediterranean region where it is largely used to produce pasta. Durum wheat varieties cultivated in Portugal and previously assessed regarding thermotolerance ability were screened for the variability of coding sequences associated with technological traits and repetitive sequences. As expected, reduced variability was observed regarding low molecular weight glutenin subunits (LMW-GS) but a specific LMW-GS allelic form associated with improved pasta-making characteristics was absent in one variety. In contrast, molecular markers targeting repetitive elements like microsatellites and retrotransposons – Inter Simple Sequence Repeat (ISSR) and Inter Retrotransposons Amplified Polymorphism (IRAP) – disclosed significant inter and intra-varietal diversity. This high level of polymorphism was revealed by the 20 distinct ISSR/IRAP concatenated profiles observed among the 23 individuals analysed. Interestingly, median joining networks and PCoA analysis grouped individuals of the same variety and clustered varieties according to geographical origin. Globally, this work demonstrates that durum wheat breeding strategies induced selection pressure for some relevant coding sequences while maintaining high levels of genomic variability in non-coding regions enriched in repetitive sequences.

Hepatitis B virus polymerase restricts LINE-1 retrotransposition

10.1101/2021.05.07.443105 ◽

2021 ◽

Author(s):

Yasuo Ariumi

Keyword(s):

Hepatitis B Virus ◽

Hepatitis B ◽

Rna Binding ◽

Rnase H ◽

Ribonuclease H ◽

Independent Manner ◽

Cytoplasmic Rna ◽

B Virus ◽

Long Interspersed Element ◽

L1 Retrotransposon

The long interspersed element-1 (LINE-1, L1) retrotransposon makes up about 17% of the human genome. However, genetic and biochemical interactions between L1 and hepatitis B virus (HBV) remain poorly understood. In this study, we found that HBV restricts L1 mobility without inhibiting L1 promoter activity. Notably, HBV polymerase (Pol) strongly inhibited L1 retrotransposition in a reverse transcriptase (RT)-independent manner. Indeed, the ribonuclease H (RNase H) domain was essential for inhibition of L1 retrotransposition. The L1 ORF1p RNA-binding protein predominantly localized to a cytoplasmic RNA granule termed the P-body. However, HBV Pol sequestered L1 ORF1p from the P-body and colocalized with L1 ORF1p in the cytoplasm when both proteins were co-expressed. Altogether, HBV Pol seems to restrict L1 mobility by sequestering L1 ORF1p from the P-body. Thus, these results suggest a novel function or activity of HBV Pol in the regulation of L1 retrotransposition.

Whole genome sequencing reveals the genomic diversity, taxonomic classification, and evolutionary relationships of the genus Nocardia

PLoS Neglected Tropical Diseases ◽

10.1371/journal.pntd.0009665 ◽

2021 ◽

Vol 15 (8) ◽

pp. e0009665

Author(s):

Shuai Xu ◽

Zhenpeng Li ◽

Yuanming Huang ◽

Lichao Han ◽

Yanlin Che ◽

...

Keyword(s):

Genetic Diversity ◽

Phylogenetic Trees ◽

Gene Families ◽

Single Copy ◽

Taxonomic Classification ◽

Genomic Diversity ◽

Evolutionary Relationships ◽

Taxonomic Structure ◽

Pan Genome ◽

Dipeptidyl Aminopeptidase

Nocardia is a complex and diverse genus of aerobic actinomycetes that cause complex clinical presentations, which are difficult to diagnose because they are poorly understood. To date, the genetic diversity, evolution, and taxonomic structure of the genus Nocardia are still unclear. In this study, we investigated the pan-genome of 86 Nocardia type strains to clarify their genetic diversity. Our study revealed an open pan-genome for Nocardia containing 265,836 gene families, with about 99.7% of the pan-genome being variable. Horizontal gene transfer appears to have been an important evolutionary driver of genetic diversity shaping the Nocardia genome and may have caused historical taxonomic confusion with other taxa (primarily Rhodococcus, Skermania, Aldersonia, and Mycobacterium). Based on single-copy gene families, we established a high-accuracy phylogenomic approach for Nocardia using 229 genome sequences. Furthermore, we found 28 potentially new species and reclassified 16 strains. Finally, by comparing the topology between a phylogenomic tree and 384 phylogenetic trees (from 384 single-copy genes from the core genome), we identified a novel locus for inferring the phylogeny of this genus. The dapb1 gene, which encodes dipeptidyl aminopeptidase BI, was far superior to commonly used markers for Nocardia and yielded a topology almost identical to that of genome-based phylogeny. In conclusion, the present study provides insights into the genetic diversity, contributes a robust framework for the taxonomic classification, and elucidates the evolutionary relationships of Nocardia. This framework should facilitate the development of rapid tests for the species identification of highly variable species and has given new insight into the behavior of this genus.

Implication of the Identification of an Earlier Pseudorabies Virus (PRV) Strain HLJ-2013 to the Evolution of Chinese PRVs

Frontiers in Microbiology ◽

10.3389/fmicb.2020.612474 ◽

2020 ◽

Vol 11 ◽

Author(s):

Huimin Liu ◽

Zhibin Shi ◽

Chunguo Liu ◽

Pengfei Wang ◽

Ming Wang ◽

...

Keyword(s):

Phylogenetic Trees ◽

Pseudorabies Virus ◽

High Throughput Sequencing ◽

Genomic Sequence ◽

Full Genome Sequence ◽

Genome Sequences ◽

Protein Coding ◽

One Step ◽

Human Infections ◽

Full Length Genome

Pseudorabies viruses (PRVs) pose a great threat to the pig industry of many countries around the world. Human infections with PRV have also been reported occasionally in China. Therefore, understanding the epidemiology and evolution of PRVs is of great importance for disease control in pig populations and in humans as well. In this study, we isolated a PRV designated HLJ-2013 from PRV-positive samples that had been collected in Heilongjiang, China, in 2013. The full genome sequence of the virus was determined to be ∼143 kbp in length using high-throughput sequencing. The genomic sequence identities between this isolate and 21 other previous PRV isolates ranged from 92.4% (with Bartha) to 97.3% (with SC). Phylogenetic analysis based on the full-length genome sequences revealed that PRV HLJ-2013 clustered together with all the Chinese strains in one group belonging to Genotype II, but this virus occurred phylogenetically earlier than all the other Chinese PRV strains. Phylogenetic trees based on both protein-coding genes and non-coding regions revealed that HLJ-2013 probably obtained its genome sequences from three origins: a yet unknown parent virus, the European viruses, and the common ancestor of all Chinese PRVs. Recombination analysis showed that an HLJ-2013-like virus possibly donated the main framework of the genome of the Chinese PRVs. HLJ-2013 exhibited cytopathic and growth characteristics similar to those of the Chinese PRV strains SC and HeN1, but its pathogenicity in mice was higher than that of SC and lower than that of HeN1. The identification of HLJ-2013 takes us one step closer to understanding the origin of PRVs in China and provides new knowledge about the evolution of PRVs worldwide.

Utilizing the VirIdAl Pipeline to Search for Viruses in the Metagenomic Data of Bat Samples

Viruses ◽

10.3390/v13102006 ◽

2021 ◽

Vol 13 (10) ◽

pp. 2006

Author(s):

Anna Y Budkina ◽

Elena V Korneenko ◽

Ivan A Kotov ◽

Daniil A Kiselev ◽

Ilya V Artyushin ◽

...

Keyword(s):

Large Scale ◽

High Throughput Sequencing ◽

Metagenomic Data ◽

Sequencing Data ◽

Viral Pathogens ◽

Genomic Databases ◽

Bioinformatic Pipeline ◽

Viral Genomes ◽

Sequencing Technologies ◽

Viral Screening

According to various estimates, only a small percentage of existing viruses have been discovered, and even fewer are represented in genomic databases. High-throughput sequencing technologies are developing rapidly, empowering large-scale screening of various biological samples for the presence of pathogen-associated nucleotide sequences, but many organisms are yet to be attributed specific loci for identification. This problem particularly impedes viral screening, due to vast heterogeneity in viral genomes. In this paper, we present a new bioinformatic pipeline, VirIdAl, for detecting and identifying viral pathogens in sequencing data. We also demonstrate the utility of the new software by applying it to viral screening of the feces of bats collected in the Moscow region, which revealed a significant variety of viruses associated with bats, insects, plants, and protozoa. The presence of alpha and beta coronavirus reads, including the MERS-like bat virus, deserves a special mention, as it once again indicates that bats are indeed reservoirs for many viral pathogens. In addition, it was shown that alignment-based methods were unable to identify the taxon for a large proportion of reads, and we additionally applied other approaches, showing that they can further reveal the presence of viral agents in sequencing data. However, the incompleteness of viral databases remains a significant problem in studies of viral diversity, and therefore necessitates the use of combined approaches, including those based on machine learning methods.
