Multiple Occurrences of a 168-Nucleotide Deletion in SARS-CoV-2 ORF8, Unnoticed by Standard Amplicon Sequencing and Variant Calling Pipelines

David Brandt; Marina Simunovic; Tobias Busche; Markus Haak; Peter Belmann; Sebastian Jünemann; Tizian Schulz; Levin Joe Klages; Svenja Vinke; Michael Beckstette; Ehmke Pohl; Christiane Scherer; Alexander Sczyrba; Jörn Kalinowski

doi:10.3390/v13091870


Viruses ◽

10.3390/v13091870 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1870

Author(s):

David Brandt ◽

Marina Simunovic ◽

Tobias Busche ◽

Markus Haak ◽

Peter Belmann ◽

...

Keyword(s):

Variant Calling ◽

Amplicon Sequencing ◽

Virus Genome ◽

Consensus Building ◽

Coding Region ◽

Local Hospital ◽

Standard Data ◽

Large Deletions ◽

Hospital Outbreak ◽

Almost All

Genomic surveillance of the SARS-CoV-2 pandemic is crucial and mainly achieved by amplicon sequencing protocols. Overlapping tiled amplicons are generated to establish contiguous SARS-CoV-2 genome sequences, which enable the precise resolution of infection chains and outbreaks. We investigated a SARS-CoV-2 outbreak in a local hospital and used nanopore sequencing with a modified ARTIC protocol employing 1200 bp long amplicons. We detected a long deletion of 168 nucleotides in the ORF8 gene in 76 samples from the hospital outbreak. This deletion is difficult to identify with the classical amplicon sequencing procedures since it removes two amplicon primer-binding sites. We analyzed public SARS-CoV-2 sequences and sequencing read data from ENA and identified the same deletion in over 100 genomes belonging to different lineages of SARS-CoV-2, pointing to a mutation hotspot or to positive selection. In almost all cases, the deletion was not represented in the virus genome sequence after consensus building. Additionally, further database searches point to other deletions in the ORF8 coding region that have never been reported by the standard data analysis pipelines. These findings, together with the fact that ORF8 is especially prone to deletions, make a clear case for the urgent necessity of public availability of the raw data for this and other large deletions that might change the physiology of the virus towards endemism.
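
The abstract above describes a large deletion that is visible in the raw reads but lost after consensus building. As a rough illustration only, and not the authors' pipeline, the following sketch scans the CIGAR string of a single read alignment for long reference gaps. The function name, the 50-base threshold, and the example coordinates are all hypothetical; the set of reference-consuming operations (M, D, N, =, X) follows the SAM format.

```python
import re

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def large_deletions(ref_start, cigar, min_len=50):
    """Return (reference_position, length) for each gap of at least min_len bases.

    ref_start is the reference coordinate of the first aligned base, in
    whatever convention the caller uses (0-based or 1-based).
    The min_len default of 50 is an arbitrary, hypothetical threshold.
    """
    pos = ref_start
    found = []
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        # D is a deletion; N is a skipped region, which some long-read
        # aligners may report for very large gaps.
        if op in "DN" and length >= min_len:
            found.append((pos, length))
        # Only these operations advance the position on the reference.
        if op in "MDN=X":
            pos += length
    return found

# Hypothetical read: 10 soft-clipped bases, 100 matches, a 168-base gap, 200 matches.
example = large_deletions(1000, "10S100M168D200M")
```

Counting how many reads at a locus report the same gap, before any consensus step, is one simple way such a deletion could be flagged for manual review.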


First Detection of Tobacco Mosaic Virus in Tobacco Fields in Northern Lebanon

Infectious Disorders - Drug Targets ◽

10.2174/1871526520666200928164057 ◽

2020 ◽

Vol 20 ◽

Author(s):

Rami Obeid ◽

Elias Wehbe ◽

Mohamad Rima ◽

Mohammad Kabara ◽

Romeo Al Bersaoui ◽

...

Keyword(s):

Tobacco Mosaic Virus ◽

Mosaic Virus ◽

Viral Genome ◽

Virus Genome ◽

Enzyme Linked Immunosorbent Assay ◽

Virus Family ◽

Crop Fields ◽

Wide Range ◽

Almost All ◽

Das Elisa

Background: Tobacco mosaic virus (TMV) is the best-known virus in the plant mosaic virus family and is able to infect a wide range of crops, in particular tobacco, causing production losses. Objectives: Herein, and for the first time in Lebanon, we investigated the presence of TMV infection in crops by analyzing 88 samples of tobacco, tomato, cucumber and pepper collected from different regions in North Lebanon. Methods: Double-antibody sandwich enzyme-linked immunosorbent assay (DAS-ELISA) revealed a potential TMV infection in four tobacco samples out of the 88 crop samples collected. However, no tomato, cucumber or pepper samples were infected. The TMV+ tobacco samples were then extensively analyzed by RT-PCR to detect viral RNA using different primers covering the whole viral genome. Results and Discussion: PCR results confirmed those of DAS-ELISA, showing TMV infection of four tobacco samples collected from three crop fields of North Lebanon. In only one of the four TMV+ samples were we able to amplify almost all the regions of the viral genome, suggesting possible mutations in the virus genome or an infection with a new, not yet identified, TMV strain. Conclusion: Our study is the first in Lebanon to reveal TMV infection in crop fields, and it highlights a danger that may affect the future of agriculture.


Functional alterations caused by mutations reflect evolutionary trends of SARS-CoV-2

Briefings in Bioinformatics ◽

10.1093/bib/bbab042 ◽

2021 ◽

Author(s):

Liang Cheng ◽

Xudong Han ◽

Zijun Zhu ◽

Changlu Qi ◽

Ping Wang ◽

...

Keyword(s):

Reference Genome ◽

Sequence Data ◽

Purifying Selection ◽

Virus Genome ◽

Receptor Binding Domain ◽

Evolutionary Trends ◽

Synonymous Mutations ◽

Almost All ◽

Virus Strains ◽

New Mutations

Abstract Since the first report of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in December 2019, the COVID-19 pandemic has spread rapidly worldwide. Because few virus strains were available, early studies observed few of the key mutations that matter for the evolutionary trends of the virus genome. Here, we downloaded sequence data for 1809 SARS-CoV-2 strains deposited in GISAID before April 2020 to identify mutations and the functional alterations caused by these mutations. In total, we identified 1017 nonsynonymous and 512 synonymous mutations by alignment to the reference genome NC_045512, none of which were observed in the receptor-binding domain (RBD) of the spike protein. On average, each strain could acquire about 1.75 new mutations each month. The current mutations may have little impact on antibodies. Although the whole genome shows purifying selection, ORF3a, ORF8 and ORF10 were under positive selection. Only the 36 mutations that occurred in 1% or more of the virus strains were further analyzed to reveal linkage disequilibrium (LD) variants and dominant mutations. As a result, we observed five dominant mutations: three nonsynonymous mutations (C28144T, C14408T and A23403G) and two synonymous mutations (T8782C and C3037T). These five mutations occurred in almost all strains in April 2020. Besides, we also observed two potential dominant nonsynonymous mutations, C1059T and G25563T, which occurred in most of the strains in April 2020. Further functional analysis shows that these mutations largely decreased protein stability, which could lead to a significant reduction of virus virulence. In addition, the A23403G mutation increases the spike-ACE2 interaction and finally leads to the enhancement of its infectivity. All of these proved that the evolution of SARS-CoV-2 is toward the enhancement of infectivity and reduction of virulence.


The Diversity of Root-Associated Endophytic Fungi from Four Epiphytic Orchids in China

Diversity ◽

10.3390/d13050197 ◽

2021 ◽

Vol 13 (5) ◽

pp. 197

Author(s):

Tao Wang ◽

Miao Chi ◽

Ling Guo ◽

Donghuan Liu ◽

Yu Yang ◽

...

Keyword(s):

Endophytic Fungi ◽

Southern China ◽

Culture Method ◽

Amplicon Sequencing ◽

Community Diversity ◽

Mycorrhizal Symbiosis ◽

Epiphytic Orchids ◽

Culture Independent ◽

Almost All

Root-associated endophytic fungi (RAF) are found asymptomatically in almost all plant groups. However, little is known about the compositions and potential functions of RAF communities associated with most Orchidaceae species. In this study, the diversity of RAF was examined in four wild epiphytic orchids, Acampe rigida, Doritis pulcherrima, Renanthera coccinea, and Robiquetia succisa, that occur in southern China. A culture-independent method involving Illumina amplicon sequencing, and an in vitro culture method, were used to identify culturable fungi. The RAF community diversity differed among the orchid roots, and some fungal taxa were clearly concentrated in a certain orchid species, with more operational taxonomic units (OTUs) being detected. Analysis of mycorrhizal associations showed that 28 (about 0.8%) of the 3527 OTUs could be assigned as orchid mycorrhizal fungi (OMF), while non-mycorrhizal fungi accounted for about 99.2% of the OTUs. Among the OMF, Ceratobasidiaceae OTUs were the most abundant with different richness, followed by Thelephoraceae. In addition, five Ceratobasidium sp. strains were isolated from D. pulcherrima, R. succisa, and R. coccinea roots with high separation rates. These culturable Ceratobasidium strains will provide materials for host orchid conservation and for studying the mechanisms underlying mycorrhizal symbiosis.
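
The OTU percentages quoted in the abstract above can be reproduced from the two counts it reports (28 of 3527). The short check below is only a sanity calculation; the helper name is hypothetical.

```python
def percent(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total

total_otus = 3527   # all OTUs reported in the abstract
omf_otus = 28       # OTUs assigned as mycorrhizal fungi (OMF)

omf_pct = percent(omf_otus, total_otus)
non_omf_pct = percent(total_otus - omf_otus, total_otus)

print(f"OMF: {omf_pct:.1f}%  non-OMF: {non_omf_pct:.1f}%")
```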


Biochemical and genetic analysis of the role of the viral polymerase in enterovirus recombination

Nucleic Acids Research ◽

10.1093/nar/gkw567 ◽

2016 ◽

Vol 44 (14) ◽

pp. 6883-6895 ◽

Cited By ~ 20

Author(s):

Andrew Woodman ◽

Jamie J. Arnold ◽

Craig E. Cameron ◽

David J. Evans

Keyword(s):

Genetic Recombination ◽

Evolutionary Process ◽

Virus Genome ◽

Biochemical Assay ◽

Strand Transfer ◽

Turnover Rates ◽

Coding Region ◽

Cis Acting ◽

Polymerase Fidelity

Abstract Genetic recombination in single-strand, positive-sense RNA viruses is a poorly understood mechanism responsible for generating extensive genetic change and novel phenotypes. By moving a critical cis-acting replication element (CRE) from the polyprotein coding region to the 3′ non-coding region we have further developed a cell-based assay (the 3′CRE-REP assay) to yield recombinants throughout the non-structural coding region of poliovirus from dually transfected cells. We have additionally developed a defined biochemical assay in which the only protein present is the poliovirus RNA-dependent RNA polymerase (RdRp), which recapitulates the strand transfer events of the recombination process. We have used both assays to investigate the role of polymerase fidelity and nucleotide turnover rates in recombination. Our results, of both poliovirus intertypic and intratypic recombination in the CRE-REP assay and using a range of polymerase variants in the biochemical assay, demonstrate that RdRp fidelity is a fundamental determinant of recombination frequency. High-fidelity polymerases exhibit reduced recombination and low-fidelity polymerases exhibit increased recombination in both assays. These studies provide the basis for the analysis of poliovirus recombination throughout the non-structural region of the virus genome and provide a defined biochemical assay to further dissect this important evolutionary process.


Forfeiting the founder effect: turnover defines biofilm community succession

10.1101/282574 ◽

2018 ◽

Cited By ~ 2

Author(s):

Colin J. Brislawn ◽

Emily B. Graham ◽

Karl Dana ◽

Peter Ihardt ◽

Sarah J. Fansler ◽

...

Keyword(s):

Bacterial Colonization ◽

Amplicon Sequencing ◽

18S Rrna Gene ◽

Rrna Gene ◽

Community Succession ◽

Ecological Processes ◽

Functional Capabilities ◽

Resolution Imaging ◽

Successional Stages ◽

Almost All

Abstract Microbial community succession is a fundamental process that affects the underlying functions of almost all ecosystems; yet the roles and fates of the most abundant colonizers are poorly understood. Does early abundance spur long-term persistence? How do deterministic and stochastic processes influence the roles of founder species? We performed a succession experiment within a hypersaline microbial mat ecosystem to investigate how ecological processes contributed to the turnover of founder species. Bacterial and micro-eukaryotic founder species were identified from primary succession and tracked through a defined maturation period using 16S and 18S rRNA gene amplicon sequencing in combination with high-resolution imaging that utilized stable isotope tracers to evaluate basic functional capabilities. The majority of the founder species did not maintain high relative abundances in later stages of succession. Turnover (versus nestedness) was the dominant process shaping the final community structure. We also asked if different ecological processes acted on bacteria versus eukaryotes during successional stages and found that deterministic and stochastic forces corresponded more with eukaryote and bacterial colonization, respectively. Our results show that taxa from different kingdoms that share habitat in the tight spatial confines of a biofilm were influenced by different ecological forces and time scales of succession.


Relative and quantitative rhizosphere microbiome profiling result in distinct abundance patterns

10.1101/2021.02.19.431941 ◽

2021 ◽

Author(s):

Hamed Azarbad ◽

Julien Tremblay ◽

Luke D. Bainard ◽

Etienne Yergeau

Keyword(s):

Water Stress ◽

Relative Abundance ◽

Its Region ◽

Cost Effective ◽

Amplicon Sequencing ◽

Rrna Gene ◽

Real Time Quantitative Pcr ◽

Rhizosphere Microbiome ◽

History Of ◽

Almost All

Abstract Next-generation sequencing is recognized as one of the most popular and cost-effective ways of characterizing the microbiome in multiple samples. However, most of the currently available amplicon sequencing approaches are inherently limited, as they are often presented based on the relative abundance of microbial taxa, which may not fully represent actual microbiome profiles. Here, we combined amplicon sequencing (16S rRNA gene for bacteria and ITS region for fungi) with real-time quantitative PCR (qPCR) to characterize the rhizosphere microbiome of wheat. We show that an increase in the relative abundance of major microbial phyla does not necessarily result in an increase in absolute abundance. One striking observation when comparing relative and quantitative abundances was a substantial increase in the abundance of almost all phyla associated with the rhizosphere of plants grown in soil with no history of water stress as compared with the rhizosphere of plants growing in soil with a history of water stress, which was in contradiction with the trends observed in the relative abundance data. Our results suggest that the estimated absolute abundance approach gives a different perspective than the relative abundance approach, providing complementary information that helps to better understand the rhizosphere microbiome.
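
To make the contrast between relative and quantitative profiles in the abstract above concrete, the sketch below scales relative abundances by a qPCR-style total copy number. This is a generic scaling and is not claimed to be the authors' exact estimation procedure; the taxon names, fractions, and copy numbers are invented for illustration.

```python
def absolute_abundance(relative, total_copies):
    """Scale relative abundances (fractions summing to about 1) by a total copy count."""
    return {taxon: fraction * total_copies for taxon, fraction in relative.items()}

# Hypothetical relative abundances from amplicon sequencing.
rel_no_stress = {"Proteobacteria": 0.30, "Actinobacteria": 0.70}
rel_stress = {"Proteobacteria": 0.40, "Actinobacteria": 0.60}

# Hypothetical total gene copies per gram of soil, as qPCR might report.
abs_no_stress = absolute_abundance(rel_no_stress, total_copies=2.0e9)
abs_stress = absolute_abundance(rel_stress, total_copies=1.0e9)

for taxon in rel_no_stress:
    print(taxon, rel_no_stress[taxon], rel_stress[taxon],
          abs_no_stress[taxon], abs_stress[taxon])
```

With these made-up numbers, Proteobacteria has the higher relative abundance in the stressed soil but the lower absolute abundance, because the total community size differs between the two samples.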


A unified haplotype-based method for accurate and comprehensive variant calling

10.1101/456103 ◽

2018 ◽

Cited By ~ 3

Author(s):

Daniel P Cooke ◽

David C Wedge ◽

Gerton Lunter

Keyword(s):

De Novo ◽

Variant Calling ◽

Normal Sample ◽

Sequencing Data ◽

Somatic Variation ◽

Data Set ◽

Small Complex ◽

Physical Linkage ◽

Germline Variation ◽

Almost All

Haplotype-based variant callers, which consider physical linkage between variant sites, are currently among the best tools for germline variation discovery and genotyping from short-read sequencing data. However, almost all such tools were designed specifically for detecting common germline variation in diploid populations, and give sub-optimal results in other scenarios. Here we present Octopus, a versatile haplotype-based variant caller that uses a polymorphic Bayesian genotyping model capable of modeling sequencing data from a range of experimental designs within a unified haplotype-aware framework. We show that Octopus accurately calls de novo mutations in parent-offspring trios and germline variants in individuals, including SNVs, indels, and small complex replacements such as microinversions. In addition, using a carefully designed synthetic-tumour data set derived from clean sequencing data from a sample with known germline haplotypes, and observed mutations in a large cohort of tumour samples, we show that Octopus accurately characterizes germline and somatic variation in tumours, both with and without a paired normal sample. Sequencing reads and prior information are combined to phase called genotypes of arbitrary ploidy, including those with somatic mutations. Octopus also outputs realigned evidence BAMs to aid validation and interpretation.


Workstation benchmark of Spark Capable Genome Analysis ToolKit 4 Variant Calling

10.1101/2020.05.17.101105 ◽

2020 ◽

Author(s):

Marcus H. Hansen ◽

Anita T. Simonsen ◽

Hans B. Ommen ◽

Charlotte G. Nyvold

Keyword(s):

Dna Sequencing ◽

Genome Analysis ◽

High Speed ◽

High Performance ◽

Variant Calling ◽

Amplicon Sequencing ◽

Targeted Sequencing ◽

Sequencing Analysis ◽

Genome Analysis Toolkit ◽

Order Of Magnitude

Abstract Background: Rapid and practical DNA-sequencing processing has become essential for modern biomedical laboratories, especially in the fields of cancer, pathology and genetics. While sequencing turn-over time has been, and still is, a bottleneck in research and diagnostics, the field of bioinformatics is moving at a rapid pace, both in terms of hardware and software development. Here, we benchmarked the local performance of three of the most important Spark-enabled Genome Analysis Toolkit 4 (GATK4) tools in a targeted sequencing workflow: duplicate marking, base quality score recalibration (BQSR) and variant calling on targeted DNA sequencing using a modest hyperthreading 12-core single CPU and a high-speed PCI express solid-state drive. Results: Compared to the previous GATK version, the performance of Spark-enabled BQSR and HaplotypeCaller is shifted towards a more efficient usage of the available cores on the CPU and outperforms the earlier GATK3.8 version with an order of magnitude reduction in processing time to analysis-ready variants, whereas MarkDuplicatesSpark was found to be thrice as fast. Furthermore, HaplotypeCallerSpark and BQSRPipelineSpark were significantly faster than the equivalent GATK4 standard tools, with a combined ∼86% reduction in execution time, reaching a median rate of ten million processed bases per second, and duplicate marking time was reduced by ∼42%. The called variants were found to be in close agreement between the Spark and non-Spark versions, with an overall concordance of 98%. In this setup, the tools were also highly efficient when compared with execution on a small 72 virtual CPU/18-node Google Cloud cluster. Conclusion: In conclusion, GATK4 offers practical parallelization possibilities for DNA sequence processing, and the Spark-enabled tools optimize performance and utilization of local CPUs. Spark-utilizing GATK variant calling is several times faster than previous GATK3.8 multithreading with the same multi-core, single CPU configuration. The improved opportunities for parallel computations not only hold implications for high-performance clusters, but also for modest laboratory or research workstations for targeted sequencing analysis, such as exome, panel or amplicon sequencing.
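
The abstract above mixes two ways of stating a gain: percentage reductions in execution time (∼86%, ∼42%) and fold speedups ("thrice as fast", "an order of magnitude"). The small helper below, with a hypothetical name, converts a fractional time reduction into the equivalent speedup factor so that the figures can be compared on one scale.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional reduction in run time into a fold speedup.

    A reduction of r means the new time is (1 - r) of the old time,
    so the speedup is old / new = 1 / (1 - r).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)

combined = speedup_from_reduction(0.86)    # the ~86% combined reduction
duplicates = speedup_from_reduction(0.42)  # the ~42% duplicate-marking reduction

print(f"86% reduction is about {combined:.1f}x faster")
print(f"42% reduction is about {duplicates:.1f}x faster")
```

Read this way, an 86% reduction corresponds to roughly a sevenfold speedup, and a full order of magnitude would require a 90% reduction.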


The Gender Gap in Secondary School Mathematics at High Achievement Levels: Evidence from the American Mathematics Competitions

The Journal of Economic Perspectives ◽

10.1257/jep.24.2.109 ◽

2010 ◽

Vol 24 (2) ◽

pp. 109-128 ◽

Cited By ~ 75

Author(s):

Glenn Ellison ◽

Ashley Swanson

Keyword(s):

High School Students ◽

Gender Gap ◽

Unobserved Heterogeneity ◽

High Achievement ◽

School Students ◽

Standard Data ◽

Small Set ◽

Data Source ◽

Almost All ◽

Achievement Levels

This paper uses a new data source, American Mathematics Competitions, to examine the gender gap among high school students at very high achievement levels. The data bring out several new facts. There is a large gender gap that widens dramatically at percentiles above those that can be examined using standard data sources. An analysis of unobserved heterogeneity indicates that there is only moderate variation in the gender gap across schools. The highest achieving girls in the U.S. are concentrated in a very small set of elite schools, suggesting that almost all girls with the ability to reach high math achievement levels are not doing so.


Identification and functional characterization of new missense SNPs in the coding region of the TP53 gene

Cell Death and Differentiation ◽

10.1038/s41418-020-00672-0 ◽

2020 ◽

Author(s):

Flora Doffe ◽

Vincent Carbonnier ◽

Manon Tissier ◽

Bernard Leroy ◽

Isabelle Martins ◽

...

Keyword(s):

Ethnic Diversity ◽

Functional Characterization ◽

Variant Calling ◽

Asian Population ◽

Tp53 Gene ◽

Loss Of Function ◽

Coding Region ◽

Multiple Datasets ◽

Novel Variants ◽

Rare Genetic Variants

Abstract Infrequent and rare genetic variants in the human population vastly outnumber common ones. Although they may contribute significantly to the genetic basis of a disease, these seldom-encountered variants may also be misidentified as pathogenic if no correct references are available. Somatic and germline TP53 variants are associated with multiple neoplastic diseases, and thus have come to serve as a paradigm for genetic analyses in this setting. We searched 14 independent, globally distributed datasets and recovered TP53 SNPs from 202,767 cancer-free individuals. In our analyses, 19 new missense TP53 SNPs, including five novel variants specific to the Asian population, were recurrently identified in multiple datasets. Using a combination of in silico, functional, structural, and genetic approaches, we showed that none of these variants displayed loss of function compared to the normal TP53 gene. In addition, classification using ACMG criteria suggested that they are all benign. Considered together, our data reveal that the TP53 coding region shows far more polymorphism than previously thought and presents high ethnic diversity. They furthermore underline the importance of correctly assessing novel variants in all variant-calling pipelines associated with genetic diagnoses for cancer.
