Genomic characterization and phylogenetic analysis of the first SARS-CoV-2 variants introduced in Lebanon

PeerJ ◽

10.7717/peerj.11015 ◽

2021 ◽

Vol 9 ◽

pp. e11015

Author(s):

Rita Feghali ◽

Georgi Merhi ◽

Aurelia Kwasiborski ◽

Veronique Hourdel ◽

Nada Ghosn ◽

...

Keyword(s):

Phylogenetic Analysis ◽

Consensus Sequence ◽

Unknown Origin ◽

International Travel ◽

Molecular Diagnostic ◽

Diagnostic Tools ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Genetic Lineages ◽

Consensus Sequences

Background In December 2019, the COVID-19 pandemic erupted from a cluster of pneumonia cases of unknown origin in the city of Wuhan, China. It has now reached almost 94 million cases worldwide. Lebanon, on the brink of economic collapse and with its healthcare system thrown into turmoil, managed to cope with the initial SARS-CoV-2 wave. In this study, we sequenced 11 viral genomes from positive cases isolated between 2 February 2020 and 15 March 2020. Methods Sequencing data were quality controlled, consensus sequences were generated, and a maximum-likelihood tree was built with IQTREE v2. Genetic lineages were assigned with Pangolin v1.1.14; single nucleotide variants (SNVs) were called from read files and manually curated from the consensus sequence alignment in JalView v2.11; and the interference of genomic mutations with molecular diagnostic tools was assessed with the CoV-GLUE pipeline. Phylogenetic analysis of whole genome sequences confirmed a multiple introduction scenario due to international travel. Results Three major lineages were identified as circulating in Lebanon during the study period. The B.1 lineage (20A clade) was the most prominent, followed by the B.4 lineage (19A clade) and the B.1.1 lineage (20B clade). SNV analysis showed 15 novel mutations, of which only one was observed in the spike region.
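
The abstract mentions curating SNVs from a consensus sequence alignment. As a rough illustration only (this is not the authors' pipeline, and the function name `call_snvs` and the toy sequences are assumptions made for this sketch), substitutions between a reference and an already aligned consensus could be listed like this:

```python
def call_snvs(reference: str, consensus: str) -> list[str]:
    """List substitutions between two pre-aligned, equal-length sequences.

    Positions are 1-based alignment columns. They match reference
    coordinates only if the aligned reference contains no gap characters.
    """
    if len(reference) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    snvs = []
    pairs = zip(reference.upper(), consensus.upper())
    for pos, (ref, alt) in enumerate(pairs, start=1):
        # Gaps, Ns and other ambiguity codes are skipped: they are not
        # substitutions and usually reflect missing coverage.
        if ref in valid and alt in valid and ref != alt:
            snvs.append(f"{ref}{pos}{alt}")
    return snvs


# Toy sequences, not real SARS-CoV-2 data.
print(call_snvs("ACGTACGT", "ACTTANGC"))  # ['G3T', 'T8C']
```

The output uses the same reference-base, position, alternate-base notation that is common for SARS-CoV-2 mutations, which makes a manual comparison against read-level calls easier.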

SARS-CoV-2 genomic analyses in cancer patients reveal elevated intrahost genetic diversity

Virus Evolution ◽

10.1093/ve/veab013 ◽

2021 ◽

Author(s):

Juliana D Siqueira ◽

Livia R Goes ◽

Brunna M Alves ◽

Pedro S de Carvalho ◽

Claudia Cicala ◽

...

Keyword(s):

Genetic Diversity ◽

Phylogenetic Analysis ◽

Cancer Patients ◽

Disease Severity ◽

Healthcare Workers ◽

Viral Population ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Consensus Sequences ◽

Vulnerable Patients

Abstract Numerous factors have been found to influence susceptibility to SARS-CoV-2 infection and disease severity. Cancer patients are more prone to clinically evolve to more severe COVID-19 conditions, but the determinants of such a more severe outcome remain largely unknown. We have determined the full-length SARS-CoV-2 genomic sequences of cancer patients and healthcare workers (non-cancer controls) by deep sequencing and investigated the within-host viral population of each infection, quantifying intrahost genetic diversity. Naso- and oropharyngeal SARS-CoV-2+ swabs from 57 cancer patients and 14 healthcare workers from the Brazilian National Cancer Institute were collected in April–May 2020. Complete genome amplification using ARTIC network V3 multiplex primers was performed, followed by next-generation sequencing. Assemblies were conducted in Geneious R11, where consensus sequences were extracted and intrahost single nucleotide variants were identified. Maximum likelihood phylogenetic analysis was performed using PhyML v.3.0 and lineages were classified using Pangolin and CoV-GLUE. Phylogenetic analysis showed that all but one strain belonged to clade B1.1. Four genetically linked mutations known as the globally dominant SARS-CoV-2 haplotype (C241T, C3037T, C14408T and A23403G) were found in the majority of consensus sequences. SNV signatures of previously characterized Brazilian genomes were also observed in most samples. Another 85 SNVs were found at a lower frequency (1.4-19.7%) among the consensus sequences. Cancer patients displayed a significantly higher intrahost viral genetic diversity compared to healthcare workers. This difference was independent of SARS-CoV-2 Ct values obtained at the diagnostic tests, which did not differ between the two groups. The most common nucleotide changes of intrahost SNVs in both groups were consistent with APOBEC and ADAR activities.
Intrahost genetic diversity in cancer patients was not associated with disease severity, use of corticosteroids, or use of antivirals, characteristics that could influence viral diversity. Moreover, the presence of metastasis, either in general or specifically in the lung, was not associated with intrahost diversity among cancer patients. Cancer patients carried significantly higher numbers of minor variants compared to non-cancer counterparts. Further studies on SARS-CoV-2 diversity in especially vulnerable patients will shed light on the basis of the different COVID-19 outcomes in humans.
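
The reported frequency range of the lower-frequency SNVs (1.4-19.7%) can be related to sample counts with a small calculation. The sketch below assumes, which the abstract does not state, that one consensus sequence was obtained for each of the 57 + 14 swabs; the helper name `percent_of_sequences` is hypothetical.

```python
cancer_patients = 57
healthcare_workers = 14
# Assumption for this sketch: one consensus sequence per swab.
n_sequences = cancer_patients + healthcare_workers


def percent_of_sequences(carriers: int, total: int) -> float:
    """Share of consensus sequences carrying an SNV, in percent (1 decimal)."""
    return round(100 * carriers / total, 1)


# Under the assumption above, the two ends of the reported range
# correspond to an SNV seen in 1 and in 14 of the sequences.
print(n_sequences)                            # 71
print(percent_of_sequences(1, n_sequences))   # 1.4
print(percent_of_sequences(14, n_sequences))  # 19.7
```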

A fully phased accurate assembly of an individual human genome

10.1101/855049 ◽

2019 ◽

Cited By ~ 19

Author(s):

David Porubsky ◽

Peter Ebert ◽

Peter A. Audano ◽

Mitchell R. Vollger ◽

William T. Harvey ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Consensus Sequence ◽

Error Rates ◽

Single Individual ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Consensus Sequences ◽

Oxford Nanopore ◽

Genome Assemblies

The prevailing genome assembly paradigm is to produce consensus sequences that “collapse” parental haplotypes into a consensus sequence. Here, we leverage the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing (Strand-seq) [1,2] and combine them with high-fidelity (HiFi) long sequencing reads [3], in a novel reference-free workflow for diploid de novo genome assembly. Employing this strategy, we produce completely phased de novo genome assemblies separately for each haplotype of a single individual of Puerto Rican origin (HG00733) in the absence of parental data. The assemblies are accurate (QV > 40), highly contiguous (contig N50 > 25 Mbp) with low switch error rates (0.4%) providing fully phased single-nucleotide variants (SNVs), indels, and structural variants (SVs). A comparison of Oxford Nanopore and PacBio phased assemblies identifies 150 regions that are preferential sites of contig breaks irrespective of sequencing technology or phasing algorithms.
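
The two quality figures quoted above, QV and contig N50, follow standard definitions. The sketch below shows those general definitions rather than the authors' evaluation code: a Phred-scaled QV maps to a per-base error rate of 10^(-QV/10), and N50 is the length of the contig at which the length-sorted cumulative sum first reaches half of the assembly size. The function names and the toy contig lengths are illustrative.

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L cover half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


# QV 40 corresponds to roughly one error per 10,000 bases.
print(qv_to_error_rate(40))
# Toy contig lengths, not taken from the HG00733 assemblies.
print(n50([2, 3, 4, 5, 6]))  # 5
```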

Rapid and economical drug resistance profiling with Nanopore MinION for clinical specimens with low bacillary burden of Mycobacterium tuberculosis

BMC Research Notes ◽

10.1186/s13104-020-05287-9 ◽

2020 ◽

Vol 13 (1) ◽

Author(s):

Wai Sing Chan ◽

Chun Hang Au ◽

Yvonne Chung ◽

Henry Chi Ming Leung ◽

Dona N. Ho ◽

...

Keyword(s):

Drug Resistance ◽

Illumina MiSeq ◽

Allelic Frequency ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Clinical Specimens ◽

Consensus Sequences ◽

Reagent Cost ◽

Sequencing Chemistry ◽

Assay Time

Abstract Objective We designed and tested a Nanopore sequencing panel for direct tuberculosis drug resistance profiling. The panel targeted 10 resistance-associated loci. We assessed the feasibility of amplifying and sequencing these loci from 23 clinical specimens with low bacillary burden. Results At least 8 loci were successfully amplified from the majority of specimens (14/23, 60.87%) for predicting first- and second-line drug resistance, and the 12 specimens yielding all 10 targets were sequenced with Nanopore MinION and Illumina MiSeq. MinION sequencing data were corrected with Nanopolish and recurrent variants were filtered. A total of 67,082 bases across all consensus sequences were analyzed, with 67,019 bases called by both MinION and MiSeq as wildtype. For the 41 single nucleotide variants (SNVs) called by MiSeq with 100% variant allelic frequency (VAF), 39 (95.1%) were called by MinION. For the 22 mixed bases called by MiSeq, an SNV with the highest VAF (70%) was called by MinION. With short assay time, reasonable reagent cost as well as continuously improving sequencing chemistry and signal correction pipelines, this Nanopore method can be a viable option for direct tuberculosis drug resistance profiling in the near future.
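
The counts reported in this abstract can be cross-checked with a few lines of arithmetic. The variable names below are chosen for this sketch only; the numbers are those given in the abstract.

```python
specimens_total = 23
specimens_with_8_loci = 14

total_bases = 67_082
wildtype_in_both = 67_019
miseq_snvs_full_vaf = 41   # SNVs called by MiSeq at 100% VAF
miseq_mixed_bases = 22     # mixed bases called by MiSeq
minion_matching_snvs = 39  # of the 41, also called by MinION

# The three categories account for every analyzed base.
accounted = wildtype_in_both + miseq_snvs_full_vaf + miseq_mixed_bases
print(accounted == total_bases)  # True

amplification_rate = round(100 * specimens_with_8_loci / specimens_total, 2)
snv_concordance = round(100 * minion_matching_snvs / miseq_snvs_full_vaf, 1)
print(amplification_rate)  # 60.87
print(snv_concordance)     # 95.1
```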

DeCiFering the Elusive Cancer Cell Fraction in Tumor Heterogeneity and Evolution

10.1101/2021.02.27.429196 ◽

2021 ◽

Author(s):

Gryte Satas ◽

Simone Zaccaria ◽

Mohammed El-Kebir ◽

Benjamin J. Raphael

Keyword(s):

Phylogenetic Analysis ◽

Cancer Cells ◽

Cancer Cell ◽

Copy Number ◽

Tumor Heterogeneity ◽

Cell Fraction ◽

Tumor Evolution ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Cancer Cell Fraction

Abstract Most tumors are heterogeneous mixtures of normal cells and cancer cells, with individual cancer cells distinguished by somatic mutations that accumulated during the evolution of the tumor. The fundamental quantity used to measure tumor heterogeneity from somatic single-nucleotide variants (SNVs) is the Cancer Cell Fraction (CCF), or proportion of cancer cells that contain the SNV. However, in tumors containing copy-number aberrations (CNAs) – e.g. most solid tumors – the estimation of CCFs from DNA sequencing data is challenging because a CNA may alter the mutation multiplicity, or number of copies of an SNV. Existing methods to estimate CCFs rely on the restrictive Constant Mutation Multiplicity (CMM) assumption that the mutation multiplicity is constant across all tumor cells containing the mutation. However, the CMM assumption is commonly violated in tumors containing CNAs, and thus CCFs computed under the CMM assumption may yield unrealistic conclusions about tumor heterogeneity and evolution. The CCF also has a second limitation for phylogenetic analysis: the CCF measures the presence of a mutation at the present time, but SNVs may be lost during the evolution of a tumor due to deletions of chromosomal segments. Thus, SNVs that co-occur on the same phylogenetic branch may have different CCFs. In this work, we address these limitations of the CCF in two ways. First, we show how to compute the CCF of an SNV under a less restrictive and more realistic assumption called the Single Split Copy Number (SSCN) assumption. Second, we introduce a novel statistic, the descendant cell fraction (DCF), that quantifies both the prevalence of an SNV and the past evolutionary history of SNVs under an evolutionary model that allows for mutation losses. That is, SNVs that co-occur on the same phylogenetic branch will have the same DCF. We implement these ideas in an algorithm named DeCiFer.
DeCiFer computes the DCFs of SNVs from read counts and copy-number proportions and also infers clusters of mutations that are suitable for phylogenetic analysis. We show that DeCiFer clusters SNVs more accurately than existing methods on simulated data containing mutation losses. We apply DeCiFer to sequencing data from 49 metastatic prostate cancer samples and show that DeCiFer produces more parsimonious and reasonable reconstructions of tumor evolution compared to previous approaches. Thus, DeCiFer enables more accurate quantification of intra-tumor heterogeneity and improves downstream inference of tumor evolution. Code availability: Software is available at https://github.com/raphael-group/decifer
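
To make the role of mutation multiplicity concrete, here is a minimal sketch of the commonly used CCF estimate under the constant-multiplicity assumption. It is not DeCiFer's algorithm and does not implement the SSCN assumption or the DCF. It assumes diploid normal cells and a single total tumor copy number at the locus; the function name `ccf_under_cmm` and the example values are hypothetical.

```python
def ccf_under_cmm(vaf: float, purity: float,
                  tumor_copies: float, multiplicity: int) -> float:
    """Estimate the cancer cell fraction from a variant allele frequency.

    Expected VAF = multiplicity * CCF * purity
                   / (purity * tumor_copies + 2 * (1 - purity)),
    solved here for CCF. Normal cells are assumed to be diploid.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    average_copies = purity * tumor_copies + 2 * (1 - purity)
    return vaf * average_copies / (multiplicity * purity)


# Same observed VAF, two different multiplicity assumptions.
print(ccf_under_cmm(vaf=0.4, purity=0.8, tumor_copies=2, multiplicity=1))
print(ccf_under_cmm(vaf=0.4, purity=0.8, tumor_copies=2, multiplicity=2))
```

With these inputs the first call gives a CCF of about 1.0 and the second about 0.5. The estimate halves when the assumed multiplicity doubles, which shows how strongly a wrong multiplicity assumption can distort the CCF.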

VCFCons: a versatile VCF-based consensus sequence generator for small genomes

10.1101/2021.02.26.433111 ◽

2021 ◽

Author(s):

Elizabeth Tseng ◽

Qiandong Zeng ◽

Lax Iyer

Keyword(s):

Consensus Sequence ◽

Low Frequency ◽

Sequencing Data ◽

Consensus Sequences ◽

The Future ◽

Variant Frequency ◽

Viral Surveillance ◽

Sequence Generator ◽

Low Coverage ◽

Robust Consensus

Abstract We developed VCFCons to address the urgent need for a robust consensus sequence generator for SARS-CoV-2 viral surveillance, which presented several unique requirements, including: (a) low coverage areas should be marked with ‘N’s, and (b) low frequency or suspicious variant calls need to be filtered. We found that, while some existing tools such as bcftools can generate the desired consensus sequence, doing so required multiple filtering steps and additional scripting. VCFCons can generate consensus sequences based on variant calls in VCF format with versatile filtering criteria based on coverage and estimated variant frequency. We applied VCFCons to the Labcorp SARS-CoV-2 sequencing data and showed that it generated correct consensus sequences that were successfully submitted to GISAID and NCBI. We hope the community will find value in this tool and aim to continue developing VCFCons to handle more complex viral data in the future.
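
The two requirements listed in the abstract, masking low-coverage positions with N and filtering low-frequency variant calls, can be sketched in a few lines. This is not VCFCons itself and does not reproduce its interface or defaults: the function `build_consensus`, the tuple-based variant format, and the thresholds `min_depth=10` and `min_freq=0.5` are assumptions for illustration, and only single-base substitutions are handled.

```python
def build_consensus(reference: str, depth: list[int],
                    variants: list[tuple[int, str, str, float]],
                    min_depth: int = 10, min_freq: float = 0.5) -> str:
    """Apply filtered SNVs to a reference and mask low-coverage sites.

    Each variant is (1-based position, ref base, alt base, frequency).
    """
    if len(depth) != len(reference):
        raise ValueError("need one depth value per reference base")
    seq = list(reference)
    for pos, ref, alt, freq in variants:
        if reference[pos - 1] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        # Requirement (b): drop low-frequency calls.
        if freq >= min_freq:
            seq[pos - 1] = alt
    # Requirement (a): low-coverage positions become N, even if a
    # variant was called there.
    for i, d in enumerate(depth):
        if d < min_depth:
            seq[i] = "N"
    return "".join(seq)


# Toy data: position 3 is under-covered, position 5 is a low-frequency call.
print(build_consensus(
    "ACGTAC",
    [30, 30, 5, 30, 30, 30],
    [(2, "C", "T", 0.9), (5, "A", "G", 0.2), (3, "G", "A", 0.95)],
))  # ATNTAC
```

Masking after applying variants is a deliberate choice in this sketch: a confident-looking call at an under-covered site is still reported as N.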

Increased yields of duplex sequencing data by a series of quality control tools

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab002 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Gundula Povysil ◽

Monika Heinzl ◽

Renato Salazar ◽

Nicholas Stoler ◽

Anton Nekrutenko ◽

...

Keyword(s):

Low Frequency ◽

Variant Calling ◽

Data Loss ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Consensus Sequences ◽

Sequencing Errors ◽

Data Output ◽

Reverse Strand ◽

Duplex Sequencing

Abstract Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition in order to understand data loss and implement modifications that maximize the data output for variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we also developed a tool that re-examines variant calls from raw reads and provides different summary data that categorize the confidence level of a variant call by a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, which substantially increases the sequencing depth for variant calling, a particularly important advantage for low-input samples or low-coverage regions.
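
The family-based grouping described above can be sketched as follows. This is a simplified illustration, not the authors' toolset. It assumes reads are already trimmed, equally long, oriented the same way, and labelled with a normalized tag and a strand of either "fwd" or "rev". The function names and the `min_family_size=3` threshold are assumptions.

```python
from collections import Counter, defaultdict


def strand_consensus(reads: list[str]) -> str:
    """Majority base per column; a tie for first place becomes N."""
    out = []
    for column in zip(*reads):
        ranked = Counter(column).most_common()
        base, count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == count
        out.append("N" if tied else base)
    return "".join(out)


def duplex_consensus(reads: list[tuple[str, str, str]],
                     min_family_size: int = 3) -> dict[str, str]:
    """Build one duplex consensus per tag from (tag, strand, sequence)."""
    families = defaultdict(lambda: {"fwd": [], "rev": []})
    for tag, strand, seq in reads:
        families[tag][strand].append(seq)
    result = {}
    for tag, fam in families.items():
        # Families too small on either strand are dropped; this is one
        # place where reads are lost.
        if min(len(fam["fwd"]), len(fam["rev"])) < min_family_size:
            continue
        fwd = strand_consensus(fam["fwd"])
        rev = strand_consensus(fam["rev"])
        # A base is kept only if both strand consensuses agree.
        result[tag] = "".join(a if a == b else "N" for a, b in zip(fwd, rev))
    return result


reads = [
    ("AAT", "fwd", "ACGT"), ("AAT", "fwd", "ACGT"), ("AAT", "fwd", "ACTT"),
    ("AAT", "rev", "ACGT"), ("AAT", "rev", "ACGT"), ("AAT", "rev", "ACGT"),
    ("GGC", "fwd", "TTTT"),  # a read without a usable family
]
print(duplex_consensus(reads))  # {'AAT': 'ACGT'}
```

In the toy input the single error in one forward read is voted out, while the lone `GGC` read is discarded, which mirrors the data loss the abstract describes.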

Development of a loop-mediated isothermal amplification (LAMP) assay for the identification of the invasive wood borer Aromia bungii (Coleoptera: Cerambycidae) from frass

3 Biotech ◽

10.1007/s13205-020-02602-w ◽

2021 ◽

Vol 11 (2) ◽

Author(s):

Domenico Rizzo ◽

Nicola Luchi ◽

Daniele Da Lio ◽

Linda Bartolini ◽

Francesco Nugnes ◽

...

Keyword(s):

LAMP Assay ◽

Isothermal Amplification ◽

Molecular Diagnostic ◽

Diagnostic Tools ◽

Loop Mediated Isothermal Amplification ◽

Larval Stages ◽

Longhorn Beetle ◽

Rapid Monitoring ◽

Major Pest ◽

Wood Borer

Abstract The red-necked longhorn beetle Aromia bungii (Faldermann, 1835) (Coleoptera: Cerambycidae) is native to east Asia, where it is a major pest of cultivated and ornamental species of the genus Prunus. Morphological or molecular discrimination of adults or larval specimens is required to identify this invasive wood borer. However, recovering larval stages of the pest from trunks and branches causes extensive damage to plants and is time-consuming. An alternative approach consists in applying non-invasive molecular diagnostic tools to biological traces (i.e., fecal pellets, frass). In this way, infestations in host plants can be detected without destructive methods. This paper presents a protocol based on both real-time and visual loop-mediated isothermal amplification (LAMP), using DNA of A. bungii extracted from fecal particles in larval frass. Laboratory validations demonstrated the robustness of the protocols adopted, and their reliability was confirmed by performing an inter-lab blind panel. The LAMP assay and the qPCR SYBR Green method using the F3/B3 LAMP external primers were equally sensitive, and both were more sensitive than the conventional PCR (sensitivity > 10^3 to the same starting matrix). The visual LAMP protocol, due to the relatively easy performance of the method, could be a useful tool to apply in rapid monitoring of A. bungii and in the management of its outbreaks.

The new Internal Transcribed Spacer 2 diagnostic tool clarifies the taxonomic position and geographic distribution of the North American malaria vector Anopheles punctipennis

Malaria Journal ◽

10.1186/s12936-021-03676-4 ◽

2021 ◽

Vol 20 (1) ◽

Author(s):

James M. Hodge ◽

Andrey A. Yurchenko ◽

Dmitriy A. Karagodin ◽

Reem A. Masri ◽

Ryan C. Smith ◽

...

Keyword(s):

Malaria Vector ◽

Internal Transcribed Spacer ◽

Single Species ◽

ITS2 Sequence ◽

Pathogen Transmission ◽

Molecular Diagnostic ◽

Diagnostic Tools ◽

Sequence Length ◽

Phylogenetic Position ◽

Internal Transcribed Spacer 2

Abstract Background The malaria mosquito Anopheles punctipennis, a widely distributed species in North America, is capable of transmitting human malaria and is actively involved in the transmission of the ungulate malaria parasite Plasmodium odocoilei. However, molecular diagnostic tools based on Internal Transcribed Spacer 2 (ITS2) of ribosomal DNA are lacking for this species. Anopheles punctipennis is a former member of the Anopheles maculipennis complex but its systematic position remains unclear. Methods In this study, ITS2 sequences were obtained from 276 An. punctipennis specimens collected in the eastern and midwestern United States and a simple and robust Restriction Fragment Length Polymorphism approach for species identification was developed. The maximum-likelihood phylogenetic tree was constructed based on ITS2 sequences available through this study and from GenBank for 20 species of Anopheles. Results The analysis demonstrated a consistent ITS2 sequence length and showed no indications of intragenomic variation among the samples based on ITS2, suggesting that An. punctipennis represents a single species in the studied geographic locations. In this study, An. punctipennis was found in urban, rural, and forest settings, suggesting its potential broad role in pathogen transmission. Phylogeny based on ITS2 sequence comparison demonstrated the close relationship of this species with other members of the Maculipennis group. Conclusions This study developed molecular tools based on ITS2 sequences for the malaria vector An. punctipennis and clarified the phylogenetic position of the species within the Maculipennis group.

The Complete Chloroplast Genome of the Vulnerable Oreocharis esquirolii (Gesneriaceae): Structural Features, Comparative and Phylogenetic Analysis

Plants ◽

10.3390/plants9121692 ◽

2020 ◽

Vol 9 (12) ◽

pp. 1692

Author(s):

Li Gu ◽

Ting Su ◽

Ming-Tai An ◽

Guo-Xiong Hu

Keyword(s):

Phylogenetic Analysis ◽

Sequence Similarity ◽

Single Copy ◽

Structural Features ◽

rRNA Genes ◽

tRNA Genes ◽

Sequencing Data ◽

High Sequence Similarity ◽

Plastid Genomes ◽

Cp Genome

Oreocharis esquirolii, a member of Gesneriaceae, is also known as Thamnocharis esquirolii, which has been regarded as a synonym of the former. The species is endemic to Guizhou, southwestern China, and is evaluated as vulnerable (VU) under the International Union for Conservation of Nature (IUCN) criteria. Until now, the sequence and genome information of O. esquirolii has remained unknown. In this study, we assembled and characterized the complete chloroplast (cp) genome of O. esquirolii using Illumina sequencing data for the first time. The total length of the cp genome was 154,069 bp with a typical quadripartite structure consisting of a pair of inverted repeats (IRs) of 25,392 bp separated by a large single copy region (LSC) of 85,156 bp and a small single copy region (SSC) of 18,129 bp. The genome comprised 114 unique genes with 80 protein-coding genes, 30 tRNA genes, and four rRNA genes. Thirty-one repeat sequences and 74 simple sequence repeats (SSRs) were identified. Genome alignment across five plastid genomes of Gesneriaceae indicated a high sequence similarity. Four highly variable sites (rps16-trnQ, trnS-trnG, ndhF-rpl32, and ycf1) were identified. Phylogenetic analysis indicated that O. esquirolii grouped together with O. mileensis, supporting resurrection of the name Oreocharis esquirolii from Thamnocharis esquirolii. The complete cp genome sequence will contribute to further studies in molecular identification, genetic diversity, and phylogeny.
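
The quadripartite structure implies a simple consistency check: the large and small single copy regions plus two copies of the inverted repeat should add up to the total genome length, and the gene categories should add up to the gene count. The snippet below runs that check on the figures from the abstract; the variable names are chosen for this sketch.

```python
lsc_bp = 85_156  # large single copy region
ssc_bp = 18_129  # small single copy region
ir_bp = 25_392   # one inverted repeat; the genome carries a pair

genome_bp = lsc_bp + ssc_bp + 2 * ir_bp
print(genome_bp)  # 154069

protein_coding, trna, rrna = 80, 30, 4
unique_genes = protein_coding + trna + rrna
print(unique_genes)  # 114
```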

LAMP in Neglected Tropical Diseases: A Focus on Parasites

Diagnostics ◽

10.3390/diagnostics11030521 ◽

2021 ◽

Vol 11 (3) ◽

pp. 521

Author(s):

Juan García-Bernalt Diego ◽

Pedro Fernández-Soto ◽

Antonio Muro

Keyword(s):

Neglected Tropical Diseases ◽

Treatment Success ◽

Public Health Problem ◽

Nucleic Acid Amplification ◽

Molecular Diagnostic ◽

Diagnostic Tools ◽

World Population ◽

Major Public Health Problem ◽

Tropical Diseases ◽

Early 21st Century

Neglected Tropical Diseases (NTDs), particularly those caused by parasites, remain a major Public Health problem in tropical and subtropical regions, with 10% of the world population being infected. Their management and control have traditionally been hampered, among other factors, by the difficulty of deploying rapid, specific, and affordable diagnostic tools in low resource settings. This is especially true for complex PCR-based methods. Isothermal nucleic acid amplification techniques, particularly loop-mediated isothermal amplification (LAMP), appeared in the early 21st century as an alternative to PCR, allowing for a much more affordable molecular diagnostic. Here, we present the status of LAMP assay development in parasite-caused NTDs. We address the progress made in different research applications of the technique: xenomonitoring, epidemiological studies, work in animal models, and clinical application both for diagnosis and for evaluation of treatment success. Finally, we try to shed light on the improvements needed to achieve a true point-of-care test and on the future perspectives in this field.
