GESS: a database of global evaluation of SARS-CoV-2/hCoV-19 sequences

Shuyi Fang; Kailing Li; Jikui Shen; Sheng Liu; Juli Liu; Lei Yang; Chang-Deng Hu; Jun Wan

doi:10.1093/nar/gkaa808

Nucleic Acids Research ◽

10.1093/nar/gkaa808 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D706-D714 ◽

Cited By ~ 2

Keyword(s):

Genomic Region ◽

Comprehensive Analysis ◽

Mutation Rates ◽

Global Evaluation ◽

Single Nucleotide Variants ◽

High Coverage ◽

Single Nucleotide ◽

Genomic Variations ◽

Area Of Interest ◽

The World

The COVID-19 outbreak has been a global emergency since December 2019. Analysis of SARS-CoV-2 sequences can uncover single nucleotide variants (SNVs) and their corresponding evolution patterns. The Global Evaluation of SARS-CoV-2/hCoV-19 Sequences (GESS, https://wan-bioinfo.shinyapps.io/GESS/) is a resource that provides comprehensive analysis results based on tens of thousands of high-coverage, high-quality SARS-CoV-2 complete genomes. The database allows users to browse, search and download SNVs at any individual or multiple SARS-CoV-2 genomic positions, within a chosen genomic region or protein, or in a given country/area of interest. GESS reveals the geographical distributions of SNVs around the world and across the states of the USA, while exhibiting time-dependent patterns of SNV occurrence that reflect the development of SARS-CoV-2 genomes. For each month, the top 100 SNVs first identified worldwide can be retrieved. GESS also explores SNVs that occur simultaneously with specific SNVs of interest to the user. Furthermore, the database can help calibrate mutation rates and identify conserved genome regions. Taken together, GESS is a powerful resource and tool for monitoring SARS-CoV-2 migration and evolution according to featured genomic variations. It provides potentially useful information for prevalence prediction, related public health policy making, and vaccine design.
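The abstract describes two GESS queries: a per-month list of newly identified SNVs and a search for SNVs that co-occur with a chosen SNV. The Python sketch below illustrates both under stated assumptions.

- Each genome is reduced to a collection month and a set of SNV labels (reference base, position, alternate base). This input schema is hypothetical.
- "Top" is taken to mean the SNVs carried by the most genomes. This ranking is also an assumption.
- The function names and toy records are hypothetical. They are not GESS's implementation or data.

```python
from collections import Counter, defaultdict

# Toy records (hypothetical, not GESS data): (collection month "YYYY-MM",
# set of SNVs written as reference base + genome position + alternate base).
# "YYYY-MM" strings sort chronologically, so plain string comparison works.
GENOMES = [
    ("2020-01", frozenset({"C241T"})),
    ("2020-02", frozenset({"C241T", "A23403G"})),
    ("2020-03", frozenset({"C241T", "A23403G", "C14408T"})),
    ("2020-03", frozenset({"A23403G", "C14408T"})),
]


def first_seen_month(genomes):
    """Map each SNV to the earliest month in which any genome carries it."""
    first = {}
    for month, snvs in genomes:
        for snv in snvs:
            if snv not in first or month < first[snv]:
                first[snv] = month
    return first


def top_new_snvs_by_month(genomes, n=100):
    """For each month, list up to n SNVs first seen in that month, ranked by
    how many genomes carry them overall (ties broken alphabetically)."""
    counts = Counter(snv for _, snvs in genomes for snv in snvs)
    by_month = defaultdict(list)
    for snv, month in first_seen_month(genomes).items():
        by_month[month].append(snv)
    return {
        month: sorted(snvs, key=lambda s: (-counts[s], s))[:n]
        for month, snvs in sorted(by_month.items())
    }


def co_occurring(genomes, query):
    """Count how often each other SNV appears in genomes that also carry query."""
    co = Counter()
    for _, snvs in genomes:
        if query in snvs:
            co.update(s for s in snvs if s != query)
    return co


if __name__ == "__main__":
    print(top_new_snvs_by_month(GENOMES))
    # {'2020-01': ['C241T'], '2020-02': ['A23403G'], '2020-03': ['C14408T']}
    print(co_occurring(GENOMES, "A23403G").most_common())
```

At database scale, the counts would more likely come from precomputed indexes than from rescanning every genome on each query. The loops here only make the logic explicit.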

Initial Insights Into the Genetic Epidemiology of SARS-CoV-2 Isolates From Kerala Suggest Local Spread From Limited Introductions

Frontiers in Genetics ◽

10.3389/fgene.2021.630542 ◽

2021 ◽

Vol 12 ◽

Author(s):

Chandni Radhakrishnan ◽

Mohit Kumar Divakar ◽

Abhinav Jain ◽

Prasanth Viswanathan ◽

Rahul C. Bhoyar ◽

...

Keyword(s):

Genetic Epidemiology ◽

Binding Sites ◽

Haplotype Analysis ◽

RT-PCR ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Functional Variants ◽

Local Spread ◽

Novel Variants ◽

The World

Coronavirus disease 2019 (COVID-19) rapidly spread from a city in China to almost every country in the world, affecting millions of individuals. The rapid increase in COVID-19 cases in the state of Kerala in India has made it necessary to understand the genetic epidemiology of SARS-CoV-2 there. We sequenced 200 samples from patients in Kerala using the COVIDSeq amplicon-based sequencing protocol. The analysis identified 166 high-quality single-nucleotide variants, including four novel variants and 89 variants not previously reported in Indian SARS-CoV-2 isolates. Phylogenetic and haplotype analyses revealed that the virus belongs to the A2a clade and that its spread was dominated by three distinct introductions followed by local transmission, suggesting recent outbreaks. Further analysis of the functional variants identified two variants in the S gene associated with increased infectivity and five variants mapping to primer binding sites that affect the efficacy of RT-PCR. To the best of our knowledge, this is the first and most comprehensive report of SARS-CoV-2 genetic epidemiology from Kerala.
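Deciding whether a variant maps to a primer binding site is an interval-containment test on reference coordinates. The Python sketch below illustrates that test under stated assumptions.

- Coordinates are assumed to be 1-based and inclusive.
- The `Primer` type, primer names and coordinates are hypothetical. They are not taken from the COVIDSeq primer set or from the study.

```python
from typing import NamedTuple


class Primer(NamedTuple):
    """A primer binding site on the reference genome (1-based, inclusive)."""
    name: str
    start: int
    end: int


def variants_in_primers(variant_positions, primers):
    """Return {position: [primer names]} for variants inside any primer site."""
    hits = {}
    for pos in sorted(set(variant_positions)):
        names = [p.name for p in primers if p.start <= pos <= p.end]
        if names:
            hits[pos] = names
    return hits


if __name__ == "__main__":
    # Hypothetical primer names and coordinates, for illustration only.
    primers = [Primer("hypothetical_F1", 100, 120), Primer("hypothetical_R1", 300, 321)]
    print(variants_in_primers([95, 110, 321, 500], primers))
    # {110: ['hypothetical_F1'], 321: ['hypothetical_R1']}
```

The linear scan is adequate for a few hundred primers. A sorted sweep or an interval tree would scale better for larger panels.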

Linked-read analysis identifies mutations in single cell DNA sequencing data

10.1101/211169 ◽

2017 ◽

Cited By ~ 6

Author(s):

Craig L. Bohrson ◽

Allison R. Barton ◽

Michael A. Lodato ◽

Rachel E. Rodin ◽

Vinay Viswanadham ◽

...

Keyword(s):

Single Cell ◽

DNA Isolation ◽

Single Cells ◽

Mutation Rates ◽

Accurate Estimation ◽

Single Cell Level ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Cell Level ◽

Single Nucleotide

Whole-genome sequencing of DNA from single cells has the potential to reshape our understanding of the mutational heterogeneity in normal and diseased tissues. A major difficulty, however, is distinguishing true mutations from artifactual mutations that arise during DNA isolation and amplification. Here, we describe linked-read analysis (LiRA), a method that uses phasing of somatic single nucleotide variants with nearby germline variants to identify true mutations, thereby allowing accurate estimation of somatic mutation rates at the single-cell level.

Distinct genetic spectrums and evolution patterns of SARS-CoV-2

10.1101/2020.06.16.20132902 ◽

2020 ◽

Cited By ~ 5

Author(s):

Sheng Liu ◽

Jikui Shen ◽

Lei Yang ◽

Chang-Deng Hu ◽

Jun Wan

Keyword(s):

Complete Genome ◽

Clustering Method ◽

Single Nucleotide Variants ◽

Genome Sequences ◽

Nucleotide Substitutions ◽

High Quality ◽

High Coverage ◽

Single Nucleotide ◽

Multiple Groups ◽

Over Time

Four signature groups of single-nucleotide variants (SNVs) were identified by applying a two-way clustering method to about twenty thousand high-quality, high-coverage SARS-CoV-2 complete genome sequences. Some frequently occurring SNVs predominate but appear in a mutually exclusive manner in patients from different countries and areas. These major SNV signatures exhibited distinct evolution patterns over time. Although rare, multiple groups of SNVs coexisted in some patients, indicating possible cross-infections and suggesting infection by different SARS-CoV-2 clades or potential recombination of SARS-CoV-2 sequences. Interestingly, nucleotide substitutions among SARS-CoV-2 genomes tend to occur at sites where the bat coronavirus RaTG13 sequence differs from the Wuhan-Hu-1 genome. This indicates tolerance of mutations at those sites, or suggests that major viral strains might exist between Wuhan-Hu-1 and the RaTG13 coronavirus.

The genetics of Cannabis – genomic variations of key synthases and their effect on cannabinoids content

Genome ◽

10.1139/gen-2020-0087 ◽

2020 ◽

Author(s):

Aparna Singh ◽

Andriy Bilichak ◽

Igor Kovalchuk

Keyword(s):

Cannabis Sativa ◽

Deep Understanding ◽

Biosynthesis Pathway ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Systematic Analysis ◽

Genomic Variations ◽

Coding Regions ◽

The World ◽

History Of

Despite being a controversial crop, Cannabis sativa L. has a long history of cultivation throughout the world. Following its recent legalisation in Canada, it is emerging as an important plant for both medicinal and recreational purposes. Recent progress in genome sequencing of both cannabis and hemp varieties allows systematic analysis of genes coding for enzymes involved in the cannabinoid biosynthesis pathway. Single nucleotide polymorphisms in the coding regions of cannabinoid synthases play an important role in determining plant chemotype. A deep understanding of how these variants affect enzyme activity and cannabinoid accumulation will allow breeding of novel cultivars with desirable cannabinoid profiles. Here we present a short overview of the major cannabinoid synthases, together with data from the analysis of their genetic variants and their effect on cannabinoid content in several in-house sequenced Cannabis cultivars.

Accurate and comprehensive analysis of single nucleotide variants and large deletions of the human mitochondrial genome in DNA and single cells

European Journal of Human Genetics ◽

10.1038/ejhg.2017.129 ◽

2017 ◽

Vol 25 (11) ◽

pp. 1229-1236 ◽

Cited By ~ 7

Author(s):

Filippo Zambelli ◽

Kim Vancampenhout ◽

Dorien Daneels ◽

Daniel Brown ◽

Joke Mertens ◽

...

Keyword(s):

Mitochondrial Genome ◽

Single Cells ◽

Comprehensive Analysis ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Large Deletions ◽

Human Mitochondrial Genome

Predicting pathogenic single nucleotide variants through a comprehensive analysis on multiple level features

Chemometrics and Intelligent Laboratory Systems ◽

10.1016/j.chemolab.2016.05.012 ◽

2016 ◽

Vol 156 ◽

pp. 224-230

Author(s):

Yiming Wu ◽

Qifan Kuang ◽

Yongcheng Dong ◽

Ziyan Huang ◽

Yan Li ◽

...

Keyword(s):

Comprehensive Analysis ◽

Multiple Level ◽

Single Nucleotide Variants ◽

Single Nucleotide

Whole-genome sequencing of nine esophageal adenocarcinoma cell lines

F1000Research ◽

10.12688/f1000research.7033.1 ◽

2016 ◽

Vol 5 ◽

pp. 1336 ◽

Cited By ~ 8

Author(s):

Gianmarco Contino ◽

Matthew D. Eldridge ◽

Maria Secrier ◽

Lawrence Bower ◽

Rachael Fels Elliott ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Esophageal Adenocarcinoma ◽

Cell Lines ◽

Genome Sequencing ◽

Sequence Data ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Single Nucleotide Variants ◽

High Coverage ◽

Single Nucleotide

Esophageal adenocarcinoma (EAC) is highly mutated and molecularly heterogeneous. The number of cell lines available for study is limited, and their genomes have been only partially characterized. An accurate annotation of their mutational landscape is crucial for sound experimental design and correct interpretation of genotype-phenotype findings. We performed high-coverage, paired-end whole genome sequencing on eight EAC cell lines (ESO26, ESO51, FLO-1, JH-EsoAd1, OACM5.1 C, OACP4 C, OE33 and SK-GT-4), all verified against original patient material, and on one esophageal high-grade dysplasia cell line, CP-D. We have made the aligned sequence data available. We report single nucleotide variants (SNVs), small insertions and deletions (indels), and copy number alterations, identified by comparison with the human reference genome and known single nucleotide polymorphisms (SNPs). We compare these putative mutations with mutations found in primary tissue EAC samples to inform the use of these cell lines as models of EAC.

Estimation of allele-specific fitness effects across human protein-coding sequences and implications for disease

10.1101/441337 ◽

2018 ◽

Cited By ~ 2

Author(s):

Yi-Fei Huang ◽

Adam Siepel

Keyword(s):

Single Nucleotide Variants ◽

High Coverage ◽

Single Nucleotide ◽

Protein Coding ◽

Human Genomics ◽

Coding Sequences ◽

Genomic Features ◽

Fitness Effects ◽

Machine Learning Model ◽

Allele Specific

A central challenge in human genomics is to understand the cellular, evolutionary, and clinical significance of genetic variants. Here we introduce a unified population-genetic and machine-learning model, called Linear Allele-Specific Selection InferencE (LASSIE), for estimating the fitness effects of all potential single-nucleotide variants, based on polymorphism data and predictive genomic features. We applied LASSIE to 51 high-coverage genome sequences annotated with 33 genomic features and constructed a map of allele-specific selection coefficients across all protein-coding sequences in the human genome. We show that this map is informative about both human evolution and disease.

Genomic variants concurrently listed in a somatic and a germline mutation database have implications for disease-variant discovery and genomic privacy

10.1101/450239 ◽

2018 ◽

Author(s):

William Meyerson ◽

Mark Gerstein

Keyword(s):

Germline Mutation ◽

Molecular Mechanisms ◽

Read Depth ◽

General Purpose ◽

Mutation Rates ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Depth Analysis ◽

Genomic Privacy ◽

Human Mutation

Background: Mutations arise in the human genome in two major settings: the germline and the soma. These settings involve different inheritance patterns, chromatin structures, and environmental exposures, all of which might be predicted to affect the distribution of substitutions differently in each setting. Nonetheless, recent studies have found that somatic and germline mutation rates are similarly affected by endogenous mutational processes and epigenetic factors.

Results: Here, we quantified the number of single nucleotide variants that co-occur between somatic and germline call-sets (cSNVs), compared this quantity with expectations, and explained the departures we observed. We found that three times as many variants are shared between the soma and germline as would be expected under independence. We developed a new, general-purpose statistical framework to explain the observed excess of cSNVs in terms of the varying mutation rates of different substitution types and genomic regions. Using this metric, we find that more than 90% of this excess can be explained by our observation that the basic substitution types (such as N[C->T]G, C->A, etc.) have correlated mutation rates in the germline and soma. Matched-normal read depth analysis suggests that an appreciable fraction of this excess may also derive from germline contamination of somatic samples.

Conclusion: Overall, our results highlight the commonalities in substitution patterns between the germline and soma. The universality of some aspects of human mutation rates offers insight into the potential molecular mechanisms of human mutation. The highlighted similarities between somatic and germline mutation rates also lay the groundwork for future studies that distinguish disease-causing variants from a genomic background informed by both somatic and germline variant data. Moreover, our results indicate that the depth of matched-normal sequencing necessary to ensure the genomic privacy of donors of somatic samples may be higher than previously appreciated. Furthermore, the fact that we were able to explain such a high portion of recurrent variants using known determinants of mutation rates is evidence that the genomics community has already discovered the most important predictors of mutation rates for single nucleotide variants.
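The "expected by independence" baseline can be made concrete with a short Python sketch. It assumes each call set is a uniformly random set of sites, drawn independently within a class of sites. Under that assumption, the expected overlap in a class is n_somatic × n_germline / n_sites, which is the hypergeometric mean.

Summing this expectation over classes with different mutation rates, such as substitution types, shows one effect. When rates are correlated between the two settings, the expected overlap rises above the pooled baseline. This is only an illustration of that idea, not the authors' statistical framework, and the toy counts are invented.

```python
def expected_overlap(n_somatic, n_germline, n_sites):
    """Expected number of shared sites if the two call sets were drawn
    independently and uniformly from the same n_sites (hypergeometric mean)."""
    return n_somatic * n_germline / n_sites


def stratified_expected_overlap(strata):
    """Sum the independence expectation over classes of sites, e.g. substitution
    types, each given as (n_somatic, n_germline, n_sites)."""
    return sum(expected_overlap(s, g, n) for s, g, n in strata)


def excess_ratio(observed_shared, expected_shared):
    """Observed / expected shared variants (1.0 means no excess)."""
    return observed_shared / expected_shared


if __name__ == "__main__":
    # Toy counts (invented): a small, highly mutable class and a large, stable one.
    strata = [(50, 50, 100), (10, 10, 900)]
    pooled = expected_overlap(60, 60, 1000)           # ignores rate differences
    stratified = stratified_expected_overlap(strata)  # accounts for them per class
    print(pooled, round(stratified, 2))  # 3.6 25.11
    print(excess_ratio(30, pooled), excess_ratio(30, stratified))
```

An excess ratio near 3 against the pooled baseline would correspond to the "three times as many" result. A ratio close to 1 against a stratified baseline would indicate that rate heterogeneity accounts for most of that excess.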

Comprehensive Analysis to Improve the Validation Rate for Single Nucleotide Variants Detected by Next-Generation Sequencing

PLoS ONE ◽

10.1371/journal.pone.0086664 ◽

2014 ◽

Vol 9 (1) ◽

pp. e86664 ◽

Cited By ~ 17

Author(s):

Mi-Hyun Park ◽

Hwanseok Rhee ◽

Jung Hoon Park ◽

Hae-Mi Woo ◽

Byung-Ok Choi ◽

...

Keyword(s):

Next Generation Sequencing ◽

Comprehensive Analysis ◽

Next Generation ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Validation Rate ◽

Generation Sequencing
