GESS: a database of global evaluation of SARS-CoV-2/hCoV-19 sequences

2020 ◽  
Vol 49 (D1) ◽  
pp. D706-D714 ◽  
Author(s):  
Shuyi Fang ◽  
Kailing Li ◽  
Jikui Shen ◽  
Sheng Liu ◽  
Juli Liu ◽  
...  

Abstract The COVID-19 outbreak has been a global emergency since December 2019. Analysis of SARS-CoV-2 sequences can uncover single nucleotide variants (SNVs) and their corresponding evolution patterns. The Global Evaluation of SARS-CoV-2/hCoV-19 Sequences (GESS, https://wan-bioinfo.shinyapps.io/GESS/) is a resource that provides comprehensive analysis results based on tens of thousands of high-coverage and high-quality SARS-CoV-2 complete genomes. The database allows users to browse, search and download SNVs at any individual or multiple SARS-CoV-2 genomic positions, within a chosen genomic region or protein, or in a country or area of interest. GESS reveals geographical distributions of SNVs around the world and across the states of the USA, while exhibiting time-dependent patterns of SNV occurrence that reflect the development of SARS-CoV-2 genomes. For each month, the top 100 SNVs that were first identified worldwide can be retrieved. GESS also explores SNVs that occur simultaneously with specific SNVs of interest to the user. Furthermore, the database can help calibrate mutation rates and identify conserved genome regions. Taken together, GESS is a powerful resource and tool for monitoring SARS-CoV-2 migration and evolution according to featured genomic variations. It provides potentially instructive information for prevalence prediction, related public health policy making, and vaccine design.
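
As a rough illustration of the co-occurrence query described in this abstract, the sketch below counts which SNVs appear in the same genomes as a chosen SNV. The data layout (a mapping from genome ID to a set of SNV labels), the function name `co_occurring_snvs`, and the toy labels are all hypothetical; this is not GESS's implementation or API.

```python
from collections import Counter


def co_occurring_snvs(genome_snvs, query_snv):
    """Count SNVs found in the same genomes as query_snv.

    genome_snvs: dict mapping a genome ID to the set of SNV labels it carries
                 (a hypothetical layout, not the GESS schema).
    Returns a list of (snv, fraction) pairs, where fraction is the share of
    query-carrying genomes that also carry snv, sorted by descending fraction.
    """
    carriers = [snvs for snvs in genome_snvs.values() if query_snv in snvs]
    if not carriers:
        return []
    counts = Counter(
        snv for snvs in carriers for snv in snvs if snv != query_snv
    )
    # Sort by fraction (descending), then by label for a stable order.
    return sorted(
        ((snv, n / len(carriers)) for snv, n in counts.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Made-up toy data for illustration only.
toy = {
    "genome_1": {"snv_a", "snv_b", "snv_c"},
    "genome_2": {"snv_a", "snv_b"},
    "genome_3": {"snv_c"},
}
result = co_occurring_snvs(toy, "snv_a")
```

With the toy data, `snv_b` is carried by every genome that carries `snv_a`, while `snv_c` is carried by half of them. A real analysis would also need to account for sampling bias across countries and time.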

2021 ◽  
Vol 12 ◽  
Author(s):  
Chandni Radhakrishnan ◽  
Mohit Kumar Divakar ◽  
Abhinav Jain ◽  
Prasanth Viswanathan ◽  
Rahul C. Bhoyar ◽  
...  

Coronavirus disease 2019 (COVID-19) rapidly spread from a city in China to almost every country in the world, affecting millions of individuals. The rapid increase in COVID-19 cases in the state of Kerala in India has made it necessary to understand the genetic epidemiology of SARS-CoV-2. We sequenced 200 samples from patients in Kerala using amplicon-based sequencing with the COVIDSeq protocol. The analysis identified 166 high-quality single-nucleotide variants, encompassing four novel variants and 89 variants new among SARS-CoV-2 isolates from India. Phylogenetic and haplotype analysis revealed that the virus was dominated by three distinct introductions followed by local spread, suggesting recent outbreaks, and that it belongs to the A2a clade. Further analysis of the functional variants revealed two variants in the S gene associated with increased infectivity and five variants mapped to primer binding sites that affect the efficacy of RT-PCR. To the best of our knowledge, this is the first and most comprehensive report of SARS-CoV-2 genetic epidemiology from Kerala.


2017 ◽  
Author(s):  
Craig L. Bohrson ◽  
Allison R. Barton ◽  
Michael A. Lodato ◽  
Rachel E. Rodin ◽  
Vinay Viswanadham ◽  
...  

Abstract Whole-genome sequencing of DNA from single cells has the potential to reshape our understanding of the mutational heterogeneity in normal and disease tissues. A major difficulty, however, is distinguishing artifactual mutations that arise from DNA isolation and amplification from true mutations. Here, we describe linked-read analysis (LiRA), a method that utilizes phasing of somatic single nucleotide variants with nearby germline variants to identify true mutations, thereby allowing accurate estimation of somatic mutation rates at the single cell level.


Author(s):  
Sheng Liu ◽  
Jikui Shen ◽  
Lei Yang ◽  
Chang-Deng Hu ◽  
Jun Wan

Abstract Four signature groups of single-nucleotide variants (SNVs) were identified using a two-way clustering method in about twenty thousand high-quality, high-coverage SARS-CoV-2 complete genome sequences. Some frequently occurring SNVs predominate but are mutually exclusive among patients from different countries and areas. These major SNV signatures exhibited distinct evolution patterns over time. Although rare, our data indicated possible cross-infections in which multiple groups of SNVs existed simultaneously in some patients, suggesting infections from different SARS-CoV-2 clades or potential recombination of SARS-CoV-2 sequences. Interestingly, nucleotide substitutions among SARS-CoV-2 genomes tend to occur at the sites where the bat RaTG13 coronavirus sequence differs from the Wuhan-Hu-1 genome, indicating the tolerance of mutations at those sites or suggesting that major viral strains might exist between Wuhan-Hu-1 and the RaTG13 coronavirus.


Genome ◽  
2020 ◽  
Author(s):  
Aparna Singh ◽  
Andriy Bilichak ◽  
Igor Kovalchuk

Despite being a controversial crop, Cannabis sativa L. has a long history of cultivation throughout the world. Following recent legalisation in Canada, it is emerging as an important plant for both medicinal and recreational purposes. Recent progress in genome sequencing of both cannabis and hemp varieties allows for systematic analysis of genes coding for enzymes involved in the cannabinoid biosynthesis pathway. Single nucleotide polymorphisms in the coding regions of cannabinoid synthases play an important role in determining plant chemotype. A deep understanding of how these variants affect enzyme activity and the accumulation of cannabinoids will allow breeding of novel cultivars with desirable cannabinoid profiles. Here we give a short overview of the major cannabinoid synthases and present data on the analysis of their genetic variants and their effect on cannabinoid content using several in-house sequenced Cannabis cultivars.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 1336 ◽  
Author(s):  
Gianmarco Contino ◽  
Matthew D. Eldridge ◽  
Maria Secrier ◽  
Lawrence Bower ◽  
Rachael Fels Elliott ◽  
...  

Esophageal adenocarcinoma (EAC) is highly mutated and molecularly heterogeneous. The number of cell lines available for study is limited and their genomes have been only partially characterized. An accurate annotation of their mutational landscape is crucial for sound experimental design and correct interpretation of genotype-phenotype findings. We performed high-coverage, paired-end whole genome sequencing on eight EAC cell lines (ESO26, ESO51, FLO-1, JH-EsoAd1, OACM5.1 C, OACP4 C, OE33, SK-GT-4), all verified against original patient material, and one esophageal high-grade dysplasia cell line, CP-D. We have made available the aligned sequence data and report single nucleotide variants (SNVs), small insertions and deletions (indels), and copy number alterations, identified by comparison with the human reference genome and known single nucleotide polymorphisms (SNPs). We compare these putative mutations to mutations found in primary tissue EAC samples, to inform the use of these cell lines as a model of EAC.


2018 ◽  
Author(s):  
Yi-Fei Huang ◽  
Adam Siepel

Abstract A central challenge in human genomics is to understand the cellular, evolutionary, and clinical significance of genetic variants. Here we introduce a unified population-genetic and machine-learning model, called Linear Allele-Specific Selection InferencE (LASSIE), for estimating the fitness effects of all potential single-nucleotide variants, based on polymorphism data and predictive genomic features. We applied LASSIE to 51 high-coverage genome sequences annotated with 33 genomic features, and constructed a map of allele-specific selection coefficients across all protein-coding sequences in the human genome. We show that this map is informative about both human evolution and disease.


2018 ◽  
Author(s):  
William Meyerson ◽  
Mark Gerstein

Abstract Background: Mutations arise in the human genome in two major settings: the germline and soma. These settings involve different inheritance patterns, chromatin structures, and environmental exposures, all of which might be predicted to differentially affect the distribution of substitutions found in these settings. Nonetheless, recent studies have found that somatic and germline mutation rates are similarly affected by endogenous mutational processes and epigenetic factors.
Results: Here, we quantified the number of single nucleotide variants that co-occur between somatic and germline call-sets (cSNVs), compared this quantity with expectations, and explained noted departures. We found that three times as many variants are shared between the soma and germline as expected under independence. We developed a new, general-purpose statistical framework to explain the observed excess of cSNVs in terms of the varying mutation rates of different substitution types and of genomic regions. Using this metric, we find that more than 90% of this excess can be explained by our observation that the basic substitution types (such as N[C->T]G, C->A, etc.) have correlated mutation rates in the germline and soma. Matched-normal read depth analysis suggests that an appreciable fraction of this excess may also derive from germline contamination of somatic samples.
Conclusion: Overall, our results highlight the commonalities in substitution patterns between the germline and soma. The universality of some aspects of human mutation rates offers insight into the potential molecular mechanisms of human mutation. The highlighted similarities between somatic and germline mutation rates also lay the groundwork for future studies that distinguish disease-causing variants from a genomic background informed by both somatic and germline variant data.
Moreover, our results indicate that the depth of matched normal sequencing necessary to ensure genomic privacy of donors of somatic samples may be higher than previously appreciated. Furthermore, the fact that we were able to explain such a high portion of recurrent variants using known determinants of mutation rates is evidence that the genomics community has already discovered the most important predictors of mutation rates for single nucleotide variants.
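
To make the comparison against independence concrete, here is a minimal sketch of one simple way such an expectation could be computed. If two call sets were drawn independently and uniformly from the same pool of possible SNVs, the expected number shared is the product of their sizes divided by the pool size. The uniform-pool assumption, the function names, and the toy numbers are assumptions for illustration only; the authors' statistical framework models varying mutation rates across substitution types and genomic regions and is not reproduced here.

```python
def expected_shared(n_somatic, n_germline, n_possible):
    """Expected overlap of two independent, uniformly drawn call sets.

    n_possible is the size of the shared pool of candidate SNVs
    (an assumed simplification: every candidate is equally likely).
    """
    if n_possible <= 0:
        raise ValueError("n_possible must be positive")
    return n_somatic * n_germline / n_possible


def fold_excess(observed_shared, n_somatic, n_germline, n_possible):
    """Ratio of observed shared variants to the independence expectation."""
    return observed_shared / expected_shared(n_somatic, n_germline, n_possible)


# Toy numbers, not taken from the study.
expected = expected_shared(1_000, 2_000, 1_000_000)
ratio = fold_excess(6, 1_000, 2_000, 1_000_000)
```

With these toy inputs the expectation is 2 shared variants, so observing 6 corresponds to a threefold excess. Replacing the uniform pool with per-category rates (for example, one term per substitution type) is the kind of refinement the abstract describes.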

