Genomic variants concurrently listed in a somatic and a germline mutation database have implications for disease-variant discovery and genomic privacy

2018 ◽  
Author(s):  
William Meyerson ◽  
Mark Gerstein

Abstract Background: Mutations arise in the human genome in two major settings: the germline and the soma. These settings involve different inheritance patterns, chromatin structures, and environmental exposures, all of which might be predicted to differentially affect the distribution of substitutions found in them. Nonetheless, recent studies have found that somatic and germline mutation rates are similarly affected by endogenous mutational processes and epigenetic factors. Results: Here, we quantified the number of single nucleotide variants that co-occur between somatic and germline call-sets (cSNVs), compared this quantity with expectations, and explained noted departures. We found that three times as many variants are shared between the soma and germline as would be expected under independence. We developed a new, general-purpose statistical framework to explain the observed excess of cSNVs in terms of the varying mutation rates of different substitution types and of genomic regions. Using this framework, we find that more than 90% of this excess can be explained by our observation that the basic substitution types (such as N[C->T]G, C->A, etc.) have correlated mutation rates in the germline and soma. Matched-normal read depth analysis suggests that an appreciable fraction of this excess may also derive from germline contamination of somatic samples. Conclusion: Overall, our results highlight the commonalities in substitution patterns between the germline and soma. The universality of some aspects of human mutation rates offers insight into the potential molecular mechanisms of human mutation. The highlighted similarities between somatic and germline mutation rates also lay the groundwork for future studies that distinguish disease-causing variants from a genomic background informed by both somatic and germline variant data.
Moreover, our results also indicate that the depth of matched normal sequencing necessary to ensure genomic privacy of donors of somatic samples may be higher than previously appreciated. Furthermore, the fact that we were able to explain such a high portion of recurrent variants using known determinants of mutation rates is evidence that the genomics community has already discovered the most important predictors of mutation rates for single nucleotide variants.
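
To make the "expected by independence" comparison concrete, the following is a minimal sketch, not the authors' statistical framework. It assumes variants are placed uniformly at random within each stratum, and every count in it is hypothetical. It illustrates why pooling all substitution types understates the expected overlap when the same class is hypermutable in both the germline and the soma.

```python
def expected_shared(n_somatic, n_germline, n_possible):
    """Expected overlap of two independent, uniformly placed variant sets."""
    return n_somatic * n_germline / n_possible


# Hypothetical strata: (somatic calls, germline calls, possible variants).
# The first class is mutation-prone in BOTH settings (correlated rates).
strata = {
    "hypermutable class": (800, 900, 1_000),
    "all other classes": (200, 100, 9_000),
}

total_somatic = sum(s for s, _, _ in strata.values())
total_germline = sum(g for _, g, _ in strata.values())
total_possible = sum(n for _, _, n in strata.values())

# Naive expectation: one pooled stratum with a single uniform rate.
naive = expected_shared(total_somatic, total_germline, total_possible)

# Stratified expectation: independence assumed only within each class.
stratified = sum(expected_shared(s, g, n) for s, g, n in strata.values())

print(f"naive expectation:      {naive:.1f}")
print(f"stratified expectation: {stratified:.1f}")
print(f"fold excess over naive: {stratified / naive:.2f}")
```

In this toy setting the stratified expectation is several times the naive one, so an apparent excess of shared variants can arise purely from rate heterogeneity that the naive model ignores.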

Epigenomics ◽  
2020 ◽  
Vol 12 (18) ◽  
pp. 1633-1650
Author(s):  
Xi Xu ◽  
Chaoju Gong ◽  
Yunfeng Wang ◽  
Yanyan Hu ◽  
Hong Liu ◽  
...  

Aim: We aim to identify driving genes of colorectal cancer (CRC) through multi-omics analysis. Materials & methods: We downloaded multi-omics data of CRC from The Cancer Genome Atlas dataset. Integrative analysis of single-nucleotide variants, copy number variations, DNA methylation and differentially expressed genes identified candidate genes that carry CRC risk. Kernel genes were extracted from the weighted gene co-expression network analysis. A competing endogenous RNA network composed of CRC-related genes was constructed. Biological roles of genes were further investigated in vitro. Results: We identified LRRC26 and REP15 as novel prognosis-related driving genes for CRC. LRRC26 hindered tumorigenesis of CRC in vitro. Conclusion: Our study identified novel driving genes and may provide new insights into the molecular mechanisms of CRC.


2020 ◽  
Vol 49 (D1) ◽  
pp. D706-D714 ◽  
Author(s):  
Shuyi Fang ◽  
Kailing Li ◽  
Jikui Shen ◽  
Sheng Liu ◽  
Juli Liu ◽  
...  

Abstract The COVID-19 outbreak has been a global emergency since December 2019. Analysis of SARS-CoV-2 sequences can uncover single nucleotide variants (SNVs) and corresponding evolution patterns. The Global Evaluation of SARS-CoV-2/hCoV-19 Sequences (GESS, https://wan-bioinfo.shinyapps.io/GESS/) is a resource that provides comprehensive analysis results based on tens of thousands of high-coverage, high-quality SARS-CoV-2 complete genomes. The database allows users to browse, search and download SNVs at any individual or multiple SARS-CoV-2 genomic positions, within a chosen genomic region or protein, or in a given country or area of interest. GESS reveals geographical distributions of SNVs around the world and across US states, while exhibiting time-dependent patterns of SNV occurrence that reflect the development of SARS-CoV-2 genomes. For each month, the top 100 SNVs that were first identified worldwide can be retrieved. GESS also explores SNVs that co-occur with specific SNVs of interest to the user. Furthermore, the database can be of great help in calibrating mutation rates and identifying conserved genome regions. Taken together, GESS is a powerful resource and tool for monitoring SARS-CoV-2 migration and evolution according to featured genomic variations. It provides potentially useful information for prevalence prediction, related public health policy making, and vaccine design.


2017 ◽  
Author(s):  
Craig L. Bohrson ◽  
Allison R. Barton ◽  
Michael A. Lodato ◽  
Rachel E. Rodin ◽  
Vinay Viswanadham ◽  
...  

Abstract Whole-genome sequencing of DNA from single cells has the potential to reshape our understanding of the mutational heterogeneity in normal and disease tissues. A major difficulty, however, is distinguishing artifactual mutations that arise from DNA isolation and amplification from true mutations. Here, we describe linked-read analysis (LiRA), a method that utilizes phasing of somatic single nucleotide variants with nearby germline variants to identify true mutations, thereby allowing accurate estimation of somatic mutation rates at the single cell level.


Author(s):  
Sergey Abramov ◽  
Alexandr Boytsov ◽  
Dariia Bykova ◽  
Dmitry D. Penzar ◽  
Ivan Yevshin ◽  
...  

Abstract Sequence variants in gene regulatory regions alter gene expression and contribute to phenotypes of individual cells and the whole organism, including disease susceptibility and progression. Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Differential transcription factor binding in heterozygous genomic loci provides a natural source of information on such regulatory variants. We present a novel approach to call allele-specific transcription factor binding events at single-nucleotide variants in ChIP-Seq data, taking into account the joint contribution of aneuploidy and local copy number variation, which is estimated directly from variant calls. We have conducted a meta-analysis of more than 7 thousand ChIP-Seq experiments and assembled a database of allele-specific binding events listing more than half a million entries at nearly 270 thousand single-nucleotide polymorphisms for several hundred human transcription factors and cell types. These polymorphisms are enriched for associations with phenotypes of medical relevance and often overlap eQTLs, making them candidates for causality by linking variants with molecular mechanisms. Specifically, there is a special class of switching sites, where different transcription factors preferably bind alternative alleles, thus revealing allele-specific rewiring of molecular circuitry.


Author(s):  
Zhiying Zhang ◽  
Lifeng Ma ◽  
Xiaowei Fan ◽  
Kun Wang ◽  
Lijun Liu ◽  
...  

Abstract High-altitude polycythemia (HAPC) is characterized by excessive proliferation of erythrocytes resulting from hypobaric hypoxia at high altitude. The genetic variants and molecular mechanisms of HAPC in highlanders remain unclear. We recruited 141 Tibetan dwellers, including 70 HAPC patients and 71 healthy controls, to detect possible genetic variants associated with the disease. We performed targeted sequencing of 529 genes associated with oxygen metabolism and erythrocyte regulation, and used unconditional logistic regression analysis and GO (gene ontology) analysis to investigate the genetic variations of HAPC. We identified 12 single nucleotide variants, harbored in 12 genes, associated with the risk of HAPC (4.7 ≤ odds ratio ≤ 13.6; 7.6E-08 ≤ p-value ≤ 1E-04). Pathway enrichment analysis of these genes indicated that three pathways, the PI3K-AKT pathway, the JAK-STAT pathway, and the HIF-1 pathway, are essential, with p-values of 3.70E-08, 1.28E-07, and 3.98E-06, respectively. We are hopeful that our results will provide a reference for research into the etiology of HAPC. However, additional genetic risk factors and functional investigations are necessary to confirm our results further.
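
As a rough illustration of what an odds ratio in the reported range means, here is a minimal sketch that computes an odds ratio with a Woolf (log-scale) 95% confidence interval from a 2x2 carrier table. With a single binary predictor and no covariates, the odds ratio from unconditional logistic regression coincides with this cross-product ratio. The group sizes (70 cases, 71 controls) come from the abstract; the carrier counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Cross-product odds ratio with a Woolf log-scale confidence interval."""
    estimate = (case_carriers * ctrl_noncarriers) / (case_noncarriers * ctrl_carriers)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se = math.sqrt(1 / case_carriers + 1 / case_noncarriers
                   + 1 / ctrl_carriers + 1 / ctrl_noncarriers)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical carrier counts: 30 of 70 cases and 8 of 71 controls carry the variant.
estimate, lower, upper = odds_ratio_with_ci(30, 40, 8, 63)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A confidence interval that excludes 1 is what one would expect for a variant reported as associated with risk; with cohorts of this size the interval is typically wide.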


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Sergey Abramov ◽  
Alexandr Boytsov ◽  
Daria Bykova ◽  
Dmitry D. Penzar ◽  
Ivan Yevshin ◽  
...  

Abstract Sequence variants in gene regulatory regions alter gene expression and contribute to phenotypes of individual cells and the whole organism, including disease susceptibility and progression. Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Differential transcription factor binding in heterozygous genomic loci provides a natural source of information on such regulatory variants. We present a novel approach to call allele-specific transcription factor binding events at single-nucleotide variants in ChIP-Seq data, taking into account the joint contribution of aneuploidy and local copy number variation, which is estimated directly from variant calls. We have conducted a meta-analysis of more than 7 thousand ChIP-Seq experiments and assembled a database of allele-specific binding events listing more than half a million entries at nearly 270 thousand single-nucleotide polymorphisms for several hundred human transcription factors and cell types. These polymorphisms are enriched for associations with phenotypes of medical relevance and often overlap eQTLs, making them candidates for causality by linking variants with molecular mechanisms. Specifically, there is a special class of switching sites, where different transcription factors preferably bind alternative alleles, thus revealing allele-specific rewiring of molecular circuitry.


2018 ◽  
Author(s):  
Julie Feusier ◽  
W. Scott Watkins ◽  
Jainy Thomas ◽  
Andrew Farrell ◽  
David J. Witherspoon ◽  
...  

Abstract Germline mutation rates in humans have been estimated for a variety of mutation types, including single nucleotide and large structural variants. Here we directly measure the germline retrotransposition rate for the three active retrotransposon elements: L1, Alu, and SVA. We utilized three tools for calling Mobile Element Insertions (MEIs) (MELT, RUFUS, and TranSurVeyor) on blood-derived whole genome sequence (WGS) data from 603 CEPH individuals, comprising 33 three-generation pedigrees. We identified 27 de novo MEIs in 440 births. The retrotransposition rate estimate for Alu elements, one in 40 births, is roughly half the rate estimated using phylogenetic analyses, a difference in magnitude similar to that observed for single nucleotide variants. The L1 retrotransposition rate is one in 62 births and is within the range of previous estimates (1:20-1:200 births). The SVA retrotransposition rate, one in 55 births, is much higher than the previous estimate of one in 900 births. Our large, three-generation pedigrees allowed us to assess parent-of-origin effects and the timing of insertion events in either gametogenesis or early embryonic development. We find a statistically significant paternal bias in Alu retrotransposition. Our study represents the first in-depth analysis of the rate and dynamics of human retrotransposition from WGS data in three-generation human pedigrees.
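
The "one in N births" figures follow from dividing births by observed events. The sketch below applies that arithmetic to the pooled count given in the abstract (27 de novo MEIs in 440 births). The normal-approximation interval for a Poisson count is an assumption added here for illustration and is not necessarily the interval method used in the study.

```python
import math


def rate_per_birth(events, births, z=1.96):
    """Events per birth with a normal-approximation interval for a Poisson count."""
    rate = events / births
    half_width = z * math.sqrt(events) / births
    return rate, max(rate - half_width, 0.0), rate + half_width


# Pooled counts reported in the abstract: 27 de novo MEIs in 440 births.
rate, lower, upper = rate_per_birth(27, 440)
births_per_event = 1 / rate

print(f"rate = {rate:.4f} MEIs per birth (approx. 95% interval {lower:.4f} to {upper:.4f})")
print(f"roughly one de novo MEI in {births_per_event:.0f} births")
```

The same function can be applied to per-family counts (Alu, L1, SVA) once those counts are known; they are not stated in the abstract, so they are not filled in here.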


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Lianming Du ◽  
Tao Guo ◽  
Qin Liu ◽  
Jing Li ◽  
Xiuyue Zhang ◽  
...  

Abstract Macaques are the most widely used non-human primates in biomedical research. The genetic divergence between these animal models is responsible for their phenotypic differences in response to certain diseases. However, macaque single nucleotide polymorphism resources mainly focus on the rhesus macaque (Macaca mulatta), which hinders broad research on, and biomedical application of, other macaques. To overcome these limitations, we constructed a database named MACSNVdb that focuses on the interspecies genetic diversity among macaque genomes. MACSNVdb is a web-enabled database comprising ~74.51 million high-quality non-redundant single nucleotide variants (SNVs) identified among 20 macaque individuals from six species groups (mulatta, fascicularis, sinica, arctoides, silenus, sylvanus). In addition to individual SNVs, MACSNVdb also allows users to browse and retrieve groups of user-defined SNVs. In particular, users can retrieve non-synonymous SNVs that may have deleterious effects on protein structure or function within macaque orthologs of human disease and drug-target genes. Besides position, alleles and flanking sequences, MACSNVdb integrates additional genomic information, including SNV annotations and gene functional annotations. MACSNVdb will help biomedical researchers discover the molecular mechanisms of diverse responses to diseases and help primatologists perform population genetic studies. We will continue updating MACSNVdb with newly available sequencing data and annotation to keep the resource up to date. Database URL: http://big.cdu.edu.cn/macsnvdb/


2020 ◽  
Author(s):  
Lauri Törmä ◽  
Claire Burny ◽  
Christian Schlötterer

Abstract Sex biases in mutation rates may affect the rate of adaptive evolution. In many species, males have higher mutation rates than females when single nucleotide variants (SNVs) are considered. In contrast, indel mutations in humans and chimpanzees are female-biased. In Drosophila melanogaster, direct estimates of mutation rates did not uncover sex differences, but a recent analysis suggested the presence of male-biased SNV mutations. Here we study sex-specific mutation processes using mutation accumulation data from mismatch-repair deficient D. melanogaster. We find that sex differences in flies are similar to those observed in humans: a higher mutation rate for SNVs in males and a higher indel rate in females. These results have major implications for the study of neutral variation and adaptation in Drosophila.

