Estimation of allele-specific fitness effects across human protein-coding sequences and implications for disease

Mapping Intimacies ◽

10.1101/441337 ◽

2018 ◽

Cited By ~ 2

Author(s):

Yi-Fei Huang ◽

Adam Siepel

Keyword(s):

Single Nucleotide Variants ◽

High Coverage ◽

Single Nucleotide ◽

Protein Coding ◽

Human Genomics ◽

Coding Sequences ◽

Genomic Features ◽

Fitness Effects ◽

Machine Learning Model ◽

Allele Specific

Abstract A central challenge in human genomics is to understand the cellular, evolutionary, and clinical significance of genetic variants. Here we introduce a unified population-genetic and machine-learning model, called Linear Allele-Specific Selection InferencE (LASSIE), for estimating the fitness effects of all potential single-nucleotide variants, based on polymorphism data and predictive genomic features. We applied LASSIE to 51 high-coverage genome sequences annotated with 33 genomic features, and constructed a map of allele-specific selection coefficients across all protein-coding sequences in the human genome. We show that this map is informative about both human evolution and disease.
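
The abstract describes estimating fitness effects for every potential single-nucleotide variant in protein-coding sequence. Whatever model is then fitted, the candidate set itself is simple to enumerate: each coding position admits three alternative alleles. The sketch below lists them for an in-frame coding sequence (CDS) and labels each by its effect on the encoded amino acid under the standard genetic code. It is an illustration only; `potential_snvs` is a hypothetical helper, and LASSIE's inference of selection coefficients is not shown.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def potential_snvs(cds):
    """Yield (0-based position, ref, alt, consequence) for every possible SNV in an in-frame CDS.

    Hypothetical helper for illustration; it only classifies variants, it does not score fitness.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    for pos, ref in enumerate(cds):
        offset = pos % 3
        codon = cds[pos - offset:pos - offset + 3]
        ref_aa = CODON_TABLE[codon]
        for alt in "ACGT":
            if alt == ref:
                continue
            alt_aa = CODON_TABLE[codon[:offset] + alt + codon[offset + 1:]]
            if alt_aa == ref_aa:
                consequence = "synonymous"
            elif alt_aa == "*":
                consequence = "nonsense"
            elif ref_aa == "*":
                consequence = "stop_lost"
            else:
                consequence = "missense"
            yield pos, ref, alt, consequence


# Every position has three alternative alleles, so a CDS of length L has 3L potential SNVs.
snvs = list(potential_snvs("ATGTGG"))  # Met-Trp
print(len(snvs))  # 18
print(sum(1 for *_, kind in snvs if kind == "nonsense"))  # 2 (TGG -> TAG, TGA)
```

Enumerating all 3L alleles, rather than only observed polymorphisms, is what makes an allele-specific map "complete": every possible substitution gets a row, even those never seen in a sample.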


Tumour mutations in long noncoding RNAs that enhance cell fitness

10.1101/2021.11.06.467555 ◽

2021 ◽

Author(s):

Roberta Esposito ◽

Andres Lanzos ◽

Taisia Polidori ◽

Hugo Guillen-Ramirez ◽

Bernard Merlin ◽

...

Keyword(s):

Noncoding RNAs ◽

Long Noncoding RNAs ◽

Driver Mutations ◽

Cancer Genes ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Protein Coding ◽

Genomic Features ◽

Coding Regions ◽

lncRNA NEAT1

Tumour DNA contains thousands of single nucleotide variants (SNVs) in non-protein-coding regions, yet it remains unclear which of them are driver mutations that promote cell fitness. Amongst the most highly mutated non-coding elements are long noncoding RNAs (lncRNAs), which can promote cancer and may be targeted therapeutically. Here we searched for evidence that driver mutations may act by altering lncRNA function. Using an integrative driver discovery algorithm, we analysed SNVs from 2583 primary tumours and 3527 metastases and identified 54 candidate driver lncRNAs (FDR < 0.1). Their relevance is supported by enrichment for previously reported cancer genes and by clinical and genomic features. Using knockdown and transgene overexpression, we show that tumour SNVs in two novel lncRNAs can boost cell fitness. Researchers have noted particularly high, yet unexplained, mutation rates in the iconic cancer lncRNA NEAT1. We apply in cellulo mutagenesis by CRISPR-Cas9 to identify vulnerable regions of NEAT1 where SNVs reproducibly increase cell fitness in both transformed and normal backgrounds. In particular, mutations in the 5′ region of NEAT1 alter ribonucleoprotein assembly and boost the population of subnuclear paraspeckles. Together, this work reveals function-altering somatic lncRNA mutations as a new route to enhanced cell fitness during transformation and metastasis.
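
The candidate drivers in this abstract are reported at FDR < 0.1, but the abstract does not say how the false discovery rate was controlled. For readers unfamiliar with the idea, the Benjamini-Hochberg step-up procedure is one standard way to select hypotheses at a target FDR. The sketch below assumes per-lncRNA P-values are already available; it is not the authors' driver-discovery algorithm, and `benjamini_hochberg` is a hypothetical helper name.

```python
def benjamini_hochberg(pvalues, fdr=0.1):
    """Return sorted indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P-value
    k = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose P-value is under its threshold,
        # even if some smaller ranks fail their own thresholds.
        if pvalues[i] <= rank * fdr / m:
            k = rank
    return sorted(order[:k])


print(benjamini_hochberg([0.001, 0.02, 0.04, 0.3, 0.8]))  # [0, 1, 2]
print(benjamini_hochberg([0.01, 0.07, 0.075]))  # [0, 1, 2]: 0.07 fails rank 2 but 0.075 passes rank 3
```

The second call shows why the procedure is called "step-up": a hypothesis that misses its own threshold is still rejected when a larger-ranked P-value passes.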


Estimation of allele-specific fitness effects across human protein-coding sequences and implications for disease

Genome Research ◽

10.1101/gr.245522.118 ◽

2019 ◽

Vol 29 (8) ◽

pp. 1310-1321 ◽

Cited By ~ 9

Author(s):

Yi-Fei Huang ◽

Adam Siepel

Keyword(s):

Human Protein ◽

Protein Coding ◽

Coding Sequences ◽

Fitness Effects ◽

Allele Specific


AsCRISPR: a web server for allele-specific sgRNA design in precision medicine

10.1101/672634 ◽

2019 ◽

Author(s):

Guihu Zhao ◽

Jinchen Li ◽

Yu Tang

Keyword(s):

Nucleotide Polymorphisms ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Guide RNA ◽

Inherited Diseases ◽

Bioinformatic Tools ◽

Point Of Entry ◽

Allele Specific ◽

sgRNA Design ◽

Specific Restriction

Abstract Allele-specific genomic targeting by CRISPR provides a point of entry for personalized gene therapy of dominantly inherited diseases, by selectively disrupting mutant alleles or disease-causing single nucleotide polymorphisms (SNPs), ideally while leaving normal alleles intact. Moreover, allele-specific engineering has increasingly been exploited not only to treat inherited diseases and mutation-driven cancers, but also in other important fields such as genomic imprinting, haploinsufficiency, imaging of genomic loci and immunocompatible manipulations. Despite the broad utility of allele-specific targeting by CRISPR, very few bioinformatic tools have been implemented for this purpose. We therefore developed AsCRISPR (Allele-specific CRISPR), a web tool to aid the design of guide RNA (gRNA) sequences that can discriminate between alleles. It enables users with limited bioinformatics skills to analyze both their own identified variants and heterozygous SNPs deposited in the dbSNP database. Users can choose among multiple CRISPR nucleases and their engineered variants, including the newly developed Cas12b and CasX. AsCRISPR also evaluates the on-target efficiency, specificity and potential off-targets of gRNA candidates, and displays allele-specific restriction enzyme sites that might be disrupted upon successful genome editing. In addition, we used AsCRISPR to analyze dominant single nucleotide variants (SNVs) retrieved from the ClinVar and OMIM databases, and generated a Dominant Database of candidate discriminating gRNAs that may specifically target the alternative allele at each dominant SNV site. A Validated Database was also established, containing manually curated discriminating gRNAs that have been experimentally validated in the growing literature. AsCRISPR is freely available at http://www.genemed.tech/ascrispr.
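
One common way a guide RNA discriminates between alleles is for the variant to fall in the protospacer-adjacent motif (PAM), so that only one allele is cleavable, or in the PAM-proximal seed region, where a mismatch strongly reduces cleavage. The sketch below scans the forward strand for SpCas9-style guides (20-nt spacer followed by an NGG PAM) that match the alternative allele and overlap the variant in the seed or in the PAM's GG. The nuclease, the 10-nt seed length and the helper name `discriminating_guides` are assumptions for illustration; AsCRISPR supports several nucleases and also scores efficiency and off-targets, none of which is shown here.

```python
import re

PAM_PATTERN = re.compile("[ACGT]GG")  # SpCas9 NGG PAM (assumed nuclease)


def discriminating_guides(ref_seq, alt_seq, var_pos, spacer_len=20, seed_len=10):
    """Forward-strand guides matching alt_seq whose seed or PAM GG overlaps the SNV at var_pos (0-based).

    Returns (start, spacer, pam, reason) tuples. Hypothetical helper for illustration.
    """
    if len(ref_seq) != len(alt_seq) or ref_seq[var_pos] == alt_seq[var_pos]:
        raise ValueError("expected two equal-length alleles differing at var_pos")
    hits = []
    for start in range(len(alt_seq) - spacer_len - 3 + 1):
        spacer = alt_seq[start:start + spacer_len]
        pam = alt_seq[start + spacer_len:start + spacer_len + 3]
        if not PAM_PATTERN.fullmatch(pam):
            continue
        offset = var_pos - start
        if spacer_len - seed_len <= offset < spacer_len:
            reason = "seed"  # the mismatch against the other allele sits close to the PAM
        elif offset in (spacer_len + 1, spacer_len + 2):
            reason = "PAM"  # the other allele lacks the GG, so it should not be cut
        else:
            continue
        hits.append((start, spacer, pam, reason))
    return hits


alt = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
ref = alt[:19] + "A" + alt[20:]  # SNV at position 19, the PAM-proximal spacer base
print(discriminating_guides(ref, alt, 19))
# [(0, 'ACGTACGTACGTACGTACGT', 'TGG', 'seed')]
```

A full design tool would also scan the reverse strand and rank guides; the point here is only the positional rule that makes a guide allele-specific.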


Interpreting Deep Neural Networks Beyond Attribution Methods: Quantifying Global Importance of Genomic Features

10.1101/2020.02.19.956896 ◽

2020 ◽

Cited By ~ 1

Author(s):

Peter K. Koo ◽

Matt Ploenzke

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Population Level ◽

Computational Genomics ◽

Great Success ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Genomic Features ◽

High Performing ◽

Importance Analysis

Abstract Despite the great success of deep neural networks (DNNs) at improving performance on various prediction tasks in computational genomics, it remains difficult to understand why they make any given prediction. In genomics, the main approaches to interpreting a high-performing DNN are to visualize learned representations via weight visualizations and attribution methods. While these methods can be informative, each has strong limitations. For instance, attribution methods only uncover the independent contribution of single nucleotide variants in a given sequence. Here we discuss and argue for global importance analysis, which can quantify the population-level importance of putative features and their interactions learned by a DNN. We highlight recent work that has benefited from this interpretability approach and then discuss connections between global importance analysis and causality.
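
Global importance analysis, as described in this abstract, asks how a model's prediction changes on average when a putative feature is embedded across a population of background sequences, rather than attributing one sequence's prediction to its individual nucleotides. A minimal Monte Carlo sketch of that estimate, E[f(x with motif)] - E[f(x)], follows. The uniform random backgrounds, the fixed embedding position and `toy_model` (a hypothetical stand-in for a trained DNN that simply counts GATA motifs) are illustrative assumptions.

```python
import random


def global_importance(model, motif, position, n_samples=1000, length=50, seed=0):
    """Monte Carlo estimate of E[f(x with motif embedded)] - E[f(x)] over random backgrounds.

    Uniform i.i.d. backgrounds are an assumption; other background distributions are possible.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        background = "".join(rng.choice("ACGT") for _ in range(length))
        embedded = background[:position] + motif + background[position + len(motif):]
        total += model(embedded) - model(background)
    return total / n_samples


def toy_model(seq):
    """Hypothetical stand-in for a trained DNN: its 'prediction' is the number of GATA motifs."""
    return float(seq.count("GATA"))


print(round(global_importance(toy_model, "GATA", position=10), 2))  # close to 1
print(round(global_importance(toy_model, "CCCC", position=10), 2))  # close to 0
```

Embedding two motifs at once and comparing against the sum of their single-motif effects extends the same estimate to the feature interactions the abstract mentions.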


dbMTS: a comprehensive database of putative human microRNA target site SNVs and their functional predictions

10.1101/554485 ◽

2019 ◽

Cited By ~ 2

Author(s):

Chang Li ◽

Michael D. Swartz ◽

Bing Yu ◽

Yongsheng Bai ◽

Xiaoming Liu

Keyword(s):

Target Site ◽

Genetic Mutations ◽

Messenger RNAs ◽

MicroRNA Target ◽

Functional Importance ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Protein Coding ◽

Functional Annotations ◽

Non-coding RNAs

Abstract microRNAs (miRNAs) are short non-coding RNAs that can repress the expression of protein-coding messenger RNAs (mRNAs) by binding to the 3′UTR of the target. Genetic mutations such as single nucleotide variants (SNVs) in the 3′UTR of mRNAs can disrupt this regulatory effect. In this study, we present dbMTS, a database of miRNA target site (MTS) SNVs, which includes all potential MTS SNVs in the 3′UTRs of the human genome along with hundreds of functional annotations. This database can help researchers readily identify putative SNVs that affect miRNA targeting and prioritize them by functional importance. dbMTS is freely available at: https://sites.google.com/site/jpopgen/dbNSFP.
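
A common first-pass test of whether a 3′UTR SNV affects miRNA targeting is whether it creates or destroys a canonical seed match. The sketch below checks for a 7mer-m8 site (the reverse complement of miRNA nucleotides 2-8) in the reference and alternative UTR sequences, using human let-7a-5p as the example miRNA. The helper names are hypothetical, and dbMTS combines many predictors and annotations rather than this single rule.

```python
# Complement of RNA bases; used to build the target-site sequence from the miRNA seed.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """7mer-m8 target site (DNA alphabet): reverse complement of miRNA nucleotides 2-8."""
    seed = mirna.upper()[1:8]
    return seed.translate(RNA_COMPLEMENT)[::-1].replace("U", "T")


def site_starts(seq, site):
    """0-based start positions of every exact occurrence of site in seq."""
    return {i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site}


def classify_utr_snv(utr_ref, pos, alt, mirna):
    """Report whether an SNV at 0-based pos removes or creates a 7mer-m8 seed site. Hypothetical helper."""
    site = seed_site(mirna)
    utr_alt = utr_ref[:pos] + alt + utr_ref[pos + 1:]
    ref_sites, alt_sites = site_starts(utr_ref, site), site_starts(utr_alt, site)
    if ref_sites - alt_sites:
        return "site_loss"
    if alt_sites - ref_sites:
        return "site_gain"
    return "no_change"


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"  # human let-7a-5p
print(seed_site(LET7A))  # CTACCTC
print(classify_utr_snv("AAAACTACCTCAAAA", 6, "G", LET7A))  # site_loss
```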


GESS: a database of global evaluation of SARS-CoV-2/hCoV-19 sequences

Nucleic Acids Research ◽

10.1093/nar/gkaa808 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D706-D714 ◽

Cited By ~ 2

Author(s):

Shuyi Fang ◽

Kailing Li ◽

Jikui Shen ◽

Sheng Liu ◽

Juli Liu ◽

...

Keyword(s):

Genomic Region ◽

Comprehensive Analysis ◽

Mutation Rates ◽

Global Evaluation ◽

Single Nucleotide Variants ◽

High Coverage ◽

Single Nucleotide ◽

Genomic Variations ◽

Area Of Interest ◽

The World

Abstract The COVID-19 outbreak has been a global emergency since December 2019. Analysis of SARS-CoV-2 sequences can uncover single nucleotide variants (SNVs) and the corresponding evolution patterns. The Global Evaluation of SARS-CoV-2/hCoV-19 Sequences (GESS, https://wan-bioinfo.shinyapps.io/GESS/) is a resource that provides comprehensive analysis results based on tens of thousands of high-coverage, high-quality SARS-CoV-2 complete genomes. The database allows users to browse, search and download SNVs at any individual or multiple SARS-CoV-2 genomic positions, within a chosen genomic region or protein, or in a certain country/area of interest. GESS reveals the geographical distribution of SNVs around the world and across the states of the USA, and exhibits time-dependent patterns of SNV occurrence that reflect the evolution of SARS-CoV-2 genomes. For each month, the top 100 SNVs first identified worldwide can be retrieved. GESS also explores SNVs that co-occur with specific SNVs of the user's interest. Furthermore, the database can help calibrate mutation rates and identify conserved genome regions. Taken together, GESS is a powerful resource and tool for monitoring SARS-CoV-2 migration and evolution according to featured genomic variations. It provides potential guidance for prevalence prediction, related public health policy making, and vaccine design.
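
SNV tables like those GESS exposes start from comparing each genome with a reference. The sketch below assumes genomes already aligned to the reference without gaps (real pipelines first align to a reference such as Wuhan-Hu-1), records every position where a genome carries a different unambiguous base, and tallies how many genomes carry each SNV; `Counter.most_common` then gives a ranked list. The helper names and the toy sequences are hypothetical.

```python
from collections import Counter

UNAMBIGUOUS = set("ACGT")


def call_snvs(reference, genome):
    """(1-based position, ref, alt) wherever an aligned genome carries a different unambiguous base."""
    if len(reference) != len(genome):
        raise ValueError("genome must be aligned to the reference with equal length")
    return [
        (i + 1, r, a)
        for i, (r, a) in enumerate(zip(reference, genome))
        if r != a and r in UNAMBIGUOUS and a in UNAMBIGUOUS  # skip N and other ambiguity codes
    ]


def snv_counts(reference, genomes):
    """Number of genomes carrying each SNV."""
    counts = Counter()
    for genome in genomes:
        counts.update(call_snvs(reference, genome))
    return counts


reference = "ACGTACGT"
genomes = ["ACGTACGT", "ACTTACGT", "ACTTACGA", "NCTTACGT"]
print(snv_counts(reference, genomes).most_common(2))
# [((3, 'G', 'T'), 3), ((8, 'T', 'A'), 1)]
```

Grouping genomes by collection month or country before counting would give per-period or per-region rankings of the kind the abstract describes.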


Landscape of allele-specific transcription factor binding in the human genome

10.1101/2020.10.07.327643 ◽

2020 ◽

Cited By ~ 1

Author(s):

Sergey Abramov ◽

Alexandr Boytsov ◽

Dariia Bykova ◽

Dmitry D. Penzar ◽

Ivan Yevshin ◽

...

Keyword(s):

Transcription Factor ◽

Transcription Factors ◽

Molecular Mechanisms ◽

Specific Binding ◽

Transcription Factor Binding ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Specific Transcription Factor ◽

Factor Binding ◽

Allele Specific

Abstract Sequence variants in gene regulatory regions alter gene expression and contribute to phenotypes of individual cells and the whole organism, including disease susceptibility and progression. Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Differential transcription factor binding at heterozygous genomic loci provides a natural source of information on such regulatory variants. We present a novel approach to calling allele-specific transcription factor binding events at single-nucleotide variants in ChIP-Seq data, taking into account the joint contribution of aneuploidy and local copy number variation, which is estimated directly from variant calls. We conducted a meta-analysis of more than 7,000 ChIP-Seq experiments and assembled a database of allele-specific binding events listing more than half a million entries at nearly 270,000 single-nucleotide polymorphisms for several hundred human transcription factors and cell types. These polymorphisms are enriched for associations with medically relevant phenotypes and often overlap eQTLs, making them candidate causal variants by linking them with molecular mechanisms. In particular, there is a special class of switching sites, where different transcription factors preferentially bind alternative alleles, revealing allele-specific rewiring of molecular circuitry.
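
At a heterozygous site, allele-specific binding is typically detected as a deviation of ChIP-Seq read counts from the allele fraction expected without any binding preference. The sketch below uses an exact binomial tail and, as a simplified stand-in for the copy-number correction described in the abstract, shifts the expected reference fraction by an assumed ratio `copy_ratio` between the major and minor allele copy numbers, taking the more conservative phasing. This is not the authors' statistical model; the function names and the `copy_ratio` parameter are illustrative assumptions.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def ref_bias_pvalue(ref_reads, alt_reads, copy_ratio=1.0):
    """One-sided P-value for excess reference reads at a heterozygous SNV.

    copy_ratio (hypothetical parameter) is major/minor allele copy number at the locus;
    1.0 means a balanced diploid site. Because we do not know which allele is on the
    major copy, we assume the reference is, which gives the more conservative P-value.
    """
    n = ref_reads + alt_reads
    expected_ref_fraction = copy_ratio / (1.0 + copy_ratio)
    return binomial_upper_tail(ref_reads, n, expected_ref_fraction)


print(ref_bias_pvalue(10, 0))  # 0.0009765625 (= 0.5 ** 10)
print(ref_bias_pvalue(10, 0, copy_ratio=2.0) > ref_bias_pvalue(10, 0))  # True
```

The second line illustrates why ignoring aneuploidy inflates significance: the same 10-to-0 read split is far less surprising when the reference allele may be present in two copies.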


ProTECT – Prediction of T-cell Epitopes for Cancer Therapy

10.1101/696526 ◽

2019 ◽

Author(s):

Arjun A. Rao ◽

Ada A. Madejska ◽

Jacob Pfeil ◽

Benedict Paten ◽

Sofie R. Salama ◽

...

Keyword(s):

Somatic Mutations ◽

Cytotoxic T Cells ◽

T Cell Epitopes ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Binding Prediction ◽

Local Cluster ◽

Single Nucleotide ◽

Protein Coding ◽

Cell Therapies

Abstract Somatic mutations in cancers affecting protein coding genes can give rise to potentially therapeutic neoepitopes. These neoepitopes can guide Adoptive Cell Therapies (ACTs) and Peptide Vaccines (PVs) to selectively target tumor cells using autologous patient cytotoxic T-cells. Currently, researchers have to independently align their data, call somatic mutations and haplotype the patient’s HLA to use existing neoepitope prediction tools. We present ProTECT, a fully automated, reproducible, scalable, and efficient end-to-end analysis pipeline to identify and rank therapeutically relevant tumor neoepitopes in terms of immunogenicity starting directly from raw patient sequencing data, or from pre-processed data. The ProTECT pipeline encompasses alignment, HLA haplotyping, mutation calling (single nucleotide variants, short insertions and deletions, and gene fusions), peptide:MHC (pMHC) binding prediction, and ranking of final candidates. We demonstrate ProTECT on 326 samples from the TCGA Prostate Adenocarcinoma cohort, and compare it with published tools. ProTECT can be run on a standalone computer, a local cluster, or on a compute cloud using a Mesos backend. ProTECT is highly scalable and can process TCGA data in under 30 minutes per sample when run in large batches. ProTECT is freely available at https://www.github.com/BD2KGenomics/protect.
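
Before peptide:MHC binding can be predicted, each missense mutation has to be turned into candidate peptides. The sketch below lists every k-mer of the mutant protein that contains the substituted residue; 9-mers are a typical MHC class I length. It is a generic illustration, not ProTECT's implementation, and `mutant_peptides` is a hypothetical name.

```python
def mutant_peptides(protein, pos, alt_aa, k=9):
    """All k-mers of the mutant protein that include the substituted residue at 0-based pos.

    Hypothetical helper; a real pipeline would also handle indels, fusions and HLA binding.
    """
    if protein[pos] == alt_aa:
        raise ValueError("alternate residue is identical to the reference residue")
    mutant = protein[:pos] + alt_aa + protein[pos + 1:]
    first = max(0, pos - k + 1)          # leftmost window that still covers pos
    last = min(pos, len(mutant) - k)     # rightmost window that fits in the protein
    return [mutant[s:s + k] for s in range(first, last + 1)]


peptides = mutant_peptides("ACDEFGHIKLMNPQRSTVWY", pos=10, alt_aa="W")
print(len(peptides))  # 9
print(peptides[0])    # DEFGHIKLW
```

Each peptide would then be scored against the patient's HLA alleles, which is the step where the pipeline's binding predictors come in.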


Distinct genetic spectrums and evolution patterns of SARS-CoV-2

10.1101/2020.06.16.20132902 ◽

2020 ◽

Cited By ~ 5

Author(s):

Sheng Liu ◽

Jikui Shen ◽

Lei Yang ◽

Chang-Deng Hu ◽

Jun Wan

Keyword(s):

Complete Genome ◽

Clustering Method ◽

Single Nucleotide Variants ◽

Genome Sequences ◽

Nucleotide Substitutions ◽

High Quality ◽

High Coverage ◽

Single Nucleotide ◽

Multiple Groups ◽

Over Time

Abstract Four signature groups of single-nucleotide variants (SNVs) were identified using a two-way clustering method in about twenty thousand high-quality, high-coverage SARS-CoV-2 complete genome sequences. Some frequently occurring SNVs predominate but appear mutually exclusively in patients from different countries and areas. These major SNV signatures exhibited distinct evolution patterns over time. Although such cases were rare, our data indicated possible cross-infections, with multiple groups of SNVs existing simultaneously in some patients, suggesting infection by different SARS-CoV-2 clades or potential recombination of SARS-CoV-2 sequences. Interestingly, nucleotide substitutions among SARS-CoV-2 genomes tend to occur at sites where the bat coronavirus RaTG13 sequence differs from the Wuhan-Hu-1 genome, indicating tolerance of mutations at those sites or suggesting that major viral strains might exist between Wuhan-Hu-1 and RaTG13.
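
Finding groups of SNVs that tend to co-occur, as with the signature groups in this abstract, can be illustrated by comparing which genomes carry each SNV. The sketch below joins SNVs whose carrier sets have a Jaccard similarity at or above a threshold (single-linkage grouping via union-find). This is a simplified substitute for the two-way clustering method the authors used, and the threshold, SNV labels and carrier sets in the usage example are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of genome IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_snvs(presence, threshold=0.8):
    """presence maps SNV label -> set of genome IDs carrying it; returns single-linkage groups."""
    snvs = sorted(presence)
    parent = {s: s for s in snvs}

    def find(s):
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for i, a in enumerate(snvs):
        for b in snvs[i + 1:]:
            if jaccard(presence[a], presence[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for s in snvs:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


presence = {"s1": {1, 2, 3}, "s2": {1, 2, 3}, "s3": {4, 5}, "s4": {4, 5, 6}, "s5": {7}}
print(group_snvs(presence, threshold=0.6))  # [['s1', 's2'], ['s3', 's4'], ['s5']]
```

A genome whose SNVs fall into more than one resulting group would be a candidate for the mixed infections or recombination the abstract mentions.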


Predicting the impact of single nucleotide variants on splicing via sequence‐based deep neural networks and genomic features

Human Mutation ◽

10.1002/humu.23794 ◽

2019 ◽

Vol 40 (9) ◽

pp. 1261-1269 ◽

Cited By ~ 1

Author(s):

Tatsuhiko Naito

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Genomic Features ◽

The Impact
