Unique genomic features and deeply-conserved functions of long non-coding RNAs in the Cancer LncRNA Census (CLC)

Mapping Intimacies ◽

10.1101/152769 ◽

2017 ◽

Cited By ~ 4

Author(s):

Joana Carlevaro-Fita ◽

Andrés Lanzós ◽

Lars Feuerbach ◽

Chen Hong ◽

David Mas-Ponte ◽

...

Keyword(s):

Cancer Progression ◽

Cancer Genomics ◽

De Novo ◽

Cancer Type ◽

Cancer Genes ◽

Driver Genes ◽

Protein Coding ◽

Evidence Type ◽

Non Coding Rnas ◽

Functional Screens

Abstract Long non-coding RNAs (lncRNAs) that drive tumorigenesis are a growing focus of cancer genomics studies. To facilitate further discovery, we have created the “Cancer LncRNA Census” (CLC), a manually-curated and strictly-defined compilation of lncRNAs with causative roles in cancer. CLC has two principal applications: first, as a resource for training and benchmarking de novo identification methods; and second, as a dataset for studying the fundamental properties of these genes. CLC Version 1 comprises 122 lncRNAs implicated in 29 distinct cancers. LncRNAs are included based on functional or genetic evidence for causative roles in cancer progression. All belong to the GENCODE reference annotation, to enable integration across projects and datasets. For each entry, the evidence type, biological activity (oncogene or tumour suppressor), source reference and cancer type are recorded. Supporting its usefulness, CLC genes are significantly enriched amongst de novo predicted driver genes from PCAWG. CLC genes are distinguished from other lncRNAs by a series of features consistent with biological function, including gene length, high expression and sequence conservation of both exons and promoters. We identify a trend for CLC genes to be co-localised with known protein-coding cancer genes along the human genome. Finally, by integrating data from transposon-mutagenesis functional screens, we show that mouse orthologues of CLC genes tend also to be cancer genes. Thus CLC represents a valuable resource for research into long non-coding RNAs in cancer. Their evolutionary and genomic properties have implications for understanding disease mechanisms and point to conserved functions across ~80 million years of evolution.
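
The enrichment statement in this abstract (curated genes over-represented among predicted driver genes) is the kind of claim commonly checked with a one-sided hypergeometric test. The sketch below assumes that test; the abstract does not state which test the authors used, and the function name `enrichment_p_value` and the toy counts are hypothetical, not taken from the paper.

```python
from math import comb


def enrichment_p_value(universe: int, annotated: int, predicted: int, overlap: int) -> float:
    """One-sided hypergeometric tail P(X >= overlap).

    universe  -- total number of genes considered (hypothetical parameter)
    annotated -- genes in the curated set (e.g. a census of cancer lncRNAs)
    predicted -- genes called as drivers by some de novo method
    overlap   -- genes present in both sets
    """
    max_overlap = min(annotated, predicted)
    total = comb(universe, predicted)
    # Sum the probability of every overlap at least as large as the one observed.
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, predicted - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy example with made-up counts: 10 genes, 3 curated, 3 predicted, all 3 shared.
print(enrichment_p_value(universe=10, annotated=3, predicted=3, overlap=3))  # 1/120
```

A small p-value here only says the overlap is larger than expected by chance under random sampling; it does not account for covariates such as gene length or expression.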

Quantifying gene selection in cancer through protein functional alteration bias

Nucleic Acids Research ◽

10.1093/nar/gkz546 ◽

2019 ◽

Vol 47 (13) ◽

pp. 6642-6655 ◽

Cited By ~ 7

Author(s):

Nadav Brandes ◽

Nathan Linial ◽

Michal Linial

Keyword(s):

Somatic Mutations ◽

Gene Selection ◽

De Novo ◽

Cancer Genes ◽

Driver Genes ◽

Protein Coding ◽

Protein Coding Genes ◽

Machine Learning Model ◽

Implicit And Explicit ◽

False Discoveries

Abstract Compiling the catalogue of genes actively involved in cancer is an ongoing endeavor, with profound implications for the understanding and treatment of the disease. An abundance of computational methods have been developed to screen the genome for candidate driver genes based on genomic data of somatic mutations in tumors. Existing methods make many implicit and explicit assumptions about the distribution of random mutations. We present FABRIC, a new framework for quantifying the selection of genes in cancer by assessing the effects of de-novo somatic mutations on protein-coding genes. Using a machine-learning model, we quantified the functional effects of ∼3M somatic mutations extracted from over 10 000 human cancerous samples, and compared them against the effects of all possible single-nucleotide mutations in the coding human genome. We detected 593 protein-coding genes showing statistically significant bias towards harmful mutations. These genes, discovered without any prior knowledge, show an overwhelming overlap with known cancer genes, but also include many overlooked genes. FABRIC is designed to avoid false discoveries by comparing each gene to its own background model using rigorous statistics, making minimal assumptions about the distribution of random somatic mutations. The framework is an open-source project with a simple command-line interface.
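
The idea of comparing a gene's observed mutations against that gene's own background can be illustrated with a simple z-score. This is a simplified sketch, not FABRIC's actual statistic or interface: it assumes each mutation already has a numeric effect score where higher means more harmful, and the names `harmful_bias_z` and `one_sided_p` are hypothetical.

```python
from math import erf, sqrt
from statistics import mean, pstdev


def harmful_bias_z(observed_scores, background_scores):
    """z-score of the observed mean effect against the gene's own background.

    observed_scores   -- effect scores of mutations seen in tumors (assumed input)
    background_scores -- effect scores of all possible single-nucleotide
                         mutations in the same gene (assumed input)
    """
    n = len(observed_scores)
    sigma = pstdev(background_scores)
    if n == 0 or sigma == 0:
        raise ValueError("need observed mutations and a non-constant background")
    mu = mean(background_scores)
    # Standard error of the mean of n draws from the background distribution.
    return (mean(observed_scores) - mu) / (sigma / sqrt(n))


def one_sided_p(z):
    """Upper-tail p-value under a standard normal approximation."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


# Toy scores (made up): observed mutations skew toward the harmful end.
z = harmful_bias_z([1.0, 1.0, 1.0, 1.0], [0.0, 1.0])
print(z, one_sided_p(z))
```

Because the background is built per gene, a gene whose possible mutations are mostly harmful is not flagged merely for carrying harmful mutations; only an excess over its own expectation counts.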

Moonlight: a tool for biological interpretation and driver genes discovery

10.1101/265322 ◽

2018 ◽

Cited By ~ 6

Author(s):

Antonio Colaprico ◽

Catharina Olsen ◽

Claudia Cava ◽

Thilde Terkelsen ◽

Tiago C. Silva ◽

...

Keyword(s):

Cancer Progression ◽

Regulatory Networks ◽

Enrichment Analysis ◽

Functional Enrichment ◽

Cancer Type ◽

Biological Processes ◽

Expression Data ◽

Cancer Genes ◽

Driver Genes ◽

Link Type

Abstract Cancer is a complex and heterogeneous disease. It is crucial to identify the key driver genes and their role in cancer mechanisms with attention to different cancer stages, types or subtypes. Cancer driver genes are elusive and their discovery is complicated by the fact that the same gene can play a diverse role in different contexts. Key biological processes, such as cell proliferation and cell death, have been linked to cancer progression. Thus, in principle, they can be exploited to classify the cancer genes and unveil their role. Here, we present a new method, Moonlight, that exploits expression data to classify cancer genes. Moonlight relies on the integration of functional enrichment analysis, gene regulatory networks and upstream regulator analysis from expression data to score the importance of biological cancer-related processes, taking into account either the inter- or intra-tumor heterogeneity. We then employed these scores to predict whether each gene acts as a tumor suppressor gene (TSG) or as an oncogene (OCG). Our methodology also allows the prediction of genes with a dual role, i.e. the moonlight genes (TSG in one cancer type or stage and OCG in another), as well as the elucidation of the underlying biological processes. Availability: https://bioconductor.org/packages/MoonlightR & https://github.com/ibsquare/MoonlightR/

Ranking cancer drivers via betweenness-based outlier detection and random walks

BMC Bioinformatics ◽

10.1186/s12859-021-03989-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Cesim Erten ◽

Aissa Houdjedj ◽

Hilal Kazan

Keyword(s):

Cancer Genomics ◽

Interaction Network ◽

Molecular Data ◽

Alternative Methods ◽

Patient Specific ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Protein Protein Interaction ◽

Genomic Studies

Abstract Background Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes. Results We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet utilizes a measure based on betweenness centrality on patient specific networks to identify the so-called outlier genes that correspond to dysregulated genes for each patient. Setting up the relationship between the mutated genes and the outliers through a bipartite graph, it employs a random-walk process on the graph, which provides the final prioritization of the mutated genes. We compare BetweenNet against state-of-the-art cancer gene prioritization methods on lung, breast, and pan-cancer datasets. Conclusions Our evaluations show that BetweenNet is better at recovering known cancer genes based on multiple reference databases. Additionally, we show that the GO terms and the reference pathways enriched in BetweenNet ranked genes and those that are enriched in known cancer genes overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods.
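
The random-walk step on a bipartite graph of mutated genes and outlier genes can be sketched as a random walk with restart. This is an illustration under stated assumptions, not BetweenNet's published formulation: the restart probability `alpha`, the choice to seed the walk at the outlier genes, and the function name `random_walk_with_restart` are all hypothetical.

```python
def random_walk_with_restart(edges, restart, alpha=0.5, iters=100):
    """Visiting probabilities of a random walk with restart.

    edges   -- iterable of (mutated_gene, outlier_gene) pairs, treated as undirected
    restart -- dict mapping seed nodes to non-negative weights; nodes that do not
               appear in any edge are ignored
    alpha   -- probability of jumping back to a seed at each step (assumed value)
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)
    total = sum(restart.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("restart must give positive weight to a node in the graph")
    seed = {n: restart.get(n, 0.0) / total for n in nodes}
    prob = dict(seed)
    for _ in range(iters):
        nxt = {n: alpha * seed[n] for n in nodes}
        for u in nodes:
            # Spread the non-restarting mass of u evenly over its neighbors.
            share = (1.0 - alpha) * prob[u] / len(neighbors[u])
            for v in neighbors[u]:
                nxt[v] += share
        prob = nxt
    return prob


# Toy graph (made up): M1 is linked to two outliers, M2 to one.
edges = [("M1", "O1"), ("M1", "O2"), ("M2", "O2")]
scores = random_walk_with_restart(edges, restart={"O1": 1.0, "O2": 1.0})
ranking = sorted(["M1", "M2"], key=scores.get, reverse=True)
print(ranking)  # ['M1', 'M2']
```

In this toy graph the mutated gene connected to more outliers accumulates more probability mass, which is the intuition behind using the walk to prioritize mutated genes.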

CALINCA—A Novel Pipeline for the Identification of lncRNAs in Podocyte Disease

Cells ◽

10.3390/cells10030692 ◽

2021 ◽

Vol 10 (3) ◽

pp. 692

Author(s):

Sweta Talyan ◽

Samantha Filipów ◽

Michael Ignarski ◽

Magdalena Smieszek ◽

He Chen ◽

...

Keyword(s):

Cell Biology ◽

Mammalian Cells ◽

De Novo ◽

Depth Information ◽

Gene Products ◽

Classical Analysis ◽

Protein Coding ◽

Bioinformatic Pipeline ◽

Non Coding Rnas ◽

Filtration Unit

Diseases of the renal filtration unit, the glomerulus, are the most common cause of chronic kidney disease. Podocytes are the pivotal cell type for the function of this filter, and focal-segmental glomerulosclerosis (FSGS) is a classic example of a podocytopathy leading to proteinuria and glomerular scarring. Currently, no targeted treatment of FSGS is available. This lack of therapeutic strategies is explained by a limited understanding of the defects in podocyte cell biology leading to FSGS. To date, most studies in the field have focused on protein-coding genes and their gene products. However, more than 80% of all transcripts produced by mammalian cells are actually non-coding. Long non-coding RNAs (lncRNAs) are a relatively novel class of transcripts and have not been systematically studied in FSGS to date. Appropriate tools to facilitate lncRNA research for the renal scientific community are urgently required, owing to a range of challenges not addressed by classical analysis pipelines optimized for coding RNA expression analysis. Here, we present the bioinformatic pipeline CALINCA as a solution to this problem. CALINCA automatically analyzes datasets from murine FSGS models and quantifies both annotated and de novo assembled lncRNAs. In addition, the tool provides in-depth information on podocyte specificity of these lncRNAs, as well as evolutionary conservation and expression in human datasets, making this pipeline a crucial basis for lncRNA studies in FSGS.

Discovering the drivers of clonal hematopoiesis

10.1101/2020.10.22.350140 ◽

2020 ◽

Cited By ~ 1

Author(s):

Oriol Pich ◽

Iker Reyes-Salazar ◽

Abel Gonzalez-Perez ◽

Nuria Lopez-Bigas

Keyword(s):

Positive Selection ◽

Molecular Mechanisms ◽

Somatic Mutations ◽

Cancer Genomics ◽

Variant Calling ◽

Selective Advantage ◽

Cancer Genes ◽

Driver Genes ◽

Hematopoietic Stem ◽

Clonal Hematopoiesis

Abstract Mutations in genes that confer a selective advantage to hematopoietic stem cells (HSCs) in certain conditions drive clonal hematopoiesis (CH). While some CH drivers have been identified experimentally or through epidemiological studies, the compendium of all genes able to drive CH upon mutations in HSCs is far from complete. We propose that identifying signals of positive selection in blood somatic mutations may be an effective way to identify CH driver genes, similar to the approach used to identify cancer genes. Using a reverse somatic variant calling approach, we repurposed whole-genome and whole-exome blood/tumor paired samples of more than 12,000 donors from two large cancer genomics cohorts to identify blood somatic mutations. The application of IntOGen, a robust driver discovery pipeline, to blood somatic mutations across both cohorts, and more than 24,000 targeted sequenced samples yielded a list of close to 70 genes with signals of positive selection in CH, available at http://www.intogen.org/ch. This approach recovers all known CH genes, and discovers novel candidates. Generating this compendium is an essential step to understand the molecular mechanisms of CH and to accurately detect individuals with CH to ascertain their risk of developing related diseases.

Long noncoding RNAs: fine-tuners hidden in the cancer signaling network

Cell Death Discovery ◽

10.1038/s41420-021-00678-8 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Shanshan Zhao ◽

Xue Zhang ◽

Shuo Chen ◽

Song Zhang

Keyword(s):

Signaling Pathways ◽

Cancer Progression ◽

Biological Activities ◽

Noncoding Rnas ◽

Fine Tuning ◽

Multiple Perspectives ◽

Sequencing Technology ◽

Protein Coding ◽

Abnormal Signal ◽

Non Coding Rnas

Abstract With the development of sequencing technology, a large number of long non-coding RNAs (lncRNAs) have been identified in addition to coding genes. LncRNAs, originally considered as junk RNA, are dysregulated in various types of cancer. Although protein-coding signaling pathways underlie various biological activities, and abnormal signal transduction is a key trigger and indicator for tumorigenesis and cancer progression, lncRNAs are sparking keen interest due to their versatile roles in fine-tuning signaling pathways. We are just beginning to scratch the surface of lncRNAs. Therefore, despite the fact that lncRNAs drive malignant phenotypes from multiple perspectives, in this review, we focus on important signaling pathways modulated by lncRNAs in cancer to demonstrate an up-to-date understanding of this emerging field.

Mutational analysis of driver genes with tumor suppressive and oncogenic roles in gastric cancer

PeerJ ◽

10.7717/peerj.3585 ◽

2017 ◽

Vol 5 ◽

pp. e3585 ◽

Cited By ~ 2

Author(s):

Tianfang Wang ◽

Yining Liu ◽

Min Zhao

Keyword(s):

Gastric Cancer ◽

Cancer Progression ◽

Complex Disease ◽

Mutational Analysis ◽

Driver Mutations ◽

Driver Genes ◽

Protein Coding ◽

Mirna Genes ◽

Genetic Mechanisms ◽

New Treatment

Gastric cancer (GC) is a complex disease with heterogeneous genetic mechanisms. Genomic mutational profiling of gastric cancer not only expands our knowledge about cancer progression at a fundamental genetic level, but also could provide guidance on new treatment decisions, currently based on tumor histology. The fact that precision medicine-based treatment is successful in a subset of tumors indicates the need for better identification of clinically related molecular tumor phenotypes, especially with regard to those driver mutations on tumor suppressor genes (TSGs) and oncogenes (ONGs). We surveyed 313 TSGs and 160 ONGs associated with 48 protein coding and 19 miRNA genes with both TSG and ONG roles. Using public cancer mutational profiles, we confirmed the dual roles of CDKN1A and CDKN1B. In addition to the widely recognized alterations, we identified another 82 frequently mutated genes in a public gastric cancer cohort. In summary, these driver mutation profiles of individual GC will form the basis of personalized treatment of gastric cancer, leading to substantial therapeutic improvements.

Determining the impact of putative loss-of-function variants in protein-coding genes

10.1101/106468 ◽

2017 ◽

Author(s):

Suganthi Balasubramanian ◽

Yao Fu ◽

Mayur Pawashe ◽

Patrick McGillivray ◽

Mike Jin ◽

...

Keyword(s):

De Novo ◽

Mendelian Disease ◽

Loss Of Function ◽

Homozygous State ◽

Driver Genes ◽

Protein Coding ◽

Disease Gene Discovery ◽

Human Genes ◽

Using Data ◽

The Impact

Abstract Variants predicted to result in the loss of function (LoF) of human genes have attracted interest because of their clinical impact and surprising prevalence in healthy individuals. Here, we present ALoFT (Annotation of Loss-of-Function Transcripts), a method to annotate and predict the disease-causing potential of LoF variants. Using data from Mendelian disease-gene discovery projects, we show that ALoFT can distinguish between LoF variants deleterious as heterozygotes and those causing disease only in the homozygous state. Investigation of variants discovered in healthy populations suggests that each individual carries at least two heterozygous premature stop alleles that could potentially lead to disease if present as homozygotes. When applied to de novo pLoF variants in autism-affected families, ALoFT distinguishes between deleterious variants in patients and benign variants in unaffected siblings. Finally, analysis of somatic variants in > 6,500 cancer exomes shows that pLoF variants predicted to be deleterious by ALoFT are enriched in known driver genes.

Mutational interactions define novel cancer subgroups

10.1101/187260 ◽

2017 ◽

Author(s):

Jack Kuipers ◽

Thomas Thurnherr ◽

Giusi Moffa ◽

Polina Suter ◽

Jonas Behr ◽

...

Keyword(s):

Bayesian Network ◽

Cancer Progression ◽

Large Scale ◽

De Novo ◽

Drug Repositioning ◽

Tissue Type ◽

Survival Prediction ◽

Cancer Type ◽

Cancer Data ◽

Cancer Types

Large-scale genomic data can help to uncover the complexity and diversity of the molecular changes that drive cancer progression. Statistical analysis of cancer data from different tissues of origin highlights differences and similarities which can guide drug repositioning as well as the design of targeted and precise treatments. Here, we developed an improved Bayesian network model for tumour mutational profiles and applied it to 8,198 patient samples across 22 cancer types from TCGA. For each cancer type, we identified the interactions between mutated genes, capturing signatures beyond mere mutational frequencies. When comparing mutation networks, we found genes which interact both within and across cancer types. To detach cancer classification from the tissue type we performed de novo clustering of the pancancer mutational profiles based on the Bayesian network models. We found 22 novel clusters which significantly improved survival prediction beyond clinical and histopathological information. The models highlight key gene interactions for each cluster that can be used for genomic stratification in clinical trials and for identifying drug targets within strata.

Discovering novel driver mutations from pan-cancer analysis of mutational and gene expression profiles

PLoS ONE ◽

10.1371/journal.pone.0242780 ◽

2020 ◽

Vol 15 (11) ◽

pp. e0242780

Author(s):

Houriiyah Tegally ◽

Kevin H. Kensler ◽

Zahra Mungloo-Dilmohamud ◽

Anisah W. Ghoorah ◽

Timothy R. Rebbeck ◽

...

Keyword(s):

Gene Expression ◽

Large Scale ◽

Cancer Genomics ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Driver Mutations ◽

Literature Mining ◽

Cancer Genes ◽

Multiple Cancer ◽

Driver Genes

As the genomic profile across cancers varies from person to person, patient prognosis and treatment may differ based on the mutational signature of each tumour. Thus, it is critical to understand genomic drivers of cancer and identify potential mutational commonalities across tumors originating at diverse anatomical sites. Large-scale cancer genomics initiatives, such as TCGA, ICGC and GENIE have enabled the analysis of thousands of tumour genomes. Our goal was to identify new cancer-causing mutations that may be common across tumour sites using mutational and gene expression profiles. Genomic and transcriptomic data from breast, ovarian, and prostate cancers were aggregated and analysed using differential gene expression methods to identify the effect of specific mutations on the expression of multiple genes. Mutated genes associated with the most differentially expressed genes were considered to be novel candidates for driver mutations, and were validated through literature mining, pathway analysis and clinical data investigation. Our driver selection method successfully identified 116 probable novel cancer-causing genes, with 4 discovered in patients having no alterations in any known driver genes: MXRA5, OBSCN, RYR1, and TG. The candidate genes previously not officially classified as cancer-causing showed enrichment in cancer pathways and in cancer diseases. They also matched expectations pertaining to properties of cancer genes, for instance, showing larger gene and protein lengths, and having mutation patterns suggesting oncogenic or tumor suppressor properties. Our approach allows for the identification of novel putative driver genes that are common across cancer sites using an unbiased approach without any a priori knowledge on pathways or gene interactions and is therefore an agnostic approach to the identification of putative common driver genes acting at multiple cancer sites.
