A novel method to quantify gene set functional association based on gene ontology

Sali Lv; Yan Li; Qianghu Wang; Shangwei Ning; Teng Huang; Peng Wang; Jie Sun; Yan Zheng; Weisha Liu; Jing Ai; Xia Li

doi:10.1098/rsif.2011.0551

Journal of The Royal Society Interface ◽

10.1098/rsif.2011.0551 ◽

2011 ◽

Vol 9 (70) ◽

pp. 1063-1072 ◽

Cited By ~ 22

Author(s):

Sali Lv ◽

Yan Li ◽

Qianghu Wang ◽

Shangwei Ning ◽

Teng Huang ◽

...

Keyword(s):

Gene Ontology ◽

Genetic Basis ◽

Protein Complexes ◽

Sequence Similarity ◽

Functional Association ◽

Future Studies ◽

Gene Set ◽

Complex Disorders ◽

Gene Sets ◽

Novel Method

Numerous gene sets have been used as molecular signatures for exploring the genetic basis of complex disorders. These gene sets are distinct but related to each other in many cases; therefore, efforts have been made to compare gene sets for studies such as those evaluating the reproducibility of different experiments. Comparison in terms of biological function has been demonstrated to be helpful to biologists. We improved the measurement of semantic similarity to quantify the functional association between gene sets in the context of gene ontology and developed a web toolkit named Gene Set Functional Similarity (GSFS; http://bioinfo.hrbmu.edu.cn/GSFS ). Validation based on protein complexes for which the functional associations are known demonstrated that the GSFS scores tend to be correlated with sequence similarity scores and that complexes with high GSFS scores tend to be involved in the same functional catalogue. Compared with the pairwise method and the annotation method, the GSFS shows better discrimination and more accurately reflects the known functional catalogues shared between complexes. Case studies comparing differentially expressed genes of prostate tumour samples from different microarray platforms and identifying coronary heart disease susceptibility pathways revealed that the method could contribute to future studies exploring the molecular basis of complex disorders.
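As a rough illustration of what comparing two gene sets through their GO annotations can look like, the sketch below scores gene sets with a best-match average over term-level similarities. It is not the GSFS measure, which the abstract does not specify. The toy ontology `PARENTS`, the annotations `ANNOTATIONS`, and all function names are hypothetical, and term similarity is approximated here by the Jaccard index of ancestor sets.

```python
# Toy sketch: gene set functional similarity via best-match average (BMA).
# NOT the GSFS algorithm; ontology, annotations, and names are hypothetical.

# Hypothetical mini-ontology: term -> set of direct parents.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "signaling": {"root"},
    "lipid_metabolism": {"metabolism"},
    "glucose_metabolism": {"metabolism"},
    "kinase_signaling": {"signaling"},
}

# Hypothetical gene -> annotated terms.
ANNOTATIONS = {
    "geneA": {"lipid_metabolism"},
    "geneB": {"glucose_metabolism"},
    "geneC": {"kinase_signaling"},
    "geneD": {"lipid_metabolism", "kinase_signaling"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_similarity(t1, t2):
    """Jaccard index of the two terms' ancestor sets (an assumed measure)."""
    a1, a2 = ancestors(t1), ancestors(t2)
    return len(a1 & a2) / len(a1 | a2)


def terms_of(gene_set):
    """Union of the terms annotated to any gene in the set."""
    terms = set()
    for gene in gene_set:
        terms |= ANNOTATIONS.get(gene, set())
    return terms


def gene_set_similarity(set1, set2):
    """Best-match average of term similarities between two gene sets."""
    t1, t2 = terms_of(set1), terms_of(set2)
    if not t1 or not t2:
        return 0.0
    best1 = [max(term_similarity(a, b) for b in t2) for a in t1]
    best2 = [max(term_similarity(b, a) for a in t1) for b in t2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))
```

In this toy setup, two sets annotated to sibling terms score higher than two sets whose terms share only the root, which is the kind of graded comparison a plain gene-overlap count cannot give.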

Novel Ultra-Rare Exonic Variants Identified in a Founder Population Implicate Cadherins in Schizophrenia

10.1101/2020.05.29.20115352 ◽

2020 ◽

Author(s):

Todd Lencz ◽

Jin Yu ◽

Raiyan Rashid Khan ◽

Shai Carmi ◽

Max Lam ◽

...

Keyword(s):

Rare Variant ◽

Rare Variants ◽

Gene List ◽

Ashkenazi Jewish ◽

Founder Population ◽

Total N ◽

Gene Set ◽

Complex Disorders ◽

Gene Sets ◽

Common Genetic Variants

Importance: Schizophrenia is a serious mental illness with high heritability. While common genetic variants account for a portion of the heritability, identification of rare variants associated with the disorder has proven challenging. Objective: To identify genes and gene sets associated with schizophrenia in a founder population (Ashkenazi Jewish), and to determine the relative power of this population for rare variant discovery. Design, Setting, and Participants: Data on exonic variants were extracted from whole genome sequences drawn from 786 patients with schizophrenia and 463 healthy control subjects, all drawn from the Ashkenazi Jewish population. Variants observed in two large publicly available datasets (total n≈153,000, excluding neuropsychiatric patients) were filtered out, and novel ultra-rare variants (URVs) were compared in cases and controls. Main Outcomes and Measures: The number of novel URVs and genes carrying them were compared across cases and controls. Genes in which only cases or only controls carried novel, functional URVs were examined using gene set analyses. Results: Cases had a higher frequency of novel missense or loss of function (MisLoF) variants compared to controls, as well as a greater number of genes impacted by MisLoF variants. Characterizing 141 “case-only” genes (in which ≥ 3 AJ cases in our dataset had MisLoF URVs with none found in our AJ controls), we replicated prior findings of both enrichment for synaptic gene sets, as well as specific genes such as SETD1A and TRIO. Additionally, we identified cadherins as a novel gene set associated with schizophrenia including a recurrent mutation in PCDHA3. Several genes associated with autism and other neurodevelopmental disorders including CACNA1E, ASXL3, SETBP1, and WDFY3, were also identified in our case-only gene list, as was TSC2, which is linked to tuberous sclerosis. Modeling the effects of purifying selection demonstrated that deleterious rare variants are greatly over-represented in a founder population with a tight bottleneck and rapidly expanding census, resulting in enhanced power for rare variant association studies. Conclusions and Relevance: Identification of cell adhesion genes in the cadherin/protocadherin family is consistent with evidence from large-scale GWAS in schizophrenia, helps specify the synaptic abnormalities that may be central to the disorder, and suggests novel potential treatment strategies (e.g., inhibition of protein kinase C). Study of founder populations may serve as a cost-effective way to rapidly increase gene discovery in schizophrenia and other complex disorders.

SimExact – An Efficient Method to Compute Function Similarity Between Proteins Using Gene Ontology

Current Bioinformatics ◽

10.2174/1574893614666191017092842 ◽

2020 ◽

Vol 15 (4) ◽

pp. 318-327

Author(s):

Najmul Ikram ◽

Muhammad Abdul Qadir ◽

Muhammad Tanvir Afzal

Keyword(s):

Gene Ontology ◽

High Speed ◽

Sequence Similarity ◽

Query Protein ◽

Online Tool ◽

Compute Function ◽

Novel Method ◽

Function Similarity ◽

Functional Prototype ◽

Ranked List

Background: The rapidly growing protein and annotation databases necessitate the development of efficient tools to process this valuable information. Biologists frequently need to find proteins similar to a given protein, for which BLAST tools are commonly used. With the development of biomedical ontologies, e.g. Gene Ontology, methods were designed to measure function (semantic) similarity between two proteins. These methods work well on protein pairs, but are not suitable for protein query processing. Objective: Our aim is to facilitate searching of similar proteins in an acceptable time. Methods: A novel method SimExact for high speed searching of functionally similar proteins has been proposed. Results: The experiments of this study show that SimExact gives correct results required for protein searching. A fully functional prototype of an online tool (www.datafurnish.com/protsem.php) has been provided that generates a ranked list of the proteins similar to a query protein, with a response time of less than 20 seconds in our setup. SimExact was used to search for protein pairs having high disparity between function similarity and sequence similarity. Conclusion: SimExact makes such searches practical, which would not be possible in a reasonable time otherwise.

GRep: Gene Set Representation via Gaussian Embedding

10.1101/519033 ◽

2019 ◽

Cited By ~ 1

Author(s):

Sheng Wang ◽

Emily Flynn ◽

Russ B. Altman

Keyword(s):

Gaussian Distribution ◽

Protein Complexes ◽

Dimensional Space ◽

Single Point ◽

Building Blocks ◽

Multivariate Gaussian Distribution ◽

Gene Set ◽

Gene Sets ◽

Multivariate Gaussian ◽

Low Dimensional

Abstract: Molecular interaction networks are our basis for understanding functional interdependencies among genes. Network embedding approaches analyze these complicated networks by representing genes as low-dimensional vectors based on the network topology. These low-dimensional vectors have recently become the building blocks for a larger number of systems biology applications. Despite the success of embedding genes in this way, it remains unclear how to effectively represent gene sets, such as protein complexes and signaling pathways. The direct adaptation of existing gene embedding approaches to gene sets cannot model the diverse functions of genes in a set. Here, we propose GRep, a novel gene set embedding approach, which represents each gene set as a multivariate Gaussian distribution rather than a single point in the low-dimensional space. The diversity of genes in a set, or the uncertainty of their contribution to a particular function, is modeled by the covariance matrix of the multivariate Gaussian distribution. By doing so, GRep produces a highly informative and compact gene set representation. Using our representation, we analyze two major pharmacogenomics studies and observe substantial improvement in drug target identification from expression-derived gene sets. Overall, the GRep framework provides a novel representation of gene sets that can be used as input features to off-the-shelf machine learning classifiers for gene set analysis.
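To make the idea of a Gaussian gene set representation concrete, here is a minimal sketch that summarizes a gene set by the empirical mean and per-dimension variance of its genes' embedding vectors, then compares two sets with a KL divergence. This is an assumption-laden simplification: GRep learns its distributions with its own training procedure, which the abstract does not describe, and uses a covariance matrix, whereas this sketch uses a diagonal covariance for brevity. The `EMBEDDINGS` values and all function names are hypothetical.

```python
import math

# Hypothetical 2-D gene embeddings (in practice these come from a network
# embedding method; the values here are made up for illustration).
EMBEDDINGS = {
    "g1": (1.0, 1.0), "g2": (1.1, 0.9), "g3": (0.9, 1.1),
    "g4": (-2.0, 3.0), "g5": (2.0, -1.0), "g6": (0.5, 4.0),
}


def fit_diagonal_gaussian(vectors, eps=1e-6):
    """Return (mean, variance) per dimension; eps keeps variances positive."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    var = [sum((v[i] - mean[i]) ** 2 for v in vectors) / n + eps
           for i in range(dim)]
    return mean, var


def embed_gene_set(genes):
    """Represent a gene set as a diagonal Gaussian over its gene vectors."""
    return fit_diagonal_gaussian([EMBEDDINGS[g] for g in genes])


def kl_divergence(p, q):
    """KL(p || q) for two diagonal Gaussians given as (mean, variance)."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


tight_complex = embed_gene_set(["g1", "g2", "g3"])     # functionally coherent
diverse_pathway = embed_gene_set(["g4", "g5", "g6"])   # functionally diverse
```

The point of the distributional view is visible even in this toy: the diverse set gets a much larger variance than the tight one, information that a single mean vector would discard.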

Association between a Prognostic Gene Signature and Functional Gene Sets

Bioinformatics and Biology Insights ◽

10.4137/bbi.s1018 ◽

2008 ◽

Vol 2 ◽

pp. BBI.S1018 ◽

Cited By ~ 3

Author(s):

Manuela Hummel ◽

Klaus H. Metzeler ◽

Christian Buske ◽

Stefan K. Bohlander ◽

Ulrich Mansmann

Keyword(s):

Gene Ontology ◽

Risk Score ◽

Gene Signature ◽

Functional Gene ◽

Gene Set Enrichment ◽

Functional Interpretation ◽

Data Set ◽

Gene Set ◽

Signature Genes ◽

Gene Sets

Background: The development of expression-based gene signatures for predicting prognosis or class membership is a popular and challenging task. Besides their stringent validation, signatures need a functional interpretation and must be placed in a biological context. Popular tools such as Gene Set Enrichment have drawbacks because they are restricted to annotated genes and are unable to capture the information hidden in the signature's non-annotated genes. Methodology: We propose concepts to relate a signature with functional gene sets like pathways or Gene Ontology categories. The connection between single signature genes and a specific pathway is explored by hierarchical variable selection and gene association networks. The risk score derived from an individual patient's signature is related to expression patterns of pathways and Gene Ontology categories. Global tests are useful for these tasks, and they adjust for other factors. GlobalAncova is used to explore the effect on gene expression in specific functional groups from the interaction of the score and selected mutations in the patient's genome. Results: We apply the proposed methods to an expression data set and a corresponding gene signature for predicting survival in Acute Myeloid Leukemia (AML). The example demonstrates strong relations between the signature and cancer-related pathways. The signature-based risk score was found to be associated with development-related biological processes. Conclusions: Many authors interpret the functional aspects of a gene signature by linking signature genes to pathways or relevant functional gene groups. The method of gene set enrichment is preferred to annotating signature genes to specific Gene Ontology categories. The strategies proposed in this paper go beyond the restriction of annotation and deepen the insights into the biological mechanisms reflected in the information given by a signature.

Utilizing Cancer - Functional Gene Set - Compound Networks to Identify Putative Drugs for Breast Cancer

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1574888x13666180105125347 ◽

2018 ◽

Vol 21 (2) ◽

pp. 74-83

Author(s):

Tzu-Hung Hsiao ◽

Yu-Chiao Chiu ◽

Yu-Heng Chen ◽

Yu-Ching Hsu ◽

Hung-I Harry Chen ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Cancer Therapy ◽

Cancer Treatment ◽

Cancer Survival ◽

Expression Profiles ◽

Functional Gene ◽

Gene Set Enrichment Analysis ◽

Gene Set ◽

Gene Sets

Aim and Objective: The number of anticancer drugs available currently is limited, and some of them have low treatment response rates. Moreover, developing a new drug for cancer therapy is labor intensive and sometimes cost prohibitive. Therefore, “repositioning” of known cancer treatment compounds can speed up the development time and potentially increase the response rate of cancer therapy. This study proposes a systems biology method for identifying new compound candidates for cancer treatment in two separate procedures. Materials and Methods: First, a “gene set–compound” network was constructed by conducting gene set enrichment analysis on the expression profile of responses to a compound. Second, survival analyses were applied to gene expression profiles derived from four breast cancer patient cohorts to identify gene sets that are associated with cancer survival. A “cancer–functional gene set–compound” network was constructed, and candidate anticancer compounds were identified. Through the use of breast cancer as an example, 162 breast cancer survival-associated gene sets and 172 putative compounds were obtained. Results: We demonstrated how to utilize the clinical relevance of previous studies through gene sets and then connect it to candidate compounds by using gene expression data from the Connectivity Map. Specifically, we chose a gene set derived from a stem cell study to demonstrate its association with breast cancer prognosis and discussed six new compounds that can increase the expression of the gene set after the treatment. Conclusion: Our method can effectively identify compounds with a potential to be “repositioned” for cancer treatment according to their active mechanisms and their association with patients’ survival time.

Gene Set Correlation Analysis and Visualization Using Gene Expression Data

Current Bioinformatics ◽

10.2174/1574893615999200629124444 ◽

2020 ◽

Vol 15 ◽

Author(s):

Chen-An Tsai ◽

James J. Chen

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Gene Expression Data ◽

Differentially Expressed Gene ◽

Differentially Expressed ◽

Superior Performance ◽

Expression Data ◽

Gene Set ◽

Gene Sets ◽

Set Correlation

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identify differentially expressed gene sets with prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on identification of differentially expressed gene sets in a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly co-related pathways. Methods: We apply co-inertia analysis to comparisons across gene sets in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods. Result and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations between and within gene sets and their interaction and network. We then demonstrate integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets. Ultimately, the GSCoA approach provides an attractive tool for identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
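The squared-covariance criterion described in the abstract above can be illustrated with a small sketch. Under the simplifying assumption of uniform sample and gene weights, the total co-inertia of two gene sets measured on the same samples is the sum of squared cross-covariances between their genes, and normalizing it gives the RV coefficient. This is not the GSCoA implementation: it does not extract the successive axes (that would need a singular value decomposition of the cross-covariance matrix), and the function names and expression values are hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean (rows = samples, columns = genes)."""
    n = len(matrix)
    means = [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in matrix]


def cross_covariance(x, y):
    """Covariances between every gene (column) of x and every gene of y."""
    n = len(x)
    xc, yc = center_columns(x), center_columns(y)
    return [[sum(xc[i][a] * yc[i][b] for i in range(n)) / n
             for b in range(len(yc[0]))]
            for a in range(len(xc[0]))]


def total_co_inertia(x, y):
    """Sum of squared cross-covariances (uniform weights assumed)."""
    return sum(c * c for row in cross_covariance(x, y) for c in row)


def rv_coefficient(x, y):
    """Total co-inertia scaled to [0, 1]; 1 means identical co-structure."""
    return total_co_inertia(x, y) / math.sqrt(
        total_co_inertia(x, x) * total_co_inertia(y, y))


# Hypothetical expression values: 4 samples, gene sets of 2, 2, and 1 genes.
set_x = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
set_y = [[2.0, 4.0], [4.0, 2.0], [6.0, 8.0], [8.0, 6.0]]  # set_x scaled by 2
set_z = [[1.0], [-1.0], [-1.0], [1.0]]                    # unrelated pattern
```

Here `rv_coefficient(set_x, set_y)` evaluates to 1 because the two sets vary identically across samples, while `set_z` has zero covariance with every gene in `set_x` and therefore zero co-inertia.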

Gene Sets of Gene Ontology are More Stable Diagnostic Biomarkers than Genes in Oral Squamous Cell Carcinoma

Current Bioinformatics ◽

10.2174/1574893611308050009 ◽

2013 ◽

Vol 8 (5) ◽

pp. 577-582 ◽

Cited By ~ 1

Author(s):

Tao Huang ◽

Wei Wu ◽

Honglai Jin ◽

Yu-Dong Cai

Keyword(s):

Squamous Cell Carcinoma ◽

Gene Ontology ◽

Oral Squamous Cell Carcinoma ◽

Cell Carcinoma ◽

Squamous Cell ◽

Diagnostic Biomarkers ◽

Gene Sets

Establishing a consensus for the hallmarks of cancer based on gene ontology and pathway annotations

BMC Bioinformatics ◽

10.1186/s12859-021-04105-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yi Chen ◽

Fons J. Verbeek ◽

Katherine Wolstencroft

Keyword(s):

Gene Ontology ◽

Enrichment Analysis ◽

Biological Data ◽

Hallmarks Of Cancer ◽

High Throughput Analysis ◽

Knowledge Resources ◽

Gene Set ◽

Cancer Hallmarks ◽

Starting Point ◽

High Level

Background: The hallmarks of cancer provide a highly cited and well-used conceptual framework for describing the processes involved in cancer cell development and tumourigenesis. However, methods for translating these high-level concepts into data-level associations between hallmarks and genes (for high throughput analysis) vary widely between studies. The examination of different strategies to associate and map cancer hallmarks reveals significant differences, but also consensus. Results: Here we present the results of a comparative analysis of cancer hallmark mapping strategies, based on Gene Ontology and biological pathway annotation, from different studies. By analysing the semantic similarity between annotations, and the resulting gene set overlap, we identify emerging consensus knowledge. In addition, we analyse the differences between hallmark and gene set associations using Weighted Gene Co-expression Network Analysis and enrichment analysis. Conclusions: Reaching a community-wide consensus on how to identify cancer hallmark activity from research data would enable more systematic data integration and comparison between studies. These results highlight the current state of the consensus and offer a starting point for further convergence. In addition, we show how a lack of consensus can lead to large differences in the biological interpretation of downstream analyses and discuss the challenges of annotating changing and accumulating biological data, using intermediate knowledge resources that are also changing over time.
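The gene set overlap mentioned in the abstract above can be quantified in several ways; one common choice, assumed here rather than taken from the paper, is the Jaccard index. The sketch compares hypothetical hallmark-to-gene mappings from different studies pairwise. The study names, gene identifiers, and function names are placeholders.

```python
def jaccard(a, b):
    """Jaccard index of two gene collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def overlap_matrix(mappings):
    """Pairwise Jaccard overlap between every pair of named gene sets."""
    names = sorted(mappings)
    return {(m, n): jaccard(mappings[m], mappings[n])
            for m in names for n in names}


# Hypothetical mappings of one hallmark to genes in three studies.
hallmark_mappings = {
    "study_1": {"GENE1", "GENE2", "GENE3"},
    "study_2": {"GENE2", "GENE3", "GENE4"},
    "study_3": {"GENE5"},
}

overlaps = overlap_matrix(hallmark_mappings)
```

In this toy input, `study_1` and `study_2` share two of four distinct genes (overlap 0.5), while `study_3` shares none with either, the kind of disagreement between mapping strategies that the abstract describes.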

Higher Acid-Base Imbalance Associated with Respiratory Failure Could Decrease the Survival of Patients with Scrub Typhus during Intensive Care Unit Stay: A Gene Set Enrichment Analysis

Journal of Clinical Medicine ◽

10.3390/jcm8101580 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1580 ◽

Cited By ~ 1

Author(s):

Kyoung Min Moon ◽

Kyueng-Whan Min ◽

Mi-Hye Kim ◽

Dong-Hoon Kim ◽

Byoung Kwan Son ◽

...

Keyword(s):

Intensive Care Unit ◽

Intensive Care ◽

Respiratory Failure ◽

Scrub Typhus ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Acid Base ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% can suffer serious complications, such as acute respiratory failure (ARF) and admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict aggressive behaviors of SC. We evaluated 91 patients with SC and 81 non-SC patients who were admitted to the ICU, and 32 cases from the public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC. We performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated to gene sets reflecting inflammation/apoptotic response and airway inflammation. ABI can be used to indicate ARF in patients with SC and assist with early detection.

Mitochondrial Syndromes Revisited

Journal of Clinical Medicine ◽

10.3390/jcm10061249 ◽

2021 ◽

Vol 10 (6) ◽

pp. 1249

Author(s):

Daniele Orsucci ◽

Elena Caldarazzo Ienco ◽

Andrea Rossi ◽

Gabriele Siciliano ◽

Michelangelo Mancuso

Keyword(s):

Clinical Trials ◽

Genetic Basis ◽

Phenotypic Variability ◽

Mitochondrial Diseases ◽

Genetic Alterations ◽

Mitochondrial Disorders ◽

Single Mutation ◽

Multicenter Studies ◽

Future Studies ◽

Clinical Syndromes

In the last ten years, the knowledge of the genetic basis of mitochondrial diseases has significantly advanced. However, the vast phenotypic variability linked to mitochondrial disorders and the peculiar characteristics of their genetics make mitochondrial disorders a complex group of disorders. Although specific genetic alterations have been associated with some syndromic presentations, the genotype–phenotype relationship in mitochondrial disorders is complex (a single mutation can cause several clinical syndromes, while different genetic alterations can cause similar phenotypes). This review will revisit the most common syndromic pictures of mitochondrial disorders, from a clinical rather than a molecular perspective. We believe that the new phenotype definitions implemented by recent large multicenter studies, and revised here, may contribute to a more homogeneous patient categorization, which will be useful in future studies on natural history and clinical trials.
