MeSH-informed enrichment analysis and MeSH-guided semantic similarity among functional terms and gene products in chicken

Mapping Intimacies

10.1101/034975

2015

Cited By ~ 1

Author(s):

Gota Morota

Timothy M. Beissinger

Francisco Peñagaricano

Keyword(s):

Semantic Similarity

Complex Traits

Hierarchical Structures

Enrichment Analysis

Biological Knowledge

Gene Products

Medical Subject Headings

Public Repositories

Alternative Choice

Gene Similarity

Abstract: Biomedical vocabularies and ontologies aid in recapitulating biological knowledge. The annotation of gene products has been accelerated mainly by Gene Ontology (GO) and, more recently, by Medical Subject Headings (MeSH). Here we report a suite of MeSH packages for chicken in Bioconductor and illustrate some features of different MeSH-based analyses, including MeSH-informed enrichment analysis and MeSH-guided semantic similarity among terms and gene products, using two lists of chicken genes available in public repositories. The two published datasets represent (i) differentially expressed genes and (ii) candidate genes under selective sweep or epistatic selection. The comparison of MeSH with GO overrepresentation analyses suggested not only that MeSH supports the findings obtained from GO analysis but also that MeSH can further enrich the representation of biological knowledge and often provides more interpretable results. Based on the hierarchical structures of MeSH and GO, we computed semantic similarities among vocabularies as well as among selected genes. The former yielded similarity levels between significant functional terms, and the latter, based on the annotation of each gene, yielded measures of gene similarity. Our findings show the benefits of using MeSH as an alternative source of annotation for drawing biological inferences from a list of genes of interest. We argue that the use of MeSH in conjunction with GO will be instrumental in facilitating the understanding of the genetic basis of complex traits.
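Overrepresentation analysis of this kind is commonly framed as a one-sided hypergeometric test: given a universe of annotated genes, how surprising is the overlap between the gene list and the genes carrying a given MeSH or GO term? The Python sketch below assumes that framing. The abstract does not state the exact test used, and the function names, gene IDs, and term labels are hypothetical placeholders, not the interface of the authors' Bioconductor packages.

```python
from math import comb


def hypergeom_upper_tail(k: int, M: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M genes in universe, K annotated, n drawn)."""
    top = min(K, n)
    numerator = sum(comb(K, i) * comb(M - K, n - i) for i in range(k, top + 1))
    return numerator / comb(M, n)


def overrepresentation(gene_list, universe, term_to_genes):
    """Return {term: (overlap, p_value)} for every annotation term.

    Only genes present in the universe are counted, both in the
    gene list and in each term's annotation set.
    """
    universe = set(universe)
    hits = set(gene_list) & universe
    M, n = len(universe), len(hits)
    results = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & universe
        overlap = len(hits & annotated)
        results[term] = (overlap, hypergeom_upper_tail(overlap, M, len(annotated), n))
    return results


if __name__ == "__main__":
    # Hypothetical toy data: 10 genes, two annotation terms.
    universe = [f"g{i}" for i in range(1, 11)]
    terms = {
        "term_A": ["g1", "g2", "g3"],
        "term_B": ["g4", "g5", "g6", "g7", "g8", "g9"],
    }
    result = overrepresentation(["g1", "g2", "g3", "g4"], universe, terms)
    for term, (overlap, p) in result.items():
        print(term, overlap, round(p, 4))
```

With this toy data, term_A (all three of its genes are in the list) gets p = 7/210 ≈ 0.033, while term_B gets p ≈ 0.995. In a real analysis the p-values across all terms would then be adjusted for multiple testing.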

MeSH-Informed Enrichment Analysis and MeSH-Guided Semantic Similarity Among Functional Terms and Gene Products in Chicken

G3: Genes|Genomes|Genetics

10.1534/g3.116.031096

2016

Vol 6 (8)

pp. 2447-2453

Cited By ~ 6

Author(s):

Gota Morota

Timothy M. Beissinger

Francisco Peñagaricano

Keyword(s):

Semantic Similarity

Enrichment Analysis

Gene Products

Influence of the GO-based semantic similarity measures in multi-objective gene clustering algorithm performance

Journal of Bioinformatics and Computational Biology

10.1142/s0219720020500389

2020

Vol 18 (06)

pp. 2050038

Author(s):

Jorge Parraga-Alava

Mario Inostroza-Ponta

Keyword(s):

Semantic Similarity

Clustering Algorithm

Performance Metrics

Expression Patterns

Biological Significance

Similarity Measures

Gene Clustering

Biological Knowledge

Multi-Objective

Gene Similarity

Using prior biological knowledge of relationships and gene functions, drawn from repositories such as the Gene Ontology (GO), to measure gene similarity has shown good results in multi-objective gene clustering algorithms. In this setting, to obtain useful clustering results, it would be helpful to know which measure of biological similarity between genes should be used to yield meaningful clusters that have both similar expression patterns (co-expression) and biological homogeneity. In this paper, we studied the influence of the four most widely used GO-based semantic similarity measures on the performance of a multi-objective gene clustering algorithm. We used four publicly available datasets and carried out comparative studies based on multi-objective optimization performance metrics and clustering performance indexes. In most cases, the Jiang–Conrath and Wang similarities stand out in terms of multi-objective metrics. In terms of clustering properties, Resnik similarity achieves the best values of compactness and separation, and therefore of co-expression of groups of genes. Meanwhile, in terms of biological homogeneity, the Wang similarity yields a greater number of significant GO terms. However, statistical, visual, and biological significance tests showed that none of the GO-based semantic similarity measures stands out above the rest in significantly improving the performance of the multi-objective gene clustering algorithm.
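Three of the measures compared above can be illustrated on a toy ontology. The sketch below assumes information content IC(t) = -ln p(t), where p(t) is the fraction of annotations made to t or any of its descendants; Resnik similarity as the IC of the most informative common ancestor (MICA); Jiang–Conrath as the distance IC(a) + IC(b) - 2·IC(MICA) (it is often converted to a similarity, and published variants differ); and Wang's method with a single edge weight of 0.8, as if every edge were an is_a relation. The DAG, annotation counts, and term names are hypothetical.

```python
import math

# Hypothetical toy DAG: child -> list of parents (all edges treated as is_a).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
    "E": ["C"],
}
# Hypothetical number of gene annotations made directly to each term.
DIRECT_ANNOTATIONS = {"root": 0, "A": 2, "B": 3, "C": 1, "D": 2, "E": 2}


def ancestors(term):
    """All ancestors of a term, including the term itself."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC = -ln(p), p = annotations to the term or its descendants / all annotations."""
    total = sum(DIRECT_ANNOTATIONS.values())
    freq = sum(n for t, n in DIRECT_ANNOTATIONS.items() if term in ancestors(t))
    return -math.log(freq / total)


def resnik(a, b):
    """IC of the most informative common ancestor (MICA)."""
    return max(information_content(t) for t in ancestors(a) & ancestors(b))


def jiang_conrath_distance(a, b):
    """Jiang–Conrath distance; smaller means more similar."""
    return information_content(a) + information_content(b) - 2 * resnik(a, b)


def wang_s_values(term, weight=0.8):
    """Semantic contribution S(t) of each ancestor t to `term` (Wang's method)."""
    s = {}

    def visit(t, value):
        if value <= s.get(t, 0.0):
            return  # an equal or stronger path already reached t
        s[t] = value
        for parent in PARENTS[t]:
            visit(parent, weight * value)

    visit(term, 1.0)
    return s


def wang(a, b):
    sa, sb = wang_s_values(a), wang_s_values(b)
    shared = sa.keys() & sb.keys()
    return sum(sa[t] + sb[t] for t in shared) / (sum(sa.values()) + sum(sb.values()))


if __name__ == "__main__":
    for name, fn in [("Resnik", resnik), ("Jiang-Conrath dist.", jiang_conrath_distance), ("Wang", wang)]:
        print(name, round(fn("E", "D"), 4))
```

On this toy graph, E and D share only A and the root, so Resnik(E, D) = IC(A) = -ln 0.7 ≈ 0.357. Note that Resnik depends only on the shared ancestor, whereas Jiang–Conrath also penalizes how specific each term is on its own.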

Correlating Information Contents of Gene Ontology Terms to Infer Semantic Similarity of Gene Products

Computational and Mathematical Methods in Medicine

10.1155/2014/891842

2014

Vol 2014

pp. 1-9

Cited By ~ 4

Author(s):

Mingxin Gan

Keyword(s):

Gene Ontology

Gene Product

Correlation Coefficient

Semantic Similarity

Biological Process

Jaccard Index

Biological Knowledge

Gene Products

Functional Relationships

Information Contents

Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson’s correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
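The vector representation described above can be sketched directly. The snippet assumes a fixed term vocabulary (terms a gene lacks get 0) and hypothetical IC values. For the Jaccard index on real-valued vectors it assumes the weighted form, sum of element-wise minima over sum of element-wise maxima; the abstract does not specify which Jaccard variant was used, so treat that choice as an assumption.

```python
import math


def ic_vector(annotated_terms, vocabulary, ic):
    """Gene product as a vector of IC values over a fixed term vocabulary."""
    return [ic[t] if t in annotated_terms else 0.0 for t in vocabulary]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def weighted_jaccard(x, y):
    """Sum of element-wise minima over sum of element-wise maxima (assumed variant)."""
    den = sum(max(a, b) for a, b in zip(x, y))
    return sum(min(a, b) for a, b in zip(x, y)) / den if den else 0.0


if __name__ == "__main__":
    vocabulary = ["t1", "t2", "t3", "t4"]              # hypothetical GO BP terms
    ic = {"t1": 1.0, "t2": 2.0, "t3": 3.0, "t4": 0.5}  # hypothetical IC values
    g1 = ic_vector({"t1", "t2"}, vocabulary, ic)
    g2 = ic_vector({"t2", "t3"}, vocabulary, ic)
    print(cosine(g1, g2), pearson(g1, g2), weighted_jaccard(g1, g2))
```

Here the two genes share only t2, giving a cosine of 4/√65 ≈ 0.496 and a weighted Jaccard of 1/3. Pearson's coefficient, unlike the other two, shifts with the number of zero entries, so the size of the vocabulary affects it.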

Improving the Gene Ontology Resource to Facilitate More Informative Analysis and Interpretation of Alzheimer’s Disease Data

Genes

10.3390/genes9120593

2018

Vol 9 (12)

pp. 593

Cited By ~ 8

Author(s):

Barbara Kramarz

Paola Roncaglia

Birgit H. M. Meldal

Rachael P. Huntley

Maria J. Martin

...

Keyword(s):

Gene Ontology

Amyloid Beta

Protein Complexes

Enrichment Analysis

Ontology Development

Biological Knowledge

Gene Products

Associated Proteins

GO Terms

Relevant Gene

The analysis and interpretation of high-throughput datasets rely on access to high-quality bioinformatics resources, as well as processing pipelines and analysis tools. Gene Ontology (GO, geneontology.org) is a major resource for gene enrichment analysis. The aim of this project, funded by the Alzheimer’s Research United Kingdom (ARUK) foundation and led by the University College London (UCL) biocuration team, was to enhance the GO resource by developing new neurological GO terms and to use GO terms to annotate gene products associated with dementia. Specifically, proteins and protein complexes relevant to processes involving amyloid-beta and tau have been annotated, and the resulting annotations are denoted in GO databases as ‘ARUK-UCL’. Biological knowledge presented in the scientific literature was captured through the association of GO terms with dementia-relevant protein records; GO itself was revised, and new GO terms were added. This literature biocuration increased the number of Alzheimer’s-relevant gene products associated with neurological GO terms, such as ‘amyloid-beta clearance’ or ‘learning or memory’, as well as with neuronal structures and their compartments. Of the 2055 annotations that we contributed for the prioritised gene products, 526 associate proteins and complexes with neurological GO terms. To ensure that these descriptive annotations could be provided for Alzheimer’s-relevant gene products, over 70 new GO terms were created. Here, we describe how the improvements in ontology development and biocuration resulting from this initiative can benefit the scientific community and enhance the interpretation of dementia data.

A global analysis of CNVs in Chinese indigenous fine-wool sheep populations using whole-genome resequencing

10.21203/rs.2.16764/v4

2021

Author(s):

Chao Yuan

Zengkui Lu

Tingting Guo

Yaojing Yue

Xijun Wang

...

Keyword(s):

Genetic Variation

Complex Traits

Phenotypic Diversity

Average Length

Global Analysis

Enrichment Analysis

Functional Enrichment Analysis

Functional Enrichment

Nutrient Metabolism

Relaxin Family

Abstract

Background: Copy number variation (CNV) is an important source of genetic variation that has a significant influence on phenotypic diversity, economically important traits and the evolution of livestock species. In this study, the genome-wide CNV distribution characteristics of 32 fine-wool sheep from three breeds were analyzed using resequencing.

Results: A total of 1,747,604 CNVs were detected in this study, and 7,228 CNV regions (CNVRs) were obtained after merging overlapping CNVs; these regions accounted for 2.17% of the sheep reference genome. The average length of the CNVRs was 4,307.17 bp. “Deletion” events took place more frequently than “duplication” or “both” events. The CNVRs obtained overlapped with previously reported sheep CNVRs to variable extents (4.39%–55.46%). Functional enrichment analysis showed that the CNVR-harboring genes were mainly involved in sensory perception systems, nutrient metabolism processes, and growth and development processes. Furthermore, 1,855 of the CNVRs were associated with 166 quantitative trait loci (QTLs), including milk, carcass, and health-related QTLs, among others. In addition, the 32 fine-wool sheep were divided into horned and polled groups to scan the CNVRs for signatures of selective sweeps, and the relaxin family peptide receptor 2 (RXFP2) gene was found to be strongly influenced by selection.

Conclusions: In summary, we constructed a genomic CNV map for Chinese indigenous fine-wool sheep using resequencing, thereby providing a valuable genetic variation resource for sheep genome research that will contribute to the study of complex traits in sheep.
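Merging overlapping CNVs into CNV regions is essentially interval merging per chromosome. The sketch below assumes half-open [start, end) coordinates, merges only strictly overlapping calls (not merely adjacent ones), and labels a region "both" when it contains both deletion and duplication calls. The study's exact merging rules may differ, and the coordinates and genome size in the test are hypothetical.

```python
from collections import defaultdict


def _finish(chrom, current):
    """Turn an open [start, end, kinds] accumulator into a labelled CNVR."""
    start, end, kinds = current
    label = "both" if len(kinds) > 1 else next(iter(kinds))
    return (chrom, start, end, label)


def merge_cnvs(cnvs):
    """Merge overlapping CNV calls into CNVRs.

    cnvs: iterable of (chrom, start, end, kind), half-open coordinates,
    kind in {"deletion", "duplication"}. Returns sorted (chrom, start, end, label).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, kind in cnvs:
        by_chrom[chrom].append((start, end, kind))

    regions = []
    for chrom in sorted(by_chrom):
        current = None  # [start, end, set of kinds] of the region being built
        for start, end, kind in sorted(by_chrom[chrom]):
            if current is not None and start < current[1]:
                current[1] = max(current[1], end)  # extend the open region
                current[2].add(kind)
            else:
                if current is not None:
                    regions.append(_finish(chrom, current))
                current = [start, end, {kind}]
        if current is not None:
            regions.append(_finish(chrom, current))
    return regions


def summarize(regions, genome_size):
    """Mean CNVR length and percentage of the genome covered by CNVRs."""
    lengths = [end - start for _, start, end, _ in regions]
    if not lengths:
        return 0.0, 0.0
    return sum(lengths) / len(lengths), 100 * sum(lengths) / genome_size
```

Because merged regions never overlap one another, summing their lengths gives the number of covered bases directly, which is what a "percentage of the reference genome" figure needs.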

GSAn: an alternative to enrichment analysis for annotating gene sets

NAR Genomics and Bioinformatics

10.1093/nargab/lqaa017

2020

Vol 2 (2)

Cited By ~ 5

Author(s):

Aaron Ayllon-Benitez

Romain Bourqui

Patricia Thébault

Fleur Mougin

Keyword(s):

Gene Ontology

Semantic Similarity

A Priori

Similarity Measures

Enrichment Analysis

Biological Information

Underlying Structure

Gene Set

Sequencing Technologies

Gene Coverage

Abstract: The revolution in sequencing technologies is leading to new understanding of the relations between genotype and phenotype. To interpret and analyze data grouped according to a phenotype of interest, methods based on statistical enrichment have become a standard in biology. However, these methods synthesize the biological information by selecting the over-represented terms a priori and may suffer from focusing on the most studied genes, which represent limited coverage of the annotated genes within a gene set. Semantic similarity measures have shown strong results in pairwise gene comparison by taking advantage of the underlying structure of the Gene Ontology. We developed GSAn, a novel gene set annotation method that uses semantic similarity measures to synthesize a priori Gene Ontology annotation terms. The originality of our approach is to identify the best compromise between the number of retained annotation terms, which has to be drastically reduced, and the number of related genes, which has to be as large as possible. Moreover, GSAn offers interactive visualization facilities dedicated to the multi-scale analysis of gene set annotations. Compared with enrichment analysis tools, GSAn has shown excellent results in maximizing gene coverage while minimizing the number of terms.
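The trade-off GSAn targets, few terms covering as many genes as possible, resembles a maximum-coverage problem. The sketch below uses the classic greedy heuristic for that problem purely to illustrate the trade-off; it is not GSAn's algorithm, which relies on semantic similarity over the Gene Ontology, and the function name, terms, and genes are hypothetical.

```python
def greedy_term_selection(gene_set, term_to_genes, max_terms):
    """Greedily pick up to max_terms terms, each covering the most still-uncovered genes.

    Returns (chosen_terms, fraction_of_gene_set_covered). Ties are broken
    by term name so results are deterministic.
    """
    genes = set(gene_set)
    uncovered = set(genes)
    chosen = []
    while uncovered and term_to_genes and len(chosen) < max_terms:
        best = max(sorted(term_to_genes), key=lambda t: len(set(term_to_genes[t]) & uncovered))
        gain = set(term_to_genes[best]) & uncovered
        if not gain:
            break  # no remaining term adds coverage
        chosen.append(best)
        uncovered -= gain
    coverage = 1 - len(uncovered) / len(genes) if genes else 0.0
    return chosen, coverage


if __name__ == "__main__":
    terms = {"T1": {"g1", "g2", "g3"}, "T2": {"g3", "g4"}, "T3": {"g4", "g5", "g6"}, "T4": {"g1"}}
    genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
    for k in (1, 2):
        print(k, greedy_term_selection(genes, terms, k))
```

Raising `max_terms` can only keep or increase coverage, so sweeping it traces the "fewer terms vs. more genes" curve that the abstract describes.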

A GO-driven semantic similarity measure for quantifying the biological relatedness of gene products

Intelligent Decision Technologies

10.3233/idt-2009-0059

2009

Vol 3 (4)

pp. 239-248

Cited By ~ 1

Author(s):

Spiridon C. Denaxas

Christos Tjortjis

Keyword(s):

Semantic Similarity

Similarity Measure

Gene Products

Semantic Similarity Measure

A Graph-Based Semantic Similarity Measure for the Gene Ontology

Journal of Bioinformatics and Computational Biology

10.1142/s0219720011005641

2011

Vol 09 (06)

pp. 681-695

Cited By ~ 15

Author(s):

Marco A. Alvarez

Changhui Yan

Keyword(s):

Gene Ontology

Semantic Similarity

Common Ancestor

State of the Art

Sequence Similarity

Similarity Score

Gene Products

Semantic Similarity Measure

Similarity Algorithm

GO Terms

Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases, such as Gene Ontology Annotation (GOA), that annotate gene products using GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive with the other methods. SSA provides a reliable measure of semantic similarity independent of external databases of functional-annotation observations.
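Two of the graph quantities SSA takes into account, the shortest path between two terms and the depth of their nearest common ancestor, can be computed with breadth-first search. The sketch assumes the path may traverse edges in either direction, that depth means the fewest edges from a term up to the root, and that the "nearest" common ancestor is the deepest one. It does not reproduce SSA's combination formula or its definition-based score, which the abstract does not spell out, and the toy DAG is hypothetical.

```python
from collections import deque

# Hypothetical toy DAG: child -> parents.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "C": ["A"], "D": ["A", "B"], "E": ["C"]}


def bfs_distances(start, neighbors):
    """Edge counts from start to every node reachable via neighbors()."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def parents_of(term):
    return PARENTS[term]


def undirected_neighbors(term):
    children = [c for c, ps in PARENTS.items() if term in ps]
    return PARENTS[term] + children


def shortest_path_length(a, b):
    """Fewest edges between two terms, ignoring edge direction (None if unreachable)."""
    return bfs_distances(a, undirected_neighbors).get(b)


def depth(term, root="root"):
    """Fewest upward edges from the term to the root."""
    return bfs_distances(term, parents_of)[root]


def nearest_common_ancestor(a, b):
    """Deepest shared ancestor (ties broken by name) and its depth."""
    common = bfs_distances(a, parents_of).keys() & bfs_distances(b, parents_of).keys()
    best = max(sorted(common), key=depth)
    return best, depth(best)


if __name__ == "__main__":
    print(shortest_path_length("E", "D"), nearest_common_ancestor("E", "D"))
```

For E and D the path is E, C, A, D (three edges) and the deepest shared ancestor is A at depth 1; a real implementation would load the GO graph instead of the hard-coded dictionary.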

NETGE-PLUS: standard and network-based gene enrichment analysis in human and model organisms

10.1101/750661

2019

Author(s):

Samuele Bovo

Pier Luigi Martelli

Pietro Di Lena

Rita Casadio

Keyword(s):

Complex Traits

Enrichment Analysis

Data Retrieval

Functional Enrichment

Model Organisms

Functional Interpretation

KEGG Pathways

Gene Enrichment Analysis

Gene Enrichment

Gene Sets

Abstract: Omics techniques provide a spectrum of information that needs to be disentangled to characterize complex traits at the molecular level. The gap between genotype and phenotype must be closed by reconciling the genome information with the set of molecular pathways and biological processes describing the phenotype. To address this problem, gene enrichment analysis has become the most widely adopted strategy. Here, we present NETGE-PLUS, a web server for standard and network-based functional interpretation of gene sets of human and of model organisms, including S. scrofa, S. cerevisiae, E. coli and A. thaliana. NETGE-PLUS enables the functional enrichment of both simple and ranked lists of genes and also introduces the possibility of exploring relationships among KEGG pathways. A web interface makes data retrieval complete and user-friendly. NETGE-PLUS is publicly available at http://net-ge2.biocomp.unibo.it

Almond Snacking For 8 Weeks Differentially Altered the Serum Omics Profiles of Young Adults in Comparison to a Control Snack

Current Developments in Nutrition

10.1093/cdn/nzaa049_015

2020

Vol 4 (Supplement_2)

pp. 622-622

Author(s):

Jaapna Dhillon

Oliver Fiehn

Rudy Ortiz

Keyword(s):

Young Adults

Lipid Metabolism

Time of Flight

Enrichment Analysis

Cardiometabolic Health

P Value

Similarity Coefficients

Chemical Similarity

Medical Subject Headings

Funding Sources

Abstract

Objectives: Almond consumption can improve cardiometabolic health. However, the mechanisms underlying those physiological changes are not well characterized. This study explored the effects of consuming a snack of almonds for 8 weeks on changes in omics profiles in young adults.

Methods: Newly enrolled college students (n = 73, age: 18–19 years, BMI: 18–41 kg/m²) were randomly assigned to consume a daily morning snack of either almonds (2 oz/d, n = 38) or an isocaloric control snack of graham crackers (325 kcal/d, n = 35) for 8 weeks (Clinical trials: NCT03084003). Blood samples were collected every 4 weeks over the 8-week intervention. Metabolite abundances in the serum were quantified by hydrophilic interaction chromatography (HILIC) quadrupole (Q) time-of-flight (TOF) mass spectrometry (MS/MS), gas chromatography time-of-flight (GCTOF) MS, and CSH-ESI (electrospray) QTOF MS/MS. Data were reported as quantitative ion peak heights and were normalized by systematic error removal using random forest (SERRF) normalization. The baseline-adjusted means of the almond and cracker groups at week 8 were analyzed using ChemRICH, a chemical similarity enrichment analysis software for metabolomics datasets that uses Medical Subject Headings and Tanimoto substructure chemical similarity coefficients to cluster metabolites into non-overlapping chemical groups. Statistically significant p-values for clusters were obtained by self-contained Kolmogorov–Smirnov tests.

Results: Out of the 5716 features detected, 857 were identified as known compounds. ChemRICH mapped 660 of the identified metabolites to 63 non-overlapping chemical classes, of which 2 were found to be significantly different between the almond and cracker groups (false discovery rate adjusted P value (FDR) < 0.05). Almond snacking for 8 weeks was associated with altered unsaturated lipid metabolism, represented by significantly increased levels of unsaturated triglycerides and unsaturated lysophosphatidylcholines compared with cracker snacking (cluster FDR < 0.05).

Conclusions: These findings indicate that almond and cracker snacking for 8 weeks differentially altered lipid metabolism.

Funding Sources: Research supported by Almond Board of California and NIH-NIMHD.
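Two computations mentioned in this abstract are simple enough to sketch: the Tanimoto coefficient between binary substructure fingerprints (represented here as sets of "on" bit positions) and an FDR adjustment of cluster p-values. The abstract does not say which FDR procedure ChemRICH applies; Benjamini–Hochberg is assumed here as a common choice. The fingerprints and p-values are hypothetical, and this is not ChemRICH's implementation.

```python
def tanimoto(bits_a, bits_b):
    """Intersection over union for fingerprints given as sets of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    # Two empty fingerprints are scored 0.0 by convention in this sketch.
    return len(a & b) / union if union else 0.0


def benjamini_hochberg(pvalues):
    """Benjamini–Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(tanimoto({1, 2, 3, 5}, {2, 3, 4}))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

The running minimum enforces that adjusted values never decrease as raw p-values increase, which is why the second and third clusters in the example end up with the same adjusted value.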
