MeSH-informed enrichment analysis and MeSH-guided semantic similarity among functional terms and gene products in chicken

2015 ◽  
Author(s):  
Gota Morota ◽  
Timothy M Beissinger ◽  
Francisco Peñagaricano

Abstract Biomedical vocabularies and ontologies aid in recapitulating biological knowledge. The annotation of gene products has been driven mainly by the Gene Ontology (GO) and, more recently, by Medical Subject Headings (MeSH). Here we report a suite of MeSH packages for chicken in Bioconductor and illustrate some features of different MeSH-based analyses, including MeSH-informed enrichment analysis and MeSH-guided semantic similarity among terms and gene products, using two lists of chicken genes available in public repositories. The two published datasets represent (i) differentially expressed genes and (ii) candidate genes under selective sweep or epistatic selection. Comparing MeSH and GO overrepresentation analyses suggested not only that MeSH supports the findings obtained from GO analysis but also that MeSH further enriches the representation of biological knowledge and often provides more interpretable results. Based on the hierarchical structures of MeSH and GO, we computed semantic similarities among terms as well as among selected genes: term-level similarities quantify how closely the significant functional terms are related, and each gene's annotations yield gene-level similarity measures. Our findings show the benefits of using MeSH as an alternative source of annotation for drawing biological inferences from a list of genes of interest. We argue that using MeSH in conjunction with GO will be instrumental in facilitating the understanding of the genetic basis of complex traits.
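Overrepresentation analyses of this kind, whether on MeSH or GO terms, commonly rest on a one-sided hypergeometric test. The sketch below is a minimal, library-free illustration of that test. It is not the code of the Bioconductor MeSH packages. The function names and the toy gene and term identifiers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric: M background genes, n of them
    annotated to the term, N genes drawn (the list of interest)."""
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / comb(M, N)


def overrepresentation(genes_of_interest, background, annotations):
    """Return {term: one-sided p-value} for each term in `annotations`
    (a hypothetical mapping of term -> annotated genes)."""
    background = set(background)
    study = set(genes_of_interest) & background
    M, N = len(background), len(study)
    pvalues = {}
    for term, members in annotations.items():
        members = set(members) & background
        k = len(members & study)  # annotated genes that fall in the list
        pvalues[term] = hypergeom_upper_tail(k, M, len(members), N)
    return pvalues
```

In a real analysis the per-term p-values would still need multiple-testing correction before terms are called significant.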

2016 ◽  
Vol 6 (8) ◽  
pp. 2447-2453 ◽  
Author(s):  
Gota Morota ◽  
Timothy M. Beissinger ◽  
Francisco Peñagaricano

2020 ◽  
Vol 18 (06) ◽  
pp. 2050038
Author(s):  
Jorge Parraga-Alava ◽  
Mario Inostroza-Ponta

Using prior biological knowledge of gene relationships and functions from repositories such as the Gene Ontology (GO) to measure gene similarity has shown good results in multi-objective gene clustering algorithms. In this setting, obtaining useful clusters, that is, clusters with both similar expression patterns (co-expression) and biological homogeneity, requires knowing which measure of biological similarity between genes to employ. In this paper, we studied the influence of the four most used GO-based semantic similarity measures on the performance of a multi-objective gene clustering algorithm. We used four publicly available datasets and carried out comparative studies based on multi-objective optimization performance metrics and clustering performance indexes. In most cases, the Jiang–Conrath and Wang similarities stand out on the multi-objective metrics. For clustering properties, the Resnik similarity achieves the best compactness and separation, and therefore the best co-expression within gene groups. Meanwhile, for biological homogeneity, the Wang similarity yields a greater number of significant GO terms. However, statistical, visual, and biological significance tests showed that none of the GO-based semantic similarity measures stands out from the rest by significantly improving the performance of the multi-objective gene clustering algorithm.
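Two of the measures named above, Resnik and Jiang–Conrath, are information-content (IC) based. The Wang measure instead uses weighted edges of the DAG and is not shown here. The minimal sketch below assumes a hypothetical toy ontology and hypothetical direct annotation counts. It also assumes the common definition IC(t) = -log p(t), where p(t) is the fraction of annotations to t or any of its descendants. Here Jiang–Conrath is reported as a distance.

```python
import math

# Hypothetical toy ontology: each term maps to its parent terms (a small DAG).
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],
}

# Hypothetical counts of genes annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "a": 2, "b": 3, "c": 4, "d": 1}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) counts annotations to t or any descendant."""
    cumulative = {t: 0 for t in PARENTS}
    for term, n in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += n
    total = cumulative["root"]
    return {t: -math.log(c / total) for t, c in cumulative.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """IC of the most informative common ancestor (MICA)."""
    return max(IC[a] for a in ancestors(t1) & ancestors(t2))


def jiang_conrath_distance(t1, t2):
    """IC(t1) + IC(t2) - 2 * IC(MICA); 0 for identical terms."""
    return IC[t1] + IC[t2] - 2 * resnik(t1, t2)
```

Converting the Jiang–Conrath distance into a similarity (for example 1 / (1 + d)) is a separate design choice, and implementations differ on it.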


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Mingxin Gan

Successful applications of the Gene Ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods that automatically calculate semantic similarity between gene products from the semantic similarity of Gene Ontology terms. Nevertheless, existing methods, though widely used in a variety of applications, may significantly overestimate the semantic similarity between genes that are not actually functionally related, thereby yielding misleading results. To overcome this limitation, we propose to represent a gene product as a vector composed of the information contents of the Gene Ontology terms annotated to it, and to calculate the similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson’s correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the Gene Ontology and on annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated with the proposed measures are more consistent with known biological knowledge than those derived from several existing methods, suggesting that our method effectively characterizes functional relationships between gene products.
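The following is a minimal sketch of the vector idea, under stated assumptions:

- Each gene's vector spans the union of terms annotated to either gene. An entry holds the term's IC if the gene carries that term, and 0 otherwise.
- The Jaccard index is generalized to real-valued vectors as sum of minima over sum of maxima.

Both choices are assumptions, since the abstract does not specify them. The term IDs and IC values are hypothetical.

```python
import math


def ic_vectors(terms_a, terms_b, ic):
    """Align two genes' annotations as IC-weighted vectors over the union of their terms."""
    vocab = sorted(set(terms_a) | set(terms_b))
    va = [ic[t] if t in terms_a else 0.0 for t in vocab]
    vb = [ic[t] if t in terms_b else 0.0 for t in vocab]
    return va, vb


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(u, v):
    """Pearson correlation = cosine similarity of mean-centred vectors
    (returns 0.0 when either vector has zero variance)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([x - mu for x in u], [y - mv for y in v])


def weighted_jaccard(u, v):
    """Assumed real-valued Jaccard: sum of minima over sum of maxima."""
    den = sum(max(x, y) for x, y in zip(u, v))
    return sum(min(x, y) for x, y in zip(u, v)) / den if den else 0.0
```

Usage: `va, vb = ic_vectors({"t1", "t2"}, {"t2", "t3"}, {"t1": 1.0, "t2": 2.0, "t3": 3.0})` aligns the two genes on the terms t1, t2, t3 before scoring.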


Genes ◽  
2018 ◽  
Vol 9 (12) ◽  
pp. 593 ◽  
Author(s):  
Barbara Kramarz ◽  
Paola Roncaglia ◽  
Birgit H. M. Meldal ◽  
Rachael P. Huntley ◽  
Maria J. Martin  ◽  
...  

The analysis and interpretation of high-throughput datasets rely on access to high-quality bioinformatics resources, as well as processing pipelines and analysis tools. Gene Ontology (GO, geneontology.org) is a major resource for gene enrichment analysis. The aim of this project, funded by the Alzheimer’s Research United Kingdom (ARUK) foundation and led by the University College London (UCL) biocuration team, was to enhance the GO resource by developing new neurological GO terms and to use GO terms to annotate gene products associated with dementia. Specifically, proteins and protein complexes relevant to processes involving amyloid-beta and tau have been annotated, and the resulting annotations are denoted in GO databases as ‘ARUK-UCL’. Biological knowledge presented in the scientific literature was captured by associating GO terms with dementia-relevant protein records; GO itself was revised, and new GO terms were added. This literature biocuration increased the number of Alzheimer’s-relevant gene products associated with neurological GO terms, such as ‘amyloid-beta clearance’ or ‘learning or memory’, as well as with neuronal structures and their compartments. Of the 2055 annotations that we contributed for the prioritised gene products, 526 associate proteins and complexes with neurological GO terms. To provide these descriptive annotations for Alzheimer’s-relevant gene products, over 70 new GO terms were created. Here, we describe how the resulting improvements in ontology development and biocuration can benefit the scientific community and enhance the interpretation of dementia data.
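GO annotation files in GAF 2.x format record the contributing group in column 15 ("Assigned By"). Annotations contributed under the ‘ARUK-UCL’ label can therefore be isolated by filtering on that column. This is a minimal sketch assuming a local GAF file read line by line. The file name and the synthetic rows used for checking are placeholders, not real annotations.

```python
from collections import Counter

# GAF 2.x is tab-separated with 17 columns; column 5 is the GO ID and
# column 15 is "Assigned By" (0-based indices 4 and 14).
GO_ID_COL = 4
ASSIGNED_BY_COL = 14


def count_by_assigner(lines, assigner="ARUK-UCL"):
    """Count annotations made by `assigner` and tally the GO IDs they use."""
    total = 0
    go_ids = Counter()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) > ASSIGNED_BY_COL and cols[ASSIGNED_BY_COL] == assigner:
            total += 1
            go_ids[cols[GO_ID_COL]] += 1
    return total, go_ids


# Usage (hypothetical path):
# with open("annotations.gaf") as fh:
#     n, ids = count_by_assigner(fh)
```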


2021 ◽  
Author(s):  
Chao Yuan ◽  
Zengkui Lu ◽  
Tingting Guo ◽  
Yaojing Yue ◽  
Xijun Wang ◽  
...  

Abstract Background Copy number variation (CNV) is an important source of genetic variation that significantly influences phenotypic diversity, economically important traits, and the evolution of livestock species. In this study, the genome-wide distribution of CNVs in 32 fine-wool sheep from three breeds was characterized by resequencing. Results A total of 1,747,604 CNVs were detected, and 7,228 CNV regions (CNVRs) were obtained after merging overlapping CNVs; these regions accounted for 2.17% of the sheep reference genome. The average CNVR length was 4,307.17 bp. “Deletion” events occurred more frequently than “duplication” or “both” events. The CNVRs overlapped with previously reported sheep CNVRs to variable extents (4.39%–55.46%). Functional enrichment analysis showed that the CNVR-harboring genes were mainly involved in sensory perception systems, nutrient metabolism processes, and growth and development processes. Furthermore, 1,855 of the CNVRs were associated with 166 quantitative trait loci (QTLs), including milk, carcass, and health-related QTLs, among others. In addition, the 32 fine-wool sheep were divided into horned and polled groups for a selective sweep analysis of CNVRs, which showed that the relaxin family peptide receptor 2 (RXFP2) gene was strongly influenced by selection. Conclusions In summary, we constructed a genomic CNV map of Chinese indigenous fine-wool sheep by resequencing, providing a valuable genetic variation resource for sheep genome research that will contribute to the study of complex traits in sheep.
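CNVRs are obtained by merging overlapping CNV calls, after which the region count, mean length and genome fraction can be summarized. The sketch below is minimal and makes three assumptions:

- CNV calls are (chromosome, start, end) tuples with half-open [start, end) coordinates.
- Intervals that merely touch are not merged.
- The genome length passed in is a hypothetical input, not the size of the sheep reference assembly.

```python
def merge_cnvs(cnvs):
    """Merge overlapping (chrom, start, end) calls into CNV regions."""
    merged = []
    for chrom, start, end in sorted(cnvs):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))  # extend current region
        else:
            merged.append((chrom, start, end))  # start a new region
    return merged


def summarize(regions, genome_length):
    """Region count, mean region length (bp) and fraction of the genome covered."""
    total = sum(end - start for _, start, end in regions)
    return {
        "n_regions": len(regions),
        "mean_length": total / len(regions),
        "genome_fraction": total / genome_length,
    }
```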


2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Aaron Ayllon-Benitez ◽  
Romain Bourqui ◽  
Patricia Thébault ◽  
Fleur Mougin

Abstract The revolution in sequencing technologies is leading to new understanding of the relations between genotype and phenotype. To interpret and analyze data grouped according to a phenotype of interest, methods based on statistical enrichment have become standard in biology. However, these methods synthesize biological information by selecting over-represented terms a priori, and they may focus on the most studied genes, which cover only a limited share of the annotated genes within a gene set. Semantic similarity measures have performed well in pairwise gene comparison by taking advantage of the underlying structure of the Gene Ontology. We developed GSAn, a novel gene set annotation method that uses semantic similarity measures to synthesize a priori Gene Ontology annotation terms. The originality of our approach lies in identifying the best compromise between the number of retained annotation terms, which must be drastically reduced, and the number of related genes, which must be as large as possible. Moreover, GSAn offers interactive visualization facilities dedicated to the multi-scale analysis of gene set annotations. Compared with enrichment analysis tools, GSAn has shown excellent results in maximizing gene coverage while minimizing the number of terms.
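The trade-off described above pits few retained terms against high gene coverage. It can be made concrete with the classic greedy set-cover heuristic: repeatedly keep the term that adds the most genes not yet covered. This is offered as a point of comparison only. It ignores semantic similarity and is not GSAn's algorithm. Term and gene names are hypothetical.

```python
def greedy_term_selection(term_genes, max_terms):
    """Pick up to `max_terms` terms, each time taking the term that covers the
    most still-uncovered genes (ties broken by term name)."""
    remaining = {term: set(genes) for term, genes in term_genes.items()}
    covered, chosen = set(), []
    while remaining and len(chosen) < max_terms:
        term, genes = min(remaining.items(), key=lambda kv: (-len(kv[1] - covered), kv[0]))
        gain = genes - covered
        if not gain:
            break  # no remaining term adds coverage
        chosen.append(term)
        covered |= gain
        del remaining[term]
    return chosen, covered
```

Raising `max_terms` can only keep or increase coverage, which mirrors the compromise the abstract describes.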


2011 ◽  
Vol 09 (06) ◽  
pp. 681-695 ◽  
Author(s):  
Marco A. Alvarez ◽
Changhui Yan

Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases, such as Gene Ontology Annotation (GOA), that annotate gene products with GO terms. This dependency limits their use in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. We use SSA to calculate semantic similarities between pairs of proteins by combining the pairwise semantic similarities between the GO terms that annotate them. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with functional similarities derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive. SSA provides a reliable measure of semantic similarity that is independent of external databases of functional-annotation observations.
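Two of SSA's structural ingredients can be sketched on a toy DAG: the shortest path between two terms through a common ancestor, and that ancestor's depth. The sketch also shows one common way to lift term similarities to proteins, the best-match average (BMA). Several points are assumptions not taken from the abstract:

- Depth is taken as the shortest edge count to the root.
- The toy term similarity `1 / (1 + path length)` is hypothetical.
- BMA is not stated to be SSA's combination rule.

The definition-based score SSA also uses is not reproduced.

```python
from collections import deque

# Hypothetical toy GO-like DAG: term -> parent terms.
PARENTS = {"root": [], "a": ["root"], "b": ["root"], "c": ["a"], "d": ["a", "b"], "e": ["c"]}


def upward_distances(term):
    """Breadth-first edge counts from `term` up to each ancestor (itself = 0)."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        t = queue.popleft()
        for parent in PARENTS[t]:
            if parent not in dist:
                dist[parent] = dist[t] + 1
                queue.append(parent)
    return dist


def path_features(t1, t2):
    """(shortest path via a common ancestor, depth of that nearest common ancestor)."""
    d1, d2 = upward_distances(t1), upward_distances(t2)
    common = d1.keys() & d2.keys()
    nearest = min(common, key=lambda a: (d1[a] + d2[a], a))
    return d1[nearest] + d2[nearest], upward_distances(nearest)["root"]


def toy_term_similarity(t1, t2):
    """Hypothetical term similarity that decays with path length."""
    path_length, _ = path_features(t1, t2)
    return 1.0 / (1.0 + path_length)


def best_match_average(terms_a, terms_b, sim=toy_term_similarity):
    """Average of each term's best match in the other protein's annotations."""
    row = [max(sim(a, b) for b in terms_b) for a in terms_a]
    col = [max(sim(a, b) for a in terms_a) for b in terms_b]
    return (sum(row) + sum(col)) / (len(row) + len(col))
```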


2019 ◽  
Author(s):  
Samuele Bovo ◽  
Pier Luigi Martelli ◽  
Pietro Di Lena ◽  
Rita Casadio

Abstract Omics techniques provide a spectrum of information that must be disentangled to characterize complex traits at the molecular level. The gap between genotype and phenotype must be closed by reconciling genome information with the set of molecular pathways and biological processes that describe the phenotype. Gene enrichment analysis has become the most widely adopted strategy for this problem. Here, we present NETGE-PLUS, a web server for standard and network-based functional interpretation of gene sets from human and model organisms, including S. scrofa, S. cerevisiae, E. coli and A. thaliana. NETGE-PLUS enables functional enrichment of both simple and ranked gene lists and also introduces the possibility of exploring relationships among KEGG pathways. A web interface makes data retrieval complete and user-friendly. NETGE-PLUS is publicly available at http://net-ge2.biocomp.unibo.it
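One widely used way to score a gene set against a ranked list, as opposed to a simple list, is a GSEA-style running sum. Walk down the ranking, step up at set members and down otherwise, and record the largest deviation from zero. The unweighted sketch below shows that idea only. It is not presented as NETGE-PLUS's method, and the gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum (Kolmogorov–Smirnov-like) enrichment score.
    Positive values mean the set is concentrated near the top of the ranking."""
    gene_set = set(gene_set)
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best
```

Significance for such a score is usually assessed by permutation, which is omitted here.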


2020 ◽  
Vol 4 (Supplement_2) ◽  
pp. 622-622
Author(s):  
Jaapna Dhillon ◽  
Oliver Fiehn ◽  
Rudy Ortiz

Abstract Objectives Almond consumption can improve cardiometabolic health. However, the mechanisms underlying these physiological changes are not well characterized. This study explored the effects of consuming a snack of almonds for 8 weeks on omics profiles in young adults. Methods Newly enrolled college students (n = 73, age: 18–19 years, BMI: 18–41 kg/m²) were randomly assigned to consume a daily morning snack of either almonds (2 oz./d, n = 38) or an isocaloric control snack of graham crackers (325 kcal/d, n = 35) for 8 weeks (Clinical trials: NCT03084003). Blood samples were collected every 4 weeks over the 8-week intervention. Serum metabolite abundances were quantified by hydrophilic interaction chromatography (HILIC) quadrupole (Q) time-of-flight (TOF) mass spectrometry (MS/MS), gas chromatography time-of-flight (GCTOF) MS, and CSH-ESI (electrospray) QTOF MS/MS. Data were reported as quantitative ion peak heights and normalized by systematic error removal using random forest (SERRF). The baseline-adjusted means of the almond and cracker groups at week 8 were analyzed with ChemRICH, a chemical similarity enrichment analysis tool for metabolomics datasets that uses Medical Subject Headings (MeSH) and Tanimoto substructure chemical similarity coefficients to cluster metabolites into non-overlapping chemical groups. Cluster p-values were obtained with self-contained Kolmogorov–Smirnov tests. Results Of the 5716 features detected, 857 were identified as known compounds. ChemRICH mapped 660 of the identified metabolites to 63 non-overlapping chemical classes, 2 of which differed significantly between the almond and cracker groups (false discovery rate-adjusted P value (FDR) < 0.05).
Almond snacking for 8 weeks was associated with altered unsaturated lipid metabolism, reflected in significantly increased levels of unsaturated triglycerides and unsaturated lysophosphatidylcholines compared with cracker snacking (cluster FDR < 0.05). Conclusions These findings indicate that almond and cracker snacking for 8 weeks differentially altered lipid metabolism. Funding Sources Research supported by the Almond Board of California and NIH-NIMHD.
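Two building blocks mentioned in this study can be shown in isolation: the Tanimoto coefficient between binary chemical fingerprints, and FDR adjustment of cluster p-values. This minimal sketch is not ChemRICH's code. It assumes fingerprints are given as sets of "on" bit indices. For the adjustment it uses the common Benjamini–Hochberg procedure, although the abstract does not say which FDR method was applied.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices
    (returns 0.0 when both fingerprints are empty)."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def benjamini_hochberg(pvalues):
    """Benjamini–Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```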

