scholarly journals MixMir: microRNA motif discovery from gene expression data using mixed linear models

2014 ◽  
Author(s):  
LIYANG Diao ◽  
Antoine Marcais ◽  
Scott Norton ◽  
Kevin C. Chen

MicroRNAs (miRNAs) are a class of ~22nt non-coding RNAs that potentially regulate over 60% of human protein-coding genes. MiRNA activity is highly specific, differing between cell types, developmental stages and environmental conditions, so the identification of active miRNAs in a given sample is of great interest. Here we present a novel computational approach for analyzing both mRNA sequence and gene expression data, called MixMir. Our method corrects for 3' UTR background sequence similarity between transcripts, which is known to correlate with mRNA transcript abundance. We demonstrate that after accounting for kmer sequence similarities in 3' UTRs, a statistical linear model based on motif presence/absence can effectively discover active miRNAs in a sample. MixMir utilizes fast software implementations for solving mixed linear models which are widely-used in genome-wide association studies (GWAS). Essentially we use 3' UTR sequence similarity in place of population cryptic relatedness in the GWAS problem. Compared to similar methods such as miREDUCE, Sylamer and cWords, we found that MixMir performed better at discovering true miRNA motifs in Dicer knockout CD4+ T-cells, as well as protein and mRNA expression data obtained from miRNA transfection experiments in human cell lines. MixMir can be freely downloaded from https://github.com/ldiao/MixMir.

2014 ◽  
Vol 42 (17) ◽  
pp. e135-e135 ◽  
Author(s):  
Liyang Diao ◽  
Antoine Marcais ◽  
Scott Norton ◽  
Kevin C. Chen

2014 ◽  
Vol 9 ◽  
pp. BMI.S13729 ◽  
Author(s):  
Chindo Hicks ◽  
Tejaswi Koganti ◽  
Shankar Giri ◽  
Memory Tekere ◽  
Ritika Ramani ◽  
...  

Genome-wide association studies (GWAS) have achieved great success in identifying single nucleotide polymorphisms (SNPs, herein called genetic variants) and genes associated with risk of developing prostate cancer. However, GWAS do not typically link the genetic variants to the disease state or inform the broader context in which the genetic variants operate. Here, we present a novel integrative genomics approach that combines GWAS information with gene expression data to infer the causal association between gene expression and the disease and to identify the network states and biological pathways enriched for genetic variants. We identified gene regulatory networks and biological pathways enriched for genetic variants, including the prostate cancer, IGF-1, JAK2, androgen, and prolactin signaling pathways. The integration of GWAS information with gene expression data provides insights about the broader context in which genetic variants associated with an increased risk of developing prostate cancer operate.


2011 ◽  
Vol 10 ◽  
pp. CIN.S6837 ◽  
Author(s):  
Chindo Hicks ◽  
Rozana Asfour ◽  
Antonio Pannuti ◽  
Lucio Miele

Genome-wide association studies (GWAS) have successfully identified genetic variants associated with risk for breast cancer. However, the molecular mechanisms through which the identified variants confer risk or influence phenotypic expression remains poorly understood. Here, we present a novel integrative genomics approach that combines GWAS information with gene expression data to assess the combined contribution of multiple genetic variants acting within genes and putative biological pathways, and to identify novel genes and biological pathways that could not be identified using traditional GWAS. The results show that genes containing SNPs associated with risk for breast cancer are functionally related and interact with each other in biological pathways relevant to breast cancer. Additionally, we identified novel genes that are co-expressed and interact with genes containing SNPs associated with breast cancer. Integrative analysis combining GWAS information with gene expression data provides functional bridges between GWAS findings and biological pathways involved in breast cancer.


Animals ◽  
2019 ◽  
Vol 9 (12) ◽  
pp. 1059 ◽  
Author(s):  
Francisco A. Leal Yepes ◽  
Daryl V. Nydam ◽  
Sabine Mann ◽  
Luciano Caixeta ◽  
Jessica A. A. McArt ◽  
...  

The objective of our study was to identify genomic regions associated with varying concentrations of non-esterified fatty acid (NEFA), β-hydroxybutyrate (BHB), and the development of hyperketonemia (HYK) in longitudinally sampled Holstein dairy cows. Our study population consisted of 147 multiparous cows intensively characterized by serial NEFA and BHB concentrations. To identify individuals with contrasting combinations in longitudinal BHB and NEFA concentrations, phenotypes were established using incremental area under the curve (AUC) and categorized as follows: Group (1) high NEFA and high BHB, group (2) low NEFA and high BHB), group (3) low NEFA and low BHB, and group (4) high NEFA and low BHB. Cows were genotyped on the Illumina Bovine High-density (777 K) beadchip. Genome-wide association studies using mixed linear models with the least-related animals were performed to establish a genetic association with HYK, BHB-AUC, NEFA-AUC, and the comparisons of the 4 AUC phenotypic groups using Golden Helix software. Nine single-nucleotide polymorphisms were associated with high longitudinal concentrations of BHB and further investigated. Five candidate genes related to energy metabolism and homeostasis were identified. These results provide biological insight and help identify susceptible animals thus improving genetic selection criteria thereby decreasing the incidence of HYK.


2020 ◽  
Author(s):  
Jacqueline Milet ◽  
Hervé Perdry

AbstractMotivationMixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved that this method is inappropriate and proposed a score test for the mixed logistic regression (MLR). However this test does not allow an estimation of the variants’ effects.ResultsWe propose two computationally efficient methods to estimate the variants’ effects. Their properties are evaluated on two simulations sets, and compared with other methods (MLM, logistic regression). MLR performs the best in all circumstances. The variants’ effects are well evaluated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, enhancing the diagnosis of p-values inflation or deflation, when population strata are not clearly identified in the sample.AvailabilityAll methods are implemented in the R package milorGWAS available at https://github.com/genostats/[email protected] informationSupplementary data are available at Bioinformatics online.


Sign in / Sign up

Export Citation Format

Share Document