MixMir: microRNA motif discovery from gene expression data using mixed linear models

10.1101/004010 ◽

2014 ◽

Author(s):

Liyang Diao ◽

Antoine Marcais ◽

Scott Norton ◽

Kevin C. Chen

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Motif Discovery ◽

Linear Models ◽

Developmental Stages ◽

Sequence Similarity ◽

Association Studies ◽

Genome Wide Association Studies ◽

Expression Data ◽

Mixed Linear Models

MicroRNAs (miRNAs) are a class of ~22 nt non-coding RNAs that potentially regulate over 60% of human protein-coding genes. MiRNA activity is highly specific, differing between cell types, developmental stages and environmental conditions, so the identification of active miRNAs in a given sample is of great interest. Here we present a novel computational approach for analyzing both mRNA sequence and gene expression data, called MixMir. Our method corrects for 3' UTR background sequence similarity between transcripts, which is known to correlate with mRNA transcript abundance. We demonstrate that after accounting for k-mer sequence similarities in 3' UTRs, a statistical linear model based on motif presence/absence can effectively discover active miRNAs in a sample. MixMir utilizes fast software implementations for solving mixed linear models, which are widely used in genome-wide association studies (GWAS). Essentially, we use 3' UTR sequence similarity in place of population cryptic relatedness in the GWAS problem. Compared to similar methods such as miREDUCE, Sylamer and cWords, we found that MixMir performed better at discovering true miRNA motifs in Dicer knockout CD4+ T-cells, as well as in protein and mRNA expression data obtained from miRNA transfection experiments in human cell lines. MixMir can be freely downloaded from https://github.com/ldiao/MixMir.
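
The two inputs the abstract describes, a 3' UTR similarity matrix that stands in for GWAS relatedness and a motif presence/absence covariate, can be illustrated with a minimal Python sketch. This is not the MixMir implementation: the function names, the choice of standardised k-mer counts as the similarity measure, and the example motif are all assumptions made for illustration, and the mixed-model fit itself is left to dedicated software.

```python
from itertools import product

import numpy as np


def kmer_counts(seq, k):
    """Count every DNA k-mer in seq using overlapping windows."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing non-ACGT characters
            counts[j] += 1
    return counts


def similarity_matrix(utrs, k=2):
    """Pairwise 3' UTR similarity from column-standardised k-mer counts.

    Assumed analogue of a GWAS relatedness matrix; the abstract does not
    give the exact definition MixMir uses. Requires at least one k-mer
    whose count varies across the sequences.
    """
    X = np.array([kmer_counts(s, k) for s in utrs])
    X = X - X.mean(axis=0)
    sd = X.std(axis=0)
    keep = sd > 0  # drop k-mers with identical counts in every UTR
    X = X[:, keep] / sd[keep]
    return X @ X.T / X.shape[1]


def motif_indicator(utrs, motif):
    """Presence (1.0) or absence (0.0) of a motif in each 3' UTR."""
    return np.array([1.0 if motif in s else 0.0 for s in utrs])
```

In this sketch the similarity matrix would play the role of the random-effect covariance and the motif indicator the role of the tested fixed effect, with log expression change as the response.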

MixMir: microRNA motif discovery from gene expression data using mixed linear models

Nucleic Acids Research ◽

10.1093/nar/gku672 ◽

2014 ◽

Vol 42 (17) ◽

pp. e135-e135 ◽

Cited By ~ 11

Author(s):

Liyang Diao ◽

Antoine Marcais ◽

Scott Norton ◽

Kevin C. Chen

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Motif Discovery ◽

Linear Models ◽

Expression Data ◽

Mixed Linear Models

Faculty Opinions recommendation of Meta-Analysis of Genome-Wide Association Studies and Network Analysis-Based Integration with Gene Expression Data Identify New Suggestive Loci and Unravel a Wnt-Centric Network Associated with Dupuytren's Disease.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726582766.793522025 ◽

2016 ◽

Author(s):

Rik Lories

Keyword(s):

Gene Expression ◽

Network Analysis ◽

Gene Expression Data ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Expression Data ◽

Dupuytren's Disease ◽

Genome Wide

Integrative Genomic Analysis for the Discovery of Biomarkers in Prostate Cancer

Biomarker Insights ◽

10.4137/bmi.s13729 ◽

2014 ◽

Vol 9 ◽

pp. BMI.S13729 ◽

Cited By ~ 5

Author(s):

Chindo Hicks ◽

Tejaswi Koganti ◽

Shankar Giri ◽

Memory Tekere ◽

Ritika Ramani ◽

...

Keyword(s):

Gene Expression ◽

Prostate Cancer ◽

Gene Expression Data ◽

Genetic Variants ◽

Association Studies ◽

Biological Pathways ◽

Great Success ◽

Genome Wide Association Studies ◽

Expression Data ◽

Increased Risk

Genome-wide association studies (GWAS) have achieved great success in identifying single nucleotide polymorphisms (SNPs, herein called genetic variants) and genes associated with risk of developing prostate cancer. However, GWAS do not typically link the genetic variants to the disease state or inform the broader context in which the genetic variants operate. Here, we present a novel integrative genomics approach that combines GWAS information with gene expression data to infer the causal association between gene expression and the disease and to identify the network states and biological pathways enriched for genetic variants. We identified gene regulatory networks and biological pathways enriched for genetic variants, including the prostate cancer, IGF-1, JAK2, androgen, and prolactin signaling pathways. The integration of GWAS information with gene expression data provides insights about the broader context in which genetic variants associated with an increased risk of developing prostate cancer operate.

Combining Genome Wide Association Studies and Differential Gene Expression Data Analyses Identifies Candidate Genes Affecting Mastitis Caused by Two Different Pathogens in the Dairy Cow

Open Journal of Animal Sciences ◽

10.4236/ojas.2015.54040 ◽

2015 ◽

Vol 05 (04) ◽

pp. 358-393 ◽

Cited By ~ 14

Author(s):

Xing Chen ◽

Zhangrui Cheng ◽

Shujun Zhang ◽

Dirk Werling ◽

D. Claire Wathes

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Dairy Cow ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Expression Data ◽

Data Analyses ◽

Genome Wide ◽

Differential Gene

Faculty Opinions recommendation of Meta-Analysis of Genome-Wide Association Studies and Network Analysis-Based Integration with Gene Expression Data Identify New Suggestive Loci and Unravel a Wnt-Centric Network Associated with Dupuytren's Disease.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726582766.793556754 ◽

2019 ◽

Author(s):

Jagdeep Nanchahal

Keyword(s):

Gene Expression ◽

Network Analysis ◽

Gene Expression Data ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Expression Data ◽

Dupuytren's Disease ◽

Genome Wide

An Integrative Genomics Approach to Biomarker Discovery in Breast Cancer

Cancer Informatics ◽

10.4137/cin.s6837 ◽

2011 ◽

Vol 10 ◽

pp. CIN.S6837 ◽

Cited By ~ 17

Author(s):

Chindo Hicks ◽

Rozana Asfour ◽

Antonio Pannuti ◽

Lucio Miele

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Gene Expression Data ◽

Genetic Variants ◽

Association Studies ◽

Biological Pathways ◽

Genome Wide Association Studies ◽

Expression Data ◽

Integrative Genomics ◽

Novel Genes

Genome-wide association studies (GWAS) have successfully identified genetic variants associated with risk for breast cancer. However, the molecular mechanisms through which the identified variants confer risk or influence phenotypic expression remain poorly understood. Here, we present a novel integrative genomics approach that combines GWAS information with gene expression data to assess the combined contribution of multiple genetic variants acting within genes and putative biological pathways, and to identify novel genes and biological pathways that could not be identified using traditional GWAS. The results show that genes containing SNPs associated with risk for breast cancer are functionally related and interact with each other in biological pathways relevant to breast cancer. Additionally, we identified novel genes that are co-expressed and interact with genes containing SNPs associated with breast cancer. Integrative analysis combining GWAS information with gene expression data provides functional bridges between GWAS findings and biological pathways involved in breast cancer.

Meta-Analysis of Genome-Wide Association Studies and Network Analysis-Based Integration with Gene Expression Data Identify New Suggestive Loci and Unravel a Wnt-Centric Network Associated with Dupuytren’s Disease

PLoS ONE ◽

10.1371/journal.pone.0158101 ◽

2016 ◽

Vol 11 (7) ◽

pp. e0158101 ◽

Cited By ~ 14

Author(s):

Kerstin Becker ◽

Sabine Siegert ◽

Mohammad Reza Toliat ◽

Juanjiangmeng Du ◽

Ramona Casper ◽

...

Keyword(s):

Gene Expression ◽

Network Analysis ◽

Gene Expression Data ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Expression Data ◽

Dupuytren's Disease ◽

Genome Wide

Integrative pathway analysis of genome-wide association studies and gene expression data in prostate cancer

BMC Systems Biology ◽

10.1186/1752-0509-6-s3-s13 ◽

2012 ◽

Vol 6 (Suppl 3) ◽

pp. S13 ◽

Cited By ~ 20

Author(s):

Peilin Jia ◽

Yang Liu ◽

Zhongming Zhao

Keyword(s):

Gene Expression ◽

Prostate Cancer ◽

Gene Expression Data ◽

Pathway Analysis ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Expression Data ◽

Genome Wide

Longitudinal Phenotypes Improve Genotype Association for Hyperketonemia in Dairy Cattle

Animals ◽

10.3390/ani9121059 ◽

2019 ◽

Vol 9 (12) ◽

pp. 1059 ◽

Cited By ~ 1

Author(s):

Francisco A. Leal Yepes ◽

Daryl V. Nydam ◽

Sabine Mann ◽

Luciano Caixeta ◽

Jessica A. A. McArt ◽

...

Keyword(s):

Linear Models ◽

Association Studies ◽

Genetic Selection ◽

Area Under The Curve ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Mixed Linear Models ◽

Biological Insight ◽

Group 2 ◽

Genomic Regions

The objective of our study was to identify genomic regions associated with varying concentrations of non-esterified fatty acid (NEFA), β-hydroxybutyrate (BHB), and the development of hyperketonemia (HYK) in longitudinally sampled Holstein dairy cows. Our study population consisted of 147 multiparous cows intensively characterized by serial NEFA and BHB concentrations. To identify individuals with contrasting combinations in longitudinal BHB and NEFA concentrations, phenotypes were established using incremental area under the curve (AUC) and categorized as follows: group 1 (high NEFA and high BHB), group 2 (low NEFA and high BHB), group 3 (low NEFA and low BHB), and group 4 (high NEFA and low BHB). Cows were genotyped on the Illumina Bovine High-density (777 K) beadchip. Genome-wide association studies using mixed linear models with the least-related animals were performed to establish a genetic association with HYK, BHB-AUC, NEFA-AUC, and the comparisons of the 4 AUC phenotypic groups using Golden Helix software. Nine single-nucleotide polymorphisms were associated with high longitudinal concentrations of BHB and further investigated. Five candidate genes related to energy metabolism and homeostasis were identified. These results provide biological insight and help identify susceptible animals, thus improving genetic selection criteria and thereby decreasing the incidence of HYK.
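
The AUC-based phenotyping described in this abstract can be illustrated with a short Python sketch. It rests on assumptions the abstract does not confirm: incremental AUC is taken here as the net trapezoidal area relative to the first sample, and the high/low flags are passed in by the caller because the thresholds the authors used are not stated. The function names are hypothetical.

```python
import numpy as np


def incremental_auc(times, values):
    """Net trapezoidal area between the curve and its first (baseline) value.

    Assumed definition; other conventions (for example, ignoring area
    below baseline) exist and may be what the study used.
    """
    t = np.asarray(times, dtype=float)
    v = np.asarray(values, dtype=float)
    v = v - v[0]  # express every sample relative to baseline
    return float(np.sum((t[1:] - t[:-1]) * (v[1:] + v[:-1]) / 2.0))


def auc_group(nefa_high, bhb_high):
    """Map high/low NEFA and BHB flags to the four groups in the abstract."""
    if bhb_high:
        return 1 if nefa_high else 2
    return 4 if nefa_high else 3
```

For example, a cow flagged low for NEFA-AUC and high for BHB-AUC falls into group 2 under this mapping.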

Mixed Logistic Regression in Genome-Wide Association Studies

10.1101/2020.01.17.910109 ◽

2020 ◽

Author(s):

Jacqueline Milet ◽

Hervé Perdry

Keyword(s):

Logistic Regression ◽

Linear Models ◽

Association Studies ◽

Score Test ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Mixed Linear Models ◽

Genome Wide

Motivation: Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved that this method is inappropriate and proposed a score test for the mixed logistic regression (MLR). However, this test does not allow an estimation of the variants’ effects. Results: We propose two computationally efficient methods to estimate the variants’ effects. Their properties are evaluated on two simulation sets, and compared with other methods (MLM, logistic regression). MLR performs the best in all circumstances. The variants’ effects are well evaluated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, enhancing the diagnosis of p-value inflation or deflation, when population strata are not clearly identified in the sample. Availability: All methods are implemented in the R package milorGWAS available at https://github.com/genostats/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
