Estimation of Mediation Effect for High-dimensional Omics Mediators with Application to the Framingham Heart Study

Mapping Intimacies ◽

10.1101/774877 ◽

2019 ◽

Author(s):

Tianzhong Yang ◽

Jingbo Niu ◽

Han Chen ◽

Peng Wei

Keyword(s):

Gene Expression ◽

Framingham Heart Study ◽

Mixed Model ◽

Expression Profiles ◽

Estimation Procedure ◽

Mediation Effect ◽

High Dimensional ◽

Model Framework ◽

Intermediate Phenotypes ◽

Heart Study

SUMMARY Environmental exposures can regulate intermediate molecular phenotypes, such as gene expression, by different mechanisms and thereby lead to various health outcomes. It is of significant scientific interest to unravel the role of potentially high-dimensional intermediate phenotypes in the relationship between environmental exposure and traits. Mediation analysis is an important tool for investigating such relationships. However, it has mainly focused on low-dimensional settings, and a good measure of the total mediation effect is lacking. Here, we extend an R-squared (Rsq) effect size measure, originally proposed in the single-mediator setting, to the moderate- and high-dimensional mediator settings in the mixed model framework. Based on extensive simulations, we compare our measure and estimation procedure with several frequently used mediation measures, including product, proportion, and ratio measures. Our Rsq measure has small bias and variance under the correctly specified model. To mitigate potential bias induced by non-mediators, we examine two variable selection procedures, i.e., iterative sure independence screening and false discovery rate control, to exclude the non-mediators. We evaluate the consistency of the proposed estimation procedures and introduce a resampling-based confidence interval. By applying the proposed estimation procedure, we find that more than half of the aging-related variation in systolic blood pressure can be explained by gene expression profiles in the Framingham Heart Study.
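The sketch below makes the Rsq idea concrete. It assumes the shared-variance form of the measure, R²_med = R²(Y~X) + R²(Y~M) − R²(Y~M,X), and computes it with ordinary least squares on simulated data that has fewer mediators than samples. It is not the mixed-model estimator described above, which targets moderate- and high-dimensional M. The helper names (`r_squared`, `rsq_mediation`) and the simulation settings are hypothetical.

```python
import random


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, predictors):
    """OLS R^2 of y regressed on an intercept plus each column in `predictors`."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    k = len(cols)
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    beta = _solve(xtx, xty)
    ybar = sum(y) / n
    sse = sum((y[t] - sum(beta[i] * cols[i][t] for i in range(k))) ** 2 for t in range(n))
    sst = sum((v - ybar) ** 2 for v in y)
    return 1.0 - sse / sst


def rsq_mediation(y, x, mediators):
    """Shared-variance measure (assumed form): R2(Y~X) + R2(Y~M) - R2(Y~M+X)."""
    return (r_squared(y, [x]) + r_squared(y, mediators)
            - r_squared(y, mediators + [x]))


# Hypothetical simulation: 5 true mediators fully transmit the effect of x on y,
# and 5 extra "non-mediators" are pure noise.
rng = random.Random(2021)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
true_m = [[0.8 * xi + rng.gauss(0, 1) for xi in x] for _ in range(5)]
noise_m = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)]
y = [0.5 * sum(med[t] for med in true_m) + rng.gauss(0, 1) for t in range(n)]

rsq = rsq_mediation(y, x, true_m + noise_m)
total = r_squared(y, [x])
print(f"R2(Y~X) = {total:.3f}, R2_med = {rsq:.3f}, ratio = {rsq / total:.2f}")
```

Adding X to a regression that already contains M can only raise R². The measure therefore never exceeds R²(Y~X), and a ratio near 1 means almost all exposure-explained variance is shared with the mediators. OLS breaks down once mediators outnumber samples, which is why the abstract turns to a mixed-model framework and variable selection.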

Estimation of total mediation effect for high-dimensional omics mediators

BMC Bioinformatics ◽

10.1186/s12859-021-04322-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Tianzhong Yang ◽

Jingbo Niu ◽

Han Chen ◽

Peng Wei

Keyword(s):

Gene Expression ◽

Mixed Model ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Estimation Procedure ◽

Mediation Effect ◽

High Dimensional ◽

Model Framework ◽

Intermediate Phenotypes ◽

Age Related

Abstract Background: Environmental exposures can regulate intermediate molecular phenotypes, such as gene expression, by different mechanisms and thereby lead to various health outcomes. It is of significant scientific interest to unravel the role of potentially high-dimensional intermediate phenotypes in the relationship between environmental exposure and traits. Mediation analysis is an important tool for investigating such relationships. However, it has mainly focused on low-dimensional settings, and a good measure of the total mediation effect is lacking. Here, we extend an R-squared (R²) effect size measure, originally proposed in the single-mediator setting, to the moderate- and high-dimensional mediator settings in the mixed model framework. Results: Based on extensive simulations, we compare our measure and estimation procedure with several frequently used mediation measures, including product, proportion, and ratio measures. Our R²-based second-moment measure has small bias and variance under the correctly specified model. To mitigate potential bias induced by non-mediators, we examine two variable selection procedures, i.e., iterative sure independence screening and false discovery rate control, to exclude the non-mediators. We establish the consistency of the proposed estimation procedures and introduce a resampling-based confidence interval. By applying the proposed estimation procedure, we found that 38% of the age-related variation in systolic blood pressure can be explained by gene expression profiles in the Framingham Heart Study of 1711 individuals. An R package "RsqMed" is available on CRAN. Conclusion: R-squared (R²) is an effective and efficient measure of the total mediation effect, especially in the high-dimensional setting.
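The false-discovery-rate screening step can be illustrated with the Benjamini-Hochberg step-up procedure applied to per-mediator p-values. This is a minimal sketch. It assumes the p-values come from some marginal association test of each candidate mediator; how they are computed, and the exact screening rule used in the paper, are not specified here. The function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return indices of hypotheses rejected by the Benjamini-Hochberg step-up rule.

    Sorts p-values ascending and finds the largest rank k with p_(k) <= k * q / m;
    every hypothesis with rank <= k is rejected (kept as a candidate mediator).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank  # step-up: keep the largest qualifying rank
    return sorted(order[:k_max])


# Hypothetical p-values for ten candidate mediators (e.g. genes).
pvals = [0.042, 0.001, 0.216, 0.039, 0.008, 0.06, 0.074, 0.205, 0.041, 0.212]
selected = benjamini_hochberg(pvals, q=0.05)
print(selected)  # → [1, 4]
```

Only the retained candidates would then enter the R² estimation. This is how screening can limit the bias that non-mediators introduce.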

FC1000: normalized gene expression changes of systematically perturbed human cells

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2016-0072 ◽

2017 ◽

Vol 16 (4) ◽

Cited By ~ 1

Author(s):

Ingrid M. Lönnstedt ◽

Sven Nelander

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Estimation Procedure ◽

Human Cells ◽

Biomedical Data ◽

Transcriptional Responses ◽

Statistical Framework ◽

Statistical Measures ◽

Change Response

Abstract The systematic study of transcriptional responses to genetic and chemical perturbations in human cells is still in its early stages. The largest available dataset to date is the newly released L1000 compendium. With its 1.3 million gene expression profiles of treated human cells, it offers many opportunities for biomedical data mining, but also data normalization challenges of new dimensions. We developed a novel and practical approach to obtain accurate estimates of fold change response profiles from L1000, based on the RUV (Remove Unwanted Variation) statistical framework. Extending RUV to a big data setting, we propose an estimation procedure, in which an underlying RUV model is tuned by feedback through dataset specific statistical measures, reflecting

263 USE OF PORCINE PARTHENOTES AND GENE EXPRESSION PROFILING USING MICROARRAYS FOR IDENTIFICATION OF IMPRINTED GENES

Reproduction Fertility and Development ◽

10.1071/rdv18n2ab263 ◽

2006 ◽

Vol 18 (2) ◽

pp. 239

Author(s):

J. Piedrahita ◽

S. Bischoff ◽

J. Estrada ◽

B. Freking ◽

D. Nonneman ◽

...

Keyword(s):

Gene Expression ◽

Candidate Genes ◽

Mixed Model ◽

Linear Mixed Model ◽

Expression Profiles ◽

Mammalian Species ◽

Polar Body ◽

Gene Expression Profiles ◽

Tissue Type ◽

Imprinted Genes

Genomic imprinting arises from differential epigenetic markings, including DNA methylation and histone modifications, and results in one allele being expressed in a parent-of-origin-specific manner. For further insight into the porcine epigenome, gene expression profiles of parthenogenetic (PRT; two maternally derived chromosome sets) and biparental embryos (BP; one maternal and one paternal set of chromosomes) were compared using microarrays. Comparison of the expression profiles of the two tissue types permits identification of both maternally and paternally imprinted genes and thus assessment of the degree of conservation of imprinted genes between swine and other mammalian species. Diploid porcine parthenogenetic fetuses were generated using follicular oocytes (BOMED, Madison, WI, USA). Oocytes with a visible polar body were activated using a single square pulse of direct current of 50 V/mm for 100 µs and diploidized by culture in 10 µg/mL cycloheximide for 6 h to limit extrusion of the second polar body. Following culture, BP embryos obtained by natural matings, and PRT embryos, were surgically transferred to oviducts on the first day of estrus. Fetuses recovered at 28-30 days of gestation were dissected to separate viscera including brain, liver, and placenta; the visceral tissues were then flash-frozen in liquid nitrogen. Porcine fibroblast tissue was obtained from the remaining carcass by mincing, trypsinization, and plating cells in α-MEM. Total RNA was extracted from frozen tissue or cell culture using the RNA Aqueous kit (Ambion, Austin, TX, USA) according to the manufacturer's protocol. Gene expression differences between BP and PRT tissues were determined using the GeneChip® Porcine Genome Array (Affymetrix, Santa Clara, CA), which contains 23 256 transcripts from Sus scrofa and represents 42 genes known to be imprinted in humans and/or mice. Triplicate arrays were used for each tissue type and for each PRT versus BP combination.
Significant differential gene expression was identified by a linear mixed model analysis using SAS 5.0 (SAS Institute, Cary, NC, USA). Storey's q-value method was used to correct for multiple testing at q ≤ 0.05. The following genes were classified as imprinted on the basis of their expression profiles: in fibroblasts, ARHI, HTR2A, MEST, NDN, NNAT, PEG3, PLAGL1, PEG10, SGCE, SNRPN, and UBE3A; in liver, IGF2, PEG3, PLAGL1, PEG10, and SNRPN; in placenta, HTR2A, IGF2, MEST, NDN, NNAT, PEG3, PLAGL1, PEG10, and SNRPN; and in brain, none. Additionally, several genes not known to be imprinted in humans/mice were highly differentially expressed between the two tissue types. Overall, using the PRT models and gene expression profiles, we have identified thirteen genes where imprinting is conserved between swine and humans/mice, and several candidate genes that represent potentially imprinted genes. Presently, our efforts are focused on the identification of single nucleotide polymorphisms (SNPs) to more carefully evaluate the behavior of these genes in normal and abnormal gestations and to test whether the candidate genes are indeed imprinted. This research was supported by USDA-CSREES grant 524383 to J. P. and B. F.
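Storey's q-value, used above with the threshold q ≤ 0.05, can be sketched in a few lines. This minimal version assumes a single fixed tuning parameter λ = 0.5 for estimating the proportion of true null hypotheses, π0. The published method and its software can instead smooth the π0 estimate over a grid of λ values, so results may differ. The function name and p-values are hypothetical.

```python
def storey_qvalues(pvalues, lam=0.5):
    """q-values with pi0 estimated at a single lambda (simplified Storey approach)."""
    m = len(pvalues)
    # Estimated proportion of true nulls: p-values above lambda are mostly nulls.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in p.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[idx] / rank)
        qvals[idx] = running_min
    return pi0, qvals


pvals = [0.001, 0.002, 0.003, 0.004, 0.2, 0.3, 0.6, 0.9]
pi0, qvals = storey_qvalues(pvals)
significant = [i for i, qv in enumerate(qvals) if qv <= 0.05]
print(pi0, significant)  # → 0.5 [0, 1, 2, 3]
```

Setting π0 = 1 reduces this to Benjamini-Hochberg adjusted p-values. Estimating π0 < 1 makes the q-values less conservative.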

Estimating and testing the microbial causal mediation effect with high-dimensional and compositional microbiome data

Bioinformatics ◽

10.1093/bioinformatics/btz565 ◽

2019 ◽

Author(s):

Chan Wang ◽

Jiyuan Hu ◽

Martin J Blaser ◽

Huilin Li

Keyword(s):

Regression Model ◽

Association Studies ◽

Statistical Significance ◽

Mediation Effect ◽

Supplementary Information ◽

High Dimensional ◽

Model Framework ◽

Mediation Effects ◽

Causal Mediation ◽

Microbiome Data

Abstract Motivation: Recent microbiome association studies have revealed important associations between microbiome and disease/health status. Such findings encourage scientists to dive deeper to uncover the causal role of microbiome in the underlying biological mechanism, and have led to applying statistical models to quantify causal microbiome effects and to identify the specific microbial agents. However, no existing causal mediation methods are specifically designed to handle high-dimensional and compositional microbiome data. Results: We propose a rigorous Sparse Microbial Causal Mediation Model (SparseMCMM) specifically designed for high-dimensional and compositional microbiome data in a typical three-factor (treatment, microbiome and outcome) causal study design. In particular, a linear log-contrast regression model and a Dirichlet regression model are proposed to estimate the causal direct effect of treatment and the causal mediation effects of microbiome at both the community and individual taxon levels. Regularization techniques are used to perform variable selection in the proposed model framework to identify signature causal microbes. Two hypothesis tests on the overall mediation effect are proposed, and their statistical significance is estimated by permutation procedures. Extensive simulated scenarios show that SparseMCMM has excellent performance in estimation and hypothesis testing. Finally, we showcase the utility of SparseMCMM in a study in which the murine microbiome was manipulated, where it provides a clear and sensible causal path among antibiotic treatment, microbiome composition and mouse weight. Availability and implementation: https://sites.google.com/site/huilinli09/software and https://github.com/chanw0/SparseMCMM. Supplementary information: Supplementary data are available at Bioinformatics online.
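A linear log-contrast regression model, as named above, regresses the outcome on log-transformed relative abundances with coefficients constrained to sum to zero. The minimal sketch below is not SparseMCMM's code; the coefficients and counts are hypothetical. It shows why the constraint matters: the linear predictor is the same whether the composition is given as raw read counts or as proportions, so sequencing depth drops out. It assumes strictly positive counts; real data with zeros would need a pseudocount or similar handling.

```python
import math


def log_contrast(values, beta, intercept=0.0):
    """Linear predictor intercept + sum_j beta_j * log(x_j) of a log-contrast model."""
    if abs(sum(beta)) > 1e-12:
        raise ValueError("log-contrast coefficients must sum to zero")
    return intercept + sum(b * math.log(v) for b, v in zip(beta, values))


# Hypothetical taxon read counts for one sample and zero-sum coefficients.
counts = [120, 30, 600, 250]
beta = [0.8, -0.5, 0.2, -0.5]
proportions = [c / sum(counts) for c in counts]

eta_counts = log_contrast(counts, beta)
eta_props = log_contrast(proportions, beta)
print(math.isclose(eta_counts, eta_props, abs_tol=1e-9))  # → True
```

Rescaling every count by the library size shifts each log term by the same constant. That constant is multiplied by sum(beta) = 0, so it cancels.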

Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data

British Journal of Cancer ◽

10.1038/sj.bjc.6601326 ◽

2003 ◽

Vol 89 (9) ◽

pp. 1599-1604 ◽

Cited By ~ 113

Author(s):

R Simon

Keyword(s):

Gene Expression ◽

Microarray Data ◽

Expression Profiles ◽

Gene Expression Profiles ◽

High Dimensional ◽

Prognostic Prediction

Reconstructing and analysing cellular states, space and time from gene expression profiles of many cells and single cells

Molecular BioSystems ◽

10.1039/c5mb00339c ◽

2015 ◽

Vol 11 (10) ◽

pp. 2690-2698 ◽

Cited By ~ 2

Author(s):

Mirko Francesconi ◽

Ben Lehner

Keyword(s):

Gene Expression ◽

Genetic Variation ◽

Expression Profiling ◽

Biological Sample ◽

Expression Profiles ◽

Single Cells ◽

Gene Expression Profiles ◽

High Dimensional ◽

Dimensional Measurement ◽

P Gene

Gene expression profiling is a fast, cheap and standardised analysis that provides a high dimensional measurement of the state of a biological sample, including of single cells. Computational methods to reconstruct the composition of samples and spatial and temporal information from expression profiles are described, as well as how they can be used to describe the effects of genetic variation.

Composite measurements and molecular compressed sensing for highly efficient transcriptomics

10.1101/091926 ◽

2017 ◽

Cited By ~ 4

Author(s):

Brian Cleary ◽

Le Cong ◽

Eric S. Lander ◽

Aviv Regev

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Training Data ◽

Modular Structure ◽

High Dimensional ◽

Rna Profiling ◽

Massive Scale ◽

Random Composite ◽

Modular Structures

Abstract RNA profiling is an excellent phenotype of cellular responses and tissue states, but can be costly to generate at the massive scale required for studies of regulatory circuits, genetic states or perturbation screens. Here, we draw on a series of advances over the last decade in the field of mathematics to establish a rigorous link between biological structure, data compressibility, and efficient data acquisition. We propose that very few random composite measurements – in which gene abundances are combined in a random linear combination – are needed to approximate the high-dimensional similarity between any pair of gene abundance profiles. We then show how finding latent, sparse representations of gene expression data would enable us to "decompress" a small number of random composite measurements and recover high-dimensional gene expression levels that were not measured (unobserved). We present a new algorithm for finding sparse, modular structure, which improves the ability to interpret samples in terms of small numbers of active modules, and show that the modular structure we find is sufficient to recover gene expression profiles from composite measurements (with ~100-fold fewer composite measurements than genes). Moreover, the knowledge that sparse, modular structures exist allows us to recover expression profiles from composite measurements, even without access to any training data. Finally, we present a proof-of-concept experiment for making composite measurements in the laboratory, involving the measurement of linear combinations of RNA abundances. Altogether, our results suggest new compressive modalities in experimental biology that can form a foundation for massive scaling in high-throughput measurements, while also offering new insights into the interpretation of high-dimensional data.
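A random Gaussian projection, a Johnson-Lindenstrauss-style argument, can illustrate the claim that a few random composite measurements preserve pairwise similarity. This minimal sketch uses hypothetical simulated profiles and dimensions. It shows only the similarity-preservation step, not the sparse-module decompression algorithm the authors propose.

```python
import math
import random

rng = random.Random(7)
n_genes, n_composites = 500, 300

# Two hypothetical gene-abundance profiles.
u = [rng.gauss(0, 1) for _ in range(n_genes)]
v = [ui + rng.gauss(0, 0.5) for ui in u]

# Each composite measurement is a random linear combination of gene abundances;
# scaling by 1/sqrt(n_composites) makes squared distances unbiased.
weights = [[rng.gauss(0, 1) / math.sqrt(n_composites) for _ in range(n_genes)]
           for _ in range(n_composites)]


def composite(profile):
    """Apply every random linear combination to one profile."""
    return [sum(w * g for w, g in zip(row, profile)) for row in weights]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


ratio = sq_dist(composite(u), composite(v)) / sq_dist(u, v)
print(f"squared-distance ratio after compression: {ratio:.3f}")
```

Here 300 composite measurements stand in for 500 genes. With m composites, the ratio follows a scaled chi-square distribution with mean 1 and standard deviation sqrt(2/m). The distortion therefore shrinks as m grows, independently of the number of genes.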

Item response theory modeling for microarray gene expression data

Advances in Methodology and Statistics ◽

10.51936/mpqj3248 ◽

2009 ◽

Vol 6 (1) ◽

Author(s):

Andrej Kastrin

Keyword(s):

Gene Expression ◽

Item Response ◽

Microarray Data ◽

Latent Variables ◽

Latent Variable ◽

Expression Profiles ◽

Gene Expression Profiles ◽

High Dimensional ◽

Variable Model ◽

Cluster Partition

The high dimensionality of global gene expression profiles, where the number of variables (genes) is very large compared to the number of observations (samples), presents challenges that affect the generalizability and applicability of microarray analysis. Latent variable modeling offers a promising approach to dealing with high-dimensional microarray data. The latent variable model is based on a few latent variables that capture most of the gene expression information. Here, we describe how to accomplish a reduction in dimension by a latent variable methodology, which can greatly reduce the number of features used to characterize microarray data. We propose a general latent variable framework for prediction of predefined classes of samples using gene expression profiles from microarray experiments. The framework consists of (i) selection of a smaller number of genes that are most differentially expressed between samples, (ii) dimension reduction using hierarchical clustering, where each cluster partition is identified as a latent variable, (iii) discretization of the gene expression matrix, (iv) fitting the Rasch item response model for genes in each cluster partition to estimate the expression of the latent variable, and (v) construction of a prediction model with latent variables as covariates to study the relationship between latent variables and phenotype. Two different microarray data sets are used to illustrate the framework. We show that the predictive performance of our method is comparable to that of the current best approach based on an all-gene space. The method is general and can be applied to other high-dimensional data problems.
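Step (iv) fits a Rasch model. In that model, the probability that gene i is "high" in a sample with latent level θ is 1 / (1 + exp(-(θ - b_i))), where b_i is the gene's difficulty parameter. The minimal sketch below assumes the gene parameters b_i are already known and the expression matrix has been dichotomized as in step (iii). It estimates one sample's latent value by Newton-Raphson maximum likelihood and omits the joint estimation of the b_i that a full Rasch fit would perform. All values and function names are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """P(gene is 'high' | latent level theta, gene difficulty b) under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, difficulties, tol=1e-10, max_iter=100):
    """Newton-Raphson MLE of a sample's latent level from 0/1 gene responses."""
    if sum(responses) in (0, len(responses)):
        raise ValueError("MLE is infinite for all-0 or all-1 response patterns")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        score = sum(x - p for x, p in zip(responses, probs))  # d logL / d theta
        info = sum(p * (1.0 - p) for p in probs)              # Fisher information
        step = score / info
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical dichotomized expression (1 = above gene median) for 6 genes in one cluster.
difficulties = [-1.0, -0.5, 0.0, 0.3, 0.8, 1.5]
responses = [1, 1, 1, 0, 1, 0]
theta_hat = estimate_theta(responses, difficulties)
print(f"estimated latent expression: {theta_hat:.3f}")
```

At the maximum-likelihood estimate, the expected number of "high" genes equals the observed number, since the score is zero. This is a convenient check on convergence.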

Abstract P420: Dietary Fat on Whole Blood Gene Expression and Plasma Lipids in the Framingham Heart Study

Circulation ◽

10.1161/circ.129.suppl_1.p420 ◽

2014 ◽

Vol 129 (suppl_1) ◽

Author(s):

Michael M Mendelson ◽

Brian Chen ◽

Chunyu Liu ◽

Roby Joehanes ◽

Peter Munson ◽

...

Keyword(s):

Gene Expression ◽

Fatty Acid ◽

Dietary Fat ◽

Whole Blood ◽

Framingham Heart Study ◽

Plasma Lipids ◽

Cholesterol Metabolism ◽

Pufa Intake ◽

Heart Study ◽

Abca1 Expression

Objective: To describe the influence of type of dietary fat on the activity of metabolic pathways, as measured by gene expression profiles, and the relation to plasma lipids. Background: Metabolic studies have demonstrated strong associations of dietary fatty acid composition with plasma lipids. Relatively little is known about the pathways and gene expression changes that mediate these relationships in the general population. Methods: We analyzed self-reported dietary intake of fatty acids, plasma lipid levels, and genome-wide gene expression data from Framingham Heart Study Offspring and Third Generation cohort participants. We excluded participants on lipid therapy. Multivariable linear regression models were fitted with plasma lipids as separate outcomes, energy-adjusted residuals of dietary fats as predictors, and adjustment for clinical and dietary covariates. Normalized gene expression from whole blood derived RNA was similarly modeled with additional adjustment for cell count and batch effects. Results: Among 3681 participants, higher polyunsaturated fatty acid (PUFA) intake is associated with lower LDL-C (estimated β [regression coefficient] = -0.5, p=0.002), higher HDL-C (β= 0.4, p<0.0001), and lower triglyceride (β= -0.009, p=0.0003) concentrations after adjustment for age, sex, carbohydrate, protein, and alcohol intake. Higher PUFA intake was associated with differential gene expression of cholesterol efflux transporters (ABCA1/ABCG1), LDL receptor degrader (IDOL), and non-lipoprotein metabolism related transcripts (FDR < 0.05). In contrast, higher saturated fatty acid (SFA) intake showed inverse associations with ABCA1 expression levels and HDL cholesterol ( Figure 1 ). Conclusions: Higher PUFA intake is associated with a less atherogenic lipid profile and higher ABCA1 expression, with inverse associations for higher SFA intake. 
Gene expression analysis reveals important links between dietary fat type, specific cholesterol metabolism pathways, and lipids in a community cohort.
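The "energy-adjusted residuals of dietary fats" in the Methods correspond to the nutrient residual method. The nutrient intake is regressed on total energy intake and the residuals are kept, so the resulting predictor is uncorrelated with total energy. This minimal sketch adds the mean intake back to the residuals and uses hypothetical intake values. The study's exact adjustment, such as any log transformation or added constant, is not stated in the abstract.

```python
def energy_adjusted_residuals(nutrient, energy):
    """Residual method: residuals of nutrient ~ energy (simple OLS), plus the nutrient mean."""
    n = len(nutrient)
    mean_e = sum(energy) / n
    mean_n = sum(nutrient) / n
    sxy = sum((e - mean_e) * (x - mean_n) for e, x in zip(energy, nutrient))
    sxx = sum((e - mean_e) ** 2 for e in energy)
    slope = sxy / sxx
    intercept = mean_n - slope * mean_e
    return [x - (intercept + slope * e) + mean_n for e, x in zip(energy, nutrient)]


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Hypothetical PUFA intake (g/day) and total energy intake (kcal/day) for six participants.
pufa = [12.0, 18.5, 9.8, 22.1, 15.3, 11.7]
energy = [1800, 2600, 1500, 3000, 2200, 2000]
pufa_adj = energy_adjusted_residuals(pufa, energy)
print(f"corr(raw PUFA, energy) = {correlation(pufa, energy):.3f}")
print(f"|corr(adjusted PUFA, energy)| = {abs(correlation(pufa_adj, energy)):.1e}")
```

Because OLS residuals are orthogonal to the regressor, the adjusted intake carries no information about total energy. A regression coefficient on it therefore reflects fat composition rather than overall caloric intake.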

Associating multiple longitudinal traits with high-dimensional single-nucleotide polymorphism data: application to the Framingham Heart Study

BMC Proceedings ◽

10.1186/1753-6561-3-s7-s47 ◽

2009 ◽

Vol 3 (S7) ◽

Cited By ~ 1

Author(s):

Sandra Waaijenborg ◽

Aeilko H Zwinderman

Keyword(s):

Single Nucleotide Polymorphism ◽

Framingham Heart Study ◽

Single Nucleotide Polymorphism Data ◽

High Dimensional ◽

Nucleotide Polymorphism ◽

Single Nucleotide ◽

Data Application ◽

Heart Study ◽

Polymorphism Data
