Beta regression improves the detection of differential DNA methylation for epigenetic epidemiology

2016 ◽  
Author(s):  
Timothy J. Triche ◽  
Peter W. Laird ◽  
Kimberly D. Siegmund

Abstract Background DNA methylation is the most readily assayed epigenetic mark, possessing confirmed relationships with gene expression, imprinting, and chromatin accessibility. Given the increasingly widespread use of DNA methylation microarrays in population-scale epidemiological applications, we sought to determine which methods provided the greatest statistical power to reproducibly detect differences in DNA methylation across various conditions, using publicly available data sets on tissue type and aging. Results Beta regression, as proposed originally by Ferrari and Cribari-Neto, yielded more validated hits in each of our comparisons than any other method under consideration, both in a regression setting and in comparisons to two-group tests such as the Wilcoxon-Mann-Whitney, Student t, and Welch t tests. In large cohorts of whole-blood samples, we corrected for compositional differences and batch effects, and found that marginal likelihood ratio tests from beta regression models uniformly dominate popular alternatives based on linear models. The superior sensitivity and specificity exhibited by beta regression in epidemiologically relevant cohort sizes corresponded to approximately a 2% increase in sensitivity at the same specificity when compared to linear models fitted on raw beta values (the proportion of signal intensity due to the methylated allele), M-values, or rank-quantile normalized values. Conclusions Investigators should consider beta regression to maximize statistical power in studies of DNA methylation using microarrays. At epidemiologically relevant sample sizes, with typical quality-control procedures (compositional and batch-effect correction), cross-cohort agreement uniformly favors beta regression over popular alternatives.
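As a concrete illustration of the approach described above, the following is a minimal sketch (not the authors' code) of a marginal likelihood-ratio test from a mean-parameterized beta regression in the Ferrari and Cribari-Neto style, fitted by direct likelihood optimization on simulated two-group beta-values. All variable names, the effect size, and the precision parameter are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit
from scipy.stats import chi2

def beta_negloglik(params, X, y):
    """Negative log-likelihood of a mean-parameterized beta model:
    mu = sigmoid(X @ b), precision phi = exp(last parameter)."""
    b, phi = params[:-1], np.exp(params[-1])
    mu = expit(X @ b)
    a, c = mu * phi, (1.0 - mu) * phi
    return -np.sum(gammaln(phi) - gammaln(a) - gammaln(c)
                   + (a - 1.0) * np.log(y) + (c - 1.0) * np.log(1.0 - y))

def lrt_pvalue(X_full, X_null, y):
    """Marginal likelihood-ratio test between nested beta regressions (df = 1)."""
    nll = lambda X: minimize(beta_negloglik, np.zeros(X.shape[1] + 1),
                             args=(X, y), method="Nelder-Mead").fun
    stat = 2.0 * (nll(X_null) - nll(X_full))
    return chi2.sf(max(stat, 0.0), df=1)

# Hypothetical two-group comparison on simulated beta-values
rng = np.random.default_rng(0)
n = 100
group = np.repeat([0.0, 1.0], n // 2)
mu_true = expit(-1.0 + 1.2 * group)              # group shifts mean methylation
y = rng.beta(mu_true * 30.0, (1.0 - mu_true) * 30.0)
X_full = np.column_stack([np.ones(n), group])
X_null = np.ones((n, 1))
p = lrt_pvalue(X_full, X_null, y)
print(p)
```

The test compares the full model (intercept plus group) against the intercept-only null; twice the log-likelihood difference is referred to a chi-squared distribution with one degree of freedom.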

2019 ◽  
Author(s):  
Lara Nonell ◽  
Juan R González

Abstract DNA methylation plays an important role in the development and progression of disease. Beta-values are the standard methylation measure. Different statistical methods have been proposed to assess differences in methylation between conditions, but most do not fully account for the distribution of beta-values. The simplex distribution can accommodate beta-value data, and we hypothesize that it is flexible enough to model methylation data. To test this hypothesis, we conducted several analyses using four real data sets obtained from microarray and sequencing technologies. Standard data distributions were studied and modelled in comparison to the simplex. In addition, simulations were conducted under different scenarios encompassing several distributional assumptions, regression models and sample sizes. Finally, we compared DNA methylation between females and males in order to benchmark the assessed methodologies under different scenarios. According to the results of the simulations and real data analyses, DNA methylation data are concordant with the simplex distribution in many situations. Simplex regression models work well in small data sets. However, as sample size increases, other models such as beta regression or even linear regression can be employed to assess group comparisons and obtain unbiased results. Based on these results, we can provide some practical recommendations for analyzing methylation data: 1) use data sets of at least 10 samples per studied condition for microarray data sets, or 30 for NGS data sets; 2) apply a simplex or beta regression model for microarray data; 3) apply a linear model in any other case.
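For reference, a minimal sketch of the simplex density discussed above, in its usual mean-dispersion parameterization, together with a numerical check that it integrates to one; the parameter values here are arbitrary, not taken from the paper.

```python
import numpy as np
from scipy.integrate import quad

def simplex_pdf(y, mu, sigma2):
    """Density of the simplex distribution on (0, 1):
    f(y) = [2*pi*sigma2*(y(1-y))^3]^(-1/2) * exp(-d(y, mu) / (2*sigma2)),
    with unit deviance d(y, mu) = (y-mu)^2 / (y(1-y) mu^2 (1-mu)^2)."""
    d = (y - mu) ** 2 / (y * (1.0 - y) * mu ** 2 * (1.0 - mu) ** 2)
    norm = np.sqrt(2.0 * np.pi * sigma2 * (y * (1.0 - y)) ** 3)
    return np.exp(-d / (2.0 * sigma2)) / norm

# Sanity check: the density integrates to 1 over (0, 1)
area, _ = quad(simplex_pdf, 1e-12, 1.0 - 1e-12, args=(0.4, 0.5))
print(area)
```

Like the beta distribution, the simplex family is supported on (0, 1), which is what lets it accommodate beta-values directly without transformation.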


Author(s):  
Richard Meier ◽  
Emily Nissen ◽  
Devin C. Koestler

Abstract Statistical methods that allow for cell-type-specific DNA methylation (DNAm) analyses based on bulk-tissue methylation data have great potential to improve our understanding of human disease, and have created unprecedented opportunities for new insights using the wealth of publicly available bulk-tissue methylation data. These methodologies incorporate interaction terms formed between the phenotypes/exposures of interest and the proportions of the cell types underlying the bulk-tissue sample used for DNAm profiling. Despite growing interest in such “interaction-based” methods, there has been no comprehensive assessment of how variability in the cellular landscape across study samples affects their performance. To answer this question, we used numerous publicly available whole-blood DNAm data sets, along with extensive simulation studies, to evaluate the performance of interaction-based approaches in detecting cell-specific methylation effects. Our results show that low cell-proportion variability results in large estimation error and low statistical power for detecting cell-specific effects on DNAm. Further, we identified that many studies targeting methylation profiling in whole blood may be at risk of being underpowered due to low variability in the cellular landscape across study samples. Finally, we discuss guidelines for researchers seeking to conduct studies using interaction-based approaches, to help ensure that their studies are adequately powered.
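The interaction-based design described above can be sketched in a simple simulated setting: bulk methylation is modelled with phenotype-by-cell-proportion interaction terms, so each interaction coefficient estimates a cell-specific phenotype effect. The three cell types, effect sizes, and noise level below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
pheno = rng.integers(0, 2, n).astype(float)      # hypothetical case/control status
props = rng.dirichlet([6.0, 3.0, 1.0], size=n)   # proportions of 3 cell types
baseline = np.array([0.3, 0.5, 0.7])             # per-cell-type mean methylation
effects = np.array([0.10, 0.0, 0.0])             # phenotype effect in cell type 1 only

# Bulk methylation = proportion-weighted mixture + cell-specific shift + noise
bulk = props @ baseline + pheno * (props @ effects) + rng.normal(0.0, 0.01, n)

# Design matrix: cell proportions plus phenotype-by-proportion interactions;
# the intercept is omitted because the proportions sum to one.
X = np.column_stack([props, pheno[:, None] * props])
beta, *_ = np.linalg.lstsq(X, bulk, rcond=None)
cell_specific = beta[3:]                         # estimated cell-specific effects
print(cell_specific)
```

With ample cell-proportion variability the interaction coefficients recover the simulated effects; shrinking the Dirichlet spread (i.e., making the cellular landscape nearly constant across samples) inflates their estimation error, which is the failure mode the study documents.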


2010 ◽  
Vol 62 (4) ◽  
pp. 875-882 ◽  
Author(s):  
A. Dembélé ◽  
J.-L. Bertrand-Krajewski ◽  
B. Barillon

Regression models are among the most frequently used models to estimate pollutant event mean concentrations (EMCs) in wet-weather discharges in urban catchments. Two main questions concerning the calibration of EMC regression models are investigated: i) the sensitivity of models to the size and content of the data sets used for their calibration, and ii) the change in modelling results when models are re-calibrated as data sets grow and change over time with newly collected experimental data. Based on an experimental data set of 64 rain events monitored in a densely urbanised catchment, four TSS EMC regression models (two log-linear and two linear) with two or three explanatory variables were derived and analysed. Model calibration with the iteratively reweighted least squares (IRLS) method is less sensitive, and leads to more robust results, than the ordinary least squares method. Three calibration options were investigated: two accounting for the chronological order of the observations, and one using random samples of events from the whole available data set. Results obtained with the best-performing nonlinear model clearly indicate that the model is highly sensitive to the size and content of the data set used for its calibration.
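A minimal sketch of IRLS calibration with Huber weights, the kind of scheme the study found less outlier-sensitive than ordinary least squares; the synthetic event data, tuning constant, and iteration count are our own choices, not the study's.

```python
import numpy as np

def irls_huber(X, y, c=1.345, n_iter=25):
    """Huber-weighted IRLS for a linear model; returns the coefficients."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS starting point
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12    # robust scale (MAD)
        w = np.minimum(1.0, c / (np.abs(r / s) + 1e-12))  # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    return beta

rng = np.random.default_rng(2)
n = 64                                               # one row per rain event
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n), rng.uniform(0, 3, n)])
y = X @ np.array([2.0, 1.5, -0.8]) + rng.normal(0.0, 0.2, n)
y[:4] += 10.0                                        # a few outlier events
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_irls = irls_huber(X, y)
print(beta_ols, beta_irls)
```

Outlying events get their weights shrunk toward zero at each iteration, so the IRLS fit stays close to the bulk of the data where OLS is pulled away by the contaminated events.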


PLoS ONE ◽  
2012 ◽  
Vol 7 (12) ◽  
pp. e50471 ◽  
Author(s):  
Wei Jie Seow ◽  
Angela Cecilia Pesatori ◽  
Emmanuel Dimont ◽  
Peter B. Farmer ◽  
Benedetta Albetti ◽  
...  

2021 ◽  
pp. 1-36
Author(s):  
Henry Prakken ◽  
Rosa Ratsma

This paper proposes a formal top-level model for explaining the outputs of machine-learning-based decision-making applications, and evaluates it experimentally on three data sets. The model draws on AI & law research on argumentation with cases, which models how lawyers draw analogies to past cases and discuss their relevant similarities and differences in terms of relevant factors and dimensions in the problem domain. A case-based approach is natural, since the input data of machine-learning applications can be seen as cases. While the approach is motivated by legal decision making, it also applies to other kinds of decision making, such as commercial decisions about loan applications or employee hiring, as long as the outcome is binary and the input conforms to this paper’s factor or dimension format. The model is top-level in that it can be extended with more refined accounts of similarities and differences between cases. It is shown to overcome several limitations of similar argumentation-based explanation models, which have only binary features and do not represent the tendency of features towards particular outcomes. The results of the experimental evaluation indicate that the model may be feasible in practice, but that further development and experimentation are needed to confirm its usefulness as an explanation model. The main challenges here are selecting from a large number of possible explanations, reducing the number of features in the explanations, and adding more meaningful information to them. It also remains to be investigated how suitable our approach is for explaining non-linear models.
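A toy sketch of the factor-based case comparison that underlies such explanation models, under assumptions of our own: a hypothetical loan domain, binary factors with known tendencies toward one outcome, and a "strengthening" set of extra factors in the focus case whose tendency matches the precedent's outcome.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical loan domain: each factor tends to favour either the
# "grant" (pro) or "deny" (con) outcome.
TENDENCY = {"stable_income": "pro", "collateral": "pro",
            "prior_default": "con", "high_debt": "con"}

@dataclass
class Case:
    factors: FrozenSet[str]
    outcome: Optional[str] = None    # "grant" / "deny"; None for the focus case

def explain(focus: Case, precedent: Case) -> dict:
    """Relevant similarities and differences between a focus case and a precedent."""
    return {
        "shared": sorted(focus.factors & precedent.factors),
        "focus_only": sorted(focus.factors - precedent.factors),
        "precedent_only": sorted(precedent.factors - focus.factors),
        # Extra factors in the focus case whose tendency matches the
        # precedent's outcome: they strengthen the analogy rather than weaken it.
        "strengthening": sorted(
            f for f in focus.factors - precedent.factors
            if (TENDENCY[f] == "pro") == (precedent.outcome == "grant")),
    }

precedent = Case(frozenset({"stable_income", "high_debt"}), outcome="grant")
focus = Case(frozenset({"stable_income", "collateral"}))
print(explain(focus, precedent))
```

Recording each factor's tendency is what distinguishes this style of model from purely binary-feature explanation schemes: a difference can then be classified as strengthening or weakening the analogy rather than merely being noted.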


2017 ◽  
Vol 121 (suppl_1) ◽  
Author(s):  
Mark E Pepin ◽  
David K Crossman ◽  
Joseph P Barchue ◽  
Salpy V Pamboukian ◽  
Steven M Pogwizd ◽  
...  

To identify the role of glucose in the development of diabetic cardiomyopathy, we directly assessed the impact of glucose delivery to the intact heart on DNA methylation and gene expression, using both an inducible heart-specific transgene (glucose transporter 4; mG4H) and streptozotocin-induced diabetes (STZ) mouse models. We aimed to determine whether long-lasting diabetic complications arise from prior transient exposure to hyperglycemia via a process termed “glycemic memory.” We identified DNA methylation changes associated with significant regulation of gene expression. Comparing our results from STZ, mG4H, and the modifications that persist following transgene silencing, we now provide evidence for cardiac DNA methylation as a persistent epigenetic mark contributing to glycemic memory. To begin to determine which changes contribute to human heart failure, we measured both RNA transcript levels and whole-genome DNA methylation in heart failure biopsy samples (n = 12) from male patients, collected at left ventricular assist device placement, using RNA sequencing and the Methylation450 assay, respectively. We hypothesized that epigenetic changes such as DNA methylation distinguish between heart failure etiologies. Our findings demonstrated that type 2 diabetic heart failure patients (n = 6) had an overall signature of hypomethylation, whereas patients listed as ischemic (n = 5) had a distinct hypermethylation signature for regulated transcripts. This initial analysis focused on promoter-associated CpG islands with inverse changes in gene transcript levels, from which diabetes-specific (14 genes; e.g. IGFBP4) and ischemia-specific (12 genes; e.g. PFKFB3) targets emerged with significant regulation of both measures. By combining our mouse and human molecular analyses, we provide evidence that diabetes mellitus drives direct regulation of cellular function by DNA methylation and the corresponding gene expression in diabetic mouse and human hearts. Importantly, many of the changes seen in mouse type 1 diabetes and human type 2 diabetes were similar, supporting a consistent mechanism of regulation. These studies are among the first steps in defining mechanisms of epigenetic regulation in diabetic cardiomyopathy.


2021 ◽  
Vol 67 (1) ◽  
Author(s):  
Jamaji C Nwanaji-Enwerem ◽  
Lars Van Der Laan ◽  
Elorm F Avakame ◽  
Kristan A Scott ◽  
Heather H Burris ◽  
...  

ABSTRACT Background Zika virus (ZIKV)-associated congenital microcephaly is an important contributor to pediatric death, and more robust pediatric mortality risk metrics are needed to help guide life plans and clinical decision making for these patients. Although common etiologies of pediatric and adult mortality differ, early life health can impact adult outcomes, potentially through DNA methylation. Hence, in this pilot study, we take an early step in identifying pediatric mortality risk metrics by examining associations of ZIKV infection and associated congenital microcephaly with existing adult DNA methylation-based mortality biomarkers: GrimAge and Zhang’s mortality score (ZMS). Methods Mortality measures were calculated from previously published HumanMethylationEPIC BeadChip data from 44 Brazilian children aged 5–40 months (18 with ZIKV-associated microcephaly; 7 normocephalic, exposed to ZIKV in utero; and 19 unexposed controls). We used linear models adjusted for chronological age, sex, methylation batch, and white blood cell proportions to evaluate relationships between ZIKV and the mortality markers. Results We observed significant decreases in the GrimAge component plasminogen activator inhibitor-1 [PAI-1; β = −2453.06 pg/ml, 95% confidence interval (CI) −3652.96, −1253.16, p = 0.0002] and in ZMS-site cg14975410 methylation (β = −0.06, 95% CI −0.09, −0.03, p = 0.0003) among children with microcephaly compared to controls. PAI-1 (β = −2448.70 pg/ml, 95% CI −4384.45, −512.95, p = 0.01) and cg14975410 (β = 0.01, 95% CI −0.04, 0.06, p = 0.64) results in comparisons of normocephalic, ZIKV-exposed children to controls were not statistically significant. Conclusion Our results suggest that elements of previously identified adult epigenetic markers of mortality risk are associated with ZIKV-associated microcephaly, a known contributor to pediatric mortality risk. These findings may provide insights for efforts aimed at developing pediatric mortality markers.


2018 ◽  
Vol 620 ◽  
pp. A168 ◽  
Author(s):  
G. Valle ◽  
M. Dell’Omodarme ◽  
P. G. Prada Moroni ◽  
S. Degl’Innocenti

Aims. We aim to perform a theoretical investigation of the direct impact of measurement errors in the observational constraints on the recovered age for stars in the main sequence (MS) and red giant branch (RGB) phases. We assumed that a mix of classical (effective temperature Teff and metallicity [Fe/H]) and asteroseismic (Δν and νmax) constraints were available for the objects. Methods. Artificial stars were sampled from a reference isochrone and subjected to random Gaussian perturbations in their observational constraints to simulate observational errors. The ages of these synthetic objects were then recovered by means of a Markov chain Monte Carlo approach over a grid of pre-computed stellar models. To account for observational uncertainties, the grid covers different values of initial helium abundance and mixing-length parameter, which act as nuisance parameters in the age estimation. Results. The differences between the recovered and true ages were modelled against the errors in the observables, by means of linear models and projection pursuit regression models. The first class of statistical models provides an easily generalizable result, whose robustness is checked with the second method. From the linear models we find that no single error source dominates in all the evolutionary phases. Assuming typical observational uncertainties, for the MS the most important error source in the reconstructed age is the effective temperature of the star: an offset of 75 K accounts for an underestimation of the stellar age of 0.4 to 0.6 Gyr from initial to terminal MS. An error of 2.5% in νmax proved to be the second most important source of uncertainty, accounting for about −0.3 Gyr. The 0.1 dex error in [Fe/H] was particularly important only at the end of the MS, producing an age error of −0.4 Gyr.
For the RGB phase the dominant source of uncertainty is νmax, causing an underestimation of about 0.6 Gyr; the offsets in the effective temperature and Δν caused an underestimation and an overestimation, respectively, of 0.3 Gyr. We find that inference from the linear model is a good proxy for that from projection pursuit regression models; inference from linear models can therefore be safely used thanks to its broader generalizability. Finally, we explored the impact on age estimates of adding the luminosity to the observational constraints discussed above. For this purpose, we assumed, for computational reasons, a 2.5% error in luminosity, much lower than the average error in the Gaia DR2 catalogue. However, even in this optimistic case, the addition of the luminosity does not increase the precision of the age estimates. Moreover, luminosity emerged as a major contributor to the variability in the estimated ages, accounting for an error of about −0.3 Gyr in the explored evolutionary phases.


2021 ◽  
Author(s):  
A. Haghani ◽  
A.T. Lu ◽  
C.Z. Li ◽  
T.R. Robeck ◽  
K. Belov ◽  
...  

Summary Epigenetics has hitherto been studied and understood largely at the level of individual organisms. Here, we report a multi-faceted investigation of DNA methylation across 11,117 samples from 176 different species. We performed an unbiased clustering of individual cytosines into 55 modules and identified 31 modules related to primary traits including age, species lifespan, sex, adult species weight, tissue type and phylogenetic order. Analysis of the correlation between DNA methylation and species allowed us to construct phyloepigenetic trees for different tissues that parallel the phylogenetic tree. In addition, while some stable cytosines reflect phylogenetic signatures, others relate to age and lifespan, and in many cases respond to anti-aging interventions in mice such as caloric restriction and ablation of growth hormone receptors. The insights uncovered by this investigation have important implications for our understanding of the role of epigenetics in mammalian evolution, aging and lifespan.

