Accurate ethnicity prediction from placental DNA methylation data

Mapping Intimacies ◽

10.1101/618470 ◽

2019 ◽

Author(s):

Victor Yuan ◽

E Magda Price ◽

Giulia F Del Gobbo ◽

Sara Mostafavi ◽

Brian Cox ◽

...

Keyword(s):

Dna Methylation ◽

Population Stratification ◽

Association Studies ◽

R Package ◽

Continuous Variable ◽

Methylation Data ◽

Ancestry Informative Markers ◽

Genome Wide ◽

Highly Correlated ◽

Ethnicity Classification

Abstract. Background: The influence of genetics on variation in DNA methylation (DNAme) is well documented. Yet confounding from population stratification is often unaccounted for in DNAme association studies. Existing approaches to address confounding by population stratification using DNAme data may not generalize to populations or tissues outside those in which they were developed. To aid future placental DNAme studies in assessing population stratification, we developed an ethnicity classifier, PlaNET (Placental DNAme Elastic Net Ethnicity Tool), using Infinium Human Methylation 450k BeadChip array (HM450k) data from placental samples in five cohorts; the classifier is also compatible with the newer EPIC platform. Results: Data from 509 placental samples were used to develop PlaNET and show that it accurately predicts (accuracy = 0.938, kappa = 0.823) major classes of self-reported ethnicity/race (African: n = 58, Asian: n = 53, Caucasian: n = 389), and produces ethnicity probabilities that are highly correlated with genetic ancestry inferred from genome-wide SNP arrays (>2.5 million SNPs) and ancestry informative markers (n = 50 SNPs). PlaNET's ethnicity classification relies on 1860 HM450k microarray sites, and over half of these were linked to nearby genetic polymorphisms (n = 955). Our placental-optimized method outperforms existing approaches in assessing population stratification in placental samples from individuals of Asian, African, and Caucasian ethnicities. Conclusion: PlaNET provides an improved approach to address population stratification in placental DNAme association studies. The method can be applied to predict ethnicity as a discrete or continuous variable and will be especially useful when self-reported ethnicity information is missing and genotyping markers are unavailable. PlaNET is available as an R package at https://github.com/wvictor14/planet.
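The two performance figures quoted in this abstract, accuracy and kappa, can be illustrated with a short sketch. It assumes that "kappa" means Cohen's kappa computed on self-reported versus predicted class labels; the function name and the toy labels below are hypothetical and are not taken from the PlaNET package or its data.

```python
from collections import Counter


def accuracy_and_kappa(true_labels, predicted_labels):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(true_labels)

    # Observed agreement: fraction of samples where prediction matches truth.
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n

    # Agreement expected by chance, from the marginal class frequencies.
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Toy, made-up labels (not data from the study).
toy_true = ["African"] * 2 + ["Asian"] * 2 + ["Caucasian"] * 6
toy_pred = ["African", "Caucasian"] + ["Asian"] * 2 + ["Caucasian"] * 6
toy_accuracy, toy_kappa = accuracy_and_kappa(toy_true, toy_pred)
```

Kappa discounts the agreement expected from class frequencies alone, which matters when one class dominates the sample, as the Caucasian class does in the counts reported above.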


Measuring epigenetics as the mediator of gene/environment interactions in DOHaD

Journal of Developmental Origins of Health and Disease ◽

10.1017/s2040174414000506 ◽

2014 ◽

Vol 6 (1) ◽

pp. 10-16 ◽

Cited By ~ 13

Author(s):

M.-L. Ong ◽

X. Lin ◽

J. D. Holbrook

Keyword(s):

Dna Methylation ◽

Association Studies ◽

Mediating Effect ◽

Genome Wide Association Studies ◽

Methylation Data ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Gene Environment ◽

Genome Wide ◽

Sources Of Variation

Analysis of DNA methylation data in epigenome-wide association studies presents many bioinformatics and statistical challenges. Not least among these is the non-independence of individual DNA methylation marks from each other, from genotype, and from technical sources of variation. In this review we discuss DNA methylation data from the Infinium450K array and processing methodologies to reduce technical variation. We describe recent approaches to harness the concordance of neighbouring DNA methylation values to improve power in association studies. We also describe how the non-independence of genotype and DNA methylation has been used to infer causality (in the case of Mendelian randomization approaches), to suggest the mediating effect of DNA methylation in linking intergenic single nucleotide polymorphisms, identified in genome-wide association studies, to phenotype, and to uncover the widespread influence of gene and environment interactions on methylation levels.


Genome-wide association studies identify 137 genetic loci for DNA methylation biomarkers of aging

Genome Biology ◽

10.1186/s13059-021-02398-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Daniel L. McCartney ◽

Josine L. Min ◽

Rebecca C. Richmond ◽

Ake T. Lu ◽

Maria K. Sobczyk ◽

...

Keyword(s):

Dna Methylation ◽

Genome Wide Association Study ◽

Association Studies ◽

Genome Wide Association ◽

Biological Aging ◽

Genome Wide Association Studies ◽

Genetic Loci ◽

Biomarkers Of Aging ◽

Genome Wide ◽

Shared Genetic

Abstract. Background: Biological aging estimators derived from DNA methylation data are heritable and correlate with morbidity and mortality. Consequently, identification of genetic and environmental contributors to the variation in these measures in populations has become a major goal in the field. Results: Leveraging DNA methylation and SNP data from more than 40,000 individuals, we identify 137 genome-wide significant loci, of which 113 are novel, from genome-wide association study (GWAS) meta-analyses of four epigenetic clocks and epigenetic surrogate markers for granulocyte proportions and plasminogen activator inhibitor 1 levels, respectively. We find evidence for shared genetic loci associated with the Horvath clock and expression of transcripts encoding genes linked to lipid metabolism and immune function. Notably, these loci are independent of those reported to regulate DNA methylation levels at constituent clock CpGs. A polygenic score for GrimAge acceleration showed strong associations with adiposity-related traits, educational attainment, parental longevity, and C-reactive protein levels. Conclusion: This study illuminates the genetic architecture underlying epigenetic aging and its shared genetic contributions with lifestyle factors and longevity.


Multi-species and multi-tissue methylation clocks for age estimation in toothed whales and dolphins

Communications Biology ◽

10.1038/s42003-021-02179-x ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Todd R. Robeck ◽

Zhe Fei ◽

Ake T. Lu ◽

Amin Haghani ◽

Eve Jourdain ◽

...

Keyword(s):

Dna Methylation ◽

Age Estimation ◽

Species Conservation ◽

Genome Wide ◽

Epigenetic Aging ◽

Highly Correlated ◽

Methylation Patterns ◽

Cg Dinucleotides ◽

Free Ranging ◽

Unique Species

Abstract: The development of a precise blood or skin tissue DNA epigenetic aging clock for odontocetes (OEAC) would solve current age estimation inaccuracies for wild odontocetes. Therefore, we determined genome-wide DNA methylation profiles using a custom array (HorvathMammalMethyl40) across skin and blood samples (n = 446) from known-age animals representing nine odontocete species within 4 phylogenetic families to identify age-associated CG dinucleotides (CpGs). The top CpGs were used to create a cross-validated OEAC clock which was highly correlated for individuals (r = 0.94) and for unique species (median r = 0.93). Finally, we applied the OEAC for estimating the age and sex of 22 wild Norwegian killer whales. DNA methylation patterns of age-associated CpGs are highly conserved across odontocetes. These similarities allowed us to develop an odontocete epigenetic aging clock (OEAC) which can be used for species conservation efforts by providing a mechanism for estimating the age of free-ranging odontocetes from either blood or skin samples.
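The r values quoted in this abstract summarize how closely clock estimates track age. The sketch below shows how such a coefficient could be computed, assuming r denotes the Pearson correlation between predicted and known ages (the abstract does not state which coefficient was used); the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson_r(known_ages, predicted_ages)` would be called with one entry per animal, where both lists are supplied by the user.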


Gene set enrichment analysis for genome-wide DNA methylation data

Genome Biology ◽

10.1186/s13059-021-02388-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jovana Maksimovic ◽

Alicia Oshlack ◽

Belinda Phipson

Keyword(s):

Dna Methylation ◽

Enrichment Analysis ◽

R Package ◽

Gene Set Enrichment Analysis ◽

Methylation Array ◽

Gene Set ◽

Genome Wide ◽

Genome Methylation ◽

Unbiased Gene ◽

Gene Set Testing

Abstract: DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalization, and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.
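For orientation, gene set testing is often introduced through a plain hypergeometric over-representation test. The sketch below shows that baseline only. It is not GOmeth or GOregion, it makes no correction for array-specific biases, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value: P(overlap >= n_overlap).

    n_universe: number of genes considered in total
    n_in_set:   number of those genes that belong to the gene set
    n_selected: number of genes called significant
    n_overlap:  number of significant genes that belong to the gene set
    """
    if not (0 <= n_in_set <= n_universe and 0 <= n_selected <= n_universe):
        raise ValueError("set and selection sizes must lie within the universe")
    if n_overlap < 0:
        raise ValueError("overlap must be non-negative")

    total = comb(n_universe, n_selected)
    largest_possible = min(n_in_set, n_selected)
    # math.comb returns 0 when k > n, so impossible overlaps contribute nothing.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, largest_possible + 1)
    )
    return tail / total
```

This baseline treats every gene as equally likely to be selected, so it serves only as a point of comparison for methods described as unbiased.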


Evaluation of affinity-based genome-wide DNA methylation data: Effects of CpG density, amplification bias, and copy number variation

Genome Research ◽

10.1101/gr.110601.110 ◽

2010 ◽

Vol 20 (12) ◽

pp. 1719-1729 ◽

Cited By ~ 92

Author(s):

M. D. Robinson ◽

C. Stirzaker ◽

A. L. Statham ◽

M. W. Coolen ◽

J. Z. Song ◽

...

Keyword(s):

Dna Methylation ◽

Copy Number Variation ◽

Copy Number ◽

Methylation Data ◽

Amplification Bias ◽

Genome Wide ◽

Number Variation


methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles

Genome Biology ◽

10.1186/gb-2012-13-10-r87 ◽

2012 ◽

Vol 13 (10) ◽

pp. R87 ◽

Cited By ~ 696

Author(s):

Altuna Akalin ◽

Matthias Kormaksson ◽

Sheng Li ◽

Francine E Garrett-Bakelman ◽

Maria E Figueroa ◽

...

Keyword(s):

Dna Methylation ◽

R Package ◽

Genome Wide


A practical approach to adjusting for population stratification in genome-wide association studies: principal components and propensity scores (PCAPS)

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2017-0054 ◽

2018 ◽

Vol 17 (6) ◽

Cited By ~ 2

Author(s):

Huaqing Zhao ◽

Nandita Mitra ◽

Peter A. Kanetsky ◽

Katherine L. Nathanson ◽

Timothy R. Rebbeck

Keyword(s):

Principal Components ◽

Population Stratification ◽

Propensity Scores ◽

Association Studies ◽

Germ Cell Tumors ◽

Gwas Data ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Testicular Germ Cell ◽

Genome Wide

Abstract Genome-wide association studies (GWAS) are susceptible to bias due to population stratification (PS). The most widely used method to correct bias due to PS is principal components (PCs) analysis (PCA), but there is no objective method to guide which PCs to include as covariates. Often, the ten PCs with the highest eigenvalues are included to adjust for PS. This selection is arbitrary, and patterns of local linkage disequilibrium may affect PCA corrections. To address these limitations, we estimate genomic propensity scores based on all statistically significant PCs selected by the Tracy-Widom (TW) statistic. We compare a principal components and propensity scores (PCAPS) approach to PCA and EMMAX using simulated GWAS data under no, moderate, and severe PS. PCAPS reduced spurious genetic associations regardless of the degree of PS, resulting in odds ratio (OR) estimates closer to the true OR. We illustrate our PCAPS method using GWAS data from a study of testicular germ cell tumors. PCAPS provided a more conservative adjustment than PCA. Advantages of the PCAPS approach include reduction of bias compared to PCA, consistent selection of propensity scores to adjust for PS, the potential ability to handle outliers, and ease of implementation using existing software packages.


Meffil: efficient normalisation and analysis of very large DNA methylation samples

10.1101/125963 ◽

2017 ◽

Cited By ~ 17

Author(s):

Josine Min ◽

Gibran Hemani ◽

George Davey Smith ◽

Caroline Relton ◽

Matthew Suderman

Keyword(s):

Dna Methylation ◽

Association Studies ◽

R Package ◽

Individual Level ◽

Technological Advances ◽

Level Data ◽

Fixed And Random Effects ◽

R Packages ◽

Meta Analyses ◽

Dramatic Growth

Abstract. Background: Technological advances in high throughput DNA methylation microarrays have allowed dramatic growth of a new branch of epigenetic epidemiology. DNA methylation datasets are growing ever larger in terms of the number of samples profiled, the extent of genome coverage, and the number of studies being meta-analysed. Novel computational solutions are required to efficiently handle these data. Methods: We have developed meffil, an R package designed to quality control, normalize and perform epigenome-wide association studies (EWAS) efficiently on large samples of Illumina Infinium HumanMethylation450 and MethylationEPIC BeadChip microarrays. We tested meffil by applying it to 6000 450k microarrays generated from blood collected for two different datasets, Accessible Resource for Integrative Epigenomic Studies (ARIES) and The Genetics of Overweight Young Adults (GOYA) study. Results: A complete reimplementation of functional normalization minimizes computational memory requirements to 5% of that required by other R packages, without increasing running time. Incorporating fixed and random effects alongside functional normalization, and automated estimation of functional normalisation parameters reduces technical variation in DNA methylation levels, thus reducing false positive associations and improving power. We also demonstrate that the ability to normalize datasets distributed across physically different locations without sharing any biologically-based individual-level data may reduce heterogeneity in meta-analyses of epigenome-wide association studies. However, we show that when batch is perfectly confounded with cases and controls, functional normalization is unable to prevent spurious associations. Conclusions: meffil is available online (https://github.com/perishky/meffil/) along with tutorials covering typical use cases.


Genome-Wide Association Studies of 39 Seed Yield-Related Traits in Sesame (Sesamum indicum L.)

International Journal of Molecular Sciences ◽

10.3390/ijms19092794 ◽

2018 ◽

Vol 19 (9) ◽

pp. 2794 ◽

Cited By ~ 5

Author(s):

Rong Zhou ◽

Komivi Dossa ◽

Donghua Li ◽

Jingyin Yu ◽

Jun You ◽

...

Keyword(s):

Candidate Genes ◽

Seed Yield ◽

Association Studies ◽

Oil Quality ◽

Genome Wide Association ◽

Oilseed Crop ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Highly Correlated ◽

Capsule Size

Sesame is poised to become a major oilseed crop owing to its high oil quality and adaptation to various ecological areas. However, the seed yield of sesame is very low and the underlying genetic basis is still elusive. Here, we performed genome-wide association studies of 39 seed yield-related traits categorized into five major trait groups, in three different environments, using 705 diverse lines. Extensive variation was observed for the traits, with capsule size, capsule number and seed size-related traits found to be highly correlated with seed yield indexes. In total, 646 loci were significantly associated with the 39 traits (p < 10^-7) and resolved to 547 quantitative trait loci (QTLs). We identified six multi-environment QTLs and 76 pleiotropic QTLs associated with two to five different traits. By analyzing the candidate genes for the assayed traits, we retrieved 48 potential genes containing significant functional loci. Several homologs of these candidate genes in Arabidopsis are described to be involved in seed or biomass formation. However, we also identified novel candidate genes, such as SiLPT3 and SiACS8, which may control capsule length and capsule number traits. Altogether, we provided the highly anticipated basis for research on genetics and functional genomics towards seed yield improvement in sesame.


Integration of genome-wide mRNA and miRNA expression, and DNA methylation data of three cell lines exposed to ten carbon nanomaterials

Data in Brief ◽

10.1016/j.dib.2018.05.107 ◽

2018 ◽

Vol 19 ◽

pp. 1046-1057 ◽

Cited By ~ 2

Author(s):

Giovanni Scala ◽

Veer Marwah ◽

Pia Kinaret ◽

Jukka Sund ◽

Vittorio Fortino ◽

...

Keyword(s):

Dna Methylation ◽

Cell Lines ◽

Mirna Expression ◽

Carbon Nanomaterials ◽

Methylation Data ◽

Genome Wide
