Accurate ethnicity prediction from placental DNA methylation data

2019 ◽  
Author(s):  
Victor Yuan ◽  
E Magda Price ◽  
Giulia F Del Gobbo ◽  
Sara Mostafavi ◽  
Brian Cox ◽  
...  

Abstract Background The influence of genetics on variation in DNA methylation (DNAme) is well documented. Yet confounding from population stratification is often unaccounted for in DNAme association studies. Existing approaches to address confounding by population stratification using DNAme data may not generalize to populations or tissues outside those in which they were developed. To aid future placental DNAme studies in assessing population stratification, we developed an ethnicity classifier, PlaNET (Placental DNAme Elastic Net Ethnicity Tool), using five cohorts with Infinium HumanMethylation450 BeadChip (HM450k) data from placental samples; the classifier is also compatible with the newer EPIC platform. Results Data from 509 placental samples were used to develop PlaNET and show that it accurately predicts (accuracy = 0.938, kappa = 0.823) major classes of self-reported ethnicity/race (African: n = 58, Asian: n = 53, Caucasian: n = 389) and produces ethnicity probabilities that are highly correlated with genetic ancestry inferred from genome-wide SNP arrays (>2.5 million SNPs) and ancestry-informative markers (n = 50 SNPs). PlaNET's ethnicity classification relies on 1860 HM450k microarray sites, over half of which (n = 955) were linked to nearby genetic polymorphisms. Our placental-optimized method outperforms existing approaches in assessing population stratification in placental samples from individuals of Asian, African, and Caucasian ethnicities. Conclusion PlaNET provides an improved approach to address population stratification in placental DNAme association studies. The method can be applied to predict ethnicity as a discrete or continuous variable and will be especially useful when self-reported ethnicity information is missing and genotyping markers are unavailable. PlaNET is available as an R package at https://github.com/wvictor14/planet.
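The abstract reports both accuracy and Cohen's kappa. Kappa discounts the agreement expected by chance, which matters when one class dominates, as the Caucasian class does here (n = 389). The sketch below computes kappa from a confusion matrix; the toy counts in the usage example are hypothetical and are not PlaNET's results.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows = true class, columns = predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: the fraction of samples on the diagonal (plain accuracy).
    p_o = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: sum over classes of (row marginal * column marginal) / n^2.
    row_tot = [sum(confusion[i]) for i in range(k)]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical two-class example: accuracy is 0.90, but kappa is lower (about 0.80)
# because much of that agreement would be expected by chance alone.
print(cohens_kappa([[50, 5], [5, 40]]))
```

The same function works unchanged for three or more classes, such as the African/Asian/Caucasian labels above.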

2014 ◽  
Vol 6 (1) ◽  
pp. 10-16 ◽  
Author(s):  
M.-L. Ong ◽  
X. Lin ◽  
J. D. Holbrook

Analysis of DNA methylation data in epigenome-wide association studies poses many bioinformatic and statistical challenges. Not least of these is the non-independence of individual DNA methylation marks from each other, from genotype, and from technical sources of variation. In this review we discuss DNA methylation data from the Infinium 450K array and processing methodologies to reduce technical variation. We describe recent approaches that harness the concordance of neighbouring DNA methylation values to improve power in association studies. We also describe how the non-independence of genotype and DNA methylation has been used to infer causality (in Mendelian randomization approaches), to suggest that DNA methylation mediates the link between intergenic single nucleotide polymorphisms identified in genome-wide association studies and phenotype, and to uncover the widespread influence of gene-environment interactions on methylation levels.
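One simple way to exploit the concordance of neighbouring CpGs is to group probes that lie close together on a chromosome and summarise each group as a region. The sketch below is a generic distance-based clustering, not any specific method covered in the review; the 500 bp `max_gap` and the use of a plain mean are illustrative assumptions.

```python
def region_means(cpgs, max_gap=500):
    """Cluster CpGs on one chromosome for one sample and average beta values per cluster.

    cpgs: list of (position_bp, beta) tuples. A new cluster starts whenever the
    distance to the previous CpG exceeds max_gap (a hypothetical threshold).
    Returns a list of (start, end, mean_beta) tuples.
    """
    regions = []
    for pos, beta in sorted(cpgs):
        if regions and pos - regions[-1]["end"] <= max_gap:
            # Close enough to the previous CpG: extend the current cluster.
            regions[-1]["end"] = pos
            regions[-1]["betas"].append(beta)
        else:
            regions.append({"start": pos, "end": pos, "betas": [beta]})
    return [(r["start"], r["end"], sum(r["betas"]) / len(r["betas"])) for r in regions]


# Two nearby CpGs merge into one region; the distant CpG stays on its own.
print(region_means([(100, 0.2), (300, 0.4), (2000, 0.9)]))
```

Testing one summary value per region instead of every CpG reduces the number of tests and averages out probe-level noise, which is the intuition behind region-level gains in power.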


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Daniel L. McCartney ◽  
Josine L. Min ◽  
Rebecca C. Richmond ◽  
Ake T. Lu ◽  
Maria K. Sobczyk ◽  
...  

Abstract Background Biological aging estimators derived from DNA methylation data are heritable and correlate with morbidity and mortality. Consequently, identification of genetic and environmental contributors to the variation in these measures in populations has become a major goal in the field. Results Leveraging DNA methylation and SNP data from more than 40,000 individuals, we identify 137 genome-wide significant loci, of which 113 are novel, from genome-wide association study (GWAS) meta-analyses of four epigenetic clocks and of epigenetic surrogate markers for granulocyte proportions and plasminogen activator inhibitor 1 levels. We find evidence for shared genetic loci associated with the Horvath clock and expression of transcripts encoding genes linked to lipid metabolism and immune function. Notably, these loci are independent of those reported to regulate DNA methylation levels at constituent clock CpGs. A polygenic score for GrimAge acceleration showed strong associations with adiposity-related traits, educational attainment, parental longevity, and C-reactive protein levels. Conclusion This study illuminates the genetic architecture underlying epigenetic aging and its shared genetic contributions with lifestyle factors and longevity.
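At its simplest, a polygenic score such as the GrimAge acceleration score mentioned above is a weighted sum of effect-allele dosages, with weights taken from GWAS effect sizes. The sketch below shows only that final summation; the SNP identifiers and weights are hypothetical, and real pipelines add steps such as variant selection, clumping, or shrinkage that are not shown.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages over SNPs present in both inputs.

    dosages: {snp_id: effect-allele dosage in [0, 2]} for one individual
    weights: {snp_id: per-allele effect size from a GWAS}
    SNPs missing from either mapping are skipped.
    """
    shared = sorted(dosages.keys() & weights.keys())
    return sum(dosages[snp] * weights[snp] for snp in shared)


# Hypothetical weights and genotypes; rs0003 and rs0004 are each missing from one side.
weights = {"rs0001": 0.1, "rs0002": -0.05, "rs0003": 0.2}
dosages = {"rs0001": 2, "rs0002": 1, "rs0004": 1}
print(polygenic_score(dosages, weights))  # 2*0.1 + 1*(-0.05), about 0.15
```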


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Todd R. Robeck ◽  
Zhe Fei ◽  
Ake T. Lu ◽  
Amin Haghani ◽  
Eve Jourdain ◽  
...  

Abstract The development of a precise blood or skin tissue DNA Epigenetic Aging Clock for Odontocetes (OEAC) would resolve current inaccuracies in age estimation for wild odontocetes. Therefore, we determined genome-wide DNA methylation profiles using a custom array (HorvathMammalMethyl40) across skin and blood samples (n = 446) from known-age animals representing nine odontocete species within four phylogenetic families to identify age-associated CG dinucleotides (CpGs). The top CpGs were used to create a cross-validated OEAC clock whose age estimates were highly correlated with known age across individuals (r = 0.94) and within individual species (median r = 0.93). Finally, we applied the OEAC to estimate the age and sex of 22 wild Norwegian killer whales. DNA methylation patterns of age-associated CpGs are highly conserved across odontocetes. These similarities allowed us to develop an odontocete epigenetic aging clock (OEAC) that can support species conservation efforts by providing a mechanism for estimating the age of free-ranging odontocetes from either blood or skin samples.
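The OEAC abstract reports one correlation pooled over all individuals and a median of per-species correlations. The sketch below shows how those two summaries differ, assuming "r" is the Pearson correlation between clock-estimated and known age (the abstract does not name the coefficient); the records format is hypothetical.

```python
from math import sqrt
from statistics import median


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def median_species_r(records):
    """records: list of (species, known_age, predicted_age) tuples.

    Computes r separately within each species, then returns the median of those values.
    """
    by_species = {}
    for species, age, pred in records:
        ages, preds = by_species.setdefault(species, ([], []))
        ages.append(age)
        preds.append(pred)
    return median([pearson_r(ages, preds) for ages, preds in by_species.values()])
```

A pooled r can look high simply because species differ in typical lifespan, so the per-species median is the stricter check that the clock tracks age within each species.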


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jovana Maksimovic ◽  
Alicia Oshlack ◽  
Belinda Phipson

Abstract DNA methylation is one of the most commonly studied epigenetic marks, owing to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalization, and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate that GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.
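For contrast with GOmeth and GOregion, the sketch below is the conventional hypergeometric over-representation test applied to a gene list. It is not GOmeth: a plain test like this treats every gene as equally likely to be selected, whereas on methylation arrays genes covered by more probes are more likely to be called differentially methylated, a known source of bias in array-based gene set testing. The function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    X ~ Hypergeometric: draw n_selected genes (e.g. genes with differentially
    methylated CpGs) from n_universe genes, of which n_in_set belong to the gene set.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_in_set, n_selected)
    hits = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / total


# Toy example: all 5 selected genes fall in a 5-gene set drawn from 10 genes.
print(hypergeom_enrichment_p(10, 5, 5, 5))  # 1/252
```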


2010 ◽  
Vol 20 (12) ◽  
pp. 1719-1729 ◽  
Author(s):  
M. D. Robinson ◽  
C. Stirzaker ◽  
A. L. Statham ◽  
M. W. Coolen ◽  
J. Z. Song ◽  
...  

2012 ◽  
Vol 13 (10) ◽  
pp. R87 ◽  
Author(s):  
Altuna Akalin ◽  
Matthias Kormaksson ◽  
Sheng Li ◽  
Francine E Garrett-Bakelman ◽  
Maria E Figueroa ◽  
...  

Author(s):  
Huaqing Zhao ◽  
Nandita Mitra ◽  
Peter A. Kanetsky ◽  
Katherine L. Nathanson ◽  
Timothy R. Rebbeck

Abstract Genome-wide association studies (GWAS) are susceptible to bias due to population stratification (PS). The most widely used method to correct bias due to PS is principal component analysis (PCA), but there is no objective method to guide which principal components (PCs) to include as covariates. Often, the ten PCs with the highest eigenvalues are included to adjust for PS. This selection is arbitrary, and patterns of local linkage disequilibrium may affect PCA corrections. To address these limitations, we estimate genomic propensity scores based on all statistically significant PCs selected by the Tracy-Widom (TW) statistic. We compare a principal components and propensity scores (PCAPS) approach with PCA and EMMAX using simulated GWAS data under no, moderate, and severe PS. PCAPS reduced spurious genetic associations regardless of the degree of PS, resulting in odds ratio (OR) estimates closer to the true OR. We illustrate our PCAPS method using GWAS data from a study of testicular germ cell tumors. PCAPS provided a more conservative adjustment than PCA. Advantages of the PCAPS approach include reduced bias compared with PCA, consistent selection of propensity scores to adjust for PS, the potential ability to handle outliers, and ease of implementation using existing software packages.
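The PCAPS idea can be sketched in two steps: compute PCs from the genotype matrix, then fit a logistic model of a binary variable on those PCs and use the fitted probabilities as propensity scores. This is a simplified sketch under several assumptions not stated in the abstract: the number of PCs `k` is fixed by hand rather than chosen with the Tracy-Widom statistic, the modelled binary variable (`exposure`) is assumed to be carrier status of the tested SNP, and how the scores then enter the association model is not shown.

```python
import numpy as np


def top_pcs(genotypes, k):
    """Top-k principal component scores of an (n_samples x n_snps) 0/1/2 genotype matrix."""
    g = np.asarray(genotypes, dtype=float)
    sd = g.std(axis=0)
    keep = sd > 0  # drop monomorphic SNPs to avoid dividing by zero
    z = (g[:, keep] - g[:, keep].mean(axis=0)) / sd[keep]
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


def propensity_scores(pcs, exposure, n_iter=50, tol=1e-10):
    """Fit logistic regression of a binary exposure on PCs by Newton-Raphson.

    Returns the fitted probabilities P(exposure = 1 | PCs). Assumes the data
    are not perfectly separable, otherwise the estimates diverge.
    """
    x = np.column_stack([np.ones(len(pcs)), pcs])  # intercept + PCs
    y = np.asarray(exposure, dtype=float)
    beta = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-x @ beta))
        grad = x.T @ (y - p)                           # score vector
        hess = x.T @ (x * (p * (1.0 - p))[:, None])    # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-x @ beta))
```

With an intercept in the model, the fitted scores average to the observed exposure frequency at convergence, which is a quick sanity check on the fit.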


2017 ◽  
Author(s):  
Josine Min ◽  
Gibran Hemani ◽  
George Davey Smith ◽  
Caroline Relton ◽  
Matthew Suderman

Abstract Background Technological advances in high-throughput DNA methylation microarrays have allowed dramatic growth of a new branch of epigenetic epidemiology. DNA methylation datasets are growing ever larger in terms of the number of samples profiled, the extent of genome coverage, and the number of studies being meta-analysed. Novel computational solutions are required to handle these data efficiently. Methods We have developed meffil, an R package designed to efficiently perform quality control, normalization, and epigenome-wide association studies (EWAS) on large samples of Illumina Infinium HumanMethylation450 and MethylationEPIC BeadChip microarrays. We tested meffil by applying it to 6000 450k microarrays generated from blood collected for two different datasets, the Accessible Resource for Integrative Epigenomic Studies (ARIES) and The Genetics of Overweight Young Adults (GOYA) study. Results A complete reimplementation of functional normalization reduces computational memory requirements to 5% of that required by other R packages, without increasing running time. Incorporating fixed and random effects alongside functional normalization, together with automated estimation of functional normalization parameters, reduces technical variation in DNA methylation levels, thus reducing false positive associations and improving power. We also demonstrate that the ability to normalize datasets distributed across physically different locations without sharing any biologically based individual-level data may reduce heterogeneity in meta-analyses of epigenome-wide association studies. However, we show that when batch is perfectly confounded with cases and controls, functional normalization is unable to prevent spurious associations. Conclusions meffil is available online (https://github.com/perishky/meffil/) along with tutorials covering typical use cases.
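One way to see why perfect batch-case confounding defeats any adjustment: in a linear model with an intercept, a case indicator, and a batch indicator, the batch column is then identical to (or the mirror image of) the case column, so the design matrix loses rank and the two effects cannot be told apart. The sketch below is a toy illustration of that point, not meffil code; the function name is hypothetical.

```python
import numpy as np


def design_is_estimable(case_status, batch):
    """True if intercept, case and batch effects are separately estimable (full column rank)."""
    x = np.column_stack([np.ones(len(case_status)), case_status, batch])
    return bool(np.linalg.matrix_rank(x) == x.shape[1])


print(design_is_estimable([0, 0, 1, 1], [0, 0, 1, 1]))  # False: batch == case
print(design_is_estimable([0, 0, 1, 1], [1, 1, 0, 0]))  # False: batch == 1 - case
print(design_is_estimable([0, 0, 1, 1], [0, 1, 0, 1]))  # True: balanced design
```

Balancing cases and controls across batches at the design stage is what makes the third case possible.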


2018 ◽  
Vol 19 (9) ◽  
pp. 2794 ◽  
Author(s):  
Rong Zhou ◽  
Komivi Dossa ◽  
Donghua Li ◽  
Jingyin Yu ◽  
Jun You ◽  
...  

Sesame is poised to become a major oilseed crop owing to its high oil quality and adaptation to various ecological areas. However, the seed yield of sesame is very low, and its underlying genetic basis remains elusive. Here, we performed genome-wide association studies of 39 seed yield-related traits, categorized into five major trait groups, in three different environments using 705 diverse lines. Extensive variation was observed for the traits, and capsule size, capsule number, and seed size-related traits were found to be highly correlated with seed yield indexes. In total, 646 loci were significantly associated with the 39 traits (p < 10⁻⁷) and resolved into 547 quantitative trait loci (QTLs). We identified six multi-environment QTLs and 76 pleiotropic QTLs associated with two to five different traits. By analyzing candidate genes for the assayed traits, we retrieved 48 potential genes containing significant functional loci. Several Arabidopsis homologs of these candidate genes have been reported to be involved in seed or biomass formation. We also identified novel candidate genes, such as SiLPT3 and SiACS8, which may control capsule length and capsule number. Altogether, this study provides a long-awaited basis for genetics and functional genomics research toward seed yield improvement in sesame.
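Once significant loci have been resolved into QTLs, calling a QTL "pleiotropic" amounts to counting how many distinct traits it is associated with. The sketch below shows that bookkeeping under assumed inputs: the trait names and QTL identifiers are hypothetical, and the upstream association testing and locus merging are not shown.

```python
def pleiotropic_qtls(trait_to_qtls, min_traits=2):
    """Return {qtl_id: n_traits} for QTLs associated with at least min_traits traits.

    trait_to_qtls: {trait_name: iterable of QTL ids significant for that trait}
    """
    counts = {}
    for qtls in trait_to_qtls.values():
        for q in set(qtls):  # count each QTL at most once per trait
            counts[q] = counts.get(q, 0) + 1
    return {q: c for q, c in counts.items() if c >= min_traits}


# Hypothetical associations: Q2 is shared by three traits, Q3 by two.
example = {
    "capsule_length": ["Q1", "Q2"],
    "capsule_number": ["Q2", "Q3"],
    "seed_size": ["Q2", "Q3"],
}
print(pleiotropic_qtls(example))
```

Keying the input by environment instead of by trait gives the analogous count for multi-environment QTLs.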


Data in Brief ◽  
2018 ◽  
Vol 19 ◽  
pp. 1046-1057 ◽  
Author(s):  
Giovanni Scala ◽  
Veer Marwah ◽  
Pia Kinaret ◽  
Jukka Sund ◽  
Vittorio Fortino ◽  
...  
