Locally Epistatic Models for Genome-wide Prediction and Association by Importance Sampling

2016 ◽  
Author(s):  
Deniz Akdemir ◽  
Jean-Luc Jannink

Abstract In statistical genetics, an important task involves building predictive models for genotype-phenotype relationships and thereby attributing a proportion of the total phenotypic variance to variation in genotypes. Numerous models have been proposed to incorporate additive genetic effects into models for prediction or association. However, there is a scarcity of models that can adequately account for gene-by-gene or other forms of genetic interaction. In addition, there is increasing interest in using marker annotations in genome-wide prediction and association. In this paper, we discuss a hybrid modeling methodology that combines the parametric mixed modeling approach with non-parametric rule ensembles. This approach gives us a flexible class of models that can capture additive and locally epistatic genetic effects and gene x background interactions, and that allows us to incorporate one or more annotations into genomic selection or association models. We use benchmark data sets covering a range of organisms and traits, in addition to simulated data sets, to illustrate the strengths of this approach. The improvements in model accuracy and association results suggest that part of the “missing heritability” in complex traits can be captured by modeling local epistasis.

Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 949
Author(s):  
Jiangyi Wang ◽  
Min Liu ◽  
Xinwu Zeng ◽  
Xiaoqiang Hua

Convolutional neural networks perform strongly in many visual tasks because of their hierarchical structure and powerful feature extraction capabilities. The symmetric positive definite (SPD) matrix has attracted attention in visual classification because it is well suited to learning a proper statistical representation and distinguishing samples with different information. In this paper, a deep neural network signal detection method based on spectral convolution features is proposed. In this method, local features extracted from a convolutional neural network are used to construct the SPD matrix, and a deep learning algorithm for the SPD matrix is used to detect target signals. Feature maps extracted by two kinds of convolutional neural network models are applied in this study. With this method, signal detection becomes a binary classification problem over the signals in the samples. To demonstrate the validity and superiority of this method, simulated and semi-physical simulated data sets are used. The results show that, under low signal-to-clutter ratio (SCR), this method obtains a gain of 0.5–2 dB over the spectral signal detection method based on the deep neural network, on both the simulated and the semi-physical simulated data sets.


2016 ◽  
Vol 29 (3) ◽  
pp. 197-204 ◽  
Author(s):  
Rohan H. C. Palmer ◽  
Nicole R. Nugent ◽  
Leslie A. Brick ◽  
Cinnamon L. Bidwell ◽  
John E. McGeary ◽  
...  

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jisu Shin ◽  
Sang Hong Lee

Abstract Genetic variation in response to the environment, that is, genotype-by-environment interaction (GxE), is fundamental in the biology of complex traits and diseases. However, existing methods are computationally demanding and cannot feasibly handle biobank-scale data. Here, we introduce GxEsum, a method for estimating the phenotypic variance explained by genome-wide GxE based on GWAS summary statistics. Through comprehensive simulations and analysis of UK Biobank data from 288,837 individuals, we show that GxEsum can handle a large-scale biobank dataset with controlled type I error rates and unbiased GxE estimates, and that its computational efficiency can be hundreds of times higher than that of existing GxE methods.


2021 ◽  
Author(s):  
Duncan S Palmer ◽  
Wei Zhou ◽  
Liam Abbott ◽  
Nik Baya ◽  
Claire Churchhouse ◽  
...  

In classical statistical genetic theory, a dominance effect is defined as the deviation from a purely additive genetic effect for a biallelic variant. Dominance effects are well documented in model organisms. However, evidence in humans is limited to a handful of traits, particularly those with strong single-locus effects such as hair color. We carried out the largest systematic evaluation of dominance effects on phenotypic variance in the UK Biobank. We curated and tested over 1,000 phenotypes for dominance effects through GWAS scans, identifying 175 loci at genome-wide significance after correcting for multiple testing (P < 4.7 × 10⁻¹¹). Power to detect non-additive loci is much lower than power to detect additive effects for complex traits: based on the relative effect sizes at genome-wide significant additive loci, we estimate that a 20- to 30-fold increase in sample size will be necessary to capture evidence of dominance as clear as that currently observed for additive effects. However, these localised dominance hits do not extend to a significant aggregate contribution to phenotypic variance genome-wide. By deriving a version of LD-score regression to detect dominance effects tagged by common variation genome-wide (minor allele frequency > 0.05), we found no strong evidence of a contribution to phenotypic variance when accounting for multiple testing. Across the 267 continuous and 793 binary traits, the median contribution was 5.73 × 10⁻⁴, with unbiased point estimates ranging from -0.261 to 0.131. Finally, we introduce dominance fine-mapping to explore whether the more rapid decay of dominance LD can be leveraged to find causal variants. These results provide the most comprehensive assessment of dominance trait variation in humans to date.
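The abstract does not say how its significance threshold was derived. As a rough check, the quoted value is consistent with dividing a conventional genome-wide threshold of 5 × 10⁻⁸ by the number of traits tested (267 + 793). Both the baseline value and the Bonferroni-style division are assumptions made here for illustration, not statements from the paper.

```python
# Assumption: conventional single-trait genome-wide significance threshold.
GENOME_WIDE_ALPHA = 5e-8

# Trait counts quoted in the abstract.
n_continuous = 267
n_binary = 793
n_traits = n_continuous + n_binary

# Assumption: Bonferroni-style correction across all tested traits.
threshold = GENOME_WIDE_ALPHA / n_traits

print(f"{n_traits} traits -> per-test threshold {threshold:.2e}")
```

Under these assumptions the result rounds to the 4.7 × 10⁻¹¹ given in the abstract.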


2018 ◽  
Author(s):  
Michael Nute ◽  
Ehsan Saleh ◽  
Tandy Warnow

Abstract The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines, including protein structure prediction, protein family identification, and phylogeny estimation. Statistical co-estimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical co-estimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy is dramatically more accurate than the other alignment methods on the simulated data sets, but is among the least accurate on the biological benchmarks. There are several potential causes for this discordance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments; future research is needed to understand the most likely explanation for our observations. Keywords: multiple sequence alignment, BAli-Phy, protein sequences, structural alignment, homology


2019 ◽  
Vol 51 (1) ◽  
Author(s):  
Luis Varona ◽  
Juan Altarriba ◽  
Carlos Moreno ◽  
María Martínez-Castillero ◽  
Joaquim Casellas

Abstract Background: Inbreeding is caused by mating between related individuals and its most common consequence is inbreeding depression. Several studies have detected heterogeneity in inbreeding depression among founder individuals, and recently a procedure for predicting hidden inbreeding depression loads associated with founders and the Mendelian sampling of non-founders has been developed. The objectives of our study were to expand this model to predict the inbreeding loads for all individuals in the pedigree and to estimate the covariance between the inbreeding loads and the additive genetic effects for the trait of interest. We tested the proposed approach with simulated data and with two datasets of records on weaning weight from the Spanish Pirenaica and Rubia Gallega beef cattle breeds. Results: The posterior estimates of the variance components with the simulated datasets did not differ significantly from the simulation parameters. In addition, the correlations between the predicted and simulated inbreeding loads were always positive and ranged from 0.27 to 0.82. The beef cattle datasets comprised 35,126 and 75,194 records on weights between 170 and 250 days of age, and pedigrees of 308,836 and 384,434 individual-sire-dam entries for the Pirenaica and Rubia Gallega breeds, respectively. The posterior mean estimates of the variance of inbreeding depression loads were 29,967.8 and 28,222.4 for the Pirenaica and Rubia Gallega breeds, respectively. They were larger than those of the additive variance (695.0 and 439.8 for Pirenaica and Rubia Gallega, respectively), because they should be understood as the variance of the inbreeding depression achieved by a fully inbred (100%) descendant. Therefore, the inbreeding loads have to be rescaled for smaller inbreeding coefficients. In addition, a strong negative correlation (−0.43 ± 0.10) between additive effects and inbreeding loads was detected in the Pirenaica, but not in the Rubia Gallega breed.
Conclusions: The results of the simulation study confirmed the ability of the proposed procedure to predict inbreeding depression loads for all individuals in the populations. Furthermore, the results obtained from the two real datasets confirmed the variability in the inbreeding depression loads in both breeds and suggested a negative correlation of the inbreeding loads with the additive genetic effects in the Pirenaica breed.
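The rescaling mentioned in the abstract can be illustrated with a small sketch. It assumes that inbreeding depression is linear in the inbreeding coefficient F, so that a variance expressed for a fully inbred descendant (F = 1) scales with F squared. That linearity, the helper name `rescale_load_variance`, and the example value F = 0.10 are illustrative assumptions, not details taken from the paper.

```python
import math

# Posterior mean variances of inbreeding depression loads from the abstract,
# expressed for a fully inbred (F = 1) descendant.
LOAD_VARIANCE = {"Pirenaica": 29967.8, "Rubia Gallega": 28222.4}


def rescale_load_variance(variance_full: float, f: float) -> float:
    """Hypothetical helper: variance of depression at inbreeding coefficient f.

    Assumes depression is linear in f, so the variance scales with f squared.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("inbreeding coefficient must lie in [0, 1]")
    return variance_full * f ** 2


# Example: a descendant with 10% inbreeding (illustrative value).
for breed, var_full in LOAD_VARIANCE.items():
    var_f = rescale_load_variance(var_full, 0.10)
    print(f"{breed}: variance {var_f:.1f}, sd {math.sqrt(var_f):.1f} at F = 0.10")
```

Under this assumption, the rescaled variances at F = 0.10 (about 300 and 282) fall below the additive variances quoted in the abstract (695.0 and 439.8), which shows why the raw load variances cannot be compared directly with the additive variance.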


2015 ◽  
Vol 11 (A29A) ◽  
pp. 205-207
Author(s):  
Philip C. Gregory

Abstract A new apodized Keplerian model is proposed for the analysis of precision radial velocity (RV) data to model both planetary and stellar activity (SA) induced RV signals. A symmetrical Gaussian apodization function with unknown width and center can distinguish planetary signals from SA signals on the basis of the width of the apodization function. The general model for m apodized Keplerian signals also includes a linear regression term between RV and the stellar activity diagnostic ln(R'hk), as well as an extra Gaussian noise term with unknown standard deviation. The model parameters are explored using a Bayesian fusion MCMC code. A differential version of the Generalized Lomb-Scargle periodogram provides an additional way of distinguishing SA signals and helps guide the choice of new periods. Sample results are reported for a recent international RV blind challenge, which included multiple state-of-the-art simulated data sets supported by a variety of stellar activity diagnostics.


2005 ◽  
Vol 37 (12) ◽  
pp. 1320-1322 ◽  
Author(s):  
Eleftheria Zeggini ◽  
William Rayner ◽  
Andrew P Morris ◽  
Andrew T Hattersley ◽  
Mark Walker ◽  
...  
