Locally Epistatic Models for Genome-wide Prediction and Association by Importance Sampling

Mapping Intimacies ◽

10.1101/046177 ◽

2016 ◽

Author(s):

Deniz Akdemir ◽

Jean-Luc Jannink

Keyword(s):

Complex Traits ◽

Simulated Data ◽

Genetic Effects ◽

Data Sets ◽

Phenotypic Variance ◽

Modeling Methodology ◽

Genome Wide ◽

Marker Annotations ◽

Additive Genetic Effects ◽

Simulated Data Sets

Abstract: In statistical genetics, an important task is building predictive models of the genotype-phenotype relationship and thereby attributing a proportion of the total phenotypic variance to variation in genotypes. Numerous models have been proposed to incorporate additive genetic effects into models for prediction or association. However, there is a scarcity of models that can adequately account for gene-by-gene or other forms of genetic interaction. In addition, there is increased interest in using marker annotations in genome-wide prediction and association. In this paper, we discuss a hybrid modeling methodology that combines the parametric mixed modeling approach with non-parametric rule ensembles. This approach gives us a flexible class of models that can capture additive and locally epistatic genetic effects and gene x background interactions, and that allows us to incorporate one or more annotations into genomic selection or association models. We use benchmark data sets covering a range of organisms and traits, in addition to simulated data sets, to illustrate the strengths of this approach. The improvements in model accuracy and association results suggest that part of the "missing heritability" in complex traits can be captured by modeling local epistasis.

Spectral Convolution Feature-Based SPD Matrix Representation for Signal Detection Using a Deep Neural Network

Entropy ◽

10.3390/e22090949 ◽

2020 ◽

Vol 22 (9) ◽

pp. 949

Author(s):

Jiangyi Wang ◽

Min Liu ◽

Xinwu Zeng ◽

Xiaoqiang Hua

Keyword(s):

Neural Network ◽

Signal Detection ◽

Convolutional Neural Network ◽

Deep Neural Network ◽

Detection Method ◽

Learning Algorithm ◽

Simulated Data ◽

Data Sets ◽

Feature Maps ◽

Simulated Data Sets

Convolutional neural networks perform strongly in many visual tasks because of their hierarchical structures and powerful feature extraction capabilities. The SPD (symmetric positive definite) matrix has attracted attention in visual classification because it can learn a suitable statistical representation and distinguish samples that carry different information. In this paper, a deep neural network signal detection method based on spectral convolution features is proposed. In this method, local features extracted from a convolutional neural network are used to construct the SPD matrix, and a deep learning algorithm for SPD matrices is used to detect target signals. Feature maps extracted by two kinds of convolutional neural network models are applied in this study. With this method, signal detection becomes a binary classification problem over the signals in the samples. To demonstrate the effectiveness and superiority of the method, simulated and semi-physical simulated data sets are used. The results show that, under low SCR (signal-to-clutter ratio), compared with the spectral signal detection method based on the deep neural network, this method obtains a gain of 0.5–2 dB on simulated data sets and semi-physical simulated data sets.

Evidence of Shared Genome-Wide Additive Genetic Effects on Interpersonal Trauma Exposure and Generalized Vulnerability to Drug Dependence in a Population of Substance Users

Journal of Traumatic Stress ◽

10.1002/jts.22103 ◽

2016 ◽

Vol 29 (3) ◽

pp. 197-204 ◽

Cited By ~ 2

Author(s):

Rohan H. C. Palmer ◽

Nicole R. Nugent ◽

Leslie A. Brick ◽

Cinnamon L. Bidwell ◽

John E. McGeary ◽

...

Keyword(s):

Drug Dependence ◽

Trauma Exposure ◽

Interpersonal Trauma ◽

Genetic Effects ◽

Substance Users ◽

Genome Wide ◽

Additive Genetic Effects

GxEsum: a novel approach to estimate the phenotypic variance explained by genome-wide GxE interaction based on GWAS summary statistics for biobank-scale data

Genome Biology ◽

10.1186/s13059-021-02403-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jisu Shin ◽

Sang Hong Lee

Keyword(s):

Complex Traits ◽

Error Rates ◽

Type I ◽

Phenotypic Variance ◽

Environment Interaction ◽

Summary Statistics ◽

Gxe Interaction ◽

Genome Wide ◽

Scale Data ◽

Variance Explained

Abstract: Genetic variation in response to the environment, that is, genotype-by-environment interaction (GxE), is fundamental in the biology of complex traits and diseases. However, existing methods are computationally demanding and cannot feasibly handle biobank-scale data. Here, we introduce GxEsum, a method for estimating the phenotypic variance explained by genome-wide GxE based on GWAS summary statistics. Through comprehensive simulations and analysis of UK Biobank with 288,837 individuals, we show that GxEsum can handle a large-scale biobank dataset with controlled type I error rates and unbiased GxE estimates, and its computational efficiency can be hundreds of times higher than existing GxE methods.

Analysis of genetic dominance in the UK Biobank

10.1101/2021.08.15.456387 ◽

2021 ◽

Author(s):

Duncan S Palmer ◽

Wei Zhou ◽

Liam Abbott ◽

Nik Baya ◽

Claire Churchhouse ◽

...

Keyword(s):

Complex Traits ◽

Multiple Testing ◽

Model Organisms ◽

Systematic Evaluation ◽

Hair Color ◽

Phenotypic Variance ◽

Additive Effects ◽

Uk Biobank ◽

Genome Wide ◽

The Uk

In classical statistical genetic theory, a dominance effect is defined as the deviation from a purely additive genetic effect for a biallelic variant. Dominance effects are well documented in model organisms. However, evidence in humans is limited to a handful of traits, particularly those with strong single locus effects such as hair color. We carried out the largest systematic evaluation of dominance effects on phenotypic variance in the UK Biobank. We curated and tested over 1,000 phenotypes for dominance effects through GWAS scans, identifying 175 loci at genome-wide significance correcting for multiple testing (P < 4.7 × 10⁻¹¹). Power to detect non-additive loci is much lower than power to detect additive effects for complex traits: based on the relative effect sizes at genome-wide significant additive loci, we estimate a factor of 20-30 increase in sample size will be necessary to capture clear evidence of dominance similar to that currently observed for additive effects. However, these localised dominance hits do not extend to a significant aggregate contribution to phenotypic variance genome-wide. By deriving a version of LD-score regression to detect dominance effects tagged by common variation genome-wide (minor allele frequency > 0.05), we found no strong evidence of a contribution to phenotypic variance when accounting for multiple testing. Across the 267 continuous and 793 binary traits the median contribution was 5.73 × 10⁻⁴, with unbiased point estimates ranging from -0.261 to 0.131. Finally, we introduce dominance fine-mapping to explore whether the more rapid decay of dominance LD can be leveraged to find causal variants. These results provide the most comprehensive assessment of dominance trait variation in humans to date.
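
The two quantitative ideas in this abstract can be illustrated with a small sketch. The first function uses the textbook parameterisation of a biallelic locus, where the dominance deviation is the distance of the heterozygote mean from the midpoint of the two homozygote means. The second part checks the arithmetic of the significance threshold under an assumption that is not stated in the abstract: that the threshold is the conventional 5 × 10⁻⁸ genome-wide level divided by the number of traits (267 + 793). The function and variable names are hypothetical and are not taken from the paper.

```python
def additive_and_dominance(mean_hom_ref, mean_het, mean_hom_alt):
    """Return (a, d) for a biallelic locus from the three genotype means.

    a: half the difference between the two homozygote means (additive effect).
    d: deviation of the heterozygote mean from the homozygote midpoint.
    A purely additive locus has d == 0.
    """
    midpoint = (mean_hom_ref + mean_hom_alt) / 2
    a = (mean_hom_alt - mean_hom_ref) / 2
    d = mean_het - midpoint
    return a, d


# Assumption (not stated in the abstract): Bonferroni-style correction of the
# conventional genome-wide level across all continuous and binary traits.
GENOME_WIDE_ALPHA = 5e-8
n_traits = 267 + 793
threshold = GENOME_WIDE_ALPHA / n_traits

if __name__ == "__main__":
    a, d = additive_and_dominance(10.0, 13.0, 14.0)
    print(f"additive effect a = {a}, dominance deviation d = {d}")
    print(f"{n_traits} traits -> threshold {threshold:.2e}")
```

Under that assumption the computed threshold agrees with the reported value to two significant figures, which suggests (but does not prove) that this is how the threshold was obtained.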

Benchmarking Statistical Multiple Sequence Alignment

10.1101/304659 ◽

2018 ◽

Cited By ~ 1

Author(s):

Michael Nute ◽

Ehsan Saleh ◽

Tandy Warnow

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structural Alignment ◽

Estimation Method ◽

Simulated Data ◽

Protein Sequences ◽

Data Sets ◽

Sequence Alignments ◽

Multiple Sequence ◽

Simulated Data Sets

Abstract: The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines, including protein structure prediction, protein family identification, and phylogeny estimation. Statistical co-estimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical co-estimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy is dramatically more accurate than the other alignment methods on the simulated data sets, but is among the least accurate on the biological benchmarks. There are several potential causes for this discordance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments; future research is needed to understand the most likely explanation for our observations.

Keywords: multiple sequence alignment, BAli-Phy, protein sequences, structural alignment, homology

A multivariate analysis with direct additive and inbreeding depression load effects

Genetics Selection Evolution ◽

10.1186/s12711-019-0521-3 ◽

2019 ◽

Vol 51 (1) ◽

Cited By ~ 3

Author(s):

Luis Varona ◽

Juan Altarriba ◽

Carlos Moreno ◽

María Martínez-Castillero ◽

Joaquim Casellas

Keyword(s):

Beef Cattle ◽

Inbreeding Depression ◽

Negative Correlation ◽

Simulated Data ◽

Genetic Effects ◽

Strong Negative Correlation ◽

Additive Variance ◽

Inbreeding Coefficients ◽

Load Effects ◽

Additive Genetic Effects

Abstract

Background: Inbreeding is caused by mating between related individuals and its most common consequence is inbreeding depression. Several studies have detected heterogeneity in inbreeding depression among founder individuals, and recently a procedure for predicting hidden inbreeding depression loads associated with founders and the Mendelian sampling of non-founders has been developed. The objectives of our study were to expand this model to predict the inbreeding loads for all individuals in the pedigree and to estimate the covariance between the inbreeding loads and the additive genetic effects for the trait of interest. We tested the proposed approach with simulated data and with two datasets of records on weaning weight from the Spanish Pirenaica and Rubia Gallega beef cattle breeds.

Results: The posterior estimates of the variance components with the simulated datasets did not differ significantly from the simulation parameters. In addition, the correlations between the predicted and simulated inbreeding loads were always positive and ranged from 0.27 to 0.82. The beef cattle datasets comprised 35,126 and 75,194 records on weights between 170 and 250 days of age, and pedigrees of 308,836 and 384,434 individual-sire-dam entries for the Pirenaica and Rubia Gallega breeds, respectively. The posterior mean estimates of the variance of inbreeding depression loads were 29,967.8 and 28,222.4 for the Pirenaica and Rubia Gallega breeds, respectively. They were larger than those of the additive variance (695.0 and 439.8 for Pirenaica and Rubia Gallega, respectively), because they should be understood as the variance of the inbreeding depression achieved by a fully inbred (100%) descendant. Therefore, the inbreeding loads have to be rescaled for smaller inbreeding coefficients. In addition, a strong negative correlation (−0.43 ± 0.10) between additive effects and inbreeding loads was detected in the Pirenaica, but not in the Rubia Gallega breed.

Conclusions: The results of the simulation study confirmed the ability of the proposed procedure to predict inbreeding depression loads for all individuals in the populations. Furthermore, the results obtained from the two real datasets confirmed the variability in the inbreeding depression loads in both breeds and suggested a negative correlation of the inbreeding loads with the additive genetic effects in the Pirenaica breed.
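
The rescaling mentioned in the Results can be made concrete with a short sketch. It assumes, as a simplification that the abstract does not spell out, that an individual's realised inbreeding depression is its load multiplied by its inbreeding coefficient F, so that the variance scales with F squared. The function name and the example value F = 0.125 are illustrative choices, not taken from the paper.

```python
import math

# Posterior mean variances of inbreeding depression loads quoted in the
# abstract; both are defined for a fully inbred descendant (F = 1).
LOAD_VARIANCE_FULL = {"Pirenaica": 29967.8, "Rubia Gallega": 28222.4}


def rescaled_load_variance(variance_full, inbreeding_coefficient):
    """Rescale a load variance from F = 1 to a smaller inbreeding coefficient.

    Assumption: the load effect is linear in F, so the variance is
    multiplied by F squared.
    """
    if not 0.0 <= inbreeding_coefficient <= 1.0:
        raise ValueError("inbreeding coefficient must lie in [0, 1]")
    return variance_full * inbreeding_coefficient ** 2


if __name__ == "__main__":
    f = 0.125  # illustrative value only
    for breed, variance_full in LOAD_VARIANCE_FULL.items():
        scaled = rescaled_load_variance(variance_full, f)
        print(f"{breed}: variance {scaled:.1f}, SD {math.sqrt(scaled):.1f}")
```

Under this assumption, F = 0.125 multiplies each variance by 1/64, which is why the very large load variances quoted for F = 1 are not directly comparable with the additive variances.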

Bayesian Planet Searches for the 10 cm/s Radial Velocity Era

Proceedings of the International Astronomical Union ◽

10.1017/s1743921316002817 ◽

2015 ◽

Vol 11 (A29A) ◽

pp. 205-207

Author(s):

Philip C. Gregory

Keyword(s):

Radial Velocity ◽

State Of The Art ◽

Simulated Data ◽

Model Parameters ◽

Data Sets ◽

Stellar Activity ◽

Bayesian Fusion ◽

Multiple State ◽

Simulated Data Sets ◽

Apodization Function

Abstract: A new apodized Keplerian model is proposed for the analysis of precision radial velocity (RV) data to model both planetary and stellar activity (SA) induced RV signals. A symmetrical Gaussian apodization function with unknown width and center can distinguish planetary signals from SA signals on the basis of the width of the apodization function. The general model for m apodized Keplerian signals also includes a linear regression term between RV and the stellar activity diagnostic ln(R'_HK), as well as an extra Gaussian noise term with unknown standard deviation. The model parameters are explored using a Bayesian fusion MCMC code. A differential version of the Generalized Lomb-Scargle periodogram provides an additional way of distinguishing SA signals and helps guide the choice of new periods. Sample results are reported for a recent international RV blind challenge which included multiple state-of-the-art simulated data sets supported by a variety of stellar activity diagnostics.
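
The idea of apodization can be sketched in a few lines. The example below is a deliberate simplification: it replaces the full Keplerian orbit with a circular-orbit sinusoid and multiplies it by a Gaussian window with a centre and a width. The parameter names (`amplitude`, `period`, `phase`, `center`, `width`) are hypothetical and do not reflect the parameterisation used in the paper.

```python
import math


def apodized_signal(t, amplitude, period, phase, center, width):
    """Circular-orbit RV signal multiplied by a Gaussian apodization window.

    A window much wider than the observing span leaves the sinusoid nearly
    unchanged (a coherent, planet-like signal); a narrow window confines it
    to a short interval, as might be expected of a transient activity signal.
    """
    window = math.exp(-((t - center) ** 2) / (2.0 * width ** 2))
    return window * amplitude * math.sin(2.0 * math.pi * t / period + phase)


if __name__ == "__main__":
    times = [0.0, 50.0, 100.0, 150.0, 200.0]
    for width in (1000.0, 20.0):  # wide versus narrow window, in days
        values = [apodized_signal(t, 3.0, 37.0, 0.3, 100.0, width) for t in times]
        print(width, [round(v, 3) for v in values])
```

With the narrow window the signal is suppressed away from the centre, while with the wide window it persists across the whole time span. This difference in fitted width is the property the abstract describes as separating planetary from SA signals.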

A comparison of procedures for classifying remotely-sensed data using simulated data sets incorporating autocorrelations between spectral responses

International Journal of Remote Sensing ◽

10.1080/01431169208904073 ◽

1992 ◽

Vol 13 (14) ◽

pp. 2701-2725 ◽

Cited By ~ 3

Author(s):

J. D. Wilson

Keyword(s):

Simulated Data ◽

Remotely Sensed ◽

Data Sets ◽

Remotely Sensed Data ◽

Simulated Data Sets ◽

Spectral Responses

Erratum to: The Use of Geographically Weighted Regression for Spatial Prediction: An Evaluation of Models Using Simulated Data Sets

Mathematical Geosciences ◽

10.1007/s11004-011-9323-z ◽

2011 ◽

Vol 43 (3) ◽

pp. 399-399 ◽

Cited By ~ 1

Author(s):

P. Harris ◽

A. S. Fotheringham ◽

R. Crespo ◽

M. Charlton

Keyword(s):

Geographically Weighted Regression ◽

Simulated Data ◽

Spatial Prediction ◽

Weighted Regression ◽

Data Sets ◽

Simulated Data Sets

An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets

Nature Genetics ◽

10.1038/ng1670 ◽

2005 ◽

Vol 37 (12) ◽

pp. 1320-1322 ◽

Cited By ~ 76

Author(s):

Eleftheria Zeggini ◽

William Rayner ◽

Andrew P Morris ◽

Andrew T Hattersley ◽

Mark Walker ◽

...

Keyword(s):

Sample Size ◽

Large Scale ◽

Simulated Data ◽

Data Sets ◽

Hapmap Sample ◽

Tagging Snp ◽

Simulated Data Sets
