Simultaneous Analysis of Common and Rare Variants in Complex Traits: Application to SNPs (SCARVAsnp)

Bioinformatics and Biology Insights ◽

10.4137/bbi.s9966 ◽

2012 ◽

Vol 6 ◽

pp. BBI.S9966 ◽

Cited By ~ 2

Author(s):

Guanjie Chen ◽

Ao Yuan ◽

Yanxun Zhou ◽

Amy R. Bentley ◽

Jie Zhou ◽

...

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Large Scale ◽

Rare Variants ◽

Real Data ◽

Simultaneous Analysis ◽

Common Variants ◽

Disease Etiology ◽

Modified Method ◽

Log Likelihood

Advances in technology and reduced costs are facilitating large-scale sequencing of genes and exomes as well as entire genomes. Recently, we described a haplotype-based approach called SCARVA [1] that enables the simultaneous analysis of the association between rare and common variants in disease etiology. Here, we describe an extension of SCARVA that evaluates individual markers instead of haplotypes. This modified method (SCARVAsnp) is implemented in four stages. First, all common variants in a pre-specified region (eg, gene) are evaluated individually. Second, a union procedure is used to combine all rare variants (RVs) in the index region, and the ratio of the log likelihood with one RV excluded to the log likelihood of a model with all the collapsed RVs is calculated. On the basis of previously reported simulation studies [1], a likelihood ratio ≥ 1.3 is considered statistically significant. Third, the direction of the association of the removed RV is determined by evaluating the change in λ values with the inclusion and exclusion of that RV. Lastly, significant common and rare variants, along with covariates, are included in a final regression model to evaluate the association between the trait and variants in that region. We use simulated and real data sets to show that the method is simple to use, computationally efficient, and that it can accurately identify both common and rare risk variants. This method overcomes several limitations of existing methods. For example, SCARVAsnp limits loss of statistical power by not including variants that are not associated with the trait of interest in the final model. Also, SCARVAsnp takes into consideration the direction of association by effectively modelling positively and negatively associated variants.
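
The second stage described above (collapse all RVs, then drop one at a time and compare log likelihoods) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary case/control trait, no covariates, and a single collapsed carrier indicator, in which case the logistic-regression fit reduces to the observed case proportion within each carrier group. The function names and the toy data are hypothetical; only the 1.3 cutoff comes from the abstract.

```python
import math

def collapse(genotypes, variant_ids):
    """Union procedure: a subject is a carrier if any listed rare variant is present."""
    return [int(any(g[v] > 0 for v in variant_ids)) for g in genotypes]

def log_likelihood(carrier, status):
    """Maximised Bernoulli log likelihood of case status given carrier status.

    Assumption: one binary predictor and no covariates, so the fitted
    probability in each group is simply the observed case proportion.
    """
    ll = 0.0
    for group in (0, 1):
        outcomes = [s for c, s in zip(carrier, status) if c == group]
        n, k = len(outcomes), sum(outcomes)
        if n == 0:
            continue
        p = k / n
        if k:
            ll += k * math.log(p)
        if n - k:
            ll += (n - k) * math.log(1 - p)
    return ll

def leave_one_out_ratios(genotypes, status, rare_ids):
    """Ratio of the log likelihood with one RV excluded to that with all RVs collapsed."""
    full = log_likelihood(collapse(genotypes, rare_ids), status)
    ratios = {}
    for v in rare_ids:
        rest = [u for u in rare_ids if u != v]
        reduced = log_likelihood(collapse(genotypes, rest), status)
        ratios[v] = reduced / full if full else float("nan")
    return ratios

# Hypothetical toy data: subjects 0-5 are cases, 6-11 are controls.
status = [1] * 6 + [0] * 6
genotypes = [{"rv1": 0, "rv2": 0} for _ in status]
for i in (0, 1, 2, 3):      # rv1 is carried by four cases
    genotypes[i]["rv1"] = 1
for i in (4, 6):            # rv2 is carried by one case and one control
    genotypes[i]["rv2"] = 1

THRESHOLD = 1.3             # cutoff quoted in the abstract
ratios = leave_one_out_ratios(genotypes, status, ["rv1", "rv2"])
flagged = [v for v, r in ratios.items() if r >= THRESHOLD]
```

Because both log likelihoods are negative, a ratio above 1 means that removing the variant made the fit worse, which is why a large ratio flags the removed RV as informative in this sketch.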

A Novel Approach for the Simultaneous Analysis of Common and Rare Variants in Complex Traits

Bioinformatics and Biology Insights ◽

10.4137/bbi.s8852 ◽

2012 ◽

Vol 6 ◽

pp. BBI.S8852 ◽

Cited By ~ 4

Author(s):

Ao Yuan ◽

Guanjie Chen ◽

Yanxun Zhou ◽

Amy Bentley ◽

Charles Rotimi

Keyword(s):

Complex Traits ◽

Rare Variants ◽

Sequence Data ◽

Association Studies ◽

Simultaneous Analysis ◽

Genome Wide Association Studies ◽

Common Variants ◽

Disease Etiology ◽

Novel Approach ◽

Common Genetic Variants

Genome-wide association studies (GWAS) have been successful in detecting common genetic variants underlying common traits and diseases. Despite these successes, the percentage of trait variance explained by GWAS signals has been modest at best, leaving the so-called “missing heritability” unexplained. Also, the predictive power of common variants identified by GWAS has not been encouraging. Given these observations, the fact that the effects of rare variants are often, by design, unaccounted for by GWAS, and the availability of sequence data, there is a growing need for robust analytic approaches to evaluate the contribution of rare variants to common complex diseases. Here we propose a new method that enables the simultaneous analysis of the association between rare and common variants in disease etiology. We refer to this method as SCARVA (simultaneous common and rare variants analysis). SCARVA is simple to use and is efficient. We used SCARVA to analyze two independent real datasets to identify rare and common variants underlying variation in obesity among participants in the Africa America Diabetes Mellitus (AADM) study and plasma triglyceride levels in the Dallas Heart Study (DHS). We found common and rare variants associated with both traits, consistent with published results.

Rare variants regulate expression of nearby individual genes in multiple tissues

PLoS Genetics ◽

10.1371/journal.pgen.1009596 ◽

2021 ◽

Vol 17 (6) ◽

pp. e1009596

Author(s):

Jiajin Li ◽

Nahyun Kong ◽

Buhm Han ◽

Jae Hoon Sul

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Statistical Power ◽

Rare Variants ◽

Small Sample Size ◽

Tissue Expression ◽

Small Sample ◽

Ratio Test ◽

Common Variants ◽

Regulatory Effects

The rapid decrease in sequencing cost has enabled genetic studies to discover rare variants associated with complex diseases and traits. Once such an association is identified, the next step is to understand the genetic mechanism by which the rare variants influence disease. As hypothesized for common variants, rare variants may affect diseases by regulating gene expression, and recently, several studies have identified the effects of rare variants on gene expression using heritability and expression outlier analyses. However, identifying individual genes whose expression is regulated by rare variants has been challenging due to the relatively small sample size of expression quantitative trait loci studies and statistical approaches not optimized to detect the effects of rare variants. In this study, we analyze whole-genome sequencing and RNA-seq data of 681 European individuals collected for the Genotype-Tissue Expression (GTEx) project (v8) to identify individual genes in 49 human tissues whose expression is regulated by rare variants. To improve statistical power, we develop an approach based on a likelihood ratio test that combines effects of multiple rare variants in a nonlinear manner and has higher power than previous approaches. Using GTEx data, we identify many genes regulated by rare variants, and some of them are only regulated by rare variants and not by common variants. We also find that genes regulated by rare variants are enriched for expression outliers and disease-causing genes. These results suggest the regulatory effects of rare variants, which would be important in interpreting associations of rare variants with complex traits.

Abstract 367: Extreme High-Density Lipoprotein Cholesterol Genetics: An Assortment of Large and Small Polygenic Effects

Arteriosclerosis Thrombosis and Vascular Biology ◽

10.1161/atvb.37.suppl_1.367 ◽

2017 ◽

Vol 37 (suppl_1) ◽

Author(s):

Jacqueline S Dron ◽

Jian Wang ◽

Cécile Low-Kam ◽

Sumeet A Khetarpal ◽

John F Robinson ◽

...

Keyword(s):

Large Scale ◽

Genetic Basis ◽

Rare Variants ◽

Association Studies ◽

Density Lipoprotein ◽

Copy Number Variations ◽

Genome Wide Association Studies ◽

Common Variants ◽

Targeted Next Generation Sequencing ◽

Common Genetic Variants

Rationale: Although HDL-C levels are known to have a complex genetic basis, most studies have focused solely on identifying rare variants with large phenotypic effects to explain extreme HDL-C phenotypes. Objective: Here we concurrently evaluate the contribution of both rare and common genetic variants, as well as large-scale copy number variations (CNVs), towards extreme HDL-C concentrations. Methods: In clinically ascertained patients with low (N=136) and high (N=119) HDL-C profiles, we applied our targeted next-generation sequencing panel (LipidSeq™) to sequence genes involved in HDL metabolism, which were subsequently screened for rare variants and CNVs. We also developed a novel polygenic trait score (PTS) to assess patients’ genetic accumulations of common variants that have been shown by genome-wide association studies to associate primarily with HDL-C levels. Two additional cohorts of patients with extremely low and high HDL-C (total N=1,746 and N=1,139, respectively) were used for PTS validation. Results: In the discovery cohort, 32.4% of low HDL-C patients carried rare variants or CNVs in primary (ABCA1, APOA1, LCAT) and secondary (LPL, LMF1, GPD1, APOE) HDL-C–altering genes. Additionally, 13.4% of high HDL-C patients carried rare variants or CNVs in primary (SCARB1, CETP, LIPC, LIPG) and secondary (APOC3, ANGPTL4) HDL-C–altering genes. For polygenic effects, patients with abnormal HDL-C profiles but without rare variants or CNVs were ~2-fold more likely to have an extreme PTS compared to normolipidemic individuals, indicating an increased frequency of common HDL-C–associated variants in these patients. Similar results in the two validation cohorts demonstrate that this novel PTS successfully quantifies common variant accumulation, further characterizing the polygenic basis for extreme HDL-C phenotypes. 
Conclusions: Patients with extreme HDL-C levels have various combinations of rare variants, common variants, or CNVs driving their phenotypes. Fully characterizing the genetic basis of HDL-C levels must extend to encompass multiple types of genetic determinants—not just rare variants—to further our understanding of this complex, controversial quantitative trait.
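
The abstract does not give the formula for the polygenic trait score, so the following is only a generic sketch of how such a score is commonly built: a weighted sum of effect-allele counts, compared against a reference distribution. The variant names, weights, and the percentile-based notion of "extreme" are all hypothetical and are not taken from the study.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of effect-allele counts (0, 1 or 2 per variant).

    Variants missing from allele_counts are treated as 0 copies (assumption).
    """
    return sum(w * allele_counts.get(snp, 0) for snp, w in weights.items())

def percentile_rank(score, reference_scores):
    """Percentage of reference scores that are at or below the given score."""
    at_or_below = sum(1 for r in reference_scores if r <= score)
    return 100.0 * at_or_below / len(reference_scores)

# Hypothetical weights: positive values raise the trait, negative values lower it.
weights = {"snpA": 0.5, "snpB": -0.2}
patient = {"snpA": 2, "snpB": 1}
score = polygenic_score(patient, weights)

# Hypothetical scores from normolipidemic reference individuals.
reference = [0.1, 0.5, 0.9, 1.2]
rank = percentile_rank(score, reference)
```

A patient could then be called "extreme" when the rank falls below or above chosen cutoffs; the cutoffs themselves would need to come from the study design.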

The importance of magnification effects in galaxy-galaxy lensing

Astronomy and Astrophysics ◽

10.1051/0004-6361/201936915 ◽

2020 ◽

Vol 638 ◽

pp. A96 ◽

Cited By ~ 2

Author(s):

Sandra Unruh ◽

Peter Schneider ◽

Stefan Hilbert ◽

Patrick Simon ◽

Sandra Martin ◽

...

Keyword(s):

Statistical Power ◽

Large Scale ◽

Analytical Description ◽

Real Data ◽

Source Population ◽

Luminosity Functions ◽

Number Counts ◽

Source Number ◽

Local Number ◽

Magnification Effect

Magnification changes the observed local number density of galaxies on the sky. This biases the observed tangential shear profiles around galaxies: the so-called galaxy-galaxy lensing (GGL) signal. Inference of physical quantities, such as the mean mass profile of halos around galaxies, is correspondingly affected by magnification effects. We used simulated shear and galaxy data from the Millennium Simulation to quantify the effect on shear and mass estimates from the magnified lens and source number counts. The former is due to the large-scale matter distribution in the foreground of the lenses; the latter is caused by magnification of the source population by the matter associated with the lenses. The GGL signal is calculated from the simulations by an efficient fast Fourier transform, which can also be applied to real data. The numerical treatment is complemented by a leading-order analytical description of the magnification effects, which is shown to fit the numerical shear data well. We find the magnification effect is strongest for steep galaxy luminosity functions and high redshifts. For a KiDS+VIKING+GAMA-like survey with lens galaxies at redshift z_d = 0.36 and source galaxies in the last three redshift bins with a mean redshift of z̄_s = 0.79, the magnification correction changes the shear profile up to 2%, and the mass is biased by up to 8%. We further considered an even higher redshift fiducial lens sample at z_d = 0.83, with a limiting magnitude of 22 mag in the r-band and a source redshift of z_s = 0.99. Through this, we find that a magnification correction changes the shear profile up to 45% and that the mass is biased by up to 55%. As expected, the sign of the bias depends on the local slope of the lens luminosity function α_d, where the mass is biased low for α_d < 1 and biased high for α_d > 1. 
While the magnification effect of sources is rarely more than 1% of the measured GGL signal, the statistical power of future weak lensing surveys warrants correction for this effect.
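
The dependence on the luminosity-function slope can be made concrete with the standard magnification-bias relation for a flux-limited sample. This is the textbook leading-order result, not an equation quoted from the paper; the symbols S (flux limit), μ (magnification), κ (convergence), and n_0 (unlensed cumulative counts) are assumed notation.

```latex
% Assumed notation (not from the abstract): S is the flux limit, \mu the
% magnification, \kappa the convergence, n_0 the unlensed cumulative counts.
\begin{align}
  n_0(>S) &\propto S^{-\alpha}, \\
  n(>S) &= \frac{1}{\mu}\, n_0\!\left(>\frac{S}{\mu}\right) = \mu^{\alpha-1}\, n_0(>S), \\
  \frac{n(>S)}{n_0(>S)} &\approx 1 + 2(\alpha-1)\,\kappa
    \quad \text{for } \mu \approx 1 + 2\kappa,\ |\kappa| \ll 1 .
\end{align}
```

For κ > 0 the counts are depleted when α < 1 and enhanced when α > 1, and unchanged at α = 1, which is consistent with the sign change at α_d = 1 reported in the abstract.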

RAREsim: A simulation method for very rare genetic variants

10.1101/2021.04.13.439644 ◽

2021 ◽

Author(s):

Megan Null ◽

Josée Dupuis ◽

Christopher R. Gignoux ◽

Audrey E. Hendricks

Keyword(s):

Rare Variant ◽

Complex Traits ◽

Rare Variants ◽

Simulated Data ◽

Real Data ◽

Simulation Method ◽

Sequencing Data ◽

Variant Annotation ◽

Causal Variants ◽

Rare Genetic Variants

Identification of rare variant associations is crucial to fully characterize the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirrors the distribution of rare variants and haplotype structure in real data. Additionally, importing real variant annotation enables in silico comparison of methods that focus on putative causal variants, such as rare variant association tests, and polygenic scoring methods. Existing simulation methods are either unable to employ real variant annotation or severely under- or over-estimate the number of singletons and doubletons, reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real variant annotations. We highlight RAREsim’s utility across various genetic regions, sample sizes, ancestries, and variant classes.

Genome-wide rare variant analysis for thousands of phenotypes in 54,000 exomes

10.1101/692368 ◽

2019 ◽

Cited By ~ 2

Author(s):

Elizabeth T. Cirulli ◽

Simon White ◽

Robert W. Read ◽

Gai Elhanan ◽

William J Metcalf ◽

...

Keyword(s):

Rare Variant ◽

Large Scale ◽

Rare Variants ◽

Sequence Data ◽

Statistical Significance ◽

European Ancestry ◽

Hair Color ◽

Common Variants ◽

Loss Of Function ◽

Test Statistic

Defining the effects that rare variants can have on human phenotypes is essential to advancing our understanding of human health and disease. Large-scale human genetic analyses have thus far focused on common variants, but the development of large cohorts of deeply phenotyped individuals with exome sequence data has now made comprehensive analyses of rare variants possible. We analyzed the effects of rare (MAF<0.1%) variants on 3,166 phenotypes in 40,468 exome-sequenced individuals from the UK Biobank and performed replication as well as meta-analyses with 1,067 phenotypes in 13,470 members of the Healthy Nevada Project (HNP) cohort who underwent Exome+ sequencing at Helix. Our analyses of non-benign coding and loss of function (LoF) variants identified 78 gene-based associations that passed our statistical significance threshold (p<5×10⁻⁹). These are associations in which carrying any rare coding or LoF variant in the gene is associated with an enrichment for a specific phenotype, as opposed to GWAS-based associations of strictly single variants. Importantly, our results do not suffer from the test statistic inflation that is often seen with rare variant analyses of biobank-scale data because of our rare variant-tailored methodology, which includes a step that optimizes the carrier frequency threshold for each phenotype based on prevalence. Of the 47 discovery associations whose phenotypes were represented in the replication cohort, 98% showed effects in the expected direction, and 45% attained formal replication significance (p<0.001). Six additional significant associations were identified in our meta-analysis of both cohorts. 
Among the results, we confirm known associations of PCSK9 and APOB variation with LDL levels; we extend knowledge of variation in the TYRP1 gene, previously associated with blonde hair color only in Solomon Islanders, to blonde hair color in individuals of European ancestry; we show that PAPPA, a gene in which common variants had previously been associated with height via GWAS, contains rare variants that decrease height; and we make the novel discovery that STAB1 variation is associated with blood flow in the brain. Our results are available for download and interactive browsing in an app (https://ukb.research.helix.com). This comprehensive analysis of the effects of rare variants on human phenotypes marks one of the first steps in the next big phase of human genetics, where large, deeply phenotyped cohorts with next generation sequence data will elucidate the effects of rare variants.
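
The gene-based collapsing idea described above ("carrying any rare coding or LoF variant in the gene") can be sketched as follows. This is an illustration only, not the authors' pipeline: it assumes the alternate allele is the rare one, ignores variant annotation filters, and uses hypothetical function and variant names. Only the 0.1% frequency cutoff is taken from the abstract.

```python
def alt_allele_frequency(counts):
    """Alternate allele frequency from per-person diploid allele counts (0, 1 or 2)."""
    return sum(counts) / (2 * len(counts))

def collapse_rare_carriers(gene_genotypes, max_freq=0.001):
    """Flag each person who carries at least one rare variant in the gene.

    gene_genotypes maps a variant name to a list of per-person allele counts.
    Variants at or above max_freq, and variants never observed, are skipped.
    """
    n_people = len(next(iter(gene_genotypes.values())))
    carriers = [False] * n_people
    for counts in gene_genotypes.values():
        if 0 < alt_allele_frequency(counts) < max_freq:
            for i, c in enumerate(counts):
                if c > 0:
                    carriers[i] = True
    return carriers

# Hypothetical gene with one rare and one common variant in 1,000 people.
n = 1000
rare_variant = [0] * n
rare_variant[3] = 1                         # single heterozygote: frequency 0.0005
common_variant = [1] * 100 + [0] * (n - 100)  # frequency 0.05, excluded

carriers = collapse_rare_carriers({"var_rare": rare_variant, "var_common": common_variant})
```

The resulting carrier vector would then be tested against each phenotype; the choice of test and the per-phenotype carrier frequency optimization mentioned in the abstract are not reproduced here.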

Leveraging functional annotation to identify genes associated with complex diseases

10.1101/529297 ◽

2019 ◽

Author(s):

Wei Liu ◽

Mo Li ◽

Wenfeng Zhang ◽

Geyu Zhou ◽

Xing Wu ◽

...

Keyword(s):

Gene Expression ◽

Alzheimer's Disease ◽

Complex Traits ◽

Statistical Power ◽

Late Onset ◽

Disease Etiology ◽

Expression Levels ◽

Epigenetic Information ◽

Disease Associated Genes

To increase statistical power to identify genes associated with complex traits, a number of transcriptome-wide association study (TWAS) methods have been proposed using gene expression as a mediating trait linking genetic variations and diseases. These methods first predict expression levels based on inferred expression quantitative trait loci (eQTLs) and then identify expression-mediated genetic effects on diseases by associating phenotypes with predicted expression levels. The success of these methods critically depends on the identification of eQTLs, which may not be functional in the corresponding tissue, due to linkage disequilibrium (LD) and the correlation of gene expression between tissues. Here, we introduce a new method called T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation) to identify disease-associated genes leveraging epigenetic information. Through prioritizing SNPs with tissue-specific epigenetic annotation, T-GEN can better identify SNPs that are both statistically predictive and biologically functional. We found that a significantly higher percentage (an increase of 18.7% to 47.2%) of eQTLs identified by T-GEN are inferred to be functional by ChromHMM and more are deleterious based on their Combined Annotation Dependent Depletion (CADD) scores. Applying T-GEN to 207 complex traits, we were able to identify more trait-associated genes (ranging from 7.7% to 102%) than those from existing methods. Among the identified genes associated with these traits, T-GEN can better identify genes with high (>0.99) pLI scores compared to other methods. When T-GEN was applied to late-onset Alzheimer’s disease, we identified 96 genes located at 15 loci, including two novel loci not implicated in previous GWAS. We further replicated 50 genes in an independent GWAS, including one of the two novel loci.

Author summary: TWAS-like methods have been widely applied to understand disease etiology using eQTL data and GWAS results. However, it is still challenging to discriminate the true disease-associated genes from those in strong LD with true genes, which is largely due to the misidentification of eQTLs. Here we introduce a novel statistical method named T-GEN to identify disease-associated genes considering epigenetic information. Compared to current TWAS methods, T-GEN can not only identify eQTLs with higher CADD scores and functional potential in gene-expression imputation models, but also identify more disease-associated genes across 207 traits and more genes with high (>0.99) pLI scores. Applying T-GEN in late-onset Alzheimer’s disease identified 96 genes at 15 loci with two novel loci. Among 96 identified genes, 50 genes were further replicated in an independent GWAS.

Rare variant enriched identity-by-descent enables the detection of distant relatedness and older divergence between populations

10.1101/2020.05.05.079541 ◽

2020 ◽

Author(s):

Amol C. Shetty ◽

Jeffrey O’Connell ◽

Braxton D. Mitchell ◽

Timothy D. O’Connor ◽

...

Keyword(s):

Rare Variant ◽

Human Population ◽

Large Scale ◽

Genetic Relatedness ◽

Rare Variants ◽

Association Studies ◽

Common Variants ◽

Identity By Descent ◽

Association Analyses ◽

Scale Population

Motivation: The global human population has experienced an explosive growth from a few million to roughly 7 billion people in the last 10,000 years. Accompanying this growth has been the accumulation of rare variants that can inform our understanding of human evolutionary history. Common variants have primarily been used to infer the structure of the human population and relatedness between two individuals. However, with the increasing abundance of rare variants observed in large-scale projects, such as Trans-Omics for Precision Medicine (TOPMed), the use of rare variants to decipher cryptic relatedness and fine-scale population structure can be beneficial to the study of population demographics and association studies. Identity-by-descent (IBD) is an important framework used for identifying these relationships. IBD segments are broken down by recombination over time, such that longer shared haplotypes give strong evidence of recent relatedness while shorter shared haplotypes are indicative of more distant relationships. Current methods to identify IBD accurately detect only long segments (>2 cM) found in related individuals. Algorithm: We describe a metric that leverages rare variants shared between individuals to improve the detection of short IBD segments. We computed IBD segments using existing methods implemented in Refined IBD where we enrich the signal using our metric that facilitates the detection of short IBD segments (<2 cM) by explicitly incorporating rare variants. Results: To test our new metric, we simulated datasets involving populations with varying divergence time-scales. We show that rare-variant IBD identifies shorter segments with greater confidence and enables the detection of older divergence between populations. As an example, we applied our metric to the Old-Order Amish cohort with known genealogies dating 14 generations back to validate its ability to detect genetic relatedness between distant relatives. This analysis shows that our method increases the accuracy of identifying shorter segments that in turn capture distant relationships. Conclusions: We describe a method to enrich the detection of short IBD segments using rare-variant sharing within IBD segments. Leveraging rare-variant sharing improves the information content of short IBD segments better than common variants alone. We validated the method in both simulated and empirical datasets. This method can benefit association analyses, IBD mapping analyses, and demographic inferences.
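
The link between segment length and the age of a relationship can be illustrated with a common population-genetics approximation that is not stated in the abstract: for two individuals sharing an ancestor g generations back, 2g meioses separate them, and IBD segment lengths are roughly exponentially distributed with a mean of 100/(2g) cM. The function names below are hypothetical, and the sketch ignores detection error and variation in recombination rate.

```python
import math

def expected_ibd_length_cm(generations_to_ancestor):
    """Mean IBD segment length in cM under the exponential approximation."""
    meioses = 2 * generations_to_ancestor
    return 100.0 / meioses

def prob_segment_longer_than(threshold_cm, generations_to_ancestor):
    """Probability that a surviving segment exceeds threshold_cm (same approximation)."""
    rate_per_cm = 2 * generations_to_ancestor / 100.0
    return math.exp(-rate_per_cm * threshold_cm)

# Compare a 14-generation genealogy with a much older shared ancestor.
for g in (14, 50):
    mean_len = expected_ibd_length_cm(g)
    frac_detectable = prob_segment_longer_than(2.0, g)
    print(f"g={g}: mean {mean_len:.2f} cM, fraction above 2 cM {frac_detectable:.2f}")
```

Under this approximation, a growing share of segments falls below the 2 cM range as the shared ancestor becomes older, which is the regime the rare-variant metric is meant to address.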

A powerful approach to estimating annotation-stratified genetic covariance using GWAS summary statistics

10.1101/114561 ◽

2017 ◽

Cited By ~ 1

Author(s):

Qiongshi Lu ◽

Boyang Li ◽

Derek Ou ◽

Margret Erlendsdottir ◽

Ryan L. Powles ◽

...

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Genetic Architecture ◽

Large Scale ◽

Late Onset ◽

Association Studies ◽

Joint Analysis ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genetic Covariance

Despite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits’ genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses we demonstrate that our method provides accurate covariance estimates, thus enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (N_total ≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer’s disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD’s correlation with cognitive traits and hints at an autoimmune component for ALS.

Improved methods for multi-trait fine mapping of pleiotropic risk loci

10.1101/054684 ◽

2016 ◽

Author(s):

Gleb Kichaev ◽

Megan Roytman ◽

Ruth Johnson ◽

Eleazar Eskin ◽

Sara Lindstroem ◽

...

Keyword(s):

Fine Mapping ◽

Complex Traits ◽

Functional Annotation ◽

Large Scale ◽

Association Studies ◽

Real Data ◽

Causal Variant ◽

Genome Wide Association Studies ◽

Annotation Data ◽

Association Data

Genome-wide association studies (GWAS) have identified thousands of regions in the genome that contain genetic variants that increase risk for complex traits and diseases. However, the variants uncovered in GWAS are typically not biologically causal, but rather, correlated to the true causal variant through linkage disequilibrium (LD). To discern the true causal variant(s), a variety of statistical fine-mapping methods have been proposed to prioritize variants for functional validation. In this work we introduce a new approach, fastPAINTOR, that leverages evidence across correlated traits, as well as functional annotation data, to improve fine-mapping accuracy at pleiotropic risk loci. To improve computational efficiency, we describe a new importance sampling scheme to perform model inference. First, we demonstrate in simulations that by leveraging functional annotation data, fastPAINTOR increases fine-mapping resolution relative to existing methods. Next, we show that jointly modeling pleiotropic risk regions improves fine-mapping resolution relative to standard single trait and pleiotropic fine mapping strategies. We report a reduction in the number of SNPs required for follow-up in order to capture 90% of the causal variants from 23 SNPs per locus using a single trait to 12 SNPs when fine-mapping two traits simultaneously. Finally, we analyze summary association data from a large-scale GWAS of lipids and show that these improvements are largely sustained in real data.
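
The "number of SNPs required for follow-up" can be illustrated with a simple credible-set calculation. This is a generic sketch, not fastPAINTOR's procedure: it assumes per-SNP posterior probabilities are already available and uses a single-causal-variant simplification in which they sum to one. The SNP names and probabilities are hypothetical.

```python
def credible_set(posteriors, coverage=0.9):
    """Smallest set of SNPs, ranked by posterior, whose probabilities reach the coverage."""
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for snp, prob in ranked:
        if total >= coverage:
            break
        chosen.append(snp)
        total += prob
    return chosen

# Hypothetical posterior probabilities at one locus.
posteriors = {"snp1": 0.60, "snp2": 0.25, "snp3": 0.10, "snp4": 0.05}
follow_up = credible_set(posteriors, coverage=0.9)
```

Sharper posteriors, for example from adding a second trait or functional annotations, shrink this set, which is the kind of reduction the abstract reports.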
