A Comparative Study on Multifactor Dimensionality Reduction Methods for Detecting Gene-Gene Interactions with the Survival Phenotype

2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Seungyeoun Lee ◽  
Yongkang Kim ◽  
Min-Seok Kwon ◽  
Taesung Park

Genome-wide association studies (GWAS) have extensively analyzed single-SNP effects on a wide variety of common and complex diseases and have found many genetic variants associated with diseases. However, a large portion of the genetic variation remains unexplained. This missing heritability problem might be due to an analytical strategy that limits analyses to single SNPs. One possible approach to the missing heritability problem is to identify multi-SNP effects or gene-gene interactions. The multifactor dimensionality reduction (MDR) method has been widely used to detect gene-gene interactions. It is based on constructive induction: for a case-control study, it classifies high-dimensional genotype combinations into a one-dimensional variable with two attributes, high risk and low risk. Many modifications of MDR have been proposed, and the method has also been extended to the survival phenotype. In this study, we propose several extensions of MDR for the survival phenotype and compare the proposed extensions with earlier MDR methods through comprehensive simulation studies.
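The case-control reduction step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation, and it does not cover the survival extensions the paper proposes. The function names (`label_cells`, `balanced_accuracy`, `best_pair`), the 0/1 status coding, and the use of training balanced accuracy without cross-validation are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def label_cells(genotypes, status, snps):
    """Map each observed multi-locus genotype cell to 1 (high risk) or 0 (low risk)."""
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, status):
        cell = tuple(row[s] for s in snps)
        (cases if y == 1 else controls)[cell] += 1
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    # A cell is high risk when its case:control ratio exceeds the overall ratio;
    # cross-multiplying avoids dividing by an empty control count.
    return {
        cell: int(cases[cell] * n_controls > controls[cell] * n_cases)
        for cell in set(cases) | set(controls)
    }


def balanced_accuracy(genotypes, status, snps, labels):
    """Mean of sensitivity and specificity of the one-dimensional risk variable."""
    hits = Counter()
    for row, y in zip(genotypes, status):
        predicted = labels.get(tuple(row[s] for s in snps), 0)
        hits[(y, predicted)] += 1
    sensitivity = hits[(1, 1)] / (hits[(1, 1)] + hits[(1, 0)])
    specificity = hits[(0, 0)] / (hits[(0, 0)] + hits[(0, 1)])
    return (sensitivity + specificity) / 2


def best_pair(genotypes, status):
    """Score every two-SNP model and return (balanced accuracy, SNP index pair)."""
    n_snps = len(genotypes[0])
    return max(
        (balanced_accuracy(genotypes, status, pair, label_cells(genotypes, status, pair)), pair)
        for pair in combinations(range(n_snps), 2)
    )
```

The sketch assumes the data contain at least one case and one control. A practical analysis would evaluate each model with cross-validation rather than on the training data alone.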

2019 ◽  
Vol 15 ◽  
pp. 117693431986086
Author(s):  
Shan-Shan Dong ◽  
Yan Guo ◽  
Tie-Lin Yang

Genome-wide association studies (GWASs) have successfully identified thousands of susceptibility loci for human complex diseases. However, missing heritability is still a challenging problem. Considering that most GWAS loci are located in regulatory elements, we recently developed a pipeline named functional disease-associated single-nucleotide polymorphisms (SNPs) prediction (FDSP), which uses machine learning to predict novel susceptibility loci for complex diseases based on the interpretation of regulatory features and published GWAS results. When applied to type 2 diabetes and hypertension, the susceptibility loci predicted by FDSP proved capable of explaining additional heritability. In addition, potential target genes of the predicted positive SNPs were significantly enriched in disease-related pathways. Our results suggest that taking regulatory features into consideration might be a useful way to address the missing heritability problem. We hope FDSP can help in the identification of novel susceptibility loci for complex diseases.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Mira Park ◽  
Hoe-Bin Jeong ◽  
Jong-Hyun Lee ◽  
Taesung Park

Background: Identifying interaction effects between genes is one of the main tasks of genome-wide association studies aiming to shed light on the biological mechanisms underlying complex diseases. Multifactor dimensionality reduction (MDR) is a popular approach for detecting gene–gene interactions that has been extended in various forms to handle binary and continuous phenotypes. However, only a few multivariate MDR methods are available for multiple related phenotypes. Current approaches use Hotelling’s T² statistic to evaluate interaction models, but it is well known that Hotelling’s T² statistic is highly sensitive to heavily skewed distributions and outliers.

Results: We propose a robust approach based on nonparametric statistics such as spatial signs and ranks. The new multivariate rank-based MDR (MR-MDR) is mainly suitable for analyzing multiple continuous phenotypes and is less sensitive to skewed distributions and outliers. MR-MDR utilizes fuzzy k-means clustering and classifies multi-locus genotypes into two groups. Then, MR-MDR calculates a spatial rank-sum statistic as an evaluation measure and selects the best interaction model with the largest statistic. Our novel idea lies in adopting nonparametric statistics as an evaluation measure for robust inference. We adopt tenfold cross-validation to avoid overfitting. Intensive simulation studies were conducted to compare the performance of MR-MDR with current methods. Application of MR-MDR to a real dataset from a Korean genome-wide association study demonstrated that it successfully identified genetic interactions associated with four phenotypes related to kidney function. The R code for conducting MR-MDR is available at https://github.com/statpark/MR-MDR.

Conclusions: Intensive simulation studies comparing MR-MDR with several current methods showed that the performance of MR-MDR was outstanding for skewed distributions. Additionally, for symmetric distributions, MR-MDR showed comparable power. Therefore, we conclude that MR-MDR is a useful multivariate non-parametric approach that can be used regardless of the phenotype distribution, the correlations between phenotypes, and sample size.
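To make the notion of spatial signs and ranks concrete, here is a minimal sketch of the standard definitions: the spatial sign of a vector is the vector scaled to unit length, and the spatial rank of an observation is the average spatial sign of its differences from all observations. This is not the MR-MDR code from the repository above, and it does not reproduce the paper's rank-sum statistic. The function names are hypothetical.

```python
import math


def spatial_sign(v):
    """Return v scaled to unit length, or the zero vector when v is zero."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    return [x / norm for x in v]


def spatial_ranks(points):
    """Spatial rank of each point: the mean spatial sign of its differences
    from every point in the sample (including itself, which contributes zero)."""
    n = len(points)
    dim = len(points[0])
    ranks = []
    for p in points:
        total = [0.0] * dim
        for q in points:
            sign = spatial_sign([a - b for a, b in zip(p, q)])
            total = [t + s for t, s in zip(total, sign)]
        ranks.append([t / n for t in total])
    return ranks
```

Because each rank is an average of unit vectors, its length never exceeds 1, however extreme the observation. This boundedness is the reason rank-based measures are less sensitive to outliers than statistics built on raw means and covariances.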


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5854 ◽  
Author(s):  
Faramarz Dorani ◽  
Ting Hu ◽  
Michael O. Woods ◽  
Guangju Zhai

Colorectal cancer (CRC) has a high incidence rate in both men and women and affects millions of people every year. Genome-wide association studies (GWAS) on CRC have successfully revealed common single-nucleotide polymorphisms (SNPs) associated with CRC risk. However, they can only explain a very limited fraction of the disease heritability. One reason may be the common uni-variable analyses in GWAS where genetic variants are examined one at a time. Given the complexity of cancers, the non-additive interaction effects among multiple genetic variants have the potential to explain the missing heritability. In this study, we employed two powerful ensemble learning algorithms, random forests and gradient boosting machine (GBM), to search for SNPs that contribute to the disease risk through non-additive gene-gene interactions. We were able to find 44 possible susceptibility SNPs that were ranked most significant by both algorithms. Out of those 44 SNPs, 29 are in coding regions. The 29 genes include ARRDC5, DCC, ALK, and ITGA1, which have previously been found to be associated with CRC, and E2F3 and NID2, which are potentially related to CRC since they have known associations with other types of cancer. We performed pairwise and three-way interaction analysis on the 44 SNPs using information theoretical techniques and found 17 pairwise (p < 0.02) and 16 three-way (p ≤ 0.001) interactions among them. Moreover, functional enrichment analysis suggested 16 functional terms or biological pathways that may help us better understand the etiology of the disease.
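The abstract does not say which information-theoretic measure was used for the interaction analysis. One common choice for a pairwise interaction is the information gain IG(A;B;C) = I(A,B;C) − I(A;C) − I(B;C), where A and B are SNP genotypes and C is disease status. The sketch below computes it from empirical frequencies. It is an illustrative assumption, not the authors' procedure, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired observations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(a, b, c):
    """Information about c carried by a and b jointly, beyond their separate
    main effects: I(A,B;C) - I(A;C) - I(B;C)."""
    joint = list(zip(a, b))
    return mutual_information(joint, c) - mutual_information(a, c) - mutual_information(b, c)
```

A positive value indicates synergy: the two SNPs together say more about disease status than the sum of what each says alone. In practice, significance would be assessed by permutation testing, since these plug-in estimates are biased in small samples.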


2021 ◽  
Author(s):  
Marcin Kierczak ◽  
Nima Rafati ◽  
Julia Höglund ◽  
Hadrien Gourle ◽  
Daniel Schmitz ◽  
...  

Despite the success in identifying effects of common genetic variants using genome-wide association studies (GWAS), much of the genetic contribution to complex traits remains unexplained. Here, we analysed high-coverage whole-genome sequencing (WGS) data to evaluate the contribution of rare genetic variants to 414 plasma proteins. The frequency distribution of genetic variants was skewed towards the rare spectrum, and damaging variants were more often rare. However, only 2.24% of the heritability was estimated to be explained by rare variants. A gene-based approach, developed to also capture the effect of rare variants, identified associations for 249 of the proteins, which was 25% more than with a GWAS. Out of those, 24 associations were driven by rare variants, clearly highlighting the capacity of aggregated tests and WGS data. We conclude that, while many rare variants have considerable phenotypic effects, their contribution to the missing heritability is limited by their low frequencies.


2017 ◽  
Vol 11 ◽  
pp. 117793221773509 ◽  
Author(s):  
Baishali Bandyopadhyay ◽  
Veda Chanda ◽  
Yupeng Wang

Thousands of genome-wide association studies (GWAS) have been conducted to identify the genetic variants associated with complex disorders. However, only a small proportion of phenotypic variance can be explained by the reported variants. Moreover, many GWAS have failed to identify genetic variants associated with disorders displaying hereditary features. The “missing heritability” problem can be partly explained by rare variants. We simulated a causal scenario in which gestational age, a quantitative trait that can distinguish preterm (<37 weeks) and term births, was significantly correlated with the rare variant aggregations at 1000 single-nucleotide polymorphism loci. These 1000 simulated causal rare variants were embedded into randomly selected subsets of 9642 promoter regions from the 1000 Genomes Project genotypic data according to different proportions of causal rare variants within the embedded promoters. Through analysis of the correlations between rare variant aggregations and gestational ages, we found that the embedded promoters as a whole showed weaker genetic association when the proportion of causal rare variants decreased, and no individual embedded promoters showed genetic association when the proportion of causal rare variants was smaller than 0.4. Our analyses indicate that association signals can be greatly diluted when causal rare variants are dispersedly and sparsely distributed in the genome, which may account for an important source of missing heritability.
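The abstract does not specify how the rare variant aggregations were computed. A simple and common form is a burden score: the count of rare alleles each individual carries across a set of loci, which is then correlated with the quantitative trait. The sketch below assumes that form, with genotypes coded as rare-allele counts. The function names and the toy data in the usage note are hypothetical.

```python
import math


def burden_scores(genotypes, loci):
    """Per-individual count of rare alleles summed over the chosen loci."""
    return [sum(row[j] for j in loci) for row in genotypes]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, with four individuals carrying 0, 1, 2, and 3 rare alleles and gestational ages of 40, 39, 38, and 37 weeks, `pearson(burden_scores(genotypes, loci), ages)` is -1. Adding non-causal loci to `loci` injects noise into the burden score, which is one way to see the dilution effect the abstract describes.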


Author(s):  
Stacey Winham ◽  
Chong Wang ◽  
Alison A Motsinger-Reif

Recently, the amount of high-dimensional data has exploded, creating new analytical challenges for human genetics. Furthermore, much evidence suggests that common complex diseases may be due to complex etiologies such as gene-gene interactions, which are difficult to identify in high-dimensional data using traditional statistical approaches. Data-mining approaches are gaining popularity for variable selection in association studies, and one of the most commonly used methods to evaluate potential gene-gene interactions is Multifactor Dimensionality Reduction (MDR). Additionally, a number of penalized regression techniques, such as Lasso, are gaining popularity within the statistical community and are now being applied to association studies, including extensions for interactions. In this study, we compare the performance of MDR, the traditional lasso with L1 penalty (TL1), and the group lasso for categorical data with group-wise L1 penalty (GL1) to detect gene-gene interactions through a broad range of simulations. We find that each method has both advantages and disadvantages, and relative performance is context dependent. TL1 frequently over-fits, identifying false positive as well as true positive loci. MDR has higher power for epistatic models that exhibit independent main effects; for both Lasso methods, main effects tend to dominate. For purely epistatic models, GL1 has the best performance for lower minor allele frequencies, but MDR performs best for higher frequencies. These results provide guidance on when each approach might be best suited for detecting and characterizing interactions with different mechanisms.


2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Hyein Kim ◽  
Hoe-Bin Jeong ◽  
Hye-Young Jung ◽  
Taesung Park ◽  
Mira Park

In complex diseases, including hypertension, diabetes, and autism, deleterious phenotypes are unlikely to be due to the effects of single genes; rather, they are likely due to gene-gene interactions (GGIs), which are widely analyzed by multifactor dimensionality reduction (MDR). Early MDR methods mainly focused on binary traits. More recently, several extensions of MDR have been developed for analyzing various traits such as quantitative traits and survival times. Newer technologies, such as genome-wide association studies (GWAS), have now been developed for assessing multiple traits, to simultaneously identify genetic variants associated with various pathological phenotypes. It has also been well demonstrated that analyzing multiple traits has several advantages over single trait analysis. While there remains a need to find GGIs for multiple traits, such studies have become more difficult, due to a lack of novel methods and software. Herein, we propose a novel multi-CMDR method, combining fuzzy clustering and MDR, to find GGIs for multiple traits. Multi-CMDR showed similar power to existing methods when phenotypes followed bivariate normal distributions, and showed better power than others for skewed distributions. The validity of multi-CMDR was confirmed by analyzing real-life Korean GWAS data.


2018 ◽  
Vol 16 (06) ◽  
pp. 1840026 ◽  
Author(s):  
Sungkyoung Choi ◽  
Sungyoung Lee ◽  
Yongkang Kim ◽  
Heungsun Hwang ◽  
Taesung Park

Although genome-wide association studies (GWAS) have successfully identified thousands of single nucleotide polymorphisms (SNPs) associated with common diseases, these findings do not fully explain “missing heritability”. Determining gene–gene interactions (GGI) is one possible avenue for addressing the missing heritability problem. While many statistical approaches have been proposed to detect GGI, most of these focus primarily on SNP-to-SNP interactions. Although gene-based GGI analyses have many advantages, such as reducing the burden of multiple-testing correction and increasing power by aggregating multiple causal signals across SNPs in specific genes, only a few methods are available. In this study, we proposed a new statistical approach for gene-based GGI analysis, “Hierarchical structural CoMponent analysis of Gene–Gene Interactions” (HisCoM-GGI). HisCoM-GGI is based on generalized structured component analysis, and can consider hierarchical structural relationships between genes and SNPs. For a pair of genes, HisCoM-GGI first summarizes all possible pairwise SNP–SNP interactions into a latent variable, from which it then performs GGI analysis. HisCoM-GGI can evaluate both gene-level and SNP-level interactions. In simulation studies, HisCoM-GGI demonstrated higher statistical power than existing gene-based GGI methods. In an analysis of a GWAS of a Korean population for GGI associated with body mass index, HisCoM-GGI identified 14 potential GGI, two of which, (NCOR2 × SPOCK1) and (LINGO2 × ZNF385D), were successfully replicated in independent datasets. We conclude that the HisCoM-GGI method may be a valuable tool for identifying GGI that contribute to missing heritability, allowing us to better understand the biological genetic mechanisms of complex traits. An implementation of HisCoM-GGI can be downloaded from the website ( http://statgen.snu.ac.kr/software/hiscom-ggi ).


2019 ◽  
Vol 26 (34) ◽  
pp. 6207-6221 ◽  
Author(s):  
Innocenzo Rainero ◽  
Alessandro Vacca ◽  
Flora Govone ◽  
Annalisa Gai ◽  
Lorenzo Pinessi ◽  
...  

Migraine is a common, chronic neurovascular disorder caused by a complex interaction between genetic and environmental risk factors. In the last two decades, the molecular genetics of migraine has been intensively investigated. In a few cases, migraine is transmitted as a monogenic disorder, and the disease phenotype cosegregates with mutations in different genes such as CACNA1A, ATP1A2, SCN1A, KCNK18, and NOTCH3. In the common forms of migraine, candidate gene studies as well as genome-wide association studies have shown that a large number of genetic variants may increase the risk of developing migraine. At present, few studies have investigated the genotype-phenotype correlation in patients with migraine. The purpose of this review was to discuss recent studies investigating the relationship between different genetic variants and the clinical characteristics of migraine. Analysis of genotype-phenotype correlations in migraineurs is complicated by several confounding factors and, to date, only polymorphisms of the MTHFR gene have been shown to have an effect on migraine phenotype. Additional genomic studies and network analyses are needed to clarify the complex pathways underlying migraine and its clinical phenotypes.

