SNP data analysis in genome-wide association studies

2011
Author(s): Can Yang
Author(s): M. Shamila, Amit Kumar Tyagi

Genome-wide association studies (GWAS), a form of genetic data analysis, are used to discover common genetic factors that influence human health and contribute to disease. The use of genomics has grown in recent years, especially in e-healthcare, yet the field still requires substantial improvement. Note that genomics and genetics are not synonymous terms here. The human genome is made up of DNA, which consists of four chemical building blocks (called bases and abbreviated A, T, C, and G). Differences in the sequence of these bases distinguish each human being from every other. The term ‘genetics’ originates from the Greek word ‘genetikos’, meaning ‘origin’. In simple terms, genetics is the branch of biology that studies the function and composition of individual genes in an organism. There are three main branches of genetics: classical genetics, molecular genetics, and population genetics.


2015
Author(s): Guo-Bo Chen, Sang Hong Lee, Zhi-Xiang Zhu, Beben Benyamin, Matthew R Robinson

We apply the statistical framework of genome-wide association studies (GWAS) to eigenvector decomposition (EigenGWAS), which is commonly used in population genetics to characterise the structure of genetic data. We show that loci under selection can be detected in a structured population by using eigenvectors as phenotypes in a single-marker GWAS. We find LCT to be under selection between the HapMap CEU and TSI cohorts, a finding that was replicated across European countries in the POPRES samples. HERC2 was also differentiated both between the CEU and TSI cohorts and among the POPRES samples, likely reflecting anthropological differences in skin and hair colour between northern and southern European populations. We show that, for determining the effect of a SNP on an eigenvector, three methods are equivalent: single-marker regression of eigenvectors, best linear unbiased prediction (BLUP) of eigenvectors, and singular value decomposition (SVD) of SNP data. We also demonstrate that SNP effects on eigenvectors estimated in a reference panel can be used to predict eigenvectors (the projected eigenvectors) in a target sample with high accuracy, particularly for the primary eigenvectors. Under this GWAS framework, ancestry-informative markers and loci under selection can be identified, and population structure can be captured and easily interpreted. We have developed freely available software to facilitate the application of the methods (https://github.com/gc5k/GEAR/wiki/EigenGWAS).
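The core EigenGWAS idea, treating a leading eigenvector of the genetic relationship matrix as a phenotype and regressing it on each SNP in turn, can be sketched in a few lines of NumPy. This is a minimal illustration on simulated two-subpopulation genotypes, not the GEAR implementation; the function names (`standardize`, `eigengwas`, `project`) and the simulation settings are hypothetical. It assumes genotypes coded 0/1/2, no missing data, and SNPs standardized to mean 0 and variance 1. It checks two claims numerically: the single-marker regression slopes equal the SVD-based effects s_1 v_1 / n (BLUP is not shown), and applying the estimated SNP effects back to the reference sample reproduces its eigenvector.

```python
import numpy as np

def standardize(geno, mean=None, sd=None):
    """Center and scale each SNP column. Reference means/SDs can be passed in
    so that a target sample is scaled the same way as the reference."""
    if mean is None:
        mean, sd = geno.mean(axis=0), geno.std(axis=0)
    return (geno - mean) / sd, mean, sd

def eigengwas(x, k=0):
    """Use the k-th leading eigenvector of the GRM X X'/m as a phenotype
    and run a single-marker regression of it on every SNP."""
    n, m = x.shape
    lam, vecs = np.linalg.eigh(x @ x.T / m)    # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][k]
    evec, lam_k = vecs[:, idx], lam[idx]
    # OLS slope per SNP: b_j = x_j' e / x_j' x_j (columns are centered,
    # so no intercept term is needed).
    b = (x.T @ evec) / (x * x).sum(axis=0)
    return evec, lam_k, b

def project(x_target, b, lam_k, n_ref):
    """Predict the eigenvector in a standardized sample from reference SNP
    effects: e_hat = n_ref * X b / (m * lam_k)."""
    m = x_target.shape[1]
    return n_ref * (x_target @ b) / (m * lam_k)

# Hypothetical simulated data: two subpopulations with shifted allele frequencies.
rng = np.random.default_rng(0)
n_per, m = 100, 500
p_a = rng.uniform(0.1, 0.9, m)
p_b = np.clip(p_a + rng.normal(0.0, 0.1, m), 0.05, 0.95)
geno = np.vstack([rng.binomial(2, p_a, (n_per, m)),
                  rng.binomial(2, p_b, (n_per, m))]).astype(float)
geno = geno[:, geno.std(axis=0) > 0]           # drop monomorphic SNPs
x, mean, sd = standardize(geno)
n = x.shape[0]

evec, lam_1, b = eigengwas(x, k=0)

# Equivalence with SVD: X = U S V', so X' u_1 = s_1 v_1 and b = s_1 v_1 / n
# (up to the arbitrary sign of the eigenvector).
u, s, vt = np.linalg.svd(x, full_matrices=False)
sign = np.sign(evec @ u[:, 0])
b_svd = sign * s[0] * vt[0] / n
print(np.allclose(b, b_svd))                          # True
print(np.allclose(project(x, b, lam_1, n), evec))     # True
```

To obtain projected eigenvectors for a separate target sample, standardize it with the reference statistics (`standardize(geno_target, mean, sd)`) before calling `project`; how accurate the projection is depends on the data and is not asserted here.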


PLoS ONE, 2011, Vol 6 (10), pp. e24982
Author(s): Faheem Mitha, Herodotos Herodotou, Nedyalko Borisov, Chen Jiang, Josh Yoder, ...

Author(s): Siddharth Sharma

Increasingly, genomics is being used to predict specific traits and diseases (phenotypes) in humans. Wider availability of genomics data through multiple research projects (such as the International HapMap Project [1] and the 1000 Genomes Project [2]) has been a catalyst in that direction. With recent advances in machine learning and big data analysis, the computational resources and data models needed for genomics data analysis are readily available. However, predicting traits and diseases poses its own challenges: computational requirements, statistical analysis (for example, confounding variables), and the limited quality of data collection. Linear mixed models (LMMs), an extension of linear regression, are a common approach in genome-wide association studies (GWAS) for predicting common traits in humans from genomic data. This paper surveys existing LMM-based approaches for GWAS, describes an experiment performed with the FaST-LMM approach from Microsoft Research, and then proposes an enhanced approach, called LMM-22, to address these computational and statistical issues. LMM-22 focuses on parallelizing LMM computations and executing them on general-purpose graphics processing units (GPUs) rather than CPUs to accelerate the LMM approach for GWAS.
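The computational cost that FaST-LMM-style methods target can be illustrated with the spectral-rotation trick: eigendecompose the kinship matrix K once, rotate the phenotype and all SNPs by U', and the covariance K + δI becomes diagonal, so after a single O(n³) eigendecomposition each SNP needs only a matrix-vector rotation and a few weighted sums instead of its own n × n solve, and the rotated data can be reused for any value of δ. The sketch below is a hedged illustration under stated assumptions, not FaST-LMM's code and not LMM-22 (whose details the abstract does not give): the variance ratio δ is fixed at a hypothetical value rather than estimated, only effect sizes are returned (no standard errors or p-values), and the function names are hypothetical. All SNPs are processed in one vectorized pass, mirroring the independent per-SNP work that makes GPU parallelization attractive; here it simply runs on the CPU.

```python
import numpy as np

def lmm_rotated_effects(y, snps, kinship, delta):
    """Per-SNP GLS effect estimates under y ~ N(b0 + g*beta, sigma_g^2 (K + delta*I)).

    delta = sigma_e^2 / sigma_g^2 is treated as known (an assumption of this
    sketch). K = U diag(s) U' is decomposed once; after rotating by U' the
    covariance is diagonal with entries s + delta.
    """
    s, u = np.linalg.eigh(kinship)
    w = 1.0 / (s + delta)            # diagonal of the rotated inverse covariance
    y_r = u.T @ y                    # rotated phenotype
    one_r = u.T @ np.ones(len(y))    # rotated intercept column
    g_r = u.T @ snps                 # rotate every SNP at once (n x m)
    # 2x2 weighted normal equations [intercept, SNP] for all SNPs simultaneously.
    a11 = np.sum(w * one_r**2)
    a12 = (w * one_r) @ g_r
    a22 = w @ g_r**2
    c1 = np.sum(w * one_r * y_r)
    c2 = (w * y_r) @ g_r
    # SNP coefficient of each 2x2 solve (Cramer's rule).
    return (a11 * c2 - a12 * c1) / (a11 * a22 - a12**2)

def gls_direct(y, g, kinship, delta):
    """Reference: solve the same GLS problem with the full n x n covariance."""
    n = len(y)
    v = kinship + delta * np.eye(n)
    x = np.column_stack([np.ones(n), g])
    vinv_x = np.linalg.solve(v, x)
    coef = np.linalg.solve(x.T @ vinv_x, vinv_x.T @ y)
    return coef[1]

# Hypothetical simulated data.
rng = np.random.default_rng(1)
n, m = 200, 300
geno = rng.binomial(2, 0.3, size=(n, m)).astype(float)
geno = geno[:, geno.std(axis=0) > 0]                  # drop monomorphic SNPs
z = (geno - geno.mean(axis=0)) / geno.std(axis=0)
kinship = z @ z.T / z.shape[1]                        # realized relationship matrix
y = 0.5 * geno[:, 0] + rng.normal(size=n)
delta = 1.0                                           # hypothetical fixed variance ratio

beta = lmm_rotated_effects(y, geno, kinship, delta)
direct = np.array([gls_direct(y, geno[:, j], kinship, delta) for j in range(5)])
print(np.allclose(beta[:5], direct))                  # True
```

In a real analysis δ would have to be estimated (for example by maximizing the likelihood over candidate values), which is where reusing the rotated data for every candidate δ pays off.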

