Variability in GWAS analysis: the impact of genotype calling algorithm inconsistencies

2010 ◽  
Vol 10 (4) ◽  
pp. 324-335 ◽  
Author(s):  
K Miclaus ◽  
M Chierici ◽  
C Lambert ◽  
L Zhang ◽  
S Vega ◽  
...  

2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Qi Yan ◽  
Rui Chen ◽  
James S. Sutcliffe ◽  
Edwin H. Cook ◽  
Daniel E. Weeks ◽  
...  

2009 ◽  
Vol 25 (3) ◽  
pp. 309-314 ◽  
Author(s):  
Jumamurat R. Bayjanov ◽  
Michiel Wels ◽  
Marjo Starrenburg ◽  
Johan E. T. van Hylckama Vlieg ◽  
Roland J. Siezen ◽  
...  

2021 ◽  
Vol 12 ◽  
Author(s):  
Frédéric Jehl ◽  
Fabien Degalez ◽  
Maria Bernard ◽  
Frédéric Lecerf ◽  
Laetitia Lagoutte ◽  
...  

In addition to their common use in studying gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species since whole genome sequencing is expensive and exome sequencing tools are unavailable. SNPs detected in expressed regions can be used to characterize variants affecting protein functions, and to study cis-regulated genes by analyzing allele-specific expression (ASE) in the tissue of interest. However, gene expression can be highly variable, and filters for SNP detection using the popular GATK toolkit are not yet standardized, making SNP detection and genotype calling by RNA-seq a challenging endeavor. We compared SNP calling results using GATK suggested filters on two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue. In expressed regions, we showed an RNA-seq precision of 91% (SNPs detected by RNA-seq and shared by DNA-seq), and we characterized the remaining 9% of SNPs. We then studied the genotype (GT) obtained by RNA-seq and the impact of two factors (GT call rate and read number per GT) on the concordance of GT with DNA-seq; we proposed thresholds for these factors that lead to a 95% concordance. Applying these thresholds to 767 multi-tissue RNA-seq samples from 382 birds of 11 chicken populations, we found 9.5 M SNPs in total, including ∼550,000 SNPs per tissue and population with a reliable GT (call rate ≥ 50%), of which ∼340,000 had a MAF ≥ 10%.
We showed that such RNA-seq data from one tissue can be used to (i) detect SNPs with a strong predicted impact on proteins, despite their scarcity in each population (16,307 SIFT deleterious missense variants and 590 stop-gained variants), (ii) study cis-regulation of gene expression on a large scale, with ∼81% of protein-coding genes and 68% of long non-coding genes (TPM ≥ 1) being analyzable for ASE, of which ∼29% were cis-regulated, and (iii) analyze population genetics using such SNPs located in expressed regions. This work shows that RNA-seq data can be used with good confidence to detect SNPs and their associated GT within various populations, and to use them for different analyses, as in GTEx studies.
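
The three quantities this abstract relies on (GT call rate, MAF, and RNA-seq/DNA-seq genotype concordance) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the genotype encoding (alternate-allele count 0, 1 or 2, with `None` for a missing call) and all function names are assumptions made for illustration. Only the 50% call-rate and 10% MAF cut-offs come from the text; the read-number threshold is not stated in the abstract and is therefore left out.

```python
from typing import Optional, Sequence

# Assumed encoding: number of alternate alleles (0, 1, 2); None = no call.
Genotype = Optional[int]


def call_rate(gts: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype at one SNP."""
    if not gts:
        return 0.0
    return sum(g is not None for g in gts) / len(gts)


def minor_allele_frequency(gts: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among called genotypes (diploid)."""
    called = [g for g in gts if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def concordance(rna: Sequence[Genotype], dna: Sequence[Genotype]) -> float:
    """Share of samples called in both assays whose genotypes agree."""
    pairs = [(r, d) for r, d in zip(rna, dna) if r is not None and d is not None]
    if not pairs:
        return 0.0
    return sum(r == d for r, d in pairs) / len(pairs)


def passes_filters(
    gts: Sequence[Genotype],
    min_call_rate: float = 0.5,  # "call rate >= 50%" in the abstract
    min_maf: float = 0.10,       # "MAF >= 10%" in the abstract
) -> bool:
    """Hypothetical helper: keep a SNP only if both cut-offs are met."""
    return call_rate(gts) >= min_call_rate and minor_allele_frequency(gts) >= min_maf
```

For example, with RNA-seq calls `[0, 1, None, 2]` and DNA-seq calls `[0, 1, 1, 1]` at one SNP, the call rate is 3/4 and the concordance is 2/3, since only the three samples called in both assays are compared.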


2014 ◽  
Vol 30 (12) ◽  
pp. 1714-1720 ◽  
Author(s):  
Jin Zhou ◽  
Erwin Tantoso ◽  
Lai-Ping Wong ◽  
Rick Twee-Hee Ong ◽  
Jin-Xin Bei ◽  
...  

2010 ◽  
Vol 10 (4) ◽  
pp. 336-346 ◽  
Author(s):  
K Miclaus ◽  
R Wolfinger ◽  
S Vega ◽  
M Chierici ◽  
C Furlanello ◽  
...  

2006 ◽  
Vol 22 (16) ◽  
pp. 1942-1947 ◽  
Author(s):  
D. L. Nicolae ◽  
X. Wu ◽  
K. Miyake ◽  
N. J. Cox

Author(s):  
Jianping Hua ◽  
David W. Craig ◽  
Marcel Brun ◽  
Jennifer Webster ◽  
Victoria Zismann ◽  
...  

2011 ◽  
Vol 09 (06) ◽  
pp. 715-728 ◽  
Author(s):  
BILIN FU ◽  
JIN XU

Current genotype-calling methods such as Robust Linear Model with Mahalanobis Distance Classifier (RLMM) and Corrected Robust Linear Model with Maximum Likelihood Classification (CRLMM) provide accurate calling results for Affymetrix Single Nucleotide Polymorphism (SNP) chips. However, these methods are computationally expensive because they employ preprocessing procedures, including chip data normalization and other sophisticated statistical techniques, and their accuracy may drop significantly when the sample is small. We develop a new genotype calling method for Affymetrix 100 k and 500 k SNP chips. A two-stage classification scheme is proposed to obtain a fast genotype calling algorithm. The first stage uses unsupervised classification to quickly discriminate genotypes with high accuracy for the majority of the SNPs. The second stage employs a supervised classification method to incorporate allele frequency information, either from the HapMap data or from a self-training scheme. A confidence score is provided for every genotype call. The overall performance is shown to be comparable to that of CRLMM, as verified against the known gold-standard HapMap data, and is superior in small sample cases. The new algorithm is computationally simple and standalone, in the sense that a self-training scheme can be used without employing any other training data. A package implementing the calling algorithm is freely available.
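
The general shape of a two-stage scheme like the one described above can be sketched in Python. This is a toy under stated assumptions and not the authors' algorithm, whose details the abstract does not give. One-dimensional k-means stands in for the unsupervised first stage. Bayes' rule with a Hardy-Weinberg prior stands in for the second stage that incorporates allele frequency. The posterior probability serves as the confidence score. The input (a per-sample allele contrast in [-1, 1]), the noise level `sigma`, and every function name are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

GENOTYPES = ("AA", "AB", "BB")


def nearest(contrast: float, centroids: Sequence[float]) -> int:
    """Index of the centroid closest to one contrast value."""
    return min(range(3), key=lambda i: abs(contrast - centroids[i]))


def stage1_cluster(contrasts: Sequence[float], iterations: int = 10) -> List[float]:
    """Unsupervised stage: 1-D k-means started at +1 (AA), 0 (AB), -1 (BB)."""
    centroids = [1.0, 0.0, -1.0]
    for _ in range(iterations):
        groups: List[List[float]] = [[], [], []]
        for c in contrasts:
            groups[nearest(c, centroids)].append(c)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)
        ]
    return centroids


def hardy_weinberg_prior(freq_a: float) -> List[float]:
    """Genotype prior (AA, AB, BB) from the frequency of allele A."""
    p, q = freq_a, 1.0 - freq_a
    return [p * p, 2 * p * q, q * q]


def two_stage_call(
    contrasts: Sequence[float],
    freq_a: Optional[float] = None,  # external frequency, e.g. from reference data
    sigma: float = 0.15,             # assumed cluster spread, for illustration only
) -> List[Tuple[Optional[str], float]]:
    """Return (genotype, confidence) for every sample at one SNP."""
    centroids = stage1_cluster(contrasts)
    if freq_a is None:
        # Self-training: estimate the allele frequency from stage-1 hard calls.
        hard = [nearest(c, centroids) for c in contrasts]
        n_aa, n_ab = hard.count(0), hard.count(1)
        freq_a = (2 * n_aa + n_ab) / (2 * len(hard)) if hard else 0.5
    prior = hardy_weinberg_prior(freq_a)
    calls: List[Tuple[Optional[str], float]] = []
    for c in contrasts:
        # Posterior is proportional to prior times a Gaussian likelihood.
        weights = [
            prior[i] * math.exp(-((c - centroids[i]) ** 2) / (2 * sigma**2))
            for i in range(3)
        ]
        total = sum(weights)
        if total == 0.0:
            calls.append((None, 0.0))  # no call when every weight underflows
            continue
        best = max(range(3), key=lambda i: weights[i])
        calls.append((GENOTYPES[best], weights[best] / total))
    return calls
```

Calling `two_stage_call(contrasts)` without `freq_a` uses the self-training path, so no external training data is needed. Passing `freq_a` mimics supplying an allele frequency from a reference panel.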

