Evaluating the quality of the 1000 Genomes Project data

2018 ◽  
Author(s):  
Saurabh Belsare ◽  
Michal Levy-Sakin ◽  
Yulia Mostovoy ◽  
Steffen Durinck ◽  
Subhra Chaudhuri ◽  
...  

Abstract: Data from the 1000 Genomes Project are often used as a reference for human genomic analysis. However, their accuracy needs to be assessed to understand the quality of predictions made using this reference. We present here an assessment of the genotype, phasing, and imputation accuracy of the data in the 1000 Genomes Project. We compare the phased haplotype calls from the 1000 Genomes Project to experimentally phased haplotypes for 28 of the same individuals sequenced using the 10X Genomics platform. We observe that phasing and imputation for rare variants are unreliable, which likely reflects the limited sample size of the 1000 Genomes Project data. Further, it appears that using a population-specific reference panel does not improve the accuracy of imputation over using the entire 1000 Genomes data set as a reference panel. We also note that the error rates and trends depend on the choice of definition of error, and hence any error reporting needs to take these definitions into account.
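The point about error definitions can be illustrated with a minimal sketch. It assumes two widely used phasing metrics, switch errors and Hamming errors, which may not be the exact definitions the authors used; the function names and toy haplotypes are hypothetical. Each genotype is a pair of alleles, one per haplotype, and only heterozygous sites carry phase information.

```python
def phase_orientation(truth, test):
    """For each heterozygous site, report whether test matches the truth orientation."""
    flags = []
    for t, c in zip(truth, test):
        if t[0] == t[1]:
            continue  # homozygous sites carry no phase information
        if c == t:
            flags.append(True)
        elif c == (t[1], t[0]):
            flags.append(False)
        else:
            raise ValueError("test genotype does not match truth genotype")
    return flags


def switch_errors(flags):
    """Count adjacent heterozygous pairs whose relative phase disagrees with truth."""
    return sum(1 for a, b in zip(flags, flags[1:]) if a != b)


def hamming_errors(flags):
    """Count mis-assigned sites under the better of the two global orientations."""
    flipped = flags.count(False)
    return min(flipped, len(flags) - flipped)


# Toy data (hypothetical): eight heterozygous sites, truth phased as 0|1 throughout.
truth = [(0, 1)] * 8
block_flip = [(0, 1)] * 4 + [(1, 0)] * 4            # one long-range switch
point_flip = [(0, 1)] * 3 + [(1, 0)] + [(0, 1)] * 4  # one isolated flipped site

for name, calls in [("block_flip", block_flip), ("point_flip", point_flip)]:
    flags = phase_orientation(truth, calls)
    print(name, "switch:", switch_errors(flags), "hamming:", hamming_errors(flags))
```

In this toy case the two call sets rank in opposite order depending on the metric: the block flip has fewer switch errors but more Hamming errors than the isolated flip, which is one way a reported error trend can depend on the definition chosen.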

2019 ◽  
Vol 4 ◽  
pp. 50 ◽  
Author(s):  
Ernesto Lowy-Gallego ◽  
Susan Fairley ◽  
Xiangqun Zheng-Bradley ◽  
Magali Ruffier ◽  
Laura Clarke ◽  
...  

We present a set of biallelic SNVs and INDELs, from 2,548 samples spanning 26 populations from the 1000 Genomes Project, called de novo on GRCh38. We believe this will be a useful reference resource for those using GRCh38. It represents an improvement over the “lift-overs” of the 1000 Genomes Project data that have been available to date by encompassing all of the GRCh38 primary assembly autosomes and pseudo-autosomal regions, including novel, medically relevant loci. Here, we describe how the data set was created and benchmark our call set against that produced by the final phase of the 1000 Genomes Project on GRCh37 and the lift-over of that data to GRCh38.


2021 ◽  
Author(s):  
Luis Felipe Paulin ◽  
Muthuswamy Raveendran ◽  
Ronald Alan Harris ◽  
Jeffrey Rogers ◽  
Arndt von Haeseler ◽  
...  

Recent population studies use ever larger sample sizes to investigate the diversity of a given population or species. These studies continue to reveal new polymorphisms that lead to important insights into the mechanisms of evolution and that are also important for the interpretation of these variations. Nevertheless, while the full catalog of variations across entire species remains unknown, we can predict which regions harbor additional variations that remain hidden and investigate their properties, thereby enhancing the analysis for potentially missed variants. To achieve this we implemented SVhound (https://github.com/lfpaulin/SVhound), which, based on a population-level SV data set, can predict regions that harbor novel SV alleles. We tested SVhound using subsets of the 1000 Genomes Project data and showed that its predictions correlate well with the full data set (average correlation over 2,800 tests, r=0.7136). Next, we utilized SVhound to investigate potentially missed or understudied regions across 1KGP and CCDG that included multiple genes. Lastly, we show the applicability of SVhound to a small and novel SV call set for rhesus macaque (Macaca mulatta) and discuss the impact and choice of parameters for SVhound. Overall, SVhound is a unique method to identify potential regions that harbor hidden diversity in model and non-model organisms and can also potentially be used to ensure high quality of SV call sets.


2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Meraj Ahmad ◽  
Anubhav Sinha ◽  
Sreya Ghosh ◽  
Vikrant Kumar ◽  
Sonia Davila ◽  
...  

2020 ◽  
Vol 15 ◽  
Author(s):  
Weiwen Zhang ◽  
Long Wang ◽  
Theint Theint Aye

Background: Asia is the largest continent in the world and is home to a large number of populations. However, we still lack an imputation server with an Asian-specific reference panel to estimate genotypes for genome-wide association studies in Asia. Currently, two well-known imputation servers are available, i.e., the Michigan imputation server in the US and the Sanger imputation server in the UK. However, the quality of imputation for Southeast Asian populations using their genotype imputation services and reference panels is not satisfactory. Objective: In this paper, we develop the ModStore imputation server with a specially designed reference panel to offer genotype imputation as a service, aiming to increase the power of genome-wide association studies in Singapore in the context of National Precision Medicine. Method: We present the implementation and customization of the ModStore imputation server on high performance computing infrastructure. We also construct a reference panel based on whole-genome sequencing of Singaporeans, referred to as the SG10K reference panel, to improve the imputation accuracy for Southeast Asian populations. Results: Experimental results show that by using the SG10K reference panel, an improvement of over 79% in mean Rsq can be achieved for the imputation of data sets from three Singaporean ethnic populations, i.e., Malay, Chinese, and Indian, under MAF<0.005 compared to the 1000 Genomes reference panel. Conclusion: With the ModStore imputation server, genotype imputation can be performed more accurately for data derived from array-based pharmacogenomics and pre-existing population-scale genetic data from Southeast Asia.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Saurabh Belsare ◽  
Michal Levy-Sakin ◽  
Yulia Mostovoy ◽  
Steffen Durinck ◽  
Subhra Chaudhuri ◽  
...  

2020 ◽  
Author(s):  
Peter Pfaffelhuber ◽  
Elisabeth Sester-Huss ◽  
Franz Baumdicker ◽  
Jana Naue ◽  
Sabine Lutz-Bonengel ◽  
...  

Abstract: The inference of biogeographic ancestry (BGA) has become a focus of forensic genetics. Mis-inference of BGA can have profound unwanted consequences for investigations and society. We show that recent admixture can lead to misclassification and erroneous inference of ancestry proportions, using state-of-the-art analysis tools with (i) simulations, (ii) 1000 Genomes Project data, and (iii) two individuals analyzed using the ForenSeq DNA Signature Prep Kit. Subsequently, we extend existing tools for estimation of individual ancestry (IA) by allowing for different IA in both parents, leading to estimates of parental individual ancestry (PIA), and a statistical test for recent admixture. Estimation of PIA outperforms IA in most scenarios of recent admixture. Furthermore, additional information about parental ancestry can be acquired with PIA that may guide casework.


2012 ◽  
Vol 9 (5) ◽  
pp. 459-462 ◽  
Author(s):  
Laura Clarke ◽  
◽  
Xiangqun Zheng-Bradley ◽  
Richard Smith ◽  
Eugene Kulesha ◽  
...  
