Evaluating the quality of the 1000 Genomes Project data

Mapping Intimacies ◽

10.1101/383950 ◽

2018 ◽

Cited By ~ 2

Author(s):

Saurabh Belsare ◽

Michal Levy-Sakin ◽

Yulia Mostovoy ◽

Steffen Durinck ◽

Subhra Chaudhuri ◽

...

Keyword(s):

Imputation Accuracy ◽

Genomic Analysis ◽

Error Rates ◽

Reference Panel ◽

Specific Reference ◽

1000 Genomes Project ◽

Data Set ◽

1000 Genomes ◽

Project Data

Data from the 1000 Genomes Project are often used as a reference for human genomic analysis. However, their accuracy needs to be assessed to understand the quality of predictions made using this reference. We present here an assessment of the genotyping, phasing, and imputation accuracy of the 1000 Genomes Project data. We compare the phased haplotype calls from the 1000 Genomes Project to experimentally phased haplotypes for 28 of the same individuals sequenced using the 10X Genomics platform. We observe that phasing and imputation for rare variants are unreliable, which likely reflects the limited sample size of the 1000 Genomes Project data. Further, it appears that using a population-specific reference panel does not improve the accuracy of imputation over using the entire 1000 Genomes data set as a reference panel. We also note that the error rates and trends depend on the choice of definition of error, and hence any error reporting needs to take these definitions into account.
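
The abstract's closing point, that error rates depend on how error is defined, can be illustrated with a small sketch. The abstract does not say which definitions the authors used; the two functions below (a Hamming-style error and a switch error) are common phasing metrics chosen here as an assumption, and the function names and toy haplotypes are hypothetical. Each list holds the allele (0 or 1) carried on the first haplotype at consecutive heterozygous sites.

```python
from typing import Sequence


def hamming_error_rate(truth: Sequence[int], test: Sequence[int]) -> float:
    """Fraction of heterozygous sites phased differently from the truth.

    The two haplotype labels are interchangeable, so the smaller of the
    mismatch count and its complement is used.
    """
    if len(truth) != len(test) or len(truth) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mismatches = sum(t != p for t, p in zip(truth, test))
    return min(mismatches, len(truth) - mismatches) / len(truth)


def switch_error_rate(truth: Sequence[int], test: Sequence[int]) -> float:
    """Fraction of adjacent site pairs whose relative phase differs."""
    if len(truth) != len(test) or len(truth) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    pairs = len(truth) - 1
    switches = sum(
        (truth[i] != truth[i + 1]) != (test[i] != test[i + 1])
        for i in range(pairs)
    )
    return switches / pairs


# Toy data (made up for illustration, not taken from the paper).
truth = [0, 0, 0, 0, 0, 0, 0, 0]
one_long_switch = [0, 0, 0, 0, 1, 1, 1, 1]   # a single switch mid-block
one_flipped_site = [0, 0, 1, 0, 0, 0, 0, 0]  # a single mis-phased site

for name, calls in [("one_long_switch", one_long_switch),
                    ("one_flipped_site", one_flipped_site)]:
    print(name,
          "hamming:", hamming_error_rate(truth, calls),
          "switch:", switch_error_rate(truth, calls))
```

In this toy case the two metrics rank the call sets in opposite order: the single long switch has the lower switch error (1/7 versus 2/7) but the higher Hamming error (0.5 versus 0.125), which is the kind of definition dependence the abstract warns about.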

Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project

Wellcome Open Research ◽

10.12688/wellcomeopenres.15126.2 ◽

2019 ◽

Vol 4 ◽

pp. 50 ◽

Cited By ~ 7

Author(s):

Ernesto Lowy-Gallego ◽

Susan Fairley ◽

Xiangqun Zheng-Bradley ◽

Magali Ruffier ◽

Laura Clarke ◽

...

Keyword(s):

De Novo ◽

Variant Calling ◽

Final Phase ◽

1000 Genomes Project ◽

Data Set ◽

1000 Genomes ◽

Project Data

We present a set of biallelic SNVs and INDELs, from 2,548 samples spanning 26 populations from the 1000 Genomes Project, called de novo on GRCh38. We believe this will be a useful reference resource for those using GRCh38. It represents an improvement over the “lift-overs” of the 1000 Genomes Project data that have been available to date by encompassing all of the GRCh38 primary assembly autosomes and pseudo-autosomal regions, including novel, medically relevant loci. Here, we describe how the data set was created and benchmark our call set against that produced by the final phase of the 1000 Genomes Project on GRCh37 and the lift-over of that data to GRCh38.

SVhound: Detection of future Structural Variation hotspots

10.1101/2021.04.09.439237 ◽

2021 ◽

Author(s):

Luis Felipe Paulin ◽

Muthuswamy Raveendran ◽

Ronald Alan Harris ◽

Jeffrey Rogers ◽

Arndt von Haeseler ◽

...

Keyword(s):

Population Level ◽

Model Organisms ◽

Average Correlation ◽

Full Data ◽

Data Set ◽

1000 Genomes ◽

Unique Method ◽

Project Data ◽

The Impact

Population studies keep growing in sample size in order to investigate the diversity of a given population or species. These studies continually reveal new polymorphisms that lead to important insights into the mechanisms of evolution and are also important for the interpretation of these variations. Nevertheless, while the full catalog of variations across entire species remains unknown, we can predict which regions harbor additional variations that remain hidden and investigate their properties, thereby enhancing the analysis for potentially missed variants. To achieve this, we implemented SVhound (https://github.com/lfpaulin/SVhound), which, based on a population-level SV data set, can predict regions that harbor novel SV alleles. We tested SVhound using subsets of the 1000 Genomes Project data and showed that its predictions correlate highly with the full data set (average correlation over 2,800 tests, r = 0.7136). Next, we used SVhound to investigate potentially missed or understudied regions across 1KGP and CCDG that included multiple genes. Lastly, we show the applicability of SVhound to a small, novel SV call set for rhesus macaque (Macaca mulatta) and discuss the impact and choice of parameters for SVhound. Overall, SVhound is a unique method to identify potential regions that harbor hidden diversity in model and non-model organisms, and it can potentially also be used to ensure high quality of SV call sets.

Inclusion of Population-specific Reference Panel from India to the 1000 Genomes Phase 3 Panel Improves Imputation Accuracy

Scientific Reports ◽

10.1038/s41598-017-06905-6 ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 7

Author(s):

Meraj Ahmad ◽

Anubhav Sinha ◽

Sreya Ghosh ◽

Vikrant Kumar ◽

Sonia Davila ◽

...

Keyword(s):

Imputation Accuracy ◽

Reference Panel ◽

Specific Reference ◽

Phase 3 ◽

1000 Genomes

ModStore: Genotype Imputation as a Service Powered by SG10K Reference Panel

Current Bioinformatics ◽

10.2174/1574893615999200831112522 ◽

2020 ◽

Vol 15 ◽

Author(s):

Weiwen Zhang ◽

Long Wang ◽

Theint Theint Aye

Keyword(s):

Association Study ◽

High Performance ◽

Genome Wide Association Study ◽

Imputation Accuracy ◽

Genotype Imputation ◽

Reference Panel ◽

Genome Wide Association ◽

Specific Reference ◽

Data Set ◽

Genome Wide

Background: Asia is the largest continent in the world, with a large number of populations. However, an imputation server with an Asian-specific reference panel for estimating genotypes in genome-wide association studies in Asia is still lacking. Currently, two well-known imputation servers are available: the Michigan imputation server in the US and the Sanger server in the UK. However, the quality of imputation for Southeast Asian populations obtained with their genotype imputation services and reference panels is not satisfactory. Objective: In this paper, we develop the ModStore imputation server with a specially designed reference panel to offer genotype imputation as a service, aiming to increase the power of genome-wide association studies in Singapore in the context of National Precision Medicine. Method: We present the implementation and customization of the ModStore imputation server on high-performance computing infrastructure. In addition, we construct a reference panel based on whole-genome sequencing of Singaporeans, referred to as the SG10K reference panel, to improve the imputation accuracy for Southeast Asian populations. Results: Experimental results show that, using the SG10K reference panel, an improvement of over 79% in mean Rsq can be achieved for the imputation of data sets from three Singapore ethnic populations (Malay, Chinese, and Indian) under MAF<0.005, compared to the 1000 Genomes reference panel. Conclusion: With the ModStore imputation server, genotype imputation can be performed more accurately for data derived from array-based pharmacogenomics and pre-existing Southeast Asia's population-scale genetic.
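
As a rough illustration of the Results statement, the sketch below computes a per-variant accuracy score and a relative improvement between two panels. The abstract does not define Rsq or how the 79% figure was derived, so two assumptions are made here: Rsq is taken as the squared Pearson correlation between known genotypes and imputed dosages (imputation software may instead report an estimated Rsq that needs no truth set), and "improvement" is read as a relative change in the mean. All function names and numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def dosage_r2(true_genotypes: Sequence[float],
              imputed_dosages: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes (0, 1, 2)
    and imputed allele dosages (0.0 to 2.0) for one variant."""
    if len(true_genotypes) != len(imputed_dosages) or len(true_genotypes) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mx, my = mean(true_genotypes), mean(imputed_dosages)
    sxx = sum((x - mx) ** 2 for x in true_genotypes)
    syy = sum((y - my) ** 2 for y in imputed_dosages)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(true_genotypes, imputed_dosages))
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either side has no variance
    return sxy ** 2 / (sxx * syy)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative change of `new` over `baseline`, e.g. 0.8 means +80%."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Made-up example values, not results from the paper.
truth = [0, 1, 2, 1]
dosages = [0.1, 0.9, 1.8, 1.2]
print("per-variant r2:", dosage_r2(truth, dosages))

baseline_mean_rsq = 0.40  # hypothetical mean Rsq with a baseline panel
new_mean_rsq = 0.72       # hypothetical mean Rsq with a specific panel
print("relative improvement:",
      relative_improvement(baseline_mean_rsq, new_mean_rsq))
```

Under this reading, an improvement above 79% means the mean Rsq with the new panel is more than 1.79 times the baseline mean, not 79 percentage points higher.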

Evaluating the quality of the 1000 genomes project data

BMC Genomics ◽

10.1186/s12864-019-5957-x ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 7

Author(s):

Saurabh Belsare ◽

Michal Levy-Sakin ◽

Yulia Mostovoy ◽

Steffen Durinck ◽

Subhra Chaudhuri ◽

...

Keyword(s):

1000 Genomes Project ◽

1000 Genomes ◽

Project Data

Empirical estimation of genome-wide significance thresholds based on the 1000 Genomes Project data set

Journal of Human Genetics ◽

10.1038/jhg.2016.72 ◽

2016 ◽

Vol 61 (10) ◽

pp. 861-866 ◽

Cited By ~ 36

Author(s):

Masahiro Kanai ◽

Toshihiro Tanaka ◽

Yukinori Okada

Keyword(s):

Empirical Estimation ◽

1000 Genomes Project ◽

Data Set ◽

1000 Genomes ◽

Genome Wide ◽

Project Data ◽

Genome Wide Significance

Inference of recent admixture using genotype data

10.1101/2020.09.16.300640 ◽

2020 ◽

Author(s):

Peter Pfaffelhuber ◽

Elisabeth Sester-Huss ◽

Franz Baumdicker ◽

Jana Naue ◽

Sabine Lutz-Bonengel ◽

...

Keyword(s):

State Of The Art ◽

Forensic Genetics ◽

Statistical Test ◽

Genotype Data ◽

1000 Genomes Project ◽

Additional Information ◽

1000 Genomes ◽

Project Data ◽

Ancestry Proportions ◽

Individual Ancestry

The inference of biogeographic ancestry (BGA) has become a focus of forensic genetics. Mis-inference of BGA can have profound unwanted consequences for investigations and society. We show that recent admixture can lead to misclassification and erroneous inference of ancestry proportions, using state-of-the-art analysis tools with (i) simulations, (ii) 1000 Genomes Project data, and (iii) two individuals analyzed using the ForenSeq DNA Signature Prep Kit. Subsequently, we extend existing tools for estimation of individual ancestry (IA) by allowing for different IA in both parents, leading to estimates of parental individual ancestry (PIA) and a statistical test for recent admixture. Estimation of PIA outperforms IA in most scenarios of recent admixture. Furthermore, PIA can provide additional information about parental ancestry that may guide casework.

Sequencing of GJB2 in Cameroonians and Black South Africans and comparison to 1000 Genomes Project Data Support Need to Revise Strategy for Discovery of Nonsyndromic Deafness Genes in Africans

OMICS A Journal of Integrative Biology ◽

10.1089/omi.2014.0063 ◽

2014 ◽

Vol 18 (11) ◽

pp. 705-710 ◽

Cited By ~ 15

Author(s):

Jason Bosch ◽

Jean Jacques N. Noubiap ◽

Collet Dandara ◽

Nomlindo Makubalo ◽

Galen Wright ◽

...

Keyword(s):

1000 Genomes Project ◽

Nonsyndromic Deafness ◽

South Africans ◽

1000 Genomes ◽

Data Support ◽

Black South Africans ◽

Project Data

The 1000 Genomes Project: data management and community access

Nature Methods ◽

10.1038/nmeth.1974 ◽

2012 ◽

Vol 9 (5) ◽

pp. 459-462 ◽

Cited By ~ 168

Author(s):

Laura Clarke ◽

Xiangqun Zheng-Bradley ◽

Richard Smith ◽

Eugene Kulesha ◽

...

Keyword(s):

Data Management ◽

1000 Genomes Project ◽

1000 Genomes ◽

Project Data ◽

Community Access

A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data

Journal of the American Medical Informatics Association ◽

10.1136/amiajnl-2011-000652 ◽

2012 ◽

Vol 19 (2) ◽

pp. 289-294 ◽

Cited By ~ 50

Author(s):

Carrie C Buchanan ◽

Eric S Torstenson ◽

William S Bush ◽

Marylyn D Ritchie

Keyword(s):

1000 Genomes Project ◽

1000 Genomes ◽

International Hapmap Consortium ◽

Project Data
