Haplotype reconstruction from genotype data using Imperfect Phylogeny

E. Halperin; E. Eskin

doi:10.1093/bioinformatics/bth149

A SURVEY ON HAPLOTYPING ALGORITHMS FOR TIGHTLY LINKED MARKERS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720008003369 ◽

2008 ◽

Vol 06 (01) ◽

pp. 241-259 ◽

Cited By ~ 13

Author(s):

JING LI ◽

TAO JIANG

Keyword(s):

Complex Traits ◽

Genotype Data ◽

Nucleotide Polymorphisms ◽

Haplotype Reconstruction ◽

Pedigree Data ◽

Heritable Variation ◽

Single Nucleotide ◽

Haplotype Data ◽

Pooled Samples ◽

Effective Representation

Two grand challenges in the postgenomic era are to develop a detailed understanding of heritable variation in the human genome, and to develop robust strategies for identifying the genetic contribution to diseases and drug responses. Haplotypes of single nucleotide polymorphisms (SNPs) have been suggested as an effective representation of human variation, and various haplotype-based association mapping methods for complex traits have been proposed in the literature. However, humans are diploid and, in practice, genotype data instead of haplotype data are collected directly. Therefore, efficient and accurate computational methods for haplotype reconstruction are needed and have recently been investigated intensively, especially for tightly linked markers such as SNPs. This paper reviews statistical and combinatorial haplotyping algorithms using pedigree data, unrelated individuals, or pooled samples.
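The ambiguity behind haplotype reconstruction can be illustrated with a minimal sketch. It assumes biallelic SNPs and a hypothetical genotype encoding (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate); the function name is illustrative and is not taken from any of the reviewed algorithms. It enumerates every unordered haplotype pair consistent with a single unphased genotype, showing that the number of candidate phasings doubles with each additional heterozygous site.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with one genotype.

    `genotype` is a sequence of alternate-allele counts per SNP
    (0, 1, or 2); this encoding is an assumption of the sketch.
    Each haplotype is returned as a tuple of 0/1 alleles.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed on both haplotypes: 0 -> 0, 2 -> 1.
    # Heterozygous sites get a placeholder 0 that is overwritten below.
    base = [1 if g == 2 else 0 for g in genotype]

    pairs = set()
    for alleles in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, alleles):
            h1[site] = allele       # one haplotype carries this allele
            h2[site] = 1 - allele   # the other carries the complement
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


if __name__ == "__main__":
    for pair in sorted(compatible_haplotype_pairs([1, 1, 0, 2])):
        print(pair)
```

With k heterozygous sites (k at least 1) there are 2^(k-1) distinct pairs, which is why statistical or combinatorial models are needed to choose among them.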

SPEARS: Standard Performance Evaluation of Ancestral haplotype Reconstruction through Simulation

Bioinformatics ◽

10.1093/bioinformatics/btaa749 ◽

2020 ◽

Author(s):

Heather Manching ◽

Randall J Wisser

Keyword(s):

Genomic Variation ◽

Ancestral Haplotype ◽

Supplementary Information ◽

Genotype Data ◽

Haplotype Structure ◽

Haplotype Reconstruction ◽

Large Numbers ◽

Genome Wide ◽

Reconstruction Software ◽

Outcross Progeny

Motivation: Ancestral haplotype maps provide useful information about genomic variation and insights into biological processes. Reconstructing the descendent haplotype structure of homologous chromosomes, particularly for large numbers of individuals, can help with characterizing the recombination landscape, elucidating genotype-to-phenotype relationships, improving genomic predictions and more. Inferring haplotype maps from sparse genotype data is an efficient approach to whole-genome haplotyping, but this is a non-trivial problem. A standardized approach is needed to validate whether haplotype reconstruction software, conceived population designs and existing data for a given population provide accurate haplotype information for further inference. Results: We introduce SPEARS, a pipeline for the simulation-based appraisal of genome-wide haplotype maps constructed from sparse genotype data. Using a specified pedigree, the pipeline generates virtual genotypes (known data) with genotyping errors and missing data structure. It then proceeds to mimic analysis in practice, capturing sources of error due to genotyping, imputation and haplotype inference. Standard metrics allow researchers to assess different population designs and which features of haplotype structure or regions of the genome are sufficiently accurate for analysis. Haplotype maps for 1000 outcross progeny from a multi-parent population of maize are used to demonstrate SPEARS. Availability and implementation: SPEARS, the protocol and suite of scripts, are publicly available under an MIT license at GitHub (https://github.com/maizeatlas/spears). Supplementary information: Supplementary data are available at Bioinformatics online.

A Comparison of Bayesian Methods for Haplotype Reconstruction from Population Genotype Data

The American Journal of Human Genetics ◽

10.1086/379378 ◽

2003 ◽

Vol 73 (5) ◽

pp. 1162-1169 ◽

Cited By ~ 2636

Author(s):

Matthew Stephens ◽

Peter Donnelly

Keyword(s):

Bayesian Methods ◽

Genotype Data ◽

Haplotype Reconstruction

Self-Optimizing Parallel Algorithms for Haplotype Reconstruction and Their Evaluation on the JPT and CHB Genotype Data

2007 IEEE 7th International Symposium on BioInformatics and BioEngineering ◽

10.1109/bibe.2007.4375734 ◽

2007 ◽

Cited By ~ 1

Author(s):

Dragos Trinca ◽

Sanguthevar Rajasekaran

Keyword(s):

Parallel Algorithms ◽

Genotype Data ◽

Haplotype Reconstruction

Faculty Opinions recommendation of SimPed: a simulation program to generate haplotype and genotype data for pedigree structures.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1163566.625312 ◽

2009 ◽

Author(s):

Alejandro Schaffer

Keyword(s):

Simulation Program ◽

Genotype Data

Analysis of Flavin-Containing Monooxygenase 3 Genotype Data in Populations Administered the Anti-Schizophrenia Agent Olanzapine

Drug Metabolism Letters ◽

10.2174/187231208784040942 ◽

2008 ◽

Vol 2 (2) ◽

pp. 100-114 ◽

Cited By ~ 10

Author(s):

John Cashman ◽

Jun Zhang ◽

Matthew Nelson ◽

Andreas Braun

Keyword(s):

Genotype Data

HTreeQA: Using Semi-Perfect Phylogeny Trees in Quantitative Trait Loci Study on Genotype Data

G3 Genes|Genome|Genetics ◽

10.1534/g3.111.001768 ◽

2012 ◽

Vol 2 (2) ◽

pp. 175-189 ◽

Cited By ~ 12

Author(s):

Zhaojun Zhang ◽

Xiang Zhang ◽

Wei Wang

Keyword(s):

Quantitative Trait Loci ◽

Quantitative Trait ◽

Genotype Data ◽

Perfect Phylogeny ◽

Trait Loci

Bayesian Inference of Recent Migration Rates Using Multilocus Genotypes

Genetics ◽

10.1093/genetics/163.3.1177 ◽

2003 ◽

Vol 163 (3) ◽

pp. 1177-1191 ◽

Cited By ~ 13

Author(s):

Gregory A Wilson ◽

Bruce Rannala

Keyword(s):

Probability Distributions ◽

Gray Wolf ◽

Data Sets ◽

Genotype Data ◽

Migration Rates ◽

Multilocus Genotypes ◽

Monte Carlo Techniques ◽

Inbreeding Coefficients ◽

Genotype Frequencies ◽

Marker Loci

A new Bayesian method that uses individual multilocus genotypes to estimate rates of recent immigration (over the last several generations) among populations is presented. The method also estimates the posterior probability distributions of individual immigrant ancestries, population allele frequencies, population inbreeding coefficients, and other parameters of potential interest. The method is implemented in a computer program that relies on Markov chain Monte Carlo techniques to carry out the estimation of posterior probabilities. The program can be used with allozyme, microsatellite, RFLP, SNP, and other kinds of genotype data. We relax several assumptions of early methods for detecting recent immigrants using genotype data; most significantly, we allow genotype frequencies to deviate from Hardy-Weinberg equilibrium proportions within populations. The program is demonstrated by applying it to two recently published microsatellite data sets for populations of the plant species Centaurea corymbosa and the gray wolf species Canis lupus. A computer simulation study suggests that the program can provide highly accurate estimates of migration rates and individual migrant ancestries, given sufficient genetic differentiation among populations and sufficient numbers of marker loci.

Haplotype reconstruction from SNP alignment

Proceedings of the seventh annual international conference on Computational molecular biology - RECOMB '03 ◽

10.1145/640075.640102 ◽

2003 ◽

Cited By ~ 4

Author(s):

Lei Li ◽

Jong Hyun Kim ◽

Michael S. Waterman

Keyword(s):

Haplotype Reconstruction

Integrating data from multiple Finnish biobanks and national health-care registers for retrospective studies: Practical experiences

Scandinavian Journal of Public Health ◽

10.1177/14034948211004421 ◽

2021 ◽

pp. 140349482110044

Author(s):

Jaakko Lähteenmäki ◽

Anna-Leena Vuorinen ◽

Juha Pajula ◽

Kari Harno ◽

Mika Lehto ◽

...

Keyword(s):

Health Care ◽

Statistical Power ◽

Data Exchange ◽

Register Data ◽

Genotype Data ◽

Multiple Sources ◽

Hospital Data ◽

National Register ◽

National Register Data ◽

Health Care Encounters

Aim: This case study aimed to investigate the process of integrating resources of multiple biobanks and health-care registers, especially addressing data permit application, time schedules, co-operation of stakeholders, data exchange and data quality. Methods: We investigated the process in the context of a retrospective study: Pharmacogenomics of antithrombotic drugs (PreMed study). The study involved linking the genotype data of three Finnish biobanks (Auria Biobank, Helsinki Biobank and THL Biobank) with register data on medicine dispensations, health-care encounters and laboratory results. Results: We managed to collect a cohort of 7005 genotyped individuals, thereby achieving the statistical power requirements of the study. The data collection process took 16 months, exceeding our original estimate by seven months. The main delays were caused by the congested data permit approval service to access national register data on health-care encounters. Comparison of hospital data lakes and national registers revealed differences, especially concerning medication data. Genetic variant frequencies were in line with earlier data reported for the European population. The yearly number of international normalised ratio (INR) tests showed stable behaviour over time. Conclusions: A large cohort, consisting of versatile individual-level phenotype and genotype data, can be constructed by integrating data from several biobanks and health data registers in Finland. Co-operation with biobanks is straightforward. However, long time periods need to be reserved when biobank resources are linked with national register data. There is a need for efforts to define general, harmonised co-operation practices and data exchange methods for enabling efficient collection of data from multiple sources.
