scholarly journals Haplotype reconstruction from genotype data using Imperfect Phylogeny

2004 ◽  
Vol 20 (12) ◽  
pp. 1842-1849 ◽  
Author(s):  
E. Halperin ◽  
E. Eskin
2008 ◽  
Vol 06 (01) ◽  
pp. 241-259 ◽  
Author(s):  
JING LI ◽  
TAO JIANG

Two grand challenges in the postgenomic era are to develop a detailed understanding of heritable variation in the human genome, and to develop robust strategies for identifying the genetic contribution to diseases and drug responses. Haplotypes of single nucleotide polymorphisms (SNPs) have been suggested as an effective representation of human variation, and various haplotype-based association mapping methods for complex traits have been proposed in the literature. However, humans are diploid and, in practice, genotype data instead of haplotype data are collected directly. Therefore, efficient and accurate computational methods for haplotype reconstruction are needed and have recently been investigated intensively, especially for tightly linked markers such as SNPs. This paper reviews statistical and combinatorial haplotyping algorithms using pedigree data, unrelated individuals, or pooled samples.


Author(s):  
Heather Manching ◽  
Randall J Wisser

Abstract Motivation Ancestral haplotype maps provide useful information about genomic variation and insights into biological processes. Reconstructing the descendent haplotype structure of homologous chromosomes, particularly for large numbers of individuals, can help with characterizing the recombination landscape, elucidating genotype-to-phenotype relationships, improving genomic predictions and more. Inferring haplotype maps from sparse genotype data is an efficient approach to whole-genome haplotyping, but this is a non-trivial problem. A standardized approach is needed to validate whether haplotype reconstruction software, conceived population designs and existing data for a given population provides accurate haplotype information for further inference. Results We introduce SPEARS, a pipeline for the simulation-based appraisal of genome-wide haplotype maps constructed from sparse genotype data. Using a specified pedigree, the pipeline generates virtual genotypes (known data) with genotyping errors and missing data structure. It then proceeds to mimic analysis in practice, capturing sources of error due to genotyping, imputation and haplotype inference. Standard metrics allow researchers to assess different population designs and which features of haplotype structure or regions of the genome are sufficiently accurate for analysis. Haplotype maps for 1000 outcross progeny from a multi-parent population of maize are used to demonstrate SPEARS. Availabilityand implementation SPEARS, the protocol and suite of scripts, are publicly available under an MIT license at GitHub (https://github.com/maizeatlas/spears).. Supplementary information Supplementary data are available at Bioinformatics online.


2008 ◽  
Vol 2 (2) ◽  
pp. 100-114 ◽  
Author(s):  
John Cashman ◽  
Jun Zhang ◽  
Matthew Nelson ◽  
Andreas Braun
Keyword(s):  

Genetics ◽  
2003 ◽  
Vol 163 (3) ◽  
pp. 1177-1191 ◽  
Author(s):  
Gregory A Wilson ◽  
Bruce Rannala

Abstract A new Bayesian method that uses individual multilocus genotypes to estimate rates of recent immigration (over the last several generations) among populations is presented. The method also estimates the posterior probability distributions of individual immigrant ancestries, population allele frequencies, population inbreeding coefficients, and other parameters of potential interest. The method is implemented in a computer program that relies on Markov chain Monte Carlo techniques to carry out the estimation of posterior probabilities. The program can be used with allozyme, microsatellite, RFLP, SNP, and other kinds of genotype data. We relax several assumptions of early methods for detecting recent immigrants, using genotype data; most significantly, we allow genotype frequencies to deviate from Hardy-Weinberg equilibrium proportions within populations. The program is demonstrated by applying it to two recently published microsatellite data sets for populations of the plant species Centaurea corymbosa and the gray wolf species Canis lupus. A computer simulation study suggests that the program can provide highly accurate estimates of migration rates and individual migrant ancestries, given sufficient genetic differentiation among populations and sufficient numbers of marker loci.


2021 ◽  
pp. 140349482110044
Author(s):  
Jaakko Lähteenmäki ◽  
Anna-Leena Vuorinen ◽  
Juha Pajula ◽  
Kari Harno ◽  
Mika Lehto ◽  
...  

Aim: This case study aimed to investigate the process of integrating resources of multiple biobanks and health-care registers, especially addressing data permit application, time schedules, co-operation of stakeholders, data exchange and data quality. Methods: We investigated the process in the context of a retrospective study: Pharmacogenomics of antithrombotic drugs (PreMed study). The study involved linking the genotype data of three Finnish biobanks (Auria Biobank, Helsinki Biobank and THL Biobank) with register data on medicine dispensations, health-care encounters and laboratory results. Results: We managed to collect a cohort of 7005 genotyped individuals, thereby achieving the statistical power requirements of the study. The data collection process took 16 months, exceeding our original estimate by seven months. The main delays were caused by the congested data permit approval service to access national register data on health-care encounters. Comparison of hospital data lakes and national registers revealed differences, especially concerning medication data. Genetic variant frequencies were in line with earlier data reported for the European population. The yearly number of international normalised ratio (INR) tests showed stable behaviour over time. Conclusions: A large cohort, consisting of versatile individual-level phenotype and genotype data, can be constructed by integrating data from several biobanks and health data registers in Finland. Co-operation with biobanks is straightforward. However, long time periods need to be reserved when biobank resources are linked with national register data. There is a need for efforts to define general, harmonised co-operation practices and data exchange methods for enabling efficient collection of data from multiple sources.


Sign in / Sign up

Export Citation Format

Share Document