Determination of complete chromosomal haplotypes by bulk DNA sequencing
Abstract
Haplotype phase represents the collective genetic variation between homologous chromosomes and is an essential feature of non-haploid genomes. Determining haplotype phase requires knowledge of both the genotypes at variant sites and their linkage across each chromosome. Haplotype linkage can be inferred statistically from a genotyped population, or determined directly by long-range sequencing of an individual genome. However, extending haplotype inference to the whole-chromosome scale remains challenging and usually requires specialized experimental techniques. Here we describe a general computational strategy to determine complete chromosomal haplotypes using a combination of bulk long-range sequencing and Hi-C sequencing. We demonstrate that this strategy can resolve the haplotypes of parental chromosomes in diploid human genomes with high precision (99%) and completeness (98%). It can further assemble the syntenic organization of aneuploid genomes (a “digital karyotype”).
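The abstract notes that phasing needs both genotypes and their linkage across a chromosome. The toy sketch below illustrates that idea. It is not the authors' method: it is a generic greedy heuristic, and the function name `phase_sites` and the `(i, j, cis)` evidence format are hypothetical. Each observation says whether one piece of linkage evidence (for example, a long read or a Hi-C contact spanning two heterozygous sites) places the two alternate alleles on the same haplotype (`cis=True`) or on opposite haplotypes (`cis=False`). The sketch proceeds in three steps:

1. It sums votes per site pair.
2. It keeps a maximum spanning forest of the strongest pairs.
3. It propagates relative phase through each connected block.

In this picture, a complete chromosomal haplotype would correspond to every heterozygous site on a chromosome falling into a single block.

```python
from collections import defaultdict, deque


def phase_sites(n_sites, observations):
    """Toy phasing of heterozygous sites from pairwise linkage evidence.

    Hypothetical interface: observations is an iterable of (i, j, cis) tuples,
    where cis=True means the alt alleles of sites i and j were seen together.
    Returns (phase, block): phase[k] is 0 if site k's alt allele lies on the
    same haplotype as the alt allele of its block's anchor site, else 1;
    block[k] is the index of that anchor site.
    """
    # Net vote per unordered pair: +1 for cis evidence, -1 for trans evidence.
    votes = defaultdict(int)
    for i, j, cis in observations:
        if i == j:
            continue
        votes[(min(i, j), max(i, j))] += 1 if cis else -1

    # Union-find for Kruskal's maximum spanning forest on |vote|.
    parent = list(range(n_sites))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = defaultdict(list)
    for (i, j), v in sorted(votes.items(), key=lambda kv: -abs(kv[1])):
        if v == 0:
            continue  # tied evidence gives no phase information
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            flip = 0 if v > 0 else 1  # 1 means opposite haplotypes
            tree[i].append((j, flip))
            tree[j].append((i, flip))

    # Breadth-first traversal assigns relative phase within each block.
    phase = [None] * n_sites
    block = [None] * n_sites
    for start in range(n_sites):
        if phase[start] is not None:
            continue
        phase[start], block[start] = 0, start
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w, flip in tree[u]:
                if phase[w] is None:
                    phase[w] = phase[u] ^ flip
                    block[w] = start
                    queue.append(w)
    return phase, block


obs = [(0, 1, True), (0, 1, True), (1, 2, False), (2, 3, True), (0, 2, False)]
print(phase_sites(5, obs))  # ([0, 0, 1, 1, 0], [0, 0, 0, 0, 4])
```

In this example, sites 0 to 3 form one phased block. Site 4 has no linkage evidence, so it remains its own block. A spanning tree simply discards conflicting evidence. Practical phasing tools generally optimize a more robust objective over all observations instead.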