GPHMM: an integrated hidden Markov model for identification of copy number alteration and loss of heterozygosity in complex tumor samples using whole genome SNP arrays

2011 ◽  
Vol 39 (12) ◽  
pp. 4928-4941 ◽  
Author(s):  
Ao Li ◽  
Zongzhi Liu ◽  
Kimberly Lezon-Geyda ◽  
Sudipa Sarkar ◽  
Donald Lannin ◽  
...  
Biostatistics ◽  
2020 ◽  
Author(s):  
Hyoyoung Choo-Wosoba ◽  
Paul S Albert ◽  
Bin Zhu

Summary: Allele-specific copy number alteration (ASCNA) analysis identifies copy number abnormalities in tumor cells. Unlike normal cells, tumor cells are heterogeneous, comprising a combination of dominant and minor subclones with distinct copy number profiles. Estimating the clonal proportion and identifying mainclone and subclone genotypes across the genome are important for understanding tumor progression. Several ASCNA tools have recently been developed, but they have been limited to the identification of subclone regions, and not the genotype of subclones. In this article, we propose subHMM, a hidden Markov model-based approach that estimates both the subclone region and the region-specific subclone genotype and clonal proportion. We specify a hidden state variable representing the conglomeration of clonal genotype and subclone status. We propose a two-step algorithm for parameter estimation: in the first step, a standard hidden Markov model with this conglomerated state variable is fit; in the second step, region-specific estimates of the clonal proportions are obtained by maximizing region-specific pseudo-likelihoods. We apply subHMM to study renal cell carcinoma datasets in The Cancer Genome Atlas. In addition, we conduct simulation studies that show the good performance of the proposed approach. The R source code is available online at https://dceg.cancer.gov/tools/analysis/subhmm.

Keywords: Expectation–Maximization algorithm; Forward–backward algorithm; Somatic copy number alteration; Tumor subclones.
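The forward-backward algorithm named in the keywords can be illustrated with a minimal sketch. This is not the subHMM implementation: the two-state model (neutral vs. gain), the Gaussian emissions, and every parameter value below are assumptions chosen only to show how scaled forward and backward recursions yield per-position posterior state probabilities.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def forward_backward(obs, init, trans, means, sds):
    """Return posterior state probabilities per observation (scaled recursions)."""
    n, k = len(obs), len(init)
    emit = [[gaussian_pdf(x, means[s], sds[s]) for s in range(k)] for x in obs]

    # Forward pass, normalizing at each step to avoid numerical underflow.
    alpha, scale = [], []
    for t in range(n):
        if t == 0:
            raw = [init[s] * emit[0][s] for s in range(k)]
        else:
            raw = [emit[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in range(k))
                   for s in range(k)]
        c = sum(raw)
        scale.append(c)
        alpha.append([v / c for v in raw])

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for r in range(k):
            beta[t][r] = sum(trans[r][s] * emit[t + 1][s] * beta[t + 1][s]
                             for s in range(k)) / scale[t + 1]

    # With this scaling, alpha * beta is already the normalized posterior.
    return [[alpha[t][s] * beta[t][s] for s in range(k)] for t in range(n)]

# Hypothetical log-ratio-like signal: three neutral probes, three gained, one neutral.
obs = [0.0, 0.1, -0.05, 0.9, 1.1, 1.0, 0.05]
posterior = forward_backward(
    obs,
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    means=[0.0, 1.0],   # state 0 = neutral, state 1 = gain (assumed)
    sds=[0.2, 0.2],
)
calls = [max(range(2), key=lambda s: row[s]) for row in posterior]
```

In an E-M fit, these posteriors would serve as the E-step quantities used to re-estimate the transition and emission parameters.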


2019 ◽  
Author(s):  
Hyoyoung Choo-Wosoba ◽  
Paul S. Albert ◽  
Bin Zhu

Abstract: Allele-specific copy number alteration (ASCNA) analysis identifies copy number abnormalities in tumor cells. Unlike normal cells, tumor cells are heterogeneous, comprising a combination of dominant and minor subclones with distinct copy number profiles. Estimating the clonal proportion and identifying mainclone and subclone genotypes across the genome are important for understanding tumor progression. Several ASCNA tools have recently been developed, but they have been limited to the identification of subclone regions, and not the genotype of subclones. In this paper, we propose subHMM, a hidden Markov model-based approach that estimates both the subclone region and the region-specific subclone genotype and clonal proportion. We specify a hidden state variable representing the conglomeration of clonal genotype and subclone status. We propose a two-step algorithm for parameter estimation: in the first step, a standard hidden Markov model with this conglomerated state variable is fit; in the second step, region-specific estimates of the clonal proportions are obtained by maximizing region-specific pseudo-likelihoods. We apply subHMM to study renal cell carcinoma datasets in The Cancer Genome Atlas. In addition, we conduct simulation studies that show the good performance of the proposed approach. The R package is available online at https://dceg.cancer.gov/tools/analysis/subhmm.

Keywords: somatic copy number alteration; tumor heterogeneity; E-M algorithm; forward-backward algorithm.


Author(s):  
Hai Yang ◽  
Daming Zhu

Copy number variation (CNV) is a prevalent kind of genetic structural variation that leads to an abnormal number of copies of large genomic regions, such as gain or loss of DNA segments larger than 1 kb. CNV exists not only in the human genome but also in plant genomes. Recent studies have shown that CNV is associated with many complex diseases. In this paper, guanine-cytosine (GC) bias, mappability and their effect on read depth signals in sequencing data are discussed first. Subsequently, a new correction method for GC bias and an improved combinatorial detection algorithm for CNV using high-throughput sequencing reads based on a hidden Markov model (CNV-HMM) are proposed. The corrected read depth signals have lower correlation with GC content, mappability of reads and the width of the analysis window. Then we create a hidden Markov model which maps the reads onto the reference genome and records the unmapped reads. The unmapped reads are counted and normalized. The CNV-HMM detects abnormal read count signals and obtains candidate CNVs using the expectation maximization (EM) algorithm. Finally, we filter the candidate CNVs using split reads to improve the performance of our algorithm. The experimental results indicate that the CNV-HMM algorithm has higher accuracy and sensitivity for CNV detection than most current detection algorithms.
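To make the idea of GC-bias correction concrete, here is a minimal sketch of a generic median-normalization scheme. It is not the new correction method the abstract proposes, whose details are not given here; the function name, the bin width, and the toy data are all assumptions. Windows are grouped by GC content, and each read depth is rescaled so that every GC bin has the same median as the whole dataset.

```python
from collections import defaultdict
from statistics import median

def correct_gc_bias(read_depth, gc_content, bin_width=0.05):
    """Rescale per-window read depth so each GC bin shares the global median.

    read_depth: read counts per analysis window.
    gc_content: GC fraction (0..1) of the same windows.
    bin_width:  width of the GC bins (assumed value, not from the paper).
    """
    # Group window depths by GC bin.
    bins = defaultdict(list)
    for depth, gc in zip(read_depth, gc_content):
        bins[int(gc / bin_width)].append(depth)

    global_median = median(read_depth)
    bin_median = {b: median(values) for b, values in bins.items()}

    corrected = []
    for depth, gc in zip(read_depth, gc_content):
        m = bin_median[int(gc / bin_width)]
        # Leave the value untouched if the bin median is zero.
        corrected.append(depth * global_median / m if m > 0 else depth)
    return corrected

# Toy data: GC-poor windows are over-covered relative to GC-rich ones.
depth = [100, 110, 50, 55]
gc = [0.32, 0.33, 0.52, 0.53]
corrected = correct_gc_bias(depth, gc)
```

After correction, the two GC groups in the toy data sit on the same scale, so the remaining differences in depth are no longer explained by GC content alone.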


2007 ◽  
Vol 35 (6) ◽  
pp. 2013-2025 ◽  
Author(s):  
Stefano Colella ◽  
Christopher Yau ◽  
Jennifer M. Taylor ◽  
Ghazala Mirza ◽  
Helen Butler ◽  
...  

2018 ◽  
Author(s):  
Hyoyoung Choo-Wosoba ◽  
Paul S Albert ◽  
Bin Zhu

Abstract. Background: Somatic copy number alteration (SCNA) is a common feature of the cancer genome and is associated with cancer etiology and prognosis. The allele-specific SCNA analysis of a tumor sample aims to identify the allele-specific copy numbers of both alleles, adjusting for the ploidy and the tumor purity. Next generation sequencing platforms produce abundant read counts at base-pair resolution across the exome or whole genome, which are susceptible to hypersegmentation, a phenomenon where numerous very short regions are falsely identified as SCNA. Results: We propose hsegHMM, a hidden Markov model approach that accounts for hypersegmentation for allele-specific SCNA analysis. hsegHMM provides statistical inference of copy number profiles by using an efficient E-M algorithm procedure. Through simulation and application studies, we found that hsegHMM handles hypersegmentation effectively with a t-distribution as a part of the emission probability distribution structure and a carefully defined state space. We also compared hsegHMM with FACETS, which is a current method for allele-specific SCNA analysis. For the application, we use a renal cell carcinoma sample from The Cancer Genome Atlas (TCGA) study. Conclusions: We demonstrate the robustness of hsegHMM to hypersegmentation. Furthermore, hsegHMM provides the quantification of uncertainty in identifying allele-specific SCNAs across entire chromosomes. hsegHMM performs better than FACETS when read depth (coverage) is uneven across the genome.
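A short sketch may help show why a t-distributed emission can be more tolerant of outliers than a Gaussian one. This is an illustration only, not the hsegHMM emission model: the location-scale Student t form, the degrees of freedom, and the test values are assumptions. The heavier tails of the t-distribution assign a much higher log-density to an isolated extreme observation, so such a point gives the model less reason to switch state.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log-density of a normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def student_t_logpdf(x, mean, scale, df):
    """Log-density of a location-scale Student t distribution."""
    z = (x - mean) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * math.log1p(z * z / df))

# Compare how each emission scores a typical point and an outlier
# under the same state (mean 0, unit scale); df=3 is an assumed value.
typical, outlier = 0.2, 5.0
normal_penalty = normal_logpdf(typical, 0.0, 1.0) - normal_logpdf(outlier, 0.0, 1.0)
t_penalty = (student_t_logpdf(typical, 0.0, 1.0, 3)
             - student_t_logpdf(outlier, 0.0, 1.0, 3))
```

Here `normal_penalty` and `t_penalty` measure how much log-likelihood the outlier loses relative to a typical point; the smaller t penalty reflects the heavier tails.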

