Spatially Enhanced Differential RNA Methylation Analysis from Affinity-Based Sequencing Data with Hidden Markov Model

2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Yu-Chen Zhang ◽  
Shao-Wu Zhang ◽  
Lian Liu ◽  
Hui Liu ◽  
Lin Zhang ◽  
...  

With the development of new sequencing technology, the entire N6-methyl-adenosine (m6A) RNA methylome can now be profiled in an unbiased manner with the methylated RNA immunoprecipitation sequencing technique (MeRIP-Seq), making it possible to detect differential methylation states of RNA between two conditions, for example, between normal and cancerous tissue. However, as an affinity-based method, MeRIP-Seq does not yet provide base-pair resolution; that is, a single methylation site determined from MeRIP-Seq data can in practice contain multiple RNA methylation residues, some of which can be regulated by different enzymes and thus be differentially methylated between two conditions. Since existing peak-based methods cannot effectively differentiate multiple methylation residues located within a single methylation site, we propose a hidden Markov model (HMM) based approach to address this issue. Specifically, each detected RNA methylation site is further divided into multiple adjacent small bins and then scanned at higher resolution, using a hidden Markov model to capture the dependency between spatially adjacent bins for improved accuracy. We tested the proposed algorithm on both simulated and real data. The results suggest that the proposed algorithm clearly outperforms the existing peak-based approach on simulated systems and detects differential methylation regions with higher statistical significance on a real dataset.
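
The idea of scanning adjacent bins with an HMM can be illustrated with a minimal Viterbi decoder. This is only a sketch under stated assumptions, not the authors' implementation: the two-state layout (0 = unchanged, 1 = differentially methylated), the function name `viterbi`, and the per-bin log-likelihoods passed in as `log_emissions` are all hypothetical, and the abstract does not say how the paper computes its emission terms.

```python
def viterbi(log_emissions, log_trans, log_init):
    """Return the most likely hidden-state path for a sequence of bins.

    log_emissions[t][s]: log-likelihood of bin t under state s (assumed given)
    log_trans[r][s]:     log-probability of moving from state r to state s
    log_init[s]:         log-probability of starting in state s
    """
    n_states = len(log_init)
    # Best log-score of any path that ends in each state at the first bin.
    score = [log_init[s] + log_emissions[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emissions)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Choose the predecessor state that maximises the path score.
            prev = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + log_emissions[t][s])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With "sticky" transitions (a high probability of staying in the same state), a single weakly supported bin tends to be assigned the state of its neighbours, which is the kind of spatial dependency between adjacent bins that the abstract describes.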

Author(s):  
Shuying Sun ◽  
Xiaoqing Yu

DNA methylation is an epigenetic event that plays an important role in regulating gene expression. It is important to study DNA methylation, especially differential methylation patterns between two groups of samples (e.g. patients vs. normal individuals). With next generation sequencing (NGS) technologies, it is now possible to identify differential methylation patterns by considering methylation at the single CG site level across an entire genome. However, it is challenging to analyze large and complex NGS data. To address this challenge, we have developed a new statistical method using a hidden Markov model and Fisher’s exact test (HMM-Fisher) to identify differentially methylated cytosines and regions. We first use a hidden Markov chain to model the methylation signals and infer the methylation state as Not methylated (N), Partly methylated (P), or Fully methylated (F) for each individual sample. We then use Fisher’s exact test to identify differentially methylated CG sites. We present the HMM-Fisher method and compare it with commonly cited methods using both simulated data and real sequencing data. The results show that HMM-Fisher outperforms the currently available methods against which we compared it. HMM-Fisher is efficient and robust in identifying heterogeneous DM regions.
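
The second step, testing a CG site with Fisher's exact test, can be sketched as follows. The abstract does not give the layout of the contingency table, so this example assumes, purely for illustration, that the inferred states are collapsed into a 2x2 table (methylated vs. not methylated, by group); the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Assumed example: 3 patients all called methylated, 3 controls all unmethylated.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
```

For real analyses a library routine such as SciPy's `scipy.stats.fisher_exact` would normally be preferred; the pure-Python version here only makes the calculation explicit.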


Author(s):  
Gökalp Çelik ◽  
Timur Tuncali

Long runs of homozygosity (ROH) are considered to be the result of consanguinity and usually contain recessive, deleterious, disease-causing mutations (Szpiech et al., 2013). Several algorithms have been developed to detect ROHs. Here, we developed a simple, alternative strategy, namely ROHMM, that examines the non-pseudoautosomal region of the X chromosome to detect ROHs from next generation sequencing data, using genotype probabilities and the Hidden Markov Model algorithm as a tool. It is implemented purely in Java and provides both a command-line and a graphical user interface. We tested ROHMM on simulated data as well as on real population data from the 1000G Project and a clinical sample. Our results show that ROHMM performs robustly, producing highly accurate homozygosity estimations under all conditions and thereby meeting and even exceeding the performance of its natural competitors.


2017 ◽  
Author(s):  
T. Druet ◽  
M. Gautier

Inbreeding results from the mating of related individuals and has negative consequences because it brings together deleterious variants in one individual. Inbreeding is associated with recessive diseases and reduced production or fitness. In general, inbreeding is estimated with respect to a base population that needs to be defined. Ancestors in generations preceding the base population are considered unrelated. We herein propose a model that estimates inbreeding relative to multiple age-based classes. Each inbreeding distribution is associated with a different time in the past, with recent inbreeding generating longer homozygous stretches than more ancient inbreeding. Our model is a mixture of exponential distributions implemented in a hidden Markov model framework that uses marker allele frequencies, genetic distances, genotyping error rates and the sequences of observed genotypes. Based on simulation studies, we show that the inbreeding coefficients and the age of inbreeding are correctly estimated. Mean absolute errors of the estimators are low, with efficiency depending on the available information. When several inbreeding classes are simulated, the model captures them if their ages are sufficiently different. Genotyping errors or low-fold sequencing data are easily accommodated in the hidden Markov model framework. Application to real data sets illustrates that the method can reveal different recent demographic histories among populations, some of them presenting very recent bottlenecks or founder effects. The method also clearly identifies individuals resulting from extreme consanguineous matings.
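
The ingredients listed in the abstract (exponential segment lengths, genetic distances, allele frequencies, genotyping error rates) can be made concrete with two small helper functions. This is a rough sketch, not the published model: the function names, the homozygous/heterozygous simplification of the genotype, the Hardy-Weinberg emission outside inbred segments, and the default error rate are all assumptions introduced here for illustration.

```python
import math


def stay_probability(rate, distance_morgans):
    """Probability that a segment continues across a gap between two markers.

    Assumes segment lengths are exponential with the given rate, so the
    expected segment length is 1 / rate (in Morgans).
    """
    return math.exp(-rate * distance_morgans)


def genotype_likelihood(is_homozygous, allele_freq, in_inbred_segment, error_rate=0.001):
    """Likelihood of a homozygous or heterozygous call (assumed emission model).

    Inside an inbred segment a heterozygous call can only arise from a
    genotyping error; outside, heterozygosity follows Hardy-Weinberg (2pq).
    """
    if in_inbred_segment:
        p_het = error_rate
    else:
        p_het = 2.0 * allele_freq * (1.0 - allele_freq)
    return 1.0 - p_het if is_homozygous else p_het


# A smaller rate (more recent inbreeding, by assumption) gives longer segments.
for rate in (10.0, 100.0):
    print(rate, 1.0 / rate, stay_probability(rate, 0.01))
```

In a mixture with several age classes, each class would carry its own rate, so that classes with small rates explain long homozygous stretches and classes with large rates explain short ones.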


2020 ◽  
Vol 43 (1) ◽  
pp. 71-82
Author(s):  
Sebastian George ◽  
Ambily Jose

The most suitable statistical methods for explaining serial dependency in time series count data are those based on Hidden Markov Models (HMMs). These models assume that the observations are generated from a finite mixture of distributions governed by a Markov chain (MC). The Poisson Hidden Markov Model (P-HMM) may be the most widely used method for modelling such data. However, in real-life scenarios, this model cannot always be considered the best choice. Taking this into account, in this paper we adopt the Generalised Poisson Distribution (GPD) for modelling count data. This distribution can accommodate the overdispersion and underdispersion that the Poisson model cannot. Here, we develop the Generalised Poisson Hidden Markov Model (GP-HMM) by combining the GPD with an HMM for modelling such data. The results of the study on simulated data and an application to real data, monthly cases of Leptospirosis in the state of Kerala in South India, show good convergence properties, proving that the GP-HMM is a better method compared to P-HMM.
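
How the GPD departs from the Poisson can be checked numerically. The sketch below uses the standard Consul parameterisation, P(X = x) = θ(θ + λx)^(x-1) e^(-θ-λx) / x!, which is an assumption here since the abstract does not state which parameterisation the paper uses. It is restricted to 0 <= λ < 1 (the overdispersed case); underdispersion requires negative λ and a truncated support, which this sketch does not handle. The function names are hypothetical.

```python
import math


def generalised_poisson_pmf(x, theta, lam):
    """P(X = x) for the generalised Poisson with theta > 0 and 0 <= lam < 1."""
    if x < 0:
        return 0.0
    # Work in log-space to avoid overflow in the power and factorial terms.
    log_p = (math.log(theta) + (x - 1) * math.log(theta + lam * x)
             - theta - lam * x - math.lgamma(x + 1))
    return math.exp(log_p)


def mean_and_variance(theta, lam, upper=500):
    """Estimate the mean and variance by summing the pmf over 0..upper-1."""
    probs = [generalised_poisson_pmf(x, theta, lam) for x in range(upper)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var


# With lam = 0 the pmf reduces to the ordinary Poisson; with lam > 0 the
# variance theta / (1 - lam)**3 exceeds the mean theta / (1 - lam).
mean, var = mean_and_variance(2.0, 0.3)
```

In a GP-HMM, each hidden state would carry its own (θ, λ) pair and this pmf would take the place of the Poisson emission probability.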


2016 ◽  
Vol 32 (11) ◽  
pp. 1749-1751 ◽  
Author(s):  
Vagheesh Narasimhan ◽  
Petr Danecek ◽  
Aylwyn Scally ◽  
Yali Xue ◽  
Chris Tyler-Smith ◽  
...  

2019 ◽  
Author(s):  
Etienne Ackermann ◽  
Caleb T. Kemere ◽  
John P. Cunningham

Spike sorting is a standard preprocessing step to obtain ensembles of single unit data from multiunit, multichannel recordings in neuroscience. However, more recently, some researchers have started doing analyses directly on the unsorted data. Here we present a new computational model that is an extension of the standard (unsupervised) switching Poisson hidden Markov model (where observations are time-binned spike counts from each of N neurons), to a clusterless approximation in which we observe only a d-dimensional mark for each spike. Such an unsupervised yet clusterless approach has the potential to incorporate more information than is typically available from spike-sorted approaches, and to uncover temporal structure in neural data without access to behavioral correlates. We show that our approach can recover model parameters from simulated data, and that it can uncover task-relevant structure from real neural data.
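
For orientation, the standard Poisson HMM that this work extends (not the clusterless model itself) can be written down in a few lines. The sketch below computes the log-likelihood of time-binned spike counts with the forward algorithm; the function and argument names are hypothetical, and all rates and probabilities are assumed to be strictly positive so that the logarithms are defined.

```python
import math


def poisson_logpmf(k, rate):
    """Log-probability of observing k events under a Poisson with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def poisson_hmm_loglik(counts, rates, trans, init):
    """Log-likelihood of binned spike counts under a Poisson HMM.

    counts[t][n]: spike count of neuron n in time bin t
    rates[s][n]:  expected count per bin of neuron n in hidden state s
    trans[r][s]:  probability of moving from state r to state s
    init[s]:      probability of starting in state s
    """
    n_states = len(init)

    def log_emit(t, s):
        # Neurons are treated as conditionally independent given the state.
        return sum(poisson_logpmf(k, r) for k, r in zip(counts[t], rates[s]))

    # Forward recursion in log-space.
    alpha = [math.log(init[s]) + log_emit(0, s) for s in range(n_states)]
    for t in range(1, len(counts)):
        alpha = [
            logsumexp([alpha[r] + math.log(trans[r][s]) for r in range(n_states)])
            + log_emit(t, s)
            for s in range(n_states)
        ]
    return logsumexp(alpha)
```

The clusterless variant described in the abstract replaces the per-neuron counts with a d-dimensional mark for each spike, so the emission term would change while the forward recursion keeps the same shape.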

