On Prediction Using Variable Order Markov Models

2004, Vol. 22, pp. 385-421
Author(s): R. Begleiter, R. El-Yaniv, G. Yona

This paper is concerned with algorithms for the prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real-life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a "decomposed" CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
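To make the evaluation metric concrete, the following is a minimal sketch of a variable-order predictor with context fallback, together with the average log-loss used throughout the paper. It is not the CTW, PPM, or PST implementations compared above; the class name, the add-one smoothing (standing in for PPM's escape mechanism), and the toy data are invented for illustration.

```python
from collections import defaultdict
import math

class SimpleVOMM:
    """Toy variable-order Markov predictor with fallback to shorter contexts."""

    def __init__(self, max_order, alphabet):
        self.max_order = max_order
        self.alphabet = sorted(set(alphabet))
        # counts[context][symbol] = times symbol followed that context
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequence):
        for i, symbol in enumerate(sequence):
            for k in range(self.max_order + 1):
                if i - k < 0:
                    break
                self.counts[tuple(sequence[i - k:i])][symbol] += 1

    def prob(self, context, symbol):
        # Fall back to shorter contexts until one has been observed;
        # add-one smoothing stands in for PPM's escape mechanism.
        context = tuple(context[-self.max_order:]) if self.max_order else ()
        while context and context not in self.counts:
            context = context[1:]
        table = self.counts[context]
        total = sum(table.values())
        return (table[symbol] + 1) / (total + len(self.alphabet))

def average_log_loss(model, sequence):
    """Average log-loss in bits per symbol, the paper's evaluation metric."""
    loss = 0.0
    for i, symbol in enumerate(sequence):
        loss -= math.log2(model.prob(sequence[:i], symbol))
    return loss / len(sequence)

train, test = "abracadabraabracadabra", "abracadabra"
model = SimpleVOMM(max_order=3, alphabet=train)
model.train(train)
print(f"average log-loss: {average_log_loss(model, test):.3f} bits/symbol")
```

A lower average log-loss means the model assigns higher probability to the symbols that actually occur, which is also why the same quantity governs compression performance.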

2013, Vol. 462-463, pp. 243-246
Author(s): Chang Guang Shi
Highly-available models and IPv4 have garnered considerable interest from both statisticians and experts in the last several years. Here, we show the emulation of suffix trees. We motivate an algorithm for suffix trees, which we use to demonstrate that e-business and replication can interact to solve this challenge.


2021
Author(s): G. Zifarelli, P. Zuccolini, S. Bertelli, M. Pusch

Abstract
The behavior of ion channels and transporters is often modeled using discrete-state, continuous-time Markov models. Such models are helpful for the interpretation of experimental data and can guide the design of experiments by testing specific predictions. Here, we describe a computational tool that allows us to create Markov models of chosen complexity and to calculate their predictions on a macroscopic scale as well as on a single-molecule scale. The program calculates steady-state properties (current, state probabilities, and cycle frequencies), deterministic macroscopic and stochastic time courses, gating currents, dwell-time histograms, and power spectra of channels and transporters. In addition, a visual simulation mode allows us to follow the time-dependent stochastic behavior of a single channel or transporter. After a basic introduction to the concept of Markov models, real-life examples are discussed, including a model of a simple K+ channel, a voltage-gated sodium channel, a 3-state ligand-gated channel, and an electrogenic uniporter. The article thus has a modular architecture, progressing from basic to more advanced topics. This illustrates how the MarkovEditor program can help students explore Markov models at a basic level while also serving research scientists who want to test and develop models of the mechanisms of protein function.
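As a concrete illustration of the kind of computation described above, here is a minimal sketch that derives steady-state probabilities from a rate matrix and runs a Gillespie-style single-molecule simulation. It is not MarkovEditor itself; the 3-state ligand-gated scheme and all rate constants are invented for illustration.

```python
import numpy as np

# Hypothetical 3-state ligand-gated channel: C1 <-> C2 <-> O.
# Q[i, j] is the transition rate from state i to state j (1/s);
# diagonal entries make each row sum to zero.
Q = np.array([
    [-100.0,  100.0,    0.0],   # C1 -> C2 (ligand binding)
    [  50.0, -550.0,  500.0],   # C2 -> C1 (unbinding), C2 -> O (opening)
    [   0.0,   20.0,  -20.0],   # O  -> C2 (closing)
])

# The steady-state distribution p solves p Q = 0 with sum(p) = 1.
# Append the normalization constraint and solve by least squares.
A = np.vstack([Q.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
p, *_ = np.linalg.lstsq(A, b, rcond=None)
print("steady-state probabilities (C1, C2, O):", np.round(p, 4))

# Gillespie-style simulation of one channel's stochastic gating:
# draw an exponential dwell time, then jump to a neighbor state.
rng = np.random.default_rng(0)
state, t, transitions = 0, 0.0, 0
while t < 1.0:                      # simulate 1 s
    rates = Q[state].copy()
    rates[state] = 0.0              # only outgoing rates
    total = rates.sum()
    t += rng.exponential(1.0 / total)
    state = rng.choice(3, p=rates / total)
    transitions += 1
print(f"simulated {transitions} transitions in 1 s")
```

Averaging many such stochastic runs converges to the deterministic macroscopic time course, which is the link between the single-molecule and macroscopic modes the abstract mentions.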


2019, Vol. 35 (22), pp. 4607-4616
Author(s): Fabio Cunial, Jarno Alanko, Djamal Belazzougui

Abstract
Motivation: Markov models with contexts of variable length are widely used in bioinformatics for representing sets of sequences with similar biological properties. When models contain many long contexts, existing implementations are either unable to handle genome-scale training datasets within typical memory budgets, or they are optimized for specific model variants and are thus inflexible.
Results: We provide practical, versatile representations of variable-order Markov models and of interpolated Markov models that support a large number of context-selection criteria, scoring functions, probability smoothing methods, and interpolations, and that take up to four times less space than previous implementations based on the suffix array, regardless of the number and length of contexts, and up to ten times less space, or more, than previous trie-based representations, while matching the size of related, state-of-the-art data structures from Natural Language Processing. We describe how to further compress our indexes to a quantity related to the redundancy of the training data, saving up to 90% of their space on very repetitive datasets and making them up to 60 times smaller than previous implementations based on the suffix array. Finally, we show how to exploit constraints on the length and frequency of contexts to further shrink our compressed indexes to half of their size or less, achieving data structures that are a hundred times smaller, or more, than previous implementations based on the suffix array. This allows variable-order Markov models to be used with bigger datasets and with longer contexts on the same hardware, thus possibly enabling new applications.
Availability and implementation: https://github.com/jnalanko/VOMM
Supplementary information: Supplementary data are available at Bioinformatics online.
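For readers unfamiliar with the models being indexed, the following is a minimal sketch of interpolated Markov model scoring. The hash-map counts here stand in for the paper's compressed suffix-array index (see the VOMM repository above for the real implementation), and the interpolation weight lam is an invented parameter.

```python
from collections import defaultdict
import math

def train_counts(text, max_order):
    """Collect k-mer counts for all orders 0..max_order+1 (contexts plus
    the following symbol); a toy stand-in for a compressed suffix index."""
    counts = defaultdict(int)
    for k in range(max_order + 2):
        for i in range(len(text) - k + 1):
            counts[text[i:i + k]] += 1
    return counts

def imm_prob(counts, alphabet, context, symbol, lam=0.5):
    """Interpolated Markov probability: blend the empirical estimate at
    each context length with the estimate for the next-shorter context."""
    if not context:
        # Order-0 base case with add-one smoothing.
        return (counts[symbol] + 1) / (counts[""] + len(alphabet))
    shorter = imm_prob(counts, alphabet, context[1:], symbol, lam)
    ctx_count = counts[context]
    if ctx_count == 0:
        return shorter          # unseen context: rely on shorter one
    empirical = counts[context + symbol] / ctx_count
    return lam * empirical + (1 - lam) * shorter

text = "ACGTACGTGACGTT"
counts = train_counts(text, max_order=3)
alphabet = sorted(set(text))
score = sum(math.log2(imm_prob(counts, alphabet, text[max(0, i - 3):i], c))
            for i, c in enumerate(text))
print(f"log2-likelihood of training sequence: {score:.2f} bits")
```

The space problem the paper addresses is visible even in this toy: the count table grows with the number and length of contexts, which is exactly what the compressed, suffix-based indexes keep under control.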

