On Prediction Using Variable Order Markov Models

2004, Vol. 22, pp. 385-421
Author(s): R. Begleiter, R. El-Yaniv, G. Yona

This paper is concerned with algorithms for the prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real-life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a "decomposed" CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
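To make the evaluation metric concrete, the following is a minimal sketch of a variable-order predictor with context fallback, together with the average log-loss used throughout the paper. It is not the CTW, PPM, or PST implementations compared above; the class name, the add-one smoothing (standing in for PPM's escape mechanism), and the toy data are invented for illustration.

```python
from collections import defaultdict
import math

class SimpleVOMM:
    """Toy variable-order Markov predictor with fallback to shorter contexts."""

    def __init__(self, max_order, alphabet):
        self.max_order = max_order
        self.alphabet = sorted(set(alphabet))
        # counts[context][symbol] = times symbol followed that context
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequence):
        for i, symbol in enumerate(sequence):
            for k in range(self.max_order + 1):
                if i - k < 0:
                    break
                self.counts[tuple(sequence[i - k:i])][symbol] += 1

    def prob(self, context, symbol):
        # Fall back to shorter contexts until one has been observed;
        # add-one smoothing stands in for PPM's escape mechanism.
        context = tuple(context[-self.max_order:]) if self.max_order else ()
        while context and context not in self.counts:
            context = context[1:]
        table = self.counts[context]
        total = sum(table.values())
        return (table[symbol] + 1) / (total + len(self.alphabet))

def average_log_loss(model, sequence):
    """Average log-loss in bits per symbol, the paper's evaluation metric."""
    loss = 0.0
    for i, symbol in enumerate(sequence):
        loss -= math.log2(model.prob(sequence[:i], symbol))
    return loss / len(sequence)

train, test = "abracadabraabracadabra", "abracadabra"
model = SimpleVOMM(max_order=3, alphabet=train)
model.train(train)
print(f"average log-loss: {average_log_loss(model, test):.3f} bits/symbol")
```

A lower average log-loss means the model assigns higher probability to the symbols that actually occur, which is also why the same quantity governs compression performance.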

2013, Vol. 462-463, pp. 243-246
Author(s): Chang Guang Shi
Highly-available models and IPv4 have garnered considerable interest from both statisticians and experts in the last several years. Here, we show the emulation of suffix trees. We motivate an algorithm for suffix trees, which we use to demonstrate that e-business and replication can interact to solve this challenge.


2021
Author(s): G. Zifarelli, P. Zuccolini, S. Bertelli, M. Pusch

Abstract
The behavior of ion channels and transporters is often modeled using discrete-state, continuous-time Markov models. Such models are helpful for the interpretation of experimental data and can guide the design of experiments by testing specific predictions. Here, we describe a computational tool that allows us to create Markov models of chosen complexity and to calculate their predictions on a macroscopic scale as well as on a single-molecule scale. The program calculates steady-state properties (current, state probabilities, and cycle frequencies), deterministic macroscopic and stochastic time courses, gating currents, dwell-time histograms, and power spectra of channels and transporters. In addition, a visual simulation mode allows us to follow the time-dependent stochastic behavior of a single channel or transporter. After a basic introduction to the concept of Markov models, real-life examples are discussed, including a model of a simple K+ channel, a voltage-gated sodium channel, a 3-state ligand-gated channel, and an electrogenic uniporter. The article thus has a modular architecture, progressing from basic to more advanced topics. This illustrates how the MarkovEditor program can help students explore Markov models at a basic level while also serving research scientists who want to test and develop models of the mechanisms of protein function.
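As a concrete illustration of the kind of computation described above, here is a minimal sketch that derives steady-state probabilities from a rate matrix and runs a Gillespie-style single-molecule simulation. It is not MarkovEditor itself; the 3-state ligand-gated scheme and all rate constants are invented for illustration.

```python
import numpy as np

# Hypothetical 3-state ligand-gated channel: C1 <-> C2 <-> O.
# Q[i, j] is the transition rate from state i to state j (1/s);
# diagonal entries make each row sum to zero.
Q = np.array([
    [-100.0,  100.0,    0.0],   # C1 -> C2 (ligand binding)
    [  50.0, -550.0,  500.0],   # C2 -> C1 (unbinding), C2 -> O (opening)
    [   0.0,   20.0,  -20.0],   # O  -> C2 (closing)
])

# The steady-state distribution p solves p Q = 0 with sum(p) = 1.
# Append the normalization constraint and solve by least squares.
A = np.vstack([Q.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
p, *_ = np.linalg.lstsq(A, b, rcond=None)
print("steady-state probabilities (C1, C2, O):", np.round(p, 4))

# Gillespie-style simulation of one channel's stochastic gating:
# draw an exponential dwell time, then jump to a neighbor state.
rng = np.random.default_rng(0)
state, t, transitions = 0, 0.0, 0
while t < 1.0:                      # simulate 1 s
    rates = Q[state].copy()
    rates[state] = 0.0              # only outgoing rates
    total = rates.sum()
    t += rng.exponential(1.0 / total)
    state = rng.choice(3, p=rates / total)
    transitions += 1
print(f"simulated {transitions} transitions in 1 s")
```

Averaging many such stochastic runs converges to the deterministic macroscopic time course, which is the link between the single-molecule and macroscopic modes the abstract mentions.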


2019, Vol. 35 (22), pp. 4607-4616
Author(s): Fabio Cunial, Jarno Alanko, Djamal Belazzougui

Abstract
Motivation: Markov models with contexts of variable length are widely used in bioinformatics for representing sets of sequences with similar biological properties. When models contain many long contexts, existing implementations are either unable to handle genome-scale training datasets within typical memory budgets, or they are optimized for specific model variants and are thus inflexible.
Results: We provide practical, versatile representations of variable-order Markov models and of interpolated Markov models that support a large number of context-selection criteria, scoring functions, probability smoothing methods, and interpolations, and that take up to four times less space than previous implementations based on the suffix array, regardless of the number and length of contexts, and up to ten times less space, or more, than previous trie-based representations, while matching the size of related, state-of-the-art data structures from Natural Language Processing. We describe how to further compress our indexes to a quantity related to the redundancy of the training data, saving up to 90% of their space on very repetitive datasets and making them up to 60 times smaller than previous implementations based on the suffix array. Finally, we show how to exploit constraints on the length and frequency of contexts to further shrink our compressed indexes to half of their size or less, achieving data structures that are a hundred times smaller, or more, than previous implementations based on the suffix array. This allows variable-order Markov models to be used with bigger datasets and with longer contexts on the same hardware, thus possibly enabling new applications.
Availability and implementation: https://github.com/jnalanko/VOMM
Supplementary information: Supplementary data are available at Bioinformatics online.
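For readers unfamiliar with the models being indexed, the following is a minimal sketch of interpolated Markov model scoring. The hash-map counts here stand in for the paper's compressed suffix-array index (see the VOMM repository above for the real implementation), and the interpolation weight lam is an invented parameter.

```python
from collections import defaultdict
import math

def train_counts(text, max_order):
    """Collect k-mer counts for all orders 0..max_order+1 (contexts plus
    the following symbol); a toy stand-in for a compressed suffix index."""
    counts = defaultdict(int)
    for k in range(max_order + 2):
        for i in range(len(text) - k + 1):
            counts[text[i:i + k]] += 1
    return counts

def imm_prob(counts, alphabet, context, symbol, lam=0.5):
    """Interpolated Markov probability: blend the empirical estimate at
    each context length with the estimate for the next-shorter context."""
    if not context:
        # Order-0 base case with add-one smoothing.
        return (counts[symbol] + 1) / (counts[""] + len(alphabet))
    shorter = imm_prob(counts, alphabet, context[1:], symbol, lam)
    ctx_count = counts[context]
    if ctx_count == 0:
        return shorter          # unseen context: rely on shorter one
    empirical = counts[context + symbol] / ctx_count
    return lam * empirical + (1 - lam) * shorter

text = "ACGTACGTGACGTT"
counts = train_counts(text, max_order=3)
alphabet = sorted(set(text))
score = sum(math.log2(imm_prob(counts, alphabet, text[max(0, i - 3):i], c))
            for i, c in enumerate(text))
print(f"log2-likelihood of training sequence: {score:.2f} bits")
```

The space problem the paper addresses is visible even in this toy: the count table grows with the number and length of contexts, which is exactly what the compressed, suffix-based indexes keep under control.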

