A direct-concatenation approach to train hidden Markov models to recognize the highly confusing Mandarin syllables with very limited training data

F. Liu; Y. Lee; L. Lee

doi:10.1109/89.221375

Semi-supervised learning of Hidden Markov Models for biological sequence analysis

Bioinformatics ◽

10.1093/bioinformatics/bty910 ◽

2018 ◽

Vol 35 (13) ◽

pp. 2208-2215 ◽

Cited By ~ 5

Author(s):

Ioannis A Tamposis ◽

Konstantinos D Tsirigos ◽

Margarita C Theodoropoulou ◽

Panagiota I Kontou ◽

Pantelis G Bagos

Keyword(s):

Sequence Analysis ◽

Supervised Learning ◽

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov ◽

Transmembrane Protein ◽

Training Data ◽

Supplementary Information ◽

Training Procedure ◽

Partially Labeled Data

Abstract

Motivation: Hidden Markov Models (HMMs) are probabilistic models widely used in computational sequence analysis. HMMs are basically unsupervised models. However, in the most important applications, they are trained in a supervised manner. Training examples accompanied by labels corresponding to different classes are given as input and the set of parameters that maximize the joint probability of sequences and labels is estimated. A main problem with this approach is that, in the majority of cases, labels are hard to find and thus the amount of training data is limited. On the other hand, there are plenty of unclassified (unlabeled) sequences deposited in the public databases that could potentially contribute to the training procedure. Training on labeled and unlabeled sequences together is called semi-supervised learning and could be very helpful in many applications.

Results: We propose here a method for semi-supervised learning of HMMs that can incorporate labeled, unlabeled and partially labeled data in a straightforward manner. The algorithm is based on a variant of the Expectation-Maximization (EM) algorithm, where the missing labels of the unlabeled or partially labeled data are considered as the missing data. We apply the algorithm to several biological problems, namely the prediction of transmembrane protein topology for alpha-helical and beta-barrel membrane proteins and the prediction of archaeal signal peptides. The results are very promising, since the algorithms presented here can significantly improve the prediction performance of even the top-scoring classifiers.

Supplementary information: Supplementary data are available at Bioinformatics online.
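The idea of treating missing labels as missing data can be illustrated with a label-constrained forward pass. The sketch below is not the authors' implementation; it assumes a discrete HMM in which every state carries exactly one class label, and all names (`constrained_forward`, `state_label`, the toy parameters in the usage note) are hypothetical. Positions with a known label only admit states carrying that label, while unlabeled positions admit every state, so labeled, partially labeled and unlabeled sequences go through the same computation.

```python
def constrained_forward(obs, labels, init, trans, emit, state_label):
    """Probability of `obs` summed over all state paths that agree with the
    known labels. labels[t] is None wherever the label is missing."""
    n_states = len(init)

    def allowed(t, s):
        # A state is admissible if the position is unlabeled
        # or the state's label matches the observed label.
        return labels[t] is None or state_label[s] == labels[t]

    # Initialisation: start probability times emission, masked by the label.
    alpha = [
        init[s] * emit[s][obs[0]] if allowed(0, s) else 0.0
        for s in range(n_states)
    ]
    # Recursion: standard forward update, again masked by the label.
    for t in range(1, len(obs)):
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
            if allowed(t, s) else 0.0
            for s in range(n_states)
        ]
    return sum(alpha)
```

With `labels` set to all `None` this reduces to the ordinary forward likelihood, and with every position labeled it gives the joint probability of sequence and labels. An EM procedure of the kind the abstract describes would need the matching backward pass and expected counts as well, which this sketch omits.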


Observable Operator Models for Discrete Stochastic Time Series

Neural Computation ◽

10.1162/089976600300015411 ◽

2000 ◽

Vol 12 (6) ◽

pp. 1371-1398 ◽

Cited By ~ 48

Author(s):

Herbert Jaeger

Keyword(s):

Hidden Markov Models ◽

Stochastic Systems ◽

Markov Models ◽

Learning Algorithm ◽

Hidden Markov ◽

Training Data ◽

Constructive Learning ◽

Proper Subclass ◽

Stochastic Time Series ◽

Dependent Processes

A widely used class of models for stochastic systems is hidden Markov models. Systems that can be modeled by hidden Markov models are a proper subclass of linearly dependent processes, a class of stochastic systems known from mathematical investigations carried out over the past four decades. This article provides a novel, simple characterization of linearly dependent processes, called observable operator models. The mathematical properties of observable operator models lead to a constructive learning algorithm for the identification of linearly dependent processes. The core of the algorithm has a time complexity of O(N + nm^3), where N is the size of the training data, n is the number of distinguishable outcomes of observations, and m is the model state-space dimension.
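To make the operator view concrete, the following sketch rewrites an ordinary HMM as one linear operator per observable symbol and computes a sequence probability by applying those operators in order. It is an illustration only, not the learning algorithm from the article; the function names are hypothetical, and the convention used here (emit from the current state, then transition) is one common choice that may differ from the article's notation.

```python
def hmm_to_operators(trans, emit, symbols):
    """Build one matrix per symbol a with entries
    T_a[j][i] = P(emit a in state i) * P(move from state i to state j)."""
    n = len(trans)
    return {
        a: [[trans[i][j] * emit[i][a] for i in range(n)] for j in range(n)]
        for a in symbols
    }


def apply_operator(op, w):
    """Matrix-vector product op @ w using plain lists."""
    return [sum(row[i] * w[i] for i in range(len(w))) for row in op]


def sequence_probability(ops, w0, seq):
    """P(seq) = sum of the entries of T_{a_k} ... T_{a_1} w0."""
    w = list(w0)
    for a in seq:
        w = apply_operator(ops[a], w)
    return sum(w)
```

Because each operator folds emission and transition into a single matrix, summing `sequence_probability` over all sequences of a fixed length gives 1 for any valid HMM.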


Noise-Robust Hidden Markov Models for Limited Training Data for Within-Species Bird Phrase Classification

10.21437/interspeech.2016-1360 ◽

2016 ◽

Author(s):

Kantapon Kaewtip ◽

Charles Taylor ◽

Abeer Alwan

Keyword(s):

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov ◽

Training Data ◽

Noise Robust


Robust Hidden Markov Models for limited training data for birdsong phrase classification

The Journal of the Acoustical Society of America ◽

10.1121/1.4988171 ◽

2017 ◽

Vol 141 (5) ◽

pp. 3725-3726

Author(s):

Kantapon Kaewtip ◽

Abeer Alwan ◽

Charles Taylor

Keyword(s):

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov ◽

Training Data


Tone recognition for continuous Mandarin speech with limited training data using selected context‐dependent hidden Markov models

Journal of the Chinese Institute of Engineers ◽

10.1080/02533839.1994.9677646 ◽

1994 ◽

Vol 17 (6) ◽

pp. 775-784 ◽

Cited By ~ 4

Author(s):

Hsin‐Min Wang ◽

Lin‐Shan Lee

Keyword(s):

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov ◽

Training Data ◽

Tone Recognition ◽

Context Dependent


A New Data Fusion Method for Hybrid MMC/RNA Learning: Application to Automatic Speech Recognition

Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées ◽

10.46298/arima.1842 ◽

2005 ◽

Vol 3, Special Issue... ◽

Author(s):

Lilia Lazli ◽

Mohamed Tayeb Laskri

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Data Fusion ◽

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov ◽

Recognition System ◽

Training Data

It is well known that traditional Hidden Markov Model (HMM) systems improve considerably when more training data or more parameters are used. With hybrid Hidden Markov Model and Artificial Neural Network models (HMM/ANN, or MMC/RNA in French), however, more data increases training time with little or no improvement in performance. To overcome this limitation, we developed a new method that automatically separates the data into several sets and trains neural networks of the Multi-Layer Perceptron (MLP) type on each set. During the recognition phase, the models are combined using several criteria based on data fusion techniques to provide the recognized word. We show in this paper that this method significantly improves recognition accuracy. The method was applied in an Arabic speech recognition system that is based, on the one hand, on fuzzy clustering (the fuzzy c-means algorithm) and, on the other hand, on segmentation based on genetic algorithms.
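The abstract does not say which fusion criteria were used. As a hedged illustration of combining several classifiers at recognition time, the sketch below applies three common rules (mean of posteriors, product of posteriors, majority vote) to per-model word scores. The function name `fuse`, the rule names and the dictionary input format are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def fuse(posteriors, rule="mean"):
    """Pick one word from several models' posterior estimates.

    posteriors: list of dicts, one per model, mapping word -> probability.
    rule: "mean", "product" or "vote" (illustrative choices only).
    """
    words = list(posteriors[0])
    if rule == "mean":
        # Average the posteriors across models.
        scores = {w: sum(p[w] for p in posteriors) / len(posteriors) for w in words}
    elif rule == "product":
        # Multiply the posteriors, which penalises any strong disagreement.
        scores = {w: math.prod(p[w] for p in posteriors) for w in words}
    elif rule == "vote":
        # Each model votes for its own top-scoring word.
        votes = Counter(max(p, key=p.get) for p in posteriors)
        scores = {w: votes[w] for w in words}
    else:
        raise ValueError(f"unknown fusion rule: {rule}")
    return max(scores, key=scores.get)
```

The rules can disagree: when two models weakly prefer one word and a third strongly prefers another, voting follows the majority while the mean and product rules follow the confident model.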


Estimating Personality Impression from Speech Record Using Hidden Markov Models

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.135.1517 ◽

2015 ◽

Vol 135 (12) ◽

pp. 1517-1523 ◽

Cited By ~ 1

Author(s):

Yicheng Jin ◽

Takuto Sakuma ◽

Shohei Kato ◽

Tsutomu Kunitachi

Keyword(s):

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov


Hidden Markov Processes

10.23943/princeton/9780691133157.001.0001 ◽

2014 ◽

Cited By ~ 2

Author(s):

M. Vidyasagar

Keyword(s):

Hidden Markov Models ◽

Markov Processes ◽

Viterbi Algorithm ◽

Markov Models ◽

Hidden Markov ◽

Local Alignment ◽

Biological Applications ◽

Standard Material ◽

Hidden Markov Processes ◽

Genomics And Proteomics

This book explores important aspects of Markov and hidden Markov processes and the applications of these ideas to various problems in computational biology. It starts from first principles, so that no previous knowledge of probability is necessary. However, the work is rigorous and mathematical, making it useful to engineers and mathematicians, even those not interested in biological applications. A range of exercises is provided, including drills to familiarize the reader with concepts and more advanced problems that require deep thinking about the theory. Biological applications are taken from post-genomic biology, especially genomics and proteomics. The topics examined include standard material such as the Perron–Frobenius theorem, transient and recurrent states, hitting probabilities and hitting times, maximum likelihood estimation, the Viterbi algorithm, and the Baum–Welch algorithm. The book contains discussions of extremely useful topics not usually seen at the basic level, such as ergodicity of Markov processes, Markov Chain Monte Carlo (MCMC), information theory, and large deviation theory for both i.i.d. and Markov processes. It also presents state-of-the-art realization theory for hidden Markov models. Among biological applications, it offers an in-depth look at the BLAST (Basic Local Alignment Search Tool) algorithm, including a comprehensive explanation of the underlying theory. Other applications such as profile hidden Markov models are also explored.
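Of the standard topics listed, the Viterbi algorithm is the easiest to show compactly. The sketch below is a generic log-space version for a discrete HMM and is not taken from the book; the function name and the list-and-dictionary parameter layout are assumptions chosen for this example.

```python
import math


def viterbi(obs, init, trans, emit):
    """Return (most probable state path, its joint probability with obs)."""
    n = len(init)

    def log(p):
        # Map zero probabilities to -inf so impossible paths never win.
        return math.log(p) if p > 0 else float("-inf")

    # Best log-score of any path ending in each state after the first symbol.
    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + log(trans[best][s]) + log(emit[s][o]))
        back.append(pointers)

    # Trace the best final state back to the start.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, math.exp(max(score))
```

Working in log space avoids numerical underflow on long sequences, at the cost of a final exponentiation when the probability itself is needed.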
