An HMM-Like Dynamic Time Warping Scheme for Automatic Speech Recognition

Mathematical Problems in Engineering ◽

10.1155/2014/898729 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8 ◽

Cited By ~ 4

Author(s):

Ing-Jr Ding ◽

Yen-Ming Hsu

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Template Matching ◽

Dynamic Time Warping ◽

Recognition System ◽

Home Automation ◽

Speech Recognition System ◽

Time Warping ◽

Feature Based ◽

Dynamic Time

In the past, the kernel of automatic speech recognition (ASR) was dynamic time warping (DTW), a feature-based template matching technique that belongs to the category of dynamic programming (DP). Although DTW is an early ASR technique, it remains popular in many applications and now plays an important role in the well-known Kinect-based gesture recognition application. This paper proposes an intelligent speech recognition system using an improved DTW approach for multimedia and home automation services. The improved DTW presented in this work, called HMM-like DTW, is essentially a hidden Markov model (HMM)-like method in which the concept of the typical HMM statistical model is brought into the design of DTW. By transforming feature-based DTW recognition into model-based DTW recognition, the HMM-like DTW method can behave like the HMM recognition technique, and its HMM-like recognition model gives it the capability to perform model adaptation (also known as speaker adaptation). A series of experiments in home automation-based multimedia access service environments demonstrated the superiority and effectiveness of the developed smart speech recognition system based on HMM-like DTW.
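
The abstract above describes DTW as feature-based template matching solved by dynamic programming. As a rough illustration of that classic baseline only (not the paper's HMM-like DTW), the following Python sketch computes a DTW distance and picks the nearest template. The function names, the Euclidean local distance, and the toy one-dimensional features are all assumptions made for this example.

```python
import math

def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame distance (assumed Euclidean)
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b's frame over a
                                 cost[i][j - 1],      # stretch a's frame over b
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label whose template has the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))

# Hypothetical toy data: one feature per frame. A real system would use
# multi-dimensional vectors (for example MFCC or LPC features) per frame.
templates = {
    "on": [[0.0], [1.0], [2.0], [1.0]],
    "off": [[2.0], [2.0], [0.0], [0.0]],
}
query = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]  # a time-stretched "on"
result = recognize(query, templates)
print(result)
```

The query is a slower rendition of the "on" template, so warping absorbs the difference in length and the match cost stays low even though the two sequences have different numbers of frames.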

HMM Based Enhanced Dynamic Time Warping Model for Efficient Hindi Language Speech Recognition System

Mobile Communication and Power Engineering - Communications in Computer and Information Science ◽

10.1007/978-3-642-35864-7_28 ◽

2013 ◽

pp. 200-206

Author(s):

Sharma Krishna Kumar ◽

Lavania Krishan Kant ◽

Sharma Shachi

Keyword(s):

Speech Recognition ◽

Dynamic Time Warping ◽

Recognition System ◽

Speech Recognition System ◽

Time Warping ◽

Hindi Language ◽

Dynamic Time

A speech recognition system using linear predictive coding and dynamic time warping

Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society ◽

10.1109/iembs.1988.94689 ◽

1988 ◽

Cited By ~ 2

Author(s):

W. Kinsner ◽

D. Peters

Keyword(s):

Speech Recognition ◽

Dynamic Time Warping ◽

Predictive Coding ◽

Recognition System ◽

Speech Recognition System ◽

Time Warping ◽

Linear Predictive Coding ◽

Dynamic Time

An Improved Method In Speech Signal Input Representation Based On DTW Technique For NN Speech Recognition System

Jurnal Teknologi ◽

10.11113/jt.v46.291 ◽

2012 ◽

Author(s):

Rubita Sudirman ◽

Sh. Hussain Salleh ◽

Shaharuddin Salleh

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Conjugate Gradient ◽

Template Matching ◽

Dynamic Time Warping ◽

Gradient Descent ◽

Reference Frames ◽

Time Warping ◽

Dynamic Time ◽

Quasi Newton

This paper presents a pre-processing of linear predictive coefficient (LPC) features to prepare reliable reference templates for the set of words to be recognized using an artificial neural network (NN). The paper also proposes using a pitch feature derived from the recorded speech data as another input feature. The Dynamic Time Warping (DTW) algorithm is the backbone of the newly developed algorithm, called the DTW fixing frame algorithm (DTW-FF), which is designed to perform template matching for the input pre-processing. The purpose of the new algorithm is to align the input frames in the test set to the template frames in the reference set. This frame normalization is required because the NN is designed to compare data of the same length, whereas the same word is usually spoken with varying length. With frame fixing, the input frames are adjusted to the same number of frames as the reference frames. Another task of the study is to extract pitch features using the Harmonic Filter algorithm. After pitch extraction, and once the LPC features are fixed to the desired number of frames, speech recognition using the neural network is performed. Results show that recognition accuracy as high as 98% can be achieved using the combination of DTW-FF features and pitch features. At the end of the paper, a convergence comparison between the conjugate gradient descent (CGD), Quasi-Newton, and steepest gradient descent (SGD) search directions is performed, and the results show that CGD outperformed Quasi-Newton and SGD.
Key words: Dynamic time warping, time normalization, neural network, speech recognition, conjugate gradient descent
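
The abstract says that frame fixing adjusts the input to the same number of frames as the reference, but it does not give the DTW-FF procedure itself. The Python sketch below shows one plausible way to get a fixed-length input from a DTW alignment: it averages the input frames that the warping path maps onto each reference frame. This averaging rule, the helper names `dtw_path` and `fix_frames`, and the toy data are assumptions for illustration, not the authors' algorithm.

```python
import math

def dtw_path(test, ref):
    """Return the optimal DTW alignment as (test_index, ref_index) pairs."""
    n, m = len(test), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(test[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the final cell, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    return path[::-1]

def fix_frames(test, ref):
    """Warp `test` to len(ref) frames by averaging the frames aligned to each ref frame."""
    buckets = [[] for _ in ref]
    for ti, rj in dtw_path(test, ref):
        buckets[rj].append(test[ti])
    # Every reference index occurs on the path, so no bucket is empty.
    return [[sum(col) / len(frames) for col in zip(*frames)] for frames in buckets]

# Hypothetical toy data: a 5-frame input normalized to a 3-frame reference.
ref = [[0.0], [1.0], [2.0]]
test = [[0.0], [0.0], [1.0], [2.0], [2.0]]
fixed = fix_frames(test, ref)
print(len(fixed), fixed)
```

The output always has as many frames as the reference, which is the property a fixed-size neural network input layer needs.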

Some techniques for incorporating local timescale variability information into a dynamic time-warping algorithm for automatic speech recognition

10.1109/icassp.1983.1171934 ◽

2005 ◽

Cited By ~ 4

Author(s):

M. Russell ◽

R. Moore ◽

M. Tomlinson

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Dynamic Time Warping ◽

Time Warping ◽

Dynamic Time

Voice-Based Speaker Identification and Verification

Advances in Library and Information Science - Handbook of Research on Knowledge and Organization Systems in Library and Information Science ◽

10.4018/978-1-7998-7258-0.ch016 ◽

2021 ◽

pp. 288-316

Author(s):

Keshav Sinha ◽

Rasha Subhi Hameed ◽

Partha Paul ◽

Karan Pratap Singh

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Speech Signal ◽

Reference Model ◽

Speaker Identification ◽

Recognition System ◽

Speech Recognition System ◽

Primary Focus ◽

Dynamic Time ◽

Dynamic Time Warping

In recent years, advances in voice-based authentication have led to numerous forensic voice authentication technologies. For verification, the speech reference model is collected from various open-source clusters. The primary focus of this chapter is an automatic speech recognition (ASR) technique that stores, retrieves, and processes data in a scalable manner. Various conventional techniques exist for speech recognition, such as BWT, SVD, and MFCC, but their efficiency degrades when they are applied to automatic speech recognition. To overcome this problem, the authors propose a speech recognition system using E-SVD, D3-MFCC, and dynamic time warping (DTW). D3-MFCC captures the important qualities of the speech signal while discarding unimportant and distracting features.

Feature-based noise robust speech recognition on an Indonesian language automatic speech recognition system

2014 International Conference on Electrical Engineering and Computer Science (ICEECS) ◽

10.1109/iceecs.2014.7045217 ◽

2014 ◽

Author(s):

Cil Hardianto Satriawan ◽

Dessi Puji Lestari

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Recognition System ◽

Robust Speech Recognition ◽

Speech Recognition System ◽

Automatic Speech Recognition System ◽

Feature Based ◽

Noise Robust Speech Recognition ◽

Noise Robust

A Recognition Judgment Method of Isolated-Word Speech-Recognition

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.543-547.2337 ◽

2014 ◽

Vol 543-547 ◽

pp. 2337-2340 ◽

Cited By ~ 1

Author(s):

Yi Zhang ◽

Xiao Song Li ◽

Yang Song

Keyword(s):

Speech Recognition ◽

Recognition System ◽

Speech Recognition System ◽

Recognition Judgment ◽

Threshold Estimation ◽

Time Warping ◽

Distance Threshold ◽

Shortest Distance ◽

Isolated Word ◽

Dynamic Time

Isolated-word speech recognition systems have adopted the shortest Dynamic Time Warping (DTW) distance to make the recognition judgment, which has the disadvantages of a high False Accept Rate (FAR) and poor noise robustness. This paper proposes a new method for recognition judgment based on DTW distance threshold estimation. The method multiplies the maximum DTW distance between the template speech and the training input speech by an adjusting coefficient, then adds the noise DTW distance; the result is taken as the distance threshold estimate. During recognition, if the distance between the test speech and the template speech exceeds the threshold estimate, the system does not recognize the speech. Experiments show that this method can greatly improve the noise robustness of the isolated-word speech recognition system and solve the problem of high FAR.
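
The threshold rule in this abstract is simple arithmetic: the maximum training distance times an adjusting coefficient, plus a noise distance. The Python sketch below is a minimal reading of that rule. The function names, the per-word thresholds, and all numeric values are hypothetical, since the abstract does not give the coefficient or how the noise DTW distance is measured; the DTW distances are assumed to be computed elsewhere.

```python
def estimate_threshold(training_distances, adjust_coeff, noise_distance):
    """Threshold = (max template-to-training DTW distance) * coefficient + noise distance."""
    return max(training_distances) * adjust_coeff + noise_distance

def judge(test_distances, thresholds):
    """Return the best-matching label, or None when its distance exceeds the threshold.

    test_distances: label -> DTW distance between the test speech and that template.
    thresholds:     label -> estimated distance threshold for that template.
    """
    best = min(test_distances, key=test_distances.get)
    if test_distances[best] > thresholds[best]:
        return None  # rejected: treated as out-of-vocabulary speech or noise
    return best

# Hypothetical values, chosen only to exercise both branches.
threshold_on = estimate_threshold([10.0, 12.0, 15.0], adjust_coeff=1.5, noise_distance=2.0)
thresholds = {"on": threshold_on, "off": 20.0}

accepted = judge({"on": 18.0, "off": 41.0}, thresholds)
rejected = judge({"on": 30.0, "off": 41.0}, thresholds)
print(threshold_on, accepted, rejected)
```

Compared with always returning the nearest template, the rejection branch is what lowers the false accept rate: an utterance far from every template is refused rather than forced onto the closest word.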

Dynamic Time Warping Inside a Genetic Algorithm for Automatic Speech Recognition

Modelling and Implementation of Complex Systems - Lecture Notes in Networks and Systems ◽

10.1007/978-3-030-05481-6_14 ◽

2018 ◽

pp. 180-192

Author(s):

Fadila Maouche ◽

Mohamed Benmohammed

Keyword(s):

Genetic Algorithm ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Dynamic Time Warping ◽

Time Warping ◽

Dynamic Time

Building An Automatic Speech Recognition System for Home Automation

Transactions on Machine Learning and Artificial Intelligence ◽

10.14738/tmlai.54.3190 ◽

2017 ◽

Vol 5 (4) ◽

Author(s):

Mohamed Aboulkhir ◽

Samira Khoulji ◽

Reda Jourani ◽

M.L Kerkeb

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Recognition System ◽

Home Automation ◽

Speech Recognition System ◽

Automatic Speech Recognition System

Design Of A Voice Controlled Home Automation System Using Deep Learning Convolutional Neural Network (DL-CNN)

Telekontran : Jurnal Ilmiah Telekomunikasi, Kendali dan Elektronika Terapan ◽

10.34010/telekontran.v8i1.3078 ◽

2020 ◽

Vol 8 (1) ◽

pp. 57-73

Author(s):

Lery Sakti Ramba

Keyword(s):

Deep Learning ◽

Speech Recognition ◽

Background Noise ◽

Electronic Devices ◽

Recognition System ◽

Background Intensity ◽

Automation System ◽

Home Automation ◽

Speech Recognition System ◽

Home Automation System

The purpose of this research is to design a home automation system that can be controlled using voice commands. The research was conducted by studying related work, consulting competent parties, designing and testing the system, and analyzing the test results. The voice recognition system was designed using a Deep Learning Convolutional Neural Network (DL-CNN), and the CNN model was trained to recognize several kinds of voice commands. The result is a speech recognition system that can be used to control several electronic devices connected to the system. The system has a 100% success rate in room conditions with a background noise intensity of 24 dB (silent), 67.67% at 42 dB, and only 51.67% at 52 dB (noisy). The success rate is therefore strongly influenced by the intensity of background noise in the room, and the system is best suited to rooms with low-intensity background noise.
