An Improved Method In Speech Signal Input Representation Based On DTW Technique For NN Speech Recognition System

2012
Author(s): Rubita Sudirman, Sh. Hussain Salleh, Shaharuddin Salleh

This paper presents a pre-processing of linear predictive coefficient (LPC) features that prepares reliable reference templates for the set of words to be recognized by an artificial neural network (NN). It also proposes using a pitch feature derived from the recorded speech data as an additional input feature. The Dynamic Time Warping (DTW) algorithm is the backbone of the newly developed algorithm, called the DTW frame-fixing algorithm (DTW-FF), which performs template matching for the input pre-processing. The new algorithm aligns the input frames in the test set to the template frames in the reference set. This frame normalization is required because the NN is designed to compare data of the same length, whereas utterances of the same word usually vary in length. With frame fixing, the input frames are adjusted to the same number of frames as the reference. A second task of the study is to extract pitch features using the Harmonic Filter algorithm. Once the pitch has been extracted and the LPC features have been fixed to the desired number of frames, speech recognition is performed with the neural network. Recognition accuracy as high as 98% was achieved using the combination of DTW-FF features and pitch features. Finally, the convergence of the conjugate gradient descent (CGD), Quasi-Newton, and steepest gradient descent (SGD) search directions is compared; the results show that CGD outperformed Quasi-Newton and SGD.
Key words: Dynamic time warping, time normalization, neural network, speech recognition, conjugate gradient descent
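The frame-normalization idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' DTW-FF algorithm, whose details the abstract does not give; it assumes a plain DTW alignment with Euclidean frame distance and simply averages the test frames that the warping path maps onto each reference frame. The function names `frame_distance`, `dtw_path`, and `fix_frames` are hypothetical.

```python
from math import sqrt


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (e.g. LPC frames)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(test, ref):
    """Return (total cost, warping path) aligning test frames to ref frames."""
    n, m = len(test), len(ref)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning test[:i] with ref[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(test[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which test frame pairs with which ref frame.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def fix_frames(test, ref):
    """Warp `test` so that it has exactly len(ref) frames (assumed averaging rule)."""
    _, path = dtw_path(test, ref)
    groups = [[] for _ in ref]
    for ti, rj in path:
        groups[rj].append(test[ti])
    # Every reference index appears on the path at least once, so no group is empty.
    return [[sum(col) / len(g) for col in zip(*g)] for g in groups]


# Toy one-dimensional "frames": a 5-frame utterance fixed to a 3-frame reference.
test_frames = [[0.0], [0.0], [1.0], [2.0], [2.0]]
ref_frames = [[0.0], [1.0], [2.0]]
fixed = fix_frames(test_frames, ref_frames)
```

Whatever the length of the test utterance, the output has as many frames as the reference, which is the property a fixed-size neural network input layer needs.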

Nowadays speech recognition is used widely in many applications. In computer science and electrical engineering, speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or simply "speech to text" (STT). A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be presented as the simplest dynamic Bayesian network. Dynamic time warping (DTW) is a well-known technique for finding an optimal alignment between two given (time-dependent) sequences under certain restrictions; intuitively, the sequences are warped in a nonlinear fashion to match each other. An artificial neural network (ANN) is a nonlinear, data-driven, self-adaptive approach. It can identify and learn correlated patterns between input data sets and the corresponding target values. After training, an ANN can be used to predict the outcome of new, independent input data.
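For reference, the optimal alignment mentioned above is usually computed with the standard dynamic-programming recurrence below. The notation is an assumption introduced here, not taken from the abstract: $X = (x_1, \dots, x_N)$ and $Y = (y_1, \dots, y_M)$ are the two sequences, $d$ is a local distance between elements, and $D(i, j)$ is the accumulated cost of the best alignment of the first $i$ elements of $X$ with the first $j$ elements of $Y$.

```latex
\begin{aligned}
D(0, 0) &= 0, \qquad D(i, 0) = D(0, j) = \infty \quad (i, j \ge 1), \\
D(i, j) &= d(x_i, y_j) + \min\bigl\{ D(i-1, j),\; D(i, j-1),\; D(i-1, j-1) \bigr\}, \\
\mathrm{DTW}(X, Y) &= D(N, M).
\end{aligned}
```

The three terms inside the minimum encode the usual restrictions: the path starts and ends at the sequence boundaries, never moves backwards in time, and advances by at most one element per step in each sequence.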


2014, Vol 2014, pp. 1-8
Author(s): Ing-Jr Ding, Yen-Ming Hsu

In the past, the kernel of automatic speech recognition (ASR) was dynamic time warping (DTW), a feature-based template-matching technique that belongs to the category of dynamic programming (DP). Although DTW is an early ASR technique, it has remained popular in many applications and now plays an important role in Kinect-based gesture recognition. This paper proposes an intelligent speech recognition system using an improved DTW approach for multimedia and home automation services. The improved DTW presented in this work, called HMM-like DTW, is essentially a hidden Markov model (HMM)-like method in which the concept of the typical HMM statistical model is brought into the design of DTW. By transforming feature-based DTW recognition into model-based DTW recognition, the HMM-like DTW method can behave like the HMM recognition technique, and its HMM-like recognition model can therefore further perform model adaptation (also known as speaker adaptation). A series of experimental results in home automation-based multimedia access service environments demonstrated the superiority and effectiveness of the developed smart speech recognition system using HMM-like DTW.


2014, Vol 29 (6), pp. 1072-1082
Author(s): Xiang-Lilan Zhang, Zhi-Gang Luo, Ming Li

Author(s): Shi-bo Pan, Di-lin Pan, Nan Pan, Xiao Ye, Miaohan Zhang

Traditional gun archiving methods mostly rely on the physical examination or photography of bullets; they are inefficient, difficult to trace, and cannot meet the needs of large-scale archiving. To address these problems, a rapid bullet archiving technology based on a graph convolutional neural network has been studied and developed. First, a spot laser is used to capture points around the circumference of the bullet rifling traces. The acquired data are filtered and denoised to produce the corresponding line graph, and a model combining the dynamic time warping (DTW) algorithm with a convolutional neural network is then applied to the processed data. This achieves not only similarity matching but also rapid matching of the bullet rifling. A comparison of experimental results shows that the technology offers rapid archiving and high accuracy. Furthermore, it can process large numbers of bullets at the same time, which makes it better suited to practical promotion and application.


Agriculture, 2022, Vol 12 (1), pp. 77
Author(s): Tsu Chiang Lei, Shiuan Wan, You Cheng Wu, Hsin-Ping Wang, Chia-Wen Hsieh

This study employed a data fusion method to extract a high-similarity time series feature index from a dataset by integrating MS (Multi-Spectrum) and SAR (Synthetic Aperture Radar) images. Farmland in Taiwan is divided into small parcels because farmers make different planting choices, so the conventional image classification process cannot produce good outcomes. Crop phenological information is therefore a core factor in multi-period image data. To address this problem, the study used three different SPOT6 satellite images and nine Sentinel-1A synthetic aperture radar images from 2019 to calculate features such as texture and indicator information. Because a Dynamic Time Warping (DTW) index (i) can integrate different image data sources, (ii) can integrate data of different lengths, and (iii) can generate information with time characteristics, this type of index can resolve certain problems in long-term crop classification and monitoring. More specifically, this study used DTW time series analysis to produce "multi-scale time series feature similarity indicators". Three classifiers (Support Vector Machine, Neural Network, and Decision Tree) were used to classify paddy patches in two groups of experiments: (a) the first group did not apply a DTW index, and (b) the second group took the conflicting predictions from (a) and applied a DTW index to them. The second group performed better than the first in overall accuracy (OA) and kappa. Among the classifiers, the Neural Network showed the largest improvement, with OA and kappa rising from 89.51 and 0.66 to 92.63 and 0.74, respectively. The other two classifiers also improved. The best classification result was obtained with the Decision Tree (OA 94.71, kappa 0.81).
These outcomes indicate that combining the spectral and radar images successfully resolved the interference effects caused by various image problems in paddy rice classification. Both the overall accuracy and kappa improved, and the maximum kappa was enhanced by about 8%. The classification performance was improved by considering the DTW index.
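The two metrics reported above, overall accuracy (OA) and kappa, are conventionally derived from a confusion matrix. The sketch below assumes the standard definition of Cohen's kappa, which the abstract does not spell out. The function name `accuracy_and_kappa` is hypothetical, and the two-class matrix is invented for illustration; it is not data from the study.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up example: rows are true classes (paddy, non-paddy), columns are predictions.
example = [[50, 10],
           [5, 35]]
oa, kappa = accuracy_and_kappa(example)
```

Kappa discounts the agreement that would occur by chance, which is why it can sit well below OA when one class dominates the scene.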

