Robust Cochlear-Model-Based Speech Recognition

Computers ◽  
2019 ◽  
Vol 8 (1) ◽  
pp. 5 ◽  
Author(s):  
Mladen Russo ◽  
Maja Stella ◽  
Marjan Sikora ◽  
Vesna Pekić

Accurate speech recognition can provide a natural interface for human–computer interaction. Recognition rates of modern speech recognition systems depend strongly on background noise levels, and the choice of acoustic feature extraction method can have a significant impact on system performance. This paper presents a robust speech recognition system based on a front-end motivated by human cochlear processing of audio signals. In the proposed front-end, cochlear behavior is emulated first by the filtering operations of a gammatone filterbank and subsequently by an inner hair cell (IHC) processing stage. Experimental results using a continuous-density hidden Markov model (HMM) recognizer show that the proposed Gammatone Hair Cell (GHC) coefficients yield lower recognition rates for clean speech, but demonstrate a significant performance improvement in noisy conditions compared to the standard Mel-Frequency Cepstral Coefficients (MFCC) baseline.
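As an illustration of this kind of cochlear front-end, the following is a minimal numpy sketch, not the authors' implementation: a gammatone filterbank followed by a crude IHC stage of half-wave rectification and power-law compression. The Glasberg–Moore ERB bandwidth formula and the 0.3 compression exponent are common textbook choices, assumed here rather than taken from the paper.

```python
import numpy as np

def gammatone_ir(fc, fs, order=4, duration=0.025):
    # Impulse response of a gammatone filter centred at fc (Hz).
    # Bandwidth follows the Glasberg-Moore ERB approximation.
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    b = 1.019 * erb
    return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

def gammatone_filterbank(x, fs, centre_freqs):
    # One output row per cochlear channel.
    return np.stack([np.convolve(x, gammatone_ir(fc, fs), mode="same")
                     for fc in centre_freqs])

def ihc_stage(channels):
    # Crude inner-hair-cell model: half-wave rectification plus compression.
    return np.maximum(channels, 0.0) ** 0.3
```

Feeding a 500 Hz tone through three channels centred at 250, 500, and 1000 Hz concentrates the output energy in the 500 Hz channel, which is the frequency-selective behaviour the front-end relies on.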

2020 ◽  
Vol 9 (1) ◽  
pp. 1022-1027

Driving a vehicle has become a tedious job nowadays due to heavy traffic, so focus on driving is of utmost importance. This creates scope for automation in automobiles, minimizing human intervention in controlling dashboard functions such as headlamps, indicators, power windows, and the wiper system. This paper is a small effort toward making driving distraction-free using a voice-controlled dashboard. The proposed system works on speech commands from the user (driver or passenger). Since the speech recognition system acts as the human–machine interface (HMI), the system uses both speaker recognition and speech recognition to recognize the command and to verify that it comes from an authenticated user (driver or passenger). The system performs feature extraction, extracting speech features such as Mel-frequency cepstral coefficients (MFCC), power spectral density (PSD), pitch, and the spectrogram. For feature matching, the system uses the Vector Quantization Linde-Buzo-Gray (VQ-LBG) algorithm, which uses the Euclidean distance between a test feature and a codebook feature. Based on the recognized speech command, the controller (Raspberry Pi 3B) activates the device driver for a motor or solenoid valve, depending on the function. The system is mainly aimed at low-noise environments, since most speech recognition systems suffer when noise is introduced. Room acoustics also matter greatly, as the recognition rate differs with the acoustics. Over several testing and simulation trials, the system achieved a speech recognition rate of 76.13%. This system encourages automation of the vehicle dashboard, making driving distraction-free.
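The VQ-LBG matching step can be sketched as follows, assuming generic feature vectors. This is the textbook splitting-and-refinement algorithm with Euclidean nearest-neighbour assignment, not the authors' exact implementation; function names and parameters are illustrative.

```python
import numpy as np

def lbg_codebook(features, n_codes=8, eps=0.01, iters=20):
    # Linde-Buzo-Gray: start from the global centroid, repeatedly split each
    # codeword into a (1+eps)/(1-eps) pair, then refine k-means style with
    # Euclidean nearest-neighbour re-assignment.
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < n_codes:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            d = np.linalg.norm(features[:, None, :] - codebook[None], axis=2)
            labels = d.argmin(axis=1)
            for k in range(len(codebook)):
                if np.any(labels == k):  # skip empty cells
                    codebook[k] = features[labels == k].mean(axis=0)
    return codebook

def match_score(test_features, codebook):
    # Average Euclidean distance from each test vector to its nearest
    # codeword; the codebook with the lowest score is declared the match.
    d = np.linalg.norm(test_features[:, None, :] - codebook[None], axis=2)
    return float(d.min(axis=1).mean())
```

In a speaker/command-recognition setting, one codebook is trained per enrolled speaker or command, and a test utterance is assigned to whichever codebook minimises `match_score`.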


Author(s):  
Vanajakshi Puttaswamy Gowda ◽  
Mathivanan Murugavelu ◽  
Senthil Kumaran Thangamuthu

Continuous speech segmentation and recognition play an important role in natural language processing. Continuous context-based Kannada speech segmentation depends on the context, grammar, and semantic rules present in the Kannada language. Extracting significant features of the Kannada speech signal for a recognition system is quite exciting for researchers. The method proposed in this paper is divided into two parts. The first part is context-based segmentation of the continuous Kannada speech signal, carried out by computing the average short-term energy and the spectral centroid coefficients of the speech signal within a specified window. The segmented outputs are fully meaningful segmentations for different scenarios, with low segmentation error. The second part is speech recognition by extracting a small number of Mel-frequency cepstral coefficients with a small number of codebooks using vector quantization; here recognition is based entirely on a threshold value. Setting this threshold is a challenging task; however, a simple method is used to achieve a better recognition rate. The experimental results show more efficient and effective segmentation, with a higher recognition rate than existing methods, for continuous context-based Kannada speech with different male and female accents, while using minimal feature dimensions for the training data.
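The two segmentation cues, average short-term energy and spectral centroid, can be sketched per frame as below. The rectangular window, frame sizes, and thresholds at a fraction of the global mean are illustrative assumptions, not the paper's actual threshold scheme.

```python
import numpy as np

def frame_features(x, fs, win=400, hop=160):
    # Per-frame average short-term energy and spectral centroid.
    energies, centroids = [], []
    for start in range(0, len(x) - win + 1, hop):
        frame = x[start:start + win]
        energies.append(np.mean(frame ** 2))
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(win, d=1.0 / fs)
        centroids.append((freqs * mag).sum() / (mag.sum() + 1e-12))
    return np.array(energies), np.array(centroids)

def speech_mask(x, fs, ratio=0.5):
    # Keep frames where BOTH cues exceed a fraction of their global mean --
    # a simple stand-in for the dual-threshold decision.
    e, c = frame_features(x, fs)
    return (e > ratio * e.mean()) & (c > ratio * c.mean())
```

Segment boundaries then fall wherever the mask switches between speech and non-speech runs of frames.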


2014 ◽  
Vol 571-572 ◽  
pp. 205-208
Author(s):  
Guan Yu Li ◽  
Hong Zhi Yu ◽  
Yong Hong Li ◽  
Ning Ma

Speech feature extraction is discussed. The Mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) methods are analyzed. These two types of features are extracted in a Lhasa large-vocabulary continuous speech recognition system, and the recognition results are compared.
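For reference, the mel scale and the triangular mel filterbank underlying MFCC can be sketched as follows. This is the standard textbook construction, not tied to this paper's configuration; the filter count and FFT size are arbitrary.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping; 1000 Hz maps to (approximately) 1000 mel.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters whose centres are equally spaced on the mel scale,
    # applied to the first n_fft//2 + 1 FFT bins.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb
```

MFCCs are then obtained by taking the log of the filterbank energies and applying a DCT; PLP differs mainly in using the Bark scale, equal-loudness pre-emphasis, and an all-pole model in place of the DCT step.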


2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Sanaz Seyedin ◽  
Seyed Mohammad Ahadi ◽  
Saeed Gazor

This paper presents a novel noise-robust feature extraction method for speech recognition using the robust perceptual minimum variance distortionless response (MVDR) spectrum of a temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of the residue of nonstationary additive noise that remains after filtering the autocorrelation. To achieve a more robust front-end, we also modify the robust distortionless constraint of the MVDR spectral estimation method via a revised weighting of the subband power spectrum values based on the subband signal-to-noise ratios (SNRs), adjusting it to the proposed approach. This new function allows the components of the input signal at the frequencies least affected by noise to pass with larger weights, while attenuating the noisy and undesired components more effectively. This modification reduces the noise residuals of the spectrum estimated from the filtered autocorrelation sequence, leading to a more robust algorithm. Our proposed method, when evaluated on the Aurora 2 recognition task, outperformed the Mel-frequency cepstral coefficients (MFCC) baseline, relative autocorrelation sequence MFCC (RAS-MFCC), and MVDR-based features in several different noisy conditions.
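The SNR-dependent weighting idea can be illustrated with a simple Wiener-style gain. The paper's actual revised weighting function is not reproduced here, so this sketch is only a stand-in showing the qualitative behaviour: subbands with high estimated SNR pass with weight near 1, and heavily corrupted subbands are attenuated toward 0.

```python
import numpy as np

def snr_weights(power_spec, noise_est, floor=1e-10):
    # Per-subband SNR estimated from noisy power and a noise-power estimate,
    # mapped to a gain in [0, 1): w = SNR / (1 + SNR).
    snr = np.maximum(power_spec - noise_est, 0.0) / (noise_est + floor)
    return snr / (1.0 + snr)
```

Multiplying the subband power spectrum by these weights before spectral estimation suppresses the bands dominated by noise, which is the same qualitative effect the modified distortionless constraint aims for.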


2009 ◽  
Vol 50 ◽  
pp. 212-216
Author(s):  
Antanas Leonas Lipeika

Investigation of Formant Features in Speech Recognition

The use of formant features in speech recognition is investigated. Formant features can be used for recognition, but recognition accuracy depends markedly on the formant feature extraction method. The best recognition results were obtained when singular prediction polynomials were used for formant feature extraction. These polynomials can be calculated from the parameters of linear prediction models of even or odd order, and can be symmetric or antisymmetric. It is also important to investigate how recognition results depend not only on the choice of singular prediction polynomials but also on other parameters of the recognition system: the analysis frame length, the number of formants used for recognition, and the frequency scale used to represent the formant features. The experiments established that the best recognition results were obtained using two or three formants calculated from symmetric singular prediction polynomials. The second formant proved to be the most informative; the contributions of the first, third, and fourth formants are roughly equal, but higher formants are less resistant to white noise. Examining the analysis frame length showed that the best results were obtained with a frame of 500 samples; representing the formant trajectories on the Mel frequency scale also improved recognition.
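For contrast with the singular-prediction-polynomial approach, the standard baseline way to obtain formant candidates is root-finding on an ordinary linear prediction polynomial. The numpy sketch below uses the autocorrelation method and is the conventional technique, not the paper's; the order and function names are illustrative.

```python
import numpy as np

def lpc(x, order):
    # Autocorrelation-method linear prediction: solve the normal equations
    # R a = r, return the prediction-error polynomial [1, -a1, ..., -ap].
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))

def formant_freqs(x, fs, order):
    # Formant candidates: angles of the LPC roots in the upper half-plane,
    # converted from radians per sample to Hz.
    roots = np.roots(lpc(x, order))
    roots = roots[np.imag(roots) > 1e-9]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))
```

On a synthetic signal with a single resonance near 1000 Hz, a second-order fit recovers one formant candidate close to that frequency; real speech needs a higher order and pruning of wide-bandwidth roots.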

