Robust Cochlear-Model-Based Speech Recognition

Computers ◽  
2019 ◽  
Vol 8 (1) ◽  
pp. 5 ◽  
Author(s):  
Mladen Russo ◽  
Maja Stella ◽  
Marjan Sikora ◽  
Vesna Pekić

Accurate speech recognition can provide a natural interface for human–computer interaction. Recognition rates of modern speech recognition systems depend strongly on background noise levels, and the choice of acoustic feature extraction method can have a significant impact on system performance. This paper presents a robust speech recognition system based on a front-end motivated by human cochlear processing of audio signals. In the proposed front-end, cochlear behavior is emulated first by the filtering operations of a gammatone filterbank and subsequently by an inner hair cell (IHC) processing stage. Experimental results using a continuous-density hidden Markov model (HMM) recognizer show that the proposed Gammatone Hair Cell (GHC) coefficients yield lower recognition rates for clean speech, but demonstrate a significant performance improvement in noisy conditions compared to the standard Mel-Frequency Cepstral Coefficients (MFCC) baseline.
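As an illustration of this kind of cochlear front-end, the following is a minimal numpy sketch, not the authors' implementation: a gammatone filterbank followed by a crude IHC stage of half-wave rectification and power-law compression. The Glasberg–Moore ERB bandwidth formula and the 0.3 compression exponent are common textbook choices, assumed here rather than taken from the paper.

```python
import numpy as np

def gammatone_ir(fc, fs, order=4, duration=0.025):
    # Impulse response of a gammatone filter centred at fc (Hz).
    # Bandwidth follows the Glasberg-Moore ERB approximation.
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    b = 1.019 * erb
    return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

def gammatone_filterbank(x, fs, centre_freqs):
    # One output row per cochlear channel.
    return np.stack([np.convolve(x, gammatone_ir(fc, fs), mode="same")
                     for fc in centre_freqs])

def ihc_stage(channels):
    # Crude inner-hair-cell model: half-wave rectification plus compression.
    return np.maximum(channels, 0.0) ** 0.3
```

Feeding a 500 Hz tone through three channels centred at 250, 500, and 1000 Hz concentrates the output energy in the 500 Hz channel, which is the frequency-selective behaviour the front-end relies on.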

2020 ◽  
Vol 9 (1) ◽  
pp. 1022-1027

Driving a vehicle has become a tedious job nowadays due to heavy traffic, so focus on driving is of utmost importance. This creates scope for automation in automobiles, minimizing human intervention in controlling dashboard functions such as headlamps, indicators, power windows, and the wiper system. This paper is a small effort toward making driving distraction-free using a voice-controlled dashboard. The proposed system works on speech commands from the user (driver or passenger). Since the speech recognition system acts as the human–machine interface (HMI), the system uses both speaker recognition and speech recognition to recognize the command and to verify that it comes from an authenticated user (driver or passenger). The system performs feature extraction, extracting speech features such as Mel-frequency cepstral coefficients (MFCC), power spectral density (PSD), pitch, and the spectrogram. For feature matching, the system uses the Vector Quantization Linde-Buzo-Gray (VQ-LBG) algorithm, which uses the Euclidean distance between a test feature and a codebook feature. Based on the recognized speech command, the controller (Raspberry Pi 3B) activates the device driver for a motor or solenoid valve, depending on the function. The system is mainly aimed at low-noise environments, since most speech recognition systems suffer when noise is introduced. Room acoustics also matter greatly, as the recognition rate differs with the acoustics. Over several testing and simulation trials, the system achieved a speech recognition rate of 76.13%. This system encourages automation of the vehicle dashboard, making driving distraction-free.
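The VQ-LBG matching step can be sketched as follows, assuming generic feature vectors. This is the textbook splitting-and-refinement algorithm with Euclidean nearest-neighbour assignment, not the authors' exact implementation; function names and parameters are illustrative.

```python
import numpy as np

def lbg_codebook(features, n_codes=8, eps=0.01, iters=20):
    # Linde-Buzo-Gray: start from the global centroid, repeatedly split each
    # codeword into a (1+eps)/(1-eps) pair, then refine k-means style with
    # Euclidean nearest-neighbour re-assignment.
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < n_codes:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            d = np.linalg.norm(features[:, None, :] - codebook[None], axis=2)
            labels = d.argmin(axis=1)
            for k in range(len(codebook)):
                if np.any(labels == k):  # skip empty cells
                    codebook[k] = features[labels == k].mean(axis=0)
    return codebook

def match_score(test_features, codebook):
    # Average Euclidean distance from each test vector to its nearest
    # codeword; the codebook with the lowest score is declared the match.
    d = np.linalg.norm(test_features[:, None, :] - codebook[None], axis=2)
    return float(d.min(axis=1).mean())
```

In a speaker/command-recognition setting, one codebook is trained per enrolled speaker or command, and a test utterance is assigned to whichever codebook minimises `match_score`.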


Author(s):  
Vanajakshi Puttaswamy Gowda ◽  
Mathivanan Murugavelu ◽  
Senthil Kumaran Thangamuthu

Continuous speech segmentation and recognition play an important role in natural language processing. Continuous context-based Kannada speech segmentation depends on the context, grammar, and semantic rules present in the Kannada language. Extracting significant features of the Kannada speech signal for a recognition system is quite exciting for researchers. The method proposed in this paper is divided into two parts. The first part is context-based segmentation of the continuous Kannada speech signal, carried out by computing the average short-term energy and the spectral centroid coefficients of the speech signal within a specified window. The segmented outputs are fully meaningful segmentations for different scenarios, with low segmentation error. The second part is speech recognition by extracting a small number of Mel-frequency cepstral coefficients with a small number of codebooks using vector quantization; here recognition is based entirely on a threshold value. Setting this threshold is a challenging task; however, a simple method is used to achieve a better recognition rate. The experimental results show more efficient and effective segmentation, with a higher recognition rate than existing methods, for continuous context-based Kannada speech with different male and female accents, while using minimal feature dimensions for the training data.
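The two segmentation cues, average short-term energy and spectral centroid, can be sketched per frame as below. The rectangular window, frame sizes, and thresholds at a fraction of the global mean are illustrative assumptions, not the paper's actual threshold scheme.

```python
import numpy as np

def frame_features(x, fs, win=400, hop=160):
    # Per-frame average short-term energy and spectral centroid.
    energies, centroids = [], []
    for start in range(0, len(x) - win + 1, hop):
        frame = x[start:start + win]
        energies.append(np.mean(frame ** 2))
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(win, d=1.0 / fs)
        centroids.append((freqs * mag).sum() / (mag.sum() + 1e-12))
    return np.array(energies), np.array(centroids)

def speech_mask(x, fs, ratio=0.5):
    # Keep frames where BOTH cues exceed a fraction of their global mean --
    # a simple stand-in for the dual-threshold decision.
    e, c = frame_features(x, fs)
    return (e > ratio * e.mean()) & (c > ratio * c.mean())
```

Segment boundaries then fall wherever the mask switches between speech and non-speech runs of frames.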


2014 ◽  
Vol 571-572 ◽  
pp. 205-208
Author(s):  
Guan Yu Li ◽  
Hong Zhi Yu ◽  
Yong Hong Li ◽  
Ning Ma

Speech feature extraction is discussed. The Mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) methods are analyzed. These two types of features are extracted in a Lhasa large-vocabulary continuous speech recognition system, and the recognition results are compared.
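For reference, the mel scale and the triangular mel filterbank underlying MFCC can be sketched as follows. This is the standard textbook construction, not tied to this paper's configuration; the filter count and FFT size are arbitrary.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping; 1000 Hz maps to (approximately) 1000 mel.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters whose centres are equally spaced on the mel scale,
    # applied to the first n_fft//2 + 1 FFT bins.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb
```

MFCCs are then obtained by taking the log of the filterbank energies and applying a DCT; PLP differs mainly in using the Bark scale, equal-loudness pre-emphasis, and an all-pole model in place of the DCT step.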


2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Sanaz Seyedin ◽  
Seyed Mohammad Ahadi ◽  
Saeed Gazor

This paper presents a novel noise-robust feature extraction method for speech recognition using the robust perceptual minimum variance distortionless response (MVDR) spectrum of a temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of the residue of nonstationary additive noise that remains after filtering the autocorrelation. To achieve a more robust front-end, we also modify the robust distortionless constraint of the MVDR spectral estimation method via a revised weighting of the subband power spectrum values based on the subband signal-to-noise ratios (SNRs), adjusting it to the proposed approach. This new function allows the components of the input signal at the frequencies least affected by noise to pass with larger weights, while attenuating the noisy and undesired components more effectively. This modification reduces the noise residuals of the spectrum estimated from the filtered autocorrelation sequence, leading to a more robust algorithm. Our proposed method, when evaluated on the Aurora 2 recognition task, outperformed the Mel-frequency cepstral coefficients (MFCC) baseline, relative autocorrelation sequence MFCC (RAS-MFCC), and MVDR-based features in several different noisy conditions.
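The SNR-dependent weighting idea can be illustrated with a simple Wiener-style gain. The paper's actual revised weighting function is not reproduced here, so this sketch is only a stand-in showing the qualitative behaviour: subbands with high estimated SNR pass with weight near 1, and heavily corrupted subbands are attenuated toward 0.

```python
import numpy as np

def snr_weights(power_spec, noise_est, floor=1e-10):
    # Per-subband SNR estimated from noisy power and a noise-power estimate,
    # mapped to a gain in [0, 1): w = SNR / (1 + SNR).
    snr = np.maximum(power_spec - noise_est, 0.0) / (noise_est + floor)
    return snr / (1.0 + snr)
```

Multiplying the subband power spectrum by these weights before spectral estimation suppresses the bands dominated by noise, which is the same qualitative effect the modified distortionless constraint aims for.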


2009 ◽  
Vol 50 ◽  
pp. 212-216
Author(s):  
Antanas Leonas Lipeika

Investigation of Formant Features in Speech Recognition

The use of formant features in speech recognition is investigated. Formant features can be used for recognition, but recognition accuracy depends markedly on the formant feature extraction method. The best recognition results were obtained when singular prediction polynomials were used for formant feature extraction. These polynomials can be calculated from the parameters of linear prediction models of even or odd order, and can be symmetric or antisymmetric. It is also important to investigate how recognition results depend not only on the choice of singular prediction polynomials but also on other parameters of the recognition system: the analysis frame length, the number of formants used for recognition, and the frequency scale used to represent the formant features. The experiments established that the best recognition results were obtained using two or three formants calculated from symmetric singular prediction polynomials. The second formant proved to be the most informative; the contributions of the first, third, and fourth formants are roughly equal, but higher formants are less resistant to white noise. Examining the analysis frame length showed that the best results were obtained with a frame of 500 samples; representing the formant trajectories on the Mel frequency scale also improved recognition.
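For contrast with the singular-prediction-polynomial approach, the standard baseline way to obtain formant candidates is root-finding on an ordinary linear prediction polynomial. The numpy sketch below uses the autocorrelation method and is the conventional technique, not the paper's; the order and function names are illustrative.

```python
import numpy as np

def lpc(x, order):
    # Autocorrelation-method linear prediction: solve the normal equations
    # R a = r, return the prediction-error polynomial [1, -a1, ..., -ap].
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))

def formant_freqs(x, fs, order):
    # Formant candidates: angles of the LPC roots in the upper half-plane,
    # converted from radians per sample to Hz.
    roots = np.roots(lpc(x, order))
    roots = roots[np.imag(roots) > 1e-9]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))
```

On a synthetic signal with a single resonance near 1000 Hz, a second-order fit recovers one formant candidate close to that frequency; real speech needs a higher order and pruning of wide-bandwidth roots.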

