Enhancing robustness of speech recognizers by bimodal features

Facta universitatis - series Electronics and Energetics ◽

10.2298/fuee0602287g ◽

2006 ◽

Vol 19 (2) ◽

pp. 287-298

Author(s):

Inge Gavat ◽

Gabriel Costache ◽

Claudia Iancu

Keyword(s):

Speech Signal ◽

Additive Noise ◽

Effective Algorithm ◽

Feature Vectors ◽

Speech Recognizers ◽

Combined Features ◽

Speech Features ◽

Multiclass Svm ◽

Speech Recognizer ◽

Better Than

This paper presents a robust speech recognizer based on features obtained from both the speech signal and the image of the speaker. The features are combined by simple concatenation, yielding composite feature vectors that are used to train the model for each class. For recognition, classification relies on the multiclass SVM, a very effective algorithm. Under additive noise, the bimodal system based on combined features performs better than the unimodal system based only on speech features; the additional information obtained from the image plays an important role in improving robustness.
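The concatenation step can be illustrated with a minimal sketch. The abstract does not say which audio or visual features are used or how they are aligned in time, so the function names, the per-frame pairing, and the example dimensions below are assumptions made only for illustration.

```python
from typing import List, Sequence


def concatenate_features(audio: Sequence[float], visual: Sequence[float]) -> List[float]:
    """Join one audio feature vector and one visual feature vector end to end."""
    return list(audio) + list(visual)


def build_bimodal_vectors(
    audio_frames: Sequence[Sequence[float]],
    visual_frames: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Pair frames by index (an assumed alignment) and concatenate each pair."""
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have the same number of frames")
    return [concatenate_features(a, v) for a, v in zip(audio_frames, visual_frames)]
```

The resulting vectors would then be passed to whatever multiclass SVM implementation is available; the classifier itself is left out here because the abstract gives no details about its kernel or training setup.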

Speech Emotional Features Extraction Based on Electroglottograph

Neural Computation ◽

10.1162/neco_a_00523 ◽

2013 ◽

Vol 25 (12) ◽

pp. 3294-3317 ◽

Cited By ~ 7

Author(s):

Lijiang Chen ◽

Xia Mao ◽

Pengfei Wei ◽

Angelo Compare

Keyword(s):

Emotion Recognition ◽

Speech Signal ◽

Vocal Tract ◽

Vocal Folds ◽

Distribution Coefficients ◽

Speech Emotion Recognition ◽

Support Vector ◽

Power Law Distribution ◽

Transform Coefficients ◽

Better Than

This study proposes two classes of speech emotional features extracted from electroglottography (EGG) and speech signals. The power-law distribution coefficients (PLDC) of voiced segment duration, pitch rise duration, and pitch down duration are obtained to reflect vocal fold excitation. The real discrete cosine transform coefficients of the normalized spectra of the EGG and speech signals are calculated to reflect vocal tract modulation. Two experiments are carried out. The first evaluates the proposed and traditional features using sequential forward floating search and sequential backward floating search. The second is a comparative emotion recognition experiment based on a support vector machine. The results show that the proposed features are better than those commonly used for speaker-independent and content-independent speech emotion recognition.

Stochastic resonance in an array of integrate-and-fire neurons with threshold

Modern Physics Letters B ◽

10.1142/s0217984918501695 ◽

2018 ◽

Vol 32 (16) ◽

pp. 1850169 ◽

Cited By ~ 1

Author(s):

Bingchang Zhou ◽

Qianqian Qi

Keyword(s):

Stochastic Resonance ◽

Additive Noise ◽

Signal To Noise Ratio ◽

Array Size ◽

Periodic Signal ◽

Signal Parameter ◽

Input Noise ◽

Integrate And Fire ◽

Noisy Input ◽

Better Than

We investigate the phenomenon of stochastic resonance (SR) in parallel integrate-and-fire neuronal arrays with threshold driven by additive noise or signal-dependent noise (SDN) and a noisy input signal. SR occurs in this system. Whether the system is subject to additive noise or SDN, the input noise [Formula: see text] weakens the performance of SR, but the array size N and signal parameter [Formula: see text] promote the performance of SR. Signal parameter [Formula: see text] promotes the performance of SR for the additive noise, but the peak values of the output signal-to-noise ratio [Formula: see text] first decrease, then increase as [Formula: see text] increases for the SDN. Moreover, when [Formula: see text] tends to infinity, for the SDN, the curve of [Formula: see text] first increases and then decreases; for the additive noise, however, the curve of [Formula: see text] increases until it reaches a plateau. By comparing system performance under additive noise with that under SDN, we also find that the information transmission of a periodic signal with SDN is significantly better than with additive noise for limited array size N.

Deep Convolutional Neural Network based Speech Signal Enhancement using Extensive Speech Features

International Journal of Computational Methods ◽

10.1142/s0219876221420056 ◽

2021 ◽

Author(s):

Anil Garg ◽

O. P. Sahu

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Speech Signal ◽

Signal Enhancement ◽

Deep Convolutional Neural Network ◽

Speech Features

A modification of the Redlich-Kwong-Soave equation of state and the determination of its parameters on the basis of saturated vapour pressures and second virial coefficients of pure substances

Collection of Czechoslovak Chemical Communications ◽

10.1135/cccc19891446 ◽

1989 ◽

Vol 54 (6) ◽

pp. 1446-1463 ◽

Cited By ~ 2

Author(s):

Petr Voňka ◽

Pavel Dittrich ◽

Josef P. Novák

Keyword(s):

Equation Of State ◽

Virial Coefficients ◽

Effective Algorithm ◽

Liquid Equilibrium ◽

Second Virial Coefficients ◽

Saturated Vapour ◽

Pure Substances ◽

Better Than ◽

Wagner Equation

The temperature dependence of parameter a = a(Tr) of the Redlich-Kwong-Soave equation of state was modified. To calculate the corresponding individual parameters, an effective algorithm applying the Newton method was proposed. The parameters were determined for 60 substances, and the new modification correlates saturated vapour pressures from the values of pr = 0.001 to pr = 1.0 with the accuracy which is usually better than 0.2% and is comparable with that attained in terms of the Wagner equation. The modification proposed is utilized above all when applying the equation of state to the calculation of vapour-liquid equilibrium.

Study of Dementia Severity Prognosis in MR Images Using Grey Wolf Optimization Based Dual Deep Learning Technique

10.21203/rs.3.rs-412358/v1 ◽

2021 ◽

Author(s):

Ahana priynaka ◽

Kavitha Ganesan

Keyword(s):

Deep Learning ◽

Symmetry Analysis ◽

Support Vector ◽

Grey Wolf ◽

Mr Images ◽

Grey Wolf Optimization ◽

Feature Vectors ◽

Dementia Severity ◽

Improved Performance ◽

Multiclass Svm

Prognosis of a dementia disorder is a tedious task in the preclinical stage. Changes in ventricle pathology in dementia appear to overlap for neurodegeneration in the brain. Identifying these overlaps among the severity groups helps to understand the pathogenesis of this disorder. In this work, the impact of changes in the ventricle region on the severity stages of dementia is observed using dual deep learning techniques (DDLT). The Alzheimer's Disease Neuroimaging Initiative (ADNI) database, which contains 1169 MR images, is used in this study. Segmentation of the ventricle region is carried out using a multilevel-threshold-based Grey Wolf Optimization (GWO) technique. The feature vectors obtained from the combined AlexNet and ResNet are analysed. The fused feature vectors are given to a support vector machine (SVM) to observe the severity changes. Subsequently, symmetry analysis of the ventricle is carried out to perceive the distinctive changes in progression. The results show that the ventricle region is accurately delineated from other regions with the optimized thresholds. The segmented ventricle shows better correlation for all considered classes (> 0.9). DDLT with multiclass SVM provides an improved accuracy of about 79.87% compared with individual transfer learning models such as AlexNet (74%) and ResNet (76.53%). Further, symmetry analysis shows that the left-side ventricle with DDLT features performs better than the right side for onset stages. The clinical correlation of the left ventricle also appears to be statistically significant (p<0.0001), which prominently differentiates dementia severity variations. This framework is clinically useful for identifying distinct ventricle region variation in dementia.

The Dantzig selector: recovery of signal via ℓ1 − αℓ2 minimization

Inverse Problems ◽

10.1088/1361-6420/ac39f8 ◽

2021 ◽

Vol 38 (1) ◽

pp. 015006

Author(s):

Huanmin Ge ◽

Peng Li

Keyword(s):

Numerical Experiments ◽

Restricted Isometry Property ◽

Signal Recovery ◽

Effective Algorithm ◽

Dantzig Selector ◽

Error Vector ◽

Small Constant ◽

Proposed Model ◽

Better Than

In this paper, we propose a Dantzig selector based on ℓ1 − αℓ2 (0 < α ⩽ 1) minimization for signal recovery. In the Dantzig selector, the constraint ‖A⊤(b − Ax)‖∞ ⩽ η for some small constant η > 0 means that the columns of A are only very weakly correlated with the error vector e = Ax − b. First, recovery guarantees based on the restricted isometry property are established for signals. Next, we propose an effective algorithm to solve the proposed Dantzig selector. Last, we illustrate the proposed model and algorithm by extensive numerical experiments on the recovery of signals under Gaussian, impulsive and uniform noise. The performance of the proposed Dantzig selector is better than that of existing methods.
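The two ingredients named in the abstract, the ℓ1 − αℓ2 objective and the constraint ‖A⊤(b − Ax)‖∞ ⩽ η, can be evaluated directly for a candidate x. The sketch below only checks these quantities; it does not implement the authors' solver, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def l1_minus_alpha_l2(x: Sequence[float], alpha: float) -> float:
    """Objective ||x||_1 - alpha * ||x||_2 with 0 < alpha <= 1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return l1 - alpha * l2


def correlation_with_residual(A: Matrix, x: Sequence[float], b: Sequence[float]) -> List[float]:
    """Compute A^T (b - A x) for a dense row-major matrix A."""
    residual = [
        b_i - sum(a_ij * x_j for a_ij, x_j in zip(row, x))
        for row, b_i in zip(A, b)
    ]
    n_cols = len(A[0])
    return [sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(n_cols)]


def satisfies_dantzig_constraint(A: Matrix, x: Sequence[float], b: Sequence[float], eta: float) -> bool:
    """True when the infinity norm of A^T (b - A x) is at most eta."""
    return max(abs(c) for c in correlation_with_residual(A, x, b)) <= eta
```

For x = (3, 4) and α = 1 the objective is 7 − 5 = 2, which shows how subtracting the ℓ2 term lowers the penalty relative to the plain ℓ1 norm.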

Mobile-based Human Emotion Recognition based on Speech and Heart rate

Journal of Engineering ◽

10.31026/j.eng.2019.11.05 ◽

2019 ◽

Vol 25 (11) ◽

pp. 55-66

Author(s):

Huda Majed Swadi ◽

Hamid Mohammed Ali

Keyword(s):

Neural Network ◽

Heart Rate ◽

Emotion Recognition ◽

Speech Signal ◽

Rate Measurement ◽

Human Speech ◽

Human Emotion ◽

Server Side ◽

The Neural Network ◽

Speech Features

Mobile-based human emotion recognition is a very challenging subject. Most of the approaches suggested and built in this field use various contexts that can be derived from external sensors and the smartphone, but these approaches face a range of obstacles and challenges. The proposed system integrates the human speech signal and heart rate in one system to improve the accuracy of human emotion recognition. It is designed to recognize four human emotions: angry, happy, sad and normal. In this system, the smartphone records the user's speech and sends it to a server. A smartwatch worn on the user's wrist measures the user's heart rate while the user is speaking and sends it, via Bluetooth, to the smartphone, which in turn sends it to the server. On the server side, speech features are extracted from the speech signal and classified by a neural network. To minimize misclassification by the neural network, the heart rate measurement is used to direct the extracted speech features either to the excited (angry and happy) neural network or to the calm (sad and normal) neural network. In spite of the challenges associated with the system, it achieved 96.49% for known speakers and 79.05% for unknown speakers.
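The routing step, in which the heart rate decides which of two classifiers receives the speech features, can be sketched as follows. The abstract gives no threshold or classifier details, so the 100 bpm cut-off, the function names, and the classifier interface are all hypothetical placeholders.

```python
from typing import Callable, Dict, Sequence

EXCITED = "excited"  # group covering angry and happy
CALM = "calm"        # group covering sad and normal

Classifier = Callable[[Sequence[float]], str]


def select_group(heart_rate_bpm: float, threshold_bpm: float = 100.0) -> str:
    """Pick the classifier group from the heart rate (threshold is an assumed value)."""
    return EXCITED if heart_rate_bpm >= threshold_bpm else CALM


def recognize_emotion(
    speech_features: Sequence[float],
    heart_rate_bpm: float,
    classifiers: Dict[str, Classifier],
    threshold_bpm: float = 100.0,
) -> str:
    """Send the speech features to the classifier chosen by the heart rate."""
    group = select_group(heart_rate_bpm, threshold_bpm)
    return classifiers[group](speech_features)
```

Splitting the four emotions into two groups means each network only has to separate two classes, which is the mechanism the abstract credits for reducing misclassification.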

Fuzzy Hidden Markov Models for Indonesian Speech Classification

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2012.p0381 ◽

2012 ◽

Vol 16 (3) ◽

pp. 381-387 ◽

Cited By ~ 9

Author(s):

Intan Nurma Yulita ◽

Houw Liong The ◽

Adiwijaya

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov Models ◽

Speech Signal ◽

Markov Models ◽

Hidden Markov ◽

New Combination ◽

Speech Classification ◽

Different Characteristics ◽

Better Than

Indonesia has many tribes and therefore many dialects. Speech classification is difficult when the database contains speech signals from people whose characteristics differ because of gender and dialect. These differences influence the frequency, intonation, amplitude, and period of the speech, so the system must be trained on a variety of reference templates of the speech signal. This study therefore develops a method for Indonesian speech classification. The solution is a new combination of fuzzy techniques and hidden Markov models. The results show that the new version of fuzzy hidden Markov models is better than the hidden Markov model.

Estimation of Quality and Intelligibility of a Speech Signal with varying forms of Additive Noise

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2019/057112019 ◽

2019 ◽

Vol 7 (11) ◽

pp. 430-433

Author(s):

Venkata Narayana V ◽

Keyword(s):

Speech Signal ◽

Additive Noise

BUILDING LEXICON FOR TELUGU SPEECH RECOGNITION

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v5i1.4381 ◽

2006 ◽

Vol 5 (1) ◽

pp. 7-14

Author(s):

K. V. N. Sunitha ◽

A. Sharada

Keyword(s):

Speech Recognition ◽

Initial Data ◽

Language Model ◽

Major Objective ◽

Arbitrary Length ◽

Basic Language ◽

Speech Recognizers ◽

Speech Recognizer ◽

Work Done

Speech recognizers usually consist of a language model, a lexicon and a collection of phone models. The lexicon is important for improving the efficacy of a speech recognizer for a given language. Traditionally, building a lexicon was a significant piece of work, taking several expert linguists perhaps several years to construct a lexicon with reasonable coverage. We describe a method that can cut this time significantly. The basic idea is to add the most common words to a lexicon, with the user of the system explicitly supplying each new word, and then to build letter-to-sound rules automatically from the initial data. The word entered may be of arbitrary length. Over multiple passes, the lexicon and the letter-to-sound rules improve. At each pass, the letter-to-sound rules are regenerated with the new data, making them more correct. This paper presents the work done in building a lexicon for the Telugu language. The major objective is to make speech systems for Telugu more proficient. The technique has proved successful for a number of languages, cutting the amount of time and effort to perhaps checking thousands of words rather than tens of thousands. It is also a structured method that requires only knowledge of the basic language to carry out.
