Enhancing robustness of speech recognizers by bimodal features

2006 ◽  
Vol 19 (2) ◽  
pp. 287-298
Author(s):  
Inge Gavat ◽  
Gabriel Costache ◽  
Claudia Iancu

This paper presents a robust speech recognizer based on features obtained from both the speech signal and the image of the speaker. The features are combined by simple concatenation, and the resulting composite feature vectors are used to train the models corresponding to each class. For recognition, classification relies on a multiclass SVM. Under additive noise conditions, the bimodal system based on the combined features performs better than the unimodal system based only on the speech features; the additional information obtained from the image plays an important role in improving robustness.
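The concatenation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature values are made up, the function names are hypothetical, and a nearest-centroid classifier stands in for the multiclass SVM so that the example needs only the standard library.

```python
from statistics import mean

def fuse(audio_features, visual_features):
    """Early fusion: concatenate audio and visual features into one vector."""
    return list(audio_features) + list(visual_features)

def train_centroids(samples):
    """samples: list of (fused_vector, label). Returns {label: centroid}."""
    by_label = {}
    for vec, label in samples:
        by_label.setdefault(label, []).append(vec)
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()}

def classify(centroids, vec):
    """Stand-in classifier (NOT an SVM): pick the nearest class centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))

# Toy data (made up): two audio values followed by two visual values.
train = [
    (fuse([0.9, 0.1], [0.8, 0.2]), "a"),
    (fuse([1.0, 0.0], [0.9, 0.1]), "a"),
    (fuse([0.1, 0.9], [0.2, 0.8]), "o"),
    (fuse([0.0, 1.0], [0.1, 0.9]), "o"),
]
centroids = train_centroids(train)

# Audio part is ambiguous (as if corrupted by noise); visual part is clean.
noisy = fuse([0.5, 0.5], [0.85, 0.15])
predicted = classify(centroids, noisy)
print(predicted)
```

In this toy case the audio half of the vector carries no class information, so the decision rests entirely on the visual half, which is the intuition behind the robustness gain reported in the abstract.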

2013 ◽  
Vol 25 (12) ◽  
pp. 3294-3317 ◽  
Author(s):  
Lijiang Chen ◽  
Xia Mao ◽  
Pengfei Wei ◽  
Angelo Compare

This study proposes two classes of speech emotional features extracted from electroglottography (EGG) and speech signals. The power-law distribution coefficients (PLDC) of voiced segment duration, pitch rise duration, and pitch down duration are obtained to reflect information about vocal fold excitation. The real discrete cosine transform coefficients of the normalized spectra of the EGG and speech signals are calculated to reflect information about vocal tract modulation. Two experiments are carried out. The first evaluates the proposed and traditional features using sequential forward floating search and sequential backward floating search. The second is a comparative emotion recognition experiment based on a support vector machine. The results show that the proposed features outperform commonly used features in speaker-independent and content-independent speech emotion recognition.
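As a rough illustration of the second feature class, the sketch below normalizes a magnitude spectrum and takes its discrete cosine transform. The abstract does not state which normalization or DCT variant is used, so the sum-to-one normalization and the unscaled DCT-II here are assumptions, and the function names are hypothetical.

```python
import math

def normalize(spectrum):
    """Scale a non-negative spectrum so its values sum to 1 (assumed scheme)."""
    total = sum(spectrum)
    return [v / total for v in spectrum]

def dct_ii(x):
    """Unscaled DCT-II: X_k = sum_n x_n * cos(pi / N * (n + 0.5) * k)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]

# A flat spectrum puts all of its energy into the zeroth coefficient.
flat = normalize([2.0, 2.0, 2.0, 2.0])
coeffs = dct_ii(flat)
print([round(c, 6) for c in coeffs])
```

Low-order coefficients describe the coarse spectral envelope, which is why a few of them are often kept as compact features.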


2018 ◽  
Vol 32 (16) ◽  
pp. 1850169 ◽  
Author(s):  
Bingchang Zhou ◽  
Qianqian Qi

We investigate the phenomenon of stochastic resonance (SR) in parallel integrate-and-fire neuronal arrays with threshold, driven by additive noise or signal-dependent noise (SDN) and a noisy input signal. SR occurs in this system. Whether the system is subject to additive noise or SDN, the input noise [Formula: see text] weakens the performance of SR, whereas the array size N and signal parameter [Formula: see text] promote it. Signal parameter [Formula: see text] promotes the performance of SR for additive noise, but for SDN the peak values of the output signal-to-noise ratio [Formula: see text] first decrease and then increase as [Formula: see text] increases. Moreover, when [Formula: see text] tends to infinity, the curve of [Formula: see text] first increases and then decreases for SDN, whereas for additive noise it increases until it reaches a plateau. By comparing system performance under additive noise with that under SDN, we also find that the information transmission of a periodic signal with SDN is significantly better than that with additive noise for a limited array size N.


1989 ◽  
Vol 54 (6) ◽  
pp. 1446-1463 ◽  
Author(s):  
Petr Voňka ◽  
Pavel Dittrich ◽  
Josef P. Novák

The temperature dependence of the parameter a = a(Tr) of the Redlich-Kwong-Soave equation of state was modified. To calculate the corresponding individual parameters, an effective algorithm applying the Newton method was proposed. The parameters were determined for 60 substances, and the new modification correlates saturated vapour pressures from pr = 0.001 to pr = 1.0 with an accuracy that is usually better than 0.2% and is comparable with that attained with the Wagner equation. The proposed modification is useful above all when applying the equation of state to the calculation of vapour-liquid equilibrium.
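The abstract does not give the modified temperature function, so the sketch below uses the original Soave form, α(Tr) = [1 + m(1 − √Tr)]² with m = 0.480 + 1.574ω − 0.176ω², purely to illustrate how a Newton iteration can recover a substance-specific parameter. The target value, the starting guess, and the function names are assumptions for illustration; the paper's actual fit is against saturated vapour pressures.

```python
import math

def soave_m(omega):
    """Original Soave correlation for m in terms of the acentric factor."""
    return 0.480 + 1.574 * omega - 0.176 * omega ** 2

def soave_alpha(tr, m):
    """Original Soave temperature function alpha(Tr)."""
    return (1.0 + m * (1.0 - math.sqrt(tr))) ** 2

def fit_m_newton(tr, alpha_target, m0=0.0, tol=1e-12, max_iter=50):
    """Solve soave_alpha(tr, m) = alpha_target for m by Newton's method.

    Requires tr < 1 so that the derivative with respect to m is non-zero.
    """
    s = 1.0 - math.sqrt(tr)
    m = m0
    for _ in range(max_iter):
        u = 1.0 + m * s
        f = u * u - alpha_target      # residual
        df = 2.0 * u * s              # derivative of the residual w.r.t. m
        step = f / df
        m -= step
        if abs(step) < tol:
            break
    return m

# Round trip: generate a target from a known m, then recover it.
m_true = soave_m(0.3)
alpha_target = soave_alpha(0.7, m_true)
m_fit = fit_m_newton(0.7, alpha_target)
print(m_true, m_fit)
```

With a single unknown the iteration reduces to a square-root update and converges in a handful of steps; a multi-parameter temperature function would need the vector form of Newton's method.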


2021 ◽  
Author(s):  
Ahana priynaka ◽  
Kavitha Ganesan

Prognosis of dementia is a tedious task in the preclinical stage. Ventricle pathology changes in dementia appear to overlap in neurodegeneration of the brain. Identifying these overlaps among the severity groups helps in understanding the pathogenesis of this disorder. In this work, the impact of changes in the ventricle region on the severity stages of dementia is observed using dual deep learning techniques (DDLT). The Alzheimer's Disease Neuroimaging Initiative (ADNI) database, which contains 1169 MR images, is used in this study. Segmentation of the ventricle region is carried out using a multilevel threshold based Grey Wolf Optimization (GWO) technique. The feature vectors obtained from the combined AlexNet and ResNet are analysed. The fused feature vectors are given to a support vector machine (SVM) to observe the severity changes. Subsequently, symmetry analysis of the ventricle is carried out to perceive the distinctive changes in progression. The results show that the ventricle region is accurately delineated from other regions with optimized thresholds. The segmented ventricle shows better correlation for all considered classes (> 0.9). DDLT with multiclass SVM provides an improved accuracy of about 79.87% compared to individual transfer learning with AlexNet (74%) and ResNet (76.53%). Further, symmetry analysis shows that the left-side ventricle with DDLT features performs better than the right side for onset stages. The clinical correlation of the left ventricle is statistically significant (p<0.0001) and prominently differentiates dementia severity variations. This framework is clinically useful for identifying distinct ventricle region variation in dementia.


2021 ◽  
Vol 38 (1) ◽  
pp. 015006
Author(s):  
Huanmin Ge ◽  
Peng Li

In this paper, we propose a Dantzig selector based on ℓ₁ − αℓ₂ (0 < α ⩽ 1) minimization for signal recovery. In the Dantzig selector, the constraint ‖A⊤(b − Ax)‖∞ ⩽ η for some small constant η > 0 means that the columns of A are only very weakly correlated with the error vector e = Ax − b. First, recovery guarantees based on the restricted isometry property are established for signals. Next, we propose an effective algorithm to solve the proposed Dantzig selector. Last, we illustrate the proposed model and algorithm with extensive numerical experiments on the recovery of signals under Gaussian, impulsive and uniform noise. The performance of the proposed Dantzig selector is better than that of existing methods.
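To make the model concrete, the sketch below evaluates the ℓ₁ − αℓ₂ objective and checks the Dantzig constraint ‖A⊤(b − Ax)‖∞ ⩽ η for two candidate vectors. It only evaluates the quantities named in the abstract and does not implement the authors' solver; the matrix, vectors, and function names are made up for illustration.

```python
import math

def l1_minus_alpha_l2(x, alpha):
    """Objective ||x||_1 - alpha * ||x||_2 with 0 < alpha <= 1."""
    return sum(abs(v) for v in x) - alpha * math.sqrt(sum(v * v for v in x))

def dantzig_feasible(A, b, x, eta):
    """Check the constraint ||A^T (b - A x)||_inf <= eta."""
    # Residual r = b - A x, one entry per row of A.
    r = [bi - sum(aij * xj for aij, xj in zip(row, x))
         for row, bi in zip(A, b)]
    # Correlation of each column of A with the residual.
    corr = [sum(A[i][j] * r[i] for i in range(len(A)))
            for j in range(len(x))]
    return max(abs(c) for c in corr) <= eta

# Toy system (made up): both candidates reproduce b exactly.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
b = [1.0, 1.0]
x_sparse = [0.0, 0.0, 1.0]
x_dense = [1.0, 1.0, 0.0]

obj_sparse = l1_minus_alpha_l2(x_sparse, alpha=1.0)
obj_dense = l1_minus_alpha_l2(x_dense, alpha=1.0)
print(obj_sparse, obj_dense)
```

Both candidates satisfy the constraint, but the one-sparse vector has the smaller objective value, which shows how the ℓ₁ − αℓ₂ penalty favours sparse solutions.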


2019 ◽  
Vol 25 (11) ◽  
pp. 55-66
Author(s):  
Huda Majed Swadi ◽  
Hamid Mohammed Ali

Mobile-based human emotion recognition is a very challenging subject. Most of the approaches suggested and built in this field utilize various contexts that can be derived from external sensors and the smartphone, but these approaches suffer from various obstacles and challenges. The proposed system integrates the human speech signal and heart rate in one system to improve the accuracy of human emotion recognition. It is designed to recognize four human emotions: angry, happy, sad and normal. In this system, the smartphone is used to record the user's speech and send it to a server. The smartwatch, worn on the user's wrist, measures the user's heart rate while the user is speaking and sends it via Bluetooth to the smartphone, which in turn sends it to the server. On the server side, speech features are extracted from the speech signal to be classified by a neural network. To minimize misclassification by the neural network, the user's heart rate measurement is used to direct the extracted speech features to either the excited (angry and happy) neural network or the calm (sad and normal) neural network. In spite of the challenges associated with the system, it achieved 96.49% for known speakers and 79.05% for unknown speakers.
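The routing step described above can be sketched as a simple gate. The abstract does not say how the heart rate is mapped to the excited or calm group, so the fixed threshold of 100 bpm, the function names, and the stub classifiers standing in for the two neural networks are all assumptions for illustration.

```python
def route_by_heart_rate(heart_rate_bpm, threshold_bpm=100.0):
    """Pick the classifier group from heart rate (threshold is hypothetical)."""
    return "excited" if heart_rate_bpm >= threshold_bpm else "calm"

def recognize(speech_features, heart_rate_bpm, classifiers, threshold_bpm=100.0):
    """Send the speech features to the classifier chosen by the heart rate."""
    group = route_by_heart_rate(heart_rate_bpm, threshold_bpm)
    return classifiers[group](speech_features)

# Stub classifiers standing in for the two trained neural networks.
def excited_stub(features):
    return "angry" if features[0] > 0.5 else "happy"

def calm_stub(features):
    return "sad" if features[0] > 0.5 else "normal"

classifiers = {"excited": excited_stub, "calm": calm_stub}

# The same speech features lead to different labels depending on heart rate.
print(recognize([0.9], 120.0, classifiers))
print(recognize([0.9], 70.0, classifiers))
```

Splitting the four classes into two pairs means each network only has to separate two emotions, which is the mechanism the abstract credits for reducing misclassification.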


Author(s):  
Intan Nurma Yulita ◽  
Houw Liong The ◽  
Adiwijaya

Indonesia has many tribes and therefore many dialects. Speech classification is difficult if the database uses speech signals from various people whose characteristics differ because of gender and dialect. These differences influence the frequency, intonation, amplitude, and period of the speech, so the system must be trained on a variety of reference templates of the speech signal. This study was therefore developed for Indonesian speech classification. The solution is a new fuzzy variant of hidden Markov models. The results show that the new version of fuzzy hidden Markov models performs better than the hidden Markov model.


2006 ◽  
Vol 5 (1) ◽  
pp. 7-14
Author(s):  
K. V. N. Sunitha ◽  
A. Sharada

Speech recognizers usually consist of a language model, a lexicon and a collection of phone models. The lexicon is important for improving the efficacy of a speech recognizer for a language. Traditionally, building a lexicon for a language was a significant piece of work, taking several expert linguists perhaps several years to construct a lexicon with reasonable coverage. We describe a method that can cut this time significantly. The basic idea is to add the most common words to a lexicon, with the user of the system explicitly supplying each new word, and then automatically build letter-to-sound rules from the initial data. The word entered may be of arbitrary length. Over multiple passes the lexicon and the letter-to-sound rules improve: at each pass the letter-to-sound rules are regenerated with the new data, making them more accurate. This paper presents the work done in building a lexicon for the Telugu language. The major objective is to make speech systems for Telugu more proficient. This technique has proved successful for a number of languages, cutting the amount of time and effort to perhaps checking thousands of words rather than tens of thousands. It is also a structured method that requires only knowledge of the basic language to carry out.
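The multi-pass loop described above can be sketched as follows. The abstract does not specify how letters are aligned to sounds, so this sketch assumes a naive one-letter-to-one-phone alignment with majority voting, uses made-up romanized words rather than Telugu script, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def learn_rules(lexicon):
    """Learn letter -> most frequent phone from entries of equal length."""
    counts = defaultdict(Counter)
    for word, phones in lexicon.items():
        if len(word) != len(phones):
            continue  # the naive one-to-one alignment cannot use this entry
        for letter, phone in zip(word, phones):
            counts[letter][phone] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}

def predict(rules, word):
    """Apply the rules letter by letter; '?' marks a letter with no rule yet."""
    return [rules.get(letter, "?") for letter in word]

# Pass 1: a tiny seed lexicon (made up).
lexicon = {"ama": ["a", "m", "a"]}
rules = learn_rules(lexicon)
first_guess = predict(rules, "mana")      # the letter 'n' has no rule yet

# The user supplies the correct entry, and the rules are regenerated.
lexicon["mana"] = ["m", "a", "n", "a"]
rules = learn_rules(lexicon)
second_guess = predict(rules, "nama")

print(first_guess, second_guess)
```

Each pass only asks the user to correct words the current rules get wrong, which is where the saving in checking effort comes from.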

