A Two-Level Speaker Identification System via Fusion of Heterogeneous Classifiers and Complementary Feature Cooperation

Mohammad Al-Qaderi; Elfituri Lahamer; Ahmad Rad

doi:10.3390/s21155097

Sensors ◽

10.3390/s21155097 ◽

2021 ◽

Vol 21 (15) ◽

pp. 5097

Author(s):

Mohammad Al-Qaderi ◽

Elfituri Lahamer ◽

Ahmad Rad

Keyword(s):

Speaker Identification ◽

Signal To Noise Ratio ◽

Support Vector ◽

Spectral Features ◽

Prosodic Features ◽

Signal To Noise ◽

Short Term ◽

Classifier System ◽

Cepstral Coefficients ◽

Noise Ratio

We present a new architecture to address the challenges of speaker identification that arise in the interaction of humans with social robots. Although deep learning systems have led to impressive performance in many speech applications, limited speech data at the training stage and short utterances with background noise at the test stage present challenges and remain open problems, as no optimal solution has been reported to date. The proposed design employs a generative model, namely the Gaussian mixture model (GMM), and a discriminative model, namely support vector machine (SVM) classifiers, together with prosodic features and short-term spectral features, to concurrently classify a speaker’s gender and identity. The proposed architecture works in a semi-sequential manner consisting of two stages: the first classifier exploits the prosodic features to determine the speaker’s gender, which in turn is used with the short-term spectral features as input to the second classifier system in order to identify the speaker. The second classifier system employs two types of short-term spectral features, namely mel-frequency cepstral coefficients (MFCC) and gammatone frequency cepstral coefficients (GFCC), as well as gender information, as inputs to two different classifiers (GMM and GMM supervector-based SVM), which in total leads to the construction of four classifiers. The outputs of the second-stage classifiers, namely the GMM-MFCC maximum likelihood classifier (MLC), GMM-GFCC MLC, GMM-MFCC supervector SVM, and GMM-GFCC supervector SVM, are fused at score level by the weighted Borda count approach. The weight factors are computed on the fly by a Mamdani fuzzy inference system whose inputs are the signal-to-noise ratio and the length of the utterance.
Experimental evaluations suggest that the proposed architecture and the fusion framework are promising and can improve the recognition performance of the system in challenging environments where the signal-to-noise ratio is low and the utterance is short; such scenarios often arise in social robot interactions with humans.
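
The abstract describes fusing four ranked classifier outputs with a weighted Borda count whose weights come from a Mamdani fuzzy inference system driven by SNR and utterance length. The Python sketch below illustrates both pieces under loudly stated assumptions: the membership breakpoints, the three-rule base, and the idea that the fuzzy system outputs one weight shared by the two GFCC-based classifiers (with the MFCC-based ones receiving its complement) are hypothetical illustrations, not the authors' published design. The speaker names and rankings are made up.

```python
def clamp01(v):
    """Clip a value to the interval [0, 1]."""
    return max(0.0, min(1.0, v))


def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (requires a < b < c)."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))


def mamdani_gfcc_weight(snr_db, length_s, steps=201):
    """Hypothetical Mamdani FIS: (SNR, utterance length) -> weight in [0, 1]
    for the GFCC-based classifiers; MFCC-based ones get 1 - weight."""
    # Fuzzification (breakpoints are illustrative assumptions).
    snr_low = clamp01((20.0 - snr_db) / 20.0)  # fully "low" at <= 0 dB
    snr_high = 1.0 - snr_low
    short = clamp01((3.0 - length_s) / 2.0)    # fully "short" at <= 1 s
    long_ = 1.0 - short

    out_sets = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}
    # Hypothetical rule base; AND is implemented with min.
    rules = [
        (snr_low, "high"),                 # IF SNR low THEN weight high
        (min(snr_high, short), "medium"),  # IF SNR high AND short THEN medium
        (min(snr_high, long_), "low"),     # IF SNR high AND long THEN low
    ]

    # Min implication, max aggregation, discrete centroid defuzzification.
    num = den = 0.0
    for i in range(steps):
        y = i / (steps - 1)
        mu = max(min(strength, tri(y, *out_sets[name])) for strength, name in rules)
        num += y * mu
        den += mu
    return num / den if den > 0 else 0.5


def weighted_borda(rankings, weights):
    """Weighted Borda count. Each ranking lists speaker IDs best-first; a speaker
    ranked r (0-based) among n candidates earns weight * (n - 1 - r) points."""
    scores = {}
    for ranking, w in zip(rankings, weights):
        n = len(ranking)
        for rank, speaker in enumerate(ranking):
            scores[speaker] = scores.get(speaker, 0.0) + w * (n - 1 - rank)
    best = max(scores, key=scores.get)
    return best, scores


# Usage with four hypothetical second-stage rankings of three enrolled speakers.
w = mamdani_gfcc_weight(snr_db=0.0, length_s=1.5)
rankings = [
    ["alice", "bob", "carol"],  # GMM-MFCC MLC
    ["bob", "alice", "carol"],  # GMM-GFCC MLC
    ["alice", "carol", "bob"],  # GMM-MFCC supervector SVM
    ["bob", "carol", "alice"],  # GMM-GFCC supervector SVM
]
best, scores = weighted_borda(rankings, [1 - w, w, 1 - w, w])
print(best)  # bob: at 0 dB the GFCC-based rankings dominate
```

With a clean, long utterance (for example 30 dB and 10 s) the same rule base shifts weight toward the MFCC-based classifiers, and the fused decision flips to "alice".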

Near-field sound source localization using principal component analysis–multi-output support vector regression

International Journal of Distributed Sensor Networks ◽

10.1177/1550147720916405 ◽

2020 ◽

Vol 16 (4) ◽

pp. 155014772091640

Author(s):

Lanmei Wang ◽

Yao Wang ◽

Guibao Wang ◽

Jianke Jia

Keyword(s):

Principal Component Analysis ◽

Support Vector Regression ◽

Covariance Matrix ◽

Signal To Noise Ratio ◽

Near Field ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Signal To Noise ◽

Noise Ratio

In this article, principal component analysis (PCA), a method applied in image compression and feature extraction, is introduced to reduce the dimension of the input feature variables of support vector regression (SVR), and a method for joint estimation of near-field angle and range based on PCA dimension reduction is proposed. Signal-to-noise ratio and computational load are the decisive factors affecting the performance of the algorithm. PCA fuses the main characteristics of the training data and discards redundant information, so the signal-to-noise ratio is improved and the computational load is reduced accordingly. When SVR is used to model the signal, the upper triangular elements of the signal covariance matrix are usually used as input features. Because the covariance matrix has many upper triangular elements, using them directly as input features slows training to some extent. PCA is therefore used to reduce the dimensionality of the upper triangular elements of the covariance matrix of the known signal, and the result is used as the input feature of a multi-output SVR machine to construct the near-field parameter estimation model, from which the parameters of an unknown signal are estimated. Simulation results show that this method achieves high estimation accuracy and fast training, adapts well to low signal-to-noise ratios, and performs better than the back-propagation neural network algorithm and the two-step multiple signal classification (MUSIC) algorithm.
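
As a rough illustration of the feature pipeline described above, the following Python/NumPy sketch builds the upper-triangular elements of a sample covariance matrix from array snapshots and reduces them with PCA computed via an SVD. The array size, snapshot count, number of retained components, and the use of random noise-only snapshots are assumptions for demonstration; the multi-output SVR stage that would consume the reduced features is not shown. Splitting the complex elements into real and imaginary parts is one common convention, assumed here; the imaginary parts of the diagonal are zero and simply carry no variance.

```python
import numpy as np


def upper_triangle_features(snapshots):
    """snapshots: complex array (M sensors, N snapshots). Returns the real and
    imaginary parts of the upper-triangular elements (incl. the diagonal) of
    the sample covariance matrix R = X X^H / N."""
    m, n = snapshots.shape
    r = snapshots @ snapshots.conj().T / n
    upper = r[np.triu_indices(m)]
    return np.concatenate([upper.real, upper.imag])


def fit_pca(features, k):
    """features: (num_samples, d). Returns the feature mean and the top-k
    principal directions (as rows), ordered by explained variance."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:k]


def project(features, mean, components):
    """Map features onto the retained principal directions."""
    return (features - mean) @ components.T


# Usage: 50 training samples from an 8-element array, 200 snapshots each.
rng = np.random.default_rng(0)
M, N = 8, 200
train = np.stack([
    upper_triangle_features(rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
    for _ in range(50)
])
mean, comps = fit_pca(train, k=10)
reduced = project(train, mean, comps)
print(train.shape, reduced.shape)  # (50, 72) (50, 10)
```

For an 8-element array the upper triangle holds 36 complex entries (72 real features); keeping 10 components shrinks each SVR input vector accordingly.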

Detection and Recognition of Stop Consonants by Normal-Hearing and Hearing-Impaired Listeners

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3504.942 ◽

1992 ◽

Vol 35 (4) ◽

pp. 942-949 ◽

Cited By ~ 17

Author(s):

Christopher W. Turner ◽

David A. Fabry ◽

Stephanie Barrett ◽

Amy R. Horwitz

Keyword(s):

Signal To Noise Ratio ◽

Recognition Performance ◽

Normal Hearing ◽

Hearing Impaired ◽

Stop Consonants ◽

Signal To Noise ◽

Short Term ◽

Masking Noise ◽

Noise Ratio ◽

Detection And Recognition

This study examined the possibility that hearing-impaired listeners, in addition to displaying poorer-than-normal recognition of speech presented in background noise, require a larger signal-to-noise ratio to detect the speech sounds. Psychometric functions for the detection and recognition of stop consonants were obtained from both normal-hearing and hearing-impaired listeners. When the speech levels were expressed in terms of their short-term spectra, both subject groups detected the consonants at the same signal-to-noise ratio. In contrast, the hearing-impaired listeners displayed poorer recognition performance than the normal-hearing listeners. These results imply that the higher signal-to-noise ratios that some subjects with hearing loss require for a given level of recognition are not due, even in part, to a deficit in detecting the signals in the masking noise, but rather are due exclusively to a deficit in recognition.

A Blind Spectrum Sensing Method Based on Deep Learning

Sensors ◽

10.3390/s19102270 ◽

2019 ◽

Vol 19 (10) ◽

pp. 2270 ◽

Cited By ~ 6

Author(s):

Kai Yang ◽

Zhitao Huang ◽

Xiang Wang ◽

Xueqiong Li

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Spectrum Sensing ◽

Short Term Memory ◽

Signal To Noise Ratio ◽

Signal To Noise ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

Noise Ratio

Spectrum sensing is one of the technologies used to address the current problem of low utilization of spectrum resources. However, when the signal-to-noise ratio is low, current spectrum sensing methods cannot handle well situations in which prior information about the licensed user signal is lacking. In this paper, a blind spectrum sensing method based on deep learning is proposed that uses three kinds of neural networks together, namely convolutional neural networks, long short-term memory, and fully connected neural networks. Experiments show that the proposed method performs better than an energy detector, especially when the signal-to-noise ratio is low. This paper also analyzes the effect of different long short-term memory layers on detection performance and explores why the deep-learning-based detector achieves better performance.
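
The abstract benchmarks the proposed detector against an energy detector. For reference, here is a minimal Python sketch of that classic baseline (not the proposed CNN/LSTM detector), assuming real-valued samples, white Gaussian noise of known variance, and a large-N Gaussian approximation for the threshold; the sinusoidal test signal, its frequency, and the random seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def energy_detector(samples, noise_var, pfa):
    """Decide 'signal present' when the average energy exceeds a threshold.

    Under noise only, T = mean(x^2) has mean noise_var and variance
    2 * noise_var**2 / N, so a Gaussian approximation gives the threshold
    noise_var * (1 + Q^{-1}(pfa) * sqrt(2 / N)).
    """
    n = len(samples)
    t = sum(x * x for x in samples) / n
    q_inv = NormalDist().inv_cdf(1.0 - pfa)  # Q^{-1}(pfa)
    threshold = noise_var * (1.0 + q_inv * math.sqrt(2.0 / n))
    return t > threshold, t, threshold


# Usage: a sinusoid at -10 dB SNR buried in unit-variance noise.
rng = random.Random(1)
N, sigma2, snr_db = 10_000, 1.0, -10.0
noise = [rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(N)]
amp = math.sqrt(2.0 * sigma2 * 10 ** (snr_db / 10))  # sinusoid power = amp**2 / 2
received = [amp * math.sin(0.1 * k) + w for k, w in enumerate(noise)]
detected, stat, thr = energy_detector(received, sigma2, pfa=0.01)
print(detected, round(stat, 3), round(thr, 3))
```

The detector needs no prior knowledge of the licensed user's waveform, but it does rely on an accurate noise-variance estimate; its performance is known to degrade when that estimate is uncertain or the SNR drops further.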

On the signal-to-noise ratio and short-term stability of passive rubidium frequency standards

IEEE Transactions on Instrumentation and Measurement ◽

10.1109/tim.1981.6312408 ◽

1981 ◽

Vol IM-30 (4) ◽

pp. 277-282 ◽

Cited By ~ 67

Author(s):

Jacques Vanier ◽

Laurent-Guy Bernier

Keyword(s):

Signal To Noise Ratio ◽

Signal To Noise ◽

Short Term ◽

Term Stability ◽

Frequency Standards ◽

Noise Ratio

Mechanism for optimization of signal-to-noise ratio of dopamine release based on short-term bidirectional plasticity

Brain Research ◽

10.1016/j.brainres.2017.05.002 ◽

2017 ◽

Vol 1667 ◽

pp. 68-73

Author(s):

Claudio Da Cunha ◽

Eric McKimm ◽

Rafael M. Da Cunha ◽

Suelen L. Boschen ◽

Peter Redgrave ◽

...

Keyword(s):

Dopamine Release ◽

Signal To Noise Ratio ◽

Signal To Noise ◽

Short Term ◽

Noise Ratio

A MODWT-Based Algorithm for the Identification and Removal of Jumps/Short-Term Distortions in Displacement Measurements Used for Structural Health Monitoring

IoT ◽

10.3390/iot3010003 ◽

2021 ◽

Vol 3 (1) ◽

pp. 60-72

Author(s):

Davi V. Q. Rodrigues ◽

Delong Zuo ◽

Changzhi Li

Keyword(s):

Structural Health Monitoring ◽

Health Monitoring ◽

Signal To Noise Ratio ◽

Structural Condition ◽

Signal To Noise ◽

Short Term ◽

Structural Health ◽

Noise Ratio ◽

Displacement Measurements

Researchers have made substantial efforts in recent years to improve the radar-based measurement of structural reciprocal motion. However, the signal-to-noise ratio of the radar’s received signal still plays an important role in the long-term monitoring of structures that are susceptible to excessive vibration. Although prolonged monitoring of structural deflections may provide crucial information for assessing structural condition, most existing structural health monitoring (SHM) works have not considered the challenges of handling long-term displacement measurements when the signal-to-noise ratio of the measurement is low. This may cause discontinuities in the detected reciprocal motion and can result in wrong assessments during data analysis. This paper introduces a novel approach that uses a wavelet-based multi-resolution analysis to correct short-term distortions in the calculated displacements even when previously proposed denoising techniques are not effective. Experimental results are presented to validate the proposed algorithm and demonstrate its feasibility. The advantages and limitations of the proposed approach are also discussed.
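
The paper's algorithm relies on MODWT multi-resolution analysis. The following is only a simplified, hypothetical Python sketch of the general idea, not the authors' method: it computes level-1 Haar MODWT wavelet coefficients, W[t] = (x[t] - x[t-1]) / 2 with a circular boundary, flags samples whose coefficient exceeds k times a robust (MAD-based) noise scale, and cancels the step at each flagged sample. The threshold multiplier k = 5 and the synthetic displacement trace are assumptions.

```python
import math
from statistics import median


def haar_modwt_level1(x):
    """Level-1 Haar MODWT wavelet coefficients, circular boundary:
    W[t] = (x[t] - x[t-1]) / 2 (x[-1] wraps to the last sample)."""
    return [(x[t] - x[t - 1]) / 2.0 for t in range(len(x))]


def remove_jumps(x, k=5.0):
    """Flag samples with |W[t]| > k * sigma and cancel the step at each one.
    sigma is a MAD-based scale of the coefficients; t = 0 is skipped because
    it only reflects the circular wrap-around."""
    w = haar_modwt_level1(x)
    sigma = median(abs(c) for c in w[1:]) / 0.6745
    jumps = [t for t in range(1, len(x)) if abs(w[t]) > k * sigma]
    flagged = set(jumps)
    corrected, offset = [], 0.0
    for t, value in enumerate(x):
        if t in flagged:
            offset += x[t] - x[t - 1]  # accumulated size of the removed steps
        corrected.append(value - offset)
    return corrected, jumps


# Usage: small oscillation with a one-sample spike at t = 30 and a step at t = 50.
x = [0.01 * math.sin(t) for t in range(100)]
x[30] += 3.0
for t in range(50, 100):
    x[t] += 5.0
corrected, jumps = remove_jumps(x)
print(jumps)  # [30, 31, 50]
```

A one-sample spike produces two large coefficients of opposite sign, so cancelling both steps removes it, while a persistent jump produces one. Cancelling the whole step also removes the small genuine increment at each flagged sample; here that error stays bounded by the size of those increments.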

Robust speaker identification under noisy conditions using feature compensation and signal to noise ratio estimation

2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS) ◽

10.1109/mwscas.2016.7869973 ◽

2016 ◽

Cited By ~ 2

Author(s):

Megan N. Frankle ◽

Ravi P. Ramachandran

Keyword(s):

Speaker Identification ◽

Signal To Noise Ratio ◽

Ratio Estimation ◽

Signal To Noise ◽

Robust Speaker Identification ◽

Noisy Conditions ◽

Noise Ratio

Machine learning applied to wind turbine blades impact detection

Wind Engineering ◽

10.1177/0309524x19849859 ◽

2019 ◽

Vol 44 (3) ◽

pp. 325-338 ◽

Cited By ~ 2

Author(s):

Congcong Hu ◽

Roberto Albertani

Keyword(s):

Support Vector Machine ◽

Predictive Model ◽

Signal To Noise Ratio ◽

Turbine Blades ◽

Environmental Benefits ◽

Support Vector ◽

Signal To Noise ◽

Impact Detection ◽

Species Vulnerability ◽

Noise Ratio

The significant worldwide development of wind power generation brings wildlife concerns together with environmental benefits, especially the vulnerability of volant species to interactions with wind energy facilities. To survey such events, an automatic system for continuous monitoring of blade collisions is critical. An onboard multi-sensor system capable of real-time collision detection using integrated vibration sensors has been developed and successfully tested. However, detecting low signal-to-noise ratio impacts can be challenging; hence, an advanced impact detection method has been developed and is presented in this article. A robust automated detection algorithm based on a support vector machine is proposed. After preliminary signal pre-processing, geometric features specifically selected for their sensitivity to impact signals are extracted from the raw vibration signal and the energy distribution graph. The predictive model is formulated by training a conventional support vector machine on the extracted features for impact identification. Finally, the performance of the predictive model is evaluated by accuracy, precision, and recall. Results indicate a linear relationship between signal-to-noise ratio and overall model performance. The proposed method is highly reliable at higher signal-to-noise ratios [Formula: see text], but it proves ineffective at lower signal-to-noise ratios [Formula: see text].
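
The three evaluation metrics named above have standard definitions in terms of true and false positives and negatives. A minimal Python sketch, assuming label 1 marks an impact and 0 marks no impact; the example labels are made up.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary detector."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged events that are impacts
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of true impacts that are flagged
    return accuracy, precision, recall


# Usage: 8 hypothetical signal windows, 1 = impact.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))  # (0.625, 0.6666666666666666, 0.5)
```

Because collisions are rare, accuracy alone can look high even when many impacts are missed; recall exposes that failure mode directly.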

Automatic Wireless Signal Classification: A Neural-Induced Support Vector Machine-Based Approach

Information ◽

10.3390/info10110338 ◽

2019 ◽

Vol 10 (11) ◽

pp. 338

Author(s):

Arfan Haider Wahla ◽

Lan Chen ◽

Yali Wang ◽

Rong Chen

Keyword(s):

Support Vector Machine ◽

Classification Accuracy ◽

Signal To Noise Ratio ◽

Radio Spectrum ◽

Spectrum Management ◽

Support Vector ◽

Intermediate Step ◽

Signal To Noise ◽

Feature Based ◽

Noise Ratio

Automatic Classification of Wireless Signals (ACWS), an intermediate step between signal detection and demodulation, is investigated in this paper. ACWS plays a crucial role in several military and non-military applications by identifying interference sources and adversary attacks to achieve efficient radio spectrum management. The performance of traditional feature-based (FB) classification approaches is limited by their specific input feature sets, which in turn results in poor generalization under unknown conditions. Therefore, in this paper, a novel feature-based classifier, the Neural-Induced Support Vector Machine (NSVM), is proposed, in which the features are learned automatically from raw input signals using Convolutional Neural Networks (CNN). The output of NSVM is given by a Gaussian Support Vector Machine (SVM), which takes the features learned by the CNN as its input. NSVM is trained as a single architecture, and in this way it learns to minimize a margin-based loss instead of a cross-entropy loss. NSVM outperforms the traditional softmax-based CNN modulation classifier, achieving faster convergence of the accuracy and loss curves during training. Furthermore, the robustness of the NSVM classifier is verified by extensive simulation experiments in the presence of several non-ideal real-world channel impairments over a range of signal-to-noise ratio (SNR) values. NSVM performs remarkably well in classifying wireless signals: at low SNR, its overall averaged classification accuracy is > 97% at SNR = −2 dB, and at higher SNR its overall classification accuracy is > 99% at SNR = 10 dB. In addition, an analytical comparison with other studies shows that the performance of NSVM is superior over a range of settings.
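
To make the contrast between a margin-based loss and cross-entropy concrete, the Python sketch below evaluates both on a single class-score vector. The Crammer-Singer multiclass hinge loss with margin 1 is one common margin-based choice and is assumed here; the abstract does not say which variant NSVM minimizes. In NSVM the scores would come from the Gaussian SVM layer applied to CNN features; here they are made-up constants.

```python
import math


def softmax_cross_entropy(scores, label):
    """-log softmax(scores)[label], computed stably via log-sum-exp."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]


def multiclass_hinge(scores, label, margin=1.0):
    """Crammer-Singer hinge: max(0, margin + max_{j != label} s_j - s_label)."""
    best_other = max(s for j, s in enumerate(scores) if j != label)
    return max(0.0, margin + best_other - scores[label])


# A confidently correct sample: the hinge loss is already zero,
# while cross-entropy is still positive and keeps producing gradient.
confident = [3.0, 0.5, -1.0]
print(multiclass_hinge(confident, 0), softmax_cross_entropy(confident, 0))
# A barely correct sample: the margin is violated, so the hinge loss is positive.
print(multiclass_hinge([1.0, 0.8, -1.0], 0))
```

The hinge loss stops penalizing a sample once the correct class leads by the margin, so training effort concentrates on samples near the decision boundary.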

Automatic Phase‐Picking Method for Detecting Earthquakes Based on the Signal‐to‐Noise‐Ratio Concept

Seismological Research Letters ◽

10.1785/0220190043 ◽

2019 ◽

Vol 91 (1) ◽

pp. 334-342

Author(s):

Jihua Fu ◽

Xu Wang ◽

Zhitao Li ◽

Hao Meng ◽

Jianjun Wang ◽

...

Keyword(s):

Signal To Noise Ratio ◽

Ratio Method ◽

Mean Deviation ◽

Signal To Noise ◽

Short Term ◽

Low SNR ◽

Automatic Phase ◽

Noise Ratio ◽

Automatic Phase Picking

The automatic phase‐picking detection of earthquakes is challenging in the context of big data and strong noise. The short‐term average/long‐term average (STA/LTA) ratio is widely used to detect earthquakes because of its simplicity and robustness. However, STA/LTA‐based methods may not perform well with noisy data. Based on the signal‐to‐noise‐ratio (SNR) concept, a short‐term power/long‐term power (STP/LTP) ratio method is proposed. The characteristic function and the detection thresholds of the STP/LTP method are given physical meanings. In a sample analysis, the STP/LTP detection results for both the P and S phases are better than those of STA/LTA in terms of mean deviation, standard deviation, distribution of detection results, error rate, and missed rate at different SNR levels. In general, the STP/LTP method inherits the simplicity of the STA/LTA method, and it is suitable for phase picking of low‐SNR seismic data.
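
For context on the baseline being improved, here is a minimal Python sketch of a classic trailing-window STA/LTA trigger computed on signal energy (x squared); some implementations use the absolute amplitude instead. The window lengths, trigger threshold, and synthetic trace are assumptions. The STP/LTP characteristic function and its physically motivated thresholds are defined in the paper and are not reproduced here.

```python
def sta_lta(x, n_sta, n_lta):
    """Trailing-window STA/LTA of the energy x**2; 0.0 until the LTA window is full."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v * v)  # prefix sums give O(1) window means
    ratios = []
    for t in range(len(x)):
        if t + 1 < n_lta:
            ratios.append(0.0)
            continue
        sta = (prefix[t + 1] - prefix[t + 1 - n_sta]) / n_sta
        lta = (prefix[t + 1] - prefix[t + 1 - n_lta]) / n_lta
        ratios.append(sta / lta if lta > 0 else 0.0)
    return ratios


def first_trigger(ratios, threshold):
    """Index of the first sample whose ratio exceeds the threshold, or None."""
    for t, r in enumerate(ratios):
        if r > threshold:
            return t
    return None


# Usage: low-level background with an arrival at sample 200.
trace = [0.1 * (-1) ** t for t in range(300)]
for t in range(200, 220):
    trace[t] = 2.0
ratios = sta_lta(trace, n_sta=10, n_lta=100)
print(first_trigger(ratios, threshold=3.0))  # 200
```

The short window reacts to the arrival almost immediately while the long window still reflects the background, so the ratio jumps at the onset; strong or non-stationary noise inflates the short-term average too, which is the weakness the abstract targets.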
