Restricted Boltzmann Machine Vectors for Speaker Clustering and Tracking Tasks in TV Broadcast Shows

Umair Khan; Pooyan Safari; Javier Hernando

doi:10.3390/app9132761

Applied Sciences ◽

10.3390/app9132761 ◽

2019 ◽

Vol 9 (13) ◽

pp. 2761 ◽

Cited By ~ 3

Author(s):

Umair Khan ◽

Pooyan Safari ◽

Javier Hernando

Keyword(s):

Speaker Verification ◽

Global Model ◽

Specific Information ◽

Restricted Boltzmann Machines ◽

Speaker Clustering ◽

Linear Discriminant ◽

Relative Improvement ◽

Front End ◽

Speaker Tracking ◽

Baseline System

Restricted Boltzmann Machines (RBMs) have shown success in both the front end and the back end of speaker verification systems. In this paper, we propose applying RBMs to the front end for the tasks of speaker clustering and speaker tracking in TV broadcast shows. RBMs are trained to transform utterances into a vector-based representation. Because little data is available for each test speaker, we propose RBM adaptation from a global model. First, the global model, referred to as the universal RBM, is trained with all the available background data. Then an adapted RBM model is trained with the data of each test speaker. The visible-to-hidden weight matrices of the adapted models are concatenated with the bias vectors and whitened to generate the vector representation of speakers. These vectors, referred to as RBM vectors, were shown to preserve speaker-specific information and are used in the tasks of speaker clustering and speaker tracking. The evaluation was performed on audio recordings of Catalan TV broadcast shows. The experimental results show that our proposed speaker clustering system gained up to 12% relative improvement, in terms of Equal Impurity (EI), over the baseline system. In the task of speaker tracking, our system achieves relative improvements of 11% and 7% over the baseline system using cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring, respectively.
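The concatenation and cosine scoring steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `rbm_vector` and `cosine_score`, the toy parameter values, and the choice to append both the visible and the hidden bias vectors are assumptions made for illustration. The whitening step and the RBM training and adaptation themselves are left out.

```python
import math

def rbm_vector(weights, visible_bias, hidden_bias):
    """Flatten the visible-to-hidden weight matrix row by row, then append
    the bias vectors (assumption: both visible and hidden biases are used)."""
    flat = [w for row in weights for w in row]
    return flat + list(visible_bias) + list(hidden_bias)

def cosine_score(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical parameters of two adapted RBMs (3 visible units, 2 hidden units).
spk_a = rbm_vector(
    [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]],  # visible-to-hidden weights
    [0.0, 0.1, -0.1],                          # visible bias
    [0.05, -0.02],                             # hidden bias
)
spk_b = rbm_vector(
    [[0.25, -0.05], [0.35, 0.3], [-0.45, 0.15]],
    [0.0, 0.1, -0.1],
    [0.04, -0.01],
)

score = cosine_score(spk_a, spk_b)
print(len(spk_a), round(score, 3))
```

In a full system the vectors would presumably be whitened with statistics estimated on background data before scoring. Here the raw concatenated parameters are compared directly to keep the sketch short.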

Front-end speech enhancement for commercial speaker verification systems

Speech Communication ◽

10.1016/j.specom.2018.03.008 ◽

2018 ◽

Vol 99 ◽

pp. 101-113 ◽

Cited By ~ 6

Author(s):

Sefik Emre Eskimez ◽

Peter Soufleris ◽

Zhiyao Duan ◽

Wendi Heinzelman

Keyword(s):

Speech Enhancement ◽

Speaker Verification ◽

Front End ◽

Verification Systems

Front-End Factor Analysis For Speaker Verification

2018 International Conference on Communications (COMM) ◽

10.1109/iccomm.2018.8453731 ◽

2018 ◽

Author(s):

Florin Curelaru

Keyword(s):

Factor Analysis ◽

Speaker Verification ◽

Front End

PLDA using Gaussian restricted Boltzmann machines with application to speaker verification

10.21437/interspeech.2012-463 ◽

2012 ◽

Author(s):

Themos Stafylakis ◽

Patrick Kenny ◽

Mohammed Senoussaoui ◽

Pierre Dumouchel

Keyword(s):

Speaker Verification ◽

Restricted Boltzmann Machines ◽

Boltzmann Machines

On the use of Total Variability and Probabilistic Linear Discriminant Analysis for Speaker Verification on Short Utterances

Advances in Speech and Language Technologies for Iberian Languages - Communications in Computer and Information Science ◽

10.1007/978-3-642-35292-8_2 ◽

2012 ◽

pp. 11-19 ◽

Cited By ~ 2

Author(s):

Javier González Domínguez ◽

Rubén Zazo ◽

Joaquin González-Rodríguez

Keyword(s):

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Speaker Verification ◽

Linear Discriminant ◽

Total Variability ◽

Short Utterances

Regularized Within-Class Precision Matrix Based PLDA in Text-Dependent Speaker Verification

Applied Sciences ◽

10.3390/app10186571 ◽

2020 ◽

Vol 10 (18) ◽

pp. 6571 ◽

Cited By ~ 1

Author(s):

Sung-Hyun Yoon ◽

Jong-June Jeon ◽

Ha-Jin Yu

Keyword(s):

Conditional Independence ◽

Speaker Verification ◽

Equal Error Rate ◽

Estimation Errors ◽

Precision Matrix ◽

Linear Discriminant ◽

Selection Operator ◽

Independence Structure ◽

Empirical Covariance ◽

Text Dependent Speaker Verification

In the field of speaker verification, probabilistic linear discriminant analysis (PLDA) is the dominant method for back-end scoring. To estimate the PLDA model, the between-class covariance and within-class precision matrices must be estimated from samples. However, the empirical covariance and precision matrices contain estimation errors because the number of available samples is limited. In this paper, we propose a method to improve the conventional PLDA by estimating the PLDA model using a regularized within-class precision matrix. We use the graphical least absolute shrinkage and selection operator (GLASSO) for the regularization. The GLASSO regularization decreases the estimation errors in the empirical precision matrix by making the precision matrix sparse, which reflects the conditional independence structure. The experimental results on text-dependent speaker verification reveal that the proposed method reduces the equal error rate by up to 23% relative to the conventional PLDA.
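The "relative" reduction reported here is the drop in error rate divided by the baseline error rate, not a difference in percentage points. The sketch below shows that arithmetic. The function name `relative_reduction` and the two EER values are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline, proposed):
    """Fraction by which `proposed` lowers an error rate relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - proposed) / baseline

# Hypothetical equal error rates, in percent.
baseline_eer = 4.00
proposed_eer = 3.08

reduction = relative_reduction(baseline_eer, proposed_eer)
print(f"relative EER reduction: {reduction:.0%}")
```

With these made-up numbers, an absolute drop of 0.92 percentage points corresponds to a relative reduction of about 23%.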

Discriminatively trained Probabilistic Linear Discriminant Analysis for speaker verification

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2011.5947437 ◽

2011 ◽

Cited By ~ 58

Author(s):

Lukas Burget ◽

Oldrich Plchot ◽

Sandro Cumani ◽

Ondrej Glembek ◽

Pavel Matejka ◽

...

Keyword(s):

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Speaker Verification ◽

Linear Discriminant

A fuzzy‐clustering‐based hierarchical i‐vector/probabilistic linear discriminant analysis system for text‐dependent speaker verification

Expert Systems ◽

10.1111/exsy.12496 ◽

2020 ◽

Vol 37 (3) ◽

Author(s):

Mohammad Azharuddin Laskar ◽

Rabul Hussain Laskar

Keyword(s):

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Fuzzy Clustering ◽

Speaker Verification ◽

Linear Discriminant ◽

Analysis System ◽

Text Dependent Speaker Verification

Simultaneous Identification and Localization of Still and Mobile Speakers Based on Binaural Robot Audition

Journal of Robotics and Mechatronics ◽

10.20965/jrm.2017.p0059 ◽

2017 ◽

Vol 29 (1) ◽

pp. 59-71 ◽

Cited By ~ 2

Author(s):

Karim Youssef ◽

◽

Katsutoshi Itoyama ◽

Kazuyoshi Yoshii

Keyword(s):

Neural Networks ◽

Speaker Identification ◽

Position Estimation ◽

Acoustic Features ◽

Front End ◽

Robot Audition ◽

Speaker Tracking ◽

Common Signal ◽

Interaural Level Differences ◽

Simultaneous Identification

This paper jointly addresses the tasks of speaker identification and localization with binaural signals. The proposed system operates in noisy and echoic environments and involves limited computations. It demonstrates that a simultaneous identification and localization operation can benefit from a common signal processing front end for feature extraction. Moreover, a joint exploitation of the identity and position estimation outputs allows the outputs to limit each other’s errors. Equivalent rectangular bandwidth frequency cepstral coefficients (ERBFCC) and interaural level differences (ILD) are extracted. These acoustic features are respectively used for speaker identity and azimuth estimation through artificial neural networks (ANNs). The system was evaluated in simulated and real environments, with still and mobile speakers. Results demonstrate its ability to produce accurate estimations in the presence of noises and reflections. Moreover, the advantage of the binaural context over the monaural context for speaker identification is shown.

Speaker Verification Under Degraded Conditions Using Empirical Mode Decomposition Based Voice Activity Detection Algorithm

Journal of Intelligent Systems ◽

10.1515/jisys-2013-0085 ◽

2014 ◽

Vol 23 (4) ◽

pp. 359-378

Author(s):

M. S. Rudramurthy ◽

V. Kamakshi Prasad ◽

R. Kumaraswamy

Keyword(s):

Speaker Recognition ◽

Speaker Verification ◽

Signal To Noise Ratio ◽

Gaussian Mixture ◽

Detection Algorithm ◽

Voice Activity Detection ◽

Activity Detection ◽

Front End ◽

Different Types ◽

Voice Activity

The performance of most of the state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions, owing to mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce the mismatch between training and testing. An adaptive voice activity detection (VAD) algorithm using zero-frequency filter assisted peaking resonator (ZFFPR) was integrated into the front end of the SV system. The performance of this proposed SV system was studied under degraded conditions with 50 selected speakers from the NIST 2003 database. The degraded condition was simulated by adding different types of noises to the original speech utterances. The different types of noises were chosen from the NOISEX-92 database to simulate degraded conditions at signal-to-noise ratio levels from 0 to 20 dB. In this study, widely used 39-dimension Mel frequency cepstral coefficient (MFCC; i.e., 13-dimension MFCCs augmented with 13-dimension velocity and 13-dimension acceleration coefficients) features were used, and Gaussian mixture model–universal background model was used for speaker modeling. The proposed system’s performance was studied against the energy-based VAD used as the front end of the SV system. The proposed SV system showed some encouraging results when EMD-based VAD was used at its front end.
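Simulating a degraded condition at a chosen signal-to-noise ratio amounts to scaling the noise so that the speech-to-noise power ratio hits the target, then adding it to the speech. The following is a minimal sketch of that step, not the procedure used in the study: the function names, the synthetic sine "speech", the Gaussian noise standing in for NOISEX-92 recordings, the 8000-sample length, and the 5 dB step between 0 and 20 dB are all assumptions.

```python
import math
import random

def power(signal):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)

def measure_snr_db(speech, noise):
    """Signal-to-noise ratio in dB between two sample sequences."""
    return 10 * math.log10(power(speech) / power(noise))

def mix_at_snr(speech, noise, target_db):
    """Scale `noise` so the SNR equals `target_db`, then add it to `speech`.
    Returns the noisy signal and the scaled noise."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (target_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled

# Synthetic stand-ins: a 440 Hz tone for speech, white Gaussian noise for the noise file.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

achieved = {}
for target in (0, 5, 10, 15, 20):
    noisy, scaled = mix_at_snr(speech, noise, target)
    achieved[target] = measure_snr_db(speech, scaled)
    print(target, round(achieved[target], 6))
```

The gain is derived from the definition SNR = 10 log10(P_speech / P_noise), so the measured SNR of the scaled noise matches each target up to floating-point error.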

Locally Weighted Linear Discriminant Analysis for Robust Speaker Verification

10.21437/interspeech.2017-581 ◽

2017 ◽

Cited By ~ 1

Author(s):

Abhinav Misra ◽

Shivesh Ranjan ◽

John H.L. Hansen

Keyword(s):

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Speaker Verification ◽

Linear Discriminant
