Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Deep Length Normalization for Text-Independent Speaker Verification System

Electronics, 2020, Vol. 9 (10), p. 1706
Author(s): Soonshin Seo, Ji-Hwan Kim

One of the most important components of a text-independent speaker verification system is speaker embedding generation. Previous studies demonstrated that multi-layer aggregation based on shortcut connections improves the representational power of a speaker embedding system. However, such aggregation uses a relatively large number of model parameters and introduces unspecified variations. Therefore, in this study, we propose self-attentive multi-layer aggregation with feature recalibration and deep length normalization for a text-independent speaker verification system. To reduce the number of model parameters, we use a ResNet with scaled channel width and layer depth as the baseline. To control variability during training, we apply a self-attention mechanism that performs multi-layer aggregation with dropout regularization and batch normalization. Subsequently, we apply a feature recalibration layer to the aggregated feature using fully connected layers and nonlinear activation functions, and deep length normalization is applied to the recalibrated feature during training. Experimental results on the VoxCeleb1 evaluation set showed that the proposed methods perform comparably to state-of-the-art models, with equal error rates of 4.95% and 2.86% when trained on the VoxCeleb1 and VoxCeleb2 datasets, respectively.
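
The recalibration and length-normalization stages described above can be illustrated with a short PyTorch-style sketch. The module name, layer dimensions, and sigmoid gating below are assumptions made for illustration and do not reproduce the authors' implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RecalibratedEmbedding(nn.Module):
        # Hypothetical feature recalibration followed by deep length normalization.
        def __init__(self, feat_dim, bottleneck_dim):
            super().__init__()
            # Two fully connected layers with nonlinear activations produce
            # channel-wise recalibration weights for the aggregated feature.
            self.fc1 = nn.Linear(feat_dim, bottleneck_dim)
            self.fc2 = nn.Linear(bottleneck_dim, feat_dim)

        def forward(self, aggregated):
            # aggregated: (batch, feat_dim) output of self-attentive multi-layer aggregation
            weights = torch.sigmoid(self.fc2(F.relu(self.fc1(aggregated))))
            recalibrated = aggregated * weights            # feature recalibration
            return F.normalize(recalibrated, p=2, dim=1)   # length normalization to unit L2 norm

In this reading, recalibration acts as a learned gating over the aggregated embedding, and the final L2 normalization constrains embeddings to a fixed length before scoring.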

DYNA, 2020, Vol. 87 (213), pp. 9-16
Author(s): Franklin Alexander Sepulveda Sepulveda, Dagoberto Porras-Plata, Milton Sarria-Paja

Current state-of-the-art speaker verification (SV) systems are known to be strongly affected by unexpected variability present during testing, such as environmental noise or changes in vocal effort. In this work, we analyze and evaluate articulatory information about tongue movement as a means to improve the performance of speaker verification systems. We use a Spanish database that includes, in addition to the speech signals, articulatory information acquired with an ultrasound system. Two groups of features are proposed to represent the articulatory information, and the resulting performance is compared with that of an SV system trained only on acoustic information. Our results show that the proposed features carry highly discriminative, speaker-dependent information; furthermore, they can complement and improve existing systems when combined with cepstral coefficients at the feature level.
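
Feature-level combination of this kind typically amounts to frame-wise concatenation of the two feature streams. The following sketch (function name, normalization scheme, and dimensions are assumptions, not the paper's exact procedure) shows one plausible way to fuse cepstral and articulatory features:

    import numpy as np

    def fuse_features(cepstral, articulatory):
        # cepstral:     (num_frames, n_ceps) cepstral coefficients, e.g., MFCCs
        # articulatory: (num_frames, n_art)  tongue-movement features from ultrasound
        # Per-dimension mean/variance normalization before concatenation
        # (an assumed preprocessing step, not necessarily the one used in the paper).
        cepstral = (cepstral - cepstral.mean(axis=0)) / (cepstral.std(axis=0) + 1e-8)
        articulatory = (articulatory - articulatory.mean(axis=0)) / (articulatory.std(axis=0) + 1e-8)
        return np.concatenate([cepstral, articulatory], axis=1)  # (num_frames, n_ceps + n_art)

The fused frames can then be passed to the same SV back end that would otherwise consume acoustic features alone.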


2021, Vol. 2021, pp. 1-10
Author(s): Hongwei Luo, Yijie Shen, Feng Lin, Guoai Xu

Speaker verification systems have gained great popularity in recent years, especially with the development of deep neural networks and the Internet of Things. However, the security of speaker verification systems based on deep neural networks has not been well investigated. In this paper, we propose an attack that spoofs a state-of-the-art speaker verification system based on the generalized end-to-end (GE2E) loss function, causing illegitimate users to be misclassified as the authentic user. Specifically, we design a novel loss function for training a generator that produces effective adversarial examples with slight perturbations, and we then spoof the system with these examples. The success rate of our attack reaches 82% when cosine similarity is used for scoring in the deep-learning-based speaker verification system. Our experiments also report a signal-to-noise ratio of 76 dB, indicating that our attack is less perceptible than previous works. In summary, the results show that our attack can not only spoof the state-of-the-art neural-network-based speaker verification system but also, more importantly, evade detection by human hearing and machine discrimination.
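
A generator-based attack of this kind is usually trained with a loss that trades off similarity to the target speaker against the size of the perturbation. The sketch below (function name, weighting term, and norm choice are assumptions, not the paper's exact loss) illustrates the idea against a cosine-scored system:

    import torch
    import torch.nn.functional as F

    def attack_loss(embed, x_adv, x_orig, target_embedding, alpha=0.01):
        # embed: speaker embedding network (e.g., a GE2E-trained encoder)
        adv_embedding = embed(x_adv)
        # Pull the adversarial embedding toward the enrolled (authentic) speaker ...
        similarity = F.cosine_similarity(adv_embedding, target_embedding, dim=-1)
        # ... while keeping the perturbation small so the audio stays imperceptible.
        perturbation = torch.norm(x_adv - x_orig, p=2)
        return -similarity.mean() + alpha * perturbation

Minimizing such a loss raises the cosine score that the verification system assigns to the adversarial audio while penalizing audible distortion.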


Electronics, 2020, Vol. 9 (9), p. 1420
Author(s): Barlian Henryranu Prasetio, Hiroki Tamura, Koichi Tanno

Emotional conditions cause changes in the speech production system, producing acoustic characteristics that differ from those of neutral conditions. The presence of emotion therefore degrades the performance of a speaker verification system. In this paper, we propose a speaker modeling approach that accommodates the presence of emotion in speech segments by extracting a compact speaker representation. The speaker model is estimated by following a procedure similar to the i-vector technique, but it treats the emotional effect as the channel variability component. We name this method emotional variability analysis (EVA). EVA represents the emotion subspace separately from the speaker subspace, as in the joint factor analysis (JFA) model. The effectiveness of the proposed system is evaluated by comparing it with a standard i-vector system on the speaker verification task of the Speech Under Simulated and Actual Stress (SUSAS) dataset with three different scoring methods, with the evaluation focusing on the equal error rate (EER). In addition, we conducted an ablation study for a more comprehensive analysis of the EVA-based i-vector. Based on the experimental results, the proposed system outperformed the standard i-vector system and achieved state-of-the-art results in the verification task for speakers under stress.
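
For reference, the JFA-style decomposition that this separation of subspaces builds on models a speaker- and session-dependent mean supervector as follows (standard JFA notation; the exact EVA parameterization may differ):

    \mathbf{M} = \mathbf{m} + \mathbf{V}\mathbf{y} + \mathbf{U}\mathbf{x} + \mathbf{D}\mathbf{z}

where \mathbf{m} is the speaker-independent (UBM) mean supervector, \mathbf{V}\mathbf{y} is the speaker component in the speaker subspace, \mathbf{U}\mathbf{x} is the channel component (which EVA reinterprets as the emotional variability), and \mathbf{D}\mathbf{z} is the residual speaker-specific offset.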


2007, Vol. 15 (7), pp. 1960-1968
Author(s): Benoît G. B. Fauve, Driss Matrouf, Nicolas Scheffer, Jean-François Bonastre, John S. D. Mason
