Speaker Recognition Using Wavelet Cepstral Coefficient, I-Vector, and Cosine Distance Scoring and Its Application for Forensics

Journal of Electrical and Computer Engineering ◽

10.1155/2016/4908412 ◽

2016 ◽

Vol 2016 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Lei Lei ◽

She Kun

Keyword(s):

Speaker Recognition ◽

Ambient Noise ◽

Spectral Feature ◽

Time Cost ◽

Linear Discriminant ◽

Recognition Model ◽

The Troubles ◽

Proposed Model ◽

Robust Speaker Recognition ◽

Cosine Distance

Forensics is an important application of speaker recognition. However, recognition accuracy in forensic cases often drops rapidly because of ambient noise, channel variability, differing durations of speech data, and similar factors. A robust speaker recognition model is therefore very important for forensics. This paper builds a new speaker recognition model based on wavelet cepstral coefficients (WCC), i-vectors, and cosine distance scoring (CDS). The model first uses WCC to transform speech into spectral feature vectors and then uses those vectors to train i-vectors that represent utterances of different durations. CDS compares the i-vectors to produce the evidence. In addition, linear discriminant analysis (LDA) and within-class covariance normalization (WCCN) are added to the CDS algorithm to handle channel variability. Finally, a likelihood ratio estimates the strength of the evidence. The proposed model is evaluated on the TIMIT database. The experimental results show that the proposed model effectively addresses these problems of the forensic scenario, but its time cost is high.
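
The cosine distance scoring step described in this abstract can be illustrated with a minimal sketch. It assumes the i-vectors have already been extracted and are given as plain lists of floats; the function names and the 0.5 threshold are hypothetical placeholders, and the LDA and WCCN projections mentioned in the abstract are not included.

```python
import math

def cosine_score(w_target, w_test):
    """Cosine similarity of two i-vectors; values near 1 suggest the same speaker."""
    if len(w_target) != len(w_test):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm_target = math.sqrt(sum(a * a for a in w_target))
    norm_test = math.sqrt(sum(b * b for b in w_test))
    if norm_target == 0.0 or norm_test == 0.0:
        raise ValueError("i-vectors must be non-zero")
    return dot / (norm_target * norm_test)

def same_speaker(w_target, w_test, threshold=0.5):
    """Accept the trial when the score reaches the threshold.

    The default of 0.5 is a placeholder assumption; in practice the
    threshold would be tuned on held-out data.
    """
    return cosine_score(w_target, w_test) >= threshold
```

Because the score depends only on the angle between the two vectors, it needs no per-speaker model training at test time, which is one reason cosine scoring is often paired with i-vectors.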

Speaker Recognition Using Wavelet Packet Entropy, I-Vector, and Cosine Distance Scoring

Journal of Electrical and Computer Engineering ◽

10.1155/2017/1735698 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Lei Lei ◽

She Kun

Keyword(s):

Speaker Recognition ◽

Wavelet Packet ◽

Time Cost ◽

Noisy Environment ◽

Proposed Model ◽

Speech Database ◽

Recognition Result ◽

Spectrum Feature ◽

The Difference ◽

Cosine Distance

Today, more and more people benefit from speaker recognition. However, recognition accuracy often drops rapidly because of low-quality speech and noise. This paper proposes a new speaker recognition model based on wavelet packet entropy (WPE), i-vectors, and cosine distance scoring (CDS). In the proposed model, WPE transforms speech into short-term spectrum feature vectors (short vectors) and resists noise. An i-vector is generated from those short vectors and characterizes the speech to improve recognition accuracy. CDS quickly compares two i-vectors to produce the recognition result. The proposed model is evaluated on the TIMIT speech database. The experimental results show that the proposed model performs well in clean and noisy environments and is insensitive to low-quality speech, but its time cost is high. Parallel computation is used to reduce the time cost.
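
As a rough illustration of a wavelet packet entropy feature, the sketch below decomposes one speech frame with a full Haar wavelet packet tree and takes the Shannon entropy of the normalized sub-band energies. This is one common definition and is an assumption here: the abstract does not state the wavelet, the decomposition depth, or the exact entropy formula, and the function names are hypothetical.

```python
import math

def haar_split(signal):
    """One Haar analysis step: return (approximation, detail) at half length."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_packet_leaves(frame, depth):
    """Split every node at every level, giving 2**depth leaf sub-bands."""
    if depth < 0 or len(frame) == 0 or len(frame) % (2 ** depth) != 0:
        raise ValueError("frame length must be a positive multiple of 2**depth")
    nodes = [list(frame)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            next_nodes.extend(haar_split(node))
        nodes = next_nodes
    return nodes

def wavelet_packet_entropy(frame, depth):
    """Shannon entropy (in nats) of the normalized leaf sub-band energies."""
    energies = [sum(c * c for c in node) for node in wavelet_packet_leaves(frame, depth)]
    total = sum(energies)
    if total == 0.0:
        return 0.0  # a silent frame carries no energy distribution
    probs = [e / total for e in energies]
    return 0.0 - sum(p * math.log(p) for p in probs if p > 0.0)
```

A frame whose energy sits in a single sub-band gives an entropy of zero, while a frame whose energy is spread evenly over all 2**depth sub-bands gives the maximum, log(2**depth).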

Curriculum Learning Based Probabilistic Linear Discriminant Analysis for Noise Robust Speaker Recognition

10.21437/interspeech.2017-1199 ◽

2017 ◽

Cited By ~ 3

Author(s):

Shivesh Ranjan ◽

Abhinav Misra ◽

John H.L. Hansen

Keyword(s):

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Speaker Recognition ◽

Linear Discriminant ◽

Robust Speaker Recognition ◽

Noise Robust ◽

Noise Robust Speaker Recognition

Towards noise-robust speaker recognition using probabilistic linear discriminant analysis

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2012.6288858 ◽

2012 ◽

Cited By ~ 44

Author(s):

Yun Lei ◽

Lukas Burget ◽

Luciana Ferrer ◽

Martin Graciarena ◽

Nicolas Scheffer

Keyword(s):

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Speaker Recognition ◽

Linear Discriminant ◽

Robust Speaker Recognition ◽

Noise Robust ◽

Noise Robust Speaker Recognition

Learning to Fool the Speaker Recognition

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3468673 ◽

2021 ◽

Vol 17 (3s) ◽

pp. 1-21

Author(s):

Jiguo Li ◽

Xinfeng Zhang ◽

Jizheng Xu ◽

Siwei Ma ◽

Wen Gao

Keyword(s):

Success Rate ◽

Speaker Recognition ◽

State Of The Art ◽

Low Frequency ◽

Recognition System ◽

Perceptual Quality ◽

Recognition Model ◽

Proposed Model ◽

Adversarial Attack ◽

Ablation Study

With the widespread deployment of fingerprint, face, and speaker recognition systems, their security risks, especially adversarial attacks, have drawn increasing attention in recent years. Previous research has mainly studied adversarial attacks on vision-based systems, such as fingerprint and face recognition. Attacks on speech-based systems have not yet been well studied, although such systems are widely used in daily life. In this article, we attempt to fool a state-of-the-art speaker recognition model and present the speaker recognition attacker, a lightweight multi-layer convolutional neural network that fools a well-trained state-of-the-art speaker recognition model by adding imperceptible perturbations to the raw speech waveform. We find that the speaker recognition system is vulnerable to adversarial attack, and we achieve a high success rate on both non-targeted and targeted attacks. In addition, we present an effective method that leverages a pretrained phoneme recognition model to optimize the speaker recognition attacker, obtaining a trade-off between attack success rate and perceptual quality. Experimental results on the TIMIT and LibriSpeech datasets demonstrate the effectiveness and efficiency of the proposed model. The frequency-analysis experiments indicate that high-frequency attacks are more effective than low-frequency attacks, which differs from the conclusion drawn in previous image-based work. Finally, the ablation study gives further insight into the model.
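
The idea of adding a small perturbation to the raw waveform can be sketched without the attacker network itself. The code below is not the authors' convolutional model: it only shows how a given perturbation might be bounded before being added, and how its audibility could be summarized as a signal-to-noise ratio. The bound `epsilon`, the [-1, 1] sample range, the use of SNR as a quality proxy, and the function names are all assumptions made for illustration.

```python
import math

def apply_perturbation(waveform, delta, epsilon):
    """Add a perturbation clipped to [-epsilon, epsilon], keeping samples in [-1, 1]."""
    if len(waveform) != len(delta):
        raise ValueError("waveform and perturbation must have the same length")
    adversarial = []
    for x, d in zip(waveform, delta):
        d = max(-epsilon, min(epsilon, d))               # bound the perturbation
        adversarial.append(max(-1.0, min(1.0, x + d)))   # stay in valid sample range
    return adversarial

def snr_db(clean, adversarial):
    """Signal-to-noise ratio in dB between the clean and perturbed waveforms."""
    signal = sum(x * x for x in clean)
    noise = sum((a - x) ** 2 for a, x in zip(adversarial, clean))
    if noise == 0.0:
        return math.inf  # no perturbation at all
    if signal == 0.0:
        raise ValueError("clean waveform must contain energy")
    return 10.0 * math.log10(signal / noise)
```

A smaller `epsilon` raises the SNR and makes the change harder to hear, but it also leaves the attack less room to change the recognizer's decision, which is the trade-off the abstract describes.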

Robust speaker recognition based on level-building voice activity detection

Journal of Shenzhen University Science and Engineering ◽

10.3724/sp.j.1249.2012.04328 ◽

2012 ◽

Vol 29 (4) ◽

pp. 328-334

Author(s):

Yan-lu XIE ◽

Jing-song ZHANG ◽

Ming-hui LIU ◽

Zhong-wei HUANG

Keyword(s):

Speaker Recognition ◽

Voice Activity Detection ◽

Activity Detection ◽

Robust Speaker Recognition ◽

Level Building ◽

Voice Activity

On the use of X-vectors for Robust Speaker Recognition

10.21437/odyssey.2018-24 ◽

2018 ◽

Cited By ~ 7

Author(s):

Ondřej Novotný ◽

Oldřich Plchot ◽

Pavel Matějka ◽

Ladislav Mošner ◽

Ondřej Glembek

Keyword(s):

Speaker Recognition ◽

Robust Speaker Recognition

Duration weighted Gaussian Mixture Model supervector modeling for robust speaker recognition

2013 Ninth International Conference on Natural Computation (ICNC) ◽

10.1109/icnc.2013.6817977 ◽

2013 ◽

Author(s):

Zhe Ji ◽

Wei Hou ◽

Xin Jin ◽

Zhi-Yi Li

Keyword(s):

Gaussian Mixture Model ◽

Mixture Model ◽

Speaker Recognition ◽

Gaussian Mixture ◽

Robust Speaker Recognition

A Two-Stage Exon Recognition Model Based on Synergetic Neural Network

Computational and Mathematical Methods in Medicine ◽

10.1155/2014/503132 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 4

Author(s):

Zhehuang Huang ◽

Yidong Chen

Keyword(s):

Neural Network ◽

Digital Signal ◽

Two Stage ◽

Recognition Model ◽

Artificial Fish Swarm Algorithm ◽

Proposed Model ◽

Artificial Fish Swarm ◽

Recognition Efficiency ◽

Processing Techniques ◽

Signal Processing Techniques

Exon recognition, the task of identifying the exons of a DNA sequence, is fundamental in bioinformatics. Exon recognition algorithms based on digital signal processing techniques are currently in wide use. Unfortunately, these methods require many calculations, resulting in low recognition efficiency. To overcome this limitation, this paper proposes and implements a two-stage exon recognition model. The work has three main parts. First, a synergetic neural network rapidly determines the initial exon intervals. Second, an adaptive sliding window accurately discriminates the final exon intervals. Finally, parameter optimization based on the artificial fish swarm algorithm determines the thresholds for different species and the corresponding adjustment parameters of the adaptive windows. Experimental results show that the proposed model has better performance for exon recognition and provides a practical solution and a promising future for other recognition tasks.
