Discriminative Feature Extraction Based on Sequential Variational Autoencoder for Speaker Recognition

Author(s):  
Takenori Yoshimura ◽  
Natsumi Koike ◽  
Kei Hashimoto ◽  
Keiichiro Oura ◽  
Yoshihiko Nankaku ◽  
...  
Processes ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 122
Author(s):  
Yang Li ◽  
Fangyuan Ma ◽  
Cheng Ji ◽  
Jingde Wang ◽  
Wei Sun

Feature extraction plays a key role in fault detection. Most existing methods focus on extracting comprehensive and accurate features from normal operation data to improve detection performance, while the discriminative features available in historical fault data are usually ignored. To address this gap, a global-local marginal discriminant preserving projection (GLMDPP) method is proposed for feature extraction. Because it accounts for both global and local structure, global-local preserving projection (GLPP) is used to extract the inherent features of the data. Multiple marginal Fisher analysis (MMFA) is then introduced to extract discriminative features that better separate normal data from fault data. Within the Fisher framework, GLPP and MMFA are integrated to extract inherent and discriminative features simultaneously. Fault detection methods based on GLMDPP are then constructed and applied to the Tennessee Eastman (TE) process. Comparison with PCA and GLPP on the TE process validates the effectiveness of the proposed method for fault detection.


Author(s):  
Musab T. S. Al-Kaltakchi ◽  
Haithem Abd Al-Raheem Taha ◽  
Mohanad Abd Shehab ◽  
Mohamed A.M. Abdullah

In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To represent acoustic speech signals well, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate linear channel effects, Cepstral Mean-Variance Normalization (CMVN) and feature warping are applied. The paper investigates a text-independent speaker identification system using 16 coefficients from each of the MFCC and PNCC feature sets. Eight speakers, two female and six male, are selected from the GRID-Audiovisual database. The speakers are modeled with Gaussian Mixture Models coupled with a Universal Background Model (GMM-UBM) to obtain fast scoring and better performance. The system reaches 100% speaker identification accuracy. The results show that PNCC features perform better than MFCC features, more so when identifying female speakers than male speakers. Furthermore, feature warping performs better than CMVN.
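As an illustration of the CMVN step named in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes per-utterance statistics and a small stabilizing constant; the function name `cmvn` and the parameter `eps` are hypothetical. The idea behind it is that a fixed linear channel adds a roughly constant offset to every cepstral frame, so subtracting the per-coefficient mean removes that offset, and dividing by the standard deviation equalizes the scale of each coefficient.

```python
from statistics import fmean, pstdev


def cmvn(frames, eps=1e-8):
    """Cepstral mean-variance normalization over one utterance.

    frames: list of frames, each a list of cepstral coefficients
            (for example 16 MFCCs or PNCCs per frame).
    eps:    assumed small constant that avoids division by zero
            when a coefficient is constant across the utterance.
    """
    n_coeffs = len(frames[0])
    # Per-coefficient mean and (population) standard deviation across frames.
    means = [fmean([f[k] for f in frames]) for k in range(n_coeffs)]
    stds = [pstdev([f[k] for f in frames]) for k in range(n_coeffs)]
    # Shift each coefficient to zero mean and scale it to unit variance.
    return [
        [(f[k] - means[k]) / (stds[k] + eps) for k in range(n_coeffs)]
        for f in frames
    ]


# Toy utterance: two frames with two coefficients each.
normalized = cmvn([[1.0, 10.0], [3.0, 10.0]])
```

In the toy utterance, the first coefficient is mapped to values close to -1 and 1, while the constant second coefficient is mapped to 0. Feature warping, the alternative the abstract reports as performing better, is not shown here.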

