Addressing Text-Dependent Speaker Verification Using Singing Speech

In this paper, a brief summary of the author's research in the field of contour-based telephone speech Endpoint Detection (ED) is presented. This research includes the development of new robust features for ED, namely the Mean-Delta feature and the Group Delay Mean-Delta feature. It also includes an estimation of the effect of the analyzed ED features, together with two additional features, on a Dynamic Time Warping (DTW) fixed-text speaker verification task with short noisy telephone phrases in Bulgarian.
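
The Group Delay Mean-Delta feature mentioned above builds on the group delay spectrum. The abstract does not define the feature itself, so the sketch below computes only the standard group delay of a single frame. It uses the identity tau(w) = Re(Y(w)/X(w)), where X is the DFT of x[n] and Y is the DFT of n*x[n]. The function names, the naive O(N^2) DFT and the `eps` guard are illustrative assumptions, not the author's implementation.

```python
import cmath
import math
from typing import List, Sequence


def dft(signal: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (adequate for short illustrative frames)."""
    n_points = len(signal)
    return [
        sum(s * cmath.exp(-2j * math.pi * k * n / n_points) for n, s in enumerate(signal))
        for k in range(n_points)
    ]


def group_delay(frame: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Group delay (in samples) at each DFT bin.

    tau(w) = -d arg X(w) / dw = (X_R * Y_R + X_I * Y_I) / |X|^2,
    where Y is the DFT of n * x[n]. `eps` is a hypothetical guard
    against division by zero at spectral nulls.
    """
    spectrum = dft(frame)
    ramped = dft([n * s for n, s in enumerate(frame)])
    return [
        (xk.real * yk.real + xk.imag * yk.imag) / (abs(xk) ** 2 + eps)
        for xk, yk in zip(spectrum, ramped)
    ]


# A unit impulse delayed by 3 samples has a constant group delay of 3 samples.
delays = group_delay([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
print([round(d, 6) for d in delays])
```

For real speech frames one would normally window and zero-pad the signal before this step; both are omitted here for brevity.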

Telephone Speech Endpoint Detection using Mean-Delta Feature

Cybernetics and Information Technologies ◽

10.2478/cait-2014-0025 ◽

2014 ◽

Vol 14 (2) ◽

pp. 127-139 ◽

Cited By ~ 3

Author(s):

Atanas Ouzounov

Keyword(s):

Dynamic Time Warping ◽

Speaker Verification ◽

Verification Task ◽

Endpoint Detection ◽

Time Warping ◽

Energy Entropy ◽

Telephone Speech ◽

Dynamic Time ◽

Teager Energy ◽

Speech Endpoint Detection

In this study, the efficiency of three features for trajectory-based endpoint detection is experimentally evaluated in a fixed-text, Dynamic Time Warping (DTW)-based speaker verification task with short phrases of telephone speech. The features employed are Modified Teager Energy (MTE), the Energy-Entropy (EE) feature and the Mean-Delta (MD) feature. In the endpoint detector, utterance boundaries are determined by a state automaton and a set of thresholds based only on trajectory characteristics. Training and testing were done with noisy telephone speech selected from the BG-SRDat corpus: short Bulgarian phrases about 2 s long. The experiments showed that, in terms of the verification rate, the MD feature gives the best endpoint detection performance.
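
The Modified Teager Energy feature is a variant of the Teager-Kaiser energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The abstract does not specify the modification, so the sketch below shows only the standard operator. A per-frame mean then turns it into a trajectory (contour) that an endpoint detector could threshold. The frame-averaging step, the function names and the framing parameters are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def teager_contour(x: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Mean Teager energy per frame (hypothetical framing; frame_len and hop in samples)."""
    psi = teager_energy(x)
    return [
        sum(psi[start:start + frame_len]) / frame_len
        for start in range(0, len(psi) - frame_len + 1, hop)
    ]


# For x[n] = A*cos(W*n) the operator equals A^2 * sin^2(W) at every n,
# so it reflects both amplitude and frequency, unlike plain squared energy.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(200)]
contour = teager_contour(tone, frame_len=40, hop=20)
print(contour[0], amplitude ** 2 * math.sin(omega) ** 2)
```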

Building Sequence Kernels for Speaker Verification and Word Recognition

Intelligent Information Technologies ◽

10.4018/978-1-59904-941-0.ch033 ◽

2011 ◽

pp. 575-589

Author(s):

Vincent Wan

Keyword(s):

Speech Recognition ◽

Speech Processing ◽

Kernel Methods ◽

Speaker Recognition ◽

Dynamic Time Warping ◽

Speaker Verification ◽

Dimensional Space ◽

Time Warping ◽

Recognition Systems ◽

Dynamic Time

This chapter describes the adaptation and application of kernel methods for speech processing. It is divided into two sections dealing with speaker verification and isolated-word speech recognition applications. Significant advances in kernel methods have been realised in the field of speaker verification, particularly relating to the direct scoring of variable-length speech utterances by sequence kernel SVMs. The improvements are so substantial that most state-of-the-art speaker recognition systems now incorporate SVMs. We describe the architecture of some of these sequence kernels. Speech recognition presents additional challenges to kernel methods and their application in this area is not as straightforward as for speaker verification. We describe a sequence kernel that uses dynamic time warping to capture temporal information within the kernel directly. The formulation also extends the standard dynamic time-warping algorithm by enabling the dynamic alignment to be computed in a high-dimensional space induced by a kernel function. This kernel is shown to work well in an application for recognising low-intelligibility speech of severely dysarthric individuals.
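
Computing the dynamic alignment in a kernel-induced space can be illustrated with the identity ||phi(x) - phi(y)||^2 = k(x, x) - 2k(x, y) + k(y, y). This identity lets a DTW local distance be evaluated through the kernel alone. The sketch below is a minimal example under that assumption. The RBF kernel, the `gamma` value and the plain symmetric DTW recursion are illustrative choices, not necessarily the chapter's exact formulation.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel; gamma is a hypothetical default."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_distance(x: Vector, y: Vector, kernel: Kernel) -> float:
    """Euclidean distance between phi(x) and phi(y) in the kernel-induced space."""
    d2 = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative values from rounding


def kernel_dtw(seq_a: Sequence[Vector], seq_b: Sequence[Vector], kernel: Kernel = rbf) -> float:
    """Standard DTW recursion whose local cost is the kernel-induced distance."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = kernel_distance(seq_a[i - 1], seq_b[j - 1], kernel)
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# A time-stretched copy of a sequence aligns at zero cost.
print(kernel_dtw([[0.0], [1.0], [2.0]], [[0.0], [0.0], [1.0], [2.0], [2.0]]))
```

With a linear kernel k(x, y) = x·y, the local cost reduces to the ordinary Euclidean distance, so the function falls back to standard DTW.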

Building Sequence Kernels for Speaker Verification and Word Recognition

Kernel Methods in Bioengineering, Signal and Image Processing ◽

10.4018/978-1-59904-042-4.ch010 ◽

2011 ◽

pp. 246-262

Author(s):

Vincent Wan

Keyword(s):

Speech Recognition ◽

Speech Processing ◽

Kernel Methods ◽

Speaker Recognition ◽

Dynamic Time Warping ◽

Speaker Verification ◽

Dimensional Space ◽

Time Warping ◽

Recognition Systems ◽

Dynamic Time

This chapter describes the adaptation and application of kernel methods for speech processing. It is divided into two sections dealing with speaker verification and isolated-word speech recognition applications. Significant advances in kernel methods have been realised in the field of speaker verification, particularly relating to the direct scoring of variable-length speech utterances by sequence kernel SVMs. The improvements are so substantial that most state-of-the-art speaker recognition systems now incorporate SVMs. We describe the architecture of some of these sequence kernels. Speech recognition presents additional challenges to kernel methods and their application in this area is not as straightforward as for speaker verification. We describe a sequence kernel that uses dynamic time warping to capture temporal information within the kernel directly. The formulation also extends the standard dynamic time-warping algorithm by enabling the dynamic alignment to be computed in a high-dimensional space induced by a kernel function. This kernel is shown to work well in an application for recognising low-intelligibility speech of severely dysarthric individuals.

Short-time speaker verification with different speaking style utterances

PLoS ONE ◽

10.1371/journal.pone.0241809 ◽

2020 ◽

Vol 15 (11) ◽

pp. e0241809

Author(s):

Hongwei Mao ◽

Yan Shi ◽

Yue Liu ◽

Linqiang Wei ◽

Yijie Li ◽

...

Keyword(s):

State Of The Art ◽

Speaker Verification ◽

Feature Space ◽

Gaussian Mixture ◽

Normal Reading ◽

Great Progress ◽

Short Time ◽

Speaking Style ◽

Target Speaker ◽

Made In

In recent years, great progress has been made in the technical aspects of automatic speaker verification (ASV). However, the wider adoption of ASV technology remains very challenging, because most technologies are still very sensitive to new, unknown and spoofing conditions. Most previous studies focused on extracting target-speaker information from natural speech. This paper designs a new ASV corpus with multiple speaking styles and investigates the robustness of ASV to these styles. We first release this corpus on the Zenodo website for public research; in it, each speaker has several text-dependent and text-independent singing, humming and normal reading utterances. Then, we investigate the speaker discrimination of each speaking style in the feature space. Furthermore, intra- and inter-speaker variabilities are investigated within each speaking style and across speaking styles, in both text-dependent and text-independent ASV tasks. A conventional Gaussian Mixture Model (GMM) approach and the state-of-the-art x-vector approach are used to build the ASV systems. Experimental results show that, for conventional ASV systems, the voiceprint information in humming and singing speech is more distinguishable than that in normal reading speech. Furthermore, we find that combining the three speaking styles can significantly improve the x-vector-based ASV system, even when conventional GMM-based systems obtain only limited gains.
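
As a reference point for the conventional GMM systems mentioned above, a GMM-based verifier commonly scores a test utterance by the average per-frame log-likelihood ratio between a target-speaker model and a background model. It then compares that score with a threshold. The abstract does not give the paper's configuration, so the sketch below assumes already-trained diagonal-covariance GMMs. All model parameters, frame values and the zero threshold are hypothetical toy numbers.

```python
import math
from typing import List, Sequence, Tuple

# Diagonal-covariance GMM: list of (weight, mean vector, variance vector).
Gmm = List[Tuple[float, List[float], List[float]]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under N(mean, diag(var))."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x: Sequence[float], gmm: Gmm) -> float:
    """log sum_k w_k N(x; mu_k, Sigma_k), computed with log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(frames: Sequence[Sequence[float]], target: Gmm, background: Gmm) -> float:
    """Average per-frame log-likelihood ratio of target model vs background model."""
    return sum(
        gmm_log_likelihood(f, target) - gmm_log_likelihood(f, background) for f in frames
    ) / len(frames)


# Hypothetical 2-D toy models and test frames.
target_model = [(1.0, [1.0, 1.0], [0.5, 0.5])]
background_model = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
score = llr_score([[1.0, 1.0], [1.1, 0.9]], target_model, background_model)
print("accept" if score > 0.0 else "reject", round(score, 3))
```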

LTSD and GDMD features for Telephone Speech Endpoint Detection

Cybernetics and Information Technologies ◽

10.1515/cait-2017-0045 ◽

2017 ◽

Vol 17 (4) ◽

pp. 114-133

Author(s):

Atanas Ouzounov

Keyword(s):

Dynamic Time Warping ◽

Group Delay ◽

Speaker Verification ◽

Endpoint Detection ◽

Time Warping ◽

Adaptive Thresholds ◽

Telephone Speech ◽

Dynamic Time ◽

Speech Endpoint Detection

This paper proposes a new contour-based speech endpoint detector that combines the log-Group Delay Mean-Delta (log-GDMD) feature, an adaptive two-threshold scheme and an eight-state automaton. The adaptive threshold scheme uses two pairs of thresholds: one for the starting point and one for the ending point. Each pair is calculated from the contour characteristics in the corresponding region of the utterance. The experimental results show that the proposed detector achieves better endpoint accuracy than the Long-Term Spectral Divergence (LTSD) detector. Additional fixed-text speaker verification tests confirm that the better endpoint accuracy improves the verification rate. These tests use short phrases of telephone speech and are based on the Dynamic Time Warping (DTW) and left-to-right Hidden Markov Model (HMM) frameworks.
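
Separate threshold pairs for the starting and ending points can be sketched as a double-threshold (hysteresis) search on a feature contour. This is a minimal example under stated assumptions. Each pair is derived from the mean and standard deviation of a few edge frames assumed to contain only noise, and the multipliers `k_low` and `k_high` are hypothetical. A simple forward and backward scan stands in for the paper's eight-state automaton, which the abstract does not describe.

```python
import statistics
from typing import Optional, Sequence, Tuple


def threshold_pair(region: Sequence[float], k_low: float = 2.0, k_high: float = 4.0) -> Tuple[float, float]:
    """Hypothetical rule: (low, high) = mean + k * std of a noise-only region."""
    mean = statistics.mean(region)
    std = statistics.pstdev(region)
    return mean + k_low * std, mean + k_high * std


def detect_endpoints(contour: Sequence[float], edge_frames: int = 10) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices, or None if no frame clears the high thresholds."""
    start_low, start_high = threshold_pair(contour[:edge_frames])
    end_low, end_high = threshold_pair(contour[-edge_frames:])

    # Starting point: first frame above the high start threshold,
    # then extend backwards while the contour stays above the low one.
    start = next((i for i, v in enumerate(contour) if v > start_high), None)
    if start is None:
        return None
    while start > 0 and contour[start - 1] > start_low:
        start -= 1

    # Ending point: last frame above the high end threshold,
    # then extend forwards while the contour stays above the low one.
    end = next((i for i in range(len(contour) - 1, -1, -1) if contour[i] > end_high), None)
    if end is None or end < start:
        return None
    while end < len(contour) - 1 and contour[end + 1] > end_low:
        end += 1
    return start, end


noise = [0.1, 0.2] * 5
speech = [0.3, 1.0, 2.0, 1.0, 0.3]
print(detect_endpoints(noise + speech + noise))  # prints (10, 14)
```

The low thresholds let the weak edge frames (0.3) be included once a strong frame has been found, which a single threshold would miss.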

A computational analysis on similarities and dissimilarities of Acem Kurdi, Kurdi and Muhayyer Kurdi makams in Turkish music

African Educational Research Journal ◽

10.30918/aerj.84.20.158 ◽

2020 ◽

Vol 8 (4) ◽

pp. 761-773

Author(s):

Naciye Hardalaç ◽

Keyword(s):

Dynamic Time Warping ◽

Computational Analysis ◽

Pattern Search ◽

The Other ◽

Time Dependent ◽

Time Warping ◽

Dynamic Time ◽

The One ◽

Distinguishing Features ◽

Time Dependent Analysis

In Turkish music, different makams can share the same core scale of notes. The subjects of this study are three such makams: Acem Kürdi, Kürdi and Muhayyer Kürdi. We use computational analysis based on histograms, pattern search and dynamic time warping to reveal their similarities and dissimilarities. On the one hand, our results show that a time-independent histogram analysis cannot properly highlight the differences between the makams. On the other hand, our study reveals that a time-dependent analysis is well suited to identifying their distinguishing features. In particular, applying a specialized dynamic time warping technique establishes a low correlation between these makams.
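
A time-independent histogram can miss differences that a time-dependent comparison exposes. Two toy note sequences containing the same notes in a different order show why. The sketch below uses plain DTW with absolute pitch difference as the local cost. The note values are hypothetical MIDI-style numbers, not transcriptions of the three makams, and this is not the specialized DTW variant used in the study.

```python
from collections import Counter
from typing import Sequence


def histogram_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """L1 distance between note-count histograms; ignores temporal order."""
    ha, hb = Counter(a), Counter(b)
    return sum(abs(ha[note] - hb[note]) for note in set(ha) | set(hb))


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Same notes (identical histograms), opposite melodic direction.
rising = [60, 62, 63, 65, 67, 65, 63]
falling = [67, 65, 63, 62, 60, 63, 65]
print(histogram_distance(rising, falling))  # prints 0
print(dtw_distance(rising, falling) > 0)    # prints True
```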

Design of speaker verification systems with the use of an algorithm of Dynamic Time Warping (DTW)

Pattern Recognition and Image Analysis ◽

10.1134/s1054661807040050 ◽

2007 ◽

Vol 17 (4) ◽

pp. 470-479

Author(s):

V. V. Geppener ◽

K. K. Simonchik ◽

A. S. Haidar

Keyword(s):

Dynamic Time Warping ◽

Speaker Verification ◽

Time Warping ◽

Dynamic Time ◽

Verification Systems
