speaker discrimination
Recently Published Documents


TOTAL DOCUMENTS: 38 (FIVE YEARS: 4)

H-INDEX: 4 (FIVE YEARS: 0)

2021 ◽  
Author(s):  
Danoush Hosseinzadeh

This work presents two hardware-independent and ubiquitous biometric solutions that can significantly improve security for computer- and telephone-related applications. First, for computer security, a GMM-based keystroke verification method is proposed along with the up-up keystroke latency (UUKL) feature, which is used here for the first time. This method verifies the identity of users based on their typing pattern and achieved an FAR of 5.1%, an FRR of 6.5%, and an EER of 5.8% on a database of 41 users. Owing to many inconsistencies in previous works, a new keystroke protocol is also proposed; it makes a number of recommendations for improving the performance, reliability, and accuracy of any keystroke recognition system. Second, a GMM-based text-independent speaker identification scheme is proposed that utilizes novel spectral features for better speaker discrimination. Based on 100 users from the TIMIT database, these features achieved an identification error of 1.22% by incorporating information about the source of the speech signal, a 6% improvement over MFCC-based features.
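The GMM-plus-UUKL pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual configuration: the key-release timestamps, the two-component mixture, and the acceptance threshold are all assumptions for the sake of the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def uukl_features(key_up_times):
    """Up-up keystroke latencies: intervals between successive key releases."""
    t = np.asarray(key_up_times, dtype=float)
    return np.diff(t).reshape(-1, 1)

# Enrollment: fit a GMM to latencies from a user's typing samples
# (timestamps in seconds; values here are illustrative).
enroll = uukl_features([0.00, 0.18, 0.35, 0.57, 0.74, 0.95, 1.10])
gmm = GaussianMixture(n_components=2, covariance_type="diag",
                      random_state=0).fit(enroll)

# Verification: score a probe sample; accept if the average
# log-likelihood exceeds a threshold tuned on held-out data
# (e.g., at the operating point that equalizes FAR and FRR).
probe = uukl_features([0.00, 0.17, 0.36, 0.55, 0.76, 0.94, 1.12])
score = gmm.score(probe)   # mean log-likelihood per latency
accept = score > -2.0      # illustrative threshold, not from the paper
```

In a real system the threshold would be swept over a development set to trade FAR against FRR, with the EER reported at the crossover point.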


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0246645
Author(s):  
Julio Cesar Cavalcanti ◽  
Anders Eriksson ◽  
Plinio A. Barbosa

The purpose of this study was to explore the speaker-discriminatory potential of vowel formant mean frequencies in comparisons of identical twin pairs and non-genetically related speakers. The influences of lexical stress and the vowels' acoustic distances on the discriminatory patterns of formant frequencies were also assessed. Acoustic extraction and analysis of the first four speech formants F1-F4 were carried out using spontaneous speech materials. The recordings comprise telephone conversations between identical twin pairs, recorded directly through high-quality microphones. The subjects were 20 male adult speakers of Brazilian Portuguese (BP), aged between 19 and 35. For the comparisons, stressed and unstressed oral vowels of BP were segmented and transcribed manually in the Praat software. F1-F4 formant estimates were automatically extracted from the middle point of each labeled vowel. Formant values were represented in both Hertz and Bark. Comparisons within identical twin pairs using the Bark scale were performed to verify whether the measured differences would be potentially significant under a psychoacoustic criterion. The results revealed consistent patterns regarding the comparison of low-frequency and high-frequency formants in twin pairs and non-genetically related speakers, with high-frequency formants displaying a greater speaker-discriminatory power than low-frequency formants. Among all formants, F4 seemed to display the highest discriminatory potential within identical twin pairs, followed by F3. For non-genetically related speakers, F3 and F4 displayed a similarly high discriminatory potential. Regarding vowel quality, the central vowel /a/ was found to be the most speaker-discriminatory segment, followed by the front vowels. Moreover, stressed vowels displayed higher inter-speaker discrimination than unstressed vowels in both groups; however, the combination of stressed and unstressed vowels was found to be even more explanatory of the observed differences. Although identical twins displayed a higher phonetic similarity, they were not found to be phonetically identical.
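The abstract reports formant values in both Hertz and Bark. One widely used Hz-to-Bark conversion is Traunmüller's (1990) formula; whether this study used that exact variant is an assumption, and the formant values below are a hypothetical /a/ token, not data from the paper.

```python
def hz_to_bark(f_hz):
    """Convert a frequency in Hz to the Bark critical-band scale
    (Traunmüller 1990): z = 26.81 * f / (1960 + f) - 0.53."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Illustrative F1-F4 estimates (Hz) for a hypothetical /a/ token.
formants_hz = [750.0, 1300.0, 2500.0, 3500.0]
formants_bark = [hz_to_bark(f) for f in formants_hz]
```

The Bark scale compresses high frequencies relative to Hertz, so a fixed difference in Hz corresponds to a smaller perceptual difference at F3/F4 than at F1, which is why a psychoacoustic criterion is applied on the Bark-scaled values.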


2021 ◽  
Vol 42 (1) ◽  
pp. 62-64
Author(s):  
Kanae Amino ◽  
Maki Shono ◽  
Takayuki Arai

Author(s):  
Sarah V. Stevenage ◽  
Rebecca Tomlin ◽  
Greg J. Neil ◽  
Ashley E. Symons

Abstract In the fields of face recognition and voice recognition, a growing literature now suggests that the ability to recognize an individual despite changes from one instance to the next is a considerable challenge. The present paper reports on one experiment in the voice domain designed to determine whether a mere change in speaking style results in a measurable difficulty when trying to discriminate between speakers. Participants completed a speaker discrimination task on pairs of speech clips, which represented either free-speech or scripted-speech segments. The results suggested that speaker discrimination was significantly better when the style of speech did not change than when it did, and was significantly better for scripted than for free speech segments. These results support the emergent body of evidence suggesting that within-identity variability is a challenge, and the forensic implications of even such a mild change in speech style are discussed.


2020 ◽  
Author(s):  
David Ben

Tracking an unknown and time-varying number of targets (e.g., speakers) in indoor environments using audio-visual (AV) modalities has received increasing interest in numerous fields, including video conferencing, individual speaker discrimination, and human-computer interaction. The audio-visual sequential Monte Carlo probability hypothesis density (AV-SMC-PHD) filter is a popular baseline for multi-target tracking, offering an elegant framework for fusing audio-visual information and dealing with a varying number of speakers. However, the performance of this filter can be adversely affected by the weight degeneracy problem, in which the weights of most particles become very small while only a few remain significant as the algorithm iterates. In this paper, we briefly discuss multi-target tracking.
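Weight degeneracy is commonly monitored via the effective sample size (ESS) and mitigated by resampling when the ESS falls below a threshold. The generic particle-filter sketch below illustrates that mechanism; it is not the AV-SMC-PHD filter itself, and the state dimension, weight distribution, and N/2 threshold are illustrative assumptions.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; N when uniform,
    approaching 1 as the weight mass collapses onto one particle."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def systematic_resample(particles, weights, rng):
    """Draw N particles in proportion to their weights (systematic scheme),
    then reset all weights to 1/N."""
    n = len(weights)
    w = np.asarray(weights, dtype=float)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(w / w.sum())
    idx = np.searchsorted(cumulative, positions)
    return particles[idx], np.full(n, 1.0 / n)

rng = np.random.default_rng(0)
particles = rng.normal(size=(100, 2))   # toy 2-D particle states
weights = rng.random(100) ** 4          # skewed weights -> degeneracy
if effective_sample_size(weights) < 50: # resample below N/2
    particles, weights = systematic_resample(particles, weights, rng)
```

Resampling discards low-weight particles and duplicates significant ones, which counters degeneracy at the cost of some sample diversity; schemes such as systematic resampling keep the variance of that step low.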

