speaker discrimination
Recently Published Documents


TOTAL DOCUMENTS: 38 (FIVE YEARS: 4)

H-INDEX: 4 (FIVE YEARS: 0)

2021 ◽  
Author(s):  
Danoush Hosseinzadeh

This work presents two hardware-independent and ubiquitous biometric solutions that can significantly improve security for computer- and telephone-related applications. First, for computer security, a GMM-based keystroke verification method is proposed along with the up-up keystroke latency (UUKL) feature, which is used here for the first time. This method verifies the identity of users based on their typing pattern and achieved an FAR of 5.1%, an FRR of 6.5%, and an EER of 5.8% on a database of 41 users. Owing to many inconsistencies in previous works, a new keystroke protocol is also proposed; it makes a number of recommendations for improving the performance, reliability, and accuracy of any keystroke recognition system. Second, a GMM-based text-independent speaker identification scheme is proposed that utilizes novel spectral features for better speaker discrimination. Based on 100 users from the TIMIT database, these features achieved an identification error of 1.22% by incorporating information about the source of the speech signal, a 6% improvement over MFCC-based features.
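The GMM-plus-UUKL pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual configuration: the key-release timestamps, the two-component mixture, and the acceptance threshold are all assumptions for the sake of the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def uukl_features(key_up_times):
    """Up-up keystroke latencies: intervals between successive key releases."""
    t = np.asarray(key_up_times, dtype=float)
    return np.diff(t).reshape(-1, 1)

# Enrollment: fit a GMM to latencies from a user's typing samples
# (timestamps in seconds; values here are illustrative).
enroll = uukl_features([0.00, 0.18, 0.35, 0.57, 0.74, 0.95, 1.10])
gmm = GaussianMixture(n_components=2, covariance_type="diag",
                      random_state=0).fit(enroll)

# Verification: score a probe sample; accept if the average
# log-likelihood exceeds a threshold tuned on held-out data
# (e.g., at the operating point that equalizes FAR and FRR).
probe = uukl_features([0.00, 0.17, 0.36, 0.55, 0.76, 0.94, 1.12])
score = gmm.score(probe)   # mean log-likelihood per latency
accept = score > -2.0      # illustrative threshold, not from the paper
```

In a real system the threshold would be swept over a development set to trade FAR against FRR, with the EER reported at the crossover point.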


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0246645
Author(s):  
Julio Cesar Cavalcanti ◽  
Anders Eriksson ◽  
Plinio A. Barbosa

The purpose of this study was to explore the speaker-discriminatory potential of vowel formant mean frequencies in comparisons of identical twin pairs and non-genetically related speakers. The influences of lexical stress and the vowels' acoustic distances on the discriminatory patterns of formant frequencies were also assessed. Acoustic extraction and analysis of the first four speech formants F1-F4 were carried out using spontaneous speech materials. The recordings comprise telephone conversations between identical twin pairs, recorded directly through high-quality microphones. The subjects were 20 male adult speakers of Brazilian Portuguese (BP), aged between 19 and 35. For the comparisons, stressed and unstressed oral vowels of BP were segmented and transcribed manually in the Praat software. F1-F4 formant estimates were automatically extracted from the middle point of each labeled vowel. Formant values were represented in both Hertz and Bark. Comparisons within identical twin pairs using the Bark scale were performed to verify whether the measured differences would be potentially significant under a psychoacoustic criterion. The results revealed consistent patterns regarding the comparison of low-frequency and high-frequency formants in twin pairs and non-genetically related speakers, with high-frequency formants displaying a greater speaker-discriminatory power than low-frequency formants. Among all formants, F4 seemed to display the highest discriminatory potential within identical twin pairs, followed by F3. For non-genetically related speakers, F3 and F4 displayed a similarly high discriminatory potential. Regarding vowel quality, the central vowel /a/ was found to be the most speaker-discriminatory segment, followed by the front vowels. Moreover, stressed vowels displayed higher inter-speaker discrimination than unstressed vowels in both groups; however, the combination of stressed and unstressed vowels was found to be even more explanatory of the observed differences. Although identical twins displayed a higher phonetic similarity, they were not found to be phonetically identical.
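The abstract reports formant values in both Hertz and Bark. One widely used Hz-to-Bark conversion is Traunmüller's (1990) formula; whether this study used that exact variant is an assumption, and the formant values below are a hypothetical /a/ token, not data from the paper.

```python
def hz_to_bark(f_hz):
    """Convert a frequency in Hz to the Bark critical-band scale
    (Traunmüller 1990): z = 26.81 * f / (1960 + f) - 0.53."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Illustrative F1-F4 estimates (Hz) for a hypothetical /a/ token.
formants_hz = [750.0, 1300.0, 2500.0, 3500.0]
formants_bark = [hz_to_bark(f) for f in formants_hz]
```

The Bark scale compresses high frequencies relative to Hertz, so a fixed difference in Hz corresponds to a smaller perceptual difference at F3/F4 than at F1, which is why a psychoacoustic criterion is applied on the Bark-scaled values.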


2021 ◽  
Vol 42 (1) ◽  
pp. 62-64
Author(s):  
Kanae Amino ◽  
Maki Shono ◽  
Takayuki Arai

Author(s):  
Sarah V. Stevenage ◽  
Rebecca Tomlin ◽  
Greg J. Neil ◽  
Ashley E. Symons

Abstract In the fields of face recognition and voice recognition, a growing literature now suggests that the ability to recognize an individual despite changes from one instance to the next is a considerable challenge. The present paper reports on one experiment in the voice domain designed to determine whether a mere change in speaking style results in a measurable difficulty when trying to discriminate between speakers. Participants completed a speaker discrimination task on pairs of speech clips, which represented either free-speech or scripted-speech segments. The results suggested that speaker discrimination was significantly better when the style of speech did not change than when it did, and was significantly better for scripted than for free speech segments. These results support the emergent body of evidence suggesting that within-identity variability is a challenge, and the forensic implications of even such a mild change in speech style are discussed.


2020 ◽  
Author(s):  
David Ben

Tracking an unknown and time-varying number of targets (e.g., speakers) in indoor environments using audio-visual (AV) modalities has received increasing interest in numerous fields, including video conferencing, individual speaker discrimination, and human-computer interaction. The audio-visual sequential Monte Carlo probability hypothesis density (AV-SMC-PHD) filter is a popular baseline for multi-target tracking, offering an elegant framework for fusing audio-visual information and dealing with a varying number of speakers. However, the performance of this filter can be adversely affected by the weight degeneracy problem, in which the weights of most particles become very small while only a few remain significant as the algorithm iterates. In this paper, we briefly discuss multi-target tracking.
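Weight degeneracy is commonly monitored via the effective sample size (ESS) and mitigated by resampling when the ESS falls below a threshold. The generic particle-filter sketch below illustrates that mechanism; it is not the AV-SMC-PHD filter itself, and the state dimension, weight distribution, and N/2 threshold are illustrative assumptions.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; N when uniform,
    approaching 1 as the weight mass collapses onto one particle."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def systematic_resample(particles, weights, rng):
    """Draw N particles in proportion to their weights (systematic scheme),
    then reset all weights to 1/N."""
    n = len(weights)
    w = np.asarray(weights, dtype=float)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(w / w.sum())
    idx = np.searchsorted(cumulative, positions)
    return particles[idx], np.full(n, 1.0 / n)

rng = np.random.default_rng(0)
particles = rng.normal(size=(100, 2))   # toy 2-D particle states
weights = rng.random(100) ** 4          # skewed weights -> degeneracy
if effective_sample_size(weights) < 50: # resample below N/2
    particles, weights = systematic_resample(particles, weights, rng)
```

Resampling discards low-weight particles and duplicates significant ones, which counters degeneracy at the cost of some sample diversity; schemes such as systematic resampling keep the variance of that step low.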

