Speaker verification using heterogeneous neural network architecture with linear correlation speech activity detection

In speech technology, speaker diarization plays a pivotal role. Speaker diarization partitions an input audio stream into homogeneous segments according to speaker identity. It can improve the readability of automatic transcriptions because it structures the audio stream into speaker turns and often provides the true speaker identity. This research work introduces a novel speaker diarization approach with three major phases: feature extraction, speech activity detection (SAD), and speaker segmentation and clustering. First, Mel-frequency cepstral coefficient (MFCC) based features are extracted from the collected input audio stream (Telugu language). Next, the SAD phase removes music and silence signals. The remaining speech signals are then segmented for each individual speaker. Finally, the segmented signals are passed to the speaker clustering process, which uses an optimized convolutional neural network (CNN). To make the clustering more appropriate, the weights and activation function of the CNN are fine-tuned by a new Self Adaptive Sea Lion Algorithm (SA-SLnO). A comparative analysis is carried out to show the superiority of the proposed speaker diarization work: the accuracy of the proposed method is 0.8073, which is 5.255, 2.45%, and 0.075 superior to the existing works.
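
The SAD phase described above can be illustrated with a very small sketch. The abstract does not say how its detector works, so the code below swaps in a plain short-time energy threshold; this is an assumption for illustration, not the authors' method, and it does not handle music removal. The function names, the 25 ms / 10 ms framing at 16 kHz, and the 30 dB margin are all hypothetical choices.

```python
import math

def frame_energies(samples, frame_len, hop):
    """Return the short-time log energy (in dB) of each full frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on digital silence.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies

def detect_speech(samples, frame_len=400, hop=160, margin_db=30.0):
    """Mark a frame as active when its energy lies within margin_db
    of the loudest frame (assumed rule, not taken from the paper)."""
    energies = frame_energies(samples, frame_len, hop)
    if not energies:
        return []
    threshold = max(energies) - margin_db
    return [e > threshold for e in energies]

# Synthetic test signal at 16 kHz: silence, a 200 Hz tone, silence.
sr = 16000
silence = [0.0] * (sr // 2)
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]
signal = silence + tone + silence

labels = detect_speech(signal)
active = [i for i, flag in enumerate(labels) if flag]
print(f"{len(labels)} frames, active from {active[0]} to {active[-1]}")
```

In a pipeline like the one in the abstract, only the frames marked active would be passed on to segmentation and clustering. A relative threshold is used here so the sketch does not depend on the recording level.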


Semi-supervised speech activity detection with an application to automatic speaker verification

Computer Speech & Language, 2018, Vol 47, pp. 132-156. DOI: 10.1016/j.csl.2017.07.005. Cited By ~ 12

Author(s): Alexey Sholokhov, Md Sahidullah, Tomi Kinnunen

Keyword(s): Speaker Verification, Activity Detection, Speech Activity, Speech Activity Detection


Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments

2020. DOI: 10.21437/interspeech.2020-1252

Author(s): Jens Heitkaemper, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

Keyword(s): Neural Network, Activity Detection, Speech Activity, Acoustic Environments, Speech Activity Detection


A Resting State fMRI Study on The Functional Connectivity, Neural Network Architecture and Neural Network Properties of PTSD

PsycEXTRA Dataset, 2012. DOI: 10.1037/e533652013-471

Author(s): Xiaodan Yan, Charles Marmar

Keyword(s): Neural Network, Functional Connectivity, Resting State, Network Architecture, Resting State fMRI, Neural Network Architecture, fMRI Study, Network Properties


SCORING MODELING BASED ON NEURAL NETWORKS FOR DETERMINING A BANK BORROWER'S RATING

Economy of Ukraine, 2020, Vol 2020 (10), pp. 54-62. DOI: 10.15407/economyukr.2020.10.054

Author(s): Oleksii VASYLIEV

Keyword(s): Neural Network, Neural Networks, Network Architecture, Statistical Data, Activation Function, Decision Making Process, Neural Network Architecture, Acceptable Accuracy, The Neural Network, Sigmoid Activation Function

This paper considers the problem of applying neural networks to calculate the ratings that banks use when deciding whether to grant loans to borrowers. The task is to determine the borrower's rating function from a set of statistical data on the effectiveness of loans provided by the bank. Constructing a regression model for the rating function requires knowing its general form; the task then reduces to calculating the parameters that appear in the expression for the rating function. With neural networks, by contrast, there is no need to specify the general form of the rating function. Instead, a particular neural network architecture is chosen and its parameters are calculated from the statistical data. Importantly, the same neural network architecture can be used to process different sets of statistical data. The disadvantages of using neural networks include the need to calculate a large number of parameters. There is also no universal algorithm that would determine the optimal neural network architecture. As an example of using neural networks to determine the borrower's rating, a model system is considered in which the borrower's rating is determined by a known non-analytical rating function. A neural network with two inner layers, which contain three and two neurons respectively and use a sigmoid activation function, is used for modeling. It is shown that the neural network restores the borrower's rating function with quite acceptable accuracy.
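
The architecture mentioned in the abstract (two inner layers of three and two neurons with sigmoid activations) is small enough to sketch as a forward pass. The abstract gives neither the number of borrower features nor the form of the output unit, so the four input features, the single sigmoid output neuron, and the random untrained weights below are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layers, features):
    """Propagate a feature vector through successive sigmoid layers."""
    activations = features
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

def count_parameters(layers):
    """Total number of weights and biases across all layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

rng = random.Random(0)
n_features = 4                 # assumed; not stated in the abstract
sizes = [n_features, 3, 2, 1]  # inner layers of 3 and 2 neurons, 1 output (assumed)
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

borrower = [0.2, 0.7, 0.1, 0.5]  # hypothetical normalized features
rating = forward(layers, borrower)[0]
n_params = count_parameters(layers)
print(f"rating={rating:.3f}, parameters={n_params}")
```

Even this tiny network has 26 parameters under the assumed input size (15 + 8 + 3), which illustrates the abstract's point that the number of parameters to fit grows quickly with the architecture. Training the weights against loan data is outside the scope of this sketch.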
