Communication system capable of improving a speech quality by classifying speech signals

Kazunori Ozawa

doi:10.1121/1.404132

Speech Enhancement for Secure Communication Using Coupled Spectral Subtraction and Wiener Filter

Electronics ◽

10.3390/electronics8080897 ◽

2019 ◽

Vol 8 (8) ◽

pp. 897 ◽

Cited By ~ 2

Author(s):

Hilman Pardede ◽

Kalamullah Ramli ◽

Yohan Suryanto ◽

Nur Hayati ◽

Alfan Presekal

Keyword(s):

Speech Enhancement ◽

Communication System ◽

Secure Communication ◽

Wiener Filter ◽

Speech Quality ◽

Spectral Subtraction ◽

Speech Signals ◽

Voice Activity Detector ◽

Noise Estimate

The encryption process for secure voice communication may degrade speech quality when it is applied to the speech signals before they are encoded through a conventional communication system such as GSM or radio trunking. This is because the encryption process usually includes a randomization of the speech signals; hence, when the speech is decrypted, it may be perceptibly distorted, and satisfactory speech quality for communication is not achieved. To deal with this, we could apply a speech enhancement method to improve the quality of the decrypted speech. However, many speech enhancement methods assume that noise is present all the time, so a voice activity detector (VAD) is applied to detect non-speech periods in which to update the noise estimate. Unfortunately, this assumption is not valid for decrypted speech. Since the encryption process is applied only when speech is detected, distortions from the secure communication system are characteristically different: they occur while speech is present. Therefore, a noise estimator that is able to update the noise estimate even when speech is present is needed. However, most noise estimation techniques only adapt to slow changes in noise to avoid over-estimating it, making them unsuitable for this task. In this paper, we propose a speech enhancement technique to improve the quality of speech from secure communication. We use a combination of the Wiener filter and spectral subtraction for the noise estimator, so our method is better at tracking fast changes in noise without over-estimating them. Our experimental results on various communication channels indicate that our method is better than other popular noise estimators and speech enhancement methods.
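
The conventional VAD-gated noise update that this abstract contrasts itself with, followed by plain power spectral subtraction, can be sketched as below. This is a minimal illustration and not the authors' coupled estimator: the function names, the smoothing factor `alpha`, and the spectral `floor` are assumptions made for the example, and each frame is assumed to be a per-bin power spectrum given as a plain list.

```python
def update_noise_estimate(noise_power, frame_power, is_speech, alpha=0.9):
    """Recursively smooth the per-bin noise power, but only in non-speech frames.

    alpha is a hypothetical smoothing factor: values near 1 adapt slowly.
    """
    if is_speech:
        # A VAD-gated estimator freezes here, which is why it cannot
        # follow distortions that appear only while speech is present.
        return list(noise_power)
    return [alpha * n + (1.0 - alpha) * y
            for n, y in zip(noise_power, frame_power)]


def spectral_subtraction(frame_power, noise_power, floor=0.01):
    """Subtract the noise power in each bin, keeping a small spectral floor.

    The floor (a hypothetical value) prevents negative power estimates.
    """
    return [max(y - n, floor * y) for y, n in zip(frame_power, noise_power)]


if __name__ == "__main__":
    noise = [1.0, 1.0]
    # Non-speech frame: the estimate moves toward the observed power.
    noise = update_noise_estimate(noise, [2.0, 3.0], is_speech=False, alpha=0.5)
    # Speech frame: the estimate is left unchanged, then subtracted.
    noise = update_noise_estimate(noise, [9.0, 9.0], is_speech=True, alpha=0.5)
    print(spectral_subtraction([4.0, 1.0], noise, floor=0.1))
```

Because the estimate is frozen whenever `is_speech` is true, this baseline cannot track the speech-synchronous distortion the abstract describes, which is the gap the proposed estimator is meant to close.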

Regression-Based Noise Modeling for Speech Signal Processing

Fluctuation and Noise Letters ◽

10.1142/s021947752150022x ◽

2021 ◽

pp. 2150022

Author(s):

Caio Cesar Enside de Abreu ◽

Marco Aparecido Queiroz Duarte ◽

Bruno Rodrigues de Oliveira ◽

Jozue Vieira Filho ◽

Francisco Villarreal

Keyword(s):

Speech Enhancement ◽

Speech Processing ◽

Acoustic Analysis ◽

Voice Quality ◽

Wiener Filter ◽

Processing System ◽

Speech Quality ◽

Speech Signals ◽

Speech Signal Processing ◽

Acoustic Environment

Speech processing systems are important in many applications involving speech and voice quality, such as automatic speech recognition, forensic phonetics and speech enhancement. In most of them, acoustic environmental noise is added to the original signal, decreasing the signal-to-noise ratio (SNR) and, as a consequence, the speech quality. Therefore, estimating noise is one of the most important steps in speech processing, whether to reduce it before processing or to design robust algorithms. In this paper, a new approach to estimating noise from speech signals is presented and its effectiveness is tested in the speech enhancement context. For this purpose, partial least squares (PLS) regression is used to model the acoustic environment (AE), and a Wiener filter based on a priori SNR estimation is implemented to evaluate the proposed approach. Six noise types are used to create seven acoustically modeled noises. The basic idea is to use the AE model to identify the noise type and estimate its power for use in a speech processing system. Speech signals processed using the proposed method and classical noise estimators are evaluated through objective measures. Results show that the proposed method produces better speech quality than state-of-the-art noise estimators, enabling it to be used in real-time applications in the fields of robotics, telecommunications and acoustic analysis.
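
A Wiener filter driven by an a priori SNR estimate, as mentioned in this abstract, can be sketched as follows. The abstract does not say how the a priori SNR is obtained; the widely used decision-directed rule is assumed here, and the function names, the weight `alpha`, and the list-based power spectra are illustrative choices rather than the authors' implementation. The noise power in every bin is assumed to be strictly positive.

```python
def decision_directed_snr(prev_clean_power, frame_power, noise_power, alpha=0.98):
    """Estimate the a priori SNR per bin with the decision-directed rule.

    Blends the previous frame's clean-speech estimate with the current
    frame's instantaneous SNR (posterior SNR minus one, clipped at zero).
    """
    xi = []
    for s_prev, y, n in zip(prev_clean_power, frame_power, noise_power):
        instantaneous = max(y / n - 1.0, 0.0)
        xi.append(alpha * s_prev / n + (1.0 - alpha) * instantaneous)
    return xi


def wiener_gain(xi):
    """Wiener gain per bin: G = xi / (1 + xi), which lies in [0, 1)."""
    return [x / (1.0 + x) for x in xi]


if __name__ == "__main__":
    xi = decision_directed_snr([3.0, 0.0], [5.0, 1.0], [1.0, 1.0], alpha=0.5)
    # Bins with high estimated SNR are passed almost unchanged;
    # bins dominated by noise are attenuated toward zero.
    print(wiener_gain(xi))
```

The quality of the result depends directly on `noise_power`, which is why the noise estimation step the paper proposes matters for the filter's output.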

Machine Learning for Non-Intrusive Speech Quality Assessment

10.26686/wgtn.16985584 ◽

2021 ◽

Author(s):

Mouna Hakami

Keyword(s):

Machine Learning ◽

Quality Assessment ◽

Unsupervised Learning ◽

Supervised Learning ◽

Latent Variable ◽

Generative Models ◽

Speech Quality ◽

Speech Signals ◽

Latent Space ◽

Speech Quality Assessment

This thesis presents two studies on non-intrusive speech quality assessment methods. The first applies supervised learning methods to speech quality assessment, which is a common approach in machine learning based quality assessment. To outperform existing methods, we concentrate on enhancing the feature set. In the second study, we analyse quality assessment from a different point of view inspired by the biological brain and present the first unsupervised learning based non-intrusive quality assessment that removes the need for labelled training data.

Supervised learning based, non-intrusive quality predictors generally involve the development of a regressor that maps signal features to a representation of perceived quality. The performance of the predictor largely depends on 1) how sensitive the features are to the different types of distortion, and 2) how well the model learns the relation between the features and the quality score. We improve the performance of the quality estimation by enhancing the feature set and using a contemporary machine learning model that fits this objective. We propose an augmented feature set that includes raw features that are presumably redundant. The speech quality assessment system benefits from this redundancy, as it reduces the impact of unwanted noise in the input. Feature set augmentation generally leads to the inclusion of features that have non-smooth distributions. We introduce a new pre-processing method and re-distribute the features to facilitate the training. The evaluation of the system on the ITU-T Supplement 23 database illustrates that the proposed system outperforms the popular standards and contemporary methods in the literature.

The unsupervised learning quality assessment approach presented in this thesis is based on a model that is learnt from clean speech signals. Consequently, it does not need to learn the statistics of any corruption that exists in the degraded speech signals and is trained only with unlabelled clean speech samples. Quality is given a new definition, based on the divergence between 1) the distribution of the spectrograms of test signals, and 2) the pre-existing model that represents the distribution of the spectrograms of good quality speech. The distributions of speech spectrograms are complex, and hence comparing them is not trivial. To tackle this problem, we propose to map the spectrograms of speech signals to a simple latent space. Generative models that map simple latent distributions into complex distributions are excellent platforms for our work. Generative models that are trained on the spectrograms of clean speech signals learn to map the latent variable $Z$ from a simple distribution $P_Z$ into a spectrogram $X$ from the distribution of good quality speech. Consequently, an inference model is developed by inverting the pre-trained generator; it maps the spectrogram of the signal under test, $X_t$, into its corresponding latent variable, $Z_t$, in the latent space. We postulate that the divergence between the distribution of the latent variable and the prior distribution $P_Z$ is a good measure of the quality of speech. Generative adversarial nets (GAN) are an effective training method and work well in this application. The proposed system is a novel application for a GAN. The experimental results with the TIMIT and NOIZEUS databases show that the proposed measure correlates positively with the objective quality scores.
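
The idea of scoring quality by the divergence between the latent variables of a test signal and the prior $P_Z$ can be illustrated with a small sketch. The abstract does not state which divergence is used or what form $P_Z$ takes, so this example makes its own assumptions: the prior is a standard normal, the latent vectors $Z_t$ (here the hypothetical argument `latents`) are summarised by a diagonal Gaussian, and the Kullback-Leibler divergence between the two is computed in closed form.

```python
import math


def latent_divergence(latents, eps=1e-12):
    """KL divergence from a diagonal Gaussian fitted to `latents` to N(0, I).

    `latents` is a list of equal-length latent vectors (lists of floats).
    Per dimension: 0.5 * (var + mean**2 - 1 - ln(var)), summed over dimensions.
    """
    n = len(latents)
    dims = len(latents[0])
    total = 0.0
    for d in range(dims):
        values = [z[d] for z in latents]
        mean = sum(values) / n
        # Population variance, clamped so the logarithm stays finite.
        var = max(sum((v - mean) ** 2 for v in values) / n, eps)
        total += 0.5 * (var + mean ** 2 - 1.0 - math.log(var))
    return total


if __name__ == "__main__":
    # Latents that match the prior's mean and variance give zero divergence.
    print(latent_divergence([[-1.0], [1.0]]))
    # Latents shifted away from the prior give a larger divergence.
    print(latent_divergence([[1.0], [3.0]]))
```

Under these assumptions a larger value means the test signal's latents sit further from the region the generator associates with clean speech, which would be read as lower quality.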

A New Method of Objective Speech Quality Assessment in Communication System

Journal of Multimedia ◽

10.4304/jmm.8.3.291-298 ◽

2013 ◽

Vol 8 (3) ◽

Cited By ~ 9

Author(s):

Weiwei Zhang ◽

Yongyu Chang ◽

Yitong Liu ◽

Leilei Xiao

Keyword(s):

Quality Assessment ◽

Communication System ◽

Speech Quality ◽

New Method ◽

Speech Quality Assessment ◽

Objective Speech Quality Assessment ◽

Objective Speech Quality

Assessment of Speech Quality during Speech Rehabilitation Based on the Solution of the Classification Problem

10.20944/preprints202112.0196.v1 ◽

2021 ◽

Author(s):

Evgeny Kostyuchenko ◽

Ivan Rakhmanenko ◽

Alexander Shelupanov ◽

Lidiya Balatskaya ◽

Ivan Sidorov

Keyword(s):

Neural Network ◽

Classification Problem ◽

Speech Quality ◽

Speech Signals ◽

Reference Class ◽

Experimental Assessment ◽

Speech Rehabilitation ◽

Expert Assessments

The article treats the assessment of speech quality during speech rehabilitation as a classification problem. To this end, a classifier based on an LSTM neural network is built to divide speech signals into two classes: recorded before the operation and recorded immediately after it. Speech before the operation serves as the standard that rehabilitation aims to approach. The degree to which the evaluated signal belongs to the reference class serves as the speech quality score. An experimental assessment of rehabilitation sessions was carried out, and the resulting scores were compared with expert assessments of phrasal intelligibility.

Communication system capable of improving a speech quality by a pair of pulse producing units

The Journal of the Acoustical Society of America ◽

10.1121/1.406759 ◽

1993 ◽

Vol 93 (3) ◽

pp. 1677-1677

Author(s):

Kazunori Ozawa

Keyword(s):

Communication System ◽

Speech Quality

Audio and Speech Watermarking and Quality Evaluation

Advances in Audio and Speech Signal Processing ◽

10.4018/978-1-59904-132-2.ch006 ◽

2011 ◽

pp. 161-188 ◽

Cited By ~ 1

Author(s):

Ronghui Tu ◽

Jiying Zhao

Keyword(s):

Digital Watermarking ◽

Copyright Protection ◽

Quality Evaluation ◽

Speech Quality ◽

Speech Signals ◽

Evaluation Standards ◽

Speech Watermarking ◽

Generic Framework ◽

Copyright Information ◽

Host Signal

In this chapter, we introduce digital watermarking systems and quality evaluation algorithms for audio and speech signals. Originally designed for copyright protection, digital watermarking is a technique that embeds data into the host signal to carry copyright information. The embedded data, or watermark, cannot be eliminated from the host signal through normal use or even after some attacks. In the first part of this chapter, we introduce a generic framework of digital watermarking systems. Several properties and applications of watermarking systems are then described. Focusing on audio and speech signals, we divide the watermarking algorithms into robust and fragile ones. Different concerns and techniques are described for each category. In the second part of this chapter, a novel watermarking algorithm for audio and speech quality evaluation is introduced together with some conventional quality evaluation standards.
