MASS: Microphone Array Speech Simulator in Room Acoustic Environment for Multi-Channel Speech Coding and Enhancement

Rui Cheng; Changchun Bao; Zihao Cui

doi:10.3390/app10041484

Applied Sciences ◽

10.3390/app10041484 ◽

2020 ◽

Vol 10 (4) ◽

pp. 1484 ◽

Cited By ~ 3

Author(s):

Rui Cheng ◽

Changchun Bao ◽

Zihao Cui

Keyword(s):

Research And Development ◽

Speech Signal ◽

Channel Coding ◽

Speech Coding ◽

Microphone Array ◽

Experimental Results ◽

Speech Communication ◽

Noisy Speech ◽

Acoustic Environment

Multi-channel speech coding and enhancement is an indispensable technology in speech communication. To verify the effectiveness of multi-channel speech coding and enhancement methods during research and development, a microphone array speech simulator (MASS) for room acoustic environments is proposed. The proposed MASS improves and extends an existing multi-channel speech simulator. It aims to simulate the clean speech, noisy speech, clean speech with reverberation, noisy speech with reverberation, and noise signals captured by a microphone array for multi-channel speech coding and enhancement in a room acoustic environment. The experimental results of multi-channel speech coding and enhancement show that the MASS can simulate the signals found in a real room acoustic environment well and can be applied to research in related fields.
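The signal types listed in this abstract (noisy speech, speech with reverberation) can be illustrated with a minimal single-channel sketch. This is not the MASS implementation, which the abstract does not describe; the function names `mix_at_snr`, `power_ratio_db`, and `reverberate`, and the toy impulse response in the usage note, are assumptions made for illustration only.

```python
import math


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def power_ratio_db(signal, noise):
    """Signal-to-noise ratio in dB between two sample sequences."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to `speech`. Returns (noisy_speech, scaled_noise)."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise must have non-zero power")
    gain = math.sqrt(power(speech) / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def reverberate(signal, impulse_response):
    """Direct-form convolution of a signal with a room impulse response.
    The impulse response is supplied by the caller (hypothetical here)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out
```

For example, `reverberate(speech, [1.0, 0.5])` adds a single half-amplitude echo one sample later, and `mix_at_snr(speech, noise, 5.0)` returns a mixture whose speech-to-noise ratio is 5 dB. A microphone array simulator would repeat these steps with one impulse response per microphone.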

Speech enhancement methods based on binaural cue coding

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-019-0164-x ◽

2019 ◽

Vol 2019 (1) ◽

Author(s):

Xianyun Wang ◽

Changchun Bao

Keyword(s):

Neural Network ◽

Speech Enhancement ◽

Speech Signal ◽

Deep Neural Network ◽

Experimental Results ◽

Channel Correlation ◽

Noisy Speech ◽

Time Frequency ◽

Improve Accuracy ◽

Better Than

According to the encoding and decoding mechanism of binaural cue coding (BCC), in this paper, the speech and noise are considered as the left channel signal and right channel signal of the BCC framework, respectively. Subsequently, the speech signal is estimated from noisy speech when the inter-channel level difference (ICLD) and inter-channel correlation (ICC) between speech and noise are given. In this paper, exact inter-channel cues and pre-enhanced inter-channel cues are used for speech restoration. The exact inter-channel cues are extracted from clean speech and noise, and the pre-enhanced inter-channel cues are extracted from the pre-enhanced speech and estimated noise. After that, they are combined one by one to form a codebook. Once the pre-enhanced cues are extracted from noisy speech, the exact cues are estimated by a mapping between the pre-enhanced cues and a prior codebook. Next, the estimated exact cues are used to obtain a time-frequency (T-F) mask for enhancing noisy speech based on the decoding of BCC. In addition, in order to further improve the accuracy of the T-F mask based on the inter-channel cues, a deep neural network (DNN)-based method is proposed to learn the mapping relationship between input features of noisy speech and the T-F masks. Experimental results show that the codebook-driven method can achieve better performance than conventional methods, and the DNN-based method performs better than the codebook-driven method.
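The link between an inter-channel level difference and a T-F mask can be sketched as follows. This is a simplified illustration, not the paper's decoder: it assumes speech and noise are uncorrelated in each T-F bin (so the ICC is ignored) and uses a Wiener-style power-ratio mask. The function names `icld_db` and `mask_from_icld` are hypothetical.

```python
import math


def icld_db(speech_power, noise_power, eps=1e-12):
    """Level difference in dB between the speech 'channel' and the
    noise 'channel' for one T-F bin. `eps` avoids log of zero."""
    return 10.0 * math.log10((speech_power + eps) / (noise_power + eps))


def mask_from_icld(icld):
    """Wiener-style mask P_s / (P_s + P_n) expressed through the ICLD.
    Assumes uncorrelated speech and noise (an assumption of this sketch)."""
    ratio = 10.0 ** (icld / 10.0)  # P_s / P_n
    return ratio / (1.0 + ratio)


def apply_mask(noisy_power_bins, icld_bins):
    """Scale each noisy T-F bin power by the mask derived from its ICLD."""
    return [p * mask_from_icld(c) for p, c in zip(noisy_power_bins, icld_bins)]
```

With an ICLD of 0 dB the mask is 0.5 (speech and noise equally strong); as the ICLD grows the mask approaches 1 and the bin is passed almost unchanged.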

Nonlinear modeling and filtering by RBF network with application to noisy speech signal

Information Sciences ◽

10.1016/s0020-0255(97)00008-x ◽

1997 ◽

Vol 101 (3-4) ◽

pp. 177-185 ◽

Cited By ~ 2

Author(s):

Eiji Uchino ◽

Shin Nakamura ◽

Takeshi Yamakawa

Keyword(s):

Speech Signal ◽

Nonlinear Modeling ◽

Noisy Speech ◽

Rbf Network

A Machine Learning Approach for Speech Detection in Modern Wireless Communication Environment

International Journal of Machine Learning and Networked Collaborative Engineering ◽

10.30991/ijmlnce.2018v02i04.004 ◽

2018 ◽

Vol 2 (4) ◽

Author(s):

Shibanee Dash ◽

Mihir Narayan Mohanty

Keyword(s):

Wireless Communication ◽

Speech Signal ◽

Symbol Error Rate ◽

Speech Communication ◽

Noise Levels ◽

Speech Detection ◽

Voice Signal ◽

Proposed Model ◽

Machine Learning Approach ◽

The Voice

Modern wireless communication has advanced considerably compared with earlier generations, and speech communication remains a major research focus in its applications. Many developments have been made in this field. In this work, we choose an OFDM-based communication system because of its importance on both licensed and unlicensed wireless platforms. The voice signal is passed through the proposed model and recovered at the receiver end. Under various conditions, the signal may be partially corrupted at the user end. The authors aim to obtain a better received signal using a radial basis function network (RBFN). The parameters chosen for the RBFN model are the energy, zero-crossing rate (ZCR), autocorrelation function (ACF), and fundamental frequency of the speech signal. On the one hand, these parameters can partially eliminate noise; on the other hand, the RBFN model with these parameters proves effective for noisy speech signals sent over a noisy Gaussian channel. The efficiency of the OFDM model is verified in terms of symbol error rate, and the transmitted speech signal is evaluated in terms of SNR, which shows the reduction of noise. For visual inspection, samples of the original, noisy, and received signals are also shown. The experiment is performed at noise levels of 5 dB, 10 dB, and 15 dB. The results demonstrate the performance of the RBFN model as a filter. Performance is measured on the listener's voice in each condition. The results show that, for voice in a noisy environment, the proposed technique improves speech intelligibility and quality.
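Three of the frame-level parameters named in this abstract (energy, ZCR, and ACF) have standard textbook definitions, sketched below. The abstract does not give the authors' framing or normalisation, so the conventions here (energy as a sum of squares, ZCR as the fraction of adjacent sample pairs that change sign) and the function names are assumptions.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.
    Zero is treated as non-negative."""
    if len(frame) < 2:
        raise ValueError("frame needs at least two samples")
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def autocorrelation(frame, lag):
    """Unnormalised autocorrelation of the frame at a non-negative lag."""
    if lag < 0 or lag >= len(frame):
        raise ValueError("lag must satisfy 0 <= lag < len(frame)")
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
```

A fundamental-frequency estimate is commonly taken from the lag of the largest autocorrelation peak within a plausible pitch range; that step is omitted here because the abstract does not say how it was done.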

Enhancement of the perceptive quality of the noisy speech signal by using of DFF-FBC algorithm

Facta universitatis - series Electronics and Energetics ◽

10.2298/fuee0903391m ◽

2009 ◽

Vol 22 (3) ◽

pp. 391-404

Author(s):

Zoran Milivojevic ◽

Dragisa Balaneskovic

Keyword(s):

Speech Signal ◽

Mean Opinion Score ◽

Signal Quality ◽

Test Results ◽

Frequency Filtering ◽

Noisy Speech ◽

Voice Signal ◽

Opinion Score

This paper presents an algorithm for enhancing the quality of noisy speech signals. The algorithm is based on dissonant frequency filtering (DFF) of F#, B and C# in relation to the frequency of the primary tone C (the DFF-FBC algorithm). The effect on speech signal quality was analyzed by means of the subjective Mean Opinion Score (MOS) test. The analysis of the MOS test results, presented in the second part of this paper, points to an enhancement of noisy speech signal quality in the presence of superimposed noises. Especially good results were found with the Husky Voice signal.

Evaluation of speech signal features extraction methods

Journal of Applied Science, Engineering, Technology, and Education ◽

10.35877/454ri.asci2151 ◽

2020 ◽

Vol 2 (1) ◽

pp. 69-78

Author(s):

Ismail Shayeb ◽

Naseem Asad ◽

Ziad Alqadi ◽

Qazem Jaber

Keyword(s):

Speech Signal ◽

High Speed ◽

Research Paper ◽

Extraction Methods ◽

Features Extraction ◽

Experimental Results ◽

Extraction Time ◽

Digital Signals ◽

Human Speech ◽

Signal Features

Digital human speech signals are a widespread and important signal type; they are used in many vital applications that require high-speed processing, so extracting features from a speech signal is a necessary task. In this research paper we study the more widely used feature extraction methods, implement them, and compare the experimental results obtained; efficiency parameters such as extraction time and throughput are measured, and the speedup of each method is calculated. The speech signal histogram is used to improve the efficiency of some methods.

Spectral Multi-scale Product Analysis for Pitch Estimation from Noisy Speech Signal

Advances in Nonlinear Speech Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-642-11509-7_13 ◽

2010 ◽

pp. 95-102 ◽

Cited By ~ 2

Author(s):

Mohamed Anouar Ben Messaoud ◽

Aïcha Bouzid ◽

Noureddine Ellouze

Keyword(s):

Speech Signal ◽

Product Analysis ◽

Pitch Estimation ◽

Noisy Speech ◽

Multi Scale

Regularized spherical harmonics-domain spatial active noise cancellation in a reverberant room

INTER-NOISE and NOISE-CON Congress and Conference Proceedings ◽

10.3397/in-2021-2820 ◽

2021 ◽

Vol 263 (2) ◽

pp. 4733-4742

Author(s):

Shoken Kaneko ◽

Nirupam Roy ◽

Nail Gumerov ◽

Ramani Duraiswami

Keyword(s):

Cost Function ◽

Spherical Harmonics ◽

Open Space ◽

Microphone Array ◽

Noise Cancellation ◽

Target Area ◽

Acoustic Environment ◽

Active Noise Cancellation ◽

Active Noise ◽

Global Cost

Active Noise Cancellation (ANC) at a target area in an open space, as opposed to cancellation in the ears through headphones, can lead to future applications. For instance, a personal acoustic environment in an airplane seat or inside a car, or a quiet zone in a noisy shared workspace can be possible using such open-space ANC without any uncomfortable on-body audio equipment. Recent advancements reinforce the practicality of such systems. However, regularization of the cancellation signal has been a crucial challenge in open-space ANC as it causes amplification of noise at locations away from the target area. This work presents a spherical harmonics-domain feed-forward spatial ANC method with a room-wide global cost function to address this issue. This room-wide global cost function is used for optimizing the set of regularization hyperparameters, while at run time only local information captured by a microphone array surrounding the target listening zone is required. Numerical experiments applying the proposed method in a simulated reverberant room show the effectiveness of the proposed method in creating a specific zone of silence with low to moderate noise amplification in the rest of the room.

Complex spectrum circle centroid for microphone-array-based noisy speech recognition

10.21437/interspeech.2004-306 ◽

2004 ◽

Author(s):

Shigeki Sagayama ◽

Okajima Takashi ◽

Kamamoto Yutaka ◽

Nishimoto Takuya

Keyword(s):

Speech Recognition ◽

Microphone Array ◽

Complex Spectrum ◽

Noisy Speech ◽

Noisy Speech Recognition

Communication of Speech Sounds by a Tactual Vocoder

Journal of Speech and Hearing Research ◽

10.1044/jshr.0603.207 ◽

1963 ◽

Vol 6 (3) ◽

pp. 207-222 ◽

Cited By ~ 21

Author(s):

J. M. Pickett ◽

B. Horenstein Pickett

Keyword(s):

Speech Perception ◽

Energy Distribution ◽

Frequency Analysis ◽

Speech Signal ◽

Frequency Region ◽

Speech Communication ◽

Royal Institute ◽

Speech Transmission ◽

Institute Of Technology ◽

Speech Information

Tests of tactual speech perception were conducted using a special frequency-analyzing vocoder. The vocoder presented a running frequency analysis of speech mapped into a spatial array of tactual vibrations which were applied to the fingers of the receiving subject. Ten vibrators were used, one for each finger. The position of a vibrator represented a given frequency region of speech energy; the total range covered was 210 to 7 700 cps; all the vibrations had a frequency of 300 cps; the vibration amplitudes represented the energy distribution over the various frequencies. Discrimination and identification tests were performed with various sets of test vowels; consonant discrimination tests were performed with certain consonants including those that might be difficult to lipread. Performance with vowels appeared to be related to formant structure and duration as measured on the test vowels, and to tactual masking effects. Consonant discrimination was good between stops and continuants; consonant features of nasality, voicing, and affrication were also discriminated to some extent. It is concluded that the skin offers certain capacities for transmitting speech information which may be used to complement speech communication where only an impoverished speech signal is normally received. This research was conducted at the Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, Sweden.

Current Status of Advanced Reheat Gas Turbine AGTJ-100A: Part 3 — Experimental Results of Shop Tests

Volume 4: Heat Transfer; Electric Power ◽

10.1115/84-gt-57 ◽

1984 ◽

Cited By ~ 1

Author(s):

Kazuo Takeya ◽

Yasuo Oteki ◽

Hajime Yasui

Keyword(s):

International Trade ◽

Research And Development ◽

Gas Turbine ◽

Pilot Plant ◽

Science And Technology ◽

Experimental Results ◽

Current Status ◽

Technical Problems ◽

Industrial Science

The outline of plans for the research and development of an advanced reheat gas turbine under the Moonlight Project (Agency of Industrial Science and Technology, Ministry of International Trade and Industry) was announced in 1981 at Houston (81-GT-28), while technical problems related to the pilot plant (Paper No. 83-TOKYO-IGTC-117) as well as performance and characteristics (Paper No. 83-TOKYO-IGTC-40) were announced at the 1983 Tokyo International Gas Turbine Congress. No-load shop tests conducted on the pilot reheat gas turbine from May to July 1983 were completed with highly satisfactory results, so this paper is devoted primarily to describing the shop tests.
