MASS: Microphone Array Speech Simulator in Room Acoustic Environment for Multi-Channel Speech Coding and Enhancement

2020 ◽  
Vol 10 (4) ◽  
pp. 1484 ◽  
Author(s):  
Rui Cheng ◽  
Changchun Bao ◽  
Zihao Cui

Multi-channel speech coding and enhancement is an indispensable technology in speech communication. To verify the effectiveness of multi-channel speech coding and enhancement methods during research and development, a microphone array speech simulator (MASS) for room acoustic environments is proposed. The proposed MASS improves and extends the existing multi-channel speech simulator. It simulates the signals that a microphone array captures in a room acoustic environment for multi-channel speech coding and enhancement: clean speech, noisy speech, clean speech with reverberation, noisy speech with reverberation, and noise. Experimental results on multi-channel speech coding and enhancement show that MASS simulates the signals found in real room acoustic environments well and can be applied to research in related fields.
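The abstract does not describe how MASS generates its signals. As a rough illustration only, the sketch below assumes that each microphone's acoustic path is modeled by a room impulse response (RIR) and that independent Gaussian noise is added per channel at a target SNR. The function names, the toy RIRs, and the noise model are all hypothetical and are not taken from the paper.

```python
import math
import random

def convolve(signal, impulse_response):
    """Full linear convolution, written out directly for clarity."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then add."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def simulate_array(clean, rirs, snr_db, rng):
    """Return one noisy reverberant signal per microphone (assumed model)."""
    channels = []
    for rir in rirs:
        reverberant = convolve(clean, rir)
        # Assumption: independent white Gaussian noise at each microphone.
        noise = [rng.gauss(0.0, 1.0) for _ in reverberant]
        channels.append(mix_at_snr(reverberant, noise, snr_db))
    return channels

rng = random.Random(0)
# A 440 Hz tone sampled at 16 kHz stands in for clean speech.
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
# Toy two-microphone RIRs: a direct path plus one delayed echo each.
rirs = [[1.0, 0.0, 0.0, 0.5], [0.0, 0.9, 0.0, 0.0, 0.4]]
mics = simulate_array(clean, rirs, snr_db=10.0, rng=rng)
```

A more realistic simulator would also pass the noise source through its own RIRs so that the noise is spatially correlated across microphones. This sketch leaves that out for brevity.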

Author(s):  
Xianyun Wang ◽  
Changchun Bao

Following the encoding and decoding mechanism of binaural cue coding (BCC), this paper treats speech and noise as the left-channel and right-channel signals of the BCC framework, respectively. The speech signal is then estimated from noisy speech when the inter-channel level difference (ICLD) and inter-channel correlation (ICC) between speech and noise are given. Both exact and pre-enhanced inter-channel cues are used for speech restoration. The exact inter-channel cues are extracted from clean speech and noise, and the pre-enhanced inter-channel cues are extracted from the pre-enhanced speech and estimated noise. The two kinds of cues are then paired one by one to form a codebook. Once the pre-enhanced cues are extracted from noisy speech, the exact cues are estimated by a mapping between the pre-enhanced cues and the prior codebook. Next, the estimated exact cues are used to obtain a time-frequency (T-F) mask for enhancing noisy speech based on BCC decoding. In addition, to further improve the accuracy of the T-F mask based on the inter-channel cues, a deep neural network (DNN)-based method is proposed to learn the mapping between input features of noisy speech and the T-F masks. Experimental results show that the codebook-driven method achieves better performance than conventional methods, and the DNN-based method performs better than the codebook-driven method.
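One way to see how an inter-channel level difference can yield a T-F mask is the sketch below. It assumes a Wiener-like gain computed from the speech-to-noise ICLD alone and ignores the ICC, so it is not the paper's BCC decoder. The function names are hypothetical.

```python
import math

def icld_db(speech_power, noise_power):
    """Level difference in dB between speech and noise in one T-F unit."""
    return 10.0 * math.log10(speech_power / noise_power)

def mask_from_icld(icld):
    """Wiener-like gain P_s / (P_s + P_n), expressed through the ICLD in dB.

    Assumption: speech and noise are uncorrelated, so the ICC is ignored.
    """
    return 1.0 / (1.0 + 10.0 ** (-icld / 10.0))

# Equal speech and noise power (0 dB ICLD) gives a gain of one half.
half_gain = mask_from_icld(0.0)
# Speech three times stronger than noise gives a gain of 3 / (3 + 1).
strong_speech_gain = mask_from_icld(icld_db(3.0, 1.0))
```

Under this assumption the mask rises toward 1 as the ICLD grows and falls toward 0 when noise dominates. This is the behavior any estimate of the exact cues would need to preserve.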


1997 ◽  
Vol 101 (3-4) ◽  
pp. 177-185 ◽  
Author(s):  
Eiji Uchino ◽  
Shin Nakamura ◽  
Takeshi Yamakawa

Author(s):  
Shibanee Dash ◽  
Mihir Narayan Mohanty

Modern wireless communication has improved considerably compared with earlier systems, and speech communication remains a major research focus in its applications. Many developments have been made in this field. In this work, we choose an OFDM-based communication system because of its importance in both licensed and unlicensed wireless communication platforms. The voice signal is passed through the proposed model and recovered at the receiver end. Under various conditions, the signal may be partially corrupted at the user end. The authors try to obtain a better received signal using a radial basis function network (RBFN) model. The parameters chosen for the RBFN model are the energy, zero-crossing rate (ZCR), autocorrelation function (ACF), and fundamental frequency of the speech signal. On the one hand, these parameters can partially eliminate noise; on the other hand, the RBFN model with these parameters proves effective for noisy speech signals sent over a noisy channel, modeled as a Gaussian channel. The efficiency of the OFDM model is verified in terms of symbol error rate, and the transmitted speech signal is evaluated in terms of SNR, which shows the reduction of noise. For visual inspection, samples of the original, noisy, and received signals are also shown. The experiment is performed at noise levels of 5 dB, 10 dB, and 15 dB. The results demonstrate the performance of the RBFN model as a filter. The performance is measured as the listener's voice in each condition. The results show that, for voice in a noisy environment, the proposed technique improves the intelligibility and quality of speech.
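The four frame-level parameters named above can be computed in several ways, and the abstract does not give the authors' definitions. The sketch below uses common textbook forms and assumes a simple autocorrelation peak search for the fundamental frequency. The function names, the search range, and the test tone are illustrative choices.

```python
import math

def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def autocorrelation(frame, lag):
    """Unnormalized autocorrelation of the frame at a given lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def fundamental_frequency(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate F0 as fs divided by the lag with the largest autocorrelation.

    Assumption: the pitch lies between f_min and f_max (hypothetical defaults).
    """
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda k: autocorrelation(frame, k))
    return fs / best_lag

fs = 8000
# A 200 Hz tone stands in for one voiced speech frame of 50 ms.
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
features = [
    short_time_energy(frame),
    zero_crossing_rate(frame),
    autocorrelation(frame, 1),
    fundamental_frequency(frame, fs),
]
```

A feature vector like `features` is the kind of input an RBFN could be trained on, though the network itself is outside the scope of this sketch.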


2009 ◽  
Vol 22 (3) ◽  
pp. 391-404
Author(s):  
Zoran Milivojevic ◽  
Dragisa Balaneskovic

This paper presents an algorithm for enhancing the quality of noisy speech signals. The algorithm is based on dissonant frequency filtering (DFF) of the tones F#, B, and C# relative to the frequency of the primary tone C (the DFF-FBC algorithm). The effect on speech signal quality was analyzed by means of the subjective Mean Opinion Score (MOS) test. The analysis of the MOS test results, presented in the second part of this paper, points to an enhancement of noisy speech signal quality in the presence of superimposed noise. Especially good results were found with the Husky Voice signal.
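To make the relation between the primary tone C and the tones F#, B, and C# concrete, the sketch below computes the three target frequencies. It assumes 12-tone equal temperament and that the primary tone is already known, neither of which is stated in the abstract. The names are hypothetical and the filtering step itself is not shown.

```python
# Semitone offsets above C in 12-tone equal temperament (an assumption here).
SEMITONES_ABOVE_C = {"C#": 1, "F#": 6, "B": 11}

def dissonant_frequencies(c_frequency_hz):
    """Frequencies of C#, F#, and B within the octave above the given C."""
    return {
        name: c_frequency_hz * 2.0 ** (steps / 12.0)
        for name, steps in SEMITONES_ABOVE_C.items()
    }

# Hypothetical primary tone of 130 Hz treated as C.
targets = dissonant_frequencies(130.0)
```

Each entry in `targets` marks a frequency that a notch or band-stop filter could attenuate. The same offsets repeat in every higher octave by doubling the frequency.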


Author(s):  
Ismail Shayeb ◽  
Naseem Asad ◽  
Ziad Alqadi ◽  
Qazem Jaber

Digital human speech signals are a common and important type of digital signal. They are used in many vital applications that require high-speed processing, so creating features for a speech signal is necessary. In this research paper we study the more widely used feature extraction methods, implement them, and compare the experimental results obtained. Efficiency parameters such as extraction time and throughput are obtained, and the speedup of each method is calculated. The speech signal histogram is used to improve the efficiency of some methods.
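The abstract does not define its efficiency parameters. The sketch below assumes the usual meanings: throughput as samples processed per second, and speedup as a reference time divided by a method's time. The normalized histogram used as the example extractor, and all function names, are hypothetical.

```python
import math
import time

def histogram_features(signal, num_bins=8, low=-1.0, high=1.0):
    """Normalized amplitude histogram used here as a toy feature vector."""
    counts = [0] * num_bins
    width = (high - low) / num_bins
    for x in signal:
        # Clamp so that out-of-range samples fall into the edge bins.
        index = min(max(int((x - low) / width), 0), num_bins - 1)
        counts[index] += 1
    return [c / len(signal) for c in counts]

def time_extraction(extract, signal):
    """Run an extractor once and return its features and elapsed seconds."""
    start = time.perf_counter()
    features = extract(signal)
    return features, time.perf_counter() - start

def throughput(num_samples, extraction_time_s):
    """Samples processed per second (assumed definition)."""
    return num_samples / extraction_time_s

def speedup(reference_time_s, method_time_s):
    """How many times faster a method is than the reference (assumed definition)."""
    return reference_time_s / method_time_s

# A one-second 300 Hz tone at 16 kHz stands in for a speech recording.
signal = [math.sin(2 * math.pi * 300 * n / 16000) for n in range(16000)]
features, elapsed = time_extraction(histogram_features, signal)
```

Timing a single run is noisy, so a fair comparison between methods would repeat each measurement several times and average the results.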


2021 ◽  
Vol 263 (2) ◽  
pp. 4733-4742
Author(s):  
Shoken Kaneko ◽  
Nirupam Roy ◽  
Nail Gumerov ◽  
Ramani Duraiswami

Active Noise Cancellation (ANC) at a target area in an open space, as opposed to cancellation in the ears through headphones, can lead to future applications. For instance, a personal acoustic environment in an airplane seat or inside a car, or a quiet zone in a noisy shared workspace can be possible using such open-space ANC without any uncomfortable on-body audio equipment. Recent advancements reinforce the practicality of such systems. However, regularization of the cancellation signal has been a crucial challenge in open-space ANC as it causes amplification of noise at locations away from the target area. This work presents a spherical harmonics-domain feed-forward spatial ANC method with a room-wide global cost function to address this issue. This room-wide global cost function is used for optimizing the set of regularization hyperparameters, while at run time only local information captured by a microphone array surrounding the target listening zone is required. Numerical experiments applying the proposed method in a simulated reverberant room show the effectiveness of the proposed method in creating a specific zone of silence with low to moderate noise amplification in the rest of the room.


2004 ◽  
Author(s):  
Shigeki Sagayama ◽  
Okajima Takashi ◽  
Kamamoto Yutaka ◽  
Nishimoto Takuya

1963 ◽  
Vol 6 (3) ◽  
pp. 207-222 ◽  
Author(s):  
J. M. Pickett ◽  
B. Horenstein Pickett

Tests of tactual speech perception were conducted using a special frequency-analyzing vocoder. The vocoder presented a running frequency analysis of speech mapped into a spatial array of tactual vibrations which were applied to the fingers of the receiving subject. Ten vibrators were used, one for each finger. The position of a vibrator represented a given frequency region of speech energy; the total range covered was 210 to 7 700 cps; all the vibrations had a frequency of 300 cps; the vibration amplitudes represented the energy distribution over the various frequencies. Discrimination and identification tests were performed with various sets of test vowels; consonant discrimination tests were performed with certain consonants including those that might be difficult to lipread. Performance with vowels appeared to be related to formant structure and duration as measured on the test vowels, and to tactual masking effects. Consonant discrimination was good between stops and continuants; consonant features of nasality, voicing, and affrication were also discriminated to some extent. It is concluded that the skin offers certain capacities for transmitting speech information which may be used to complement speech communication where only an impoverished speech signal is normally received. This research was conducted at the Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, Sweden.


Author(s):  
Kazuo Takeya ◽  
Yasuo Oteki ◽  
Hajime Yasui

The outline of plans for the research and development of an advanced reheat gas turbine under the Moonlight Project (Agency of Industrial Science and Technology, Ministry of International Trade and Industry) was announced in 1981 at Houston (81-GT-28), while technical problems related to the pilot plant (Paper No. 83-TOKYO-IGTC-117) as well as performance and characteristics (Paper No. 83-TOKYO-IGTC-40) were announced at the 1983 Tokyo International Gas Turbine Congress. No-load shop tests conducted on the pilot reheat gas turbine from May to July 1983 were completed with highly satisfactory results, so this paper is devoted primarily to describing the shop tests.

