Evaluation of Speech Processing Innovations to Improve Voice Quality and Intelligibility in Narrow‐Band Voice Communication

Caldwell P. Smith

doi:10.1121/1.1982315

Regression-Based Noise Modeling for Speech Signal Processing

Fluctuation and Noise Letters ◽

10.1142/s021947752150022x ◽

2021 ◽

pp. 2150022

Author(s):

Caio Cesar Enside de Abreu ◽

Marco Aparecido Queiroz Duarte ◽

Bruno Rodrigues de Oliveira ◽

Jozue Vieira Filho ◽

Francisco Villarreal

Keyword(s):

Speech Enhancement ◽

Speech Processing ◽

Acoustic Analysis ◽

Voice Quality ◽

Wiener Filter ◽

Processing System ◽

Speech Quality ◽

Speech Signals ◽

Speech Signal Processing ◽

Acoustic Environment

Speech processing systems are very important in many applications involving speech and voice quality, such as automatic speech recognition, forensic phonetics and speech enhancement. In most of them, acoustic environmental noise is added to the original signal, lowering the signal-to-noise ratio (SNR) and, consequently, the speech quality. Therefore, noise estimation is one of the most important steps in speech processing, whether to reduce the noise before processing or to design robust algorithms. In this paper, a new approach to estimating noise from speech signals is presented and its effectiveness is tested in the speech enhancement context. For this purpose, partial least squares (PLS) regression is used to model the acoustic environment (AE), and a Wiener filter based on a priori SNR estimation is implemented to evaluate the proposed approach. Six noise types are used to create seven acoustically modeled noises. The basic idea is to use the AE model to identify the noise type and estimate its power for use in a speech processing system. Speech signals processed using the proposed method and classical noise estimators are evaluated through objective measures. Results show that the proposed method produces better speech quality than state-of-the-art noise estimators, enabling it to be used in real-time applications in the fields of robotics, telecommunications and acoustic analysis.
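
For readers unfamiliar with the enhancement stage mentioned above, a Wiener filter driven by an a priori SNR estimate can be sketched as follows. This minimal sketch uses the classic decision-directed estimate of the a priori SNR and simply takes the noise power as given; it does not reproduce the authors' PLS-based noise model, and the smoothing factor `alpha = 0.98` and the SNR floor are common illustrative defaults, not values from the paper.

```python
import numpy as np


def wiener_gains(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Wiener gains with decision-directed a priori SNR estimation.

    noisy_power: (frames, bins) array of |Y(k, l)|^2 from an STFT of noisy speech.
    noise_power: (bins,) noise power estimate; assumed to be given here.
    """
    frames, bins = noisy_power.shape
    gains = np.empty((frames, bins))
    prev_clean_power = np.zeros(bins)  # |S_hat(k, l-1)|^2, zero before the first frame
    for l in range(frames):
        gamma = noisy_power[l] / noise_power  # a posteriori SNR
        # Decision-directed a priori SNR: blend of previous clean estimate
        # and the current (floored) maximum-likelihood estimate gamma - 1.
        xi = (alpha * prev_clean_power / noise_power
              + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
        xi = np.maximum(xi, xi_min)  # SNR floor, a common guard against musical noise
        g = xi / (1.0 + xi)          # Wiener gain
        gains[l] = g
        prev_clean_power = g ** 2 * noisy_power[l]
    return gains
```

In practice the gains multiply the noisy STFT frame by frame before resynthesis; the more accurate the noise-power estimate, the less speech the filter suppresses.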

The Design of Voice Communication and Data Collection System Based on Double CAN Bus

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.331.348 ◽

2013 ◽

Vol 331 ◽

pp. 348-351

Author(s):

Hui Dong Li ◽

Jian Min Zhang ◽

Hao Zhou Wang

Keyword(s):

Data Collection ◽

Can Bus ◽

Voice Quality ◽

Experimental Tests ◽

Accurate Information ◽

Collection System ◽

Time Data ◽

Voice Communication ◽

Data Collection System ◽

Transmission Distance

To address the disadvantages of traditional coal mine voice communication systems, such as short transmission distance, weak anti-jamming capability and poor voice quality, a voice communication and data collection system based on a double CAN bus was designed. The system adopts a dual CAN bus transmission design: the CAN 1 bus transmits voice data, and the CAN 2 bus handles real-time data acquisition. This design keeps voice information and control information from interfering with each other, ensuring timely and accurate information transmission. The hardware and software design of the system is described in detail. Experimental tests showed that the system is stable, with long transmission distance and high voice quality, and has practical application value.
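
The bus split described above can be illustrated with a short sketch. Classic CAN frames carry at most 8 data bytes, so a voice stream has to be cut into many small frames; putting them on their own bus keeps that traffic from competing with acquisition frames. The frame identifiers, the 16-bit sensor format and the 160-byte voice block (20 ms of 8 kHz, 8-bit audio) are hypothetical; the abstract does not describe the framing the authors used.

```python
from dataclasses import dataclass
from typing import List

# Classic CAN (CAN 2.0) data frames carry at most 8 data bytes.
CAN_MAX_PAYLOAD = 8

# Hypothetical identifiers, chosen for illustration only.
VOICE_ID = 0x100
SENSOR_ID = 0x200


@dataclass
class CanFrame:
    bus: int      # 1 = voice bus (CAN 1), 2 = data-acquisition bus (CAN 2)
    can_id: int
    data: bytes


def voice_to_frames(pcm: bytes) -> List[CanFrame]:
    """Split one block of encoded voice into payloads of at most 8 bytes on CAN 1."""
    return [
        CanFrame(bus=1, can_id=VOICE_ID, data=pcm[i:i + CAN_MAX_PAYLOAD])
        for i in range(0, len(pcm), CAN_MAX_PAYLOAD)
    ]


def sensor_to_frame(reading: int) -> CanFrame:
    """Pack a (hypothetical) 16-bit sensor reading into one frame on CAN 2."""
    return CanFrame(bus=2, can_id=SENSOR_ID, data=reading.to_bytes(2, "big"))


def reassemble(frames: List[CanFrame]) -> bytes:
    """Receiver side: concatenate voice payloads in arrival order."""
    return b"".join(f.data for f in frames if f.bus == 1)


if __name__ == "__main__":
    block = bytes(range(160))  # assumed 160-byte voice block
    frames = voice_to_frames(block)
    print(len(frames), reassemble(frames) == block, sensor_to_frame(512).bus)
```

One 160-byte block becomes 20 voice frames, none of which occupy the acquisition bus.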

EC Application in Speech Processing - Voice Quality Conversion Using Interactive Evolution of Prosodic Control

Evolutionary Computation ◽

10.5772/9612 ◽

2009 ◽

Author(s):

Yuji Sato

Keyword(s):

Speech Processing ◽

Voice Quality

A Statistical Approach to Adaptive Playout Scheduling in Voice Over Internet Protocol Communication

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v8i5.pp2926-2933 ◽

2018 ◽

Vol 8 (5) ◽

pp. 2926

Author(s):

Priya Chandran ◽

Chelpa Lingam

Keyword(s):

Packet Loss ◽

Loss Rate ◽

Voice Quality ◽

Internet Protocol ◽

Packet Loss Rate ◽

Voice Over Internet Protocol ◽

Network Delay ◽

Voice Communication ◽

Voice Packets

Factors such as network delay, latency and bandwidth significantly affect the quality of communication using Voice over Internet Protocol (VoIP). A jitter buffer at the receiving end compensates, to some extent, for varying network delay. The extra buffer delay given to each packet plays a major role: it allows late packets to be played out, thereby improving voice quality. As the buffer delay increases, the packet loss rate decreases, which is generally desirable. However, increasing the buffer delay beyond a certain limit degrades the interactive quality of voice communication. In this paper, we propose a statistical framework for adaptive playout scheduling of voice packets based on network statistics, packet loss rate and the availability of packets in the buffer. Experimental results show that the proposed model allocates optimal buffer delay with the lowest packet loss rate when compared with other algorithms.
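
The delay/loss trade-off described above can be made concrete: with a fixed playout delay D, any packet whose network delay exceeds D arrives too late and counts as lost. The sketch below picks D as an empirical quantile of recently observed delays so that the late-loss rate stays under a target. It is a simple quantile rule assumed for illustration, not the statistical framework proposed in the paper, and the delay values are invented.

```python
import math
from typing import Sequence


def late_loss_rate(delays_ms: Sequence[float], playout_delay_ms: float) -> float:
    """Fraction of packets that arrive after their scheduled playout time."""
    late = sum(1 for d in delays_ms if d > playout_delay_ms)
    return late / len(delays_ms)


def choose_playout_delay(delays_ms: Sequence[float], target_loss: float) -> float:
    """Smallest observed delay D whose late-loss rate does not exceed target_loss.

    Equivalent to an empirical upper quantile of the network-delay distribution.
    """
    ordered = sorted(delays_ms)
    n = len(ordered)
    allowed_late = math.floor(target_loss * n)  # packets we are willing to lose
    return ordered[max(n - 1 - allowed_late, 0)]


if __name__ == "__main__":
    # Hypothetical one-way network delays (ms) over a window of 10 packets.
    window = [20, 22, 25, 30, 31, 35, 40, 60, 90, 150]
    for target in (0.0, 0.1, 0.2, 0.3):
        d = choose_playout_delay(window, target)
        print(f"target {target:.1f}: buffer {d} ms, late loss {late_loss_rate(window, d):.1f}")
```

Lowering the target from 0.3 to 0.0 raises the buffer delay from 40 ms to 150 ms in this example window, which is the interactivity cost the abstract warns about.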

Emotional Speech Processing in 3- to 12-Month-Old Infants: Influences of Emotion Categories and Acoustic Parameters

Journal of Speech Language and Hearing Research ◽

10.1044/2021_jslhr-21-00234 ◽

2022 ◽

pp. 1-14

Author(s):

Chieh Kao ◽

Maria D. Sera ◽

Yang Zhang

Keyword(s):

Speech Processing ◽

Speech Rate ◽

Voice Quality ◽

Intensity Variation ◽

Acoustic Properties ◽

Emotional Speech ◽

Spectral Centroid ◽

Acoustic Correlates ◽

Spoken Words ◽

Word Duration

Purpose: The aim of this study was to investigate infants' listening preference for emotional prosodies in spoken words and identify their acoustic correlates. Method: Forty-six 3- to 12-month-old infants (M age = 7.6 months) completed a central fixation (or look-to-listen) paradigm in which four emotional prosodies (happy, sad, angry, and neutral) were presented. Infants' looking time to the string of words was recorded as a proxy of their listening attention. Five acoustic variables were also analyzed to account for infants' attentiveness to each emotion: mean fundamental frequency (F0), word duration, intensity variation, harmonics-to-noise ratio (HNR), and spectral centroid. Results: Infants generally preferred affective over neutral prosody, with more listening attention to the happy and sad voices. Happy sounds with breathy voice quality (low HNR) and less brightness (low spectral centroid) maintained infants' attention more. Sad speech with shorter word duration (i.e., faster speech rate), less breathiness, and more brightness gained infants' attention more than happy speech did. Infants listened less to angry than to happy and sad prosodies, and none of the acoustic variables were associated with infants' listening interests in angry voices. Neutral words with a lower F0 attracted infants' attention more than those with a higher F0. Neither age nor sex effects were observed. Conclusions: This study provides evidence for infants' sensitivity to the prosodic patterns for the basic emotion categories in spoken words and how the acoustic properties of emotional speech may guide their attention. The results point to the need to study the interplay between early socioaffective and language development.
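
Spectral centroid, the acoustic correlate of "brightness" mentioned above, is the amplitude-weighted mean frequency of the spectrum. A minimal numpy sketch over a single analysis window follows; it weights by magnitude (some toolkits weight by power instead), and the frame length, windowing and averaging used in the study are not given in the abstract. HNR is not shown.

```python
import numpy as np


def spectral_centroid(x, sr):
    """Magnitude-weighted mean frequency (Hz) of one analysis window."""
    spectrum = np.abs(np.fft.rfft(x))                # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)      # bin centre frequencies in Hz
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    print(spectral_centroid(np.sin(2 * np.pi * 1000 * t), sr))
    print(spectral_centroid(np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 3000 * t), sr))
```

A pure 1000 Hz tone gives a centroid near 1000 Hz; adding high-frequency energy raises it, which is why a low centroid reads as less bright.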

Simulated cochlear-implant processing reveals major loss of acoustic information relevant to talkers’ voice quality differences

10.1101/2020.06.29.20142885 ◽

2020 ◽

Author(s):

Meisam K. Arjmandi ◽

Hamzeh Ghasemzadeh ◽

Laura C. Dilley

Keyword(s):

Cochlear Implant ◽

Speech Processing ◽

Voice Quality ◽

Channel Noise ◽

Support Vector ◽

Mel Frequency Cepstral Coefficients ◽

Acoustic Information ◽

Major Loss ◽

Quality Distinction ◽

Robust Speech Processing

The ability to discern variations in talkers’ voice quality is important for effective talker identification and robust speech processing; yet little is known about how faithfully acoustic information relevant to variations in talkers’ voice quality is transmitted through cochlear implant (CI) speech processing. This study analyzed unprocessed and CI-simulated versions of sustained vowel sounds /a/ from two groups of individuals, with normal and disordered voice qualities, to investigate the effects of CI speech processing on acoustic information relevant to distinguishing the talkers’ voice qualities. The CI-simulated stimuli were created by processing the vowel sounds with 4-, 8-, 12-, 16-, 22-, and 32-channel noise vocoders. The voice quality of each stimulus was characterized by calculating mel-frequency cepstral coefficients (MFCCs). Then, the effects of CI speech processing on the acoustic distinctiveness between normal and disordered voices were measured by calculating the Mahalanobis distance and the classification accuracy of support vector machines (SVMs) on their MFCC features. The results showed that CI noise vocoding is substantially detrimental to the acoustic information involved in voice quality distinction, suggesting that CI listeners likely experience difficulties in perceiving voice quality variations. The results underscore the challenges CI users may face in recognizing talkers and processing their speech.
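
The CI simulation described above can be sketched as a channel noise vocoder: split the signal into frequency bands, extract each band's temporal envelope, and use it to modulate band-limited noise. This minimal numpy sketch rests on illustrative assumptions: log-spaced bands between 100 Hz and 7 kHz, brick-wall FFT filters and a 50 Hz envelope cutoff are not from the study (the abstract gives only the channel counts), and many published CI simulations use IIR filter banks with Greenwood-function band spacing instead. The MFCC, Mahalanobis-distance and SVM stages are not shown.

```python
import numpy as np


def noise_vocode(x, sr, n_channels=8, f_lo=100.0, f_hi=7000.0, env_cutoff=50.0, seed=0):
    """Channel noise vocoder (CI simulation sketch) using FFT-domain band masks."""
    rng = np.random.default_rng(seed)
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    spec = np.fft.rfft(x)
    noise_spec = np.fft.rfft(rng.standard_normal(n))
    # Log-spaced analysis bands (assumption for illustration).
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    env_mask = freqs <= env_cutoff  # brick-wall low-pass used to smooth envelopes
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = (freqs >= lo) & (freqs < hi)
        band_signal = np.fft.irfft(spec * band, n)
        # Envelope: full-wave rectification followed by low-pass filtering.
        env = np.fft.irfft(np.fft.rfft(np.abs(band_signal)) * env_mask, n)
        env = np.maximum(env, 0.0)
        # Modulate band-limited noise with the envelope, then re-filter to the band.
        carrier = np.fft.irfft(noise_spec * band, n)
        out += np.fft.irfft(np.fft.rfft(env * carrier) * band, n)
    # Match the overall RMS level of the input.
    rms_in = np.sqrt(np.mean(x ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    if rms_out > 0:
        out *= rms_in / rms_out
    return out
```

Calling `noise_vocode(x, sr, n_channels=4)` versus `n_channels=32` mirrors the range of channel counts compared in the study: fewer channels keep less spectral detail of the voice.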

High Pass Filter Application to Reduce Voice Communication Delays on IP Phones

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38716 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1876-1879

Author(s):

Martono Dwi Atmadja

Keyword(s):

Voice Quality ◽

Average Delay ◽

Low Frequencies ◽

Voice Communication ◽

Pass Filter ◽

High Pass Filter ◽

Telephone Set ◽

Ip Network ◽

Ip Phone ◽

High Pass

Abstract: VoIP technology provides several functions, including signalling, in which VoIP handles the incoming connection from the caller before the conversation is delivered. The technology carries voice traffic as packets over an IP network. Voice packets may undergo long processing or transmission delays before reaching their destination, which can degrade the quality of the voice being heard. Continuous delay in communication between IP phone sets causes echo in the voice at the receiver; echo can occur when the average delay between IP phone sets exceeds 5 ms. The transmission and reception delay between telephones can be adjusted by adding a high-pass filter (HPF) that cuts off sound frequencies from 300 Hz to 1000 Hz. In the transmission test, the HPF stabilizes the amplitude at the receiver at about -21 dBm where low frequencies are attenuated, but when the frequency is raised to 1500-2250 Hz the amplitude rises to -12.2 dBm, an increase of about 8.8 dB. Lower frequencies, such as 300-950 Hz, are not passed, since the filter is designed with a 1000 Hz cut-off. With the HPF applied, the delay narrows to 0.08 ms in the upper frequency range received by the telephone set (1000-3400 Hz). Keywords: VoIP technology, echo cancelling, HPF filter, delay, voice quality
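
The abstract states a 1000 Hz cut-off but not the filter type or order. Assuming, for illustration only, a 4th-order analog Butterworth high-pass response, the sketch below shows why the 300-950 Hz range is strongly attenuated while 1500-3400 Hz passes nearly unchanged, and checks the reported level change (-12.2 dBm minus -21 dBm = 8.8 dB). It does not model the delay reduction the paper reports.

```python
import math


def butterworth_hpf_gain_db(f_hz, fc_hz=1000.0, order=4):
    """Gain in dB of an ideal analog Butterworth high-pass filter at f_hz.

    The 4th-order default is an assumption; the source gives only the cut-off.
    """
    if f_hz <= 0:
        return float("-inf")
    magnitude = 1.0 / math.sqrt(1.0 + (fc_hz / f_hz) ** (2 * order))
    return 20.0 * math.log10(magnitude)


if __name__ == "__main__":
    for f in (300, 950, 1000, 1500, 2250, 3400):
        print(f"{f:>5} Hz: {butterworth_hpf_gain_db(f):7.2f} dB")
    # Reported level change at the receiver: -12.2 dBm - (-21 dBm).
    print(f"level change: {-12.2 - (-21.0):.1f} dB")
```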

Target Performance Analysis of Tactical Voice Communication on VHF Narrow-band in Combat Network Radio System

Journal of the Korea Institute of Military Science and Technology ◽

10.9766/kimst.2021.24.1.107 ◽

2021 ◽

Vol 24 (1) ◽

pp. 107-114

Author(s):

JaeUk Kim ◽

Joonhah Park ◽

Chulho Lee ◽

Byungkyu Lee ◽

Hayeon Jung

Keyword(s):

Performance Analysis ◽

Narrow Band ◽

Voice Communication ◽

Radio System ◽

Target Performance

Comfort Noise Mechanism for Narrow Band Secure Voice Communication

2020 28th Signal Processing and Communications Applications Conference (SIU) ◽

10.1109/siu49456.2020.9302406 ◽

2020 ◽

Keyword(s):

Narrow Band ◽

Voice Communication

Effects of Noise and a Speaker's Impaired Voice Quality on Spoken Language Processing in School-Aged Children: A Systematic Review and Meta-Analysis

Journal of Speech Language and Hearing Research ◽

10.1044/2021_jslhr-21-00183 ◽

2021 ◽

pp. 1-31

Author(s):

Isabel S. Schiller ◽

Angélique Remacle ◽

Nancy Durieux ◽

Dominique Morsomme

Keyword(s):

Systematic Review ◽

Language Processing ◽

Speech Processing ◽

Listening Comprehension ◽

Signal To Noise Ratio ◽

Meta Analysis ◽

Voice Quality ◽

Spoken Language ◽

Spoken Language Processing ◽

The Impact

Purpose: Background noise and voice problems among teachers can degrade listening conditions in classrooms. The aim of this literature review is to understand how these acoustic degradations affect spoken language processing in 6- to 18-year-old children. Method: In a narrative report and meta-analysis, we systematically review studies that examined the effects of noise and/or impaired voice on children's response accuracy and response time (RT) in listening tasks. We propose the Speech Processing under Acoustic DEgradations (SPADE) framework to classify relevant findings according to three processing dimensions (speech perception, listening comprehension, and auditory working memory) and highlight potential moderators. Results: Thirty-one studies are included in this systematic review. Our meta-analysis shows that noise can impede children's accuracy in listening tasks across all processing dimensions (Cohen's d between −0.67 and −2.65, depending on signal-to-noise ratio) and that impaired voice lowers children's accuracy in listening comprehension tasks (d = −0.35). A handful of studies assessed RT, but results are inconclusive. The impact of noise and impaired voice can be moderated by listener, task, environmental, and exposure factors. The interaction between noise and impaired voice remains underinvestigated. Conclusions: Overall, this review suggests that children have more trouble perceiving speech, processing verbal messages, and recalling verbal information when listening to speech in noise or to a speaker with dysphonia. Impoverished speech input could impede pupils' motivation and academic performance at school. Supplemental Material https://doi.org/10.23641/asha.17139377
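
The effect sizes reported above are standardized mean differences. A minimal sketch of Cohen's d with a pooled standard deviation follows; the scores are invented for illustration, and the review may have used a bias-corrected variant (such as Hedges' g) that the abstract does not specify. A negative d means lower accuracy in the degraded condition, matching the sign convention above.

```python
import math
from statistics import mean, variance


def cohens_d(degraded, control):
    """Cohen's d = (mean_degraded - mean_control) / pooled sample SD."""
    n1, n2 = len(degraded), len(control)
    pooled_var = ((n1 - 1) * variance(degraded)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(degraded) - mean(control)) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical percent-correct scores in noise vs. quiet.
    in_noise = [60, 65, 70, 75, 80]
    in_quiet = [70, 75, 80, 85, 90]
    print(round(cohens_d(in_noise, in_quiet), 3))
```

Here both groups have a sample variance of 62.5, so the 10-point drop corresponds to d of about -1.26.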
