Evaluation of Speech Processing Innovations to Improve Voice Quality and Intelligibility in Narrow‐Band Voice Communication

1973 ◽  
Vol 53 (1) ◽  
pp. 321-321
Author(s):  
Caldwell P. Smith
2021 ◽  
pp. 2150022
Author(s):  
Caio Cesar Enside de Abreu ◽  
Marco Aparecido Queiroz Duarte ◽  
Bruno Rodrigues de Oliveira ◽  
Jozue Vieira Filho ◽  
Francisco Villarreal

Speech processing systems are important in many applications involving speech and voice quality, such as automatic speech recognition, forensic phonetics, and speech enhancement. In most of them, acoustic environmental noise is added to the original signal, decreasing the signal-to-noise ratio (SNR) and, consequently, the speech quality. Estimating noise is therefore one of the most important steps in speech processing, whether to reduce it before processing or to design robust algorithms. In this paper, a new approach to estimating noise from speech signals is presented, and its effectiveness is tested in the speech enhancement context. For this purpose, partial least squares (PLS) regression is used to model the acoustic environment (AE), and a Wiener filter based on a priori SNR estimation is implemented to evaluate the proposed approach. Six noise types are used to create seven acoustically modeled noises. The basic idea is to use the AE model to identify the noise type and estimate its power for use in a speech processing system. Speech signals processed with the proposed method and with classical noise estimators are evaluated through objective measures. Results show that the proposed method produces better speech quality than state-of-the-art noise estimators, enabling its use in real-time applications in robotics, telecommunications, and acoustic analysis.
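The abstract names a Wiener filter driven by a priori SNR estimation but does not give its update rule. The sketch below assumes the common decision-directed estimator, with a smoothing constant `alpha` and floor `xi_min` chosen for illustration, and takes the per-bin noise power as a fixed input where the authors' PLS-based estimator would supply it. It only shows how a noise power estimate turns into per-bin gains; it is not the paper's implementation.

```python
def wiener_gains(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Per-frame, per-bin Wiener gains using a decision-directed a priori SNR.

    noisy_power: list of frames, each a list of |Y(k)|^2 values per bin.
    noise_power: per-bin noise power estimates lambda(k). Held fixed here
        (an assumption); a noise estimator would normally update them.
    alpha, xi_min: illustrative smoothing constant and a priori SNR floor.
    """
    n_bins = len(noise_power)
    prev_clean_power = [0.0] * n_bins  # |S_hat(k)|^2 from the previous frame
    gains_per_frame = []
    for frame in noisy_power:
        gains = []
        for k in range(n_bins):
            lam = max(noise_power[k], 1e-12)  # guard against division by zero
            gamma = frame[k] / lam            # a posteriori SNR
            # Decision-directed a priori SNR: blend of the previous clean
            # estimate and the instantaneous (gamma - 1) estimate.
            xi = alpha * prev_clean_power[k] / lam + (1.0 - alpha) * max(gamma - 1.0, 0.0)
            xi = max(xi, xi_min)
            g = xi / (1.0 + xi)               # Wiener gain
            gains.append(g)
            prev_clean_power[k] = (g ** 2) * frame[k]
        gains_per_frame.append(gains)
    return gains_per_frame
```

For example, `wiener_gains([[1.0, 101.0], [1.0, 101.0]], [1.0, 1.0])` gives a near-zero gain for the bin that contains only noise and a gain that grows across frames for the high-SNR bin, which is the smoothing behaviour the decision-directed rule is designed for.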


2013 ◽  
Vol 331 ◽  
pp. 348-351
Author(s):  
Hui Dong Li ◽  
Jian Min Zhang ◽  
Hao Zhou Wang

To address the shortcomings of traditional coal mine voice communication systems, such as short transmission distance, weak anti-jamming capability, and poor voice quality, a voice communication and data collection system based on a dual CAN bus was designed. The system uses two CAN buses: CAN 1 carries voice data, and CAN 2 handles real-time data acquisition. Separating the two kinds of traffic keeps voice and control information from interfering with each other and ensures timely and accurate transmission. The hardware and software design of the system is described in detail. Experimental tests showed that the system is stable, supports long transmission distances with high voice quality, and has practical application value.
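A minimal sketch of the dual-bus split, assuming classic CAN 2.0 frames, which carry at most 8 data bytes each. The identifiers `VOICE_ID` and `SENSOR_ID`, the 16-bit sensor encoding, and the voice byte rate used below are hypothetical; the abstract does not specify the frame layout or the voice codec.

```python
from dataclasses import dataclass
from typing import List

CAN_MAX_DATA = 8  # classic CAN 2.0 data field limit, in bytes

# Hypothetical 11-bit identifiers chosen for illustration only.
VOICE_ID = 0x100
SENSOR_ID = 0x200


@dataclass
class CanFrame:
    bus: int      # 1 = voice bus (CAN 1), 2 = data-acquisition bus (CAN 2)
    can_id: int   # standard 11-bit identifier
    data: bytes   # payload, at most CAN_MAX_DATA bytes


def voice_to_frames(voice: bytes) -> List[CanFrame]:
    """Split an encoded voice buffer into consecutive 8-byte frames on CAN 1."""
    return [CanFrame(1, VOICE_ID, voice[i:i + CAN_MAX_DATA])
            for i in range(0, len(voice), CAN_MAX_DATA)]


def sensor_to_frame(sensor_index: int, value: int) -> CanFrame:
    """Pack one unsigned 16-bit sensor reading into a single frame on CAN 2."""
    return CanFrame(2, SENSOR_ID + sensor_index, value.to_bytes(2, "big"))


def frames_per_second(byte_rate: int) -> float:
    """Frame rate needed to carry a byte stream in full 8-byte frames."""
    return byte_rate / CAN_MAX_DATA
```

For example, an uncompressed 8 kHz, 8-bit voice stream (8000 bytes/s, an assumed rate) needs `frames_per_second(8000)`, or 1000 frames per second, on CAN 1. Sensor frames built with `sensor_to_frame` never compete with that load because they travel on CAN 2.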


Author(s):  
Priya Chandran ◽  
Chelpa Lingam

Factors such as network delay, latency, and bandwidth significantly affect the quality of Voice over Internet Protocol (VoIP) communication. A jitter buffer at the receiving end compensates, to some extent, for varying network delay. The extra buffer delay assigned to each packet allows late packets to be played out, which improves voice quality. As the buffer delay increases, the packet loss rate decreases, which in general is desirable. However, increasing the buffer delay beyond a certain limit degrades the interactive quality of voice communication. In this paper, we propose a statistical framework for adaptive playout scheduling of voice packets based on network statistics, packet loss rate, and the availability of packets in the buffer. Experimental results show that, compared with other algorithms, the proposed model allocates an optimal buffer delay with the lowest packet loss rate.
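The delay/loss trade-off described above can be illustrated with a small sketch. `late_loss_rate` counts packets whose network delay exceeds a fixed buffer delay, and `adaptive_playout` is a generic exponentially weighted "mean delay plus a multiple of delay variation" estimator, a well-known baseline and not the authors' statistical framework. The smoothing factor `alpha` and safety factor `beta` are illustrative assumptions.

```python
from typing import List, Sequence


def late_loss_rate(network_delays_ms: Sequence[float], buffer_delay_ms: float) -> float:
    """Fraction of packets that arrive after their scheduled playout time."""
    late = sum(1 for d in network_delays_ms if d > buffer_delay_ms)
    return late / len(network_delays_ms)


def adaptive_playout(network_delays_ms: Sequence[float],
                     alpha: float = 0.9, beta: float = 4.0) -> List[float]:
    """Per-packet playout delay from smoothed delay and smoothed variation.

    alpha: weight on history (closer to 1 adapts more slowly).
    beta: how many variation units of headroom to add to the mean delay.
    """
    d_hat = network_delays_ms[0]  # smoothed one-way delay estimate
    v_hat = 0.0                   # smoothed delay variation estimate
    playout = []
    for n in network_delays_ms:
        d_hat = alpha * d_hat + (1.0 - alpha) * n
        v_hat = alpha * v_hat + (1.0 - alpha) * abs(d_hat - n)
        playout.append(d_hat + beta * v_hat)
    return playout
```

With delays of `[20, 25, 30, 80, 22]` ms, a 20 ms buffer loses 80% of packets to lateness, a 30 ms buffer loses 20%, and a 100 ms buffer loses none but adds 100 ms to every packet; this is the tension that adaptive playout schemes try to balance.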


Author(s):  
Chieh Kao ◽  
Maria D. Sera ◽  
Yang Zhang

Purpose: The aim of this study was to investigate infants' listening preference for emotional prosodies in spoken words and to identify their acoustic correlates. Method: Forty-six 3- to 12-month-old infants (M age = 7.6 months) completed a central fixation (or look-to-listen) paradigm in which four emotional prosodies (happy, sad, angry, and neutral) were presented. Infants' looking time to the string of words was recorded as a proxy for their listening attention. Five acoustic variables were also analyzed to account for infants' attentiveness to each emotion: mean fundamental frequency (F0), word duration, intensity variation, harmonics-to-noise ratio (HNR), and spectral centroid. Results: Infants generally preferred affective over neutral prosody, with more listening attention to the happy and sad voices. Happy sounds with breathy voice quality (low HNR) and less brightness (low spectral centroid) maintained infants' attention more. Sad speech with shorter word duration (i.e., faster speech rate), less breathiness, and more brightness gained infants' attention more than happy speech did. Infants listened less to angry than to happy and sad prosodies, and none of the acoustic variables was associated with infants' listening interest in angry voices. Neutral words with a lower F0 attracted infants' attention more than those with a higher F0. No effects of age or sex were observed. Conclusions: This study provides evidence of infants' sensitivity to the prosodic patterns of the basic emotion categories in spoken words and of how the acoustic properties of emotional speech may guide their attention. The results point to the need to study the interplay between early socioaffective and language development.
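Spectral centroid, used above as a measure of brightness, is the magnitude-weighted mean frequency of a signal's spectrum. A minimal sketch using a naive DFT, which is adequate for short frames but far too slow for real recordings; `tone` is a hypothetical helper for generating test signals, and the study's actual analysis settings (frame length, windowing) are not stated in the abstract.

```python
import math
from typing import List, Sequence


def spectral_centroid(signal: Sequence[float], sample_rate: float) -> float:
    """Magnitude-weighted mean frequency (Hz) over non-negative DFT bins."""
    n = len(signal)
    num = 0.0
    den = 0.0
    for k in range(n // 2 + 1):  # bins from 0 Hz up to the Nyquist frequency
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        mag = math.hypot(re, im)
        num += (k * sample_rate / n) * mag
        den += mag
    return num / den if den > 0 else 0.0


def tone(freqs_hz: Sequence[float], sample_rate: float, n: int) -> List[float]:
    """Hypothetical test helper: equal-amplitude sum of sine waves."""
    return [sum(math.sin(2 * math.pi * f * t / sample_rate) for f in freqs_hz)
            for t in range(n)]
```

A 1000 Hz tone sampled at 6400 Hz over 64 samples has a centroid of about 1000 Hz; adding an equal-amplitude 2000 Hz component moves it to about 1500 Hz, i.e. the signal becomes "brighter".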


2020 ◽  
Author(s):  
Meisam K. Arjmandi ◽  
Hamzeh Ghasemzadeh ◽  
Laura C. Dilley

The ability to discern variations in talkers' voice quality is important for effective talker identification and robust speech processing; yet little is known about how faithfully acoustic information relevant to voice quality variation is transmitted through cochlear implant (CI) speech processing. This study analyzed unprocessed and CI-simulated versions of sustained /a/ vowels from two groups of individuals, one with normal and one with disordered voice quality, to investigate how CI speech processing affects the acoustic information that distinguishes talkers' voice quality. The CI-simulated stimuli were created by processing the vowels with 4-, 8-, 12-, 16-, 22-, and 32-channel noise vocoders. The voice quality of each stimulus was characterized by calculating mel-frequency cepstral coefficients (MFCCs). The effects of CI speech processing on the acoustic distinctiveness between normal and disordered voices were then measured by calculating the Mahalanobis distance and the classification accuracy of support vector machines (SVMs) on their MFCC features. The results showed that CI noise vocoding is substantially detrimental to the acoustic information involved in voice quality distinction, suggesting that CI listeners likely have difficulty perceiving voice quality variations. The results underscore the challenges CI users may face in effectively recognizing talkers and processing their speech.
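The Mahalanobis distance between the normal and disordered groups can be sketched as the distance between the two group mean feature vectors, scaled by a pooled within-group covariance. This two-group formulation is an assumption, since the abstract does not say how the covariance was estimated. The inputs would be per-stimulus feature vectors such as MFCCs, whose extraction is not shown; the small linear solver stands in for a numerical library.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _mean(rows: Sequence[Sequence[float]]) -> List[float]:
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def _scatter(rows: Sequence[Sequence[float]], mu: Sequence[float]) -> Matrix:
    """Sum of outer products of deviations from the mean, i.e. (n - 1) * covariance."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[j] - mu[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def _solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_between_groups(group_a, group_b) -> float:
    """Distance between group means under the pooled within-group covariance."""
    mu_a, mu_b = _mean(group_a), _mean(group_b)
    s_a, s_b = _scatter(group_a, mu_a), _scatter(group_b, mu_b)
    dof = len(group_a) + len(group_b) - 2
    dim = len(mu_a)
    pooled = [[(s_a[i][j] + s_b[i][j]) / dof for j in range(dim)] for i in range(dim)]
    delta = [mu_a[j] - mu_b[j] for j in range(dim)]
    x = _solve(pooled, delta)  # pooled^{-1} * delta
    return math.sqrt(sum(delta[j] * x[j] for j in range(dim)))
```

In one dimension, groups `[[0], [1], [2]]` and `[[3], [4], [5]]` have means 3 apart and a pooled variance of 1, so the distance is 3. Under this measure, a shrinking distance after vocoding would indicate that the two voice groups have become acoustically harder to tell apart.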


Author(s):  
Martono Dwi Atmadja

VoIP technology provides several functions, including signalling, in which VoIP is responsible for receiving the connection from the caller, after which the conversation is delivered. This technology can carry voice traffic as packets over an IP network. Voice packets may undergo long processing or delay before reaching their destination, which can degrade the perceived voice quality. This occurs because continuous delay in the communication between IP phone sets on VoIP causes echo in the receiver's voice. Echo can occur during communication between IP phone sets when the average delay exceeds 5 ms. The transmit and receive delay between the telephones can be adjusted by adding a high-pass filter (HPF) that cuts off sound frequencies from 300 Hz to 1000 Hz. In the transmission test, the HPF holds the received amplitude at about -21 dBm where low frequencies are attenuated; when the frequency is raised to 1500-2250 Hz, the amplitude rises to -12.2 dBm, an increase of about 8.8 dB. Lower frequencies, such as 300-950 Hz, are not passed because the filter is designed with a cutoff at 1000 Hz. Applying the HPF reduces the delay to 0.08 ms at the upper frequency limit received by a telephone set with a 1000-3400 Hz band.
Keywords: VoIP technology, echo cancelling, HPF, delay, voice quality
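The abstract does not state the filter type or order, so the sketch below assumes a Butterworth high-pass magnitude response with a 1000 Hz cutoff and treats `order` as a free parameter. It also shows that the reported 8.8 dB increase is simply the difference between the two dBm levels, since subtracting two absolute power levels gives a ratio in dB.

```python
import math


def butterworth_hpf_gain_db(freq_hz: float, cutoff_hz: float = 1000.0, order: int = 1) -> float:
    """Magnitude response in dB of an assumed n-th order Butterworth high-pass.

    |H(f)| = 1 / sqrt(1 + (fc / f)^(2n)), so 20*log10|H| = -10*log10(1 + (fc / f)^(2n)).
    """
    ratio = cutoff_hz / freq_hz
    return -10.0 * math.log10(1.0 + ratio ** (2 * order))


def level_change_db(before_dbm: float, after_dbm: float) -> float:
    """Change between two absolute power levels in dBm, expressed in dB."""
    return after_dbm - before_dbm
```

Under this assumed model, every order is about 3 dB down at the 1000 Hz cutoff, attenuation at 300 Hz grows from roughly 11 dB at first order to over 40 dB at fourth order, and frequencies in the 1500-2250 Hz range pass with little loss at higher orders. `level_change_db(-21.0, -12.2)` gives the 8.8 dB quoted above.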


Author(s):  
Isabel S. Schiller ◽  
Angélique Remacle ◽  
Nancy Durieux ◽  
Dominique Morsomme

Purpose: Background noise and voice problems among teachers can degrade listening conditions in classrooms. The aim of this literature review is to understand how these acoustic degradations affect spoken language processing in 6- to 18-year-old children. Method: In a narrative report and meta-analysis, we systematically review studies that examined the effects of noise and/or impaired voice on children's response accuracy and response time (RT) in listening tasks. We propose the Speech Processing under Acoustic DEgradations (SPADE) framework to classify relevant findings according to three processing dimensions (speech perception, listening comprehension, and auditory working memory) and highlight potential moderators. Results: Thirty-one studies are included in this systematic review. Our meta-analysis shows that noise can impede children's accuracy in listening tasks across all processing dimensions (Cohen's d between −0.67 and −2.65, depending on signal-to-noise ratio) and that impaired voice lowers children's accuracy in listening comprehension tasks (d = −0.35). A handful of studies assessed RT, but the results are inconclusive. The impact of noise and impaired voice can be moderated by listener, task, environmental, and exposure factors. The interaction between noise and impaired voice remains underinvestigated. Conclusions: Overall, this review suggests that children have more trouble perceiving speech, processing verbal messages, and recalling verbal information when listening to speech in noise or to a speaker with dysphonia. Impoverished speech input could impede pupils' motivation and academic performance at school. Supplemental Material https://doi.org/10.23641/asha.17139377
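The effect sizes above are standardized mean differences. A minimal sketch of Cohen's d with a pooled standard deviation, applied to hypothetical accuracy scores; meta-analyses often use bias-corrected variants instead (for example Hedges' g), and the review's exact estimator is not given in the abstract.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def cohens_d(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Standardized mean difference (treatment - control) using the pooled SD."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)  # sample SDs (n - 1 denominator)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd
```

With noise-condition scores passed as `treatment` and quiet-condition scores as `control`, a drop in accuracy yields a negative d, matching the sign of the values reported above. For instance, hypothetical scores `[1, 2, 3]` versus `[2, 3, 4]` give d = -1.0.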

