An Analysis of the Impact of Playout Delay Adjustments introduced by VoIP Jitter Buffers on Listening Speech Quality

2015 ◽  
Vol 101 (3) ◽  
pp. 616-631 ◽  
Author(s):  
Peter Počta ◽  
Hugh Melvin ◽  
Andrew Hines
2007 ◽  
Vol 2007 ◽  
pp. 1-9 ◽  
Author(s):  
Peter Počta ◽  
Peter Kortiš ◽  
Martin Vaculík

This paper describes measurements of the impact of background traffic on speech quality in WLAN (IEEE 802.11) environments. The simulated background traffic consists of three types of traffic common in current telecommunication networks: data transfer, multimedia streaming, and Web services. The background traffic was generated with the Distributed Internet Traffic Generator (D-ITG). The aim of the paper is to determine the impact of these traffic types and of traffic load on speech quality, using a test sequence and speech sequences. Speech quality is assessed with the Perceptual Evaluation of Speech Quality (PESQ) algorithm. The paper also proposes a new method for improved detection of critical conditions in wireless telecommunication networks from the speech quality point of view. The conclusion outlines a further application of this method in link adaptation algorithms that take speech quality into account in WLAN environments. The primary goal of these algorithms is to improve speech quality in VoWLAN connections established over the link concerned.


2012 ◽  
Vol 98 (3) ◽  
pp. 461-474 ◽  
Author(s):  
Oliver Jung

This study considers the influences of room acoustics and driving noises in vehicle interiors on the subjectively perceived acoustical quality of conversations between passengers. A listening test with 25 participants was performed inside a laboratory to assess the impact of different vehicle interior transfer functions on the speech quality assessment in four predetermined dimensions. Idealized driving noises at three different vehicle speeds were presented simultaneously with speech samples to quantify the interference of these noise conditions at varied signal-to-noise ratios. To minimize the influence of different human speakers, four talkers (two male and two female) were selected from commercially available audio books. The respective speech samples were adjusted in level and long-term average speech spectrum to the common values of conversational speech. The automatic reflex of raising one's voice in noisy environments, called the “Lombard Effect” [1], was taken into account through an additional adjustment of speech levels while driving noises were present. A strong relationship between the speech-to-noise ratio and the test participants' evaluations was found. Thus, one can assume that the attenuation or amplification of the speech signals caused by the different room acoustics of the tested vehicles plays a more important role for sufficient speech quality than the varied speech timbre or other parameters. Only at very high speech-to-noise ratios (≥ 20 dB with A-weighting) are room-acoustical parameters such as IACC or the reverberation time more decisive for the perceived speech quality than the sound pressure level of the speech.


2021 ◽  
Vol 14 (4) ◽  
pp. 1-35
Author(s):  
Linda Kozma-Spytek ◽  
Christian Vogler

This paper describes four studies with a total of 114 individuals with hearing loss and 12 hearing controls that investigate the impact of audio quality parameters on voice telecommunications. These studies were first informed by a survey of 439 individuals with hearing loss on their voice telecommunications experiences. While voice telephony was very important, with high usage of wireless mobile phones, respondents reported relatively low satisfaction with their hearing devices’ performance for telephone listening, noting that improved telephone audio quality was a significant need. The studies cover three categories of audio quality parameters: (1) narrowband (NB) versus wideband (WB) audio; (2) encoding audio at varying bit rates, from typical rates used in today's mobile networks to the highest quality supported by these audio codecs; and (3) absence of packet loss to worst-case packet loss in both mobile and VoIP networks. Additionally, NB versus WB audio was tested in auditory-only and audiovisual presentation modes and in quiet and noisy environments. With WB audio in a quiet environment, individuals with hearing loss exhibited better speech recognition, expended less perceived mental effort, and rated speech quality higher than with NB audio. WB audio provided a greater benefit when listening alone than when the visual channel also was available. The noisy environment significantly degraded performance for both presentation modes, but particularly for listening alone. Bit rate affected speech recognition for NB audio, and speech quality ratings for both NB and WB audio. Packet loss affected all of speech recognition, mental effort, and speech quality ratings. WB versus NB audio also affected hearing individuals, especially under packet loss. 
These results are discussed in terms of the practical steps they suggest for the implementation of telecommunications systems and related technical standards and policy considerations to improve the accessibility of voice telephony for people with hearing loss.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yann Kowalczuk ◽  
Jan Holub

New methods of securing the distribution of audio content have been widely deployed in the last twenty years. Their impact on perceived quality has, however, seldom been the subject of extensive recent research. We review the state of the art in digital speech watermarking and provide subjective testing of watermarked speech samples. The latest speech watermarking techniques are listed, with their specifics and potential for further development. Their current and possible applications are evaluated. Open-source software designed to embed watermarking patterns in audio files is used to produce a set of samples that satisfies the requirements of modern subjective speech-quality assessments. The analysis focuses mainly on the patchwork algorithm implemented in the application. Different watermark robustness levels are used, which allows the threshold of detection by human listeners to be determined. The subjective listening tests are conducted following ITU-T Recommendation P.800, which precisely defines the conditions and requirements for subjective testing. Further analysis tries to determine the effects of noise and various disturbances on the perceived quality of watermarked speech. A threshold of intelligibility is estimated to open the way for further work on speech compression techniques with watermarking. The impact of language or social background is evaluated through an additional experiment involving two groups of listeners. Results show significant robustness of the watermarking implementation, which retains both reasonable net subjective audio quality and its security attributes despite mild levels of distortion and noise. Extended experiments with Chinese listeners make it possible to formulate a hypothesis on perception variations with geographical and social backgrounds.


2011 ◽  
Vol 97 (5) ◽  
pp. 852-868 ◽  
Author(s):  
Peter Počta ◽  
Jan Holub

This paper investigates the impact of independent and dependent losses and coding on speech quality predictions provided by the PESQ (also known as ITU-T P.862) and P.563 models, when both naturally-produced and synthesized speech are used. Two synthesized speech samples generated with two different Text-to-Speech systems and one naturally-produced sample are investigated. In addition, we assess the variability of the PESQ and P.563 predictions with respect to the type of speech used (naturally-produced or synthesized) and loss conditions, as well as their accuracy, by comparing the predictions with subjective assessments. The results show that there is no difference between the impact of packet loss on naturally-produced speech and on synthesized speech. On the other hand, the impact of coding is different for the two types of stimuli. In addition, synthesized speech seems to be insensitive to degradations introduced by most of the codecs investigated here. The reasons for these findings are also discussed. Finally, it is concluded that both models are capable of predicting the quality of transmitted synthesized speech under the investigated conditions to a certain degree. As expected, PESQ achieves the best performance over almost all of the investigated conditions.

