Speech Identification and Comprehension in the Urban Soundscape

Environments ◽  
2018 ◽  
Vol 5 (5) ◽  
pp. 56 ◽  
Author(s):  
Letizia Marchegiani ◽  
Xenofon Fafoutis ◽  
Sahar Abbaspour

Urban environments are characterised by the presence of copious and unstructured noise. This noise continuously challenges speech intelligibility in both normal-hearing and hearing-impaired individuals. In this paper, we investigate the impact of urban noise, such as traffic, on speech identification and, more generally, speech understanding. To this end, we perform listening experiments to evaluate the ability of individuals with normal hearing to detect words and interpret conversational speech in the presence of urban noise (e.g., street drilling, traffic jams). Our experiments confirm previous findings in different acoustic environments and demonstrate that, in urban scenarios as well, speech identification is influenced by the similarity between the target speech and the masking noise. More specifically, we propose the use of the structural similarity index to quantify this similarity. Our analysis confirms that speech identification is more successful in the presence of noise whose tempo-spectral characteristics differ from those of speech. Moreover, our results show that speech comprehension is not as challenging as word identification in urban sound environments characterised by severe noise. Indeed, our experiments demonstrate that speech comprehension can be fairly successful even in acoustic scenes where the ability to identify speech is highly reduced.
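As an illustration of the proposed use of the structural similarity index, a global SSIM between two equally sized arrays (standing in here for magnitude spectrograms of target speech and masking noise) can be sketched as follows; the array shapes and stability constants are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Global structural similarity between two equal-sized arrays,
    e.g. magnitude spectrograms of target speech and masking noise."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (x.var() + y.var() + c2))

# Hypothetical spectrograms: a signal compared with itself scores 1.0;
# an unrelated masker scores much lower.
rng = np.random.default_rng(0)
speech = rng.random((128, 64))
masker = rng.random((128, 64))
```

A higher score indicates a masker more structurally similar to the target, which, per the findings above, predicts poorer identification.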

Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2855
Author(s):  
Rabia Naseem ◽  
Faouzi Alaya Cheikh ◽  
Azeddine Beghdadi ◽  
Khan Muhammad ◽  
Muhammad Sajjad

Cross-modal medical imaging techniques are predominantly used in the clinical suite, and ensemble learning methods that draw on cross-modal medical imaging add reliability to several medical image analysis tasks. Motivated by the performance of deep learning in several medical imaging tasks, this paper proposes a deep learning-based denoising method, the Cross-Modality Guided Denoising Network (CMGDNet), for removing Rician noise from T1-weighted (T1-w) Magnetic Resonance Images (MRI). CMGDNet uses a guidance image, a cross-modal (T2-w) image of better perceptual quality, to guide the model in denoising its noisy T1-w counterpart. This cross-modal combination allows the network to exploit complementary information existing in both images and thereby improve the learning capability of the model. The proposed framework consists of two components: a Paired Hierarchical Learning (PHL) module and a Cross-Modal Assisted Reconstruction (CMAR) module. The PHL module uses a Siamese network to extract hierarchical features from the two images, which are then combined in a densely connected manner in the CMAR module to reconstruct the final image. The impact of using registered guidance data is investigated with respect to both noise removal and the retention of structural similarity with the original image. Experiments were conducted on two publicly available brain imaging datasets from the IXI database. Quantitative assessment using Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Feature Similarity Index (FSIM) demonstrates that the proposed method achieves average gains of 4.7% and 2.3% in SSIM and FSIM values, respectively, over state-of-the-art denoising methods that do not integrate cross-modal image information when removing various levels of noise.
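The Rician noise model named above, and the PSNR metric used in the evaluation, can be sketched in a few lines; the image sizes and noise levels are hypothetical, and this is a generic sketch rather than any part of CMGDNet itself.

```python
import numpy as np

def add_rician_noise(image, sigma, rng=None):
    """Corrupt a noise-free magnitude image with Rician noise, the model
    assumed for MRI: Gaussian noise enters the real and imaginary
    channels before the magnitude is taken."""
    if rng is None:
        rng = np.random.default_rng()
    real = image + rng.normal(0.0, sigma, image.shape)
    imag = rng.normal(0.0, sigma, image.shape)
    return np.sqrt(real**2 + imag**2)

def psnr(clean, noisy, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((clean - noisy) ** 2)
    return 10 * np.log10(peak**2 / mse)
```

Applied to a hypothetical T1-w slice normalized to [0, 1], a smaller sigma yields a higher PSNR, which is the direction of improvement the reported gains describe.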


2020 ◽  
pp. 1351010X2092358
Author(s):  
Zakariyya Uzeyirli ◽  
Aslı Özçevik Bilen

Inclusive education contributes substantially to hearing-impaired individuals' education and socialization. However, poor physical environments and acoustic comfort conditions negatively affect speech intelligibility in such places and, therefore, the quality of education. Having determined that very few subjective evaluation studies exist, we conducted a study on the impact of acoustic comfort conditions on speech intelligibility in inclusive education spaces. Within the scope of the study, a classroom was first selected, and its current acoustic conditions were evaluated objectively by field acoustic measurements. A calibrated model of the classroom was created in simulation software, followed by two further models with the optimum reverberation time values of 0.4 s and 0.8 s suggested in the literature, and auralizations were performed for the models. For subjective evaluation, a subject group of hearing-impaired and normal-hearing individuals meeting equal conditions was tested with a speech discrimination test, both in real time in the classroom and from auralization recordings in a laboratory setting. The results showed that the speech intelligibility scores of normal-hearing individuals increased as expected, whereas, contrary to expectations, the scores of hearing-impaired individuals varied from one another and showed no increase. Following discussions with experts, it was concluded that the different hearing aids used by hearing-impaired individuals might account for this. Accordingly, it remains unclear whether good speech intelligibility can be achieved for hearing-impaired individuals even when the suggested optimum acoustic values are met in educational spaces.
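The reverberation-time targets of 0.4 s and 0.8 s can be related to room volume and total absorption through the classical Sabine formula; the classroom dimensions below are hypothetical and not taken from the study.

```python
def sabine_rt60(volume_m3, absorption_sabins_m2):
    """Sabine reverberation time: RT60 = 0.161 * V / A,
    with V in m^3 and A in m^2 sabins."""
    return 0.161 * volume_m3 / absorption_sabins_m2

# Hypothetical classroom: 7 m x 8 m x 3 m = 168 m^3.
volume = 7 * 8 * 3
# Total absorption needed to hit the 0.8 s target:
needed_absorption = 0.161 * volume / 0.8
```

Doubling the absorption halves the predicted RT60, which is how a treatment plan would move a room from the 0.8 s target toward the 0.4 s one.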


2005 ◽  
Vol 48 (1) ◽  
pp. 204-223 ◽  
Author(s):  
Miranda Cleary ◽  
David B. Pisoni ◽  
Karen Iler Kirk

The perception of voice similarity was examined in 5-year-old children with normal hearing sensitivity and in pediatric cochlear implant users, 5–12 years of age. Recorded sentences were manipulated to form a continuum of similar-sounding voices. An adaptive procedure was then used to determine how acoustically different, in terms of average fundamental and formant frequencies, 2 sentences needed to be for a child to categorize the sentences as spoken by 2 different talkers. The average spectral characteristics of 2 utterances (including their fundamental frequencies) needed to differ by at least 11%–16% (2–2.5 semitones) for normal-hearing children to perceive the voices as belonging to different talkers. Introducing differences in the linguistic content of the 2 sentences to be compared did not change performance. Although several children with cochlear implants performed similarly to normal-hearing children, most found the task very difficult. Pediatric cochlear implant users who scored above the group mean of 64% of words correct on a monosyllabic open-set word identification task categorized the voices more like children with normal hearing sensitivity.
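The reported 11%–16% frequency differences can be checked against the quoted 2–2.5 semitone range with the standard conversion n = 12 * log2(1 + Δf/f):

```python
import math

def percent_to_semitones(percent_diff):
    """Convert a relative frequency difference to semitones;
    one semitone is a frequency ratio of 2**(1/12)."""
    return 12 * math.log2(1 + percent_diff / 100)

# 11% -> ~1.8 semitones, 16% -> ~2.6 semitones,
# roughly consistent with the 2-2.5 semitone range quoted above.
```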


2020 ◽  
Vol 63 (4) ◽  
pp. 1299-1311 ◽  
Author(s):  
Timothy Beechey ◽  
Jörg M. Buchholz ◽  
Gitte Keidser

Objectives: This study investigates the hypothesis that hearing aid amplification reduces effort within conversation for both hearing aid wearers and their communication partners. Levels of effort, in the form of speech production modifications, required to maintain successful spoken communication in a range of acoustic environments are compared to earlier reported results measured in unaided conversation conditions. Design: Fifteen young adult normal-hearing participants and 15 older adult hearing-impaired participants were tested in pairs. Each pair consisted of one young normal-hearing participant and one older hearing-impaired participant. Hearing-impaired participants received directional hearing aid amplification, according to their audiogram, via a master hearing aid with gain provided according to the NAL-NL2 fitting formula. Pairs of participants were required to take part in naturalistic conversations through the use of a referential communication task. Each pair took part in five conversations, each of 5-min duration. During each conversation, participants were exposed to one of five different realistic acoustic environments presented through highly open headphones. The ordering of acoustic environments across experimental blocks was pseudorandomized. Resulting recordings of conversational speech were analyzed to determine the magnitude of speech modifications, in terms of vocal level and spectrum, produced by normal-hearing talkers as a function of both acoustic environment and the degree of high-frequency average hearing impairment of their conversation partner. Results: The magnitude of spectral modifications of speech produced by normal-hearing talkers during conversations with aided hearing-impaired interlocutors was smaller than the speech modifications observed during conversations between the same pairs of participants in the absence of hearing aid amplification.
Conclusions: The provision of hearing aid amplification reduces the effort required to maintain communication in adverse conditions. This reduction in effort provides benefit to hearing-impaired individuals and also to the conversation partners of hearing-impaired individuals. By considering the impact of amplification on both sides of dyadic conversations, this approach contributes to an increased understanding of the likely impact of hearing impairment on everyday communication.


2008 ◽  
Vol 18 (1) ◽  
pp. 31-40 ◽  
Author(s):  
David J. Zajac

The purpose of this opinion article is to review the impact of the principles and technology of speech science on clinical practice in the area of craniofacial disorders. Current practice relative to (a) speech aerodynamic assessment, (b) computer-assisted single-word speech intelligibility testing, and (c) behavioral management of hypernasal resonance is reviewed. Future directions and/or refinements of each area are also identified. It is suggested that both challenging and rewarding times are in store for clinical researchers in craniofacial disorders.


1994 ◽  
Vol 110 (1) ◽  
pp. 75-83 ◽  
Author(s):  
C. Speaks ◽  
T. Trine ◽  
T. Crain ◽  
N. Niccum

Author(s):  
Seong Hee Lee ◽  
Hyun Joon Shim ◽  
Sang Won Yoon ◽  
Kyoung Won Lee

2020 ◽  
Vol 25 (2) ◽  
pp. 86-97
Author(s):  
Sandy Suryo Prayogo ◽  
Tubagus Maulana Kusuma

DVB is currently the most widely used digital television transmission standard. The most important element of any transmission process is the picture quality of the video received after that transmission. Many factors can affect picture quality, one of which is the frame structure of the video. This paper tests the sensitivity of MPEG-4 video, with respect to its frame structure, over DVB-T transmission. Testing was carried out with MATLAB and Simulink simulations; FFmpeg was also used to prepare the format and settings of the videos to be simulated. The video variables varied were the bitrate and the group of pictures (GOP), while the DVB-T transmission variable varied was the signal-to-noise ratio (SNR) of the AWGN channel between the transmitter (Tx) and receiver (Rx). The experiments yield the average picture quality of the videos, measured with the structural similarity index (SSIM) method, and the bit error rate (BER) of the DVB-T bitstream was also measured. The experiments show how sensitive the video bitrate and GOP are over DVB-T transmission, with the conclusion that the higher the bitrate, the worse the picture quality score, and the smaller the GOP value, the better the quality score. Future work is expected to apply deep learning to find the appropriate frame structure for particular conditions in digital television transmission.
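The role of channel SNR can be illustrated with a minimal AWGN bit-error simulation; this uses plain BPSK as a simplified stand-in for the actual DVB-T chain (OFDM with QPSK/QAM and channel coding), so the absolute BER values are not comparable to the experiments above, only the trend of BER falling as SNR rises.

```python
import numpy as np

def bpsk_ber(snr_db, n_bits=200_000, rng=None):
    """Simulated bit error rate of BPSK over an AWGN channel,
    with SNR defined as symbol energy over noise power (Es = 1)."""
    if rng is None:
        rng = np.random.default_rng(0)
    bits = rng.integers(0, 2, n_bits)
    symbols = 1 - 2 * bits                       # bit 0 -> +1, bit 1 -> -1
    sigma = np.sqrt(0.5 / 10 ** (snr_db / 10))   # per-dimension noise std
    received = symbols + sigma * rng.normal(size=n_bits)
    return np.mean((received < 0) != bits.astype(bool))
```

At 0 dB SNR the simulated BER sits near the theoretical Q(sqrt(2)) ≈ 0.079, and it drops by orders of magnitude by 6 dB, mirroring the SNR dependence measured on the DVB-T bitstream.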


2010 ◽  
Vol 10 ◽  
pp. 329-339 ◽  
Author(s):  
Torsten Rahne ◽  
Michael Ziese ◽  
Dorothea Rostalski ◽  
Roland Mühler

This paper describes a logatome discrimination test for the assessment of speech perception in cochlear implant users (CI users), based on a multilingual speech database, the Oldenburg Logatome Corpus, which was originally recorded for the comparison of human and automated speech recognition. The logatome discrimination task is based on the presentation of 100 logatome pairs (i.e., nonsense syllables) with balanced representations of alternating "vowel-replacement" and "consonant-replacement" paradigms in order to assess phoneme confusions. Thirteen adult normal-hearing listeners and eight adult CI users, including both good and poor performers, were included in the study and completed the test after their speech intelligibility abilities were evaluated with an established sentence test in noise. Furthermore, discrimination abilities were measured electrophysiologically by recording the mismatch negativity (MMN) as a component of auditory event-related potentials. The results show a clear MMN response only for normal-hearing listeners and CI users with good performance, correlating with their logatome discrimination abilities. Discrimination scores were higher for vowel-replacement than for consonant-replacement paradigms. We conclude that the logatome discrimination test is well suited to monitor the speech perception skills of CI users. Due to the large number of available spoken logatome items, the Oldenburg Logatome Corpus appears to provide a useful and powerful basis for further development of speech perception tests for CI users.
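The two replacement paradigms can be sketched as follows; the syllables and pairing helpers are hypothetical illustrations, not items from the Oldenburg Logatome Corpus.

```python
# Each trial pairs a nonsense syllable with a variant differing in
# exactly one phoneme: the vowel in one paradigm, a consonant in the other.

def vowel_replacement_pair(onset, v1, v2, coda):
    """Logatome pair differing only in the vowel, e.g. ('bap', 'bep')."""
    return onset + v1 + coda, onset + v2 + coda

def consonant_replacement_pair(c1, c2, vowel, coda):
    """Logatome pair differing only in the initial consonant,
    e.g. ('bap', 'dap')."""
    return c1 + vowel + coda, c2 + vowel + coda

# A balanced 100-pair test would alternate the two paradigms:
trial_a = vowel_replacement_pair("b", "a", "e", "p")      # ('bap', 'bep')
trial_b = consonant_replacement_pair("b", "d", "a", "p")  # ('bap', 'dap')
```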

