SINTETINĖS ŠNEKOS KOKYBĖS VERTINIMAS: KELIŲ KOMPIUTERINIŲ SINTEZATORIŲ LYGINAMASIS TYRIMAS

Psichologija ◽  
2002 ◽  
Vol 25 ◽  
pp. 72-96 ◽  
Author(s):  
Albinas Bagdonas ◽  
Feliksas Laugalys

EVALUATION OF SYNTHETIC SPEECH QUALITY: A COMPARATIVE STUDY OF SEVERAL COMPUTER-BASED SPEECH SYNTHESIZERS

This paper presents data on the intelligibility of several versions of Lithuanian and Russian synthetic speech and on the acceptability of Lithuanian, Russian, Hungarian, and Italian synthetic speech. The speech of Lithuanian and Russian human speakers is more intelligible than the corresponding synthetic speech. The earlier version of Russian synthesis is worse than both the Lithuanian synthesis and the improved Russian synthesis (IRS). The characteristics of the synthesized sounds reveal two opposing tendencies in the IRS: judged by the overall reduction in recognition errors, it moves closer to natural speech, but judged by the homogeneity of errors, it moves away from it. Because the first tendency dominates, the net result indicates that the IRS is an improvement. The correlation between the intelligibility and acceptability of the IRS likewise indicates that it is closer to natural speech. Subjects find the IRS more acceptable than the earlier version of Russian synthesis. In their view, the earlier version resembles robot speech, whereas the IRS resembles a poor version of speech that is nonetheless already human. According to the acceptability data, Hungarian listeners rate natural speech most highly, while Italian listeners are the most critical of it. All of the synthetic speech versions studied are rated as less acceptable than natural speech, but this judgment softens once the versions are improved.

Author(s):  
Jefferson B. Hardee ◽  
Christopher B. Mayhorn

Synthetic speech is a technology that can be utilized to convey information and aid people in their tasks. Older adults in particular are a population that may be able to benefit from synthetic speech, and they are a population that has been investigated in a limited capacity. The current researchers intended to elucidate lingering conflicts in previous research on the intelligibility and recall of words and stories in synthetic speech for older and younger adults and how that compared to similar conditions in natural speech. Twenty-four older and 24 younger adults completed intelligibility and recall tasks with word lists and stories. Results indicated that older adults had a more difficult time with all speech, natural speech was easier to understand and remember than synthetic speech, and stories were easier to recall than words. Results also indicated that older adults had a more difficult time understanding synthetic words as compared to natural words than younger adults. In addition, older adults improved differentially with the recall of stories as opposed to words when compared to the younger adult group. Potential directions for synthetic speech software design and future research are discussed.


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1878
Author(s):  
Yi Zhou ◽  
Haiping Wang ◽  
Yijing Chu ◽  
Hongqing Liu

The use of multiple spatially distributed microphones makes it possible to perform spatial filtering along with conventional temporal filtering, which can better reject interference signals and leads to an overall improvement in speech quality. In this paper, we propose a novel dual-microphone generalized sidelobe canceller (GSC) algorithm assisted by a bone-conduction (BC) sensor for speech enhancement, named the BC-assisted GSC (BCA-GSC) algorithm. The BC sensor is relatively insensitive to ambient noise compared to a conventional air-conduction (AC) microphone. Hence, BC speech can be analyzed to generate very accurate voice activity detection (VAD), even in a high-noise environment. The proposed algorithm incorporates the VAD information obtained from the BC speech into the adaptive blocking matrix (ABM) and adaptive noise canceller (ANC) in the GSC. By using VAD to control the ABM and combining VAD with the signal-to-interference ratio (SIR) to control the ANC, the proposed method can suppress interference and significantly improve the overall performance of the GSC. Experiments verify that the proposed GSC system not only improves speech quality remarkably but also boosts speech intelligibility.


2015 ◽  
Vol 120 (6) ◽  
pp. 670-678 ◽  
Author(s):  
Stephan Christian Möhlhenrich ◽  
Nicole Heussen ◽  
Mohammad Kamal ◽  
Ulrike Fritz ◽  
Frank Hölzle ◽  
...  

1995 ◽  
Vol 38 (3) ◽  
pp. 714-725 ◽  
Author(s):  
Jill E. Preminger ◽  
Dianne J. Van Tasell

The purpose of the present research was to examine the relation between speech quality and speech intelligibility. Speech quality measurements were made using continuous discourse and a category rating procedure for the following dimensions: intelligibility, pleasantness, loudness, effort, and total impression. Measurements were made using a group of listeners with normal hearing for a set of stimulus conditions in which intelligibility varied, and for a set of stimulus conditions in which intelligibility was held constant near 100%. When ratings were made for a set of stimulus conditions in which intelligibility was allowed to vary (a) intersubject reliability was high (i.e., different listeners interpreted the dimensions in a similar manner); and (b) the speech quality dimensions of intelligibility, effort, and loudness were indistinguishable. When ratings were made for a set of stimulus conditions in which intelligibility was held constant (a) intersubject reliability was reduced, indicating that different listeners interpreted the dimensions in different ways; (b) most listeners rated each dimension differently, indicating that the dimensions were unique; and (c) across listeners, no single dimension was highly correlated with total impression. These results can be used in order to examine the relation between speech quality and speech intelligibility.


1991 ◽  
Vol 19 (1) ◽  
pp. 139-146 ◽  
Author(s):  
Louis C.W. Pols ◽  
Renée van Bezooijen

Author(s):  
Mahbubur R. Syed ◽  
Shuvro Chakrobartty ◽  
Robert J. Bignall

Speech synthesis is the process of producing natural-sounding, highly intelligible synthetic speech simulated by a machine in such a way that it sounds as if it was produced by a human vocal system. A text-to-speech (TTS) synthesis system is a computer-based system where the input is text and the output is a simulated vocalization of that text. Before the 1970s, most speech synthesis was achieved with hardware, but this was costly and it proved impossible to properly simulate natural speech production. Since the 1970s, the use of computers has made the practical application of speech synthesis more feasible.

