Voice Quality Modelling for Expressive Speech Synthesis

2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Carlos Monzo ◽  
Ignasi Iriondo ◽  
Joan Claudi Socoró

This paper presents the perceptual experiments carried out to validate a methodology for transforming expressive speech styles using voice quality (VoQ) parameter modelling, along with the well-known prosodic parameters (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the usefulness of VoQ in enhancing expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify the VoQ and prosodic parameters extracted from an expressive speech corpus. Perception test results indicated that the obtained expressive speech styles were improved when VoQ modelling was used along with the prosodic characteristics.

2011 ◽  
Vol 109 (3) ◽  
pp. 105-108 ◽  
Author(s):  
A. Kajackas ◽  
A. Anskaitis ◽  
D. Gursnys

In this paper, a method for evaluating time-varying conversational speech quality in wireless communications is proposed. The proposed algorithm evaluates quality degradation using indicators based on the count of lost frames and on voice activity indications. The correctness of the proposed algorithm is investigated by comparing its test results with results obtained using the PESQ algorithm under the same conditions. The achieved average correlation coefficient is 0.975, independent of the frame loss model and of the percentage of silence in the test sentences. The proposed algorithm can be implemented in mobile stations and used for speech quality evaluation during real conversations. Ill. 3, bibl. 13, tabl. 3 (in English; abstracts in English and Lithuanian). http://dx.doi.org/10.5755/j01.eee.109.3.182
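The kind of indicator the abstract describes, built from counts of lost frames combined with voice activity information, can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the specific indicator shown (the fraction of voice-active frames that were lost) and all names are assumptions for the sake of the example.

```python
import numpy as np

def degradation_indicator(lost, active):
    """Fraction of voice-active frames that were lost in transmission.

    lost   -- boolean sequence, True where a frame was lost
    active -- boolean sequence, True where voice activity was detected
    """
    lost = np.asarray(lost, dtype=bool)
    active = np.asarray(active, dtype=bool)
    n_active = active.sum()
    if n_active == 0:
        return 0.0                       # no speech: losses are inaudible
    return float((lost & active).sum() / n_active)

# Ten frames: three losses, two of which fall inside active speech.
lost   = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
active = [0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(degradation_indicator(lost, active))  # 0.4
```

Weighting losses by voice activity is what makes such an indicator insensitive to the percentage of silence in the test material, consistent with the result reported above.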


Author(s):  
Keikichi Hirose

After starting as an effort to mimic the human process of speech sound generation, synthetic speech has reached a quality that makes it difficult to notice that it is synthetic. This owes much to the development of waveform concatenation methods, which select the most appropriate speech segments from a huge speech corpus. Although a lack of flexibility in producing various speech qualities/styles has been pointed out, this problem is about to be solved by introducing statistical frameworks into parametric speech synthesis. Now, a speaker can even speak a foreign language in his/her own voice using advanced voice-conversion techniques. However, when the prosodic features of speech are considered, current technologies are not well suited to handling their hierarchical structure over a long time span, and the introduction of prosody modelling into the speech-synthesis process is necessary. In this chapter, after reviewing the history of voice/speech synthesis, technologies are explained, starting from text-to-speech and concept-to-speech conversion. Then, methods of sound generation are introduced. Statistical parametric speech synthesis, especially HMM-based speech synthesis, is introduced as a technology that enables flexible speech synthesis, that is, synthetic speech with various qualities/styles, while requiring a smaller speech corpus. After that, the problem of frame-by-frame processing of prosodic features is addressed and the importance of prosody modelling is pointed out. Prosodic (fundamental frequency) modelling is surveyed and, finally, the generation process model is introduced, with some experimental results when applied to HMM-based speech synthesis.
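The generation process model mentioned at the end of the abstract is commonly identified with the Fujisaki model, in which the log-F0 contour is the superposition of a baseline, phrase components (impulse responses of a second-order system) and accent components (ceiling-limited step responses). A minimal sketch of that superposition follows; the time constants and command values are illustrative defaults, not taken from the chapter.

```python
import numpy as np

def fujisaki_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """Fujisaki generation-process model of the F0 contour.

    ln F0(t) = ln(fb) + sum of phrase components + sum of accent components.
    phrases -- list of (Ap, T0): magnitude and onset time of each phrase command
    accents -- list of (Aa, T1, T2): amplitude, onset and offset of each accent command
    """
    def g_phrase(x):                    # phrase control mechanism
        x = np.clip(x, 0.0, None)       # response is zero before the command
        return alpha ** 2 * x * np.exp(-alpha * x)

    def g_accent(x):                    # accent control mechanism
        x = np.clip(x, 0.0, None)
        return np.minimum(1.0 - (1.0 + beta * x) * np.exp(-beta * x), gamma)

    ln_f0 = np.full(np.shape(t), np.log(fb))
    for ap, t0 in phrases:
        ln_f0 += ap * g_phrase(t - t0)
    for aa, t1, t2 in accents:
        ln_f0 += aa * (g_accent(t - t1) - g_accent(t - t2))
    return np.exp(ln_f0)

# One phrase command at t = 0 s and one accent command on [0.3, 0.8] s
# over a 100 Hz baseline (all values purely illustrative).
t = np.linspace(0.0, 2.0, 200)
f0 = fujisaki_f0(t, fb=100.0, phrases=[(0.5, 0.0)], accents=[(0.3, 0.3, 0.8)])
```

Because the phrase and accent components operate on different time scales, such a model captures exactly the long-span hierarchical structure that frame-by-frame statistical processing struggles with.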


Author(s):  
Xixin Wu ◽  
Yuewen Cao ◽  
Mu Wang ◽  
Songxiang Liu ◽  
Shiyin Kang ◽  
...  

2008 ◽  
Author(s):  
Grazyna Demenko ◽  
J. Bachan ◽  
Bernd Möbius ◽  
K. Klessa ◽  
M. Szymański ◽  
...  

2021 ◽  
pp. 2150022
Author(s):  
Caio Cesar Enside de Abreu ◽  
Marco Aparecido Queiroz Duarte ◽  
Bruno Rodrigues de Oliveira ◽  
Jozue Vieira Filho ◽  
Francisco Villarreal

Speech processing systems are very important in applications involving speech and voice quality, such as automatic speech recognition, forensic phonetics and speech enhancement, among others. In most of them, acoustic environmental noise is added to the original signal, decreasing the signal-to-noise ratio (SNR) and, as a consequence, the speech quality. Therefore, estimating noise is one of the most important steps in speech processing, whether to reduce it before processing or to design robust algorithms. In this paper, a new approach to estimating noise from speech signals is presented and its effectiveness is tested in the speech enhancement context. For this purpose, partial least squares (PLS) regression is used to model the acoustic environment (AE), and a Wiener filter based on a priori SNR estimation is implemented to evaluate the proposed approach. Six noise types are used to create seven acoustically modelled noises. The basic idea is to use the AE model to identify the noise type and estimate its power, which is then used in a speech processing system. Speech signals processed with the proposed method and with classical noise estimators are evaluated through objective measures. Results show that the proposed method yields better speech quality than state-of-the-art noise estimators, enabling its use in real-time applications in the fields of robotics, telecommunications and acoustic analysis.
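The enhancement stage described above, a Wiener filter driven by an a priori SNR estimate, can be sketched with the standard decision-directed formulation. This is a generic textbook form, not the paper's implementation; all variable names are illustrative, and the noise power would in the paper come from the PLS-based AE model rather than being given directly.

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, prev_clean_power, a=0.98):
    """Per-bin Wiener gain H(k) = xi / (1 + xi), with the a priori SNR xi
    from the decision-directed rule.

    noisy_power      -- |Y(k)|^2 of the current noisy frame
    noise_power      -- estimated noise power (here assumed given)
    prev_clean_power -- |S(k)|^2 of the previous enhanced frame
    a                -- decision-directed smoothing factor
    """
    gamma = noisy_power / noise_power                    # a posteriori SNR
    xi = (a * prev_clean_power / noise_power
          + (1.0 - a) * np.maximum(gamma - 1.0, 0.0))    # a priori SNR
    return xi / (1.0 + xi)                               # Wiener gain

# Example: one frequency bin, 6 dB a posteriori SNR, first frame
# (no previous enhanced frame yet, so prev_clean_power = 0).
g = wiener_gain(4.0, 1.0, 0.0)
```

In use, the gain is applied per frame and per frequency bin to the noisy short-time spectrum, and the resulting enhanced power feeds `prev_clean_power` in the next frame.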


2002 ◽  
Vol 45 (4) ◽  
pp. 689-699 ◽  
Author(s):  
Donald G. Jamieson ◽  
Vijay Parsa ◽  
Moneca C. Price ◽  
James Till

We investigated how standard speech coders, currently used in modern communication systems, affect the quality of the speech of persons who have common speech and voice disorders. Three standardized speech coders (GSM 6.10 RPE-LTP, FS1016 CELP, and FS1015 LPC) and two speech coders based on subband processing were evaluated for their performance. Coder effects were assessed by measuring the quality of speech samples both before and after processing by the speech coders. Speech quality was rated by 10 listeners with normal hearing on 28 different scales representing pitch and loudness changes, speech rate, laryngeal and resonatory dysfunction, and coder-induced distortions. Results showed that (a) nine scale items were consistently and reliably rated by the listeners; (b) all coders degraded speech quality on these nine scales, with the GSM and CELP coders providing the better-quality speech; and (c) interactions between coders and individual voices did occur on several voice quality scales.

