Synthesized speech quality of Indonesian natural text-to-speech by using HTS and CLUSTERGEN

Predicting the Quality of Synthesized and Natural Speech Impaired by Packet Loss and Coding Using PESQ and P.563 Models

Acta Acustica united with Acustica ◽

10.3813/aaa.918465 ◽

2011 ◽

Vol 97 (5) ◽

pp. 852-868 ◽

Cited By ~ 7

Author(s):

Peter Počta ◽

Jan Holub

Keyword(s):

Packet Loss ◽

Speech Quality ◽

The Other ◽

Natural Speech ◽

Text To Speech ◽

Synthesized Speech ◽

Subjective Assessments ◽

Almost All ◽

The Impact

This paper investigates the impact of independent and dependent losses and coding on speech quality predictions provided by PESQ (also known as ITU-T P.862) and P.563 models, when both naturally-produced and synthesized speech are used. Two synthesized speech samples generated with two different Text-to-Speech systems and one naturally-produced sample are investigated. In addition, we assess the variability of PESQ's and P.563's predictions with respect to the type of speech used (naturally-produced or synthesized) and loss conditions as well as their accuracy, by comparing the predictions with subjective assessments. The results show that there is no difference between the impact of packet loss on naturally-produced speech and synthesized speech. On the other hand, the impact of coding is different for the two types of stimuli. In addition, synthesized speech seems to be insensitive to degradations introduced by most of the codecs investigated here. The reasons for these findings are discussed. Finally, it is concluded that both models are capable of predicting the quality of transmitted synthesized speech under the investigated conditions to a certain degree. As expected, PESQ achieves the best performance over almost all of the investigated conditions.

Order-Variable Multi-Pulses Linear Prediction Speech Coding

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.44-47.3672 ◽

2010 ◽

Vol 44-47 ◽

pp. 3672-3676

Author(s):

Jian Lei Li ◽

Zhen Ma ◽

Ming Zhao Wu

Keyword(s):

Speech Coding ◽

Linear Prediction ◽

Speech Quality ◽

Variable Model ◽

Synthesized Speech ◽

Order Variable

Building on the all-pole model, this paper proposes an order-variable all-pole model motivated by the instability of track complexity and applies it to multi-pulse linear prediction speech coding. The method is simulated in Matlab and the quality of the synthesized speech is evaluated; the order-variable model is found to maintain better speech quality while decreasing the coding rate.

Comparative evaluation of the speech quality of speech coders and text-to-speech synthesizers

10.1109/icassp.1986.1168979 ◽

2005 ◽

Author(s):

L. Pols ◽

G. Boxelaar

Keyword(s):

Comparative Evaluation ◽

Speech Quality ◽

Text To Speech

Latent factor analysis for synthesized speech quality-of-experience assessment

Quality and User Experience ◽

10.1007/s41233-017-0005-6 ◽

2017 ◽

Vol 2 (1) ◽

Cited By ~ 1

Author(s):

Rishabh Gupta ◽

Tiago H. Falk

Keyword(s):

Factor Analysis ◽

Quality Of Experience ◽

Speech Quality ◽

Latent Factor ◽

Synthesized Speech

Di-Diphone Arabic Speech Synthesis Concatenation

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v3i2a.2810 ◽

2012 ◽

Vol 3 (2) ◽

pp. 218-222 ◽

Cited By ~ 1

Author(s):

Abdelkader Chabchoub ◽

Salah Alahmadi ◽

Adnan Cherif ◽

Wahid Barkouti

Keyword(s):

Speech Synthesis ◽

Arabic Text ◽

Text To Speech ◽

Synthesis System ◽

Voice Source ◽

Synthesized Speech ◽

Different Types ◽

The Voice

This work describes a new Arabic text-to-speech (TTS) synthesis system. The system is based on di-diphone concatenation with a TD-PSOLA modifier synthesizer. The quality of the synthesized speech is improved by analyzing in detail the spectral features of the voice source across various F0 ranges and timbres, and by concatenating new units. Speech is generated from the analysis and estimation of formants, with the voice source classified into different types. The developed model enhances the naturalness and intelligibility of the synthesized speech in various speaking environments.

Speech Quality of Computer‐Simulated Voice‐Switched Amplifiers

The Journal of the Acoustical Society of America ◽

10.1121/1.1982317 ◽

1973 ◽

Vol 53 (1) ◽

pp. 322-322

Author(s):

Herman R. Silbiger ◽

Richard E. Cullingford ◽

Linda Pierce

Keyword(s):

Speech Quality

Predicting the quality of text-to-speech systems from a large-scale feature set

10.21437/interspeech.2013-105 ◽

2013 ◽

Author(s):

Florian Hinterleitner ◽

Christoph R. Norrenbrock ◽

Sebastian Möller ◽

Ulrich Heute

Keyword(s):

Large Scale ◽

Text To Speech ◽

Scale Feature

Interaction of Speech Coders and Atypical Speech II

Journal of Speech Language and Hearing Research ◽

10.1044/1092-4388(2002/055) ◽

2002 ◽

Vol 45 (4) ◽

pp. 689-699 ◽

Cited By ~ 2

Author(s):

Donald G. Jamieson ◽

Vijay Parsa ◽

Moneca C. Price ◽

James Till

Keyword(s):

Communication Systems ◽

Speech Rate ◽

Voice Quality ◽

Voice Disorders ◽

Normal Hearing ◽

Speech Quality ◽

Degraded Speech ◽

Before And After ◽

Subband Processing

We investigated how standard speech coders, currently used in modern communication systems, affect the quality of the speech of persons who have common speech and voice disorders. Three standardized speech coders (GSM 6.10 RPELTP, FS1016 CELP, and FS1015 LPC) and two speech coders based on subband processing were evaluated for their performance. Coder effects were assessed by measuring the quality of speech samples both before and after processing by the speech coders. Speech quality was rated by 10 listeners with normal hearing on 28 different scales representing pitch and loudness changes, speech rate, laryngeal and resonatory dysfunction, and coder-induced distortions. Results showed that (a) nine scale items were consistently and reliably rated by the listeners; (b) all coders degraded speech quality on these nine scales, with the GSM and CELP coders providing the better quality speech; and (c) interactions between coders and individual voices did occur on several voice quality scales.

Comparing a magnitude estimation technique and a pair comparisons technique for use in assessing quality of text‐to‐speech synthesis systems

The Journal of the Acoustical Society of America ◽

10.1121/1.2026707 ◽

1989 ◽

Vol 85 (S1) ◽

pp. S125-S125

Author(s):

Chaslav Pavlovic ◽

Christel Sorin ◽

Jean Pierre Roumiguiere ◽

Jean Pierre Lucas

Keyword(s):

Magnitude Estimation ◽

Speech Synthesis ◽

Estimation Technique ◽

Text To Speech ◽

Pair Comparisons ◽

Text To Speech Synthesis

Improving speech quality of CELP coder

Electronics Letters ◽

10.1049/el:19890854 ◽

1989 ◽

Vol 25 (19) ◽

pp. 1275 ◽

Cited By ~ 1

Author(s):

J.I. Lee ◽

C.K. Un

Keyword(s):

Speech Quality
