Korean text-to-speech system using a formant synthesis method.

Seung-Kwon Ahn; Koeng-Mo Sung

doi:10.1250/ast.13.151

An end-to-end synthesis method for Korean text-to-speech systems

Phonetics and Speech Sciences ◽

10.13064/ksss.2018.10.1.039 ◽

2018 ◽

Vol 10 (1) ◽

pp. 39-48 ◽

Cited By ~ 1

Author(s):

Yeunju Choi ◽

Youngmoon Jung ◽

Younggwan Kim ◽

Youngjoo Suh ◽

Hoirin Kim

Keyword(s):

Korean Text ◽

Synthesis Method ◽

Text To Speech ◽

End To End


A prosodic phrasing model for a Korean text-to-speech synthesis system

10.21437/interspeech.2004-463 ◽

2004 ◽

Author(s):

Kyuchul Yoon

Keyword(s):

Korean Text ◽

Speech Synthesis ◽

Text To Speech ◽

Synthesis System ◽

Prosodic Phrasing ◽

Text To Speech Synthesis


Building a natural sounding Text-To-Speech system for the Nepali language: research and development challenges and solutions

Gipan ◽

10.3126/gipan.v4i0.35461 ◽

2019 ◽

Vol 4 ◽

pp. 106-116

Author(s):

Roop Shree Ratna Bajracharya ◽

Santosh Regmi ◽

Bal Krishna Bal ◽

Balaram Prasain

Keyword(s):

Research And Development ◽

Visually Impaired ◽

Speech Synthesis ◽

Selection Process ◽

Synthesis Method ◽

Text To Speech ◽

Synthesis System ◽

Unit Selection ◽

Language Research ◽

Shed Light

Text-to-Speech (TTS) synthesis has come a long way from its early monotone synthetic voices to more natural and intelligible ones. One direct application of a natural-sounding TTS system is screen readers for the visually impaired and blind community. The Festival Speech Synthesis System uses a concatenative speech synthesis method together with a unit selection process to generate a natural-sounding voice. This work primarily gives an account of the efforts towards developing a natural-sounding TTS system for Nepali using the Festival system. We also shed light on the issues faced and the solutions derived, which may overlap with those of other similar under-resourced languages in the region.


A prosodic phrasing model for a Korean text-to-speech synthesis system

Computer Speech & Language ◽

10.1016/j.csl.2005.01.001 ◽

2006 ◽

Vol 20 (1) ◽

pp. 69-79 ◽

Cited By ~ 11

Author(s):

Kyuchul Yoon

Keyword(s):

Korean Text ◽

Speech Synthesis ◽

Text To Speech ◽

Synthesis System ◽

Prosodic Phrasing ◽

Text To Speech Synthesis


Combining concatenation and formant synthesis for improved intelligibility and naturalness in text-to-speech systems

International Journal of Speech Technology ◽

10.1007/bf02277191 ◽

1997 ◽

Vol 1 (2) ◽

pp. 103-107

Author(s):

Steve Pearson ◽

Frode Holm ◽

Kazue Hata

Keyword(s):

Text To Speech ◽

Formant Synthesis


A text analyzer for Korean text-to-speech systems

Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96 ◽

10.1109/icslp.1996.607952 ◽

2002 ◽

Cited By ~ 1

Author(s):

Sangho Lee ◽

Yung-Hwan Oh

Keyword(s):

Korean Text ◽

Text To Speech


Text-to-Speech Synthesis

The Oxford Handbook of Computational Linguistics 2nd edition ◽

10.1093/oxfordhb/9780199573691.013.38 ◽

2018 ◽

Author(s):

Thierry Dutoit ◽

Yannis Stylianou

Keyword(s):

Speech Synthesis ◽

Markov Models ◽

Text To Speech ◽

Functional Perspective ◽

Formant Synthesis ◽

Engineering Costs ◽

Text To Speech Synthesis ◽

Major Shift ◽

Learning Architectures ◽

Real Challenge

Text-to-speech (TTS) synthesis is the art of designing talking machines. Seen from this functional perspective, the task looks simple, but this chapter shows that delivering intelligible, natural-sounding, and expressive speech, while also taking into account engineering costs, is a real challenge. Speech synthesis has made a long journey from the big controversy in the 1980s between MIT’s formant synthesis and Bell Labs’ diphone-based concatenative synthesis. While unit selection technology, which appeared in the mid-1990s, can be seen as an extension of diphone-based approaches, the appearance of Hidden Markov Model (HMM) synthesis around 2005 resulted in a major shift back to models. More recently, statistical approaches, supported by advanced deep learning architectures, have been shown to advance text analysis and normalization as well as waveform generation. Important recent milestones have been Google’s WaveNet (September 2016) and the sequence-to-sequence models referred to as Tacotron (I and II).


Formant Speech Synthesis Based on Trainable Model

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.303-306.1334 ◽

2013 ◽

Vol 303-306 ◽

pp. 1334-1337

Author(s):

Zhi Ping Zhang ◽

Xi Hong Wu

Keyword(s):

Speech Synthesis ◽

Synthesis Method ◽

Experimental Results ◽

Trajectory Model ◽

Formant Synthesis ◽

Speech Data ◽

Model Training

The authors proposed a trainable formant synthesis method based on the multi-channel Hidden Trajectory Model (HTM). In this method, the phonetic targets, formant trajectories and spectrum states from the oral, nasal, voiceless and background channels were designed to construct hierarchical hidden layers, and spectra were then generated as observable features. In model training, the phonemic targets were learned from one hour of training speech data, and the phoneme boundaries were also aligned. The experimental results showed that speech could be reconstructed from the trainable formant model using a source-filter synthesizer.


Modern speech synthesis for phonetic sciences: a discussion and an evaluation

10.31234/osf.io/dxvhc ◽

2020 ◽

Author(s):

Zofia Malisz ◽

Gustav Eje Henter ◽

Cassia Valentini-Botinhao ◽

Oliver Watts ◽

Jonas Beskow ◽

...

Keyword(s):

Speech Synthesis ◽

State Of The Art ◽

Reaction Times ◽

Natural Speech ◽

Decision Task ◽

Synthesis Reaction ◽

Text To Speech ◽

Rule Based ◽

Quantum Leap ◽

Formant Synthesis

Decades of gradual advances in speech synthesis have recently culminated in exponential improvements fuelled by deep learning. This quantum leap has the potential to finally deliver realistic, controllable, and robust synthetic stimuli for speech experiments. In this article, we discuss these and other implications for phonetic sciences. We substantiate our argument by evaluating classic rule-based formant synthesis against state-of-the-art synthesisers on a) subjective naturalness ratings and b) a behavioural measure (reaction times in a lexical decision task). We also differentiate between text-to-speech and speech-to-speech methods. Naturalness ratings indicate that all modern systems are substantially closer to natural speech than formant synthesis. Reaction times for several modern systems do not differ substantially from natural speech, meaning that the processing gap observed in older systems, and reproduced with our formant synthesiser, is no longer evident. Importantly, some speech-to-speech methods are nearly indistinguishable from natural speech on both measures.


The Problems and Improvement of Loanwords Pronunciation in Korean Text-to-speech (TTS) System

The Journal of Language & Literature ◽

10.15565/jll.2020.09.83.35 ◽

2020 ◽

Vol 83 ◽

pp. 35-62

Author(s):

Hyeonyeol Im

Keyword(s):

Korean Text ◽

Text To Speech
