Acoustic model training based on linear transformation and MAP modification for HSMM-based speech synthesis

Wasserstein GAN and Waveform Loss-Based Acoustic Model Training for Multi-Speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder

IEEE Access ◽

10.1109/access.2018.2872060 ◽

2018 ◽

Vol 6 ◽

pp. 60478-60488 ◽

Cited By ~ 17

Author(s):

Yi Zhao ◽

Shinji Takaki ◽

Hieu-Thi Luong ◽

Junichi Yamagishi ◽

Daisuke Saito ◽

...

Keyword(s):

Speech Synthesis ◽

Acoustic Model ◽

Text To Speech ◽

Text To Speech Synthesis ◽

Model Training


Tree-based context clustering using speech recognition features for acoustic model training of speech synthesis

2015 12th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) ◽

10.1109/ecticon.2015.7207094 ◽

2015 ◽

Author(s):

Supadaech Chanjaradwichai ◽

Atiwong Suchato ◽

Proadpran Punyabukkana

Keyword(s):

Speech Recognition ◽

Speech Synthesis ◽

Acoustic Model ◽

Model Training


Lithuanian Broadcast Speech Transcription Using Semi-supervised Acoustic Model Training

Procedia Computer Science ◽

10.1016/j.procs.2016.04.037 ◽

2016 ◽

Vol 81 ◽

pp. 107-113 ◽

Cited By ~ 6

Author(s):

Rasa Lileikytė ◽

Arseniy Gorin ◽

Lori Lamel ◽

Jean-Luc Gauvain ◽

Thiago Fraga-Silva

Keyword(s):

Acoustic Model ◽

Model Training ◽

Speech Transcription


Investigating lightly supervised acoustic model training

2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221) ◽

10.1109/icassp.2001.940871 ◽

2002 ◽

Cited By ~ 15

Author(s):

L. Lamel ◽

J.L. Gauvain ◽

G. Adda

Keyword(s):

Acoustic Model ◽

Model Training


Language diarization for semi-supervised bilingual acoustic model training

2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) ◽

10.1109/asru.2017.8268921 ◽

2017 ◽

Cited By ~ 1

Author(s):

Emre Yilmaz ◽

Mitchell McLaren ◽

Henk van den Heuvel ◽

David A. van Leeuwen

Keyword(s):

Acoustic Model ◽

Model Training


Acoustic model training with detecting transcription errors in the training data

10.21437/interspeech.2011-183 ◽

2011 ◽

Author(s):

Gakuto Kurata ◽

Nobuyasu Itoh ◽

Masafumi Nishimura

Keyword(s):

Training Data ◽

Acoustic Model ◽

Model Training


Robust Acoustic Model Training Against Phoneme Variations for Large Vocabulary Continuous Speech Recognition

Signal and Image Processing ◽

10.2316/p.2012.759-070 ◽

2012 ◽

Author(s):

Gil Ho Lee ◽

Nam Soo Kim

Keyword(s):

Speech Recognition ◽

Acoustic Model ◽

Continuous Speech ◽

Continuous Speech Recognition ◽

Large Vocabulary ◽

Model Training


MAKEDONKA: Applied Deep Learning Model for Text-to-Speech Synthesis in Macedonian Language

Applied Sciences ◽

10.3390/app10196882 ◽

2020 ◽

Vol 10 (19) ◽

pp. 6882

Author(s):

Kostadin Mishev ◽

Aleksandra Karovska Ristovska ◽

Dimitar Trajanov ◽

Tome Eftimov ◽

Monika Simjanoska

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Speech Synthesis ◽

Feature Engineering ◽

Learning Approach ◽

Acoustic Model ◽

Text To Speech ◽

Text To Speech Synthesis ◽

Smooth Transitions ◽

Deep Learning Model

This paper presents MAKEDONKA, the first open-source Macedonian speech synthesizer based on the Deep Learning approach. The paper provides an overview of the numerous earlier attempts to achieve human-like, reproducible speech, which have unfortunately proven unsuccessful because the work had little visibility and lacked examples of integration with real software tools. Recent advances in Machine Learning, in particular Deep Learning-based methodologies, provide novel methods for feature engineering that allow for smooth transitions in the synthesized speech, making it sound natural and human-like. This paper presents a methodology for end-to-end speech synthesis based on Deep Voice 3, a fully convolutional sequence-to-sequence acoustic model with a position-augmented attention mechanism. Our model synthesizes Macedonian speech directly from characters. We created a dataset containing approximately 20 h of speech from a native Macedonian female speaker and used it to train the text-to-speech (TTS) model. The achieved mean opinion score (MOS) of 3.93 makes our model suitable for use in any software that needs a text-to-speech service in the Macedonian language. Our TTS platform is publicly available and ready for integration.
