Wasserstein GAN and Waveform Loss-Based Acoustic Model Training for Multi-Speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder

IEEE Access, 2018, Vol. 6, pp. 60478-60488
Author(s): Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, ...
2020, Vol. 10 (19), pp. 6882
Author(s): Kostadin Mishev, Aleksandra Karovska Ristovska, Dimitar Trajanov, Tome Eftimov, Monika Simjanoska

This paper presents MAKEDONKA, the first open-source Macedonian speech synthesizer based on Deep Learning. The paper reviews the numerous earlier attempts to achieve human-like, reproducible speech. These attempts have unfortunately proven unsuccessful because the work had little visibility and there were no examples of integration with real software tools. Recent advances in Machine Learning, in particular Deep Learning-based methodologies, provide novel feature engineering methods that allow smooth transitions in the synthesized speech, making it sound natural and human-like. This paper presents a methodology for end-to-end speech synthesis based on Deep Voice 3, a fully-convolutional sequence-to-sequence acoustic model with a position-augmented attention mechanism. Our model synthesizes Macedonian speech directly from characters. We created a dataset containing approximately 20 h of speech from a native Macedonian female speaker and used it to train the text-to-speech (TTS) model. The achieved mean opinion score (MOS) of 3.93 makes our model suitable for any software that needs a Macedonian text-to-speech service. Our TTS platform is publicly available and ready for integration.
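The abstract names Deep Voice 3's position-augmented attention but does not describe how it works. The following is a minimal, illustrative sketch of the general idea under stated assumptions, not the MAKEDONKA or Deep Voice 3 implementation. Sinusoidal positional encodings, each scaled by a position rate, are added to the attention queries (decoder steps) and keys (text steps) before a dot-product attention. The Transformer-style sine/cosine channel pairing, the 1/sqrt(d) score scaling, and all function names (`positional_encoding`, `position_augmented_attention`, etc.) are hypothetical choices made for this example.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def positional_encoding(num_steps: int, channels: int, rate: float) -> Matrix:
    """Sinusoidal positional encoding scaled by a position rate.

    Assumption: Transformer convention, where channel pairs (2j, 2j + 1) share
    the frequency 1 / 10000**(2j / channels), even channels use sine and odd
    channels use cosine. `rate` multiplies the timestep index.
    """
    table = []
    for t in range(num_steps):
        row = []
        for k in range(channels):
            angle = rate * t / (10000 ** ((k - k % 2) / channels))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def position_augmented_attention(
    queries: Matrix, keys: Matrix, values: Matrix,
    query_rate: float, key_rate: float,
) -> Tuple[Matrix, Matrix]:
    """Add positional encodings to queries and keys, then attend over values.

    Returns (context vectors, attention weights). The 1/sqrt(d) scaling is an
    assumption borrowed from scaled dot-product attention.
    """
    d = len(queries[0])
    q = add(queries, positional_encoding(len(queries), d, query_rate))
    k = add(keys, positional_encoding(len(keys), d, key_rate))
    contexts, weights = [], []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        weights.append(w)
        # Context vector: attention-weighted sum of the value vectors.
        contexts.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return contexts, weights


# Example: 6 decoder steps attend over 3 text steps. The content vectors are
# zero, so the attention weights depend only on the positional encodings.
T_out, T_in, d = 6, 3, 8
ctx, attn = position_augmented_attention(
    [[0.0] * d for _ in range(T_out)],
    [[0.0] * d for _ in range(T_in)],
    [[float(j)] for j in range(T_in)],
    query_rate=1.0,
    key_rate=T_out / T_in,
)
```

The rate choice in the example follows our reading of the Deep Voice 3 paper for the single-speaker case: the query rate is 1 and the key rate is the ratio of output (decoder) timesteps to input (text) timesteps. With these rates, query i and key j receive the same encoding when i / T_out = j / T_in. This is meant to bias the attention toward a roughly diagonal, monotonic text-to-audio alignment.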


Author(s): Beiming Cao, Myungjong Kim, Jan van Santen, Ted Mau, Jun Wang

2019
Author(s): Elshadai Tesfaye Biru, Yishak Tofik Mohammed, David Tofu, Erica Cooper, Julia Hirschberg
