Acoustic model training based on linear transformation and MAP modification for HSMM-based speech synthesis

Author(s):  
Katsumi Ogata ◽  
Makoto Tachibana ◽  
Junichi Yamagishi ◽  
Takao Kobayashi
IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 60478-60488 ◽  
Author(s):  
Yi Zhao ◽  
Shinji Takaki ◽  
Hieu-Thi Luong ◽  
Junichi Yamagishi ◽  
Daisuke Saito ◽  
...  

2016 ◽  
Vol 81 ◽  
pp. 107-113 ◽  
Author(s):  
Rasa Lileikytė ◽  
Arseniy Gorin ◽  
Lori Lamel ◽  
Jean-Luc Gauvain ◽  
Thiago Fraga-Silva

2020 ◽  
Vol 10 (19) ◽  
pp. 6882
Author(s):  
Kostadin Mishev ◽  
Aleksandra Karovska Ristovska ◽  
Dimitar Trajanov ◽  
Tome Eftimov ◽  
Monika Simjanoska

This paper presents MAKEDONKA, the first open-source Macedonian-language speech synthesizer based on deep learning. It reviews the numerous attempts to achieve human-like, reproducible speech, which have unfortunately proven unsuccessful because the work had low visibility and lacked examples of integration with real software tools. Recent advances in machine learning, in particular deep learning-based methodologies, provide novel methods for feature engineering that allow smooth transitions in the synthesized speech, making it sound natural and human-like. The paper presents a methodology for end-to-end speech synthesis based on Deep Voice 3, a fully convolutional sequence-to-sequence acoustic model with a position-augmented attention mechanism. Our model synthesizes Macedonian speech directly from characters. We created a dataset containing approximately 20 h of speech from a native Macedonian female speaker and used it to train the text-to-speech (TTS) model. The achieved mean opinion score (MOS) of 3.93 makes our model suitable for any software that needs a text-to-speech service in the Macedonian language. Our TTS platform is publicly available for use and ready for integration.

