Development and Evaluation of Speech Synthesis System Based on Deep Learning Models

Alakbar Valizada; Sevil Jafarova; Emin Sultanov; Samir Rustamov

doi:10.3390/sym13050819

Symmetry, 2021, Vol 13 (5), pp. 819, doi:10.3390/sym13050819

Author(s): Alakbar Valizada; Sevil Jafarova; Emin Sultanov; Samir Rustamov

Keyword(s): Deep Learning; Speech Synthesis; Online Survey; Subjective Evaluation; Learning Models; Text To Speech; Opinion Score; News Website; Text To Speech Synthesis; The Mean

This study investigates, develops, and evaluates Text-to-Speech synthesis systems based on Deep Learning models for the Azerbaijani language. We selected and compared two state-of-the-art models, Tacotron and Deep Convolutional Text-to-Speech (DC TTS), to identify the better-performing one. Both systems were trained on a 24 h Azerbaijani speech dataset collected and processed from a news website. To assess the quality and intelligibility of the speech produced by the two systems, 34 listeners took part in an online survey containing subjective evaluation tests. According to the Mean Opinion Score, Tacotron produced better results for In-Vocabulary words, whereas DC TTS performed better on Out-Of-Vocabulary word synthesis.
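As an illustration of how a Mean Opinion Score is typically derived from listener ratings, the following is a minimal sketch, not the authors' evaluation code. It assumes a 1 to 5 rating scale and a normal-approximation 95% confidence interval; the function name `mean_opinion_score` and the sample ratings are hypothetical and are not data from the study.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of listener ratings.

    Assumes the common 1-5 absolute category rating scale.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Normal approximation: 1.96 times the standard error of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for one synthesized utterance (not from the study).
example_ratings = [4, 5, 3, 4, 4, 5, 3, 4]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Computing the score separately for In-Vocabulary and Out-Of-Vocabulary test sentences would allow the kind of per-condition comparison the abstract describes.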

Robust deep-learning models for text-to-speech synthesis support on embedded devices

Proceedings of the 7th International Conference on Management of computational and collective intElligence in Digital EcoSystems - MEDES '15, 2015, doi:10.1145/2857218.2857234

Cited By ~ 1

Author(s): Tiberiu Boroş; Stefan Daniel Dumitrescu

Keyword(s): Deep Learning; Speech Synthesis; Learning Models; Text To Speech; Embedded Devices; Text To Speech Synthesis

Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis

2017, doi:10.21437/interspeech.2017-1762

Cited By ~ 1

Author(s): Beiming Cao; Myungjong Kim; Jan van Santen; Ted Mau; Jun Wang

Keyword(s): Deep Learning; Speech Synthesis; Text To Speech; Text To Speech Synthesis

MAKEDONKA: Applied Deep Learning Model for Text-to-Speech Synthesis in Macedonian Language

Applied Sciences, 2020, Vol 10 (19), pp. 6882, doi:10.3390/app10196882

Author(s): Kostadin Mishev; Aleksandra Karovska Ristovska; Dimitar Trajanov; Tome Eftimov; Monika Simjanoska

Keyword(s): Machine Learning; Deep Learning; Speech Synthesis; Feature Engineering; Learning Approach; Acoustic Model; Text To Speech; Text To Speech Synthesis; Smooth Transitions; Deep Learning Model

This paper presents MAKEDONKA, the first open-source Macedonian language synthesizer based on the Deep Learning approach. The paper provides an overview of the numerous attempts to achieve human-like, reproducible speech, which have unfortunately proven unsuccessful due to the low visibility of the work and the lack of integration examples with real software tools. Recent advances in Machine Learning, in particular Deep Learning-based methodologies, provide novel methods for feature engineering that allow for smooth transitions in the synthesized speech, making it sound natural and human-like. This paper presents a methodology for end-to-end speech synthesis based on Deep Voice 3, a fully-convolutional sequence-to-sequence acoustic model with a position-augmented attention mechanism. Our model directly synthesizes Macedonian speech from characters. We created a dataset containing approximately 20 h of speech from a native Macedonian female speaker and used it to train the text-to-speech (TTS) model. The achieved MOS of 3.93 makes our model suitable for any software that needs a text-to-speech service in the Macedonian language. Our TTS platform is publicly available for use and ready for integration.

Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages

2017, doi:10.21437/interspeech.2017-666

Cited By ~ 6

Author(s): Arun Baby; Jeena J. Prakash; Rupak Vignesh; Hema A. Murthy

Keyword(s): Signal Processing; Deep Learning; Speech Synthesis; Indian Languages; Text To Speech; Learning Techniques; Text To Speech Synthesis

Subset Selection, Adaptation, Gemination and Prosody Prediction for Amharic Text-to-Speech Synthesis

2019, doi:10.21437/ssw.2019-37

Author(s): Elshadai Tesfaye Biru; Yishak Tofik Mohammed; David Tofu; Erica Cooper; Julia Hirschberg

Keyword(s): Speech Synthesis; Subset Selection; Text To Speech; Text To Speech Synthesis; Prosody Prediction

“I Can’t Talk Now”: Speaking with Voice Output Communication Aid Using Text-to-Speech Synthesis During Multiparty Video Conference

Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, doi:10.1145/3411763.3451745

Author(s): Wooseok Kim; Sangsu Lee

Keyword(s): Speech Synthesis; Video Conference; Text To Speech; Voice Output Communication Aid; Communication Aid; Text To Speech Synthesis; Voice Output

Comparative Study on Neural Vocoders for Multispeaker Text-To-Speech Synthesis

2020 IEEE Recent Advances in Intelligent Computational Systems (RAICS), 2020, doi:10.1109/raics51191.2020.9332514

Author(s): Rajeev Rajan; Ashish Roopan; Sachin Prakash; Elisa Jose; Sati P.

Keyword(s): Comparative Study; Speech Synthesis; Text To Speech; Text To Speech Synthesis

Cotton Stand Counting from Unmanned Aerial System Imagery Using MobileNet and CenterNet Deep Learning Models

Remote Sensing, 2021, Vol 13 (14), pp. 2822, doi:10.3390/rs13142822

Author(s): Zhe Lin; Wenxuan Guo

Keyword(s): Deep Learning; Cotton Plant; Unmanned Aerial System; Learning Models; Training Images; Testing Dataset; Cotton Plants; Detection And Counting; Different Dimensions; The Mean

An accurate stand count is a prerequisite for determining the emergence rate, assessing seedling vigor, and facilitating site-specific management for optimal crop production. Traditional manual counting methods in stand assessment are labor intensive and time consuming for large-scale breeding programs or production field operations. This study applied two deep learning models, MobileNet and CenterNet, to detect and count cotton plants at the seedling stage in unmanned aerial system (UAS) images. The models were trained with two datasets containing 400 and 900 images with variations in plant size and soil background brightness. Their performance was assessed with two testing datasets of different dimensions: testing dataset 1 with 300 by 400 pixels and testing dataset 2 with 250 by 1200 pixels. With 900 training images, the model validation results showed a mean average precision (mAP) and average recall (AR) of 79% and 73% for the CenterNet model, and 86% and 72% for the MobileNet model. The accuracy of cotton plant detection and counting was higher with testing dataset 1 for both the CenterNet and MobileNet models. The results showed that the CenterNet model had better overall performance for cotton plant detection and counting with 900 training images. The results also indicated that more training images are required when applying object detection models to images whose dimensions differ from those of the training datasets. For the CenterNet model with 900 training images and testing dataset 1, the mean absolute percentage error (MAPE), coefficient of determination (R²), and root mean squared error (RMSE) of the cotton plant counts were 0.07%, 0.98, and 0.37, respectively. Both the MobileNet and CenterNet models have the potential to detect and count cotton plants accurately and in a timely manner from high-resolution UAS images at the seedling stage. This study provides valuable information for selecting the right deep learning tools and the appropriate number of training images for object detection projects in agricultural applications.
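The three counting metrics named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `counting_metrics` and the sample counts are hypothetical, and R² is computed here as 1 minus the ratio of residual to total sum of squares, which may differ from the regression-based R² used in the study.

```python
import math


def counting_metrics(actual, predicted):
    """Return (MAPE in percent, R^2, RMSE) for paired plant counts.

    Assumes every actual count is nonzero and the actual counts are
    not all identical (otherwise MAPE or R^2 is undefined).
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    # Mean absolute percentage error, relative to the actual counts.
    mape = 100.0 * sum(abs(e) / a for e, a in zip(errors, actual)) / n
    # Root mean squared error of the counts.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return mape, r2, rmse


# Hypothetical per-image counts (not data from the study).
manual_counts = [10, 20, 30, 40]
model_counts = [11, 19, 30, 42]
mape, r2, rmse = counting_metrics(manual_counts, model_counts)
print(f"MAPE = {mape:.2f}%, R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```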
