Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages

Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis

10.21437/interspeech.2017-1762 ◽

2017 ◽

Cited By ~ 1

Author(s):

Beiming Cao ◽

Myungjong Kim ◽

Jan van Santen ◽

Ted Mau ◽

Jun Wang

Keyword(s):

Deep Learning ◽

Speech Synthesis ◽

Text To Speech ◽

Text To Speech Synthesis

Download Full-text

MAKEDONKA: Applied Deep Learning Model for Text-to-Speech Synthesis in Macedonian Language

Applied Sciences ◽

10.3390/app10196882 ◽

2020 ◽

Vol 10 (19) ◽

pp. 6882

Author(s):

Kostadin Mishev ◽

Aleksandra Karovska Ristovska ◽

Dimitar Trajanov ◽

Tome Eftimov ◽

Monika Simjanoska

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Speech Synthesis ◽

Feature Engineering ◽

Learning Approach ◽

Acoustic Model ◽

Text To Speech ◽

Text To Speech Synthesis ◽

Smooth Transitions ◽

Deep Learning Model

This paper presents MAKEDONKA, the first open-source Macedonian language synthesizer that is based on the Deep Learning approach. The paper provides an overview of the numerous attempts to achieve a human-like reproducible speech, which has unfortunately shown to be unsuccessful due to the work invisibility and lack of integration examples with real software tools. The recent advances in Machine Learning, the Deep Learning-based methodologies, provide novel methods for feature engineering that allow for smooth transitions in the synthesized speech, making it sound natural and human-like. This paper presents a methodology for end-to-end speech synthesis that is based on a fully-convolutional sequence-to-sequence acoustic model with a position-augmented attention mechanism—Deep Voice 3. Our model directly synthesizes Macedonian speech from characters. We created a dataset that contains approximately 20 h of speech from a native Macedonian female speaker, and we use it to train the text-to-speech (TTS) model. The achieved MOS score of 3.93 makes our model appropriate for application in any kind of software that needs text-to-speech service in the Macedonian language. Our TTS platform is publicly available for use and ready for integration.

Download Full-text

Recent Trends in Text to Speech Synthesis of Indian Languages

Helix ◽

10.29042/2019-4931-4936 ◽

2019 ◽

Vol 9 (3) ◽

pp. 4931-4936

Author(s):

Sarang L. Joshi ◽

Vinayak K. Bairagi

Keyword(s):

Speech Synthesis ◽

Indian Languages ◽

Text To Speech ◽

Text To Speech Synthesis ◽

Recent Trends

Download Full-text

Development and Evaluation of Speech Synthesis System Based on Deep Learning Models

Symmetry ◽

10.3390/sym13050819 ◽

2021 ◽

Vol 13 (5) ◽

pp. 819

Author(s):

Alakbar Valizada ◽

Sevil Jafarova ◽

Emin Sultanov ◽

Samir Rustamov

Keyword(s):

Deep Learning ◽

Speech Synthesis ◽

Online Survey ◽

Subjective Evaluation ◽

Learning Models ◽

Text To Speech ◽

Opinion Score ◽

News Website ◽

Text To Speech Synthesis ◽

The Mean

This study concentrates on the investigation, development, and evaluation of Text-to-Speech Synthesis systems based on Deep Learning models for the Azerbaijani Language. We have selected and compared state-of-the-art models-Tacotron and Deep Convolutional Text-to-Speech (DC TTS) systems to achieve the most optimal model. Both systems were trained on the 24 h speech dataset of the Azerbaijani language collected and processed from the news website. To analyze the quality and intelligibility of the speech signals produced by two systems, 34 listeners participated in an online survey containing subjective evaluation tests. The results of the study indicated that according to the Mean Opinion Score, Tacotron demonstrated better results for the In-Vocabulary words; however, DC TTS indicated a higher performance of the Out-Of-Vocabulary words synthesis.

Download Full-text

Using polysyllabic units for text to speech synthesis in Indian languages

2010 National Conference On Communications (NCC) ◽

10.1109/ncc.2010.5430193 ◽

2010 ◽

Cited By ~ 6

Author(s):

M. V. Vinodh ◽

Ashwin Bellur ◽

K. Badri Narayan ◽

Deepali M. Thakare ◽

Anila Susan ◽

...

Keyword(s):

Speech Synthesis ◽

Indian Languages ◽

Text To Speech ◽

Text To Speech Synthesis

Download Full-text

Text to Speech Synthesis System for Punjabi language using Statistical Parametric Speech Synthesis Technique

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i1042.0789s19 ◽

2019 ◽

Vol 8 (9S) ◽

pp. 268-272

Keyword(s):

Speech Synthesis ◽

Gaussian Mixture ◽

Indian Languages ◽

Text To Speech ◽

Synthesis Technique ◽

Text To Speech Synthesis ◽

Statistical Parametric Speech Synthesis ◽

Parametric Speech Synthesis ◽

Traditional Approaches

Statistical Parametric Speech Synthesis has been most growing technique rather than the traditional approaches that we are used to synthesizing the speech. The shortcoming of traditional approaches will be overcome with latest statistical techniques. The main advantages of SPSS from traditional synthesis technique are that it has more flexibility to change the characteristics of voice and support more multiple languages i.e. multilingual, has good coverage of acoustic ` and robustness. It generates high quality of speech from small training database. Deep Neural network and Hidden Morkov model are basic statistical parametric speech synthesis techniques. Gaussian mixture model, sinusoidal model are also under this categories. Features were extracted in two type spectral features like spectral bandwidth, spectral centroid etc. and excitation features like F0 frequencies etc. We are using 722 Punjabi phonemes. Using sound forge software we extracted the 200 wave file from 1 hour pre-recording wave file related to those phonemes. Each and every phonemes feature was extracted and saved in database. We were extracting 28 features of each phoneme. TTS text-to-speech system generates sounds or speech as a output when provided the text of Punjabi language. There were already many TTS are developed on different Indian languages. The system that we are trying to build is based only on Punjabi language.

Download Full-text

Deep Learning based NLP Techniques In Text to Speech Synthesis for Communication Recognition

Journal of Soft Computing Paradigm - September 2019 ◽

10.36548/jscp.2020.4.002 ◽

2020 ◽

Vol 2 (4) ◽

pp. 209-215

Author(s):

Eriss Eisa Babikir Adam

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Speech Synthesis ◽

Large Scale ◽

Feature Representations ◽

Large Scale Data ◽

Learning Techniques ◽

Text To Speech Synthesis

The computer system is developing the model for speech synthesis of various aspects for natural language processing. The speech synthesis explores by articulatory, formant and concatenate synthesis. These techniques lead more aperiodic distortion and give exponentially increasing error rate during process of the system. Recently, advances on speech synthesis are tremendously moves towards deep learning process in order to achieve better performance. Due to leverage of large scale data gives effective feature representations to speech synthesis. The main objective of this research article is that implements deep learning techniques into speech synthesis and compares the performance in terms of aperiodic distortion with prior model of algorithms in natural language processing.

Download Full-text