MAKEDONKA: Applied Deep Learning Model for Text-to-Speech Synthesis in Macedonian Language

Kostadin Mishev; Aleksandra Karovska Ristovska; Dimitar Trajanov; Tome Eftimov; Monika Simjanoska

doi:10.3390/app10196882

Applied Sciences ◽

10.3390/app10196882 ◽

2020 ◽

Vol 10 (19) ◽

pp. 6882

Author(s):

Kostadin Mishev ◽

Aleksandra Karovska Ristovska ◽

Dimitar Trajanov ◽

Tome Eftimov ◽

Monika Simjanoska

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Speech Synthesis ◽

Feature Engineering ◽

Learning Approach ◽

Acoustic Model ◽

Text To Speech ◽

Text To Speech Synthesis ◽

Smooth Transitions ◽

Deep Learning Model

This paper presents MAKEDONKA, the first open-source Macedonian language synthesizer based on the Deep Learning approach. The paper provides an overview of the numerous attempts to achieve human-like, reproducible speech, which have unfortunately proven unsuccessful because the work had little visibility and lacked examples of integration with real software tools. Recent advances in Machine Learning, specifically Deep Learning-based methodologies, provide novel methods for feature engineering that allow for smooth transitions in the synthesized speech, making it sound natural and human-like. This paper presents a methodology for end-to-end speech synthesis based on Deep Voice 3, a fully-convolutional sequence-to-sequence acoustic model with a position-augmented attention mechanism. Our model synthesizes Macedonian speech directly from characters. We created a dataset that contains approximately 20 h of speech from a native Macedonian female speaker and used it to train the text-to-speech (TTS) model. The achieved mean opinion score (MOS) of 3.93 makes our model appropriate for any software that needs a text-to-speech service in the Macedonian language. Our TTS platform is publicly available for use and ready for integration.

Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis

10.21437/interspeech.2017-1762 ◽

2017 ◽

Cited By ~ 1

Author(s):

Beiming Cao ◽

Myungjong Kim ◽

Jan van Santen ◽

Ted Mau ◽

Jun Wang

Keyword(s):

Deep Learning ◽

Speech Synthesis ◽

Text To Speech ◽

Text To Speech Synthesis

A Survey on Intrusion Detection System for Software Defined Networks (SDN)

Research Anthology on Artificial Intelligence Applications in Security ◽

10.4018/978-1-7998-7705-9.ch023 ◽

2021 ◽

pp. 467-489

Author(s):

Yogita Hande ◽

Akkalashmi Muddana

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Intrusion Detection ◽

Network Architecture ◽

Detection System ◽

Learning Approach ◽

Distinctive Features ◽

Detection Systems ◽

Security Challenges ◽

Deep Learning Model

Given the widespread growth of the internet, traditional networks, with their static nature, have limited capacity to cope with organizational business needs. Software defined networking (SDN), a new network architecture, emerged to address these challenges and provides distinctive features. However, the programmable and centralized approach of SDN faces new security challenges, which demand innovative security mechanisms such as intrusion detection systems (IDSs). IDSs for SDN are currently designed with a machine learning approach; however, a deep learning approach is also being explored to achieve better efficiency and accuracy. This article gives an overview of SDN and its security concerns, and explains IDS as a security solution. It surveys existing security solutions designed to secure SDN and presents a comparative study of IDS approaches based on deep learning models and machine learning methods. Finally, we describe future directions for SDN security.

Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages

10.21437/interspeech.2017-666 ◽

2017 ◽

Cited By ~ 6

Author(s):

Arun Baby ◽

Jeena J. Prakash ◽

Rupak Vignesh ◽

Hema A. Murthy

Keyword(s):

Signal Processing ◽

Deep Learning ◽

Speech Synthesis ◽

Indian Languages ◽

Text To Speech ◽

Learning Techniques ◽

Text To Speech Synthesis

Development and Evaluation of Speech Synthesis System Based on Deep Learning Models

Symmetry ◽

10.3390/sym13050819 ◽

2021 ◽

Vol 13 (5) ◽

pp. 819

Author(s):

Alakbar Valizada ◽

Sevil Jafarova ◽

Emin Sultanov ◽

Samir Rustamov

Keyword(s):

Deep Learning ◽

Speech Synthesis ◽

Online Survey ◽

Subjective Evaluation ◽

Learning Models ◽

Text To Speech ◽

Opinion Score ◽

News Website ◽

Text To Speech Synthesis ◽

The Mean

This study concentrates on the investigation, development, and evaluation of Text-to-Speech Synthesis systems based on Deep Learning models for the Azerbaijani language. We selected and compared two state-of-the-art systems, Tacotron and Deep Convolutional Text-to-Speech (DC TTS), to find the better-performing model. Both systems were trained on a 24 h speech dataset of the Azerbaijani language collected and processed from a news website. To analyze the quality and intelligibility of the speech signals produced by the two systems, 34 listeners participated in an online survey containing subjective evaluation tests. According to the Mean Opinion Score, Tacotron produced better results for In-Vocabulary words, whereas DC TTS performed better on the synthesis of Out-Of-Vocabulary words.
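
The Mean Opinion Score mentioned in this abstract is the arithmetic mean of listener ratings, commonly collected on a 1 to 5 scale. The sketch below shows one way such a score could be computed along with a normal-approximation 95% confidence interval. The function name `mean_opinion_score` and the ratings are hypothetical illustrations, not data or code from the study, and the abstract does not say whether the authors reported confidence intervals.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, half_width) for a list of 1-5 listener ratings.

    half_width is the half-width of a 95% confidence interval under a
    normal approximation (an assumption made for this sketch).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.fmean(ratings)
    if len(ratings) < 2:
        # A single rating gives no estimate of spread.
        return mos, 0.0
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for one synthesized utterance (not study data).
example_ratings = [4, 5, 3, 4, 4, 5, 3, 4]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With few listeners the normal approximation is rough; a t-distribution quantile would give a wider and more cautious interval.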

Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis

Speech Communication ◽

10.1016/j.specom.2020.09.003 ◽

2020 ◽

Vol 125 ◽

pp. 53-60

Author(s):

Masashi Aso ◽

Shinnosuke Takamichi ◽

Norihiro Takamune ◽

Hiroshi Saruwatari

Keyword(s):

Speech Synthesis ◽

Acoustic Model ◽

Text To Speech ◽

Model Based ◽

Language Knowledge ◽

Text To Speech Synthesis ◽

Context Extraction

Wasserstein GAN and Waveform Loss-Based Acoustic Model Training for Multi-Speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder

IEEE Access ◽

10.1109/access.2018.2872060 ◽

2018 ◽

Vol 6 ◽

pp. 60478-60488 ◽

Cited By ~ 17

Author(s):

Yi Zhao ◽

Shinji Takaki ◽

Hieu-Thi Luong ◽

Junichi Yamagishi ◽

Daisuke Saito ◽

...

Keyword(s):

Speech Synthesis ◽

Acoustic Model ◽

Text To Speech ◽

Text To Speech Synthesis ◽

Model Training

A transformation-based learning approach to language identification for mixed-lingual text-to-speech synthesis

10.21437/interspeech.2005-711 ◽

2005 ◽

Cited By ~ 2

Author(s):

J. C. Marcadet ◽

V. Fischer ◽

C. Waast-Richard

Keyword(s):

Speech Synthesis ◽

Language Identification ◽

Learning Approach ◽

Text To Speech ◽

Text To Speech Synthesis

Robust deep-learning models for text-to-speech synthesis support on embedded devices

Proceedings of the 7th International Conference on Management of computational and collective intElligence in Digital EcoSystems - MEDES '15 ◽

10.1145/2857218.2857234 ◽

2015 ◽

Cited By ~ 1

Author(s):

Tiberiu Boroş ◽

Stefan Daniel Dumitrescu

Keyword(s):

Deep Learning ◽

Speech Synthesis ◽

Learning Models ◽

Text To Speech ◽

Embedded Devices ◽

Text To Speech Synthesis

Efficiencies of Feature Engineering in the Machine Learning approach for Fake News Classification

10.20944/preprints202111.0024.v1 ◽

2021 ◽

Author(s):

Katrin Donetski

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Global Network ◽

Feature Engineering ◽

Learning Approach ◽

Fake News ◽

Learning Models ◽

Substantial Impact ◽

Machine Learning Approach ◽

Classification Tasks

The rapid infiltration of fake news is a flaw in the otherwise valuable internet, a virtually global network that allows for the simultaneous exchange of information. While a common, and normally effective, approach to such classification tasks is designing a deep learning-based model, the subjectivity behind the writing and production of misleading news invalidates this technique. Deep learning models are unexplainable in nature, which makes contextualizing their results impossible because they lack the explicit features used in traditional machine learning. This paper emphasizes the need for feature engineering to effectively address this problem: containing the spread of fake news at the source, not after it has become globally prevalent. Insights from extracted features were used to manipulate the text, which was then tested on deep learning models. The study successfully demonstrated the previously unknown yet substantial impact that the original features had on deep learning models.

Design of English text-to-speech conversion algorithm based on machine learning

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189238 ◽

2020 ◽

pp. 1-12

Author(s):

Li Dongmei

Keyword(s):

Machine Learning ◽

Speech Synthesis ◽

Feature Recognition ◽

Learning Algorithm ◽

Morphological Structure ◽

English Text ◽

Text To Speech ◽

Part Of Speech ◽

Modern Computer ◽

Conversion Algorithm

English text-to-speech conversion is a key topic of modern computer technology research. Its difficulty lies in the large errors that arise during feature recognition in the conversion process, which make the English text-to-speech conversion algorithm hard to apply in a system. To improve the efficiency of English text-to-speech conversion with a machine learning algorithm, this article labels the original voice waveform with the pitch, modifies the rhythm through PSOLA, and uses the C4.5 algorithm to train a decision tree for judging the pronunciation of polyphones. To evaluate the performance of the pronunciation discrimination method based on part-of-speech rules and of HMM-based prosody hierarchy prediction in speech synthesis systems, this study constructed a system model. In addition, the waveform stitching method and PSOLA are used to synthesize the sound. For words whose main stress cannot be determined from morphological structure, labels can be learned by machine learning methods. Finally, this study evaluates and analyzes the performance of the algorithm through control experiments. The results show that the proposed algorithm performs well and has some practical value.
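
C4.5, the decision tree algorithm named in this abstract, chooses splits by gain ratio: information gain divided by the split information of the candidate feature. The sketch below computes that criterion for a single categorical feature. The feature (part of speech) and the pronunciation labels are hypothetical toy values chosen for illustration; the abstract does not describe the features the author actually used.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(values).values()
    )


def gain_ratio(feature_values, labels):
    """Gain ratio of a categorical feature with respect to class labels."""
    total = len(labels)
    # Group the labels by the feature value of each sample.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy remaining after the split, weighted by partition size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalizes features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: part of speech vs. which pronunciation is used.
part_of_speech = ["noun", "noun", "verb", "verb"]
pronunciation = ["A", "A", "B", "B"]
print(gain_ratio(part_of_speech, pronunciation))
```

A full C4.5 implementation also handles continuous attributes, missing values, and pruning, none of which this sketch attempts.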
