Speech Recognition Using Energy Parameters to Classify Syllables in the Spanish Language

Author(s):  
Sergio Suárez Guerra ◽  
José Luis Oropeza Rodríguez ◽  
Edgardo M. Felipe Riveron ◽  
Jesús Figueroa Nazuno
2021 ◽  
Vol 11 (19) ◽  
pp. 8872
Author(s):  
Iván G. Torre ◽  
Mónica Romero ◽  
Aitor Álvarez

Automatic speech recognition for patients with aphasia is a challenging task for which studies have been published in only a few languages. Unsurprisingly, the systems reported in the literature in this field perform significantly worse than those focused on transcribing non-pathological clean speech, mainly because of the difficulty of recognizing less intelligible voices and the scarcity of annotated aphasic data. This work focuses on applying novel semi-supervised learning methods to the AphasiaBank dataset in order to address these two major issues, reporting improvements for the English language and providing the first benchmark for the Spanish language, for which less than one hour of transcribed aphasic speech was used for training. In addition, the influence of reinforcing the training and decoding processes with out-of-domain acoustic and text data is described, using different strategies and configurations to fine-tune the hyperparameters and the final recognition systems. The promising results encourage extending this technological approach to other languages and scenarios where the scarcity of annotated data for training recognition models is a challenging reality.
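One common family of semi-supervised methods for low-resource ASR is self-training on pseudo-labels: a model trained on the small labeled set labels the unlabeled pool, and only its confident predictions are added back as training data. The sketch below is illustrative only and is not the paper's actual system; `CentroidModel`, the 1-D features, and the confidence threshold are all assumptions standing in for a real recognizer.

```python
# Hedged sketch of self-training (pseudo-labeling), one semi-supervised
# strategy; the toy nearest-centroid "model" stands in for a real ASR system.

class CentroidModel:
    """Toy 1-D nearest-centroid classifier (illustrative stand-in)."""
    def fit(self, data):
        sums, counts = {}, {}
        for x, y in data:
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: sums[y] / counts[y] for y in sums}

    def predict(self, x):
        dists = {y: abs(x - c) for y, c in self.centroids.items()}
        label = min(dists, key=dists.get)
        total = sum(dists.values()) or 1.0
        confidence = 1.0 - dists[label] / total   # crude confidence proxy
        return label, confidence

def self_train(model, labeled, unlabeled, threshold=0.9, rounds=3):
    """Iteratively add confidently pseudo-labeled samples to the train set."""
    train = list(labeled)
    for _ in range(rounds):
        model.fit(train)
        remaining = []
        for x in unlabeled:
            label, confidence = model.predict(x)
            if confidence >= threshold:
                train.append((x, label))   # trust the confident prediction
            else:
                remaining.append(x)        # keep for a later round
        unlabeled = remaining
    return model, train
```

The same loop shape applies when the "model" is a full ASR pipeline and the confidence comes from decoder posteriors; the threshold then becomes a key hyperparameter to tune.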



Author(s):  
Robinson Jiménez-Moreno ◽  
Javier Orlando Pinzón-Arenas ◽  
César Giovany Pachón-Suescún

This article presents work oriented to assistive robotics, in which a scenario is established for a robot to hand a tool to a user who has verbally requested it by name. For this, three convolutional neural networks are trained: one for recognition of a group of tools (scalpel, screwdriver, and scissors), which achieved 98% accuracy in identifying the tools established for the application; one for speech recognition, trained with the names of the tools in Spanish, whose validation accuracy reached 97.5% in recognizing the words; and one for recognition of the user's hand, considering the classification of two gestures, open and closed hand, for which 96.25% accuracy was achieved. With those networks, real-time tests were performed, and each tool was delivered with 100% accuracy, i.e., the robot correctly identified what the user requested, correctly recognized each tool, and delivered the required one when the user opened their hand, taking an average of 45 seconds to execute the application.
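The three networks must agree before the robot acts: the spoken word must name a tool, that tool must be the one seen, and the hand must be open. A minimal sketch of that decision logic follows; the function, label names, and confidence threshold are illustrative assumptions, not the article's implementation.

```python
# Hedged sketch of the three-network decision logic, assuming each network
# is wrapped as a function returning a (label, score) pair.

# Spanish spoken-word labels mapped to the visually recognized tool classes
# (names are illustrative assumptions).
TOOLS = {"bisturi": "scalpel", "destornillador": "screwdriver", "tijeras": "scissors"}

def should_deliver(spoken_word, hand_gesture, tool_seen, min_score=0.9):
    """Deliver only if the requested tool is recognized and the hand is open."""
    word, w_score = spoken_word
    gesture, g_score = hand_gesture
    tool, t_score = tool_seen
    if min(w_score, g_score, t_score) < min_score:
        return None                       # any low-confidence output aborts
    if word in TOOLS and TOOLS[word] == tool and gesture == "open":
        return tool
    return None
```

Gating delivery on the open-hand gesture is what makes the hand-recognition network safety-relevant: the robot releases the tool only when the user is ready to receive it.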


Ingeniería ◽  
2017 ◽  
Vol 22 (3) ◽  
pp. 362 ◽  
Author(s):  
Juan David Celis Nuñez ◽  
Rodrigo Andres Llanos Castro ◽  
Byron Medina Delgado ◽  
Sergio Basilio Sepúlveda Mora ◽  
Sergio Alexander Castro Casadiego

Context: Automatic speech recognition requires the development of language and acoustic models for the different existing dialects. The purpose of this research is the training of an acoustic model, a statistical language model, and a grammar language model for the Spanish language, specifically for the dialect of the city of San José de Cúcuta, Colombia, for use in a command control system. Existing models for the Spanish language have problems in the recognition of the fundamental frequency and the spectral content, the accent, the pronunciation, the tone, or simply lack a language model for Cúcuta's dialect.

Method: In this project, we used a Raspberry Pi B+ embedded system running the Raspbian operating system, a Linux distribution, and two open-source tools: the CMU-Cambridge Statistical Language Modeling Toolkit from the University of Cambridge and CMU Sphinx from Carnegie Mellon University; these tools are based on Hidden Markov Models for the calculation of voice parameters. In addition, we used 1913 recorded audios with the voices of people from San José de Cúcuta and the Norte de Santander department; these audios were used for training and testing the automatic speech recognition system.

Results: We obtained a language model consisting of two files: the statistical language model (.lm) and the JSGF grammar model (.jsgf). Regarding the acoustic component, two models were trained, one of them an improved version that reached a 100% accuracy rate in the training results and an 83% accuracy rate in the audio tests for command recognition. Finally, we elaborated a manual for the creation of acoustic and language models with the CMU Sphinx software.

Conclusions: The number of participants in the training process of the language and acoustic models has a significant influence on the quality of the voice processing of the recognizer. Using a large dictionary for the training process and a short dictionary with the command words for the implementation is important to get a better response from the automatic speech recognition system. Given the accuracy rate above 80% in the voice recognition tests, the proposed models are suitable for applications oriented to assisting people with visual or motor impairments.
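The statistical language model (.lm) produced by toolkits like CMU-Cambridge is an n-gram model. As a minimal sketch of the underlying estimation only (the real toolkit also applies discounting and backoff, which this omits), a maximum-likelihood bigram model over a command corpus can be computed as:

```python
# Minimal sketch of maximum-likelihood bigram estimation, the core of a
# statistical n-gram language model; discounting/backoff are omitted.
from collections import Counter

def bigram_lm(sentences):
    """Estimate P(w2 | w1) by maximum likelihood, with <s>/</s> sentence
    markers as used by standard n-gram toolkits."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])                # history counts
        bigrams.update(zip(words[:-1], words[1:])) # pair counts
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}
```

The command sentences here are illustrative; in the reported work the model was trained from the transcriptions of the 1913 recorded audios, and the separate JSGF grammar constrains decoding to the valid command phrases.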


Author(s):  
Shobha Bhatt ◽  
Amita Dev ◽  
Anurag Jain

Background: Speech recognition is among the most effective and natural ways of communication, and extracted features play an important role in it. Previous research on Hindi speech recognition lacks a detailed comparative analysis of feature extraction methods using dynamic and energy parameters.

Objective: This work presents experiments exploring the effects of integrating dynamic coefficients and energy parameters with different feature extraction techniques for connected-word Hindi speech recognition. Since extracted features play a significant role in speech recognition, a comparative analysis shows the effect of adding dynamic and energy parameters to the basic extracted features.

Method: A speaker-dependent system was proposed with monophone-based five-state Hidden Markov Models (HMMs) using the HTK toolkit. A speech dataset of connected words in Hindi was created. Feature extraction techniques such as Linear Predictive Coding Cepstral coefficients (LPCCs), Mel Frequency Cepstral Coefficients (MFCCs), and Perceptual Linear Prediction (PLP) coefficients were applied, integrating delta, delta-delta, and energy parameters, to evaluate the performance of the proposed methodology for speaker-dependent recognition.

Results: Experimental results show that the system achieved the highest recognition word accuracy, 89.97%, using PLP coefficients. PLP coefficients achieved a 4% increment in word accuracy over original MFCCs and a 16% increment over LPCCs. Adding energy parameters to original MFCCs increased word accuracy by only 1.5%, while adding the dynamic coefficients delta and delta-delta had no significant effect on recognition accuracy.

Conclusion: The findings reveal that PLP coefficients outperformed the other features, and that integrating energy parameters improved on original MFCCs. Investigations also reveal that adding energy parameters improved the recognition score, while adding delta and delta-delta coefficients to the basic features did not. These findings could be used to enhance the performance of a speech recognition system by choosing a suitable feature extraction technique and by combining different feature extraction techniques. Further, the investigations can be used to develop language resources for refining speech recognition, and the work can be extended to a continuous Hindi speech recognition system.
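The delta (dynamic) coefficients integrated above are conventionally computed with the regression formula used by HTK, Δc_t = Σₙ n(c_{t+n} − c_{t−n}) / (2 Σₙ n²); delta-delta applies the same formula to the deltas. A minimal sketch over a single coefficient track (the real computation runs over every cepstral dimension of every frame):

```python
# Sketch of the standard HTK-style delta (regression) formula over one
# coefficient track, with edge frames replicated at the boundaries.

def delta(coeffs, N=2):
    """Dynamic coefficients for a sequence of per-frame values."""
    T = len(coeffs)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = [coeffs[0]] * N + list(coeffs) + [coeffs[-1]] * N
    out = []
    for t in range(T):
        # centered difference weighted by distance n from the current frame
        num = sum(n * (padded[t + N + n] - padded[t + N - n])
                  for n in range(1, N + 1))
        out.append(num / denom)
    return out
```

Applying `delta` to the MFCC/PLP/LPCC stream gives the delta features, and applying it again gives delta-delta; the energy parameter is typically appended as the per-frame log energy.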


2008 ◽  
Vol 18 (1) ◽  
pp. 19-24
Author(s):  
Erin C. Schafer

Children who use cochlear implants experience significant difficulty hearing speech in the presence of background noise, such as in the classroom. To address these difficulties, audiologists often recommend frequency-modulated (FM) systems for children with cochlear implants. The purpose of this article is to examine current empirical research on FM systems and cochlear implants. Discussion topics include selecting the optimal type of FM receiver, the benefits of binaural FM-system input, the importance of direct-audio input (DAI) receiver-gain settings, and the effects of speech-processor programming on speech recognition. FM systems significantly improve the signal-to-noise ratio at the child's ear through three types of FM receivers: mounted speakers, desktop speakers, or DAI. This discussion will aid audiologists in making evidence-based recommendations for children using cochlear implants and FM systems.
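The benefit of an FM receiver is usually quantified as a signal-to-noise ratio (SNR) improvement in decibels. As a small illustration with assumed power values (not data from the article):

```python
# Illustrative SNR calculation in decibels; the power values below are
# assumptions for demonstration, not measurements from the article.
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in dB from average signal and noise powers."""
    return 10 * math.log10(signal_power / noise_power)

# An FM receiver that delivers ten times the signal power at the ear,
# with noise unchanged, improves the SNR by exactly 10 dB.
improvement = snr_db(10 * 4.0, 2.0) - snr_db(4.0, 2.0)
```

Because decibels are logarithmic, each tenfold gain in delivered signal power adds a fixed 10 dB regardless of the starting level, which is why FM-receiver gain settings matter so much in noisy classrooms.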


2010 ◽  
Vol 26 (3) ◽  
pp. 194-202 ◽  
Author(s):  
Daniel A. Newman ◽  
Christine A. Limbers ◽  
James W. Varni

The measurement of health-related quality of life (HRQOL) in children has witnessed significant international growth over the past decade in an effort to improve pediatric health and well-being, and to determine the value of health-care services. In order to compare international HRQOL research findings across language groups, it is important to demonstrate factorial invariance, i.e., that the items have an equivalent meaning across the language groups studied. This study examined the factorial invariance of child self-reported HRQOL across English- and Spanish-language groups in a Hispanic population of 2,899 children ages 8–18, utilizing the 23-item PedsQL™ 4.0 Generic Core Scales. Multigroup confirmatory factor analysis (CFA) was performed specifying a five-factor model across language groups. The findings support an equivalent five-factor structure across English- and Spanish-language groups. Based on these data, it can be concluded that children across the two languages studied interpreted the instrument in a similar manner. The multigroup CFA statistical methods utilized in the present study have important implications for cross-cultural assessment research in children in which different language groups are compared.

