Speech Recognition Using Energy Parameters to Classify Syllables in the Spanish Language

Author(s):  
Sergio Suárez Guerra ◽  
José Luis Oropeza Rodríguez ◽  
Edgardo M. Felipe Riveron ◽  
Jesús Figueroa Nazuno
2021 ◽  
Vol 11 (19) ◽  
pp. 8872
Author(s):  
Iván G. Torre ◽  
Mónica Romero ◽  
Aitor Álvarez

Automatic speech recognition for patients with aphasia is a challenging task for which studies have been published in only a few languages. Unsurprisingly, the systems reported in the literature in this field perform significantly worse than those focused on transcribing non-pathological clean speech, mainly because of the difficulty of recognizing less intelligible voices and the scarcity of annotated aphasic data. This work focuses on applying novel semi-supervised learning methods to the AphasiaBank dataset in order to address these two major issues, reporting improvements for the English language and providing the first benchmark for the Spanish language, for which less than one hour of transcribed aphasic speech was used for training. In addition, the influence of reinforcing the training and decoding processes with out-of-domain acoustic and text data is described, using different strategies and configurations to fine-tune the hyperparameters and the final recognition systems. The promising results encourage extending this technological approach to other languages and scenarios where the scarcity of annotated data for training recognition models is a challenging reality.
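One common family of semi-supervised methods for low-resource ASR is self-training on pseudo-labels: a model trained on the small labeled set labels the unlabeled pool, and only its confident predictions are added back as training data. The sketch below is illustrative only and is not the paper's actual system; `CentroidModel`, the 1-D features, and the confidence threshold are all assumptions standing in for a real recognizer.

```python
# Hedged sketch of self-training (pseudo-labeling), one semi-supervised
# strategy; the toy nearest-centroid "model" stands in for a real ASR system.

class CentroidModel:
    """Toy 1-D nearest-centroid classifier (illustrative stand-in)."""
    def fit(self, data):
        sums, counts = {}, {}
        for x, y in data:
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: sums[y] / counts[y] for y in sums}

    def predict(self, x):
        dists = {y: abs(x - c) for y, c in self.centroids.items()}
        label = min(dists, key=dists.get)
        total = sum(dists.values()) or 1.0
        confidence = 1.0 - dists[label] / total   # crude confidence proxy
        return label, confidence

def self_train(model, labeled, unlabeled, threshold=0.9, rounds=3):
    """Iteratively add confidently pseudo-labeled samples to the train set."""
    train = list(labeled)
    for _ in range(rounds):
        model.fit(train)
        remaining = []
        for x in unlabeled:
            label, confidence = model.predict(x)
            if confidence >= threshold:
                train.append((x, label))   # trust the confident prediction
            else:
                remaining.append(x)        # keep for a later round
        unlabeled = remaining
    return model, train
```

The same loop shape applies when the "model" is a full ASR pipeline and the confidence comes from decoder posteriors; the threshold then becomes a key hyperparameter to tune.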



Author(s):  
Robinson Jiménez-Moreno ◽  
Javier Orlando Pinzón-Arenas ◽  
César Giovany Pachón-Suescún

This article presents work oriented to assistive robotics, in which a scenario is established for a robot to hand a tool to a user who has verbally requested it by name. For this, three convolutional neural networks are trained: one for recognition of a group of tools (scalpel, screwdriver, and scissors), which achieved 98% accuracy in identifying the tools established for the application; one for speech recognition, trained with the names of the tools in Spanish, whose validation accuracy reached 97.5% in recognizing the words; and one for recognition of the user's hand, considering the classification of two gestures, open and closed hand, for which 96.25% accuracy was achieved. With those networks, real-time tests were performed, and each tool was delivered with 100% accuracy, i.e., the robot correctly identified what the user requested, correctly recognized each tool, and delivered the required one when the user opened their hand, taking an average of 45 seconds to execute the application.
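The three networks must agree before the robot acts: the spoken word must name a tool, that tool must be the one seen, and the hand must be open. A minimal sketch of that decision logic follows; the function, label names, and confidence threshold are illustrative assumptions, not the article's implementation.

```python
# Hedged sketch of the three-network decision logic, assuming each network
# is wrapped as a function returning a (label, score) pair.

# Spanish spoken-word labels mapped to the visually recognized tool classes
# (names are illustrative assumptions).
TOOLS = {"bisturi": "scalpel", "destornillador": "screwdriver", "tijeras": "scissors"}

def should_deliver(spoken_word, hand_gesture, tool_seen, min_score=0.9):
    """Deliver only if the requested tool is recognized and the hand is open."""
    word, w_score = spoken_word
    gesture, g_score = hand_gesture
    tool, t_score = tool_seen
    if min(w_score, g_score, t_score) < min_score:
        return None                       # any low-confidence output aborts
    if word in TOOLS and TOOLS[word] == tool and gesture == "open":
        return tool
    return None
```

Gating delivery on the open-hand gesture is what makes the hand-recognition network safety-relevant: the robot releases the tool only when the user is ready to receive it.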


Ingeniería ◽  
2017 ◽  
Vol 22 (3) ◽  
pp. 362 ◽  
Author(s):  
Juan David Celis Nuñez ◽  
Rodrigo Andres Llanos Castro ◽  
Byron Medina Delgado ◽  
Sergio Basilio Sepúlveda Mora ◽  
Sergio Alexander Castro Casadiego

Context: Automatic speech recognition requires the development of language and acoustic models for the different existing dialects. The purpose of this research is the training of an acoustic model, a statistical language model, and a grammar language model for the Spanish language, specifically for the dialect of the city of San José de Cúcuta, Colombia, for use in a command control system. Existing models for the Spanish language have problems in the recognition of the fundamental frequency and the spectral content, the accent, the pronunciation, the tone, or simply lack a language model for Cúcuta's dialect.

Method: In this project, we used a Raspberry Pi B+ embedded system running the Raspbian operating system, a Linux distribution, and two open-source tools: the CMU-Cambridge Statistical Language Modeling Toolkit from the University of Cambridge and CMU Sphinx from Carnegie Mellon University; these tools are based on Hidden Markov Models for the calculation of voice parameters. In addition, we used 1913 recorded audios with the voices of people from San José de Cúcuta and the Norte de Santander department; these audios were used for training and testing the automatic speech recognition system.

Results: We obtained a language model consisting of two files: the statistical language model (.lm) and the JSGF grammar model (.jsgf). Regarding the acoustic component, two models were trained, one of them an improved version that reached a 100% accuracy rate in the training results and an 83% accuracy rate in the audio tests for command recognition. Finally, we elaborated a manual for the creation of acoustic and language models with the CMU Sphinx software.

Conclusions: The number of participants in the training process of the language and acoustic models has a significant influence on the quality of the voice processing of the recognizer. Using a large dictionary for the training process and a short dictionary with the command words for the implementation is important to get a better response from the automatic speech recognition system. Given the accuracy rate above 80% in the voice recognition tests, the proposed models are suitable for applications oriented to assisting people with visual or motor impairments.
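The statistical language model (.lm) produced by toolkits like CMU-Cambridge is an n-gram model. As a minimal sketch of the underlying estimation only (the real toolkit also applies discounting and backoff, which this omits), a maximum-likelihood bigram model over a command corpus can be computed as:

```python
# Minimal sketch of maximum-likelihood bigram estimation, the core of a
# statistical n-gram language model; discounting/backoff are omitted.
from collections import Counter

def bigram_lm(sentences):
    """Estimate P(w2 | w1) by maximum likelihood, with <s>/</s> sentence
    markers as used by standard n-gram toolkits."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])                # history counts
        bigrams.update(zip(words[:-1], words[1:])) # pair counts
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}
```

The command sentences here are illustrative; in the reported work the model was trained from the transcriptions of the 1913 recorded audios, and the separate JSGF grammar constrains decoding to the valid command phrases.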


Author(s):  
Shobha Bhatt ◽  
Amita Dev ◽  
Anurag Jain

Background: Speech recognition is among the most effective and natural ways of communication, and extracted features play an important role in it. Previous research on Hindi speech recognition lacks a detailed comparative analysis of feature extraction methods using dynamic and energy parameters.

Objective: This work presents experiments exploring the effects of integrating dynamic coefficients and energy parameters with different feature extraction techniques for connected-word Hindi speech recognition. Since extracted features play a significant role in speech recognition, a comparative analysis shows the effect of adding dynamic and energy parameters to the basic extracted features.

Method: A speaker-dependent system was proposed with monophone-based five-state Hidden Markov Models (HMMs) using the HTK toolkit. A speech dataset of connected words in Hindi was created. Feature extraction techniques such as Linear Predictive Coding Cepstral coefficients (LPCCs), Mel Frequency Cepstral Coefficients (MFCCs), and Perceptual Linear Prediction (PLP) coefficients were applied, integrating delta, delta-delta, and energy parameters, to evaluate the performance of the proposed methodology for speaker-dependent recognition.

Results: Experimental results show that the system achieved the highest recognition word accuracy, 89.97%, using PLP coefficients. PLP coefficients achieved a 4% increment in word accuracy over original MFCCs and a 16% increment over LPCCs. Adding energy parameters to original MFCCs increased word accuracy by only 1.5%, while adding the dynamic coefficients delta and delta-delta had no significant effect on recognition accuracy.

Conclusion: The findings reveal that PLP coefficients outperformed the other features, and that integrating energy parameters improved on original MFCCs. Investigations also reveal that adding energy parameters improved the recognition score, while adding delta and delta-delta coefficients to the basic features did not. These findings could be used to enhance the performance of a speech recognition system by choosing a suitable feature extraction technique and by combining different feature extraction techniques. Further, the investigations can be used to develop language resources for refining speech recognition, and the work can be extended to a continuous Hindi speech recognition system.
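The delta (dynamic) coefficients integrated above are conventionally computed with the regression formula used by HTK, Δc_t = Σₙ n(c_{t+n} − c_{t−n}) / (2 Σₙ n²); delta-delta applies the same formula to the deltas. A minimal sketch over a single coefficient track (the real computation runs over every cepstral dimension of every frame):

```python
# Sketch of the standard HTK-style delta (regression) formula over one
# coefficient track, with edge frames replicated at the boundaries.

def delta(coeffs, N=2):
    """Dynamic coefficients for a sequence of per-frame values."""
    T = len(coeffs)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = [coeffs[0]] * N + list(coeffs) + [coeffs[-1]] * N
    out = []
    for t in range(T):
        # centered difference weighted by distance n from the current frame
        num = sum(n * (padded[t + N + n] - padded[t + N - n])
                  for n in range(1, N + 1))
        out.append(num / denom)
    return out
```

Applying `delta` to the MFCC/PLP/LPCC stream gives the delta features, and applying it again gives delta-delta; the energy parameter is typically appended as the per-frame log energy.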


2008 ◽  
Vol 18 (1) ◽  
pp. 19-24
Author(s):  
Erin C. Schafer

Children who use cochlear implants experience significant difficulty hearing speech in the presence of background noise, such as in the classroom. To address these difficulties, audiologists often recommend frequency-modulated (FM) systems for children with cochlear implants. The purpose of this article is to examine current empirical research on FM systems and cochlear implants. Discussion topics include selecting the optimal type of FM receiver, the benefits of binaural FM-system input, the importance of direct-audio input (DAI) receiver-gain settings, and the effects of speech-processor programming on speech recognition. FM systems significantly improve the signal-to-noise ratio at the child's ear through three types of FM receivers: mounted speakers, desktop speakers, or DAI. This discussion will aid audiologists in making evidence-based recommendations for children using cochlear implants and FM systems.
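The benefit of an FM receiver is usually quantified as a signal-to-noise ratio (SNR) improvement in decibels. As a small illustration with assumed power values (not data from the article):

```python
# Illustrative SNR calculation in decibels; the power values below are
# assumptions for demonstration, not measurements from the article.
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in dB from average signal and noise powers."""
    return 10 * math.log10(signal_power / noise_power)

# An FM receiver that delivers ten times the signal power at the ear,
# with noise unchanged, improves the SNR by exactly 10 dB.
improvement = snr_db(10 * 4.0, 2.0) - snr_db(4.0, 2.0)
```

Because decibels are logarithmic, each tenfold gain in delivered signal power adds a fixed 10 dB regardless of the starting level, which is why FM-receiver gain settings matter so much in noisy classrooms.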


2010 ◽  
Vol 26 (3) ◽  
pp. 194-202 ◽  
Author(s):  
Daniel A. Newman ◽  
Christine A. Limbers ◽  
James W. Varni

The measurement of health-related quality of life (HRQOL) in children has witnessed significant international growth over the past decade in an effort to improve pediatric health and well-being, and to determine the value of health-care services. In order to compare international HRQOL research findings across language groups, it is important to demonstrate factorial invariance, i.e., that the items have an equivalent meaning across the language groups studied. This study examined the factorial invariance of child self-reported HRQOL across English- and Spanish-language groups in a Hispanic population of 2,899 children ages 8–18, utilizing the 23-item PedsQL™ 4.0 Generic Core Scales. Multigroup confirmatory factor analysis (CFA) was performed specifying a five-factor model across language groups. The findings support an equivalent five-factor structure across English- and Spanish-language groups. Based on these data, it can be concluded that children across the two languages studied interpreted the instrument in a similar manner. The multigroup CFA statistical methods utilized in the present study have important implications for cross-cultural assessment research in children in which different language groups are compared.

