Compiling of Phonetic Database Structure

2021 ◽  
Vol 7 (4) ◽  
pp. 4001
Author(s):  
Maya Heydarova

A language's voice corpus is an essential part of its linguistic resources, and it contains the phonetic database. A phonetic database is a structured collection of speech fragments delivered through software. Phonetic databases, or voice corpora, have become a new element in speech technologies, and much research has followed. Researchers' interest in voice corpora is tied to the development of speech recognition systems. Considerable experience in preparing phonetic databases now exists. Together with accumulated knowledge about preparing and using corpora of everyday speech, the maturity of speech technologies and the growing power of computers make it possible to investigate varied language materials and to carry out large-scale statistical phonetic research. This article examines these developing directions in linguistics. Speech corpora are a valuable source of information for phonological research and the study of sound patterns. The study of speech corpora is in its infancy compared with other fields of linguistics. Existing speech corpora cover only part of the world's languages and do not fully represent all dialects and speech forms from a phonological standpoint. The article analyses the history, structure, and importance of developing speech corpora, a branch of corpus linguistics that has developed in recent years. It also lists the main features to consider when designing a speech corpus.

Author(s):  
Basavaraj N Hiremath ◽  
Malini M Patil

A voice recognition system recognizes signals through feature extraction and the identification of related parameters. The whole process is referred to as voice analytics. The paper aims to analyse and synthesize the phonetics of voice using a computer program called “PRAAT”. The work also supports voice segmentation and labelling, analysis of the unique features of voice cues, and an understanding of the physics of voice; the process is then extended to recognize sarcasm. The features identified in the work are intensity, pitch, and formants for read, spoken, interactive, and declarative sentences, obtained using principal component analysis.


Author(s):  
Sonal Anilkumar Tiwari

Abstract: It is intriguing to think of giving commands to inanimate objects, and automatic speech recognition (ASR) systems make this possible. A speech recognition system lets humans talk to machines. Speech recognition has become so widespread that many people now depend on it for everyday tasks. It has become a habit: just as we use mobile phones, when we want to type something we can issue a voice command instead, which saves both effort and time. Keywords: Speech, Speech Recognition, ASR, Corpus, PRAAT


2019 ◽  
Vol 29 (1) ◽  
pp. 1275-1282
Author(s):  
Shipra J. Arora ◽  
Rishipal Singh

Abstract: The paper presents a Punjabi corpus in the agriculture domain. The Punjabi language has various dialects, and the present study concentrates on the major ones: Majhi, Malwi and Doabi. A speech corpus of 125 isolated words is considered. These words are uttered by 100 speakers: 60 Malwi dialect speakers (30 male and 30 female), 20 Majhi dialect speakers (10 male and 10 female) and 20 Doabi dialect speakers (10 male and 10 female). Toneme, adhak (geminated) and nasal words are selected from the corpus. Recordings have been processed through two mediums. The paper also elaborates some distinctive features of the corpus. This corpus is of considerable significance for speech recognition systems. Prosodic characteristics such as intonation, rhythm and stress have a crucial impact on a speech recognition system. These characteristics vary from language to language as well as across the dialects of a language. This paper presents a comparative analysis of the prosodic features of isolated words in the Malwi, Majhi and Doabi dialects of Punjabi. The analysis is done using the PRAAT tool. Pitch, intensity, formant I and formant II values are extracted for toneme, adhak, nasal (bindi) and nasal (tippi) words. For all kinds of words, there is a significant variation in the pitch (fundamental frequency), intensity, formant I and formant II values of male and female speakers of the Malwi, Majhi and Doabi dialects. A detailed analysis is given throughout the paper.


1986 ◽  
Vol 30 (7) ◽  
pp. 638-641
Author(s):  
John P. Zenyuh ◽  
John M. Reising

The objective of this study was to compare the relative effectiveness of three modes of subsystem control: a voice recognition system with visual feedback presented on the head-up display, a standard multifunction control device with tailored switching logic, and a remotely operated multifunction control with feedback presented on the head-up display. Comparisons were based on measures of interference with a loading task and overall speed and accuracy of the control operations performed. The working hypothesis was that the voice system and head-up multifunction control would manifest substantially lower interference with the primary task, while subsystem control operation times would remain unaffected by control mode. The results indicate that performance with the remote touch panel was significantly poorer than with the voice or standard multifunction control systems.


2013 ◽  
Vol 309 ◽  
pp. 260-267
Author(s):  
Laszlo Czap ◽  
Judit Pinter

Speech is the most comfortable means of human communication, and it can serve as a channel for human-machine interfaces as well. Moreover, a voice-driven system can be controlled while the hands are busy. The performance of a speech recognition system degrades severely in the presence of noise. Logistic systems typically work in noisy environments, so noise reduction is crucial in industrial speech processing systems. Traditional noise reduction procedures (e.g. Wiener and Kalman filters) are effective on stationary or Gaussian noise. The noise of a real workplace can be captured by an additional microphone: the voice microphone picks up both speech and noise, while the noise microphone picks up only the noise signal. Because of the phase shift between the two signals, simple subtraction in the time domain is ineffective. In this paper, we discuss a spectral representation that models the noise and voice signals. A noise cancellation method based on the frequency spectrum is proposed and verified in a real industrial environment.
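The two-microphone idea in the abstract above can be illustrated with plain magnitude spectral subtraction. This is a minimal sketch, not the authors' method (the abstract does not give its details): the function names `dft`, `idft` and `spectral_subtract` are hypothetical, a naive DFT stands in for an FFT, and a single short frame is processed without windowing or overlap.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform of a list of real samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtract(voice_frame, noise_frame, floor=0.0):
    """Subtract the noise microphone's magnitude spectrum from the voice microphone's.

    The phase of the voice microphone is kept, so a phase shift between the
    two microphones does not matter. `floor` (an assumed parameter) clamps
    the result so that no bin ends up with a negative magnitude.
    """
    voice_spec = dft(voice_frame)
    noise_spec = dft(noise_frame)
    cleaned = []
    for v, q in zip(voice_spec, noise_spec):
        magnitude = max(abs(v) - abs(q), floor)
        cleaned.append(cmath.rect(magnitude, cmath.phase(v)))
    return idft(cleaned)
```

Because only magnitudes are subtracted, the noise reference does not need to be time-aligned with the noise component in the voice microphone, which is the weakness of time-domain subtraction that the abstract points out.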


1989 ◽  
Vol 19 (1) ◽  
pp. 2-15 ◽  
Author(s):  
D. Autesserre ◽  
G. Pérennou ◽  
M. Rossi

SAM, an international project funded by the European ESPRIT initiative, is concerned with the methodology of assessment of the speech interface with computers. The applications at which the SAM Project aims require that recorded-speech corpora be both defined and created on a European scale. Within the present stage of development (called the “extension phase”) some three different aspects of this work have to be specified and developed:


Author(s):  
Ms. Pooja Sahu

In this project, an automatic speech system is used in mobile customer care services. In existing mobile customer care services, customers have to wait 4 to 5 minutes to reach the option they want to inquire about. Based on this requirement, incoming calls are filtered. Callers who require particular data are dynamically routed to a speech recognition system that identifies the type of enquiry chosen. The system also dynamically identifies the individual speaking by analysing the speech waves, which helps verify that the voice belongs to a recognized user. It also helps in accessing services such as telephone banking, mobile shopping, and database services, and in securing confidential information.


AVITEC ◽  
2019 ◽  
Vol 1 (1) ◽  
Author(s):  
Noor Fita Indri Prayoga

Voice is one way to communicate and express oneself. Speaker recognition is a process by which a device recognizes a speaker from his or her voice. This study designed a speaker recognition system that identifies speakers from what they say, using the dynamic time warping (DTW) method implemented in MATLAB. The design begins with the processing of reference data and test data. Both follow the same steps: sound recording, preprocessing, and feature extraction. The system uses the Fast Fourier Transform (FFT) to extract features. The features extracted from the two data sets are then compared using DTW, and the reference that yields the smallest DTW value is taken as the output. The test results show that the system identifies voices with a best recognition accuracy of 90% and an average recognition accuracy of 80%. The results were obtained from 50 tests carried out by 5 people (3 men and 2 women), each of whom said a predetermined word.
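The matching step described in the abstract above (compare test features with each reference using DTW and pick the smallest value) can be sketched as follows. This is an illustrative Python sketch rather than the study's MATLAB code: the names `magnitude_spectrum`, `dtw_distance` and `identify` are hypothetical, a naive DFT stands in for the FFT, and Euclidean distance between frames is an assumed local cost.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude spectrum of one frame (naive DFT, bins 0 .. n/2)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Classic DTW: minimum cumulative cost of aligning two feature sequences."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]


def identify(test_features, references):
    """Return the reference label whose features give the smallest DTW value."""
    return min(references, key=lambda label: dtw_distance(test_features, references[label]))
```

In use, each recording would be split into frames, each frame passed through `magnitude_spectrum`, and the resulting sequence handed to `identify` together with a dictionary mapping speaker labels to their reference sequences. DTW tolerates differences in speaking rate because one frame may align with several frames of the other sequence.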


Voice recognition systems make heavy use of CNNs because CNNs are well suited to recognizing and classifying targets. A CNN, however, has the problem that the larger the object to be recognized, the higher the computational cost. In this paper, we address this problem through MFCC feature extraction and a model combining CNN and LSTM, to show that voice recognition is feasible even on low-cost devices.

