Vowel Classification Based on Fundamental Frequency and Formant Frequencies

1993, Vol 36 (4), pp. 694-700
Author(s): James Hillenbrand, Robert T. Gayvert

A quadratic discriminant classification technique was used to classify spectral measurements from vowels spoken by men, women, and children. The parameters used to train the discriminant classifier consisted of various combinations of fundamental frequency and the three lowest formant frequencies. Several nonlinear auditory transforms were evaluated. Unlike previous studies that used a linear discriminant classifier, this study found no advantage in category separability for any of the nonlinear auditory transforms over a linear frequency scale, and no advantage for spectral distances over absolute frequencies. However, parameter sets using nonlinear transforms and spectral differences did reduce the differences between phonetically equivalent tokens produced by different groups of talkers.
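The comparison above can be sketched in Python as a quadratic discriminant classifier (one Gaussian with its own full covariance matrix per vowel category) applied to linear-Hz or Bark-transformed features, with or without spectral differences. This is a minimal illustration under stated assumptions, not the authors' implementation: the Traunmüller (1990) Hz-to-Bark formula stands in for the several auditory transforms the study evaluated, only F0, F1, and F2 are used, and the vowel tokens are invented. The helper names (`hz_to_bark`, `features`, `QuadraticDiscriminant`) are hypothetical.

```python
import math
from collections import defaultdict


def hz_to_bark(f):
    """Traunmüller (1990) Hz-to-Bark approximation, one of several published formulas."""
    return 26.81 * f / (1960.0 + f) - 0.53


def features(f0, f1, f2, scale="bark", differences=False):
    """Feature vector on a linear (Hz) or Bark scale.

    differences=False -> absolute values [F1, F2]
    differences=True  -> spectral distances [F1 - F0, F2 - F1]
    """
    t = hz_to_bark if scale == "bark" else (lambda f: f)
    z0, z1, z2 = t(f0), t(f1), t(f2)
    return [z1 - z0, z2 - z1] if differences else [z1, z2]


def det_and_inverse(m):
    """Determinant and inverse of a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        p = a[col][col]
        det *= p
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return det, [row[n:] for row in a]


class QuadraticDiscriminant:
    """Gaussian classes, each with its own full covariance matrix (QDA)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.params = {}
        for label, rows in groups.items():
            n, d = len(rows), len(rows[0])
            mean = [sum(r[k] for r in rows) / n for k in range(d)]
            cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
                    for j in range(d)] for i in range(d)]
            det, inv = det_and_inverse(cov)
            self.params[label] = (mean, inv, math.log(det), math.log(n / len(X)))
        return self

    def log_score(self, x, label):
        """Log posterior up to a shared constant: -0.5*log|S| - 0.5*Mahalanobis^2 + log prior."""
        mean, inv, logdet, logprior = self.params[label]
        diff = [a - b for a, b in zip(x, mean)]
        d = len(diff)
        maha = sum(diff[i] * inv[i][j] * diff[j] for i in range(d) for j in range(d))
        return -0.5 * logdet - 0.5 * maha + logprior

    def predict(self, x):
        return max(self.params, key=lambda label: self.log_score(x, label))


# Hypothetical (F0, F1, F2) tokens in Hz, invented for illustration only.
tokens = [((120, 270, 2290), "i"), ((125, 300, 2200), "i"), ((118, 310, 2350), "i"),
          ((122, 285, 2240), "i"), ((120, 730, 1090), "a"), ((124, 780, 1200), "a"),
          ((119, 700, 1150), "a"), ((121, 760, 1120), "a")]
X = [features(*f) for f, _ in tokens]
y = [label for _, label in tokens]
model = QuadraticDiscriminant().fit(X, y)
print(model.predict(features(122, 290, 2270)))
```

Changing `scale` to `"hz"` or `differences` to `True` in `features` yields the kinds of parameter sets the abstract contrasts; with real data, separability would be judged on held-out tokens rather than on the training set.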

2014, Vol 57 (1), pp. 26-45
Author(s): Carlyn Burris, Houri K. Vorperian, Marios Fourakis, Ray D. Kent, Daniel M. Bolt

Purpose: This study examines the accuracy and comparability of 4 trademarked acoustic analysis software packages (AASPs), Praat, WaveSurfer, TF32, and CSL, using synthesized and natural vowels. Features of the AASPs are also described. Method: Synthesized and natural vowels were analyzed with each AASP's default settings to obtain 9 acoustic measures: fundamental frequency (F0), formant frequencies (F1–F4), and formant bandwidths (B1–B4). The discrepancy between the software-measured values and the reference values (synthesis input values, previously reported values, and manual measurements) was used to assess comparability and accuracy. Basic AASP features are described. Results: Praat, WaveSurfer, and TF32 generated accurate and comparable F0 and F1–F4 data for synthesized vowels and for natural vowels from adult males. Results varied by vowel for women and children, with some serious errors. Bandwidth measurements by the AASPs were highly inaccurate compared with manual measurements and published data on formant bandwidths. Conclusions: Values of F0 and F1–F4 are generally consistent and fairly accurate for adult vowels and for some child vowels when the default settings in Praat, WaveSurfer, and TF32 are used. Adjusting the default settings improves the output values in TF32 and CSL. Caution is recommended, especially before accepting F1–F4 results for children and B1–B4 results for all speakers.
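The accuracy criterion described in the Method amounts to comparing each software value with a reference value. A minimal Python sketch follows; the measure names, the choice of reporting signed Hz error plus absolute percent error, and all numbers are illustrative assumptions, not values from the study.

```python
def discrepancy(measured, reference):
    """Signed error (Hz) and absolute percent error of each measure against its reference."""
    report = {}
    for name, ref in reference.items():
        if name in measured:
            err = measured[name] - ref
            report[name] = (err, 100.0 * abs(err) / ref)
    return report


# Hypothetical values: synthesizer input vs. one package's default-settings output.
synth_input = {"F0": 120.0, "F1": 500.0, "F2": 1500.0, "F3": 2500.0}
software_out = {"F0": 119.0, "F1": 510.0, "F2": 1480.0, "F3": 2650.0}
for name, (err, pct) in discrepancy(software_out, synth_input).items():
    print(f"{name}: error {err:+.1f} Hz ({pct:.1f}%)")
```

Reporting percent error alongside Hz error matters because a fixed Hz error is proportionally larger for low formants and for F0.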


ALQALAM, 2015, Vol 32 (2), pp. 284
Author(s): Muhammad Subali, Miftah Andriansyah, Christanto Sinambela

This article examines the similarities and differences in fundamental frequency and formant frequencies of spoken hijaiyah letters, computed with an autocorrelation function and an LPC function in a MATLAB 2012b GUI, for beginner and expert adult male speakers based on makhraj pronunciation. The speech of both speakers is also compared by computing a matching distance with the DTW method on the cepstrum. The beginner subjects are college students from Universitas Gunadarma and SITC, aged 22; their speech was recorded with a MATLAB algorithm in the GUI. The expert speech data are taken from previous research; those speakers were 20-30 years old at the time of recording. Each recording is processed to extract the fundamental frequency and formant frequencies, which are then compared between beginner and expert speech, and the matching distance between the two is computed. The results show that beginner and expert speech based on makhraj pronunciation have different fundamental frequency and formant frequency values in all cases. The DTW matching distances between beginner and expert speech range from 28.9746 to 136.4. Keywords: fundamental frequency, formant frequency, hijaiyah letters, makhraj
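Two of the signal-processing steps named above can be sketched in Python: an autocorrelation F0 estimator that picks the lag with the strongest self-similarity within a pitch range, and classic dynamic time warping (DTW) between two sequences of feature frames. This is an illustrative sketch, not the authors' MATLAB code: the 60-400 Hz search range, the Euclidean frame distance, the function names, and the toy inputs are assumptions, and neither LPC formant extraction nor the cepstral analysis that would produce the DTW frames is shown.

```python
import math


def autocorr_f0(signal, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate F0 as the lag with the largest autocorrelation inside a plausible pitch range."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(signal) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


def dtw_distance(a, b):
    """DTW cost between two sequences of feature vectors, with Euclidean local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Each cell extends the cheapest of a deletion, an insertion, or a match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Synthetic 100 Hz sine at 8 kHz (10 periods), used only to exercise the estimator.
sr = 8000
tone = [math.sin(2 * math.pi * 100 * n / sr) for n in range(800)]
print(autocorr_f0(tone, sr))

# Hypothetical frame sequences; the second repeats one frame, as a slower speaker might.
beginner = [[1.0, 0.2], [2.0, 0.1], [3.0, 0.0]]
expert = [[1.0, 0.2], [2.0, 0.1], [2.0, 0.1], [3.0, 0.0]]
print(dtw_distance(beginner, expert))
```

DTW absorbs differences in speaking rate by letting one frame align with several, which is why the repeated frame above adds no cost.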


Animals, 2018, Vol 8 (10), pp. 167
Author(s): Anton Baotic, Maxime Garcia, Markus Boeckle, Angela Stoeger

African savanna elephants live in dynamic fission–fusion societies and exhibit a sophisticated vocal communication system. Their most frequent call type is the ‘rumble’, whose fundamental frequency (the lowest vocal fold vibration rate when producing a vocalization) lies near or in the infrasonic range. Rumbles are used in a wide variety of behavioral contexts, for short- and long-distance communication, and convey contextual and physical information. For example, maturity (age and size) is encoded in male rumbles by formant frequencies (the resonance frequencies of the vocal tract), which carry the most informative power. As sound propagates, however, its spectral and temporal structure degrades progressively. Our study used manipulated and resynthesized male social rumbles to simulate large and small individuals (based on different formant values) in order to quantify whether this phenotypic information transmits efficiently over long distances. To examine transmission efficiency and the potential influence of ecological factors, we broadcast and re-recorded rumbles at distances of up to 1.5 km in two different habitats at the Addo Elephant National Park, South Africa. Our results show that rumbles were affected by spectral–temporal degradation over distance. Interestingly, and unlike previous findings, formants were transmitted better than the fundamental frequency. Our findings demonstrate the importance of formant frequencies for the efficiency of rumble propagation and for the transmission of information content in a savanna elephant’s natural habitat.
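Why lower formants can signal a larger caller can be illustrated with the textbook uniform-tube approximation of a vocal tract, closed at the source and open at the mouth, whose resonances are F_k = (2k - 1)c / (4L). The Python sketch below is an assumption-laden illustration, not the study's analysis: a real elephant vocal tract is not a uniform tube, the speed of sound c = 350 m/s is an assumed value, the 1.0 m length is arbitrary, and the function names are hypothetical.

```python
SPEED_OF_SOUND = 350.0  # m/s; an assumed value for warm, humid vocal-tract air


def tube_formants(length_m, n=3, c=SPEED_OF_SOUND):
    """Resonances of a uniform tube closed at one end: F_k = (2k - 1) c / (4 L)."""
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, n + 1)]


def estimate_tract_length(formants_hz, c=SPEED_OF_SOUND):
    """Invert the tube model for each formant and average the length estimates."""
    estimates = [(2 * k - 1) * c / (4.0 * f) for k, f in enumerate(formants_hz, start=1)]
    return sum(estimates) / len(estimates)


# Under this model, scaling every formant by a factor s implies a tract 1/s times as long.
base = tube_formants(1.0)           # arbitrary reference length in metres
lowered = [f * 0.8 for f in base]   # a resynthesis that simulates a "larger" caller
print(estimate_tract_length(base), estimate_tract_length(lowered))
```

This is the logic behind simulating large and small individuals by shifting formants: a uniform downward shift reads as a longer vocal tract, independent of the fundamental frequency.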


Author(s): Yeptain Leung, Jennifer Oates, Siew-Pang Chan, Viktória Papp

Purpose: The aim of the study was to examine associations between speaking fundamental frequency (f_os), vowel formant frequencies (F), listener perceptions of speaker gender, and vocal femininity–masculinity. Method: An exploratory study was undertaken to examine associations between f_os, F1–F3, listener perceptions of speaker gender (nominal scale), and vocal femininity–masculinity (visual analog scale). For 379 speakers of Australian English aged 18–60 years, f_os mode and F1–F3 (12 monophthongs; a total of 36 Fs) were analyzed on a standard reading passage. Seventeen listeners rated speaker gender and vocal femininity–masculinity on randomized audio recordings of these speakers. Results: Model building using principal component analysis suggested that the 36 Fs could be succinctly reduced to seven principal components (PCs). Generalized structural equation modeling (with the seven PCs of F and f_os as predictors) suggested that only F2 and f_os predicted listener perceptions of speaker gender (male, female, unable to decide). However, listener perceptions of vocal femininity–masculinity behaved differently and were predicted by F1, F3, and the contrast between monophthongs at the extremities of the F1 acoustic vowel space, in addition to F2 and f_os. Furthermore, listeners' perceptions of speaker gender also substantially influenced ratings of vocal femininity–masculinity. Conclusion: Adjusted odds ratios highlighted a substantially larger contribution of F, relative to f_os, to listener perceptions of speaker gender and vocal femininity–masculinity than has previously been reported.
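The dimensionality-reduction step in the Results can be illustrated with a small principal component analysis: eigenvectors of the covariance matrix of the formant variables, found here by power iteration with deflation. This Python sketch is hypothetical: the abstract does not state whether the 36 Fs were standardized, whether components were rotated, or how seven components were chosen, so the sketch simply uses the raw covariance matrix and invented data for three variables.

```python
import math
import random


def covariance_matrix(X):
    """Sample covariance matrix of the rows of X (observations x variables)."""
    n, d = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(d)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / (n - 1)
             for j in range(d)] for i in range(d)]


def principal_components(C, k, iters=1000, seed=0):
    """Top-k eigenpairs of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(C)
    A = [list(row) for row in C]
    components = []
    for _ in range(k):
        v = [rng.uniform(0.1, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * A[i][j] * v[j] for i in range(d) for j in range(d))
        components.append((eigenvalue, v))
        # Remove the found component so the next search converges to the next one.
        A = [[A[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components


def explained_variance(C, components):
    """Fraction of total variance (trace of C) captured by each component."""
    total = sum(C[i][i] for i in range(len(C)))
    return [lam / total for lam, _ in components]


# Invented observations of three variables; the first two are strongly correlated.
X = [[1, 2.0, 0.5], [2, 4.1, 0.4], [3, 5.9, 0.6], [4, 8.2, 0.5], [5, 9.9, 0.45]]
C = covariance_matrix(X)
comps = principal_components(C, 2)
ev = explained_variance(C, comps)
print(ev)
```

Correlated formant measures collapse onto few components in the same way, which is what allows 36 variables to be summarized by a handful of PCs.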


2019, Vol 9 (17), pp. 3558
Author(s): Jinying Yu, Yuchen Gao, Yuxin Wu, Dian Jiao, Chang Su, ...

Non-intrusive load monitoring (NILM) is a core technology for demand response (DR) and energy conservation services. Traditional NILM methods are rarely combined with practical applications, and most studies aim to disaggregate the entire load of a household, which leads to low identification accuracy. In the proposed method, an event detection method is used to obtain the switching event sets of all loads, and the power consumption curves of independent unknown electrical appliances over a period are disaggregated using comprehensive features. A linear discriminant classifier group based on multi-feature global similarity is used for load identification. The novelty of the algorithm lies in its event detector based on steady-state segmentation and its linear discriminant classifier group based on multi-feature global similarity. The simulation is carried out on an open-source data set. The results, benchmarked against state-of-the-art NILM methods, demonstrate the effectiveness and high accuracy of the multi-feature integrated classification (MFIC) algorithm.
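An event detector based on steady-state segmentation can be sketched as follows: split the aggregate power signal into runs of nearly constant power, discard short transient runs, and report a switching event wherever the mean power of consecutive steady runs jumps. The abstract gives no detector parameters, so the thresholds (`tol`, `min_len`, `min_step`), the function names, and the power trace are hypothetical, and the subsequent classifier-group stage is not shown.

```python
def steady_segments(power, tol):
    """Split a power series into runs in which consecutive samples differ by at most tol watts."""
    segments, start = [], 0
    for i in range(1, len(power)):
        if abs(power[i] - power[i - 1]) > tol:
            segments.append((start, i))
            start = i
    segments.append((start, len(power)))
    return segments


def switching_events(power, tol=10.0, min_len=3, min_step=30.0):
    """Report (start index, change in mean power) between consecutive steady-state segments.

    Runs shorter than min_len samples are treated as transients and ignored.
    """
    steady = [(s, e) for s, e in steady_segments(power, tol) if e - s >= min_len]
    events = []
    for (s0, e0), (s1, e1) in zip(steady, steady[1:]):
        before = sum(power[s0:e0]) / (e0 - s0)
        after = sum(power[s1:e1]) / (e1 - s1)
        if abs(after - before) >= min_step:
            events.append((s1, after - before))
    return events


# Hypothetical aggregate power (W): a roughly 1500 W appliance switches on, then off.
trace = [100, 102, 101, 100, 1600, 1601, 1599, 1600, 1602, 100, 101, 99, 100]
print(switching_events(trace))
```

Comparing segment means rather than single samples keeps measurement noise within a steady state from being mistaken for a switching event.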


1992, Vol 35 (1), pp. 88-95
Author(s): John Ryalls, Annie Larouche

Ten normally hearing and 10 age-matched subjects with moderate-to-severe hearing impairment were recorded producing a protocol of 18 basic syllables [/pi/,/pa/,/pu/; /bi/,/ba/,/bu/; /ti/,/ta/,/tu/; /di/,/da/,/du/; /ki/,/ka/,/ku/; /gi/,/ga/,/gu/] repeated five times. The resulting 90 syllables were digitized and measured for (a) total duration; (b) voice-onset time (VOT) of the initial consonant; (c) fundamental frequency (F0) at midpoint of vowel; and (d) formant frequencies (F1, F2, F3), also measured at midpoint of vowel. Statistical comparisons were conducted on (a) average values for each syllable, and (b) standard deviations. Although there were numerical differences between normally hearing and hearing-impaired groups, few differences were statistically significant.
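The abstract does not name the statistical test used, so the Python sketch below assumes, for illustration only, a two-group comparison with Welch's t statistic (which does not assume equal group variances), alongside the mean and standard deviation summaries mentioned above. The VOT values are invented.

```python
import math
from statistics import mean, stdev


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two independent groups."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-speaker mean VOT (ms) for one syllable, invented for illustration.
normal_hearing = [62.0, 70.5, 58.3, 66.1, 73.2]
hearing_impaired = [71.4, 80.2, 65.0, 90.3, 77.8]
print(mean(normal_hearing), stdev(normal_hearing))
print(welch_t(normal_hearing, hearing_impaired))
```

With only 10 speakers per group, the degrees of freedom stay small, so sizable numerical differences can still fall short of significance, consistent with the pattern the abstract reports.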


2013, Vol 709, pp. 827-831
Author(s): Chang Zhi Wei

To recognize stress, a subject was alternately placed in periods of high and low stress by configuring the speed and difficulty of the game Tetris. Respiration (RSP) and electromyogram (EMG) signals at different stress levels were then acquired. After preprocessing, mathematical features were calculated, and stress level was detected automatically with a Fisher linear discriminant classifier. The results show that the average correct detection rate of stress level based on the EMG signal can reach 97.8%, whereas that of the RSP signal is only 86.7%. The EMG signal is more effective than the RSP signal for detecting stress level. Combining multiple physiological signals can effectively improve the correct detection rate.
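A two-class Fisher linear discriminant, as named above, projects each feature vector onto w = Sw^-1 (m1 - m0), where m0 and m1 are the class means and Sw is the within-class scatter matrix, then thresholds the projection. The Python sketch below is illustrative only: the abstract does not list the extracted features, so the two-dimensional inputs are invented, the midpoint threshold is an assumed choice, the function names are hypothetical, and the detection rate is computed on the training windows rather than on held-out data.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def mean_vec(X):
    return [sum(col) / len(X) for col in zip(*X)]


def fisher_lda(X0, X1):
    """Two-class Fisher discriminant: w = Sw^-1 (m1 - m0), threshold at the projected midpoint."""
    m0, m1 = mean_vec(X0), mean_vec(X1)
    d = len(m0)
    Sw = [[0.0] * d for _ in range(d)]
    for X, m in ((X0, m0), (X1, m1)):
        for x in X:
            for i in range(d):
                for j in range(d):
                    Sw[i][j] += (x[i] - m[i]) * (x[j] - m[j])
    w = solve(Sw, [b - a for a, b in zip(m0, m1)])
    proj = lambda m: sum(wi * mi for wi, mi in zip(w, m))
    return w, 0.5 * (proj(m0) + proj(m1))


def classify(x, w, threshold):
    """1 = high stress, 0 = low stress (labels are this sketch's convention)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0


# Invented two-feature windows for low- and high-stress periods.
low = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (0.8, 2.1)]
high = [(3.0, 4.0), (3.4, 4.2), (2.8, 3.7), (3.1, 4.4)]
w, thr = fisher_lda(low, high)
correct = sum(classify(x, w, thr) == 0 for x in low) + sum(classify(x, w, thr) == 1 for x in high)
print(f"training detection rate: {correct / (len(low) + len(high)):.1%}")
```

The same function can be fitted separately to EMG-derived and RSP-derived features, or to a concatenation of both, to compare signals in the way the abstract describes.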


1998, Vol 87 (2), pp. 595-600
Author(s): S. P. Whiteside

This experiment assessed whether fundamental frequency or formant frequencies have more perceptual salience in identifying the sex of the speaker from synthesized vowels. Four sets of ten vowels were synthesized by combining fundamental frequencies and formant frequencies in different permutations. Fifty listeners took part in a listening test. Analysis of the listening test scores suggested that for 36 of the 40 vowels, the fundamental frequency (F0) was probably the most salient perceptual cue. For the remaining four vowels, however, this was not the case, as either the formant frequencies or the onset-offset patterns of F0 appeared to have some perceptual salience.

