Combining temporal and cepstral features for the automatic perceptual categorization of disordered connected speech

2010 ◽ Author(s): Ali Alpan ◽ Jean Schoentgen ◽ Francis Grenez

Author(s): Ali Alpan ◽ Jean Schoentgen ◽ Youri Maryn ◽ Francis Grenez

2019 ◽ Vol 62 (7) ◽ pp. 2099-2117 ◽ Author(s): Jason A. Whitfield ◽ Zoe Kriegel ◽ Adam M. Fullenkamp ◽ Daryush D. Mehta

Purpose: Prior investigations suggest that simultaneous performance of more than one motor-oriented task may exacerbate speech motor deficits in individuals with Parkinson disease (PD). The purpose of the current investigation was to examine the extent to which performing a low-demand manual task affected connected speech in individuals with and without PD.

Method: Individuals with PD and neurologically healthy controls performed speech tasks (reading and extemporaneous speech) and an oscillatory manual task (a counterclockwise circle-drawing task) in isolation (single-task condition) and concurrently (dual-task condition).

Results: Relative to speech-task performance, no changes in speech acoustics were observed for either group when the low-demand motor task was performed with the concurrent reading tasks. Speakers with PD exhibited a significant decrease in pause duration between the single-task (speech-only) and dual-task conditions for the extemporaneous speech task, whereas control participants exhibited no changes in any speech production variable between the single- and dual-task conditions.

Conclusions: Overall, there was little to no change in speech production when a low-demand oscillatory motor task was performed with concurrent reading. For the extemporaneous task, however, individuals with PD exhibited significant changes when the speech and manual tasks were performed concurrently, a pattern that was not observed for control speakers.

Supplemental Material: https://doi.org/10.23641/asha.8637008
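Pause duration in connected speech is commonly derived from a frame-level intensity contour. The following minimal Python sketch is illustrative only, not the authors' analysis pipeline; the frame length, silence threshold, and minimum-pause duration are assumptions chosen for the example:

```python
import numpy as np

def pause_durations(signal, sample_rate, frame_ms=10.0,
                    threshold_db=-40.0, min_pause_ms=150.0):
    """Return durations (seconds) of low-energy stretches longer than min_pause_ms.

    Illustrative energy-threshold approach; threshold_db is relative to the
    loudest frame in the recording.
    """
    frame_len = int(sample_rate * frame_ms / 1000.0)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Frame RMS level in dB relative to the loudest frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12
    level_db = 20.0 * np.log10(rms / rms.max())
    silent = level_db < threshold_db
    # Group consecutive silent frames into candidate pauses.
    pauses, run = [], 0
    for s in silent:
        if s:
            run += 1
        elif run:
            pauses.append(run)
            run = 0
    if run:
        pauses.append(run)
    frame_s = frame_len / sample_rate
    return [p * frame_s for p in pauses if p * frame_s >= min_pause_ms / 1000.0]
```

Summing or averaging the returned durations per condition would give a per-speaker pause measure comparable across single- and dual-task recordings.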


2020 ◽ Vol 29 (2) ◽ pp. 259-264 ◽ Author(s): Hasan K. Saleh ◽ Paula Folkeard ◽ Ewan Macpherson ◽ Susan Scollie

Purpose: The original Connected Speech Test (CST; Cox et al., 1987) is a well-regarded and often utilized speech perception test. The aim of this study was to develop a new version of the CST using a neutral North American accent and to assess this updated CST with participants with normal hearing.

Method: A female English speaker was recruited to read the original CST passages, which were recorded as the new CST stimuli. A study was designed to assess the equivalence of the newly recorded CST passages and to conduct normalization. The study included 19 Western University students (11 female, 8 male) with normal hearing and with English as a first language.

Results: Raw scores for the 48 tested passages were converted to rationalized arcsine units (RAUs), and passages whose average score fell more than 1 RAU standard deviation from the mean were excluded. The internal reliability of the 32 remaining passages was assessed; the two-way random-effects intraclass correlation was .944.

Conclusion: The aim of the study was to create new CST stimuli with a more general North American accent in order to minimize accent effects on speech perception scores. The study resulted in 32 passages of equivalent difficulty for listeners with normal hearing.
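Rationalized arcsine units refer to the standard linearized arcsine transform of proportion-correct scores (Studebaker, 1985). A minimal sketch of that transform, assuming the conventional constants rather than any study-specific variant:

```python
import math

def rationalized_arcsine(correct, total):
    """Map a raw score (items correct out of total) to rationalized arcsine units.

    Linearized arcsine transform: approximately linear in percent correct
    mid-range, stretched near 0% and 100%, which stabilizes score variance.
    """
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    return (146.0 / math.pi) * theta - 23.0
```

Mid-range scores map close to their percent-correct values (e.g., 50 correct out of 100 lands near 50 RAU), while scores near the floor or ceiling are pushed below 0 or above 100, so differences near the extremes are not compressed.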


Author(s): Sarfaraz Jelil ◽ Rohan Kumar Das ◽ S.R. Mahadeva Prasanna ◽ Rohit Sinha

Science ◽ 1968 ◽ Vol 161 (3847) ◽ pp. 1259-1260 ◽ Author(s): C. A. Kelsey ◽ S. J. Ewanowski ◽ T. J. Hixon ◽ F. D. Minifie

Sensors ◽ 2021 ◽ Vol 21 (5) ◽ pp. 1888 ◽ Author(s): Juraj Kacur ◽ Boris Puterka ◽ Jarmila Pavlovicova ◽ Milos Oravec

Many speech emotion recognition systems have been designed using different features and classification methods, yet there is little knowledge or reasoning about the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect accuracy, and to what extent. This study extends the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods, e.g., time characteristics (segmentation, window types, and classification regions: lengths and overlaps), frequency ranges, frequency scales, processing of the whole speech signal (spectrograms), the vocal tract (filter banks, linear prediction coefficient (LPC) modeling) and excitation (inverse LPC filtering) signals, magnitude and phase manipulations, cepstral features, etc. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross-validation, the paired t-test, and rank and Pearson correlations. The results revealed several settings in the 75% accuracy range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0–8 kHz frequency range. Spectrograms carrying both vocal tract and excitation information also scored well. It was found that even basic processing steps such as pre-emphasis, segmentation, and magnitude modifications can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
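As one concrete instance of a psychoacoustic filter bank covering the 0–8 kHz range, the sketch below builds triangular mel-spaced filters. This is a generic construction, not necessarily the exact banks evaluated in the study; the filter count, FFT size, and sample rate are assumptions:

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale mapping (O'Shaughnessy formula).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000,
                   f_min=0.0, f_max=8000.0):
    """Triangular mel-spaced filters applied to an (n_fft // 2 + 1)-bin spectrum."""
    # Edges equally spaced on the mel scale, converted back to Hz.
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    # Map each edge frequency to an FFT bin index.
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):      # rising slope
            fbank[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling slope
            fbank[i, k] = (right - k) / max(right - center, 1)
    return fbank
```

Multiplying a frame's magnitude spectrum by each row and taking the log of the summed energies yields the band energies from which cepstral features (e.g., MFCCs, via a DCT) are typically derived.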

