Articulatory Features
Recently Published Documents

TOTAL DOCUMENTS: 128 (five years: 9)
H-INDEX: 14 (five years: 0)

Author(s): Defne Abur, Austeja Subaciute, Ayoub Daliri, Rosemary A. Lester-Smith, Ashling A. Lupiani, ...

Purpose: Unexpected and sustained manipulations of auditory feedback during speech production result in "reflexive" and "adaptive" responses, which can shed light on feedback and feedforward auditory-motor control processes, respectively. Persons with Parkinson's disease (PwPD) have shown aberrant reflexive and adaptive responses, but the responses appear to differ for control of vocal and articulatory features. However, these responses have not been examined for both voice and articulation in the same speakers, nor with respect to auditory acuity and functional speech outcomes (speech intelligibility and naturalness). Method: Here, 28 PwPD on their typical dopaminergic medication schedule and 28 age-, sex-, and hearing-matched controls completed tasks yielding reflexive and adaptive responses as well as auditory acuity for both vocal and articulatory features. Results: No group differences were found for any measure of auditory-motor control, conflicting with prior findings in PwPD while off medication. Auditory-motor measures were also compared with listener ratings of speech function: first formant frequency acuity was related to speech intelligibility, whereas adaptive responses to vocal fundamental frequency manipulations were related to speech naturalness. Conclusions: These results suggest that auditory-motor processes for both voice and articulatory features are intact in PwPD receiving medication. This work is also the first to suggest associations between measures of auditory-motor control and speech intelligibility and naturalness.
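As a rough illustration of how such responses are quantified, a "reflexive" response to a pitch-shift perturbation is commonly summarized as the deviation of produced fundamental frequency from baseline, in cents, over a post-perturbation window. The trace values below are hypothetical, not from the study:

```python
import math

def cents(f_hz, ref_hz):
    # Deviation of f_hz from ref_hz in cents (1200 cents per octave).
    return 1200.0 * math.log2(f_hz / ref_hz)

# Hypothetical fo trace (Hz): baseline of 220 Hz, then an opposing
# (compensatory) drift after a downward pitch-shift is applied.
baseline_fo = 220.0
post_perturbation = [221.0, 224.0, 226.0, 227.0]

# Mean compensation over the window, in cents.
reflexive_response = sum(cents(f, baseline_fo)
                         for f in post_perturbation) / len(post_perturbation)
```

A positive value here indicates opposing (compensatory) behavior; an adaptive response would instead be measured from productions after the sustained shift is removed.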


2021, Vol. 11 (19), pp. 9056
Author(s): Guolun Sun, Zhihua Huang, Li Wang, Pengyuan Zhang

Articulatory features have proved effective in speech recognition and speech synthesis. However, acquiring articulatory features remains a difficult research problem, so a lightweight and accurate articulatory model is of significant value. In this study, we propose a novel temporal convolutional network-based acoustic-to-articulatory inversion system. The acoustic features are converted into a high-dimensional hidden-space feature map through temporal convolution, with frame-level feature correlations taken into account. Meanwhile, we construct a two-part target function combining the prediction's Root Mean Square Error (RMSE) and the sequences' Pearson Correlation Coefficient (PCC) to jointly optimize the performance of the inversion model from both aspects. We further analyzed the impact of the weight between the two parts on the final performance of the inversion model. Extensive experiments have shown that our temporal convolutional network (TCN) model outperformed the Bidirectional Long Short-Term Memory model by 1.18 mm in RMSE and 0.845 in PCC with 14 model parameters when weighting the RMSE and PCC aspects evenly.
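The two-part target function described above can be sketched as a weighted sum of RMSE and a correlation penalty. This is a minimal NumPy illustration of the idea; the weight `w` stands in for the balance the paper analyzes, and the exact formulation in the paper may differ:

```python
import numpy as np

def joint_loss(pred, target, w=0.5):
    """Weighted combination of RMSE (lower is better) and a PCC penalty
    (1 - PCC, so that maximizing correlation lowers the loss).
    w balances the two terms; w=0.5 weights them evenly."""
    rmse = np.sqrt(np.mean((pred - target) ** 2))
    pcc = np.corrcoef(pred, target)[0, 1]
    return w * rmse + (1.0 - w) * (1.0 - pcc)
```

A perfect prediction drives both terms to zero; a prediction with the right shape but wrong scale is penalized mostly by the RMSE term, which is why optimizing both jointly can help.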


Author(s): Marcia S. Suzuki, M.A.

The term uniskript was coined to refer to a class of phonologically engineered alphabets that employ visual-featural indexicality combined with sound-shape congruency to represent speech. In this working paper, I introduce the uniskript methodology, an alphabet-generator technique that uses indices instead of symbols to represent the flow of speech. I draw on the Peircean theory of signs to explain the crucial semiotic distinction between uniskript and traditional alphabets: in uniskript, an icon resembling relevant articulatory features of a given phoneme is used to index sound to shape. I also indicate how findings in sound symbolism were incorporated into the indices to facilitate cross-modal correspondences. I propose that uniskript's indexical nature and sensorial mappings can explain why it is so much easier to learn than symbolic and sensorily incongruent alphabets. I then briefly discuss how the study of uniskript alphabets can shed light on the role of cross-modal correspondences in learning, and how it can bring a deeper understanding of the role of phonology in sound symbolism. Finally, I describe some applications of uniskript in the teaching of literacy and in remediating reading difficulties, and illustrate the entire concept by introducing a uniskript alphabet developed as a tool to teach pronunciation in an ESL program.

Keywords: uniskript, alphabets, sound symbolism, sound-shape iconicity, cross-modal congruency, phonology, second language learning, pronunciation in L2


Languages, 2021, Vol. 6 (2), pp. 81
Author(s): Eleanor Jorgensen, Jennifer Green, Anastasia Bauer

Spoken languages make up only one aspect of the communicative landscape of Indigenous Australia: sign languages are also an important part of its rich and diverse language ecologies. Australian Indigenous sign languages are predominantly used by hearing people as a replacement for speech in certain cultural contexts. Deaf or hard-of-hearing people are also known to make use of these sign languages. In some circumstances, sign may be used alongside speech, and in others it may replace speech altogether. Alternate sign languages such as those found in Australia occupy a particular place in the diversity of the world's sign languages. However, research on sign language phonology has focused almost exclusively on sign languages used in deaf communities. This paper takes steps towards deepening our understanding of signed language phonology by examining the articulatory features of handshape and body locations in the signing practices of three communities in Central and Northern Australia. We demonstrate that, while Australian Indigenous sign languages have some typologically unusual features, they exhibit the same 'fundamental' structural characteristics as other sign languages.


Author(s): Panying Rong, Lindsey Heidrick

Purpose: This study examined the articulatory control of speech and speechlike tasks in individuals with amyotrophic lateral sclerosis (ALS) and neurologically healthy individuals, with the aim of identifying the most useful set of articulatory features and tasks for assessing bulbar motor involvement in ALS. Method: Tongue and jaw kinematics were recorded in 12 individuals with bulbar ALS and 10 healthy controls during a speech task and two speechlike tasks (i.e., alternating motion rate [AMR], sequential motion rate [SMR]). Eight articulatory features were derived for each participant per task, including the range, maximum speed, and acceleration time of tongue and jaw movements as well as the coupling and timing between tongue and jaw movements. The effects of task (i.e., AMR, SMR, speech) and group (i.e., ALS, control) on these articulatory features were evaluated. For each feature, the task that yielded the largest difference between the ALS and control groups was identified. The diagnostic efficacy of these task-specific features was assessed using receiver operating characteristic analysis; the relation of these task-specific features to a well-established bulbar severity index, speaking rate, was determined using Spearman's rank correlation. Results: Seven task-specific articulatory features were identified, including (a) tongue and jaw acceleration time during the AMR task, (b) tongue–jaw coupling during the SMR task, and (c) range of tongue movement, maximum tongue and jaw speed, and temporal lag between tongue and jaw movements during the speech task. Among these features, tongue and jaw acceleration time and their temporal lag showed relatively high accuracy (i.e., 0.83–0.95) in differentiating individuals with ALS from healthy controls. Range of tongue movement and maximum tongue and jaw speed showed significant correlations with speaking rate.
Conclusion: Findings provide preliminary evidence for the utility of task-specific articulatory measurements as a novel quantitative assessment to detect and predict bulbar motor involvement in ALS.
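Kinematic features of the kind named above can be computed from a movement trace in a few lines. This sketch is illustrative only, with a simplified reading of "acceleration time" as time to peak speed; the paper's exact operational definitions may differ:

```python
import numpy as np

def articulatory_features(pos, fs=100.0):
    """Range, maximum speed, and (simplified) acceleration time from a
    1-D articulator position trace (e.g., tongue marker, mm) sampled at
    fs Hz. Speed is approximated by first differences."""
    speed = np.abs(np.diff(pos)) * fs          # mm/s between samples
    move_range = pos.max() - pos.min()         # mm
    max_speed = speed.max()                    # mm/s
    accel_time = (np.argmax(speed) + 1) / fs   # s until peak speed
    return move_range, max_speed, accel_time
```

Features like these, computed per task (AMR, SMR, speech), are what the group comparisons and ROC analysis above operate on.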


Sensors, 2021, Vol. 21 (4), pp. 1205
Author(s): Mohammed Algabri, Hassan Mathkour, Mansour M. Alsulaiman, Mohamed A. Bencherif

This study proposes using object detection techniques to recognize sequences of articulatory features (AFs) from speech utterances by treating the AFs of phonemes as multi-label objects in a speech spectrogram. The proposed system, called AFD-Obj, recognizes sequences of multi-label AFs in the speech signal and localizes them. AFD-Obj consists of two main stages. First, we formulate AF detection as an object detection problem and prepare the data to meet the requirements of object detectors by generating a spectral three-channel image from the speech signal and creating the corresponding annotation for each utterance. Second, we use the annotated images to train the system to detect sequences of AFs and their boundaries. We test the system by feeding it spectrogram images, from which it recognizes and localizes the multi-label AFs; we also investigated using these AFs to detect the utterance's phonemes. The YOLOv3-tiny detector is selected because of its real-time performance and its support for multi-label detection. We test our AFD-Obj system on Arabic and English using the KAPD and TIMIT corpora, respectively. Additionally, we propose using YOLOv3-tiny as an Arabic phoneme detection system (PD-Obj) to recognize and localize sequences of Arabic phonemes in whole speech utterances. The proposed AFD-Obj and PD-Obj systems achieve excellent results on the Arabic corpus and results comparable to the state-of-the-art method on the English corpus. Moreover, we show that one-scale detection is sufficient for AF detection and phoneme recognition.
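One plausible way to build a three-channel spectral image for a detector is to stack a log-spectrogram with its temporal delta and delta-delta as the three "RGB" channels. The abstract does not specify the exact construction, so treat this as an assumption:

```python
import numpy as np

def three_channel_spectral_image(spec):
    """Stack a log-spectrogram (freq x time) with its first and second
    temporal differences as three channels of a detector input image.
    Prepending the first column keeps all channels the same shape."""
    d1 = np.diff(spec, axis=1, prepend=spec[:, :1])   # delta
    d2 = np.diff(d1, axis=1, prepend=d1[:, :1])       # delta-delta
    return np.stack([spec, d1, d2], axis=-1)          # (freq, time, 3)
```

With AFs annotated as boxes along the time axis of such an image, an off-the-shelf multi-label detector can then localize them exactly as it would objects in a photograph.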


2021, Vol. 11 (1), pp. 102-111
Author(s): T. R. Ryzhikova

Introduction: The Khanty language and its dialects (in particular, the Surgut dialect) are still poorly studied from the point of view of experimental phonetics. Direct digital laryngoscopy is a method of dynamic observation of speech production processes in the larynx, which makes it possible not only to describe the articulatory features of a given sound but also to trace the processes of coarticulation and adaptation. Objective: To analyze the articulatory features of the previously identified intermittent vowels. Research materials: Laryngograms of the intermittent vowels in the pronunciation of native speakers of the Tromagan subdialect of the Surgut dialect of the Khanty language. Results and novelty of the research: The study revealed that the phonation of intermittent vowels can be divided into several phases with corresponding acoustic and articulatory effects. Analysis of the laryngograms allowed the Surgut vowels i, ɵ, e to be classified as injective, i.e., pronounced with a lowered larynx. The vowel a requires further study and verification with the help of somatic and acoustic methods. The data obtained are in good agreement with the works by B. B. Feyer on Ket vocalism.


Complexity, 2020, Vol. 2020, pp. 1-10
Author(s): Guofeng Ren, Guicheng Shao, Jianmei Fu

In recent years, with the development of artificial intelligence (AI) and man-machine interaction technology, speech recognition and production systems have had to keep pace, improving recognition accuracy by adding novel features, fusing features, and improving recognition methods. Aiming at developing a novel recognition feature and applying it to speech recognition, this paper presents a new method for articulatory-to-acoustic conversion. In this study, we converted articulatory features (i.e., velocities of the tongue and motion of the lips) into acoustic features (i.e., the second formant and Mel-cepstra). Considering the graphical representation of the articulators' motion, this study combined Bidirectional Long Short-Term Memory (BiLSTM) with a convolutional neural network (CNN) and adopted the idea of word attention in Mandarin to extract semantic features. We used the electromagnetic articulography (EMA) database designed by Taiyuan University of Technology, which contains 299 Mandarin disyllables and sentences from ten speakers, and extracted 8-dimensional articulatory features and a 1-dimensional semantic feature from the word-attention layer; we then trained on 200 samples and tested on 99 samples for the articulatory-to-acoustic conversion. Finally, Root Mean Square Error (RMSE), Mean Mel-Cepstral Distortion (MMCD), and the correlation coefficient were used to evaluate the conversion and to compare it with a Gaussian Mixture Model (GMM) and a BiLSTM recurrent neural network (BiLSTM-RNN). The results show that the MMCD of the Mel-Frequency Cepstral Coefficients (MFCC) was 1.467 dB, and the RMSE of F2 was 22.10 Hz. These results can be applied in feature fusion and speech recognition to improve recognition accuracy.
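The MMCD figure reported above is conventionally computed with the standard Mel-cepstral distortion formula. A minimal sketch, assuming MFCC matrices shaped frames x coefficients with the 0th (energy) coefficient already excluded; the paper's exact variant may differ:

```python
import numpy as np

def mean_mel_cepstral_distortion(c_ref, c_pred):
    """Mean Mel-cepstral distortion in dB between reference and predicted
    cepstral matrices (frames x coefficients), using the usual
    (10 / ln 10) * sqrt(2 * sum of squared differences) per-frame formula,
    averaged over frames."""
    k = 10.0 / np.log(10.0)
    diff = c_ref - c_pred
    per_frame = k * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return per_frame.mean()
```

Identical sequences give 0 dB, and the score grows with per-coefficient error, which makes it a convenient single number for comparing converters such as the GMM, BiLSTM-RNN, and CNN-BiLSTM models above.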

