Multi-Level Processing in Human Speech Recognition

1989
Author(s):
Peter C. Gordon

2018
Vol 10 (11)
pp. 4615-4624
Author(s):
Shubhanshi Singhal
Vishal Passricha
Pooja Sharma
Rajesh Kumar Aggarwal

Author(s):  
Philippe Morin
Jean-Paul Haton
Jean-Marie Pierrel
Guenther Ruske
Walter Weigel

In the framework of man-machine communication, oral dialogue occupies a particular place, since human speech offers several advantages whether used alone or in multimedia interfaces. The last decade has witnessed a proliferation of research into speech recognition and understanding, but few systems have been designed to manage and understand an actual man-machine dialogue. The PARTNER system described in this paper proposes a solution for task-oriented dialogue using artificial languages. A description of the essential characteristics of dialogue systems is followed by a presentation of the architecture and principles of the PARTNER system. Finally, we present the most recent results obtained in the oral management of electronic mail in French and German.


2020
Vol 287 (1941)
pp. 20202531
Author(s):
Julia Fischer
Franziska Wegdell
Franziska Trede
Federica Dal Pesco
Kurt Hammerschmidt

The extent to which nonhuman primate vocalizations are amenable to modification through experience is relevant for understanding the substrate from which human speech evolved. We examined the vocal behaviour of Guinea baboons, Papio papio, ranging in the Niokolo Koba National Park in Senegal. Guinea baboons live in a multi-level society, with units nested within parties nested within gangs. We investigated whether the acoustic structure of grunts of 27 male baboons from two gangs varied with party/gang membership and genetic relatedness. Males in this species are philopatric, resulting in increased male relatedness within gangs and parties. Grunts of males that belonged to the same social level were more similar than those of males from different social levels (N = 351 dyads for comparisons within and between gangs, and N = 169 dyads within and between parties), but the effect sizes were small. Acoustic similarity did not, however, correlate with genetic relatedness, suggesting that higher rates of social interaction, rather than genetic relatedness, promote the observed vocal convergence. We consider this convergence a result of sensory–motor integration and suggest that it constitutes an implicit form of vocal learning shared with humans, in contrast to the goal-directed and intentional explicit form of vocal learning unique to human speech acquisition.
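The dyadic design described above lends itself to a matrix-correlation check. The sketch below is purely illustrative, not the authors' analysis pipeline (the abstract does not specify their statistical models): it assumes two symmetric per-dyad matrices, acoustic_sim and relatedness, and runs a Mantel-style permutation test of their association in Python.

import numpy as np

def mantel_test(acoustic_sim, relatedness, n_perm=10000, seed=0):
    """Correlate the upper triangles of two symmetric dyad matrices,
    assessing significance by jointly permuting rows and columns."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(acoustic_sim, k=1)  # one entry per dyad
    observed = np.corrcoef(acoustic_sim[iu], relatedness[iu])[0, 1]
    n = acoustic_sim.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)                # shuffle individual identities
        shuffled = relatedness[np.ix_(p, p)]  # preserve matrix structure
        r = np.corrcoef(acoustic_sim[iu], shuffled[iu])[0, 1]
        if abs(r) >= abs(observed):
            hits += 1
    return observed, hits / n_perm  # correlation and two-sided p-value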


Author(s):  
Chu-Xiong Qin ◽  
Wen-Lin Zhang ◽  
Dan Qu

Abstract
A method called joint connectionist temporal classification (CTC)-attention-based speech recognition has recently received increasing attention and has achieved impressive performance. In this hybrid end-to-end architecture, an auxiliary CTC loss added to the attention-based model imposes extra restrictions on the alignments. To explore end-to-end models further, we propose improvements to the feature extraction and the attention mechanism. First, we introduce a joint model trained with high-level features derived from nonnegative matrix factorization (NMF). Then, we put forward a hybrid attention mechanism that incorporates multi-head attention and computes attention scores over multi-level outputs. Experiments on TIMIT indicate that the new method achieves state-of-the-art performance with our best model. Experiments on WSJ show that our method yields a word error rate (WER) only 0.2% worse in absolute terms than the best reference method, which is trained on a much larger dataset, and that it outperforms all existing end-to-end methods. Further experiments on LibriSpeech show that our method is also comparable in WER to the state-of-the-art end-to-end system.
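As a concrete illustration of the hybrid objective this abstract describes, the following PyTorch sketch interpolates a CTC loss with the attention decoder's cross-entropy loss. The class name, tensor shapes, and the interpolation weight (ctc_weight = 0.3) are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class JointCTCAttentionLoss(nn.Module):
    """Interpolates a CTC loss with the attention decoder's
    cross-entropy loss, as in hybrid CTC-attention training."""
    def __init__(self, blank_id, pad_id, ctc_weight=0.3):
        super().__init__()
        self.ctc_weight = ctc_weight  # assumed value; tuned per dataset
        self.ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)
        self.ce = nn.CrossEntropyLoss(ignore_index=pad_id)

    def forward(self, ctc_log_probs, decoder_logits, targets,
                input_lengths, target_lengths):
        # ctc_log_probs: (T, B, V) log-softmax outputs of the encoder head
        # decoder_logits: (B, L, V) attention-decoder outputs
        # targets: (B, L) padded label sequences
        loss_ctc = self.ctc(ctc_log_probs, targets,
                            input_lengths, target_lengths)
        loss_att = self.ce(decoder_logits.transpose(1, 2), targets)
        # The CTC term constrains the attention alignments to be roughly
        # monotonic; ctc_weight trades off the two objectives.
        return self.ctc_weight * loss_ctc + (1.0 - self.ctc_weight) * loss_att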

