multimodal dialogue
Recently Published Documents

Total documents: 117 (five years: 7)
H-index: 10 (five years: 0)

2021 ◽  
Vol 29 ◽  
pp. 100491
Author(s):  
Rotem Abdu ◽  
Gitte van Helden ◽  
Rosa Alberto ◽  
Arthur Bakker

2021 ◽  
Author(s):  
Hardik Kothare ◽  
Vikram Ramanarayanan ◽  
Oliver Roesler ◽  
Michael Neumann ◽  
Jackson Liscombe ◽  
...  

We explore the utility of an on-demand multimodal conversational platform for extracting speech and facial metrics in children with Autism Spectrum Disorder (ASD). We investigate the extent to which these metrics correlate with objective clinical measures, particularly as they pertain to the interplay between the affective, phonatory, and motoric subsystems. Twenty-two participants diagnosed with ASD engaged with a virtual agent in conversational affect-production tasks designed to elicit facial and vocal affect. We found significant correlations between the vocal pitch and loudness extracted by our platform during these tasks and accuracy in the recognition of facial and vocal affect, assessed via the Diagnostic Analysis of Nonverbal Accuracy-2 (DANVA-2) neuropsychological task. We also found significant correlations between jaw kinematic metrics extracted using our platform and the motor speed of the dominant hand, assessed via a standardised neuropsychological finger-tapping task. These findings offer preliminary evidence for the usefulness of these audiovisual analytic metrics and could help us better model the interplay between different physiological subsystems in individuals with ASD.
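The core analysis described above is a correlation between platform-extracted metrics and clinical scores. A minimal sketch of that kind of computation, using a hand-rolled Pearson coefficient and entirely made-up per-participant values (the study's real metrics, scores, and sample differ):

```python
# Illustrative only: correlating a platform-extracted vocal metric with a
# clinical score. All numbers below are fabricated for demonstration.

import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-participant values: mean vocal pitch (Hz) during the
# affect-production task, and DANVA-2 vocal-affect recognition accuracy (%).
pitch = [210.0, 195.5, 230.2, 188.7, 205.1, 220.9]
danva = [62.0, 55.0, 71.0, 50.0, 60.0, 68.0]

print(round(pearson_r(pitch, danva), 3))
```

In practice one would also report a p-value (e.g. via `scipy.stats.pearsonr`) and correct for multiple comparisons across metrics.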


2021 ◽  
Vol 12 (2) ◽  
pp. 1-33
Author(s):  
Mauajama Firdaus ◽  
Nidhi Thakur ◽  
Asif Ekbal

Multimodality in dialogue systems has opened up new frontiers for the creation of robust conversational agents. Any multimodal system aims to bridge the gap between language and vision by leveraging diverse and often complementary information from image, audio, and video, as well as text. For every task-oriented dialogue system, different aspects of the product or service are crucial for satisfying the user's demands, and the user selects a product or service based on these aspects. The ability to generate responses with the specified aspects in a goal-oriented dialogue setup facilitates user satisfaction by fulfilling the user's goals. Therefore, in our current work, we propose the task of aspect-controlled response generation in a multimodal task-oriented dialogue system. We employ a multimodal hierarchical memory network for generating responses that utilise information from both text and images. As there was no readily available data for building such multimodal systems, we create a Multi-Domain Multi-Modal Dialog (MDMMD++) dataset. The dataset comprises conversations containing both text and images across four domains: hotels, restaurants, electronics, and furniture. Quantitative and qualitative analysis on the newly created MDMMD++ dataset shows that the proposed methodology outperforms the baseline models on the proposed task of aspect-controlled response generation.
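The control idea in the abstract, restricting generation to a requested aspect, can be illustrated with a deliberately tiny retrieval-style sketch. This is not the paper's hierarchical memory network; the candidate pool, aspects, and overlap scorer are all hypothetical stand-ins for a learned model:

```python
# Toy sketch of aspect-controlled response selection: candidates are first
# filtered by the requested aspect, then scored for relevance to the context.
# Everything here (candidates, aspects, scorer) is illustrative.

from collections import Counter

CANDIDATES = [
    {"aspect": "price",    "text": "Rooms at this hotel start from $80 per night."},
    {"aspect": "location", "text": "The hotel is two blocks from the city centre."},
    {"aspect": "price",    "text": "The restaurant has a fixed-price lunch menu."},
]

def overlap_score(context, response):
    """Crude relevance score: bag-of-words overlap between context and candidate."""
    ctx = Counter(context.lower().split())
    resp = Counter(response.lower().split())
    return sum((ctx & resp).values())

def respond(context, aspect):
    """Return the best-scoring candidate restricted to the requested aspect."""
    pool = [c for c in CANDIDATES if c["aspect"] == aspect]
    return max(pool, key=lambda c: overlap_score(context, c["text"]))["text"]

print(respond("How much do rooms at the hotel cost", "price"))
```

A neural version replaces the hard filter with an aspect embedding that conditions the decoder, but the contract is the same: the aspect constrains which responses are admissible.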


Author(s):  
Ana Abril Hernández

The comic form of art has witnessed a dramatic increase in the number of readers who choose this medium to engage with meaning-making processes in multimodal texts. Little has been said so far, however, about the adaptation of certain literary genres to the comic form. This is the case of poetry, narrative poetry in particular, illustrated in this study by the celebrated ballad "La belle dame sans merci" (1819) by John Keats and the modern sonnet "The singing-woman from the wood's edge" (1920) by the American feminist poet Edna St. Vincent Millay. Drawing on two recent comic adaptations of these poems, the present investigation takes a comparative approach to the semiotic processes at stake in representing women, both from the poets' own points of view and from those of their respective graphic artists. The aim is to trace changes in the depiction of women in poetry, from the Romantic image of women, through the view of women in the early twentieth century, to the present day.


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0241271
Author(s):  
Mauajama Firdaus ◽  
Arunav Pratap Shandeelya ◽  
Asif Ekbal

Multimodal dialogue systems, due to their many-fold applications, have gained much attention from researchers and developers in recent times. With the release of the large-scale multimodal dialogue dataset of Saha et al. (2018) in the fashion domain, it has become possible to investigate dialogue systems having both textual and visual modalities. Response generation is an essential aspect of every dialogue system, and making the responses diverse is an important problem. For any goal-oriented conversational agent, the system's responses must be informative, diverse, and polite, which may lead to better user experiences. In this paper, we propose an end-to-end neural framework for generating varied responses in a multimodal dialogue setup, capturing information from both the text and the image. A multimodal encoder with co-attention between the text and image is used to focus on the different modalities and obtain better contextual information. For effective information sharing across the modalities, we combine the information of text and images using the BLOCK fusion technique, which helps in learning an improved multimodal representation. We employ stochastic beam search with the Gumbel-Top-k trick to achieve diversified responses while preserving the content and politeness of the responses. Experimental results show that our proposed approach performs significantly better than the existing and baseline methods in terms of distinct metrics, and thereby generates more diverse responses that are informative, interesting, and polite without any loss of information. Empirical evaluation also reveals that images, when used along with the text, improve the efficiency of the model in generating diversified responses.
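The Gumbel-Top-k trick mentioned above has a compact core: perturbing log-probabilities with i.i.d. Gumbel(0, 1) noise and taking the k largest indices is equivalent to sampling k items without replacement from the softmax distribution. A self-contained sketch on a toy vocabulary (the vocabulary and probabilities are illustrative, not from the paper):

```python
# Gumbel-Top-k sampling: add Gumbel noise to log-probs, take the top k.
# The result is a sample of k distinct indices without replacement.

import math
import random

def gumbel_top_k(log_probs, k, rng=random):
    """Sample k distinct indices without replacement via perturbed log-probs."""
    perturbed = [
        lp - math.log(-math.log(rng.random()))  # lp + Gumbel(0, 1) noise
        for lp in log_probs
    ]
    # Indices of the k largest perturbed scores.
    return sorted(range(len(log_probs)), key=lambda i: -perturbed[i])[:k]

rng = random.Random(0)
vocab = ["yes", "sure", "certainly", "maybe", "no"]
log_probs = [math.log(p) for p in [0.4, 0.3, 0.15, 0.1, 0.05]]
picks = gumbel_top_k(log_probs, 3, rng)
print([vocab[i] for i in picks])
```

In stochastic beam search this perturbation is applied consistently along partial hypotheses, so the k beams form a sample of whole sequences without replacement rather than k copies of the mode.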

