Aspect-Aware Response Generation for Multimodal Dialogue System

2021 ◽  
Vol 12 (2) ◽  
pp. 1-33
Author(s):  
Mauajama Firdaus ◽  
Nidhi Thakur ◽  
Asif Ekbal

Multimodality in dialogue systems has opened up new frontiers for the creation of robust conversational agents. Any multimodal system aims at bridging the gap between language and vision by leveraging diverse and often complementary information from image, audio, and video, as well as text. In every task-oriented dialogue system, different aspects of the product or service are crucial for satisfying the user's demands, and the user decides whether to select a product or service based on these aspects. The ability to generate responses with the specified aspects in a goal-oriented dialogue setup facilitates user satisfaction by fulfilling the user's goals. Therefore, in our current work, we propose the task of aspect-controlled response generation in a multimodal task-oriented dialogue system. We employ a multimodal hierarchical memory network for generating responses, utilizing information from both text and images. As no data was readily available for building such multimodal systems, we create a Multi-Domain Multi-Modal Dialogue (MDMMD++) dataset. The dataset comprises conversations containing both text and images, belonging to four different domains: hotels, restaurants, electronics, and furniture. Quantitative and qualitative analysis on the newly created MDMMD++ dataset shows that the proposed methodology outperforms the baseline models on the proposed task of aspect-controlled response generation.
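
As a rough illustration of the hierarchical multimodal encoding described above, the sketch below encodes each turn's text with a GRU, projects precomputed image features into the same space, and runs a dialogue-level GRU over the fused turn representations. All module names and dimensions are assumptions for illustration; the authors' memory network and aspect conditioning are not reproduced here.

```python
# A minimal sketch of a multimodal hierarchical encoder, not the
# authors' exact architecture: utterance- and image-level features
# per turn feed a dialogue-level GRU whose states can serve as the
# memory a decoder attends over.
import torch
import torch.nn as nn

class MultimodalHierarchicalEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.utt_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hid_dim)   # project CNN features
        self.ctx_rnn = nn.GRU(2 * hid_dim, hid_dim, batch_first=True)

    def forward(self, turns, images):
        # turns:  (batch, n_turns, seq_len) token ids
        # images: (batch, n_turns, img_dim) precomputed image features
        b, t, s = turns.shape
        _, utt_h = self.utt_rnn(self.embed(turns.view(b * t, s)))
        utt_h = utt_h[-1].view(b, t, -1)              # (b, t, hid) per turn
        img_h = torch.relu(self.img_proj(images))     # (b, t, hid)
        ctx_in = torch.cat([utt_h, img_h], dim=-1)    # fuse text + image
        ctx_out, _ = self.ctx_rnn(ctx_in)             # dialogue-level states
        return ctx_out                                # memory for the decoder
```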

PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0241271
Author(s):  
Mauajama Firdaus ◽  
Arunav Pratap Shandeelya ◽  
Asif Ekbal

Multimodal dialogue systems, owing to their manifold applications, have gained much attention from researchers and developers in recent times. With the release of the large-scale multimodal dialogue dataset of Saha et al. (2018) in the fashion domain, it has become possible to investigate dialogue systems having both textual and visual modalities. Response generation is an essential aspect of every dialogue system, and making the responses diverse is an important problem. For any goal-oriented conversational agent, the system's responses must be informative, diverse, and polite, as these qualities lead to better user experiences. In this paper, we propose an end-to-end neural framework for generating varied responses in a multimodal dialogue setup, capturing information from both the text and the image. A multimodal encoder with co-attention between text and image is used to focus on the different modalities and obtain better contextual information. For effective information sharing across the modalities, we combine the information of text and images using the BLOCK fusion technique, which helps in learning an improved multimodal representation. We employ stochastic beam search with the Gumbel top-k trick to achieve diversified responses while preserving the content and politeness of the responses. Experimental results show that our proposed approach performs significantly better than the existing and baseline methods in terms of distinct metrics, thereby generating more diverse responses that are informative, interesting, and polite without any loss of information. Empirical evaluation also reveals that images, when used along with the text, improve the model's ability to generate diversified responses.
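
Stochastic beam search rests on the Gumbel top-k trick: perturbing log-probabilities with i.i.d. Gumbel noise and keeping the k largest yields a sample of k items without replacement. A minimal sketch (the toy vocabulary size and shapes are assumptions):

```python
# Gumbel top-k: add Gumbel(0, 1) noise to log-probabilities and take
# the top-k indices to sample k candidates without replacement.
import torch

def gumbel_top_k(log_probs: torch.Tensor, k: int):
    # Gumbel(0, 1) noise: -log(-log(U)), U ~ Uniform(0, 1).
    u = torch.rand_like(log_probs).clamp_min(1e-9)
    perturbed = log_probs - torch.log(-torch.log(u))
    # The k largest perturbed scores form a sample without replacement.
    return perturbed.topk(k).indices

log_probs = torch.log_softmax(torch.randn(10_000), dim=-1)  # toy vocab
beams = gumbel_top_k(log_probs, k=5)  # 5 distinct candidate expansions
```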


Author(s):  
Markus Löckelt

This chapter describes a selection of experiences from designing and implementing virtual conversational characters for multimodal dialogue systems. It uses examples from the large interactive narrative VirtualHuman and some related systems of the task-oriented variety. The aim is not to give a comprehensive overview of any one system, but rather to identify and describe issues that might also be relevant to the designer of a new system, to show how they can be addressed, and to note which problems remain unresolved for future work. Besides giving an overview of how characters for interactive narrative systems can be built at the implementation level, the focus is on what the knowledge base for virtual characters should contain, and how it should be organized to provide a convincing interaction with one or multiple characters.


Author(s):  
Beatriz López Mencía ◽  
David D. Pardo ◽  
Alvaro Hernández Trapote ◽  
Luis A. Hernández Gómez

One of the major challenges for dialogue systems deployed in commercial applications is to improve robustness when common low-level problems related to speech recognition occur. We first discuss this important family of interaction problems, and then the features of non-verbal, visual communication that Embodied Conversational Agents (ECAs) bring 'into the picture', which may be tapped to improve spoken dialogue robustness and the general smoothness and efficiency of the interaction between the human and the machine. Our approach is centred around the information provided by ECAs. We deal with all stages of the conversational system development process, from scenario description to gesture design and evaluation with comparative user tests. We conclude that ECAs can help improve the robustness of, as well as the users' subjective experience with, a dialogue system. However, they may also make users more demanding and intensify privacy and security concerns.


2020 ◽  
Vol 12 (4) ◽  
pp. 1016-1046
Author(s):  
Hanif Fakhrurroja ◽  
Carmadi Machbub ◽  
Ary Setijadi Prihatmanto ◽  
Ayu Purwarianti ◽  
...  

Studies on human-machine interaction systems show positive results regarding system development accuracy. However, problems remain, especially when using certain input modalities such as speech, gesture, face detection, and skeleton tracking. These problems include how to design an interface through which a machine can contextualize ongoing conversations, how to activate the system via various modalities, how to choose the right multimodal fusion method, how the machine can understand human intentions, and how to develop its knowledge. This study developed a human-machine interaction method involving several stages: a multimodal activation system; methods for recognizing speech, gestures, faces, and skeleton tracking; multimodal fusion strategies; understanding human intent in an Indonesian dialogue system; and methods for developing machine knowledge and producing the right response. The research contributes an easier and more natural human-machine interaction system based on multimodal fusion. The average accuracy rates of multimodal activation, the Indonesian dialogue system test, gesture recognition interaction, and multimodal fusion are 87.42%, 92.11%, 93.54%, and 93%, respectively. User satisfaction with the developed multimodal recognition-based human-machine interaction system was 95%. According to 76.2% of users, the interaction system was natural, while 79.4% agreed that the machine responded well to their wishes.
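
As an illustration of decision-level multimodal fusion (a sketch of one common strategy, not necessarily the paper's exact method), each recognizer reports a confidence per intent and a weighted sum selects the response:

```python
# Decision-level fusion sketch: combine per-modality intent confidences
# with fixed weights and pick the highest-scoring intent. The intents,
# scores, and weights below are illustrative assumptions.
from collections import defaultdict

def fuse(modality_scores, weights):
    # modality_scores: {"speech": {"greet": 0.9, ...}, "gesture": {...}}
    fused = defaultdict(float)
    for modality, scores in modality_scores.items():
        for intent, p in scores.items():
            fused[intent] += weights[modality] * p
    return max(fused, key=fused.get)

scores = {"speech":  {"greet": 0.80, "stop": 0.20},
          "gesture": {"greet": 0.60, "stop": 0.40}}
print(fuse(scores, weights={"speech": 0.7, "gesture": 0.3}))  # -> "greet"
```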


Research ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Yangyang Zhou ◽  
Fuji Ren

The dialogue system has always been one of the important topics in the domain of artificial intelligence. So far, most mature dialogue systems are task-oriented, while non-task-oriented dialogue systems still have much room for improvement. We propose a data-driven non-task-oriented dialogue generator, "CERG", based on neural networks. The model has emotion recognition capability and can generate corresponding responses. The dataset we adopt comes from the NTCIR-14 STC-3 CECG subtask, which contains more than 1.7 million Chinese Weibo post-response pairs and six emotion categories. We concatenate the post and the response with the emotion, then mask the response part of the input text character by character to emulate the encoder-decoder framework. We use improved transformer blocks as the core of the model and add regularization methods to alleviate the problems of overcorrection and exposure bias. We introduce a retrieval method into the inference process to improve the semantic relevance of generated responses. The results of the manual evaluation show that our proposed model can produce different responses for different emotions, improving the human-computer interaction experience. The model can be applied in many domains, such as automatic reply bots for social applications.
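
A minimal sketch of the input construction described above, under the assumption that "masking character by character" means progressively revealing the response while masking the remainder; the [MASK]/[SEP] tokens and the toy example are illustrative, not the paper's exact scheme:

```python
# Build (input, target) pairs: post + emotion form the prefix, and the
# response is revealed one character at a time with the rest masked.
MASK, SEP = "[MASK]", "[SEP]"

def build_training_inputs(post, emotion, response):
    prefix = list(post) + [SEP, emotion, SEP]
    pairs = []
    for i in range(len(response)):
        seen = list(response[:i])              # already-generated characters
        masked = [MASK] * (len(response) - i)  # characters still to predict
        pairs.append((prefix + seen + masked, response[i]))
    return pairs  # (input sequence, next target character) pairs

for seq, target in build_training_inputs("好久不见", "happiness", "想你了"):
    print(seq, "->", target)
```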


Author(s):  
Tomohiro Yoshikawa ◽  
Ryosuke Iwakura

Studies on automatic dialogue systems, which allow people and computers to communicate with each other using natural language, have been attracting attention. In particular, the main objective of a non-task-oriented dialogue system is not to achieve a specific task but to amuse users through chat and free dialogue. For this type of dialogue system, continuity of the dialogue is important, because users easily get tired if the dialogue is monotonous. On the other hand, preceding studies have shown that speech with humorous expressions is effective in improving the continuity of a dialogue. In this study, we developed a computer-based humor discriminator to perform user- and situation-independent objective discrimination of humor. Using the humor discriminator, we also developed an automatic humor generation system and conducted an evaluation experiment with human subjects to test the generated jokes. A t-test on the evaluation scores revealed a significant difference (p = 3.5 × 10⁻⁵) between the proposed and existing methods of joke generation.
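
For reference, the reported significance test is an ordinary two-sample t-test over human evaluation scores; a minimal sketch with toy ratings (not the study's data):

```python
# Independent two-sample t-test comparing human ratings of jokes from
# a proposed and an existing generator. The scores are illustrative.
from scipy import stats

proposed = [4.1, 3.8, 4.5, 4.0, 3.9, 4.3]   # toy rating scores
existing = [3.0, 2.8, 3.2, 2.9, 3.1, 2.7]
t, p = stats.ttest_ind(proposed, existing)
print(f"t = {t:.2f}, p = {p:.1e}")  # p below 0.05 -> significant difference
```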


Author(s):  
Khaldoon H. Alhussayni ◽  
Alexander Zamyatin ◽  
S. Eman Alshamery

Dialogue state tracking (DST) plays a critical role in the life cycle of a task-oriented dialogue system. DST represents the user's goals at each step of the dialogue, describing them as a conceptual structure comprising slot-value pairs and dialogue acts that directly determine the performance and effectiveness of the dialogue system. DST faces several challenges: linguistic diversity, dynamic social context, and the distribution of the dialogue state over candidate values, both in the slot values and in the dialogue acts defined in the ontology. In many turns of a dialogue, users refer indirectly to previous utterances, which makes it difficult to identify and use the relevant dialogue history; recent popular methods are ineffective at this. In this paper, we propose a dialogue-history self-attention framework for DST that recognizes the relevant historical context by including previous user utterances, alongside the current user utterance and previous system actions in which specific slot-value pairs vary, and uses these together with a weighted system utterance, outperforming existing models at recognizing the related context and the relevance of a system utterance. The proposed model was evaluated on the WoZ dataset. The implementation was attempted first with the prior user utterance as a dialogue encoder, and second with an additional score combined over all the candidate slot-value pairs in the context of the previous and current user utterances. The proposed model obtained 0.8 percent better results than all state-of-the-art methods in joint goal accuracy, although the gain does not carry over to the turn request metric.
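
A minimal sketch of scoring a candidate slot-value pair with attention over the encoded dialogue history; the architecture, names, and dimensions below are assumptions rather than the paper's exact framework:

```python
# Score a candidate slot-value pair (e.g. "food=thai") against encoded
# dialogue history (previous user utterances, previous system actions,
# current utterance) via multi-head attention.
import torch
import torch.nn as nn

class HistoryAttentionScorer(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, history, candidate):
        # history:   (batch, n_utterances, dim) encoded history turns
        # candidate: (batch, dim) encoded slot-value pair
        query = candidate.unsqueeze(1)               # (b, 1, dim)
        ctx, _ = self.attn(query, history, history)  # attend to history
        return self.score(ctx.squeeze(1))            # relevance logit
```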

