crossmodal attention
Recently Published Documents

TOTAL DOCUMENTS: 31 (FIVE YEARS: 3)
H-INDEX: 9 (FIVE YEARS: 0)

Author(s): Haiyan Li, Dezhi Han

Visual Question Answering (VQA) is a multimodal task at the intersection of Computer Vision (CV) and Natural Language Processing (NLP). The core of VQA is to extract useful information from both the image and the question and to produce an accurate answer. This paper presents a VQA model based on multimodal encoders and decoders with gate attention (MEDGA). Each encoder and decoder block in MEDGA applies not only self-attention and crossmodal attention but also gate attention, so that the model can capture inter-modal and intra-modal interactions simultaneously within the visual and language modalities. MEDGA further uses gate attention to filter out noise irrelevant to the answer, finally outputting attention results that are closely tied to the visual and language features, which makes answer prediction more accurate. Experimental evaluations on the VQA 2.0 dataset, together with ablation experiments under different conditions, demonstrate the effectiveness of MEDGA. On the test-std split, MEDGA reaches 70.11% accuracy, exceeding many existing methods.
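The abstract does not give MEDGA's exact equations, but the general idea of gating an attention output can be sketched as follows: a minimal NumPy illustration, assuming (hypothetically) that the gate is a sigmoid function of the query that rescales each dimension of the attended output. The function and parameter names (`gated_attention`, `w_gate`, `b_gate`) are illustrative, not the paper's.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(query, key, value, w_gate, b_gate):
    """Scaled dot-product attention followed by a sigmoid gate.

    The gate (a sigmoid over a linear map of the query) rescales each
    output dimension toward zero, suppressing attended features judged
    irrelevant -- the gist of 'filtering noise via gate attention'.
    """
    d_k = query.shape[-1]
    scores = query @ key.T / np.sqrt(d_k)          # (n_q, n_k)
    attended = softmax(scores, axis=-1) @ value    # (n_q, d_v)
    gate = 1.0 / (1.0 + np.exp(-(query @ w_gate + b_gate)))  # in (0, 1)
    return gate * attended                         # element-wise gating

# Toy example: 2 query tokens attending over 3 key/value tokens.
rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4))
k = rng.standard_normal((3, 4))
v = rng.standard_normal((3, 4))
w = rng.standard_normal((4, 4))
b = np.zeros(4)
out = gated_attention(q, k, v, w, b)
print(out.shape)  # (2, 4)
```

For crossmodal attention, the queries would come from one modality (e.g. language) and the keys/values from the other (e.g. vision); for self-attention, all three come from the same modality.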


2020
Author(s): Charles Spence, Salvador Soto-Faraco
Keyword(s):

2020
Author(s): Guochun Yang, Di Fu, Li Zhenghan, Haiyan Wu, Honghui Xu, ...

Multisensory integration and crossmodal attention are two basic mechanisms for processing multisensory inputs, and they are often intermixed. Whether these two processes are dependent or independent remains controversial. To examine the relationship between multisensory integration and crossmodal attention, we adopted modified multilevel audiovisual gender-judgment paradigms and evaluated congruency effects in reaction time (RT) and inverse effectiveness (IE) effects. If the two processes were dependent, the occurrence of one effect would be accompanied by the other. Using both morphed faces and voices, we first ran a speeded classification task in which participants attended either to faces (experiment 1a) or to voices (experiment 1b); we then ran an unspeeded rating task with faces as targets (experiment 2). We observed both a congruency effect in RT and an IE effect in experiment 1a, a congruency effect in RT alone in experiment 1b, and an IE effect alone in experiment 2. Because the two effects dissociate across tasks, these results indicate that the two processes are independent of each other.
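The two effects measured above are simple contrasts, which can be made concrete with a short sketch. All numbers below are hypothetical, for illustration only; they are not the study's data.

```python
# Illustrative summary statistics from an audiovisual gender-judgment
# task (all values are made up for demonstration).
rt_congruent = 520.0    # mean RT (ms) when face and voice gender agree
rt_incongruent = 575.0  # mean RT (ms) when face and voice gender conflict

# Congruency effect: responses slow down when the modalities conflict.
congruency_effect = rt_incongruent - rt_congruent  # 55.0 ms

# Accuracy for unimodal vs. bimodal presentation at two signal strengths.
acc_unimodal = {"weak": 0.60, "strong": 0.85}
acc_bimodal = {"weak": 0.80, "strong": 0.92}

# Multisensory gain over the unimodal baseline at each strength.
gain = {lvl: acc_bimodal[lvl] - acc_unimodal[lvl] for lvl in acc_unimodal}

# Inverse effectiveness (IE): the multisensory gain is larger when the
# unimodal signal is weaker.
ie_present = gain["weak"] > gain["strong"]
print(congruency_effect, ie_present)
```

The dependence test in the abstract then reduces to checking whether these two booleans (congruency effect present, IE effect present) always co-occur across experiments; the reported dissociation is what supports independence.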


2015, Vol 2 (10), pp. 150324
Author(s): Vivek Nityananda, Lars Chittka

Attentional demands can prevent humans and other animals from performing multiple tasks simultaneously. Some studies, however, show that tasks presented in different sensory modalities (e.g. visual and auditory) can be processed simultaneously. This suggests that, at least in these cases, attention might be modality-specific and divided differently between tasks presented in the same modality than between tasks presented in different modalities. We investigated this possibility in bumblebees (Bombus terrestris) using a biologically relevant experimental set-up in which they had to simultaneously choose more rewarding flowers and avoid simulated predatory attacks by robotic 'spiders'. We found that when both tasks had to be performed using visual cues alone, bees failed to perform them simultaneously. However, when highly rewarding flowers were indicated by olfactory cues and predators were indicated by visual cues, bees managed to perform both tasks successfully. Our results thus provide evidence for modality-specific attention in foraging bees and establish a novel framework for future studies of crossmodal attention in ecologically realistic settings.


2015, Vol 155, pp. 67-76
Author(s): Magali Kreutzfeldt, Denise N. Stephan, Walter Sturm, Klaus Willmes, Iring Koch
