Group Emotion Recognition Based on Global and Local Features

Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition

Sensors ◽

10.3390/s20195559 ◽

2020 ◽

Vol 20 (19) ◽

pp. 5559

Author(s):

Minji Seo ◽

Myungho Kim

Keyword(s):

Visual Attention ◽

Emotion Recognition ◽

Expressed Emotion ◽

Local Features ◽

Speech Emotion Recognition ◽

Bag Of Visual Words ◽

Emotional Speech ◽

Visual Words ◽

Performance Reduction ◽

Global And Local

Speech emotion recognition (SER) classifies emotions using low-level features or a spectrogram of an utterance. When SER methods are trained and tested on different datasets, their performance drops. Cross-corpus SER research identifies speech emotion across different corpora and languages, and recent work has aimed to improve generalization. To improve cross-corpus SER performance, we pretrained our visual attention convolutional neural network (VACNN), a 2D CNN base model with channel- and spatial-wise visual attention modules, on the log-mel spectrograms of the source dataset. To train on the target dataset, we extracted a feature vector using a bag of visual words (BOVW) to assist the fine-tuned model. Because visual words represent local features in the image, the BOVW helps the VACNN learn global and local features in the log-mel spectrogram by constructing a frequency histogram of visual words. The proposed method achieves overall accuracies of 83.33%, 86.92%, and 75.00% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional Speech (EmoDB), and Surrey Audio-Visual Expressed Emotion (SAVEE), respectively. These results on RAVDESS, EmoDB, and SAVEE are improvements of 7.73%, 15.12%, and 2.34%, respectively, over existing state-of-the-art cross-corpus SER approaches.
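The "frequency histogram of visual words" mentioned in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' pipeline: raw image patches stand in for local descriptors, a plain k-means codebook stands in for the visual vocabulary, and a random array stands in for a log-mel spectrogram.

```python
import numpy as np

def extract_patches(image, size=4, stride=4):
    """Flatten size x size patches into descriptor rows (assumed descriptor type)."""
    h, w = image.shape
    patches = [
        image[r:r + size, c:c + size].ravel()
        for r in range(0, h - size + 1, stride)
        for c in range(0, w - size + 1, stride)
    ]
    return np.asarray(patches)

def nearest_word(descriptors, codebook):
    """Index of the closest codebook entry (visual word) for every descriptor."""
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def fit_codebook(descriptors, n_words=8, n_iter=10, seed=0):
    """Plain k-means (Lloyd's algorithm) used here to learn the visual words."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(descriptors), size=n_words, replace=False)
    codebook = descriptors[idx].copy()
    for _ in range(n_iter):
        labels = nearest_word(descriptors, codebook)
        for k in range(n_words):
            members = descriptors[labels == k]
            if len(members):  # leave empty clusters where they are
                codebook[k] = members.mean(axis=0)
    return codebook

def bovw_histogram(image, codebook):
    """Normalized count of how often each visual word occurs in the image."""
    words = nearest_word(extract_patches(image), codebook)
    counts = np.bincount(words, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

# Placeholder input: a random array standing in for a log-mel spectrogram.
rng = np.random.default_rng(0)
spectrogram = rng.random((32, 48))
codebook = fit_codebook(extract_patches(spectrogram))
hist = bovw_histogram(spectrogram, codebook)
```

In a setup like the one the abstract describes, a vector such as `hist` would be supplied alongside the CNN features; how the two are combined is not stated in the abstract and is not shown here.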

A Global-Local Blur Disentangling Network for Dynamic Scene Deblurring

Applied Sciences ◽

10.3390/app11052174 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2174

Author(s):

Xiaoguang Li ◽

Feifan Yang ◽

Jianglu Huang ◽

Li Zhuo

Keyword(s):

Local Features ◽

Attention Mechanism ◽

Experimental Results ◽

Dynamic Scene ◽

Feature Maps ◽

Training Scheme ◽

Real Scene ◽

Global And Local

Images captured in a real scene usually suffer from complex non-uniform degradation, which includes both global and local blurs. It is difficult to handle such complex blur variations with a single unified processing model. We propose a global-local blur disentangling network, which effectively extracts global and local blur features via two branches. A phased training scheme is designed to disentangle the global and local blur features: each branch is trained on its own task-specific dataset. A branch attention mechanism is introduced to dynamically fuse the global and local features. Complex blurry images are used to train the attention module and the reconstruction module. The visualized feature maps of the different branches indicate that our dual-branch network can decouple the global and local blur features efficiently. Experimental results show that the proposed dual-branch blur disentangling network improves both the subjective and objective deblurring results for real captured images.
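One way to picture "dynamically fusing" two branches is a softmax over per-branch scores. The sketch below is only an illustration under stated assumptions: the scoring matrix `w`, the use of global average pooling, and the per-channel weighting are all hypothetical choices, since the abstract does not describe the attention module's internals.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def branch_attention_fuse(global_feat, local_feat, w):
    """Fuse two (C, H, W) feature maps with per-channel branch weights.

    w is a hypothetical (C, C) scoring matrix shared by both branches.
    """
    branches = np.stack([global_feat, local_feat])   # (2, C, H, W)
    pooled = branches.mean(axis=(2, 3))              # (2, C) global average pool
    scores = pooled @ w                              # (2, C) one score per branch and channel
    weights = softmax(scores, axis=0)                # sums to 1 across the two branches
    fused = (weights[:, :, None, None] * branches).sum(axis=0)
    return fused, weights

# Placeholder feature maps and scoring matrix.
rng = np.random.default_rng(0)
global_feat = rng.standard_normal((4, 8, 8))
local_feat = rng.standard_normal((4, 8, 8))
w = rng.standard_normal((4, 4))
fused, weights = branch_attention_fuse(global_feat, local_feat, w)
```

Because the weights depend on the pooled input features, the mix between branches changes from image to image, which is the sense of "dynamic" assumed here.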

Multi-modal Fusion Using Spatio-temporal and Static Features for Group Emotion Recognition

Proceedings of the 2020 International Conference on Multimodal Interaction ◽

10.1145/3382507.3417971 ◽

2020 ◽

Author(s):

Mo Sun ◽

Jian Li ◽

Hui Feng ◽

Wei Gou ◽

Haifeng Shen ◽

...

Keyword(s):

Emotion Recognition ◽

Spatio Temporal ◽

Group Emotion

Online multi-object tracking based on global and local features

2016 Visual Communications and Image Processing (VCIP) ◽

10.1109/vcip.2016.7805435 ◽

2016 ◽

Cited By ~ 2

Author(s):

Liang Xu ◽

Weihai Li ◽

Huiling Wu ◽

Qiang Li

Keyword(s):

Object Tracking ◽

Local Features ◽

Global And Local

Automated insect classification with combined global and local features for orchard management

10.13031/2013.26977 ◽

2009 ◽

Author(s):

Chenglu Wen ◽

Daniel E Guyer ◽

Wei Li

Keyword(s):

Local Features ◽

Orchard Management ◽

Insect Classification ◽

Global And Local

Cascade Attention Networks For Group Emotion Recognition with Face, Body and Image Cues

Proceedings of the 2018 on International Conference on Multimodal Interaction - ICMI '18 ◽

10.1145/3242969.3264991 ◽

2018 ◽

Cited By ~ 13

Author(s):

Kai Wang ◽

Xiaoxing Zeng ◽

Jianfei Yang ◽

Debin Meng ◽

Kaipeng Zhang ◽

...

Keyword(s):

Emotion Recognition ◽

Attention Networks ◽

Group Emotion

Temporal Dissociation of Global and Local Features by Hierarchy of Vision

International Journal of Neuroscience ◽

10.1080/00207450802540524 ◽

2009 ◽

Vol 119 (3) ◽

pp. 373-383 ◽

Cited By ~ 6

Author(s):

Tomohiro Ishizu ◽

Tomoaki Ayabe ◽

Shozo Kojima

Keyword(s):

Local Features ◽

Global And Local

Ensemble global and local features for single-sample face recognition

Computing, Control, Information and Education Engineering ◽

10.1201/b18828-92 ◽

2015 ◽

pp. 431-434 ◽

Cited By ~ 1

Keyword(s):

Face Recognition ◽

Local Features ◽

Single Sample ◽

Global And Local

Memory-Augmented Transformer for Remote Sensing Image Semantic Segmentation

Remote Sensing ◽

10.3390/rs13224518 ◽

2021 ◽

Vol 13 (22) ◽

pp. 4518

Author(s):

Xin Zhao ◽

Jiayi Guo ◽

Yueting Zhang ◽

Yirong Wu

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

Semantic Segmentation ◽

Local Features ◽

Local Feature ◽

Global Information ◽

Deep Convolutional Neural Networks ◽

Global Representation ◽

Local Feature Extraction ◽

Global And Local

The semantic segmentation of remote sensing images requires distinguishing local regions of different classes and exploiting a uniform global representation of same-class instances. These requirements make it necessary for segmentation methods to extract discriminative local features between different classes and to explore representative features for all instances of a given class. While common deep convolutional neural networks (DCNNs) can effectively focus on local features, their limited receptive field makes it hard to obtain consistent global information. In this paper, we propose a memory-augmented transformer (MAT) to effectively model both the local and global information. The feature extraction pipeline of the MAT is split into a memory-based global relationship guidance module and a local feature extraction module. The local feature extraction module mainly consists of a transformer, which is used to extract features from the input images. The global relationship guidance module maintains a memory bank for the consistent encoding of the global information. Global guidance is performed by memory interaction. Bidirectional information flow between the global and local branches is carried out by a memory-query module and a memory-update module. Experimental results on the ISPRS Potsdam and ISPRS Vaihingen datasets demonstrate that our method performs competitively with state-of-the-art methods.
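The query and update interaction with a memory bank can be sketched generically as follows. This is an assumed formulation (dot-product attention for the read, a momentum average for the write), not the MAT modules themselves, whose details the abstract does not give; all names and sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_query(features, memory):
    """Read from memory: features (N, D) attend over memory slots (M, D)."""
    attn = softmax(features @ memory.T / np.sqrt(features.shape[1]), axis=1)
    return attn @ memory, attn                       # readout (N, D), attention (N, M)

def memory_update(features, memory, attn, momentum=0.9):
    """Write to memory: move each slot toward its attention-weighted feature mean."""
    slot_weights = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)  # (N, M)
    slot_means = slot_weights.T @ features                          # (M, D)
    return momentum * memory + (1.0 - momentum) * slot_means

# Placeholder local features (e.g. flattened pixel embeddings) and memory bank.
rng = np.random.default_rng(0)
features = rng.standard_normal((16, 8))
memory = rng.standard_normal((4, 8))
readout, attn = memory_query(features, memory)
new_memory = memory_update(features, memory, attn)
```

The read path carries global information to the local features and the write path carries local information back into the memory, which is one plausible reading of the bidirectional flow described above.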

Person Reidentification Model Based on Multiattention Modules and Multiscale Residuals

Complexity ◽

10.1155/2021/6673461 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Yongyi Li ◽

Shiqi Wang ◽

Shuang Dong ◽

Xueling Lv ◽

Changzhi Lv ◽

...

Keyword(s):

Local Features ◽

Attention Mechanism ◽

Experimental Results ◽

Original Network ◽

Fine Grained ◽

Backbone Network ◽

Model Based ◽

Local Branch ◽

Feature Expression ◽

Global And Local

At present, person reidentification based on attention mechanisms has attracted the interest of many scholars. Although an attention module can improve the representation ability and reidentification accuracy of a Re-ID model to a certain extent, its effect depends on the coupling between the attention module and the original network. In this paper, a person reidentification model that combines multiple attentions and multiscale residuals is proposed. The model introduces a combined attention fusion module and a multiscale residual fusion module into the backbone network ResNet 50 to enhance the feature flow between residual blocks and better fuse multiscale features. Furthermore, a global branch and a local branch are designed and applied to enhance the channel aggregation and position perception ability of the network by utilizing the dual ensemble attention module, and a fine-grained feature expression is obtained by using multiproportion block and reorganization. Thus, the global and local features are enhanced. The experimental results on the Market-1501 and DukeMTMC-reID datasets show that the indexes of the presented model, especially Rank-1 accuracy, reach 96.20% and 89.59%, respectively, which can be considered progress in Re-ID.
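For readers unfamiliar with the Rank-1 metric cited here, a minimal sketch of its computation follows. It assumes cosine similarity between feature vectors and uses made-up toy data; it also omits the camera-based filtering of gallery entries that standard Re-ID evaluation protocols typically apply.

```python
import numpy as np

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose most similar gallery item has the same identity."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                    # cosine similarity, shape (num_query, num_gallery)
    top1 = sims.argmax(axis=1)        # best-matching gallery index per query
    return float(np.mean(gallery_ids[top1] == query_ids))

# Toy data (hypothetical): three gallery identities, three queries.
gallery_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
gallery_ids = np.array([0, 1, 2])
query_feats = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]])
query_ids = np.array([0, 1, 2])

acc = rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids)
```

In this toy case the third query lies closer to identity 0 than to its true identity 2, so two of the three queries are matched correctly at rank 1.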
