Convolutional Attention Network with Maximizing Mutual Information for Fine-Grained Image Classification

Symmetry ◽

10.3390/sym12091511 ◽

2020 ◽

Vol 12 (9) ◽

pp. 1511

Author(s):

Fenglei Wang ◽

Hao Zhou ◽

Shuohao Li ◽

Jun Lei ◽

Jun Zhang

Keyword(s):

Mutual Information ◽

Image Classification ◽

Semantic Features ◽

Classification Methods ◽

Global Features ◽

Fine Grained ◽

Symmetric Structure ◽

Learning Techniques ◽

Local And Global Features ◽

Image Pairs

Fine-grained image classification has improved greatly thanks to deep learning techniques. Most fine-grained image classification methods focus on extracting discriminative features and combining global features with local ones. However, accuracy is limited by inter-class similarity and intra-class divergence, as well as by the lack of enough labelled images to train a deep network that can generalize to fine-grained classes. To deal with these problems, we develop an algorithm that combines Maximizing the Mutual Information (MMI) with Learning Attention (LA). We use MMI to distill knowledge from image pairs that contain the same object. Meanwhile, we use the LA mechanism to find the salient region of the image to enhance the information distillation. Our model can extract more discriminative semantic features and improve performance on fine-grained image classification. The model has a symmetric structure, in which the paired images are fed into the same network to extract the local and global features for the subsequent MMI and LA modules. We train the model by alternately maximizing the mutual information and minimizing the cross-entropy, stage by stage. Experiments show that our model effectively improves fine-grained image classification performance.
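The abstract does not say which mutual information estimator the MMI module uses. As a rough illustration only, the sketch below assumes the InfoNCE lower bound, a common stand-in, applied to paired feature vectors where `feats_a[i]` and `feats_b[i]` come from two images of the same object. The function names and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import math

def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def info_nce_bound(feats_a, feats_b):
    """InfoNCE lower bound on the mutual information between paired features.

    Assumption: feats_a[i] and feats_b[i] describe the same object, and every
    other pairing (i, j != i) acts as a negative sample.
    """
    n = len(feats_a)
    total = 0.0
    for i in range(n):
        scores = [dot(feats_a[i], feats_b[j]) for j in range(n)]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
        total += scores[i] - log_sum  # log-softmax score of the true pair
    return math.log(n) + total / n

# Toy features: matched pairs score higher than mismatched ones.
a = [[4.0, 0.0], [0.0, 4.0]]
aligned = info_nce_bound(a, [[4.0, 0.0], [0.0, 4.0]])
shuffled = info_nce_bound(a, [[0.0, 4.0], [4.0, 0.0]])
```

Maximizing this bound pushes features of the same object together and features of different objects apart. The bound can never exceed `log(n)` for a batch of `n` pairs, which is one reason larger batches are often preferred with this estimator.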

Image classification by combining local and global features

The Visual Computer ◽

10.1007/s00371-018-1503-0 ◽

2018 ◽

Vol 35 (5) ◽

pp. 679-693 ◽

Cited By ~ 15

Author(s):

Leila Kabbai ◽

Mehrez Abdellaoui ◽

Ali Douik

Keyword(s):

Image Classification ◽

Global Features ◽

Local And Global Features

Scene image classification based on visual words concatenation of local and global features

Multimedia Tools and Applications ◽

10.1007/s11042-021-11354-5 ◽

2021 ◽

Author(s):

Shrinivasa S R ◽

Prabhakar C J

Keyword(s):

Image Classification ◽

Global Features ◽

Visual Words ◽

Scene Image ◽

Local And Global Features

Ensemble of Neural Networks for Automated Cell Phenotype Image Classification

Advances in Bioinformatics and Biomedical Engineering - Biomedical Image Analysis and Machine Learning Technologies ◽

10.4018/978-1-60566-956-4.ch011 ◽

2010 ◽

pp. 234-259 ◽

Cited By ~ 1

Author(s):

Loris Nanni ◽

Alessandra Lumini

Keyword(s):

Neural Networks ◽

Image Classification ◽

Subcellular Location ◽

Classification Problem ◽

Machine Learning Techniques ◽

Experimental Comparison ◽

Cell Phenotype ◽

Learning Techniques ◽

Problems And Solutions ◽

Local And Global Features

Subcellular location refers to the spatial distribution of a protein within the cell. Knowing the location of all proteins is crucial for several applications, ranging from early diagnosis of a disease to monitoring the therapeutic effectiveness of drugs. This chapter studies machine learning techniques for cell phenotype image classification and aims to point out some of the advantages of using a multi-classifier system instead of a stand-alone method to solve this difficult classification problem. The main problems and solutions proposed in this field are discussed, and a new approach is proposed based on an ensemble of neural networks trained on local and global features. Finally, the most widely used benchmarks for this problem are presented, and an experimental comparison among several state-of-the-art approaches is reported, which makes it possible to quantify the performance improvement obtained by the approach proposed in this chapter.

A Multichannel Model for Microbial Key Event Extraction Based on Feature Fusion and Attention Mechanism

Security and Communication Networks ◽

10.1155/2021/7800144 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Peng Li ◽

Qian Wang

Keyword(s):

Sentiment Analysis ◽

Feature Fusion ◽

Attention Mechanism ◽

Event Extraction ◽

Semantic Features ◽

Analysis Model ◽

Global Features ◽

Public Health Emergencies ◽

Multimodal Information ◽

Local And Global Features

To further mine the deep semantic information in microbial texts about public health emergencies, this paper proposes a multichannel microbial sentiment analysis model, MCMF-A. First, we use word2vec and fastText to generate word vectors in the feature vector embedding layer and fuse them with lexical and location feature vectors; second, we build a multichannel layer based on CNN and BiLSTM to extract local and global features of the microbial text; third, we build an attention mechanism layer to extract the important semantic features of the microbial text; finally, we merge the multichannel output in the fusion layer and use a softmax function in the output layer for sentiment classification. The results show that the F1 value of the MCMF-A sentiment analysis model reaches 90.21%, which is 9.71% and 9.14% higher than the benchmark CNN and BiLSTM models, respectively. The constructed dataset is small, and multimodal information such as images and speech has not been considered.
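The final classification step, read here as a softmax over the fused representation, can be sketched in a few lines. The three-class score vector below is a made-up example, not output from MCMF-A, and the class order is an assumption for illustration.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical fused scores for three sentiment classes
# (assumed order: positive, neutral, negative).
fused_scores = [2.0, 0.5, -1.0]
probs = softmax(fused_scores)
predicted = probs.index(max(probs))  # index of the most likely class
```

Softmax preserves the ranking of the scores, so the predicted class is always the one with the largest raw score; the probabilities matter mainly for training with a cross-entropy loss.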

Alignment of Local and Global Features from Multiple Layers of Convolutional Neural Network for Image Classification

2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI) ◽

10.1109/sibgrapi.2019.00040 ◽

2019 ◽

Author(s):

Fernando Pereira dos Santos ◽

Moacir Antonelli Ponti

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Image Classification ◽

Global Features ◽

Local And Global Features

HCNN: A Neural Network Model for Combining Local and Global Features Towards Human-Like Classification

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001416550041 ◽

2015 ◽

Vol 30 (01) ◽

pp. 1655004 ◽

Cited By ~ 3

Author(s):

Tielin Zhang ◽

Yi Zeng ◽

Bo Xu

Keyword(s):

Neural Network ◽

Visual System ◽

Image Classification ◽

Network Model ◽

Neural Network Model ◽

Human Visual System ◽

Experimental Results ◽

Natural Images ◽

Global Features ◽

Local And Global Features

Brain-inspired algorithms such as the convolutional neural network (CNN) have helped machine vision systems achieve state-of-the-art performance on various tasks (e.g. image classification). However, CNNs mainly rely on local features (e.g. hierarchical features of points and angles from images), while important global structured features such as contour features are lost. Global understanding of natural objects is considered an essential characteristic of the human visual system, and when developing human-like visual systems, neglecting this perspective may lead to inevitable failure on certain tasks. Experimental results have shown that a well-trained CNN classifier cannot correctly distinguish fooling images (in which some local features from the natural images are chaotically distributed) from natural images. For example, a picture composed of yellow–black bars will be recognized as a school bus with very high confidence by a CNN. In contrast, the human visual system focuses on both texture and contour features to form a representation of images and would not mistake them. To solve this problem, we propose a neural network model, named histogram of oriented gradient (HOG) improved CNN (HCNN), that combines local and global features towards human-like classification based on CNN and HOG. Experimental results on the MNIST dataset and part of the ImageNet dataset show that HCNN outperforms a traditional CNN for object classification with fooling images, which indicates the feasibility, accuracy and potential effectiveness of HCNN for solving image classification problems.
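To make the HOG side of this combination concrete, the sketch below computes an unsigned gradient-orientation histogram for a single grayscale cell. It is a simplified illustration: a full HOG descriptor also uses bin interpolation and block normalization, and the abstract does not describe how HCNN feeds HOG features into the CNN. The function name and the toy cell are hypothetical.

```python
import math

def orientation_histogram(cell, bins=9):
    """Unsigned gradient-orientation histogram for one grayscale cell.

    `cell` is a list of rows of pixel intensities. Border pixels are skipped
    so that central differences are always defined.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold to [0, 180]
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# A vertical edge: intensity changes only along x, so every gradient
# points horizontally and all the magnitude lands in the first bin.
edge = [[0, 0, 1, 1] for _ in range(4)]
hist = orientation_histogram(edge)
```

Because the histogram records the direction of intensity changes and not the texture itself, it summarizes contour shape, which is the kind of global cue the abstract says a plain CNN tends to lose.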

Lightweight image classifier using dilated and depthwise separable convolutions

Journal of Cloud Computing Advances Systems and Applications ◽

10.1186/s13677-020-00203-9 ◽

2020 ◽

Vol 9 (1) ◽

Author(s):

Wei Sun ◽

Xiaorui Zhang ◽

Xiaozheng He

Keyword(s):

Image Classification ◽

Classification Accuracy ◽

Image Resolution ◽

Semantic Features ◽

Compression Process ◽

Global Features ◽

Dilated Convolution ◽

Proposed Model ◽

Convolution Process ◽

And Storage

Image classification based on cloud computing becomes difficult to deploy as network depth and data volume increase. Because the depth of the model and the convolution process of each layer produce a large amount of computation, the demands on the GPU and storage performance of the device are extremely high, and the GPUs and storage devices of embedded and mobile terminals cannot support large models. It is therefore necessary to compress the model so that it can be deployed on these devices. Meanwhile, traditional compression-based methods often miss many global features during the compression process, resulting in low classification accuracy. To solve this problem, this paper proposes a lightweight neural network model with twenty-nine layers, based on dilated convolution and depthwise separable convolution, for image classification. The proposed model employs dilated convolution to expand the receptive field during the convolution process while keeping the number of convolution parameters unchanged, which can extract more high-level global semantic features to improve the classification accuracy. In addition, depthwise separable convolution is applied to reduce the network parameters and computational complexity of the convolution operations, which reduces the size of the network. The proposed model introduces three hyperparameters (width multiplier, image resolution, and dilation rate) to compress the network while preserving accuracy. The experimental results show that, compared with GoogLeNet, the proposed network improves the classification accuracy by nearly 1% and reduces the number of parameters by 3.7 million.
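The two savings described here follow from standard formulas, worked through below for one layer. The layer sizes (3x3 kernel, 64 input channels, 128 output channels) are arbitrary illustrative values, not the configuration of the proposed network, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    return k * k * c_in + c_in * c_out

def effective_kernel_size(k, dilation):
    """Span covered by a k x k kernel whose taps are `dilation` pixels apart."""
    return k + (k - 1) * (dilation - 1)

# Illustrative layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)          # 9 * 64 * 128
separable = depthwise_separable_params(3, 64, 128)   # 9 * 64 + 64 * 128
span = effective_kernel_size(3, 2)                   # 3 + 2 * 1
```

For this layer the separable form needs 8,768 weights instead of 73,728, roughly an 8.4-fold reduction, while a dilation rate of 2 lets the same nine weights per channel cover a 5x5 window.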

Weakly Supervised Fine-Grained Image Classification via Salient Region Localization and Different Layer Feature Fusion

Applied Sciences ◽

10.3390/app10134652 ◽

2020 ◽

Vol 10 (13) ◽

pp. 4652

Author(s):

Fangxiong Chen ◽

Guoheng Huang ◽

Jiaying Lan ◽

Yanhui Wu ◽

Chi-Man Pun ◽

...

Keyword(s):

Image Classification ◽

Feature Fusion ◽

Classification Performance ◽

Training Data ◽

Classification Model ◽

Global Features ◽

Salient Region ◽

Fine Grained ◽

Proposed Model ◽

Weakly Supervised

The fine-grained image classification task is about differentiating between object classes. The difficulties of the task are large intra-class variance and small inter-class variance. For this reason, improving model accuracy on the task relies heavily on annotations of discriminative parts and regional parts. This dependency on fine annotations restricts the practicality of such models. To tackle this issue, this article proposes a weakly supervised fine-grained image classification model based on a saliency module. Through our salient region localization module, the proposed model can localize essential regional parts using saliency maps, while only image class annotations are provided. In addition, the bilinear attention module can improve feature extraction by using higher- and lower-level layers of the network to fuse regional features with global features. Applying the bilinear attention architecture, we propose the different layer feature fusion module to improve the expressive ability of the model features. We tested and verified our model on public datasets released specifically for fine-grained image classification. The results show that the proposed model achieves close to state-of-the-art classification performance on various datasets while using only a minimal amount of training data. This indicates that the practicality of our model is substantially improved, since fine-grained image datasets are expensive.

Ensemble of Neural Networks for Automated Cell Phenotype Image Classification

Machine Learning ◽

10.4018/978-1-60960-818-7.ch405 ◽

2012 ◽

pp. 793-816

Author(s):

Loris Nanni ◽

Alessandra Lumini

Keyword(s):

Neural Networks ◽

Image Classification ◽

Subcellular Location ◽

Classification Problem ◽

Machine Learning Techniques ◽

Experimental Comparison ◽

Cell Phenotype ◽

Learning Techniques ◽

Problems And Solutions ◽

Local And Global Features

Two-Level Attentions and Grouping Attention Convolutional Network for Fine-Grained Image Classification

Applied Sciences ◽

10.3390/app9091939 ◽

2019 ◽

Vol 9 (9) ◽

pp. 1939 ◽

Cited By ~ 5

Author(s):

Yadong Yang ◽

Xiaofeng Wang ◽

Quan Zhao ◽

Tingting Sui

Keyword(s):

Image Classification ◽

Feature Fusion ◽

Recognition Rate ◽

Fine Tuning ◽

Semantic Features ◽

Convolutional Network ◽

Attention Model ◽

Fine Grained ◽

Visual Attention Mechanism ◽

High Level

The focus of fine-grained image classification tasks is to ignore interfering information and capture local features, a challenge at which the visual attention mechanism excels. First, we construct a two-level attention convolutional network, which characterizes object-level attention and pixel-level attention. Then, we combine the two kinds of attention through a second-order response transform algorithm. Furthermore, we propose a clustering-based grouping attention model, which provides part-level attention. The grouping attention method stretches all the semantic features in a deeper convolution layer of the network into vectors. These vectors are clustered by a vector dot product, and each cluster represents a specific semantic. The grouping attention algorithm implements the functions of group convolution and feature clustering, which can greatly reduce the network parameters and improve the recognition rate and interpretability of the network. Finally, the low-level visual features and high-level semantic information are merged by a multi-level feature fusion method to accurately classify fine-grained images. We have achieved good results without using pre-trained networks or fine-tuning techniques.
