Fine-Grained 3D-Attention Prototypes for Few-Shot Learning

Neural Computation ◽

10.1162/neco_a_01302 ◽

2020 ◽

Vol 32 (9) ◽

pp. 1664-1684

Author(s):

Xin Hu ◽

Jun Liu ◽

Jie Ma ◽

Yudai Pan ◽

Lingling Zhang

Keyword(s):

Image Reconstruction ◽

Feature Learning ◽

Local Features ◽

Image Features ◽

Superior Performance ◽

Fine Grained ◽

Learning Module ◽

Class Distribution ◽

Label Distribution ◽

3D Attention

In the real world, a limited number of labeled fine-grained images per class can hardly represent the class distribution effectively. Moreover, fine-grained images show more subtle visual differences than simple images with obvious objects; that is, they have smaller interclass and larger intraclass variations. To solve these issues, we propose an end-to-end attention-based model for fine-grained few-shot image classification (AFG) trained with the recent episode training strategy. It is composed mainly of a feature learning module, an image reconstruction module, and a label distribution module. The feature learning module devises a 3D-Attention mechanism, which considers both the spatial positions and the channel attentions of the image features, in order to learn more discriminative local features that better represent the class distribution. The image reconstruction module calculates the mappings between local features and the original images. It is constrained by a designed loss function that serves as auxiliary supervised information, so that learning each local feature does not need extra annotations. The label distribution module predicts the label distribution of a given unlabeled sample, using the local features to represent the image features for classification. Through comprehensive experiments on Mini-ImageNet and three fine-grained data sets, we demonstrate that the proposed model achieves superior performance over the competitors.
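
The abstract above describes attention over both spatial positions and channels but gives no formulas. The following is only a minimal sketch of that general idea, not the authors' 3D-Attention mechanism: the function names, the softmax weighting, and the use of plain averages as attention scores are all assumptions made for illustration.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of numbers."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_3d(fmap):
    """Reweight a C x H x W feature map (nested lists) by channel and spatial weights.

    Assumption: channel scores are per-channel global averages and spatial
    scores are across-channel averages; a learned model would compute both
    with trainable layers instead.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Channel attention: one weight per channel.
    channel_means = [sum(sum(row) for row in ch) / (H * W) for ch in fmap]
    channel_w = softmax(channel_means)
    # Spatial attention: one weight per (h, w) position.
    spatial_means = [sum(fmap[c][h][w] for c in range(C)) / C
                     for h in range(H) for w in range(W)]
    flat = softmax(spatial_means)
    spatial_w = [flat[h * W:(h + 1) * W] for h in range(H)]
    # Combine both weights at every (c, h, w) location.
    return [[[fmap[c][h][w] * channel_w[c] * spatial_w[h][w]
              for w in range(W)] for h in range(H)] for c in range(C)]
```

Multiplying the two weight sets gives every (channel, row, column) location its own scaling factor, which is one simple way to read "3D" attention.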

Multi-Level Joint Feature Learning for Person Re-Identification

Algorithms ◽

10.3390/a13050111 ◽

2020 ◽

Vol 13 (5) ◽

pp. 111

Author(s):

Shaojun Wu ◽

Ling Gao

Keyword(s):

Deep Learning ◽

Feature Fusion ◽

Feature Learning ◽

Local Features ◽

Image Features ◽

Learning Networks ◽

Fusion Model ◽

Global Features ◽

Multi Level ◽

High Level

In person re-identification, extracting image features is an important step when retrieving pedestrian images. Most current methods extract only global features or only local features of pedestrian images. Some inconspicuous details are easily ignored when learning image features, which is neither efficient nor robust for scenarios with large differences. In this paper, we propose a Multi-level Feature Fusion model that combines both global features and local features of images through deep learning networks to generate more discriminative pedestrian descriptors. Specifically, we extract local features from different depths of the network with the Part-based Multi-level Net to fuse low-to-high level local features of pedestrian images. Global-Local Branches are used to extract the local features and global features at the highest level. The experiments show that our deep learning model based on multi-level feature fusion works well in person re-identification. The overall results outperform the state of the art by considerable margins on three widely used datasets. For instance, we achieve 96% Rank-1 accuracy on the Market-1501 dataset and 76.1% mAP on the DukeMTMC-reID dataset, outperforming existing works by a large margin (more than 6%).

Multi-Neighborhood Convolutional Networks

Vision Letters ◽

10.15353/vsnl.v1i1.56 ◽

2015 ◽

Vol 1 (1) ◽

Author(s):

Elnaz Barshan ◽

Paul Fieguth ◽

Alexander Wong

Keyword(s):

Feature Learning ◽

Scale Space ◽

Image Features ◽

Superior Performance ◽

Convolutional Networks ◽

Multi Scale ◽

Image Characteristics ◽

Single Scale ◽

Nonlinear Scale

We explore the role of scale for improved feature learning in convolutional networks. We propose multi-neighborhood convolutional networks, designed to learn image features at different levels of detail. Utilizing nonlinear scale-space models, the proposed multineighborhood model can effectively capture fine-scale image characteristics (i.e., appearance) using a small-size neighborhood, while coarse-scale image structures (i.e., shape) are detected through a larger neighborhood. The experimental results demonstrate the superior performance of the proposed multi-scale multi-neighborhood models over their single-scale counterparts.

Invariant Image Representation Using Novel Fractional-Order Polar Harmonic Fourier Moments

Sensors ◽

10.3390/s21041544 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1544

Author(s):

Chunpeng Wang ◽

Hongling Gao ◽

Meihong Yang ◽

Jian Li ◽

Bin Ma ◽

...

Keyword(s):

Image Reconstruction ◽

Fractional Order ◽

Continuous Functions ◽

Kernel Functions ◽

Superior Performance ◽

Image Description ◽

Orthogonal Moments ◽

Integer Order ◽

Geometric Invariance ◽

Order Continuous

Continuous orthogonal moments, for which continuous functions are used as kernel functions, are invariant to rotation and scaling, and they have been greatly developed in recent years. Among continuous orthogonal moments, polar harmonic Fourier moments (PHFMs) have superior performance and strong image description ability. In order to improve the performance of PHFMs in noise resistance and image reconstruction, PHFMs, which can only take integer orders, are extended to fractional-order polar harmonic Fourier moments (FrPHFMs) in this paper. First, the radial polynomials of integer-order PHFMs are modified to obtain fractional-order radial polynomials, and FrPHFMs are constructed based on the fractional-order radial polynomials. Next, the strong reconstruction ability, orthogonality, and geometric invariance of the proposed FrPHFMs are proven. Finally, the performance of the proposed FrPHFMs is compared with that of integer-order PHFMs, fractional-order radial harmonic Fourier moments (FrRHFMs), fractional-order polar harmonic transforms (FrPHTs), and fractional-order Zernike moments (FrZMs). The experimental results show that the FrPHFMs constructed in this paper are superior to integer-order PHFMs and other fractional-order continuous orthogonal moments in image reconstruction and object recognition, and that the proposed FrPHFMs have strong image description ability and good stability.

Knowledge Enhanced LSTM for Coreference Resolution on Biomedical Texts

Bioinformatics ◽

10.1093/bioinformatics/btab153 ◽

2021 ◽

Author(s):

Yufei Li ◽

Xiaoyong Ma ◽

Xiangyu Zhou ◽

Pengzhen Cheng ◽

Kai He ◽

...

Keyword(s):

Information Integration ◽

Short Term Memory ◽

Superior Performance ◽

Supplementary Information ◽

Specific Information ◽

Coreference Resolution ◽

Fine Grained ◽

Domain Specific ◽

Memory Network ◽

Biomedical Texts

Motivation: Bio-entity Coreference Resolution focuses on identifying the coreferential links in biomedical texts, which is crucial to complete bio-events’ attributes and interconnect events into bio-networks. Previously, as one of the most powerful tools, deep neural network-based general domain systems have been applied to the biomedical domain with domain-specific information integration. However, such methods may introduce considerable noise because they combine context and complex domain-specific information insufficiently. Results: In this paper, we explore how to leverage the external knowledge base in a fine-grained way to better resolve coreference by introducing a knowledge-enhanced Long Short Term Memory network (LSTM), which is more flexible in encoding the knowledge information inside the LSTM. Moreover, we further propose a knowledge attention module to extract informative knowledge effectively based on contexts. The experimental results on the BioNLP and CRAFT datasets achieve state-of-the-art performance, with a gain of 7.5 F1 on BioNLP and 10.6 F1 on CRAFT. Additional experiments also demonstrate superior performance on cross-sentence coreferences. Supplementary information: Supplementary data are available at Bioinformatics online.

Unsupervised Deep Feature Learning for Remote Sensing Image Retrieval

Remote Sensing ◽

10.3390/rs10081243 ◽

2018 ◽

Vol 10 (8) ◽

pp. 1243 ◽

Cited By ~ 29

Author(s):

Xu Tang ◽

Xiangrong Zhang ◽

Fang Liu ◽

Licheng Jiao

Keyword(s):

Remote Sensing ◽

Image Retrieval ◽

Feature Learning ◽

Remote Sensing Image ◽

Code Word ◽

Image Features ◽

Learning Method ◽

L1 Norm ◽

Image Descriptor ◽

Image Archives

Due to the specific characteristics and complicated contents of remote sensing (RS) images, remote sensing image retrieval (RSIR) remains an open and tough research topic in the RS community. There are two basic blocks in RSIR: feature learning and similarity matching. In this paper, we focus on developing an effective feature learning method for RSIR. With the help of the deep learning technique, the proposed feature learning method is designed under the bag-of-words (BOW) paradigm. Thus, we name the obtained feature deep BOW (DBOW). The learning process consists of two parts: image descriptor learning and feature construction. First, to explore the complex contents within the RS image, we extract the image descriptor at the image patch level rather than from the whole image. In addition, instead of using a handcrafted feature to describe the patches, we propose the deep convolutional auto-encoder (DCAE) model to deeply learn the discriminative descriptor for the RS image. Second, the k-means algorithm is selected to generate the codebook using the obtained deep descriptors. Then, the final histogrammic DBOW features are acquired by counting the frequency of each code word. Once the DBOW features are obtained from the RS images, the similarities between RS images are measured using the L1-norm distance. Then, the retrieval results can be acquired according to the similarity order. The encouraging experimental results on four public RS image archives demonstrate that our DBOW feature is effective for the RSIR task. Compared with the existing RS image features, our DBOW achieves improved performance on RSIR.
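
The feature-construction and matching steps described above (assign each patch descriptor to a code word, count code-word frequencies, rank by L1 distance) can be sketched as follows. This is an illustrative outline, not the paper's implementation: the codebook is assumed to be already available (for example from k-means), descriptors are plain lists of floats rather than DCAE outputs, and normalizing the histogram to sum to 1 is an assumption. All function names are hypothetical.

```python
def nearest_code_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))

def bow_histogram(descriptors, codebook):
    """Count how often each code word is the nearest one, then normalize.

    Assumption: frequencies are divided by the number of descriptors so that
    images with different patch counts remain comparable.
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_code_word(d, codebook)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def l1_distance(h1, h2):
    """L1-norm distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def rank_by_l1(query_hist, archive_hists):
    """Archive indices ordered from most to least similar to the query."""
    return sorted(range(len(archive_hists)),
                  key=lambda i: l1_distance(query_hist, archive_hists[i]))
```

For example, with a two-word codebook `[[0, 0], [10, 10]]`, an image whose patches fall half near each code word gets the histogram `[0.5, 0.5]`.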

Hybrid Collaborative Representation for Remote-Sensing Image Scene Classification

Remote Sensing ◽

10.3390/rs10121934 ◽

2018 ◽

Vol 10 (12) ◽

pp. 1934 ◽

Cited By ~ 11

Author(s):

Bao-Di Liu ◽

Wen-Yang Xie ◽

Jie Meng ◽

Ye Li ◽

Yanjiang Wang

Keyword(s):

Remote Sensing ◽

Test Sample ◽

Remote Sensing Image ◽

Image Features ◽

Superior Performance ◽

Collaborative Representation ◽

Great Success ◽

Specific Class ◽

Remote Sensing Images ◽

Training Samples

In recent years, the collaborative representation-based classification (CRC) method has achieved great success in visual recognition by directly utilizing training images as dictionary bases. However, it describes a test sample with all training samples to extract shared attributes and does not consider the representation of the test sample with the training samples of a specific class to extract the class-specific attributes. For remote-sensing images, both the shared attributes and class-specific attributes are important for classification. In this paper, we propose a hybrid collaborative representation-based classification approach. The proposed method improves the performance of classifying remote-sensing images by embedding the class-specific collaborative representation into conventional collaborative representation-based classification. Moreover, we extend the proposed method to arbitrary kernel space to explore the nonlinear characteristics hidden in remote-sensing image features and further enhance classification performance. Extensive experiments on several benchmark remote-sensing image datasets clearly demonstrate the superior performance of our proposed algorithm over state-of-the-art approaches.

Centralized Ranking Loss with Weakly Supervised Localization for Fine-Grained Object Retrieval

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/171 ◽

2018 ◽

Cited By ~ 9

Author(s):

Xiawu Zheng ◽

Rongrong Ji ◽

Xiaoshuai Sun ◽

Yongjian Wu ◽

Feiyue Huang ◽

...

Keyword(s):

State Of The Art ◽

Feature Learning ◽

Target Object ◽

Object Retrieval ◽

Unified Framework ◽

Fine Grained ◽

Discriminative Feature ◽

Triplet Loss ◽

Weakly Supervised ◽

Ranking Loss

Fine-grained object retrieval has attracted extensive research focus recently. Its state-of-the-art schemes are typically based upon convolutional neural network (CNN) features. Despite the extensive progress, two issues remain open. On one hand, the deep features are coarsely extracted at image level rather than precisely at object level, and are therefore disturbed by background clutter. On the other hand, training CNN features with a standard triplet loss is time consuming and incapable of learning discriminative features. In this paper, we present a novel fine-grained object retrieval scheme that overcomes these issues in a unified framework. Firstly, we introduce a novel centralized ranking loss (CRL), which achieves very efficient (a 1,000-times training speedup compared to the triplet loss) and discriminative feature learning through a "centralized" global pooling. Secondly, a weakly supervised attractive feature extraction is proposed, which segments object contours with top-down saliency. Consequently, the contours are integrated into the CNN response map to precisely extract features "within" the target object. Interestingly, we have discovered that the combination of CRL and weakly supervised learning can reinforce each other. We evaluate the performance of the proposed scheme on widely used benchmarks including CUB200-2011 and CARS196. We have reported significant gains over the state-of-the-art schemes, e.g., 5.4% over SCDA [Wei et al., 2017] on CARS196, and 3.7% on CUB200-2011.

Self-Amplificated Network: Learning fine-grained learner with few samples

Journal of Physics Conference Series ◽

10.1088/1742-6596/2050/1/012006 ◽

2021 ◽

Vol 2050 (1) ◽

pp. 012006

Author(s):

Xili Dai ◽

Chunmei Ma ◽

Jingwei Sun ◽

Tao Zhang ◽

Haigang Gong ◽

...

Keyword(s):

Deep Neural Networks ◽

Classification Problem ◽

The Self ◽

Superior Performance ◽

Query Image ◽

Network Learning ◽

Fine Grained ◽

Support Set ◽

Meta Learning ◽

Benchmark Datasets

Training deep neural networks from only a few examples has been an interesting topic that motivated few-shot learning. In this paper, we study the fine-grained image classification problem in a challenging few-shot learning setting, and propose the Self-Amplificated Network (SAN), a method based on meta-learning to tackle this problem. The SAN model consists of three parts: the Encoder, Amplification, and Similarity Modules. The Encoder Module encodes a fine-grained image input into a feature vector. The Amplification Module amplifies subtle differences between fine-grained images based on a self-attention mechanism composed of multi-head attention. The Similarity Module measures how similar the query image and the support set are in order to determine the classification result. In-depth experiments on three benchmark datasets show that our network achieves superior performance over the competing baselines.
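
The abstract does not say how the Similarity Module scores a query against the support set. As a hedged illustration of that step only, the sketch below assumes cosine similarity averaged over each class's support vectors; the function names and the dictionary layout of the support set are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify_query(query, support_set):
    """Return (predicted_label, scores) for one query feature vector.

    support_set maps each class label to a list of support feature vectors.
    Assumption: a class score is the mean cosine similarity to its supports.
    """
    scores = {label: sum(cosine(query, v) for v in vectors) / len(vectors)
              for label, vectors in support_set.items()}
    return max(scores, key=scores.get), scores
```

A call such as `classify_query([1.0, 0.2], {"a": [[1.0, 0.0]], "b": [[0.0, 1.0]]})` picks class `"a"`, since the query points mostly along the first axis.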

Person Reidentification Model Based on Multiattention Modules and Multiscale Residuals

Complexity ◽

10.1155/2021/6673461 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Yongyi Li ◽

Shiqi Wang ◽

Shuang Dong ◽

Xueling Lv ◽

Changzhi Lv ◽

...

Keyword(s):

Local Features ◽

Attention Mechanism ◽

Experimental Results ◽

Original Network ◽

Fine Grained ◽

Backbone Network ◽

Model Based ◽

Local Branch ◽

Feature Expression ◽

Global And Local

At present, person reidentification based on attention mechanisms has attracted the interest of many scholars. Although an attention module can improve the representation ability and reidentification accuracy of a Re-ID model to a certain extent, it depends on the coupling of the attention module and the original network. In this paper, a person reidentification model that combines multiple attentions and multiscale residuals is proposed. The model introduces a combined attention fusion module and a multiscale residual fusion module in the backbone network ResNet 50 to enhance the feature flow between residual blocks and better fuse multiscale features. Furthermore, a global branch and a local branch are designed and applied to enhance the channel aggregation and position perception ability of the network by utilizing the dual ensemble attention module, and the fine-grained feature expression is obtained by using multiproportion block and reorganization. Thus, the global and local features are enhanced. The experimental results on the Market-1501 and DukeMTMC-reID datasets show that the indexes of the presented model, especially Rank-1 accuracy, reach 96.20% and 89.59%, respectively, which can be considered progress in Re-ID.

A Grid Feature-Point Selection Method for Large-Scale Street View Image Retrieval Based on Deep Local Features

Remote Sensing ◽

10.3390/rs12233978 ◽

2020 ◽

Vol 12 (23) ◽

pp. 3978

Author(s):

Tianyou Chu ◽

Yumin Chen ◽

Liheng Huang ◽

Zhiqiang Xu ◽

Huangyuan Tan

Keyword(s):

Image Retrieval ◽

Large Scale ◽

Local Features ◽

Selection Method ◽

Image Features ◽

Feature Point ◽

Feature Points ◽

Point Selection ◽

Retrieval Task ◽

Street View

Street view image retrieval aims to estimate image locations by querying the nearest neighbor images of the same scene from a large-scale reference dataset. Query images usually have no location information and are represented by features to search for similar results. The deep local features (DELF) method shows great performance in the landmark retrieval task, but it extracts so many features that the feature file is too large to load into memory when training the feature index. Memory size is limited, and simply removing some of the features causes a large loss of retrieval precision. Therefore, this paper proposes a grid feature-point selection method (GFS) to reduce the number of feature points in each image while minimizing the precision loss. Convolutional Neural Networks (CNNs) are constructed to extract dense features, and an attention module is embedded into the network to score features. GFS divides the image into a grid and selects the features with high scores in each local region. Product quantization and an inverted index are used to index the image features to improve retrieval efficiency. The retrieval performance of the method is tested on a large-scale Hong Kong street view dataset, and the results show that GFS reduces feature points by 32.27–77.09% compared with the raw features. In addition, GFS has 5.27–23.59% higher precision than other methods.
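
The grid selection step described above (divide the image into cells, keep the highest-scoring features in each cell) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's GFS implementation: features are assumed to be `(x, y, score)` tuples with scores already produced by some attention module, and the function name, grid size, and `per_cell` parameter are hypothetical.

```python
def grid_select(features, width, height, rows, cols, per_cell=1):
    """Keep the top-scoring features in each cell of a rows x cols grid.

    features: iterable of (x, y, score) with 0 <= x <= width, 0 <= y <= height.
    Returns the kept features, ordered by cell (row-major), best score first.
    """
    cells = {}
    for x, y, score in features:
        # Map pixel coordinates to a cell; clamp points on the far edges.
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        cells.setdefault((row, col), []).append((x, y, score))
    selected = []
    for key in sorted(cells):
        best = sorted(cells[key], key=lambda f: f[2], reverse=True)[:per_cell]
        selected.extend(best)
    return selected
```

Selecting per cell rather than by a single global score threshold keeps the retained points spread across the image, which is the stated motivation for using a grid.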
