Parts Semantic Segmentation Aware Representation Learning for Person Re-Identification

Hua Gao; Shengyong Chen; Zhaosheng Zhang

doi:10.3390/app9061239

Applied Sciences ◽

10.3390/app9061239 ◽

2019 ◽

Vol 9 (6) ◽

pp. 1239 ◽

Cited By ~ 1

Author(s):

Hua Gao ◽

Shengyong Chen ◽

Zhaosheng Zhang

Keyword(s):

Semantic Segmentation ◽

Representation Learning ◽

Local Features ◽

Point Of View ◽

Body Parts ◽

Feature Maps ◽

Benchmark Datasets ◽

Background Clutter ◽

Three Body ◽

Global And Local

Person re-identification is a typical computer vision problem that aims to match pedestrians across disjoint camera views. It is challenging due to the misalignment of body parts caused by pose variations, background clutter, detection errors, camera viewpoint variation, different accessories and occlusion. In this paper, we propose a person re-identification network that fuses global and local features to deal with the part misalignment problem. The network is a four-branch convolutional neural network (CNN) that learns global person appearance and local features of three human body parts, respectively. Local patches, namely the head, torso and lower body, are segmented using a U-Net semantic segmentation CNN architecture. All four feature maps are then concatenated and fused to represent a person image. We propose a DropParts method to solve the missing-parts problem, in which the local features are weighted according to the number of parts found by semantic segmentation. Since the three body parts are well aligned, the approach significantly improves person re-identification. Experiments on standard benchmark datasets, such as Market1501, CUHK03 and DukeMTMC-reID, show the effectiveness of our proposed pipeline.
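The DropParts idea of weighting local features by the number of parts found can be illustrated with a small sketch. The abstract does not give the weighting rule, so the rule below (a missing part contributes zeros, found parts are rescaled by total/found in the spirit of dropout rescaling) and the function name `drop_parts_fuse` are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Optional

Vector = List[float]


def drop_parts_fuse(global_feat: Vector,
                    part_feats: List[Optional[Vector]],
                    part_dim: int) -> Vector:
    """Concatenate a global feature with per-part features (hypothetical rule).

    A part the segmenter did not find is passed as None and contributes zeros.
    Found parts are scaled by total_parts / found_parts, dropout-style, so the
    local block keeps a comparable magnitude when some parts are missing.
    """
    total = len(part_feats)
    found = sum(p is not None for p in part_feats)
    scale = total / found if found else 0.0
    fused = list(global_feat)
    for part in part_feats:
        if part is None:
            fused.extend([0.0] * part_dim)  # missing part: zero block
        else:
            if len(part) != part_dim:
                raise ValueError("part feature has the wrong length")
            fused.extend(scale * x for x in part)
    return fused


# Head and torso found, lower body missing: found parts are scaled by 3/2.
fused = drop_parts_fuse([1.0, 2.0], [[1.0, 1.0], [2.0, 0.0], None], part_dim=2)
print(fused)  # [1.0, 2.0, 1.5, 1.5, 3.0, 0.0, 0.0, 0.0]
```

Keeping the zero block for a missing part preserves a fixed-length descriptor, so concatenated vectors from different images stay comparable.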

A Global-Local Blur Disentangling Network for Dynamic Scene Deblurring

Applied Sciences ◽

10.3390/app11052174 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2174

Author(s):

Xiaoguang Li ◽

Feifan Yang ◽

Jianglu Huang ◽

Li Zhuo

Keyword(s):

Local Features ◽

Attention Mechanism ◽

Experimental Results ◽

Dynamic Scene ◽

Feature Maps ◽

Training Scheme ◽

Real Scene ◽

Global And Local

Images captured in a real scene usually suffer from complex non-uniform degradation, which includes both global and local blurs. It is difficult to handle such complex blur variations with a unified processing model. We propose a global-local blur disentangling network, which can effectively extract global and local blur features via two branches. A phased training scheme is designed to disentangle the global and local blur features; that is, each branch is trained on its own task-specific dataset. A branch attention mechanism is introduced to dynamically fuse global and local features. Complex blurry images are used to train the attention module and the reconstruction module. The visualized feature maps of the different branches indicate that our dual-branch network can decouple the global and local blur features efficiently. Experimental results show that the proposed dual-branch blur disentangling network can improve both the subjective and objective deblurring results for real captured images.
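A branch attention fusion of two feature vectors might look like the following minimal sketch. The abstract does not say how the attention weights are computed; scoring each branch with a dot product against a hypothetical learned vector, and the function names, are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def branch_attention_fuse(global_feat: Sequence[float],
                          local_feat: Sequence[float],
                          w_global: Sequence[float],
                          w_local: Sequence[float]
                          ) -> Tuple[List[float], Tuple[float, float]]:
    """Weighted sum of two branch features with softmax attention weights.

    Each branch gets a scalar score from a dot product with a (hypothetical)
    learned vector; the softmax makes the two weights positive and sum to 1.
    """
    s_global = sum(f * w for f, w in zip(global_feat, w_global))
    s_local = sum(f * w for f, w in zip(local_feat, w_local))
    a_global, a_local = softmax([s_global, s_local])
    fused = [a_global * g + a_local * l for g, l in zip(global_feat, local_feat)]
    return fused, (a_global, a_local)


fused, weights = branch_attention_fuse([1.0, 0.0], [0.0, 1.0],
                                       [0.0, 0.0], [0.0, 0.0])
print(fused, weights)  # [0.5, 0.5] (0.5, 0.5)
```

Because the weights depend on the input features, the fusion is dynamic: an image dominated by one blur type can shift weight toward the corresponding branch.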

Memory-Augmented Transformer for Remote Sensing Image Semantic Segmentation

Remote Sensing ◽

10.3390/rs13224518 ◽

2021 ◽

Vol 13 (22) ◽

pp. 4518

Author(s):

Xin Zhao ◽

Jiayi Guo ◽

Yueting Zhang ◽

Yirong Wu

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

Semantic Segmentation ◽

Local Features ◽

Local Feature ◽

Global Information ◽

Deep Convolutional Neural Networks ◽

Global Representation ◽

Local Feature Extraction ◽

Global And Local

The semantic segmentation of remote sensing images requires distinguishing local regions of different classes and exploiting a uniform global representation of same-class instances. Such requirements make it necessary for segmentation methods to extract discriminative local features between different classes and to explore representative features for all instances of a given class. While common deep convolutional neural networks (DCNNs) can effectively focus on local features, their limited receptive fields prevent them from obtaining consistent global information. In this paper, we propose a memory-augmented transformer (MAT) to effectively model both local and global information. The feature extraction pipeline of the MAT is split into a memory-based global relationship guidance module and a local feature extraction module. The local feature extraction module mainly consists of a transformer, which is used to extract features from the input images. The global relationship guidance module maintains a memory bank for the consistent encoding of global information. Global guidance is performed by memory interaction. Bidirectional information flow between the global and local branches is conducted by a memory-query module and a memory-update module, respectively. Experimental results on the ISPRS Potsdam and ISPRS Vaihingen datasets demonstrate that our method performs competitively with state-of-the-art methods.
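The memory-query and memory-update steps can be sketched with scaled dot-product attention for reading from the memory bank and an exponential moving average for writing to it. Both choices, the `momentum` value and the function names are assumptions for illustration; the abstract does not specify how MAT's modules work internally.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def memory_query(query: Vector, memory: List[Vector]) -> Vector:
    """Read from a memory bank with scaled dot-product attention.

    Each slot is weighted by softmax(q . m_i / sqrt(d)); the result is the
    weighted sum of the slots, i.e. global context for one local query.
    """
    d = len(query)
    scores = [dot(query, slot) / math.sqrt(d) for slot in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * slot[j] for w, slot in zip(weights, memory)) for j in range(d)]


def memory_update(memory: List[Vector], slot: int, feature: Vector,
                  momentum: float = 0.9) -> List[Vector]:
    """Return a new memory bank with one slot moved toward `feature` (EMA)."""
    updated = [list(m) for m in memory]
    updated[slot] = [momentum * old + (1.0 - momentum) * f
                     for old, f in zip(memory[slot], feature)]
    return updated


bank = [[0.0, 0.0], [1.0, 1.0]]
print(memory_update(bank, 0, [1.0, 1.0], momentum=0.5))  # [[0.5, 0.5], [1.0, 1.0]]
```

Reading (local to global context) and writing (local features refreshing the bank) are separate functions, which mirrors the two directions of information flow described above.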

Class-Wise Fully Convolutional Network for Semantic Segmentation of Remote Sensing Images

Remote Sensing ◽

10.3390/rs13163211 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3211

Author(s):

Tian Tian ◽

Zhengquan Chu ◽

Qian Hu ◽

Li Ma

Keyword(s):

Remote Sensing ◽

Image Interpretation ◽

Semantic Segmentation ◽

Remote Sensing Images ◽

Feature Maps ◽

Convolutional Network ◽

Fully Convolutional Network ◽

Semantic Labeling ◽

Benchmark Datasets ◽

Semantic Label

Semantic segmentation is a fundamental task in remote sensing image interpretation, which aims to assign a semantic label to every pixel in a given image. Accurate semantic segmentation is still challenging due to the complex distributions of various ground objects. With the development of deep learning, a series of segmentation networks represented by the fully convolutional network (FCN) has made remarkable progress on this problem, but segmentation accuracy still falls short of expectations. This paper focuses on the importance of class-specific features of different land cover objects and presents a novel end-to-end class-wise processing framework for segmentation. The proposed class-wise FCN (C-FCN) takes the form of an encoder-decoder structure with skip connections, in which the encoder is shared to produce general features for all categories and the decoder is class-wise to process class-specific features. In detail, class-wise transition (CT), class-wise up-sampling (CU), class-wise supervision (CS), and class-wise classification (CC) modules are designed to achieve the class-wise transfer, recover the resolution of class-wise feature maps, bridge the encoder and modified decoder, and implement class-wise classifications, respectively. Class-wise and group convolutions are adopted in the architecture to control the number of parameters. The method is tested on the public ISPRS 2D semantic labeling benchmark datasets. Experimental results show that the proposed C-FCN significantly improves segmentation performance compared with many state-of-the-art FCN-based networks, revealing its potential for accurate segmentation of complex remote sensing images.
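To see why group convolutions help control the number of parameters, note that a k x k convolution with g groups has k*k*(C_in/g)*C_out weights, because each output channel only sees C_in/g input channels. The helper below computes this count. Treating a class-wise convolution as one group per class, and the 6-group, 192-channel configuration, are hypothetical choices for illustration.

```python
def conv_params(kernel: int, c_in: int, c_out: int,
                groups: int = 1, bias: bool = False) -> int:
    """Weight count of a 2D convolution with a square kernel.

    With groups > 1, each output channel only sees c_in // groups input
    channels, so the weight tensor shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    weights = kernel * kernel * (c_in // groups) * c_out
    return weights + (c_out if bias else 0)


dense = conv_params(3, 192, 192)              # 331776
grouped = conv_params(3, 192, 192, groups=6)  # 55296, one group per (hypothetical) class
print(dense, grouped, dense // grouped)       # 331776 55296 6
```

The saving is exactly a factor of g for the weights, which is why per-class processing in the decoder need not multiply the parameter count by the number of classes.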

Semantics-Aligned Representation Learning for Person Re-Identification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6775 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11173-11180 ◽

Cited By ~ 3

Author(s):

Xin Jin ◽

Cuiling Lan ◽

Wenjun Zeng ◽

Guoqiang Wei ◽

Zhibo Chen

Keyword(s):

State Of The Art ◽

Representation Learning ◽

The State ◽

Feature Representation ◽

Texture Image ◽

Computationally Efficient ◽

Feature Maps ◽

Benchmark Datasets ◽

Texture Generation ◽

Base Network

Person re-identification (reID) aims to match person images to retrieve the ones with the same identity. This is a challenging task, as the images to be matched are generally semantically misaligned due to the diversity of human poses and capture viewpoints, incompleteness of the visible bodies (due to occlusion), etc. In this paper, we propose a framework that drives the reID network to learn semantics-aligned feature representations through carefully designed supervision. Specifically, we build a Semantics Aligning Network (SAN), which consists of a base network as encoder (SA-Enc) for reID and a decoder (SA-Dec) for reconstructing/regressing the densely semantics-aligned full texture image. We jointly train the SAN under the supervision of person re-identification and aligned texture generation. Moreover, at the decoder, besides the reconstruction loss, we add Triplet ReID constraints over the feature maps as perceptual losses. The decoder is discarded at inference, so our scheme is computationally efficient. Ablation studies demonstrate the effectiveness of our design. We achieve state-of-the-art performance on the benchmark datasets CUHK03, Market1501 and MSMT17, and on the partial person reID dataset Partial REID.
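Triplet ReID constraints are commonly expressed as a margin-based triplet loss. The minimal sketch below works on flattened feature vectors; the Euclidean distance, the margin of 0.3 and the flattening of feature maps are assumptions, not details given in the abstract.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.3) -> float:
    """Hinge on the distance gap: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Positive at distance 1, negative at distance 3: constraint satisfied.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 3.0]))  # 0.0
```

Only triplets that violate the margin produce a gradient, so the loss pulls same-identity features together without collapsing all features to one point.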

SceneEncoder: Scene-Aware Semantic Segmentation of Point Clouds with A Learnable Scene Descriptor

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/84 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jiachen Xu ◽

Jingyu Gong ◽

Jie Zhou ◽

Xin Tan ◽

Yuan Xie ◽

...

Keyword(s):

Essential Role ◽

State Of The Art ◽

Semantic Segmentation ◽

Point Clouds ◽

Local Features ◽

Local Region ◽

Global Information ◽

Benchmark Datasets ◽

Distinguishing Features ◽

Point Level

Besides local features, global information plays an essential role in semantic segmentation, yet recent works usually fail to explicitly extract meaningful global information and make full use of it. In this paper, we propose a SceneEncoder module that imposes scene-aware guidance to enhance the effect of global information. The module predicts a scene descriptor, which learns to represent the categories of objects existing in the scene and directly guides the point-level semantic segmentation by filtering out categories that do not belong to the scene. Additionally, to alleviate segmentation noise in local regions, we design a region similarity loss that propagates distinguishing features to neighboring points with the same label, enhancing the discriminative ability of point-wise features. We integrate our methods into several prevailing networks and conduct extensive experiments on the benchmark datasets ScanNet and ShapeNet. Results show that our methods greatly improve the performance of the baselines and achieve state-of-the-art performance.
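Filtering out categories absent from the scene can be sketched as masking each point's class probabilities with the scene descriptor and renormalizing. The hard 0.5 threshold, the fallback when every class is masked, and the function name are assumptions; the SceneEncoder may apply its descriptor differently (for example, as a soft multiplicative weight).

```python
from typing import List


def filter_by_scene(point_probs: List[List[float]], scene: List[float],
                    threshold: float = 0.5) -> List[List[float]]:
    """Suppress classes the scene descriptor marks as absent, then renormalize.

    scene[c] is read as a predicted probability that class c occurs in the
    scene; classes below `threshold` get zero probability at every point.
    """
    mask = [1.0 if s >= threshold else 0.0 for s in scene]
    filtered = []
    for probs in point_probs:
        masked = [p * m for p, m in zip(probs, mask)]
        z = sum(masked)
        # Fall back to the unfiltered prediction if every class was masked out.
        filtered.append([x / z for x in masked] if z > 0 else list(probs))
    return filtered


out = filter_by_scene([[0.5, 0.3, 0.2]], scene=[0.9, 0.1, 0.8])
print(out)  # class 1 is removed; the remaining probabilities sum to 1
```

The descriptor is predicted once per scene and applied to every point, which is how a single global prediction can correct many local ones.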

TextGTL: Graph-based Transductive Learning for Semi-supervised Text Classification via Structure-Sensitive Interpolation

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/369 ◽

2021 ◽

Author(s):

Chen Li ◽

Xutan Peng ◽

Hao Peng ◽

Jianxin Li ◽

Lihong Wang

Keyword(s):

Text Classification ◽

Data Augmentation ◽

Oriented Graph ◽

Representation Learning ◽

Free Text ◽

Transductive Learning ◽

Benchmark Datasets ◽

Significant Performance ◽

Performance Gains ◽

Global And Local

Compared with traditional sequential learning models, graph-based neural networks exhibit excellent properties when encoding text, such as the capacity to capture global and local information simultaneously. Especially in the semi-supervised scenario, propagating information along edges can effectively alleviate the sparsity of labeled data. In this paper, going beyond the existing architecture of heterogeneous word-document graphs, we investigate for the first time how to construct lightweight non-heterogeneous graphs based on different linguistic information to better serve free text representation learning. We then propose a novel semi-supervised framework for text classification, namely Text-oriented Graph-based Transductive Learning (TextGTL), which refines graph topology under theoretical guidance and shares information across different text graphs. TextGTL also performs attribute space interpolation based on dense substructures in graphs to predict low-entropy labels with high-quality feature nodes for data augmentation. To verify the effectiveness of TextGTL, we conduct extensive experiments on various benchmark datasets, observing significant performance gains over conventional heterogeneous graphs. In addition, we design ablation studies to examine the validity of the components in TextGTL.
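Propagating information along edges to compensate for sparse labels is the idea behind classic label propagation. The sketch below is that generic algorithm (neighbour averaging with labeled nodes clamped to one-hot), shown only to illustrate the principle; it is not TextGTL, and the function name and iteration count are assumptions.

```python
from typing import Dict, List


def propagate_labels(adj: List[List[float]], labels: Dict[int, int],
                     num_classes: int, iters: int = 100) -> List[List[float]]:
    """Classic label propagation on a weighted, undirected graph.

    Every unlabeled node repeatedly takes the weighted average of its
    neighbours' class distributions; labeled nodes stay clamped to one-hot.
    """
    n = len(adj)
    dist = [[0.0] * num_classes for _ in range(n)]
    for node, cls in labels.items():
        dist[node][cls] = 1.0
    for _ in range(iters):
        new_dist = []
        for i in range(n):
            degree = sum(adj[i])
            if i in labels or degree == 0:
                new_dist.append(dist[i])  # clamped or isolated node
                continue
            new_dist.append([sum(adj[i][j] * dist[j][c] for j in range(n)) / degree
                             for c in range(num_classes)])
        dist = new_dist
    return dist


# Path graph 0-1-2-3 with node 0 labeled class 0 and node 3 labeled class 1.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
result = propagate_labels(path, {0: 0, 3: 1}, num_classes=2)
print([max(range(2), key=lambda c: d[c]) for d in result])  # [0, 0, 1, 1]
```

On this path graph the unlabeled nodes converge to [2/3, 1/3] and [1/3, 2/3]: each node leans toward the labeled node it is closer to, which is how two labels can classify the whole graph.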

HARP Pro: Hierarchical Representation Learning based on global and local features for social networks

10.18293/seke2021-145 ◽

2021 ◽

Author(s):

Wei Zhang

Keyword(s):

Social Networks ◽

Representation Learning ◽

Local Features ◽

Hierarchical Representation ◽

Global And Local

Learning Fully Dense Neural Networks for Image Semantic Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019283 ◽

2019 ◽

Vol 33 ◽

pp. 9283-9290 ◽

Cited By ~ 3

Author(s):

Mingmin Zhen ◽

Jinglu Wang ◽

Lei Zhou ◽

Tian Fang ◽

Long Quan

Keyword(s):

Neural Network ◽

Neural Networks ◽

Loss Function ◽

Spatial Information ◽

Semantic Segmentation ◽

The Other ◽

Feature Maps ◽

Spatial Reconstruction ◽

Benchmark Datasets ◽

The One

Semantic segmentation is pixel-wise classification, which retains critical spatial information. “Feature map reuse” has been commonly adopted in CNN-based approaches to take advantage of feature maps in the early layers for later spatial reconstruction. Along this direction, we go a step further by proposing a fully dense neural network with an encoder-decoder structure that we abbreviate as FDNet. For each stage in the decoder module, feature maps of all the previous blocks are adaptively aggregated and fed forward as input. On the one hand, this reconstructs the spatial boundaries accurately. On the other hand, the network learns more efficiently thanks to more efficient gradient backpropagation. In addition, we propose a boundary-aware loss function that focuses more attention on the pixels near boundaries, which boosts the labeling of “hard examples”. FDNet achieves the best performance over previous works on two benchmark datasets, PASCAL VOC 2012 and NYUDv2, when training on other datasets is not considered.
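A boundary-aware loss can be sketched as a pixel-weighted cross-entropy that up-weights pixels adjacent to a label change. The 4-neighbour boundary test, the `boost` factor of 2 and normalizing by the sum of weights are assumptions for illustration; the abstract does not give FDNet's exact formulation.

```python
import math
from typing import List


def boundary_weights(labels: List[List[int]], boost: float = 2.0) -> List[List[float]]:
    """Weight 1 for interior pixels, `boost` for pixels whose 4-neighbourhood
    contains a different label."""
    h, w = len(labels), len(labels[0])
    weights = [[1.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    weights[y][x] = boost
                    break
    return weights


def boundary_aware_ce(probs: List[List[List[float]]], labels: List[List[int]],
                      boost: float = 2.0) -> float:
    """Weighted mean negative log-likelihood; probs[y][x] holds class probabilities."""
    weights = boundary_weights(labels, boost)
    total, norm = 0.0, 0.0
    for y, row in enumerate(labels):
        for x, cls in enumerate(row):
            total += -weights[y][x] * math.log(probs[y][x][cls])
            norm += weights[y][x]
    return total / norm


labels = [[0, 0, 1]]
print(boundary_weights(labels))  # [[1.0, 2.0, 2.0]]
uniform = [[[0.5, 0.5]] * 3]
print(boundary_aware_ce(uniform, labels))  # about 0.6931, i.e. log(2)
```

Normalizing by the total weight keeps the loss on the same scale as plain cross-entropy, so changing `boost` shifts emphasis toward boundary pixels without rescaling the overall loss.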

Nuclei Segmentation via a Deep Panoptic Model with Semantic Feature Fusion

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/121 ◽

2019 ◽

Cited By ~ 5

Author(s):

Dongnan Liu ◽

Donghao Zhang ◽

Yang Song ◽

Chaoyi Zhang ◽

Fan Zhang ◽

...

Keyword(s):

Feature Fusion ◽

Semantic Segmentation ◽

Local Features ◽

Information Loss ◽

Nuclei Segmentation ◽

Diagnosis And Prognosis ◽

Segmentation Methods ◽

Global And Local ◽

Instance Segmentation ◽

Segmentation Models

Automated detection and segmentation of individual nuclei in histopathology images is important for cancer diagnosis and prognosis. Due to the high variability of nuclei appearance and numerous overlapping objects, this task remains challenging. Deep learning-based semantic and instance segmentation models have been proposed to address these challenges, but these methods tend to concentrate on either global or local features and hence still suffer from information loss. In this work, we propose a panoptic segmentation model that incorporates an auxiliary semantic segmentation branch with the instance branch to integrate global and local features. Furthermore, we design a feature map fusion mechanism in the instance branch and a new mask generator to prevent information loss. Experimental results on three different histopathology datasets demonstrate that our method outperforms state-of-the-art nuclei segmentation methods and popular semantic and instance segmentation models by a large margin.

Object–Part Registration–Fusion Net for Fine-Grained Image Classification

Symmetry ◽

10.3390/sym13101838 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1838

Author(s):

Chih-Wei Lin ◽

Mengxiang Lin ◽

Jinfu Liu

Keyword(s):

Feature Fusion ◽

Bird Species ◽

Image Understanding ◽

Local Features ◽

Feature Maps ◽

Global Features ◽

Fine Grained ◽

Object Part ◽

Global And Local ◽

Fusion Feature

Classifying fine-grained categories (e.g., bird species, car and aircraft types) is a crucial problem in image understanding and is difficult due to intra-class and inter-class variance. Most existing fine-grained approaches individually utilize various parts and local information of objects to improve classification accuracy, but they neglect feature fusion between the object (global) and the object’s parts (local) as a mechanism to reinforce fine-grained features. In this paper, we present a novel framework, namely object–part registration–fusion Net (OR-Net), which considers the mechanism of registration and fusion between the features of an object (global) and its parts (local) for fine-grained classification. Our model learns fine-grained features from the global and local regions of the object and fuses these features with a registration mechanism to reinforce each region’s characteristics in the feature maps. Specifically, OR-Net consists of: (1) a multi-stream feature extraction net, which generates features from the global and various local regions of objects; and (2) a registration–fusion feature module, which calculates the dimension and location relationships between global (object) regions and local (parts) regions to generate registration information, and fuses the local features into the global features using this registration information to generate the fine-grained feature. Experiments run on symmetric GPU devices with symmetric mini-batches verify that OR-Net surpasses state-of-the-art approaches on the CUB-200-2011 (Birds), Stanford-Cars, and Stanford-Aircraft datasets.
