HROM: Learning High-Resolution Representation and Object-Aware Masks for Visual Object Tracking

Dawei Zhang; Zhonglong Zheng; Tianxiang Wang; Yiran He

doi:10.3390/s20174807

MR-InpaintNet: Toward Deep Multi-Resolution Learning for Progressive Image Inpainting

10.36227/techrxiv.16641241 ◽

2021 ◽

Author(s):

Huan Zhang ◽

Zhao Zhang ◽

Haijun Zhang ◽

Yi Yang ◽

Shuicheng Yan ◽

...

Keyword(s):

Deep Learning ◽

High Resolution ◽

Semantic Information ◽

Feature Fusion ◽

Image Inpainting ◽

Feature Learning ◽

Low Resolution ◽

Resolution Image ◽

Texture Information ◽

Multiple Resolutions

Deep learning based image inpainting methods have greatly improved performance thanks to the powerful representation ability of deep learning. However, current deep inpainting methods still tend to produce unreasonable structure and blurry texture, implying that image inpainting remains a challenging topic due to the ill-posed nature of the task. To address these issues, we propose a novel deep multi-resolution learning-based progressive image inpainting method, termed MR-InpaintNet, which takes damaged images of different resolutions as input and then fuses the multi-resolution features to repair the damaged images. The idea is motivated by the fact that images of different resolutions can provide different levels of feature information. Specifically, the low-resolution image provides strong semantic information and the high-resolution image offers detailed texture information. The middle-resolution image can be used to reduce the gap between low-resolution and high-resolution images, which can further refine the inpainting result. To fuse and improve the multi-resolution features, a novel multi-resolution feature learning (MRFL) process is designed, which consists of a multi-resolution feature fusion (MRFF) module, an adaptive feature enhancement (AFE) module and a memory enhanced mechanism (MEM) module for information preservation. The refined multi-resolution features then contain both rich semantic information and detailed texture information from multiple resolutions. We further pass the refined multi-resolution features through the decoder to obtain the recovered image. Extensive experiments on the Paris Street View, Places2 and CelebA-HQ datasets demonstrate that our proposed MR-InpaintNet can effectively recover textures and structures, and performs favorably against state-of-the-art methods.
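
The multi-resolution input described in this abstract can be illustrated with a small sketch. It only shows how one damaged image might be turned into high-, middle- and low-resolution copies; the 2x2 average pooling, the three-level depth and the function names `downsample2x` and `build_pyramid` are assumptions made for illustration and are not taken from the paper.

```python
def downsample2x(img):
    """Halve height and width by averaging each 2x2 block.

    Assumes a grayscale image stored as a list of rows with even dimensions.
    """
    h, w = len(img), len(img[0])
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def build_pyramid(img, levels=3):
    """Return [high, middle, low] resolution copies of img (hypothetical helper)."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


# A 4x4 "damaged" image: the zeros in the lower-right corner mark the hole.
damaged = [
    [8.0, 8.0, 4.0, 4.0],
    [8.0, 8.0, 4.0, 4.0],
    [2.0, 2.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
]
high, middle, low = build_pyramid(damaged)
print(middle)  # coarse layout of the scene
print(low)     # a single global summary value
```

In a real network each level would be encoded separately and the resulting features fused; the sketch stops at producing the inputs.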

Learning Soft Mask Based Feature Fusion with Channel and Spatial Attention for Robust Visual Object Tracking

Sensors ◽

10.3390/s20144021 ◽

2020 ◽

Vol 20 (14) ◽

pp. 4021 ◽

Cited By ~ 2

Author(s):

Mustansar Fiaz ◽

Arif Mahmood ◽

Soon Ki Jung

Keyword(s):

Object Tracking ◽

Spatial Attention ◽

Feature Fusion ◽

State Of The Art ◽

Feature Representation ◽

Visual Object ◽

Target Feature ◽

Visual Object Tracking ◽

Low Level ◽

Benchmark Datasets

We propose to improve visual object tracking by introducing a soft mask based low-level feature fusion technique. The proposed technique is further strengthened by integrating channel and spatial attention mechanisms. The approach is integrated within a Siamese framework to demonstrate its effectiveness for visual object tracking. The soft mask gives more importance to the target regions than to other regions, enabling effective target feature representation and increasing discriminative power. The low-level feature fusion improves the tracker's robustness against distractors. The channel attention identifies more discriminative channels for better target representation. The spatial attention complements the soft mask based approach to better localize target objects in challenging tracking scenarios. We evaluated our approach on five publicly available benchmark datasets and performed extensive comparisons with 39 state-of-the-art tracking algorithms. The proposed tracker demonstrates excellent performance compared to existing state-of-the-art trackers.
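
As a rough illustration of the two ideas in this abstract, the sketch below weights a feature map with a soft mask and then rescales channels with a simple attention weight. The Gaussian shape of the mask, the parameter-free sigmoid-of-mean channel weighting and all function names are assumptions chosen for brevity; the abstract does not specify how the paper builds its mask or attention modules.

```python
import math


def gaussian_soft_mask(h, w, cy, cx, sigma):
    """Soft mask that is 1.0 at the assumed target centre and decays outward."""
    return [
        [math.exp(-((r - cy) ** 2 + (c - cx) ** 2) / (2.0 * sigma ** 2))
         for c in range(w)]
        for r in range(h)
    ]


def apply_soft_mask(features, mask):
    """Multiply every channel (a list of rows) element-wise by the mask."""
    return [
        [[v * m for v, m in zip(frow, mrow)] for frow, mrow in zip(ch, mask)]
        for ch in features
    ]


def channel_attention(features):
    """Rescale each channel by sigmoid(global average); no learned weights."""
    weights = []
    for ch in features:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        weights.append(1.0 / (1.0 + math.exp(-mean)))
    reweighted = [
        [[v * wgt for v in row] for row in ch]
        for ch, wgt in zip(features, weights)
    ]
    return reweighted, weights


# Two 3x3 channels: one active everywhere, one silent.
features = [
    [[1.0] * 3 for _ in range(3)],
    [[0.0] * 3 for _ in range(3)],
]
mask = gaussian_soft_mask(3, 3, cy=1, cx=1, sigma=1.0)
masked = apply_soft_mask(features, mask)
attended, weights = channel_attention(masked)
print(weights)  # the active channel receives the larger weight
```

A trained tracker would learn the attention weights rather than derive them directly from the channel mean.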

A Two-Stream Deep Fusion Framework for High-Resolution Aerial Scene Classification

Computational Intelligence and Neuroscience ◽

10.1155/2018/8639367 ◽

2018 ◽

Vol 2018 ◽

pp. 1-13 ◽

Cited By ~ 25

Author(s):

Yunlong Yu ◽

Fuxian Liu

Keyword(s):

High Resolution ◽

Classification Accuracy ◽

Feature Fusion ◽

Saliency Detection ◽

Feature Representation ◽

Aerial Image ◽

Scene Classification ◽

Representation Method ◽

Learning Machine ◽

Elm Classifier

One of the challenging problems in understanding high-resolution remote sensing images is aerial scene classification. A well-designed feature representation method and classifier can improve classification accuracy. In this paper, we construct a new two-stream deep architecture for aerial scene classification. First, we use two pretrained convolutional neural networks (CNNs) as feature extractors to learn deep features from the original aerial image and from the aerial image processed through saliency detection, respectively. Second, two feature fusion strategies are adopted to fuse the two different types of deep convolutional features extracted by the original RGB stream and the saliency stream. Finally, we use the extreme learning machine (ELM) classifier for final classification with the fused features. The effectiveness of the proposed architecture is tested on four challenging datasets: the UC-Merced dataset with 21 scene categories, the WHU-RS dataset with 19 scene categories, the AID dataset with 30 scene categories, and the NWPU-RESISC45 dataset with 45 challenging scene categories. The experimental results demonstrate that our architecture achieves a significant classification accuracy improvement over all state-of-the-art references.

Low Resolution Information Also Matters: Learning Multi-Resolution Representations for Person Re-Identification

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/179 ◽

2021 ◽

Author(s):

Guoqing Zhang ◽

Yuhao Chen ◽

Weisi Lin ◽

Arun Chandran ◽

Xuan Jing

Keyword(s):

Feature Extraction ◽

High Resolution ◽

Feature Fusion ◽

State Of The Art ◽

Super Resolution ◽

Input Image ◽

Low Resolution ◽

Joint Learning ◽

Novel Method ◽

Valid Information

As a prevailing task in the video surveillance and forensics field, person re-identification (re-ID) aims to match person images captured from non-overlapping cameras. In unconstrained scenarios, person images often suffer from the resolution mismatch problem, i.e., Cross-Resolution Person Re-ID. To overcome this problem, most existing methods restore low resolution (LR) images to high resolution (HR) by super-resolution (SR). However, they only focus on HR feature extraction and ignore the valid information in the original LR images. In this work, we explore the influence of resolution on feature extraction and develop a novel method for cross-resolution person re-ID called Multi-Resolution Representations Joint Learning (MRJL). Our method consists of a Resolution Reconstruction Network (RRN) and a Dual Feature Fusion Network (DFFN). The RRN uses an input image to construct an HR version and an LR version with an encoder and two decoders, while the DFFN adopts a dual-branch structure to generate person representations from multi-resolution images. Comprehensive experiments on five benchmarks verify the superiority of the proposed MRJL over the relevant state-of-the-art methods.

Cascaded SR-GAN for Scale-Adaptive Low Resolution Person Re-identification

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/541 ◽

2018 ◽

Cited By ~ 25

Author(s):

Zheng Wang ◽

Mang Ye ◽

Fan Yang ◽

Xiang Bai ◽

Shin'ichi Satoh

Keyword(s):

High Resolution ◽

Super Resolution ◽

Feature Representation ◽

Image Feature ◽

Low Resolution ◽

Open World ◽

In Series ◽

High Resolution Images ◽

Public Dataset ◽

Image Super Resolution

Person re-identification (REID) is an important task in video surveillance and forensics applications. Most previous approaches are based on a key assumption that all person images have uniform and sufficiently high resolutions. In practice, various low resolutions and scale mismatches always exist in open-world REID. We name this kind of problem Scale-Adaptive Low Resolution Person Re-identification (SALR-REID). The most intuitive way to address this problem is to upscale the various low resolutions (not only low, but also of different scales) to a uniform high resolution. SR-GAN is one of the most competitive image super-resolution deep networks, designed with a fixed upscaling factor. However, it is still not suitable for the SALR-REID task, which requires a network that not only synthesizes high-resolution images with different upscaling factors, but also extracts discriminative image features for judging a person’s identity. (1) To promote the ability of scale-adaptive upscaling, we cascade multiple SR-GANs in series. (2) To supplement the ability of image feature representation, we plug in a re-identification network. With a unified formulation, a Cascaded Super-Resolution GAN (CSR-GAN) framework is proposed. Extensive evaluations on two simulated datasets and one public dataset demonstrate the advantages of our method over related state-of-the-art methods.
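
The cascading idea in point (1) can be sketched as follows: a stage with a fixed upscaling factor is applied repeatedly, so inputs of different sizes all reach the same target resolution after a different number of stages. Nearest-neighbour upscaling stands in for a learned SR-GAN stage here, and the factor of 2 per stage and the function names are assumptions for illustration only.

```python
def upscale2x(img):
    """Nearest-neighbour 2x upscaling; a stand-in for one learned SR stage."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def cascaded_upscale(img, target_height):
    """Apply fixed-factor stages in series until the target height is reached."""
    stages = 0
    while len(img) < target_height:
        img = upscale2x(img)
        stages += 1
    return img, stages


small = [[1, 2], [3, 4]]                  # 2x2 input needs two stages
medium = [[1, 1, 2, 2] for _ in range(4)]  # 4x4 input needs one stage

sr_small, n_small = cascaded_upscale(small, target_height=8)
sr_medium, n_medium = cascaded_upscale(medium, target_height=8)
print(n_small, n_medium)
```

Both inputs end up at 8x8, which is the uniform resolution a downstream re-identification network would then consume.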

An Example-Based Super-Resolution Algorithm for Selfie Images

The Scientific World JOURNAL ◽

10.1155/2016/8306342 ◽

2016 ◽

Vol 2016 ◽

pp. 1-12

Author(s):

Jino Hans William ◽

N. Venkateswaran ◽

Srinath Narayanan ◽

Sandeep Ramachandran

Keyword(s):

High Resolution ◽

State Of The Art ◽

Super Resolution ◽

Image Pair ◽

Image Patch ◽

Low Resolution ◽

Pixel Resolution ◽

Efficiency And Effectiveness ◽

Level Information ◽

Resolution Algorithm

A selfie is typically a self-portrait captured using the front camera of a smartphone. Most state-of-the-art smartphones are equipped with a high-resolution (HR) rear camera and a low-resolution (LR) front camera. As selfies are captured by the front camera with limited pixel resolution, their fine details are lost. This paper aims to improve the resolution of selfies by exploiting the fine details in HR images captured by the rear camera using an example-based super-resolution (SR) algorithm. HR images captured by the rear camera carry significant fine details and are used as exemplars to train an optimal matrix-value regression (MVR) operator. The MVR operator serves as an image-pair prior which learns the correspondence between the LR-HR patch pairs and is effectively used to super-resolve LR selfie images. The proposed MVR algorithm avoids vectorization of image patch pairs and preserves image-level information during both the learning and recovery processes. The proposed algorithm is compared with other state-of-the-art SR algorithms for efficiency and effectiveness, both qualitatively and quantitatively. The results validate that the proposed algorithm is efficient, as it requires less than 3 seconds to super-resolve an LR selfie, and effective, as it preserves sharp details without introducing any counterfeit fine details.

Service-based Processing of Gigapixel Images

10.24132/csrn.2021.3101.29 ◽

2021 ◽

Author(s):

Florian Fregien ◽

Sebastian Pasewaldt ◽

Jürgen Döllner ◽

Matthias Trapp

Keyword(s):

High Resolution ◽

High Performance ◽

Digital Images ◽

Concept Design ◽

Digital Cameras ◽

Low Resolution ◽

Design And Implementation ◽

Performance Requirements ◽

Runtime Performance ◽

Low Resolution Images

With the ongoing improvement of digital cameras and smartphones, more and more people can acquire high-resolution digital images. Due to their size and high performance requirements, such Gigapixel Images (GPIs) are often challenging to process and explore compared to conventional low-resolution images. To address this problem, this paper presents a service-based approach for GPI processing in a device-independent way using cloud-based processing. To this end, the concept, design, and implementation of GPI processing functionality in service-based architectures are presented and evaluated with respect to advantages, limitations, and runtime performance.

Rich CNN Features for Water-Body Segmentation from Very High Resolution Aerial and Satellite Imagery

Remote Sensing ◽

10.3390/rs13101912 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1912

Author(s):

Zhili Zhang ◽

Meng Lu ◽

Shunping Ji ◽

Huafen Yu ◽

Chenhui Nie

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

High Resolution ◽

Satellite Imagery ◽

Water Body ◽

Feature Fusion ◽

Feature Representation ◽

Water Bodies ◽

Feature Maps ◽

Very High

Accurately extracting water bodies from very high resolution (VHR) remote sensing imagery is a great challenge. The boundaries of a water body are commonly hard to identify due to the complex spectral mixtures caused by aquatic vegetation, distinct lake/river colors, silts near the bank, shadows from the surrounding tall plants, and so on. The diversity and semantic information of features need to be increased for a better extraction of water bodies from VHR remote sensing images. In this paper, we address these problems by designing a novel multi-feature extraction and combination module. This module consists of three feature extraction sub-modules based on spatial and channel correlations in feature maps at each scale, which extract the complete target information from the local space, larger space, and between-channel relationship to achieve a rich feature representation. Simultaneously, to better predict the fine contours of water bodies, we adopt a multi-scale prediction fusion module. In addition, to resolve the semantic inconsistency of feature fusion between the encoding stage and the decoding stage, we apply an encoder-decoder semantic feature fusion module to improve the fusion. We carry out extensive experiments on VHR aerial and satellite imagery, respectively. The results show that our method achieves state-of-the-art segmentation performance, surpassing both classic and recent methods. Moreover, our proposed method is robust in challenging water-body extraction scenarios.

Learning a Twofold Siamese Network for RGB-T Object Tracking

Journal of Circuits System and Computers ◽

10.1142/s0218126621500894 ◽

2020 ◽

pp. 2150089

Author(s):

Yangliu Kuai ◽

Dongdong Li ◽

Que Qian

Keyword(s):

Embedded System ◽

Object Tracking ◽

High Performance ◽

Target Location ◽

Optical Tracking ◽

Visual Object ◽

Siamese Network ◽

Final Response ◽

Image Pairs ◽

Thermal Sources

Visual object tracking is a key component of many instrumentation and measurement applications such as UAV systems and optical tracking and measuring systems. This paper investigates how to implement accurate RGB-T tracking by integrating the complementary information from RGB and thermal sources. Inspired by the success of Siamese networks in the RGB tracking field, we design a twofold Siamese network for RGB-T tracking, which is composed of an RGB branch and a thermal branch. Each branch is a Siamese network that computes similarities between the search image and the exemplar image. The parameters in the RGB branch are kept the same as in SiamFC. The thermal branch is initialized with the weights of the trained SiamFC network and fine-tuned with constructed thermal image pairs to better capture the target characteristics in the thermal data. Two criteria are further proposed to measure the confidence degrees of the response maps obtained from these two modalities. The final response map is computed by adaptively fusing them according to their confidence degrees. The maximum location on the final response map is identified as the target location, and the target scale is obtained through a simple multi-scale search. Experiments on the recently published benchmark RGB-T234 demonstrate the effectiveness of our proposed method when compared to other state-of-the-art trackers. The high performance of our proposed tracker makes it easy to implement in embedded systems.
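
The fusion step described in this abstract, where two response maps are combined according to their confidence and the maximum of the fused map gives the target location, can be sketched as below. The abstract does not state the two confidence criteria, so the peak-minus-mean score used here is a hypothetical stand-in, as are the function names.

```python
def peak_confidence(response):
    """Hypothetical confidence score: peak value minus mean value of the map."""
    flat = [v for row in response for v in row]
    return max(flat) - sum(flat) / len(flat)


def fuse_responses(rgb, thermal):
    """Blend two response maps with weights proportional to their confidence."""
    c_rgb, c_t = peak_confidence(rgb), peak_confidence(thermal)
    total = c_rgb + c_t
    w_rgb = c_rgb / total if total > 0 else 0.5  # fall back to equal weights
    w_t = 1.0 - w_rgb
    fused = [
        [w_rgb * a + w_t * b for a, b in zip(row_rgb, row_t)]
        for row_rgb, row_t in zip(rgb, thermal)
    ]
    return fused, w_rgb


def argmax2d(response):
    """Return (row, col) of the largest value in the map."""
    best = max((v, r, c) for r, row in enumerate(response) for c, v in enumerate(row))
    return best[1], best[2]


# The RGB map is nearly flat (ambiguous); the thermal map has a sharp peak.
rgb_map = [[0.5, 0.5, 0.5], [0.5, 0.6, 0.5], [0.5, 0.5, 0.5]]
thermal_map = [[0.1, 0.1, 0.9], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]

fused, w_rgb = fuse_responses(rgb_map, thermal_map)
target = argmax2d(fused)
print(target)  # location favoured by the more confident thermal branch
```

Because the thermal map is far more peaked, it dominates the fused map and the target is placed at its peak rather than at the weak RGB maximum.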
