Siamese High-Level Feature Refine Network for Visual Object Tracking

Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1918 ◽  
Author(s):  
Md. Maklachur Rahman ◽  
Md Rishad Ahmed ◽  
Lamyanba Laishram ◽  
Seock Ho Kim ◽  
Soon Ki Jung

Siamese network-based trackers are broadly applied to visual tracking problems due to their balanced performance in terms of speed and accuracy. Tracking desired objects in challenging scenarios remains one of the fundamental concerns in visual tracking. This paper proposes a feature-refined end-to-end tracking framework with real-time tracking speed and considerable performance. The feature refine network is incorporated to enhance the target feature representation power by utilizing high-level semantic information. In addition, it allows the network to capture the salient information needed to locate the target and learns to represent the target feature in a more generalized way, advancing the overall tracking performance, particularly on challenging sequences. However, the feature refine module alone cannot handle such challenges because of its limited discriminative ability. To overcome this difficulty, we employ an attention module inside the feature refine network that strengthens the tracker's ability to discriminate between target and background. Furthermore, we conduct extensive experiments on several popular tracking benchmarks to verify the proposed tracker's effectiveness, demonstrating that our model achieves state-of-the-art performance over other trackers.
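As a rough illustration of the attention idea described above, the sketch below applies a squeeze-and-excitation-style channel gate to a feature map. The sigmoid gating and the `gain` constant are assumptions for illustration, not the paper's actual architecture.

```python
import math

def channel_attention(feature_map, gain=4.0):
    """Squeeze-and-excitation-style channel attention (illustrative sketch).

    feature_map: list of channels, each a 2D list of floats.
    Channels with high mean activation get weights near 1; weak or
    negative channels are suppressed toward 0.
    """
    weights = []
    for ch in feature_map:
        # "Squeeze": global average pooling over the spatial dimensions.
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        # "Excitation": a sigmoid gate on the pooled statistic.
        weights.append(1.0 / (1.0 + math.exp(-gain * mean)))
    # Re-weight every spatial location of each channel by its gate value.
    return [[[v * w for v in row] for row in ch]
            for ch, w in zip(feature_map, weights)]
```

A real tracker would learn the excitation mapping; here the fixed sigmoid merely shows how per-channel weights modulate the feature map.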

Sensors ◽  
2020 ◽  
Vol 20 (14) ◽  
pp. 4021 ◽  
Author(s):  
Mustansar Fiaz ◽  
Arif Mahmood ◽  
Soon Ki Jung

We propose to improve visual object tracking by introducing a soft-mask-based low-level feature fusion technique, further strengthened by integrating channel and spatial attention mechanisms. The approach is integrated within a Siamese framework to demonstrate its effectiveness for visual object tracking. The soft mask gives more importance to the target regions than to the other regions, enabling effective target feature representation and increasing discriminative power. The low-level feature fusion improves the tracker's robustness against distractors. The channel attention identifies more discriminative channels for better target representation, while the spatial attention complements the soft-mask-based approach to better localize target objects in challenging tracking scenarios. We evaluated the proposed approach on five publicly available benchmark datasets and performed extensive comparisons with 39 state-of-the-art tracking algorithms. The proposed tracker demonstrates excellent performance compared with existing state-of-the-art trackers.
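The soft-mask idea can be sketched with a simple Gaussian weighting centred on the assumed target position; the Gaussian form and its parameters are illustrative assumptions, since the abstract does not specify how the mask is constructed.

```python
import math

def soft_mask(height, width, cy, cx, sigma):
    """Gaussian soft mask: values near 1 at the assumed target centre
    (cy, cx), falling off smoothly toward the background."""
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]

def apply_mask(feature, mask):
    """Element-wise re-weighting so target-region features dominate
    the representation, as the abstract describes."""
    return [[f * m for f, m in zip(frow, mrow)]
            for frow, mrow in zip(feature, mask)]
```

In the paper the mask modulates low-level features before fusion; here a uniform feature map suffices to show the effect of the weighting.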


Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-8
Author(s):  
Ming-xin Jiang ◽  
Xian-xian Luo ◽  
Tao Hai ◽  
Hai-yan Wang ◽  
Song Yang ◽  
...  

Visual object tracking is a fundamental component of many computer vision applications. Extracting robust object features is one of the most important steps in tracking. Since trackers formulated only on RGB data are usually affected by occlusion, appearance, or illumination variations, we propose a novel RGB-D tracking method based on genetic feature learning. Our approach formulates feature learning as an optimization problem. Owing to its suitability for parallel computing, the genetic algorithm (GA) converges quickly and offers excellent global optimization performance. At the same time, unlike handcrafted features and deep learning methods, GA can solve the feature representation problem without prior knowledge, and it does not require a large number of parameters to be learned. A candidate solution in the RGB or depth modality is represented as an encoding of an image, and the genetic feature is learned through population initialization, fitness evaluation, selection, crossover, and mutation. The proposed RGB-D tracker is evaluated on a popular benchmark dataset, and the experimental results indicate that our method achieves higher accuracy and faster tracking speed.
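The GA steps listed above (population initialization, fitness evaluation, selection, crossover, mutation) can be sketched as a minimal loop. The paper's fitness is tracking-specific; here it is an arbitrary callable on a bit string, and the operator choices (truncation selection, single-point crossover, one-bit mutation) are illustrative assumptions.

```python
import random

def evolve(fitness, length=16, pop_size=20, generations=40, seed=0):
    """Minimal genetic-algorithm loop over bit-string encodings."""
    rng = random.Random(seed)
    # Population initialization: random bit strings.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness evaluation + truncation selection: keep the top half.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)     # single-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(length)] ^= 1  # one-bit mutation
            children.append(child)
        pop = parents + children               # elitism: parents survive
    return max(pop, key=fitness)

# Toy fitness ("one-max"): count of set bits.
best = evolve(fitness=sum)
```

Keeping the parents unchanged each generation guarantees the best solution never degrades, which is one reason such loops converge reliably on simple fitness landscapes.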


2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Suryo Adhi Wibowo ◽  
Hansoo Lee ◽  
Eun Kyeong Kim ◽  
Sungshin Kim

The representation of the object is an important factor in building a robust visual object tracking algorithm. To address this problem, complementary learners that use color-histogram- and correlation-filter-based representations of the target object can be used, since each has advantages that compensate for the other's drawbacks in visual tracking. However, a tracking algorithm can still fail because of distractors, even when complementary learners are used for target representation. In this study, we show that, to handle a distractor, it must first be detected by learning the responses from the color-histogram- and correlation-filter-based representations. Then, to determine the target location, we decide whether the responses from the two representations should be merged or only the correlation-filter response should be used; this decision depends on the result of the distractor detection process. Experiments were performed on the widely used VOT2014 and VOT2015 benchmark datasets, verifying that our proposed method performs favorably compared with several state-of-the-art visual tracking algorithms.
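The decision rule described above can be sketched as follows. The distractor test (a second strong peak in the histogram response) and the equal merge weights are assumptions for illustration; the paper learns this decision from the responses.

```python
def estimate_position(cf_response, hist_response, distractor_threshold=0.5):
    """If the colour-histogram response shows a likely distractor
    (a second peak close in height to the first), rely on the
    correlation-filter response alone; otherwise merge the two maps.
    Returns the (row, col) of the chosen peak and the distractor flag."""
    peaks = sorted((v for row in hist_response for v in row), reverse=True)
    distractor = len(peaks) > 1 and peaks[1] > distractor_threshold * peaks[0]
    if distractor:
        merged = cf_response                       # trust the filter only
    else:
        merged = [[0.5 * c + 0.5 * h               # illustrative 50/50 merge
                   for c, h in zip(crow, hrow)]
                  for crow, hrow in zip(cf_response, hist_response)]
    best = max((v, (y, x))
               for y, row in enumerate(merged) for x, v in enumerate(row))
    return best[1], distractor
```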


Author(s):  
Rawan Fayez ◽  
Mohamed Taha Abd Elfattah Taha ◽  
Mahmoud Gadallah

Visual object tracking remains a challenge for intelligent control systems, with applications serving many purposes such as surveillance. The technology faces plenty of obstacles that must be addressed, including occlusion. In visual tracking, online learning techniques are most common because of their efficiency on most video sequences. Many object tracking techniques have emerged; however, drifting caused by noisy updates has been a stumbling block for the majority of them. This problem can be surmounted by updating the classifiers. The proposed system, called the Occluded Object Tracking System (OOTS), is a hybrid of two algorithms: the fast Circulant Structure Kernels with Color Names (CSK-CN) technique and the efficient occlusion-aware Real-time Object Tracking (ROT) algorithm. OOTS is evaluated on standard visual tracking benchmark databases, and the experimental results show that it is more reliable and provides more efficient tracking results than the other compared methods.


Sensors ◽  
2019 ◽  
Vol 19 (2) ◽  
pp. 387 ◽  
Author(s):  
Ming Du ◽  
Yan Ding ◽  
Xiuyun Meng ◽  
Hua-Liang Wei ◽  
Yifan Zhao

In recent years, regression trackers have drawn increasing attention in the visual object tracking community due to their favorable performance and easy implementation. These algorithms directly learn a mapping from dense samples around the target object to Gaussian-like soft labels. However, in many real applications, the extremely imbalanced distribution of training samples usually hinders the robustness and accuracy of regression trackers on test data. In this paper, we propose a novel and effective distractor-aware loss function that mitigates this issue by highlighting the significant domain and severely penalizing the pure background. In addition, we introduce a fully differentiable hierarchy-normalized concatenation connection to exploit abstractions across multiple convolutional layers. Extensive experiments were conducted on five challenging benchmark tracking datasets: OTB-13, OTB-15, TC-128, UAV-123, and VOT17. The results are promising and show that the proposed tracker performs much better than nearly all the compared state-of-the-art approaches.
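A minimal sketch of the loss idea above: squared regression error, up-weighted on the significant (near-target) domain and with an extra penalty wherever the prediction fires on pure background. The weighting constants and the 0.1 label cut-off are illustrative assumptions, not the paper's values.

```python
def distractor_aware_loss(pred, label, bg_penalty=4.0, fg_weight=2.0):
    """Distractor-aware regression loss sketch over flat response vectors.

    pred, label: equal-length sequences of floats (label is the
    Gaussian-like soft label; near 0 means pure background)."""
    loss = 0.0
    for p, y in zip(pred, label):
        err = (p - y) ** 2
        if y > 0.1:
            # Significant domain around the target: emphasised.
            loss += fg_weight * err
        else:
            # Pure background: severely penalise spurious positive peaks.
            loss += err + bg_penalty * max(p, 0.0) ** 2
    return loss / len(pred)
```

With these weights, a spurious 0.5-high peak on background costs more than an equally large under-shoot on the target, which is exactly the asymmetry the abstract describes.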


Algorithms ◽  
2018 ◽  
Vol 12 (1) ◽  
pp. 8 ◽  
Author(s):  
Wancheng Zhang ◽  
Yanmin Luo ◽  
Zhi Chen ◽  
Yongzhao Du ◽  
Daxin Zhu ◽  
...  

Discriminative correlation filters (DCFs) have been shown to perform superbly in visual object tracking. However, tracking remains challenging when target objects undergo complex scenarios such as occlusion, deformation, scale changes, and illumination changes. In this paper, we utilize the hierarchical features of convolutional neural networks (CNNs) and learn a spatial-temporal context correlation filter on convolutional layers. The translation is then estimated by fusing the response scores of the filters on three convolutional layers. For scale estimation, we learn a discriminative correlation filter to estimate scale from the best-confidence results. Furthermore, we propose a re-detection activation discrimination method to improve robustness in the case of tracking failure, and an adaptive model update method to reduce tracking drift caused by noisy updates. We evaluate the proposed tracker with DCFs and deep features on the OTB benchmark datasets. The tracking results demonstrate that the proposed algorithm is superior to several state-of-the-art DCF methods in terms of accuracy and robustness.
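The translation estimate described above, fusing filter responses from three convolutional layers, can be sketched as a weighted sum of response maps. The layer weights here are illustrative assumptions; in practice deeper layers are often weighted more heavily for their semantic robustness.

```python
def fuse_responses(responses, weights=(1.0, 0.5, 0.25)):
    """Fuse per-layer correlation-filter response maps and return the
    (row, col) of the maximum of the fused map as the translation.

    responses: three 2D response maps of equal size (one per layer)."""
    h, w = len(responses[0]), len(responses[0][0])
    fused = [[sum(wt * r[y][x] for wt, r in zip(weights, responses))
              for x in range(w)] for y in range(h)]
    # The translation is read off at the fused maximum response.
    return max(((fused[y][x], (y, x))
                for y in range(h) for x in range(w)))[1]
```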


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Jinping Sun

The target and background change continuously during long-term tracking, which makes accurate target prediction very challenging. Correlation filter algorithms based on handcrafted features have limited feature representation ability and thus struggle to meet actual needs. To improve tracking performance and robustness, an improved hierarchical convolutional feature model is incorporated into a correlation filter framework for visual object tracking. First, the objective function is designed by lasso regression modeling, and a sparse, time-series low-rank filter is learned to increase the interpretability of the model. Second, the features of the last layer and the second pooling layer of the convolutional neural network are extracted to predict the target position from coarse to fine. In addition, response maps are computed with the filters learned from the first frame and from the current frame, and the target position is obtained by finding the maximum response value in the response map. The filter model is updated only when these two maximum responses meet a threshold condition. The proposed tracker is evaluated by simulation analysis on the TC-128 and OTB2015 benchmarks, comprising more than 100 video sequences. Extensive experiments demonstrate that the proposed tracker achieves competitive performance against state-of-the-art trackers. Its distance precision rate and overlap success rate on OTB2015 are 0.829 and 0.695, respectively. The proposed algorithm effectively solves the long-term object tracking problem in complex scenes.
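The update gate described above (refresh the model only when both maximum responses pass a threshold) can be sketched in a few lines; the threshold values are illustrative assumptions, not the paper's.

```python
def should_update(resp_first, resp_current, tau_first=0.3, tau_current=0.5):
    """Gate for the model update: require that the maximum responses of
    both the first-frame filter and the current filter exceed their
    thresholds, so noisy frames (occlusion, drift) do not corrupt
    the model.

    resp_first, resp_current: flat sequences of response values."""
    return max(resp_first) >= tau_first and max(resp_current) >= tau_current
```

Gating on the first-frame filter as well as the current one protects against slow drift: the current filter may keep responding strongly even after it has latched onto a distractor, while the first-frame filter still encodes the original target.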


Author(s):  
Yang Li ◽  
Zhan Xu ◽  
Jianke Zhu

Although the convolutional neural network (CNN) has shown promising capacity in many computer vision tasks, applying it to visual tracking is far from solved. Existing methods either employ a large external dataset for exhaustive pre-training or suffer from less satisfactory accuracy and robustness. To track a single target across a wide range of videos, we present a novel correlation filter neural network architecture, together with a complete visual tracking pipeline. The proposed approach is a special case of CNN whose initialization does not need any pre-training on an external dataset. The initialization of the network exploits cyclic sampling to achieve appealing discriminative capability, while the network update scheme adopts back-propagation to capture new appearance variations. The tracking pipeline integrates both aspects well by making them complementary to each other. We validate our tracker on the OTB-2013 benchmark, where it obtains promising results compared with most existing representative trackers.
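The cyclic-sampling initialization mentioned above has a well-known closed form in the linear, single-channel case: ridge regression over all cyclic shifts of a signal diagonalizes in the Fourier domain. The 1-D sketch below (naive DFT, illustrative regularization constant) shows that trick only; it is not the paper's full network.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * i * k / n)
                for k in range(n)) for i in range(n)]

def idft(X):
    """Inverse DFT, returning the real parts."""
    n = len(X)
    return [sum(X[i] * cmath.exp(2j * cmath.pi * i * k / n)
                for i in range(n)).real / n for k in range(n)]

def train_cf(x, y, lam=0.01):
    """Correlation filter learned over all cyclic shifts of x in closed
    form: w_hat = conj(x_hat) * y_hat / (conj(x_hat) * x_hat + lam)."""
    X, Y = dft(x), dft(y)
    W = [Xi.conjugate() * Yi / (Xi.conjugate() * Xi + lam)
         for Xi, Yi in zip(X, Y)]
    return idft(W)

def respond(w, z):
    """Response of filter w on signal z via an element-wise product
    in the Fourier domain (evaluates all cyclic shifts at once)."""
    W, Z = dft(w), dft(z)
    return idft([Wi * Zi for Wi, Zi in zip(W, Z)])
```

Training on a signal with a label peaking at index 0 and then evaluating on the same signal approximately reproduces the label, which is the property the cyclic-sampling initialization exploits.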


Electronics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 854
Author(s):  
Yuxiang Yang ◽  
Weiwei Xing ◽  
Shunli Zhang ◽  
Qi Yu ◽  
Xiaoyu Guo ◽  
...  

Visual object tracking with Siamese networks has achieved favorable accuracy and speed. However, the features used in Siamese networks contain spatially redundant information, which increases computation and limits the networks' discriminative ability. To address this issue, we present a novel frequency-aware feature (FAF) method for robust visual object tracking in complex scenes. Unlike previous works, which select features from different channels or layers, the proposed method factorizes the feature map into multiple frequencies and reduces the spatially redundant low-frequency information. By reducing the resolution of the low-frequency map, computation is saved and the receptive field of the layer is increased, yielding more discriminative information. To further improve FAF, we design an innovative data-independent augmentation for object tracking that improves the tracker's discriminative ability by enhancing the linear representation among training samples through convex combinations of images and tags. Finally, a joint judgment strategy is proposed that combines intersection-over-union (IoU) and classification scores to adjust the bounding-box result and improve tracking accuracy. Extensive experiments on five challenging benchmarks demonstrate that our FAF method performs favorably against state-of-the-art tracking methods while running at around 45 frames per second.
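The "convex combinations of images and tags" augmentation above can be sketched in mixup style; the Beta(alpha, alpha) sampling of the mixing coefficient is an assumption for illustration, since the abstract does not state how the coefficient is drawn.

```python
import random

def mixup(img_a, tag_a, img_b, tag_b, alpha=0.4, seed=None):
    """Data-independent augmentation: form a convex combination of two
    training images (flat float sequences here) and their scalar tags.

    Returns the mixed image, mixed tag, and mixing coefficient lam."""
    rng = random.Random(seed)
    lam = rng.betavariate(alpha, alpha)  # assumed mixing distribution
    img = [lam * a + (1 - lam) * b for a, b in zip(img_a, img_b)]
    tag = lam * tag_a + (1 - lam) * tag_b
    return img, tag, lam
```

Because image and tag are mixed with the same coefficient, the network is encouraged to behave linearly between training samples, which is the enhanced linear representation the abstract refers to.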


Author(s):  
Jiqing Zhang ◽  
Kai Zhao ◽  
Bo Dong ◽  
Yingkai Fu ◽  
Yuxin Wang ◽  
...  
