An Anchor-Free Siamese Network with Multi-Template Update for Object Tracking

Electronics ◽  
2021 ◽  
Vol 10 (9) ◽  
pp. 1067
Author(s):  
Tongtong Yuan ◽  
Wenzhu Yang ◽  
Qian Li ◽  
Yuxia Wang

Siamese trackers are widely used in various fields because they balance speed and accuracy. Compared with anchor-based methods, anchor-free approaches can reach faster speeds without any drop in precision. Inspired by the Siamese network and the anchor-free idea, an anchor-free Siamese network (AFSN) with multi-template updates for object tracking is proposed. To improve tracking performance, a dual-fusion method is adopted in which the multi-layer features and the multiple prediction results are each combined. The low-level feature maps are concatenated with the high-level feature maps to make full use of both spatial and semantic information. To make the results as stable as possible, the final results are obtained by combining multiple prediction results. For template updating, a high-confidence multi-template update mechanism is used: the average peak-to-correlation energy (APCE) determines whether the template should be updated. We use the anchor-free network to implement object tracking in a per-pixel manner, which computes the object category and bounding boxes directly. Experimental results indicate that the average overlap and success rate of the proposed algorithm increase by about 5% and 10%, respectively, compared to the SiamRPN++ algorithm on the GOT-10k (Generic Object Tracking Benchmark) dataset.
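As a rough illustration of the APCE check described in this abstract, the sketch below computes APCE from a response map using the commonly cited definition, APCE = |F_max - F_min|^2 / mean((F - F_min)^2), and applies a simple update rule. The function names, the `ratio` threshold, and the comparison against a history of past APCE values are assumptions made for illustration; the abstract does not state the paper's actual update criterion.

```python
def apce(response):
    """Average peak-to-correlation energy of a 2-D response map (list of rows)."""
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    # Mean squared distance of every response value from the minimum.
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    if energy == 0:
        return 0.0  # flat response map: no distinguishable peak
    return (f_max - f_min) ** 2 / energy


def should_update_template(response, apce_history, ratio=0.5):
    """Hypothetical rule: update only if the current APCE is at least
    `ratio` times the mean of previously accepted APCE values."""
    if not apce_history:
        return True  # nothing to compare against yet (assumption)
    mean_apce = sum(apce_history) / len(apce_history)
    return apce(response) >= ratio * mean_apce


# A single sharp peak gives a high APCE; two equal peaks give a low one.
sharp = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
ambiguous = [[1, 0], [0, 1]]
print(apce(sharp), apce(ambiguous))
print(should_update_template(sharp, [9.0, 9.0]))
print(should_update_template(ambiguous, [9.0, 9.0]))
```

A sharp, unimodal response yields a large APCE, while a response with competing peaks (occlusion, distractors) yields a small one, which is why the value is a reasonable gate for template updates.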

2021 ◽  
Author(s):  
Kosuke Honda ◽  
Hamido Fujita

In recent years, template-based methods such as Siamese network trackers and Correlation Filter (CF) based trackers have achieved state-of-the-art performance in several benchmarks. Recent Siamese network trackers use deep features extracted from convolutional neural networks to locate the target. However, the tracking performance of these trackers decreases when distractors similar to the target are present or when the target object is deformed. CF-based trackers, on the other hand, use handcrafted features (e.g., HOG features) to spatially locate the target. These two approaches have complementary characteristics due to differences in learning methods, features used, and the size of search regions. We also found that these trackers are complementary in terms of benchmark performance. Therefore, we propose the “Complementary Tracking framework using Average peak-to-correlation energy” (CTA). CTA is a generic object tracking framework that connects CF-trackers and Siamese-trackers in parallel and exploits their complementary features. In CTA, when a tracking failure of the Siamese tracker is detected using average peak-to-correlation energy (APCE), an evaluation index of the response map, the CF-tracker corrects the output. In experiments on OTB100, CTA significantly improves performance over the original tracker for several combinations of Siamese-trackers and CF-trackers.


Author(s):  
D. Zhang ◽  
J. Lv ◽  
Z. Cheng ◽  
Y. Bai ◽  
Y. Cao

Abstract. With the development of deep learning object tracking methods in recent years, the fully convolutional Siamese network tracker SiamFC has become a classic deep learning object tracking algorithm. Because the accuracy of SiamFC drops against complex backgrounds, this paper introduces an attention mechanism into SiamFC that applies channel and spatial weighting to the feature maps obtained by convolving the input image. The CNN backbone of the algorithm is also adjusted, yielding a Siamese network combined with an attention mechanism for object tracking. This strengthens the effectiveness of feature extraction and enhances the ability of the network to discriminate targets. The algorithm is tested on the OTB2015, VOT2016 and VOT2017 datasets and compared with multiple object tracking algorithms. Experimental results show that the proposed algorithm handles complex backgrounds in object tracking better and has certain advantages over the other algorithms.
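To make the idea of channel and spatial weighting concrete, here is a minimal, parameter-free sketch. It is an assumption-laden simplification: real attention modules normally use learned layers to produce the weights, and the abstract does not describe the paper's exact design. Here the channel weight is the sigmoid of each channel's global average, and the spatial weight is the sigmoid of the cross-channel mean at each location; the function name `channel_spatial_attention` is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_spatial_attention(feat):
    """Re-weight a feature map given as C channels of H x W nested lists.

    Both weight sets are derived from the input features and applied
    multiplicatively (a simplification; no learned parameters).
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    # Channel weights: one scalar per channel from global average pooling.
    ch_w = [sigmoid(sum(sum(row) for row in ch) / (h * w)) for ch in feat]
    # Spatial weights: one scalar per location from the cross-channel mean.
    sp_w = [[sigmoid(sum(feat[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]
    return [[[feat[k][i][j] * ch_w[k] * sp_w[i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]


# Two channels of size 2 x 2; the output keeps the same shape.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[2.0, 2.0], [2.0, 2.0]]]
weighted = channel_spatial_attention(features)
print(len(weighted), len(weighted[0]), len(weighted[0][0]))
```

Channels and locations with stronger average activation keep more of their magnitude, which is the basic effect an attention module aims for when suppressing background clutter.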


2020 ◽  
Vol 2020 ◽  
pp. 1-19
Author(s):  
Chenpu Li ◽  
Qianjian Xing ◽  
Zhenguo Ma ◽  
Ke Zang

With the development of deep learning, trackers based on convolutional neural networks (CNNs) have made significant achievements in visual tracking over the years. The fully convolutional Siamese network (SiamFC) is a typical representative of those trackers. SiamFC uses a two-branch CNN architecture and models visual tracking as a general similarity-learning problem. However, the feature maps it uses for visual tracking come only from the last layer of the CNN. Those features contain high-level semantic information but lack sufficiently detailed texture information. As a result, the SiamFC tracker tends to drift when other objects of the same category are present or when the contrast between the target and the background is very low. To address this problem, we design a novel tracking algorithm that combines a correlation filter tracker and the SiamFC tracker into one framework. In this framework, the correlation filter tracker uses Histograms of Oriented Gradients (HOG) and color name (CN) features to guide the SiamFC tracker. The framework also contains an evaluation criterion that we design to evaluate the tracking results of the two trackers. If this criterion finds that the SiamFC tracker has failed, our framework uses the tracking result from the correlation filter tracker to correct SiamFC. In this way, the defects of SiamFC’s high-level semantic features are remedied by the HOG and CN features. Our algorithm thus provides a framework that combines two trackers and makes them complement each other in visual tracking. To the best of our knowledge, our algorithm is also the first to design an evaluation criterion using a correlation filter and zero padding to evaluate the tracking result. Comprehensive experiments are conducted on the Online Tracking Benchmark (OTB), Temple Color (TC128), Benchmark for UAV Tracking (UAV-123), and Visual Object Tracking (VOT) Benchmark. The results show that our algorithm achieves competitive performance compared with the baseline tracker and several other state-of-the-art trackers.


2020 ◽  
Vol 8 (1) ◽  
pp. 35-46
Author(s):  
Yongpeng Zhao ◽  
Lasheng Yu ◽  
Xiaopeng Zheng

Siamese networks have drawn increasing interest in the field of visual object tracking due to their balance of precision and efficiency. However, Siamese trackers use relatively shallow backbone networks, such as AlexNet, and therefore do not take full advantage of the capabilities of modern deep convolutional neural networks (CNNs). Moreover, the feature representations of the target object in a Siamese tracker are extracted from the last layer of the CNN and mainly capture semantic information, which makes the tracker's precision relatively low and causes it to drift easily in the presence of similar distractors. In this paper, a new nonpadding residual unit (NPRU) is designed and used to stack a 22-layer deep ResNet, referred to as ResNet22. Using ResNet22 as the backbone network, we build a deep Siamese network that greatly enhances tracking performance. Considering that the different levels of the CNN's feature maps represent different aspects of the target object, we aggregate different deep convolutional layers to make use of ResNet22's multilevel feature maps, which form hyperfeature representations of targets. The designed deep hyper Siamese network is named DHSiam. Experimental results show that DHSiam achieves significant improvement on multiple benchmark datasets.


2018 ◽  
Author(s):  
Oscar A. Douglas-Gallardo ◽  
David A. Sáez ◽  
Stefan Vogt-Geisse ◽  
Esteban Vöhringer-Martinez

Carboxylation reactions represent a very special class of chemical reactions that is characterized by the presence of a carbon dioxide (CO2) molecule as a reactive species within its global chemical equation. These reactions serve as a fundamental mechanism for CO2 fixation and thus for building up more complex molecules through different technological and biochemical processes. In this context, a correct description of the CO2 electronic structure turns out to be crucial for studying the chemical and electronic properties associated with this kind of reaction. Here, a systematic study of the CO2 electronic structure and its contribution to different carboxylation reaction electronic energies has been carried out by means of several high-level ab initio post-Hartree-Fock (post-HF) and Density Functional Theory (DFT) calculations for a set of biochemical and inorganic systems. We have found that for a correct description of the CO2 electronic correlation energy it is necessary to include post-CCSD(T) contributions (beyond the gold standard). These high-order excitations are required to properly describe the interactions of the four π-electrons associated with the two degenerate π-molecular orbitals of the CO2 molecule. Likewise, our results show that in some reactions it is possible to obtain accurate reaction electronic energy values with computationally less demanding methods when the error in the electronic correlation energy compensates between reactants and products. Furthermore, the provided post-HF reference values allowed us to validate different DFT exchange-correlation functionals combined with different basis sets for chemical reactions that are relevant in biochemical CO2-fixing enzymes.


Symmetry ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 266 ◽  
Author(s):  
Yifeng Wang ◽  
Zhijiang Zhang ◽  
Ning Zhang ◽  
Dan Zeng

The one-shot multiple object tracking (MOT) framework has drawn more and more attention in the MOT research community due to its advantage in inference speed. However, the tracking accuracy of current one-shot approaches can be inferior to that of their two-stage counterparts. The reasons are two-fold. First, motion information is often neglected due to the single-image input. Second, detection and re-identification (ReID) are two different tasks with different focuses, so joining them at the training stage can lead to suboptimal performance. To alleviate these limitations, we propose a one-shot network named Motion and Correlation-Multiple Object Tracking (MAC-MOT). MAC-MOT introduces a motion enhance attention module (MEA) and a dual correlation attention module (DCA). MEA computes differences between adjacent feature maps, which enhances the motion-related features while suppressing irrelevant information. The DCA module focuses on decoupling the detection task and the re-identification task to strike a balance and reduce the competition between these two tasks. Moreover, symmetry is a core design idea in our proposed framework, reflected in the Siamese-based deep learning backbone networks, the dual-stream image input, and the dual correlation attention module. Our proposed approach is evaluated on the popular multiple object tracking benchmarks MOT16 and MOT17. We demonstrate that the proposed MAC-MOT achieves better performance than the state-of-the-art (SOTA) baselines.


2021 ◽  
pp. 1-10
Author(s):  
Mona M. Moussa ◽  
Rasha Shoitan ◽  
Mohamed S. Abdallah

Finding the common objects in a set of images is considered one of the recent challenges in different computer vision tasks. Most conventional approaches are unsupervised or weakly supervised co-localization methods; however, these methods require producing a huge number of region proposals. This paper tackles this problem by exploiting the benefits of supervised learning to localize the common object in a set of unlabeled images containing multiple objects or with no common objects. Two stages are proposed to localize the common objects: the candidate box generation stage and the matching and clustering stage. In the candidate box generation stage, the objects are localized and surrounded by bounding boxes. The matching and clustering stage is applied to the generated bounding boxes and creates a distance matrix based on a trained Siamese network to reflect the matching percentage. Hierarchical clustering uses the generated distance matrix to find the common objects and create a cluster for each one. The proposed method is trained on the PASCAL VOC 2007 dataset and is evaluated through different experiments on the PASCAL VOC 2007 6×2 and Object Discovery datasets. The results reveal that the proposed method outperforms the conventional methods by 8% to 40% in terms of the CorLoc metric.
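For readers unfamiliar with the CorLoc metric reported here, the sketch below shows how it is commonly computed: the fraction of images in which the predicted box overlaps some ground-truth box with an intersection-over-union (IoU) of at least 0.5. The box format `(x1, y1, x2, y2)`, the function names, and the one-prediction-per-image setup are assumptions for illustration, not details taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(predictions, ground_truths, threshold=0.5):
    """Fraction of images whose predicted box matches any ground-truth box.

    predictions:   one predicted box per image
    ground_truths: one list of ground-truth boxes per image
    """
    hits = sum(
        1 for pred, gts in zip(predictions, ground_truths)
        if any(iou(pred, gt) >= threshold for gt in gts)
    )
    return hits / len(predictions)


preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [[(0, 0, 10, 10)], [(5, 5, 15, 15)]]
print(corloc(preds, gts))  # first image is a hit, second is a miss
```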

