scholarly journals Learning to Track Aircraft in Infrared Imagery

2020 ◽  
Vol 12 (23) ◽  
pp. 3995
Author(s):  
Sijie Wu ◽  
Kai Zhang ◽  
Shaoyi Li ◽  
Jie Yan

Airborne target tracking in infrared imagery remains a challenging task. The airborne target usually has a low signal-to-noise ratio and shows different visual patterns. The features adopted in the visual tracking algorithm are usually deep features pre-trained on ImageNet, which are not tightly coupled with the current video domain and therefore might not be optimal for infrared target tracking. To this end, we propose a new approach to learn the domain-specific features, which can be adapted to the current video online without pre-training on a large datasets. Considering that only a few samples of the initial frame can be used for online training, general feature representations are encoded to the network for a better initialization. The feature learning module is flexible and can be integrated into tracking frameworks based on correlation filters to improve the baseline method. Experiments on airborne infrared imagery are conducted to demonstrate the effectiveness of our tracking algorithm.

2020 ◽  
Vol 34 (07) ◽  
pp. 11386-11393 ◽  
Author(s):  
Shuang Li ◽  
Chi Liu ◽  
Qiuxia Lin ◽  
Binhui Xie ◽  
Zhengming Ding ◽  
...  

Tremendous research efforts have been made to thrive deep domain adaptation (DA) by seeking domain-invariant features. Most existing deep DA models only focus on aligning feature representations of task-specific layers across domains while integrating a totally shared convolutional architecture for source and target. However, we argue that such strongly-shared convolutional layers might be harmful for domain-specific feature learning when source and target data distribution differs to a large extent. In this paper, we relax a shared-convnets assumption made by previous DA methods and propose a Domain Conditioned Adaptation Network (DCAN), which aims to excite distinct convolutional channels with a domain conditioned channel attention mechanism. As a result, the critical low-level domain-dependent knowledge could be explored appropriately. As far as we know, this is the first work to explore the domain-wise convolutional channel activation for deep DA networks. Moreover, to effectively align high-level feature distributions across two domains, we further deploy domain conditioned feature correction blocks after task-specific layers, which will explicitly correct the domain discrepancy. Extensive experiments on three cross-domain benchmarks demonstrate the proposed approach outperforms existing methods by a large margin, especially on very tough cross-domain learning tasks.


2021 ◽  
Author(s):  
Qingxing Cao ◽  
Wentao Wan ◽  
Xiaodan Liang ◽  
Liang Lin

Despite the significant success in various domains, the data-driven deep neural networks compromise the feature interpretability, lack the global reasoning capability, and can’t incorporate external information crucial for complicated real-world tasks. Since the structured knowledge can provide rich cues to record human observations and commonsense, it is thus desirable to bridge symbolic semantics with learned local feature representations. In this chapter, we review works that incorporate different domain knowledge into the intermediate feature representation.These methods firstly construct a domain-specific graph that represents related human knowledge. Then, they characterize node representations with neural network features and perform graph convolution to enhance these symbolic nodes via the graph neural network(GNN).Lastly, they map the enhanced node feature back into the neural network for further propagation or prediction. Through integrating knowledge graphs into neural networks, one can collaborate feature learning and graph reasoning with the same supervised loss function and achieve a more effective and interpretable way to introduce structure constraints.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2864
Author(s):  
Yuanping Zhang ◽  
Xiumei Huang ◽  
Ming Yang

To meet the challenge of video target tracking, based on a self-organization mapping network (SOM) and correlation filter, a long-term visual tracking algorithm is proposed. Objects in different videos or images often have completely different appearance, therefore, the self-organization mapping neural network with the characteristics of signal processing mechanism of human brain neurons is used to perform adaptive and unsupervised features learning. A reliable method of robust target tracking is proposed, based on multiple adaptive correlation filters with a memory function of target appearance at the same time. Filters in our method have different updating strategies and can carry out long-term tracking cooperatively. The first is the displacement filter, a kernelized correlation filter that combines contextual characteristics to precisely locate and track targets. Secondly, the scale filters are used to predict the changing scale of a target. Finally, the memory filter is used to maintain the appearance of the target in long-term memory and judge whether the target has failed to track. If the tracking fails, the incremental learning detector is used to recover the target tracking in the way of sliding window. Several experiments show that our method can effectively solve the tracking problems such as severe occlusion, target loss and scale change, and is superior to the state-of-the-art methods in the aspects of efficiency, accuracy and robustness.


2010 ◽  
Vol 32 (9) ◽  
pp. 2052-2057
Author(s):  
Xiao-yan Sun ◽  
Jian-dong Li ◽  
Yan-hui Chen ◽  
Wen-zhu Zhang ◽  
Jun-liang Yao

2021 ◽  
Vol 13 (4) ◽  
pp. 742
Author(s):  
Jian Peng ◽  
Xiaoming Mei ◽  
Wenbo Li ◽  
Liang Hong ◽  
Bingyu Sun ◽  
...  

Scene understanding of remote sensing images is of great significance in various applications. Its fundamental problem is how to construct representative features. Various convolutional neural network architectures have been proposed for automatically learning features from images. However, is the current way of configuring the same architecture to learn all the data while ignoring the differences between images the right one? It seems to be contrary to our intuition: it is clear that some images are easier to recognize, and some are harder to recognize. This problem is the gap between the characteristics of the images and the learning features corresponding to specific network structures. Unfortunately, the literature so far lacks an analysis of the two. In this paper, we explore this problem from three aspects: we first build a visual-based evaluation pipeline of scene complexity to characterize the intrinsic differences between images; then, we analyze the relationship between semantic concepts and feature representations, i.e., the scalability and hierarchy of features which the essential elements in CNNs of different architectures, for remote sensing scenes of different complexity; thirdly, we introduce CAM, a visualization method that explains feature learning within neural networks, to analyze the relationship between scenes with different complexity and semantic feature representations. The experimental results show that a complex scene would need deeper and multi-scale features, whereas a simpler scene would need lower and single-scale features. Besides, the complex scene concept is more dependent on the joint semantic representation of multiple objects. Furthermore, we propose the framework of scene complexity prediction for an image and utilize it to design a depth and scale-adaptive model. It achieves higher performance but with fewer parameters than the original model, demonstrating the potential significance of scene complexity.


Author(s):  
Tianyang Xu ◽  
Zhenhua Feng ◽  
Xiao-Jun Wu ◽  
Josef Kittler

AbstractDiscriminative Correlation Filters (DCF) have been shown to achieve impressive performance in visual object tracking. However, existing DCF-based trackers rely heavily on learning regularised appearance models from invariant image feature representations. To further improve the performance of DCF in accuracy and provide a parsimonious model from the attribute perspective, we propose to gauge the relevance of multi-channel features for the purpose of channel selection. This is achieved by assessing the information conveyed by the features of each channel as a group, using an adaptive group elastic net inducing independent sparsity and temporal smoothness on the DCF solution. The robustness and stability of the learned appearance model are significantly enhanced by the proposed method as the process of channel selection performs implicit spatial regularisation. We use the augmented Lagrangian method to optimise the discriminative filters efficiently. The experimental results obtained on a number of well-known benchmarking datasets demonstrate the effectiveness and stability of the proposed method. A superior performance over the state-of-the-art trackers is achieved using less than $$10\%$$ 10 % deep feature channels.


Sign in / Sign up

Export Citation Format

Share Document