Object Tracking in RGB-T Videos Using Modal-Aware Attention Network and Competitive Learning

Sensors ◽  
2020 ◽  
Vol 20 (2) ◽  
pp. 393
Author(s):  
Hui Zhang ◽  
Lei Zhang ◽  
Li Zhuo ◽  
Jing Zhang

Object tracking in RGB-thermal (RGB-T) videos is increasingly used in many fields due to the all-weather and all-day working capability of the dual-modality imaging system, as well as the rapid development of low-cost and miniaturized infrared camera technology. However, it is still very challenging to effectively fuse dual-modality information to build a robust RGB-T tracker. In this paper, an RGB-T object tracking algorithm based on a modal-aware attention network and competitive learning (MaCNet) is proposed, which includes a feature extraction network, modal-aware attention network, and classification network. The feature extraction network adopts the form of a two-stream network to extract features from each modality image. The modal-aware attention network integrates the original data, establishes an attention model that characterizes the importance of different feature layers, and then guides the feature fusion to enhance the information interaction between modalities. The classification network constructs a modality-egoistic loss function through three parallel binary classifiers acting on the RGB branch, the thermal infrared branch, and the fusion branch, respectively. Guided by the training strategy of competitive learning, the entire network is fine-tuned in the direction of the optimal fusion of the dual modalities. Extensive experiments on several publicly available RGB-T datasets show that our tracker has superior performance compared with other recent RGB-T and RGB tracking approaches.
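The attention-guided fusion described above can be illustrated with a toy sketch: pool each modality's feature map per channel, turn the pooled responses into importance weights, and use the weights to reweight channels before summing the two streams. This is a minimal NumPy sketch of the general idea only; the function name and the softmax-over-pooled-channels weighting are illustrative assumptions, not the paper's actual MaCNet architecture.

```python
import numpy as np

def modal_aware_fusion(feat_rgb, feat_t):
    """Fuse RGB and thermal feature maps of shape (C, H, W) using
    per-modality channel-attention weights. Simplified sketch, not MaCNet."""
    def channel_weights(feat):
        # global average pooling per channel, then softmax -> importance weights
        pooled = feat.mean(axis=(1, 2))                 # shape (C,)
        e = np.exp(pooled - pooled.max())
        return e / e.sum()

    w_rgb = channel_weights(feat_rgb)
    w_t = channel_weights(feat_t)
    # reweight each modality's channels, then sum for the fused representation
    return feat_rgb * w_rgb[:, None, None] + feat_t * w_t[:, None, None]

rgb = np.random.rand(8, 16, 16)
thermal = np.random.rand(8, 16, 16)
fused = modal_aware_fusion(rgb, thermal)
```

In the actual tracker this weighting is learned jointly with the two-stream backbone; the sketch only shows how channel importance can gate the contribution of each modality.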

2021 ◽  
Vol 13 (10) ◽  
pp. 1950
Author(s):  
Cuiping Shi ◽  
Xin Zhao ◽  
Liguo Wang

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve the classification performance, many studies have increased the depth of convolutional neural networks (CNNs) and expanded the width of the network to extract more deep features, thereby increasing the complexity of the model. To solve this problem, in this paper, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. Firstly, we propose two convolution combination modules for feature extraction, through which the deep features of images can be fully extracted by the cooperation of multiple convolutions. Then, feature weights are calculated, and the extracted deep features are sent to the attention mechanism for further feature extraction. Next, all of the extracted features are fused by multiple branches. Finally, depthwise separable convolution and asymmetric convolution are implemented to greatly reduce the number of parameters. The experimental results show that, compared with some state-of-the-art methods, the proposed method still has a great advantage in classification accuracy with very few parameters.
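The parameter savings from depthwise separable and asymmetric convolutions can be checked with simple counting: a standard k×k convolution costs k·k·C_in·C_out parameters, a depthwise separable one costs k·k·C_in + C_in·C_out, and an asymmetric factorisation replaces k×k with k×1 followed by 1×k. The helper names below are illustrative, not from the paper; biases are omitted.

```python
def conv_params(k, c_in, c_out):
    # parameters of a standard k x k convolution (no bias)
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # k x k depthwise conv (one filter per input channel) + 1x1 pointwise conv
    return k * k * c_in + c_in * c_out

def asymmetric_params(k, c_in, c_out):
    # factorise k x k into k x 1 followed by 1 x k (mid width = c_out, an assumption)
    return k * c_in * c_out + k * c_out * c_out

standard = conv_params(3, 64, 64)                 # 36864
separable = depthwise_separable_params(3, 64, 64) # 4672, ~7.9x fewer
```

For a 3×3 layer with 64 input and output channels, the separable variant needs roughly an eighth of the parameters, which is where the "very few parameters" claim comes from.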


Author(s):  
SAVITHA SIVAN ◽  
THUSNAVIS BELLA MARY. I

Content-based image retrieval (CBIR) is an active research area with the development of multimedia technologies and has become a source of exact and fast retrieval. The aim of CBIR is to search and retrieve images from a large database and find the best match for a given query. Achieving accuracy and efficiency on high-dimensional datasets with an enormous number of samples is challenging. In this paper, content-based image retrieval is performed using various features such as color, shape, and texture, and a comparison is made among them. The performance of the retrieval system is evaluated depending upon the features extracted from an image, using precision and recall rates. Haralick texture features were analyzed at 0°, 45°, 90°, and 180° using the gray-level co-occurrence matrix. Color feature extraction was done using color moments. Structured features and multiple-feature fusion are two main technologies to ensure retrieval accuracy in the system. GIST is considered one of the main structured features. It was experimentally observed that a combination of these techniques yielded superior performance to individual features. The results for the most efficient combination of techniques have also been presented and optimized for each class of query.
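A gray-level co-occurrence matrix counts how often pairs of gray levels co-occur at a given displacement (one displacement per angle), and Haralick contrast is then a weighted sum over that matrix. A minimal NumPy sketch, assuming a small quantised image; the displacement-per-angle convention shown is one common choice:

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Gray-level co-occurrence matrix for one displacement (dx, dy),
    normalised to sum to 1."""
    m = np.zeros((levels, levels), dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[img[y, x], img[y2, x2]] += 1
    return m / m.sum()

def contrast(p):
    """Haralick contrast: sum over (i, j) of (i - j)^2 * p(i, j)."""
    i, j = np.indices(p.shape)
    return ((i - j) ** 2 * p).sum()

img = np.array([[0, 0, 1],
                [1, 2, 2],
                [2, 2, 3]])
p0 = glcm(img, 1, 0, 4)   # 0-degree displacement; 45/90/etc. change (dx, dy)
c0 = contrast(p0)
```

The other Haralick statistics (energy, entropy, correlation) are computed from the same normalised matrix, one matrix per angle.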


2021 ◽  
Vol 12 ◽  
Author(s):  
Guofeng Yang ◽  
Guipeng Chen ◽  
Cong Li ◽  
Jiangfan Fu ◽  
Yang Guo ◽  
...  

The accurate classification of crop pests and diseases is essential for their prevention and control. However, datasets of pest and disease images collected in the field usually exhibit long-tailed distributions with heavy category imbalance, posing great challenges for a deep recognition and classification model. This paper proposes a novel convolutional rebalancing network to classify rice pests and diseases from image datasets collected in the field. To improve the classification performance, the proposed network includes a convolutional rebalancing module, an image augmentation module, and a feature fusion module. In the convolutional rebalancing module, instance-balanced sampling is used to extract features of the images in the rice pest and disease dataset, while reversed sampling is used to improve feature extraction of the categories with fewer images in the dataset. Building on the convolutional rebalancing module, we design an image augmentation module to augment the training data effectively. To further enhance the classification performance, a feature fusion module fuses the image features learned by the convolutional rebalancing module and ensures that the feature extraction of the imbalanced dataset is more comprehensive. Extensive experiments on the large-scale imbalanced dataset of rice pests and diseases (18,391 images), publicly available plant image datasets (Flavia, Swedish Leaf, and UCI Leaf) and pest image datasets (SMALL and IP102) verify the robustness of the proposed network, and the results demonstrate its superior performance over state-of-the-art methods, with an accuracy of 97.58% on the rice pest and disease image dataset. We conclude that the proposed network can provide an important tool for the intelligent control of rice pests and diseases in the field.
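The two sampling schemes differ only in how per-class sampling probabilities are derived from class sizes: instance-balanced sampling draws each image with equal probability (so a class's probability is proportional to its size), while reversed sampling weights classes by the inverse of their size so that tail classes are seen more often. A small sketch under those common definitions; the function name is illustrative:

```python
def sampling_probs(counts, mode="instance"):
    """Per-class sampling probability from class sizes.
    'instance': proportional to class size (every image equally likely).
    'reversed': proportional to 1/size, favouring rare classes."""
    if mode == "instance":
        weights = [float(c) for c in counts]
    else:
        weights = [1.0 / c for c in counts]
    total = sum(weights)
    return [w / total for w in weights]

counts = [900, 90, 10]                        # a long-tailed toy distribution
inst = sampling_probs(counts, "instance")     # head class dominates
rev = sampling_probs(counts, "reversed")      # tail class dominates
```

In the paper's rebalancing module the two branches feed the same backbone, so the network sees both the natural distribution and the reversed one during training.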


2020 ◽  
Vol 12 (16) ◽  
pp. 2646
Author(s):  
Shiyu Zhang ◽  
Li Zhuo ◽  
Hui Zhang ◽  
Jiafeng Li

Visual object tracking in unmanned aerial vehicle (UAV) videos plays an important role in a variety of fields, such as traffic data collection, traffic monitoring, as well as film and television shooting. However, it is still challenging to track the target robustly in UAV vision tasks due to several factors, such as appearance variation, background clutter, and severe occlusion. In this paper, we propose a novel two-stage UAV tracking framework, which includes a target detection stage based on multifeature discrimination and a bounding-box estimation stage based on the instance-aware attention network. In the target detection stage, we explore a feature representation scheme for a small target that integrates handcrafted features, low-level deep features, and high-level deep features. Then, the correlation filter is used to roughly predict the target location. In the bounding-box estimation stage, an instance-aware intersection over union (IoU)-Net is integrated together with an instance-aware attention network to estimate the target size based on the bounding-box proposals generated in the target detection stage. Extensive experimental results on the UAV123 and UAVDT datasets show that our tracker, running at over 25 frames per second (FPS), has superior performance as compared with state-of-the-art UAV visual tracking approaches.
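The IoU criterion that the bounding-box estimation stage optimises is the standard intersection-over-union of two boxes. A self-contained sketch with the usual (x1, y1, x2, y2) corner convention:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # corners of the intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0
```

IoU-Net style estimators predict this overlap for each proposal and refine the box by gradient ascent on the predicted value; the function above is only the metric itself, not the network.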


2019 ◽  
Vol 16 (4) ◽  
pp. 317-324
Author(s):  
Liang Kong ◽  
Lichao Zhang ◽  
Xiaodong Han ◽  
Jinfeng Lv

Protein structural class prediction is beneficial to protein structure and function analysis. Exploring good feature representation is a key step for this prediction task. Prior works have demonstrated the effectiveness of secondary structure based feature extraction methods, especially for low-similarity protein sequences. However, the prediction accuracies still remain limited. To explore the potential of secondary structure information, a novel feature extraction method based on a generalized chaos game representation of predicted secondary structure is proposed. Each protein sequence is converted into a 20-dimensional distance-related statistical feature vector to characterize the distribution of secondary structure elements and segments. The feature vectors are then fed into a support vector machine classifier to predict the protein structural class. Our experiments on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640) show that the proposed method achieves superior performance to the state-of-the-art methods. It is anticipated that our method could be extended to other graphical representations of protein sequence and be helpful in future protein research.
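The idea of summarising a predicted secondary structure string over the H/E/C alphabet into a fixed-length statistical vector can be sketched as follows. This is a deliberately simplified stand-in (composition plus per-element segment counts and mean segment lengths), not the paper's 20 distance-related features derived from the chaos game representation:

```python
from itertools import groupby

def ss_features(ss):
    """Toy statistical feature vector from a secondary structure string over
    H (helix), E (strand), C (coil). Illustrative only; the paper's actual
    20-dimensional distance-related features differ."""
    n = len(ss)
    feats = []
    for s in "HEC":                       # per-element composition
        feats.append(ss.count(s) / n)
    # run-length encode the string into (element, segment length) pairs
    segments = [(k, len(list(g))) for k, g in groupby(ss)]
    for s in "HEC":                       # segment count and mean length
        lengths = [length for k, length in segments if k == s]
        feats.append(len(lengths) / n)
        feats.append(sum(lengths) / len(lengths) if lengths else 0.0)
    return feats

vec = ss_features("CCHHHHCCEEEECCHHHCC")
```

A vector of this kind is then a direct input for a standard SVM classifier, which is the setup the paper uses.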


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2117
Author(s):  
Hui Han ◽  
Zhiyuan Ren ◽  
Lin Li ◽  
Zhigang Zhu

Automatic modulation classification (AMC) is playing an increasingly important role in spectrum monitoring and cognitive radio. As communication and electronic technologies develop, the electromagnetic environment becomes increasingly complex. The high background noise level and large dynamic input have become the key problems for AMC. This paper proposes a feature fusion scheme based on deep learning, which attempts to fuse features from different domains of the input signal to obtain a more stable and efficient representation of the signal modulation types. We consider the complementarity among features that can be used to suppress the influence of the background noise interference and large dynamic range of the received (intercepted) signals. Specifically, the time-series signals are transformed into the frequency domain by fast Fourier transform (FFT) and Welch power spectrum analysis, followed by a convolutional neural network (CNN) and a stacked auto-encoder (SAE), respectively, for detailed and stable frequency-domain feature representations. Considering the complementary information in the time domain, the instantaneous amplitude (phase) statistics and higher-order cumulants (HOC) are extracted as the statistical features for fusion. Based on the fused features, a probabilistic neural network (PNN) is designed for automatic modulation classification. The simulation results demonstrate the superior performance of the proposed method. It is worth noting that the classification accuracy can reach 99.8% when the signal-to-noise ratio (SNR) is 0 dB.
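The front-end feature extraction can be sketched before any network is involved: an FFT magnitude spectrum, a Welch-style averaged periodogram, and low-order statistics as a stand-in for the instantaneous-amplitude moments and higher-order cumulants. A NumPy sketch for a real-valued signal under those simplifying assumptions (the paper works with complex received signals and a fuller HOC set):

```python
import numpy as np

def amc_features(x, nseg=4):
    """Simplified fusion-feature front end for a real signal: FFT magnitudes,
    a Welch-style averaged periodogram, and C20/C40-like statistics."""
    spec = np.abs(np.fft.rfft(x))
    # Welch-style PSD: average periodograms of non-overlapping segments
    seglen = len(x) // nseg
    segs = x[:nseg * seglen].reshape(nseg, seglen)
    psd = np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0)
    # second-order moment and a simplified fourth-order cumulant (zero-mean x)
    c20 = np.mean(x ** 2)
    c40 = np.mean(x ** 4) - 3 * c20 ** 2
    return np.concatenate([spec, psd, [c20, c40]])

t = np.linspace(0, 1, 256, endpoint=False)
feats = amc_features(np.sin(2 * np.pi * 8 * t))
```

In the proposed scheme each of these domains then feeds its own learner (CNN, SAE) before the fused representation reaches the PNN classifier.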


Author(s):  
Tianyang Xu ◽  
Zhenhua Feng ◽  
Xiao-Jun Wu ◽  
Josef Kittler

Discriminative Correlation Filters (DCF) have been shown to achieve impressive performance in visual object tracking. However, existing DCF-based trackers rely heavily on learning regularised appearance models from invariant image feature representations. To further improve the performance of DCF in accuracy and provide a parsimonious model from the attribute perspective, we propose to gauge the relevance of multi-channel features for the purpose of channel selection. This is achieved by assessing the information conveyed by the features of each channel as a group, using an adaptive group elastic net inducing independent sparsity and temporal smoothness on the DCF solution. The robustness and stability of the learned appearance model are significantly enhanced by the proposed method as the process of channel selection performs implicit spatial regularisation. We use the augmented Lagrangian method to optimise the discriminative filters efficiently. The experimental results obtained on a number of well-known benchmarking datasets demonstrate the effectiveness and stability of the proposed method. A superior performance over the state-of-the-art trackers is achieved using less than 10% of the deep feature channels.
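The channel-group regularisation can be sketched numerically: an l2,1 term (sum of per-channel l2 norms) drives whole channels to zero, while a squared-l2 term keeps the solution stable, and channel selection amounts to keeping only the channels with the largest group norms. A NumPy sketch of the idea under those assumptions; the real regulariser is solved jointly with the filter and also couples filters temporally:

```python
import numpy as np

def channel_group_penalty(W, lam1, lam2):
    """Group elastic-net style penalty on a filter W of shape (C, H, W):
    l2,1 over channels (sparsity) plus squared l2 (stability)."""
    group_norms = np.sqrt((W ** 2).sum(axis=(1, 2)))   # one norm per channel
    return lam1 * group_norms.sum() + lam2 * (W ** 2).sum()

def select_channels(W, keep_ratio=0.1):
    """Keep the top fraction of channels by group norm, zeroing the rest."""
    norms = np.sqrt((W ** 2).sum(axis=(1, 2)))
    k = max(1, int(len(norms) * keep_ratio))
    mask = np.zeros(len(norms), dtype=bool)
    mask[np.argsort(norms)[-k:]] = True
    return W * mask[:, None, None], mask

W = np.random.rand(50, 5, 5)
W_sparse, mask = select_channels(W, keep_ratio=0.1)   # keeps 5 of 50 channels
```

The "less than 10% of deep feature channels" result corresponds to a small `keep_ratio` emerging from the learned sparsity rather than being fixed by hand as it is here.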


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 319
Author(s):  
Yi Wang ◽  
Xiao Song ◽  
Guanghong Gong ◽  
Ni Li

Due to the rapid development of deep learning and artificial intelligence techniques, denoising via neural networks has drawn great attention for its flexibility and excellent performance. However, in most convolutional network denoising methods, the convolution kernel is only one layer deep, and features of distinct scales are neglected. Moreover, in the convolution operation, all channels are treated equally and the relationships between channels are not considered. In this paper, we propose a multi-scale feature extraction-based normalized attention neural network (MFENANN) for image denoising. In MFENANN, we define a multi-scale feature extraction block to extract and combine features at distinct scales of the noisy image. In addition, we propose a normalized attention network (NAN) to learn the relationships between channels, which smooths the optimization landscape and speeds up the convergence process for training an attention model. Moreover, we introduce the NAN into convolutional network denoising, in which each channel receives its own gain so that channels can play different roles in the subsequent convolutions. To verify the effectiveness of the proposed MFENANN, we conducted experiments on both grayscale and color image sets with noise levels ranging from 0 to 75. The experimental results show that, compared with some state-of-the-art denoising methods, the restored images of MFENANN have larger peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) values and a better overall appearance.
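The PSNR metric used to report these denoising results is defined directly from the mean squared error between the clean image and the restored one. A self-contained sketch for 8-bit images (SSIM is more involved and omitted here):

```python
import numpy as np

def psnr(clean, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means a closer restoration."""
    mse = np.mean((clean.astype(float) - restored.astype(float)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10 * np.log10(peak ** 2 / mse)

clean = np.full((8, 8), 128.0)
noisy = clean + 10.0          # a constant error of 10 gray levels, MSE = 100
value = psnr(clean, noisy)    # about 28.13 dB
```

A denoiser that "has larger PSNR values" is therefore simply producing restorations with smaller mean squared error against the ground-truth clean images.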


2020 ◽  
Vol 2020 (1) ◽  
Author(s):  
Guangyi Yang ◽  
Xingyu Ding ◽  
Tian Huang ◽  
Kun Cheng ◽  
Weizheng Jin

The communications industry has changed remarkably with the development of fifth-generation cellular networks. Images, as an indispensable component of communication, have attracted wide attention, so finding a suitable approach to assess image quality is important. We therefore propose a deep learning model for image quality assessment (IQA) based on an explicit-implicit dual-stream network. We use frequency-domain kurtosis features based on the wavelet transform to represent explicit features, and spatial features extracted by a convolutional neural network (CNN) to represent implicit features. On this basis, we constructed an explicit-implicit (EI) parallel deep learning model, namely the EI-IQA model. The EI-IQA model is based on VGGNet, which extracts the spatial-domain features; by adding the parallel wavelet-kurtosis frequency-domain features, the number of network layers of VGGNet is reduced, so the training parameters and the sample requirements decline. We verified, by cross-validation on different databases, that the wavelet kurtosis feature fusion method based on deep learning extracts features more completely and generalises better. Thus, the method better simulates the human visual perception system, and its predictions are closer to subjective human judgements. The source code of the proposed EI-IQA model is available on GitHub at https://github.com/jacob6/EI-IQA.
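The explicit branch's features, kurtosis of wavelet subbands, can be sketched with a single level of the 2-D Haar transform. This is a minimal NumPy sketch under the assumption of one Haar level on an even-sized grayscale image; the paper's wavelet choice and feature layout may differ:

```python
import numpy as np

def haar_subbands(img):
    """One level of a 2-D Haar wavelet transform, returning the four
    subbands (LL, LH, HL, HH). Assumes even height and width."""
    a = (img[0::2, :] + img[1::2, :]) / 2   # vertical average
    d = (img[0::2, :] - img[1::2, :]) / 2   # vertical detail
    ll = (a[:, 0::2] + a[:, 1::2]) / 2
    lh = (a[:, 0::2] - a[:, 1::2]) / 2
    hl = (d[:, 0::2] + d[:, 1::2]) / 2
    hh = (d[:, 0::2] - d[:, 1::2]) / 2
    return ll, lh, hl, hh

def kurtosis(x):
    """Sample kurtosis (non-excess): E[(x - mu)^4] / sigma^4."""
    x = np.ravel(x)
    mu, var = x.mean(), x.var()
    return np.mean((x - mu) ** 4) / (var ** 2) if var > 0 else 0.0

img = np.random.rand(16, 16)
subbands = haar_subbands(img)
feats = [kurtosis(b) for b in subbands]   # one explicit feature per subband
```

In EI-IQA, features of this kind run in parallel with the CNN spatial features, which is what lets the VGG-based branch get away with fewer layers.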


Nanomaterials ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1700
Author(s):  
In-Cheol Sun ◽  
SeongHoon Jo ◽  
Diego Dumani ◽  
Wan Su Yun ◽  
Hong Yeol Yoon ◽  
...  

Lymph node mapping is important in cancer immunotherapy because the morphology of lymph nodes is one of the crucial evaluation criteria of immune responses. We developed new theragnostic glycol-chitosan-coated gold nanoparticles (GC-AuNPs), which highlighted lymph nodes in ultrasound-guided photoacoustic (US/PA) imaging. Moreover, the ovalbumin epitope was conjugated to GC-AuNPs (OVA-GC-AuNPs) to deliver tumor antigen to lymph-node-resident macrophages. In vitro studies proved the vigorous endocytosis activity of J774A.1 macrophages and the consequent strong photoacoustic signals from them. The macrophages also presented the tumor antigen when OVA-GC-AuNPs were used for cellular uptake. After lingual injection of GC-AuNPs into healthy mice, cervical lymph nodes were visible in a US/PA imaging system with high contrast. Three-dimensional analysis of the lymph nodes revealed that the accumulation of GC-AuNPs in the lymph node increased with post-injection time. Histological analysis showed GC-AuNPs or OVA-GC-AuNPs located in the subcapsular and medullary sinuses, where macrophages are abundant. Our new theragnostic GC-AuNPs show superior performance in US/PA imaging of lymph nodes without targeting moieties or complex surface modification. Simultaneously, GC-AuNPs were able to deliver tumor antigens so that macrophages present the OVA epitope at targeted lymph nodes, which would be valuable for cancer immunotherapy.

