Dynamic Detection and Recognition of Objects Based on Sequential RGB Images

Shuai Dong; Zhihua Yang; Wensheng Li; Kun Zou

doi:10.3390/fi13070176

Future Internet ◽

10.3390/fi13070176 ◽

2021 ◽

Vol 13 (7) ◽

pp. 176

Author(s):

Shuai Dong ◽

Zhihua Yang ◽

Wensheng Li ◽

Kun Zou

Keyword(s):

Level Set ◽

Mapping Function ◽

Feature Space ◽

Retrieval Task ◽

Generative Adversarial Network ◽

Fine Grained ◽

Adversarial Network ◽

Dynamic Detection ◽

Instance Segmentation ◽

Detection And Recognition

Conveyors are commonly used in industrial production lines and automated sorting systems. Many applications require fast, reliable, and dynamic detection and recognition of the objects on conveyors. To this end, we design a framework that involves three subtasks: one-class instance segmentation (OCIS), multiobject tracking (MOT), and zero-shot fine-grained recognition of 3D objects (ZSFGR3D). A new level set map network (LSMNet) and a multiview redundancy-free feature network (MVRFFNet) are proposed for the first and third subtasks, respectively. The level set map (LSM) is used to annotate instances instead of the traditional multichannel binary mask, and each peak of the LSM represents one instance. Based on the LSM, LSMNet can adopt a pix2pix architecture to segment instances. MVRFFNet is a generalized zero-shot learning (GZSL) framework based on the Wasserstein generative adversarial network for 3D object recognition. The multiview features of an object are combined into a compact registered feature. By treating the registered features as the category attributes in the GZSL setting, MVRFFNet learns a mapping function that maps the original retrieval features into a new redundancy-free feature space. To validate the proposed methods, a segmentation dataset and a fine-grained classification dataset of objects on a conveyor are established. Experimental results on these datasets show that LSMNet achieves a recall accuracy close to that of the lightweight instance segmentation framework You Only Look At CoefficienTs (YOLACT), while running at 80 fps on an NVIDIA GTX1660TI GPU, much faster than YOLACT's 25 fps. The redundancy-free features generated by MVRFFNet perform much better than the original features in the retrieval task.
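
The abstract states that each peak of the level set map (LSM) represents one instance. The sketch below shows one plausible way to decode instances from such a map: find local maxima above a height threshold, then give every foreground pixel the label of its nearest peak. The threshold values, the 8-neighbour peak test, and the nearest-peak assignment are illustrative assumptions, not the LSMNet post-processing described by the authors.

```python
import numpy as np

def find_lsm_peaks(lsm, min_height=0.5):
    """Return (row, col) positions of local maxima of a 2D level set map.

    A pixel counts as a peak if it is above min_height and not smaller than
    any of its 8 neighbours. Flat plateaus yield several adjacent peaks.
    """
    h, w = lsm.shape
    # Pad with -inf so that border pixels can still be peaks.
    padded = np.pad(lsm, 1, mode="constant", constant_values=-np.inf)
    is_peak = lsm > min_height
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            is_peak &= lsm >= neighbour
    return [tuple(int(v) for v in p) for p in np.argwhere(is_peak)]

def label_instances(lsm, min_height=0.5, fg_threshold=0.0):
    """Assign every foreground pixel to its nearest peak (labels 1..K, 0 = background)."""
    peaks = find_lsm_peaks(lsm, min_height)
    labels = np.zeros(lsm.shape, dtype=int)
    if not peaks:
        return labels
    rows, cols = np.indices(lsm.shape)
    pr = np.array([p[0] for p in peaks])[:, None, None]
    pc = np.array([p[1] for p in peaks])[:, None, None]
    # Squared distance from every pixel to every peak, shape (K, H, W).
    dist2 = (rows[None] - pr) ** 2 + (cols[None] - pc) ** 2
    nearest = np.argmin(dist2, axis=0) + 1
    fg = lsm > fg_threshold
    labels[fg] = nearest[fg]
    return labels

# Toy LSM with two blob-shaped instances.
yy, xx = np.mgrid[0:20, 0:40]
lsm = (np.exp(-((yy - 10) ** 2 + (xx - 10) ** 2) / 20)
       + np.exp(-((yy - 10) ** 2 + (xx - 30) ** 2) / 20))
print(find_lsm_peaks(lsm))  # [(10, 10), (10, 30)]
```

Because the instance count falls out of the peak count, a single-channel map can represent any number of objects, which is what lets a pix2pix-style image-to-image network produce it directly.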

Generating Natural Counterfactual Visual Explanations

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/742 ◽

2020 ◽

Author(s):

Wenqi Zhao ◽

Satoshi Oyama ◽

Masahito Kurihara

Keyword(s):

Image Classification ◽

Classification Accuracy ◽

Classification Task ◽

Learning Models ◽

Generative Adversarial Network ◽

Fine Grained ◽

Class A ◽

Adversarial Network ◽

Class B ◽

Machine Learning Models

Counterfactual explanations help users understand the behavior of machine learning models by showing how the inputs would have to change to alter the existing outputs. For an image classification task, a counterfactual visual explanation answers the question: "For an example that belongs to class A, what changes must be made to the input so that the output leans more toward class B?" Our research modifies the attribute description text of class A on the basis of the attributes of class B and generates counterfactual images from the modified text. From the model's predictions on these counterfactual images, we can identify the attributes that have the greatest effect when the model distinguishes between classes A and B. We applied our method to a fine-grained image classification dataset and used a generative adversarial network to generate natural counterfactual visual explanations. To evaluate these explanations, we used them to assist crowdsourcing workers in an image classification task and found that, within a specific range, they improved classification accuracy.
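
The attribute-swapping step can be sketched as follows, under loud assumptions: attributes are modeled as a hypothetical name-to-value dictionary, and the whole "text to GAN image to classifier" pipeline is replaced by a stand-in scoring function `score_b` that returns the model's class-B score for a description. Each counterfactual copies class A's description and swaps in one class-B attribute value; attributes are then ranked by how much the swap raises the class-B score.

```python
from typing import Callable, Dict, List, Tuple

Attributes = Dict[str, str]  # hypothetical: attribute name -> value

def counterfactual_descriptions(attrs_a: Attributes,
                                attrs_b: Attributes) -> List[Tuple[str, Attributes]]:
    """For each attribute where A and B differ, copy A's description with B's value swapped in."""
    out = []
    for key, value_b in attrs_b.items():
        if key in attrs_a and attrs_a[key] != value_b:
            modified = dict(attrs_a)
            modified[key] = value_b
            out.append((key, modified))
    return out

def rank_attributes(attrs_a: Attributes, attrs_b: Attributes,
                    score_b: Callable[[Attributes], float]) -> List[Tuple[str, float]]:
    """Rank swapped attributes by how much they raise the class-B score.

    score_b stands in for: description -> generated image -> classifier score for class B.
    """
    base = score_b(attrs_a)
    effects = [(key, score_b(mod) - base)
               for key, mod in counterfactual_descriptions(attrs_a, attrs_b)]
    return sorted(effects, key=lambda kv: kv[1], reverse=True)

# Toy classifier stand-in (hypothetical weights).
def toy_score_b(attrs: Attributes) -> float:
    return 0.6 * (attrs["wing_color"] == "black") + 0.3 * (attrs["crown_color"] == "yellow")

class_a = {"wing_color": "brown", "bill_shape": "cone", "crown_color": "red"}
class_b = {"wing_color": "black", "bill_shape": "cone", "crown_color": "yellow"}
print(rank_attributes(class_a, class_b, toy_score_b))
# [('wing_color', 0.6), ('crown_color', 0.3)]
```

Swapping one attribute at a time keeps each counterfactual attributable to a single change; shared attributes such as `bill_shape` produce no counterfactual at all.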

Unsupervised classification of snowflake images using a generative adversarial network and <i>K</i>-medoids classification

Atmospheric Measurement Techniques ◽

10.5194/amt-13-2949-2020 ◽

2020 ◽

Vol 13 (6) ◽

pp. 2949-2964

Author(s):

Jussi Leinonen ◽

Alexis Berne

Keyword(s):

Classification Scheme ◽

Feature Space ◽

Unsupervised Classification ◽

Automated Classification ◽

Generative Adversarial Network ◽

Microphysical Properties ◽

Adversarial Network ◽

Computationally Intensive ◽

Comparison Of The Results ◽

Supervised Classification Methods

The increasing availability of sensors imaging cloud and precipitation particles, like the Multi-Angle Snowflake Camera (MASC), has resulted in datasets comprising millions of images of falling snowflakes. Automated classification is required for effective analysis of such large datasets. While supervised classification methods have been developed for this purpose in recent years, their ability to generalize is limited by the representativeness of their labeled training datasets, which are affected by the subjective judgment of experts and require significant manual effort to derive. An alternative is unsupervised classification, which seeks to divide a dataset into distinct classes without expert-provided labels. In this paper, we introduce an unsupervised classification scheme based on a generative adversarial network (GAN) that learns to extract the key features from the snowflake images. Each image is then associated with a distribution of points in the feature space, and these distributions are used as the basis of K-medoids classification and hierarchical clustering. We found that the classification scheme is able to separate the dataset into distinct classes, each characterized by a particular size, shape and texture of the snowflake image, providing signatures of the microphysical properties of the snowflakes. This finding is supported by a comparison of the results to an existing supervised scheme. Although training the GAN is computationally intensive, the classification process proceeds directly from images to classes with minimal human intervention and therefore can be repeated for other MASC datasets with minor manual effort. As the algorithm is not specific to snowflakes, we also expect this approach to be relevant to other applications.
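
K-medoids differs from K-means in that each cluster center must be an actual data point, which makes it usable with any pairwise distance. The sketch below is the simple alternating (Voronoi-iteration) variant on plain feature vectors with Euclidean distance and a deterministic farthest-point initialization. It is a simplification: in the abstract each image is represented by a distribution of points, so the distance between images would be a distance between distributions, which is not modeled here.

```python
import numpy as np

def k_medoids(x, k, n_iter=100):
    """Alternating (Voronoi-iteration) K-medoids on the rows of x using Euclidean distance."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Deterministic farthest-point initialisation: most central point first,
    # then repeatedly the point farthest from all chosen medoids.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue
            # The medoid is the member with the smallest total distance to its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Two well-separated toy blobs standing in for GAN feature vectors.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
medoids, labels = k_medoids(x, k=2)
```

Because medoids are real samples, each class can be summarized by showing its medoid image, which suits an unsupervised scheme whose classes must be interpreted after the fact.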

Text to Realistic Image Generation with Attentional Concatenation Generative Adversarial Networks

Discrete Dynamics in Nature and Society ◽

10.1155/2020/6452536 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Linyan Li ◽

Yu Sun ◽

Fuyuan Hu ◽

Tao Zhou ◽

Xuefeng Xi ◽

...

Keyword(s):

High Resolution ◽

Image Synthesis ◽

Semantic Space ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Fine Grained ◽

Adversarial Network ◽

Adversarial Networks ◽

Cascade Structure ◽

High Resolution Images

In this paper, we propose an Attentional Concatenation Generative Adversarial Network (ACGAN) aimed at generating 1024 × 1024 high-resolution images. First, we propose a multilevel cascade structure for text-to-image synthesis. During training, we gradually add new layers and, at the same time, use the results and word vectors from the previous layer as inputs to the next layer to generate high-resolution images with photorealistic details. Second, the deep attentional multimodal similarity model is introduced into the network, and we match word vectors with images in a common semantic space to compute a fine-grained matching loss for training the generator. In this way, the model can attend to fine-grained, word-level semantic information. Finally, a diversity measure is added to the discriminator, which provides the generator with more diverse gradient directions and improves the diversity of the generated samples. The experimental results show that the inception scores of the proposed model on the CUB and Oxford-102 datasets reach 4.48 and 4.16, improvements of 2.75% and 6.42%, respectively, over Attentional Generative Adversarial Networks (AttenGAN). ACGAN performs better at text-to-image generation, and the generated images are closer to real images.
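
The abstract reports only ACGAN's inception scores and the percentage gains. Assuming the percentages are relative improvements over the baseline's score (the abstract does not say so explicitly), the implied baseline scores can be recovered with one division; these are derived values, not figures reported in the abstract.

```python
def implied_baseline(score, pct_gain):
    """Baseline score implied by a relative gain: score = baseline * (1 + pct_gain / 100)."""
    return score / (1 + pct_gain / 100)

for name, score, gain in [("CUB", 4.48, 2.75), ("Oxford-102", 4.16, 6.42)]:
    print(f"{name}: ACGAN {score:.2f}, implied baseline ~{implied_baseline(score, gain):.2f}")
# CUB: ACGAN 4.48, implied baseline ~4.36
# Oxford-102: ACGAN 4.16, implied baseline ~3.91
```

In absolute terms the gains are therefore about 0.12 and 0.25 inception-score points.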

Fine-grained Image Inpainting with Scale-Enhanced Generative Adversarial Network

Pattern Recognition Letters ◽

10.1016/j.patrec.2020.12.008 ◽

2021 ◽

Author(s):

Weirong Liu ◽

Jie Cao ◽

Chengrui Liu ◽

Chenwen Ren ◽

Yulin Wei ◽

Honglin Guo

Keyword(s):

Image Inpainting ◽

Generative Adversarial Network ◽

Fine Grained ◽

Adversarial Network

Feature Learning for SAR Target Recognition with Unknown Classes by Using CVAE-GAN

Remote Sensing ◽

10.3390/rs13183554 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3554

Author(s):

Xiaowei Hu ◽

Weike Feng ◽

Yiduo Guo ◽

Qiang Wang

Keyword(s):

Target Recognition ◽

Feature Learning ◽

Feature Space ◽

Automatic Target Recognition ◽

Data Sets ◽

Generative Adversarial Network ◽

Data Set ◽

Adversarial Network ◽

Public Data ◽

Class Labels

Even though deep learning (DL) has achieved excellent results on some public data sets for synthetic aperture radar (SAR) automatic target recognition (ATR), several problems remain. One is the lack of transparency and interpretability of most existing DL networks. Another is the neglect of unknown target classes, which are often present in practice. To solve these problems, a deep generative and recognition model is derived based on the Conditional Variational Auto-encoder (CVAE) and the Generative Adversarial Network (GAN). A feature space for SAR-ATR is built based on the proposed CVAE-GAN model. Using this feature space, clear SAR images can be generated with given class labels and observation angles. Moreover, the features of SAR images are continuous in the feature space and can represent some attributes of the target. Furthermore, the feature space makes it possible to classify the known classes and reject unknown target classes. Experiments on the MSTAR data set validate the advantages of the proposed method.
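
Classifying known classes while rejecting unknown ones in a learned feature space can be illustrated with a thresholded nearest-class-center rule. This is a generic open-set sketch, assuming features are already encoded (for example, latent codes from an encoder) and that a distance threshold has been chosen on validation data; it is not the specific rejection criterion of the CVAE-GAN paper.

```python
import numpy as np

def fit_class_centres(features, labels):
    """Mean feature vector of each known class."""
    classes = np.unique(labels)
    centres = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centres

def classify_or_reject(x, classes, centres, threshold, unknown=-1):
    """Nearest-centre label, or `unknown` if even the nearest centre is farther than threshold."""
    d = np.linalg.norm(centres - x, axis=1)
    i = int(np.argmin(d))
    return int(classes[i]) if d[i] <= threshold else unknown

# Toy 2D feature space with two known classes.
feats = np.array([[0.0, 0.0], [0.2, 0.1], [10.0, 0.0], [10.1, -0.2]])
labs = np.array([0, 0, 1, 1])
classes, centres = fit_class_centres(feats, labs)
print(classify_or_reject(np.array([0.3, 0.0]), classes, centres, threshold=2.0))   # 0
print(classify_or_reject(np.array([5.0, 20.0]), classes, centres, threshold=2.0))  # -1
```

The threshold trades off false rejections of known targets against false acceptances of unknown ones; a continuous, well-structured feature space is what makes a simple distance test meaningful.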

Category-Sensitive Domain Adaptation for Land Cover Mapping in Aerial Scenes

Remote Sensing ◽

10.3390/rs11222631 ◽

2019 ◽

Vol 11 (22) ◽

pp. 2631 ◽

Cited By ~ 2

Author(s):

Bo Fang ◽

Rong Kou ◽

Li Pan ◽

Pengfei Chen

Keyword(s):

Land Cover ◽

Domain Adaptation ◽

Feature Space ◽

Aerial Images ◽

Land Cover Mapping ◽

Target Domain ◽

Generative Adversarial Network ◽

Source Domain ◽

Adversarial Network ◽

Semantic Labeling

Since manually labeling aerial images for pixel-level classification is expensive and time-consuming, developing strategies for land cover mapping without reference labels is essential. As an efficient solution to this issue, domain adaptation has been widely used in numerous semantic labeling applications. However, current approaches generally pursue marginal distribution alignment between the source and target features and ignore category-level alignment. Therefore, applying them directly to land cover mapping leads to unsatisfactory performance in the target domain. To address this problem, we embed a geometry-consistent generative adversarial network (GcGAN) into a co-training adversarial learning network (CtALN) and develop a category-sensitive domain adaptation (CsDA) method for land cover mapping using very-high-resolution (VHR) optical aerial images. The GcGAN aims to eliminate the domain discrepancies between labeled and unlabeled images while retaining their intrinsic land cover information by translating the features of the labeled images from the source domain to the target domain. Meanwhile, the CtALN aims to learn a semantic labeling model in the target domain with the translated features and corresponding reference labels. By training this hybrid framework, our method learns to distill knowledge from the source domain and transfer it to the target domain, preserving not only global domain consistency but also category-level consistency between labeled and unlabeled images in the feature space. Experimental results on two airborne benchmark datasets and comparisons with other state-of-the-art methods verify the robustness and superiority of the proposed CsDA.
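
Why marginal alignment alone can fail is easy to show numerically. In the toy case below, two classes swap positions between domains: the overall feature means coincide (marginal gap 0), yet every class is misaligned. The gap measures (distance between means) are illustrative assumptions, and in real unsupervised adaptation the target labels would be pseudo-labels, since reference labels are unavailable there.

```python
import numpy as np

def marginal_gap(src, tgt):
    """Distance between overall feature means (ignores classes)."""
    return float(np.linalg.norm(src.mean(axis=0) - tgt.mean(axis=0)))

def category_gap(src, src_y, tgt, tgt_y):
    """Average distance between per-class feature means over classes present in both domains."""
    gaps = [np.linalg.norm(src[src_y == c].mean(axis=0) - tgt[tgt_y == c].mean(axis=0))
            for c in np.intersect1d(src_y, tgt_y)]
    return float(np.mean(gaps))

# Toy case: class 0 and class 1 swap positions between domains.
src = np.array([[0.0, 0.0], [1.0, 1.0]])
src_y = np.array([0, 1])
tgt = np.array([[1.0, 1.0], [0.0, 0.0]])
tgt_y = np.array([0, 1])  # would be pseudo-labels in practice
print(marginal_gap(src, tgt))                # 0.0
print(category_gap(src, src_y, tgt, tgt_y))  # sqrt(2), about 1.414
```

A method that only minimizes the first quantity would consider these domains perfectly aligned, which is the failure mode category-sensitive adaptation targets.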

SSAH: Semi-Supervised Adversarial Deep Hashing with Self-Paced Hard Sample Generation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6773 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11157-11164

Author(s):

Sheng Jin ◽

Shangchen Zhou ◽

Yao Liu ◽

Chao Chen ◽

Xiaoshuai Sun ◽

...

Keyword(s):

Large Scale ◽

Semantic Information ◽

Unified Framework ◽

Generative Adversarial Network ◽

Fine Grained ◽

Multi Scale ◽

Deep Hashing ◽

Adversarial Network ◽

Improve State

Deep hashing methods have proven effective and efficient for large-scale Web media search. The success of these data-driven methods largely depends on collecting sufficient labeled data, which is often a critical limitation in practice. Current solutions to this issue use a Generative Adversarial Network (GAN) to augment data in semi-supervised learning. However, existing GAN-based methods treat image generation and hash learning as two isolated processes, which makes the generation ineffective. Besides, most works fail to exploit the semantic information in unlabeled data. In this paper, we propose a novel Semi-supervised Self-paced Adversarial Hashing method, named SSAH, to solve these problems in a unified framework. The SSAH method consists of an adversarial network (A-Net) and a hashing network (H-Net). To improve the quality of the generated images, first, the A-Net learns hard samples with multi-scale occlusions and multi-angle rotated deformations, which compete against the learning of accurate hash codes. Second, we design a novel self-paced hard generation policy to gradually increase the hashing difficulty of the generated samples. To make use of the semantic information in unlabeled data, we propose a semi-supervised consistency loss. The experimental results show that our method significantly improves on state-of-the-art models on both widely used hashing datasets and fine-grained datasets.
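
The efficiency of deep hashing for search comes from comparing short binary codes instead of real-valued features. The sketch below shows the generic retrieval step, assuming the hashing network's real-valued outputs are binarized by sign and the database is ranked by Hamming distance; the network itself and the SSAH training losses are not modeled.

```python
import numpy as np

def to_hash_codes(features):
    """Binarise real-valued network outputs into {0, 1} bits by sign."""
    return (features > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query, plus those distances."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")
    return order, dists[order]

# Toy 4-bit codes from hypothetical network outputs.
db = to_hash_codes(np.array([[0.9, -0.2, 0.4, -0.7],
                             [-0.5, -0.1, 0.3, -0.9],
                             [0.8, 0.6, -0.2, 0.1]]))
q = to_hash_codes(np.array([0.7, -0.3, 0.5, -0.4]))
order, dists = hamming_rank(q, db)
print(order.tolist(), dists.tolist())  # [0, 1, 2] [0, 1, 3]
```

A stable sort keeps database order among equally distant items, so ties are broken deterministically.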

DR-GAN: Conditional Generative Adversarial Network for Fine-Grained Lesion Synthesis on Diabetic Retinopathy Images

IEEE Journal of Biomedical and Health Informatics ◽

10.1109/jbhi.2020.3045475 ◽

2020 ◽

pp. 1-1

Author(s):

Yi Zhou ◽

Boyang Wang ◽

Xiaodong He ◽

Shanshan Cui ◽

Ling Shao

Keyword(s):

Diabetic Retinopathy ◽

Generative Adversarial Network ◽

Fine Grained ◽

Adversarial Network

Self-Difference Convolutional Neural Network for Facial Expression Recognition

Sensors ◽

10.3390/s21062250 ◽

2021 ◽

Vol 21 (6) ◽

pp. 2250

Author(s):

Leyuan Liu ◽

Rubin Jiang ◽

Jiao Huo ◽

Jingying Chen

Keyword(s):

Facial Expression ◽

Facial Expressions ◽

Facial Expression Recognition ◽

Low Cost ◽

Feature Space ◽

Expression Recognition ◽

Generative Adversarial Network ◽

Convolutional Network ◽

Online Processing ◽

Adversarial Network

Facial expression recognition (FER) is a challenging problem due to the intra-class variation caused by subject identities. In this paper, a self-difference convolutional network (SD-CNN) is proposed to address the intra-class variation issue in FER. First, the SD-CNN uses a conditional generative adversarial network to generate the six typical facial expressions for the same subject as in the testing image. Second, six compact, lightweight difference-based CNNs, called DiffNets, are designed for classifying facial expressions. Each DiffNet extracts a pair of deep features from the testing image and one of the six synthesized expression images, and compares the difference between the two features. In this way, any potential facial expression in the testing image has an opportunity to be compared with the synthesized “Self”: an image of the same subject with the same facial expression as the testing image. Because most self-difference features of images with the same facial expression cluster tightly in the feature space, the intra-class variation issue is significantly alleviated. The proposed SD-CNN is extensively evaluated on two widely used facial expression datasets: CK+ and Oulu-CASIA. Experimental results demonstrate that the SD-CNN achieves state-of-the-art performance, with accuracies of 99.7% on CK+ and 91.3% on Oulu-CASIA. Moreover, the model size of the online processing part of the SD-CNN is only 9.54 MB (1.59 MB × 6), which enables the SD-CNN to run on low-cost hardware.
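
The self-difference idea can be sketched as: compare the test image's features with the features of each synthesized expression of the same subject, and predict the expression whose synthesized "Self" differs least. Here a plain Euclidean distance stands in for the six learned DiffNets, and the expression names are the commonly used six basic expressions, assumed rather than taken from the abstract.

```python
import numpy as np

# Assumed ordering of the six basic expressions (hypothetical for this sketch).
EXPRESSIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")

def predict_expression(test_feat, synth_feats):
    """synth_feats[i]: features of the synthesised image of the same subject showing EXPRESSIONS[i].

    Euclidean distance stands in for the learned DiffNet comparison.
    """
    diffs = np.linalg.norm(synth_feats - test_feat, axis=1)
    return EXPRESSIONS[int(np.argmin(diffs))], diffs

# Toy features: one-hot vectors for the six synthesised expressions.
synth = np.eye(6)
test = synth[3] + 0.05  # close to the "happiness" synthesis
label, diffs = predict_expression(test, synth)
print(label)  # happiness
```

Since every comparison is against the same subject, identity-related feature variation largely cancels in the difference, which is the intuition behind the reduced intra-class variation.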

Semi-Supervised Multi-Temporal Deep Representation Fusion Network for Landslide Mapping from Aerial Orthophotos

Remote Sensing ◽

10.3390/rs13040548 ◽

2021 ◽

Vol 13 (4) ◽

pp. 548

Author(s):

Xiaokang Zhang ◽

Man-On Pun ◽

Ming Liu

Keyword(s):

Feature Space ◽

Representation Learning ◽

Feature Representation ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Multi Temporal ◽

Landslide Mapping ◽

Spatio Temporal ◽

Reliability And Robustness ◽

High Level

Using remote sensing techniques to monitor landslides and their resultant land cover changes is fundamentally important for risk assessment and hazard prevention. Despite enormous efforts in developing intelligent landslide mapping (LM) approaches, LM remains challenging owing to high spectral heterogeneity of very-high-resolution (VHR) images and the daunting labeling efforts. To this end, a deep learning model based on semi-supervised multi-temporal deep representation fusion network, namely SMDRF-Net, is proposed for reliable and efficient LM. In comparison with previous methods, the SMDRF-Net possesses three distinct properties. (1) Unsupervised deep representation learning at the pixel- and object-level is performed by transfer learning using the Wasserstein generative adversarial network with gradient penalty to learn discriminative deep features and retain precise outlines of landslide objects in the high-level feature space. (2) Attention-based adaptive fusion of multi-temporal and multi-level deep representations is developed to exploit the spatio-temporal dependencies of deep representations and enhance the feature representation capability of the network. (3) The network is optimized using limited samples with pseudo-labels that are automatically generated based on a comprehensive uncertainty index. Experimental results from the analysis of VHR aerial orthophotos demonstrate the reliability and robustness of the proposed approach for LM in comparison with state-of-the-art methods.
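
Attention-based adaptive fusion, in its simplest form, scores each representation, turns the scores into weights with a softmax, and takes the weighted sum. The sketch below assumes T representations of dimension D (from different dates or network levels) and a fixed scoring vector `w` standing in for parameters that SMDRF-Net would learn; it illustrates the generic mechanism, not the network's exact attention module.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_fuse(features, w):
    """features: (T, D) representations from T dates/levels; w: (D,) scoring vector (hypothetical).

    Returns the attention-weighted sum of the representations and the weights.
    """
    scores = features @ w
    alpha = softmax(scores)
    return alpha @ features, alpha

feats = np.array([[1.0, 0.0],   # e.g. pre-event representation
                  [0.0, 1.0]])  # e.g. post-event representation
fused, alpha = attention_fuse(feats, np.zeros(2))
print(fused, alpha)  # equal weights: [0.5 0.5] [0.5 0.5]
```

Because the weights depend on the features themselves, the fusion can emphasize whichever date or level is most informative for each pixel or object, rather than using fixed averaging.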