scholarly journals Dynamic Detection and Recognition of Objects Based on Sequential RGB Images

2021 ◽  
Vol 13 (7) ◽  
pp. 176
Author(s):  
Shuai Dong ◽  
Zhihua Yang ◽  
Wensheng Li ◽  
Kun Zou

Conveyors are used commonly in industrial production lines and automated sorting systems. Many applications require fast, reliable, and dynamic detection and recognition for the objects on conveyors. Aiming at this goal, we design a framework that involves three subtasks: one-class instance segmentation (OCIS), multiobject tracking (MOT), and zero-shot fine-grained recognition of 3D objects (ZSFGR3D). A new level set map network (LSMNet) and a multiview redundancy-free feature network (MVRFFNet) are proposed for the first and third subtasks, respectively. The level set map (LSM) is used to annotate instances instead of the traditional multichannel binary mask, and each peak of the LSM represents one instance. Based on the LSM, LSMNet can adopt a pix2pix architecture to segment instances. MVRFFNet is a generalized zero-shot learning (GZSL) framework based on the Wasserstein generative adversarial network for 3D object recognition. Multi-view features of an object are combined into a compact registered feature. By treating the registered features as the category attribution in the GZSL setting, MVRFFNet learns a mapping function that maps original retrieve features into a new redundancy-free feature space. To validate the performance of the proposed methods, a segmentation dataset and a fine-grained classification dataset about objects on a conveyor are established. Experimental results on these datasets show that LSMNet can achieve a recalling accuracy close to the light instance segmentation framework You Only Look At CoefficienTs (YOLACT), while its computing speed on an NVIDIA GTX1660TI GPU is 80 fps, which is much faster than YOLACT‘s 25 fps. Redundancy-free features generated by MVRFFNet perform much better than original features in the retrieval task.

Author(s):  
Wenqi Zhao ◽  
Satoshi Oyama ◽  
Masahito Kurihara

Counterfactual explanations help users to understand the behaviors of machine learning models by changing the inputs for the existing outputs. For an image classification task, an example counterfactual visual explanation explains: "for an example that belongs to class A, what changes do we need to make to the input so that the output is more inclined to class B." Our research considers changing the attribute description text of class A on the basis of the attributes of class B and generating counterfactual images on the basis of the modified text. We can use the prediction results of the model on counterfactual images to find the attributes that have the greatest effect when the model is predicting classes A and B. We applied our method to a fine-grained image classification dataset and used the generative adversarial network to generate natural counterfactual visual explanations. To evaluate these explanations, we used them to assist crowdsourcing workers in an image classification task. We found that, within a specific range, they improved classification accuracy.


2020 ◽  
Vol 13 (6) ◽  
pp. 2949-2964
Author(s):  
Jussi Leinonen ◽  
Alexis Berne

Abstract. The increasing availability of sensors imaging cloud and precipitation particles, like the Multi-Angle Snowflake Camera (MASC), has resulted in datasets comprising millions of images of falling snowflakes. Automated classification is required for effective analysis of such large datasets. While supervised classification methods have been developed for this purpose in recent years, their ability to generalize is limited by the representativeness of their labeled training datasets, which are affected by the subjective judgment of the expert and require significant manual effort to derive. An alternative is unsupervised classification, which seeks to divide a dataset into distinct classes without expert-provided labels. In this paper, we introduce an unsupervised classification scheme based on a generative adversarial network (GAN) that learns to extract the key features from the snowflake images. Each image is then associated with a distribution of points in the feature space, and these distributions are used as the basis of K-medoids classification and hierarchical clustering. We found that the classification scheme is able to separate the dataset into distinct classes, each characterized by a particular size, shape and texture of the snowflake image, providing signatures of the microphysical properties of the snowflakes. This finding is supported by a comparison of the results to an existing supervised scheme. Although training the GAN is computationally intensive, the classification process proceeds directly from images to classes with minimal human intervention and therefore can be repeated for other MASC datasets with minor manual effort. As the algorithm is not specific to snowflakes, we also expect this approach to be relevant to other applications.


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Linyan Li ◽  
Yu Sun ◽  
Fuyuan Hu ◽  
Tao Zhou ◽  
Xuefeng Xi ◽  
...  

In this paper, we propose an Attentional Concatenation Generative Adversarial Network (ACGAN) aiming at generating 1024 × 1024 high-resolution images. First, we propose a multilevel cascade structure, for text-to-image synthesis. During training progress, we gradually add new layers and, at the same time, use the results and word vectors from the previous layer as inputs to the next layer to generate high-resolution images with photo-realistic details. Second, the deep attentional multimodal similarity model is introduced into the network, and we match word vectors with images in a common semantic space to compute a fine-grained matching loss for training the generator. In this way, we can pay attention to the fine-grained information of the word level in the semantics. Finally, the measure of diversity is added to the discriminator, which enables the generator to obtain more diverse gradient directions and improve the diversity of generated samples. The experimental results show that the inception scores of the proposed model on the CUB and Oxford-102 datasets have reached 4.48 and 4.16, improved by 2.75% and 6.42% compared to Attentional Generative Adversarial Networks (AttenGAN). The ACGAN model has a better effect on text-generated images, and the resulting image is closer to the real image.


Author(s):  
Weirong Liu ◽  
ChengruiJie CaoLiu ◽  
Chenwen Ren ◽  
Yulin Wei ◽  
Honglin Guo

2021 ◽  
Vol 13 (18) ◽  
pp. 3554
Author(s):  
Xiaowei Hu ◽  
Weike Feng ◽  
Yiduo Guo ◽  
Qiang Wang

Even though deep learning (DL) has achieved excellent results on some public data sets for synthetic aperture radar (SAR) automatic target recognition(ATR), several problems exist at present. One is the lack of transparency and interpretability for most of the existing DL networks. Another is the neglect of unknown target classes which are often present in practice. To solve the above problems, a deep generation as well as recognition model is derived based on Conditional Variational Auto-encoder (CVAE) and Generative Adversarial Network (GAN). A feature space for SAR-ATR is built based on the proposed CVAE-GAN model. By using the feature space, clear SAR images can be generated with given class labels and observation angles. Besides, the feature of the SAR image is continuous in the feature space and can represent some attributes of the target. Furthermore, it is possible to classify the known classes and reject the unknown target classes by using the feature space. Experiments on the MSTAR data set validate the advantages of the proposed method.


2019 ◽  
Vol 11 (22) ◽  
pp. 2631 ◽  
Author(s):  
Bo Fang ◽  
Rong Kou ◽  
Li Pan ◽  
Pengfei Chen

Since manually labeling aerial images for pixel-level classification is expensive and time-consuming, developing strategies for land cover mapping without reference labels is essential and meaningful. As an efficient solution for this issue, domain adaptation has been widely utilized in numerous semantic labeling-based applications. However, current approaches generally pursue the marginal distribution alignment between the source and target features and ignore the category-level alignment. Therefore, directly applying them to land cover mapping leads to unsatisfactory performance in the target domain. In our research, to address this problem, we embed a geometry-consistent generative adversarial network (GcGAN) into a co-training adversarial learning network (CtALN), and then develop a category-sensitive domain adaptation (CsDA) method for land cover mapping using very-high-resolution (VHR) optical aerial images. The GcGAN aims to eliminate the domain discrepancies between labeled and unlabeled images while retaining their intrinsic land cover information by translating the features of the labeled images from the source domain to the target domain. Meanwhile, the CtALN aims to learn a semantic labeling model in the target domain with the translated features and corresponding reference labels. By training this hybrid framework, our method learns to distill knowledge from the source domain and transfers it to the target domain, while preserving not only global domain consistency, but also category-level consistency between labeled and unlabeled images in the feature space. The experimental results between two airborne benchmark datasets and the comparison with other state-of-the-art methods verify the robustness and superiority of our proposed CsDA.


2020 ◽  
Vol 34 (07) ◽  
pp. 11157-11164
Author(s):  
Sheng Jin ◽  
Shangchen Zhou ◽  
Yao Liu ◽  
Chao Chen ◽  
Xiaoshuai Sun ◽  
...  

Deep hashing methods have been proved to be effective and efficient for large-scale Web media search. The success of these data-driven methods largely depends on collecting sufficient labeled data, which is usually a crucial limitation in practical cases. The current solutions to this issue utilize Generative Adversarial Network (GAN) to augment data in semi-supervised learning. However, existing GAN-based methods treat image generations and hashing learning as two isolated processes, leading to generation ineffectiveness. Besides, most works fail to exploit the semantic information in unlabeled data. In this paper, we propose a novel Semi-supervised Self-pace Adversarial Hashing method, named SSAH to solve the above problems in a unified framework. The SSAH method consists of an adversarial network (A-Net) and a hashing network (H-Net). To improve the quality of generative images, first, the A-Net learns hard samples with multi-scale occlusions and multi-angle rotated deformations which compete against the learning of accurate hashing codes. Second, we design a novel self-paced hard generation policy to gradually increase the hashing difficulty of generated samples. To make use of the semantic information in unlabeled ones, we propose a semi-supervised consistent loss. The experimental results show that our method can significantly improve state-of-the-art models on both the widely-used hashing datasets and fine-grained datasets.


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2250
Author(s):  
Leyuan Liu ◽  
Rubin Jiang ◽  
Jiao Huo ◽  
Jingying Chen

Facial expression recognition (FER) is a challenging problem due to the intra-class variation caused by subject identities. In this paper, a self-difference convolutional network (SD-CNN) is proposed to address the intra-class variation issue in FER. First, the SD-CNN uses a conditional generative adversarial network to generate the six typical facial expressions for the same subject in the testing image. Second, six compact and light-weighted difference-based CNNs, called DiffNets, are designed for classifying facial expressions. Each DiffNet extracts a pair of deep features from the testing image and one of the six synthesized expression images, and compares the difference between the deep feature pair. In this way, any potential facial expression in the testing image has an opportunity to be compared with the synthesized “Self”—an image of the same subject with the same facial expression as the testing image. As most of the self-difference features of the images with the same facial expression gather tightly in the feature space, the intra-class variation issue is significantly alleviated. The proposed SD-CNN is extensively evaluated on two widely-used facial expression datasets: CK+ and Oulu-CASIA. Experimental results demonstrate that the SD-CNN achieves state-of-the-art performance with accuracies of 99.7% on CK+ and 91.3% on Oulu-CASIA, respectively. Moreover, the model size of the online processing part of the SD-CNN is only 9.54 MB (1.59 MB ×6), which enables the SD-CNN to run on low-cost hardware.


2021 ◽  
Vol 13 (4) ◽  
pp. 548
Author(s):  
Xiaokang Zhang ◽  
Man-On Pun ◽  
Ming Liu

Using remote sensing techniques to monitor landslides and their resultant land cover changes is fundamentally important for risk assessment and hazard prevention. Despite enormous efforts in developing intelligent landslide mapping (LM) approaches, LM remains challenging owing to high spectral heterogeneity of very-high-resolution (VHR) images and the daunting labeling efforts. To this end, a deep learning model based on semi-supervised multi-temporal deep representation fusion network, namely SMDRF-Net, is proposed for reliable and efficient LM. In comparison with previous methods, the SMDRF-Net possesses three distinct properties. (1) Unsupervised deep representation learning at the pixel- and object-level is performed by transfer learning using the Wasserstein generative adversarial network with gradient penalty to learn discriminative deep features and retain precise outlines of landslide objects in the high-level feature space. (2) Attention-based adaptive fusion of multi-temporal and multi-level deep representations is developed to exploit the spatio-temporal dependencies of deep representations and enhance the feature representation capability of the network. (3) The network is optimized using limited samples with pseudo-labels that are automatically generated based on a comprehensive uncertainty index. Experimental results from the analysis of VHR aerial orthophotos demonstrate the reliability and robustness of the proposed approach for LM in comparison with state-of-the-art methods.


Sign in / Sign up

Export Citation Format

Share Document