Dual Generative Network with Discriminative Information for Generalized Zero-Shot Learning

Complexity, 2021, Vol. 2021, pp. 1-11
Author(s): Tingting Xu, Ye Zhao, Xueliang Liu

Zero-shot learning addresses the classification of unseen categories, while generalized zero-shot learning aims to classify samples drawn from both seen and unseen classes, where "seen" classes are those available during training and "unseen" classes are not. With the advance of deep learning, the performance of zero-shot learning has improved greatly. Generalized zero-shot learning is a challenging topic with promising prospects in many realistic scenarios. Although zero-shot learning has made gratifying progress, existing methods still exhibit a strong bias between seen and unseen classes. Recent methods focus on learning a unified semantic-aligned visual representation to transfer knowledge between the two domains, while ignoring the intrinsic characteristics of visual features, which are discriminative enough to be classified on their own. To solve these problems, we propose a novel model that uses the discriminative information of visual features to optimize the generative module, where the generative module is a dual generative network composed of a conditional VAE and an improved WGAN. Specifically, guided by the discriminative information of visual features, the model synthesizes visual features of unseen categories from their semantic embeddings using the learned generator, and then trains the final softmax classifier on the generated features, thus realizing recognition of unseen categories. In addition, we analyze the effect of additional classifiers with different structures on the transmission of discriminative information. We have conducted extensive experiments on six commonly used benchmark datasets (AWA1, AWA2, APY, FLO, SUN, and CUB). The experimental results show that our model outperforms several state-of-the-art methods for both traditional and generalized zero-shot learning.
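The synthesize-then-classify pipeline described above can be sketched in a few lines. Everything below is hypothetical for illustration: the toy attribute vectors, the linear stand-in for the learned conditional generator, and the tiny softmax trainer only show the flow from semantic embeddings to a classifier over unseen classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical attribute (semantic) embeddings for two unseen classes.
semantics = {"zebra": np.array([1.0, 0.0, 1.0]),
             "whale": np.array([0.0, 1.0, 0.0])}

def generator(attr, noise):
    # Linear toy stand-in for the learned conditional generator G(attr, z).
    W = np.array([[0.9, 0.1, 0.0],
                  [0.0, 0.8, 0.2],
                  [0.3, 0.0, 0.7]])
    return W @ attr + 0.05 * noise

# Synthesize visual features for the unseen classes from their semantics.
classes = sorted(semantics)
X, y = [], []
for label, attr in semantics.items():
    for _ in range(50):
        X.append(generator(attr, rng.standard_normal(3)))
        y.append(classes.index(label))
X, y = np.stack(X), np.array(y)

# Train the final softmax classifier on the generated features.
W_cls = np.zeros((3, len(classes)))
onehot = np.eye(len(classes))[y]
for _ in range(300):
    logits = X @ W_cls
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W_cls -= 0.1 * X.T @ (p - onehot) / len(X)

def classify(attr):
    # Recognize an "unseen" class from a clean generated feature.
    return classes[int(np.argmax(generator(attr, np.zeros(3)) @ W_cls))]
```

Because the classifier only ever sees generated features, recognition of a class requires nothing but its semantic embedding, which is the essence of the transfer.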

2018, Vol. 2018, pp. 1-8
Author(s): Sujin Lee, Incheol Kim

Video captioning is the task of generating a natural language sentence that explains the content of input video clips. This study proposes a deep neural network model for effective video captioning. Apart from visual features, the proposed model additionally learns semantic features that describe the video content effectively. In our model, visual features of the input video are extracted using convolutional neural networks such as C3D and ResNet, while semantic features are obtained using recurrent neural networks such as LSTM. In addition, our model includes an attention-based caption generation network to generate correct natural language captions from the multimodal video feature sequences. Various experiments on two large benchmark datasets, Microsoft Video Description (MSVD) and Microsoft Research Video-to-Text (MSR-VTT), demonstrate the performance of the proposed model.
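The attention step in such caption generators can be illustrated with a minimal additive-attention sketch over a sequence of per-frame features. The shapes, random weight matrices, and feature values below are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-frame visual features (T frames, d dims) and a decoder state.
T, d, h = 5, 4, 3
frames = rng.standard_normal((T, d))
hidden = rng.standard_normal(h)

# Additive (Bahdanau-style) attention parameters, randomly initialized here.
W_f = rng.standard_normal((d, d))
W_h = rng.standard_normal((h, d))
v = rng.standard_normal(d)

scores = np.tanh(frames @ W_f + hidden @ W_h) @ v   # one relevance score per frame
weights = np.exp(scores - scores.max())
weights /= weights.sum()                            # softmax over the frames
context = weights @ frames                          # attended video context vector
```

At each decoding step the caption network would feed `context` (recomputed with the current hidden state) into the word predictor, so different frames dominate different words.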


2014, Vol. 536-537, pp. 394-398
Author(s): Tao Guo, Gui Yang Li

Multi-label classification (MLC) is a machine learning task that aims to predict multiple labels for a given instance. The widely known binary relevance (BR) approach learns one classifier per label without considering the correlations among labels. In this paper, an improved binary relevance algorithm (IBRAM) is proposed. Derived from the binary relevance method, it uses two layers to decompose the multi-label classification problem into L independent binary classification problems. In the first layer, one binary classifier is built for each label. In the second layer, the label information from the first layer is fully used to generate the final prediction by considering the correlations among labels. Experiments on benchmark datasets validate the effectiveness of the proposed approach against other well-established methods.
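A minimal sketch of the two-layer idea follows, with a toy logistic-regression base learner and synthetic data in which the second label depends on the first (so the second layer has a correlation to exploit). All names and data here are illustrative, not from the paper.

```python
import numpy as np

def fit_logreg(X, y, lr=0.5, iters=300):
    # Minimal logistic regression (bias folded in) as the base binary learner.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (1 / (1 + np.exp(-(Xb @ w))) > 0.5).astype(float)

# Toy multi-label data: label 1 depends on label 0 (a correlation to exploit).
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
Y = np.stack([(X[:, 0] > 0).astype(float),
              ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(float)], axis=1)

# Layer 1: plain binary relevance, one independent classifier per label.
layer1 = [fit_logreg(X, Y[:, j]) for j in range(Y.shape[1])]
P1 = np.stack([predict(w, X) for w in layer1], axis=1)

# Layer 2: refit each label on the features augmented with all layer-1 outputs,
# letting each classifier see the other labels' predictions.
X2 = np.hstack([X, P1])
layer2 = [fit_logreg(X2, Y[:, j]) for j in range(Y.shape[1])]
P2 = np.stack([predict(w, X2) for w in layer2], axis=1)
```

The augmentation in `X2` is what distinguishes the second layer from plain BR: each final classifier can condition on the predicted values of every other label.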


2021, Vol. 187, pp. 106267
Author(s): Sumaira Ghazal, Waqar S. Qureshi, Umar S. Khan, Javaid Iqbal, Nasir Rashid, ...

2014, Vol. 2014, pp. 1-11
Author(s): Ziqiang Wang, Xia Sun, Lijun Sun, Yuchun Huang

In many image classification applications, it is common to extract multiple visual features from different views to describe an image. Since different visual features have their own specific statistical properties and discriminative powers for image classification, the conventional solution for multi-view data is to concatenate the feature vectors into a new feature vector. However, this simple concatenation strategy not only ignores the complementary nature of the different views but also suffers from the "curse of dimensionality." To address this problem, we propose a novel multiview subspace learning algorithm, named multiview discriminative geometry preserving projection (MDGPP), for feature extraction and classification. MDGPP can not only preserve the intraclass geometry and interclass discrimination information under a single view, but also exploit the complementary properties of different views to obtain a low-dimensional optimal consensus embedding via an alternating-optimization-based iterative algorithm. Experimental results on face recognition and facial expression recognition demonstrate the effectiveness of the proposed algorithm.


2021, Vol. 2050 (1), pp. 012006
Author(s): Xili Dai, Chunmei Ma, Jingwei Sun, Tao Zhang, Haigang Gong, ...

Training deep neural networks from only a few examples is an interesting problem that has motivated few-shot learning. In this paper, we study fine-grained image classification in a challenging few-shot setting and propose the Self-Amplificated Network (SAN), a meta-learning-based method for this problem. The SAN model consists of three parts: the Encoder, Amplification, and Similarity Modules. The Encoder Module encodes a fine-grained input image into a feature vector. The Amplification Module amplifies subtle differences between fine-grained images using a self-attention mechanism composed of multi-head attention. The Similarity Module measures how similar the query image is to the support set in order to determine the classification result. In-depth experiments on three benchmark datasets show that our network achieves superior performance over competing baselines.
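The self-attention at the heart of an amplification module of this kind can be sketched as plain multi-head scaled dot-product attention. The sizes and random projection weights below are placeholders, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, n_heads, Wq, Wk, Wv):
    # Minimal sketch: split the projected features into n_heads slices,
    # apply scaled dot-product attention per head, then concatenate.
    T, d = X.shape
    dh = d // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):
        q = Q[:, h * dh:(h + 1) * dh]
        k = K[:, h * dh:(h + 1) * dh]
        v = V[:, h * dh:(h + 1) * dh]
        A = softmax(q @ k.T / np.sqrt(dh))   # (T, T) attention weights
        heads.append(A @ v)
    return np.hstack(heads)

T, d, H = 6, 8, 2   # 6 local feature vectors, 8 dims, 2 heads (toy sizes)
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = multi_head_self_attention(X, H, Wq, Wk, Wv)
```

Each output row mixes information from all other feature vectors, which is what lets subtle inter-image differences be re-weighted and amplified.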


Author(s): Kensuke Naoe, Hideyasu Sasaki, Yoshiyasu Takefuji

The Service-Oriented Architecture (SOA) demands supportive technologies and new requirements for mobile collaboration across multiple platforms. One representative solution is intelligent information security of enterprise resources for collaboration systems and services, and digital watermarking has become a key technology for protecting copyrights. In this article, the authors propose a key generation scheme for static visual digital watermarking using machine learning, with a neural network as the exemplary approach. The proposed method provides intelligent mobile collaboration with secure data transactions. First, key generation extracts certain bit patterns, in the form of visual features, from visual objects or data as a training set for machine learning of the digital watermark. Second, watermark extraction is performed by presenting the visual features of the target image to the extraction key, here a classifier generated in advance by training. Third, the training generates an extraction key conditioned to produce watermark signal patterns only when the proper visual features are presented to the classifier. The classifier produced by this machine learning process thus serves as the watermark extraction key. The method contributes to secure visual information hiding without losing any detailed data of the visual objects and without additional resources such as molds for embedding hidden visual objects. Experiments show that the proposed method is robust to high-pass filtering and JPEG compression, but it is limited by its dependence on the positions of the feature sub-blocks, especially under geometric attacks such as shrinking or rotation of the image.


2019, Vol. 9 (19), pp. 4036
Author(s): You, Wu, Lee, Liu

Multi-class classification is a very important technique in engineering applications, e.g., mechanical systems, mechanics and design innovations, and applied materials in nanotechnologies. A large amount of research has been done on single-label classification, where objects are associated with a single category. However, in many application domains an object can belong to two or more categories, and multi-label classification is needed. Traditionally, statistical methods were used; recently, machine learning techniques, in particular neural networks, have been proposed to solve the multi-class classification problem. In this paper, we develop radial basis function (RBF)-based neural network schemes for single-label and multi-label classification, respectively. The number of hidden nodes and the parameters of the basis functions are determined automatically by applying an iterative self-constructing clustering algorithm to the given training dataset, and the biases and weights are derived optimally by least squares. Dimensionality reduction techniques are adopted and integrated to help reduce the overfitting associated with RBF networks. Experimental results on benchmark datasets are presented to show the effectiveness of the proposed schemes.
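The overall pipeline, pick centers by clustering and then solve the output layer in closed form by least squares, can be sketched as follows. The two fixed centers stand in for the paper's iterative self-constructing clustering, and the toy one-dimensional data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy single-label task: two well-separated 1-D blobs, classes 0 and 1.
X = np.concatenate([rng.normal(-2, 0.3, 50), rng.normal(2, 0.3, 50)])[:, None]
y = np.concatenate([np.zeros(50), np.ones(50)])

# Stand-in for the self-constructing clustering: fixed centers and width.
centers = np.array([[-2.0], [2.0]])
width = 1.0

def rbf_features(X):
    # Gaussian basis activations for every (sample, center) pair, plus a bias.
    d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2 * width ** 2))
    return np.hstack([Phi, np.ones((len(X), 1))])

# Output weights (and bias) solved optimally in one shot by least squares.
w, *_ = np.linalg.lstsq(rbf_features(X), y, rcond=None)
pred = (rbf_features(X) @ w > 0.5).astype(float)
```

Because the hidden layer is fixed once the centers are chosen, the output layer is linear in the basis activations, which is why least squares gives the optimal weights directly.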


Robotics, 2020, Vol. 9 (2), pp. 40
Author(s): Hirokazu Madokoro, Hanwool Woo, Stephanie Nix, Kazuhito Sato

This study was conducted to develop original benchmark datasets that simultaneously include indoor and outdoor visual features. Indoor visual information includes outdoor features to a degree that varies greatly with time, weather, and season. We obtained time-series scene images using a wide field of view (FOV) camera mounted on a mobile robot moving along a 392-m route, in both directions and in three seasons, through an indoor environment surrounded by transparent glass walls and windows. We propose a unified method for extracting, characterizing, and recognizing visual landmarks that is robust to human occlusion in a real environment in which robots coexist with people. Using our method, we conducted an evaluation experiment recognizing scenes divided into up to 64 zones at fixed intervals. The results obtained on these datasets reveal the performance and characteristics of meta-parameter optimization, the mapping characteristics of category maps, and the recognition accuracy. Moreover, we visualized similarities between scene images using category maps and identified cluster boundaries obtained from the mapping weights.


2014, Vol. 551, pp. 302-308
Author(s): Tao Guo, Gui Yang Li

In multi-label classification, each training example is associated with a set of labels, and the task is to predict the proper label set for each unseen instance. Recent multi-label classification methods mainly focus on exploiting label correlations to improve the accuracy of individual multi-label learners. In this paper, an improved method derived from binary relevance, named double layer classifier chaining (DCC), is proposed. The algorithm decomposes the multi-label classification problem into a two-stage classification process that generates a classifier chain. Each classifier in the chain learns and predicts the binary association of its label given the attribute space, augmented by all prior binary relevance predictions in the chain. This chaining allows DCC to take correlations in the label space into account. Experiments on benchmark datasets validate the effectiveness of the proposed approach compared with other well-established methods.
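The chaining mechanism itself can be sketched with a toy logistic-regression base learner: each link is trained on the attribute space augmented by all earlier links' predictions. The data and the base learner here are illustrative only.

```python
import numpy as np

def fit_logreg(X, y, lr=0.5, iters=300):
    # Minimal logistic regression with a folded-in bias term.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (1 / (1 + np.exp(-(Xb @ w))) > 0.5).astype(float)

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 3))
# Label 1 is correlated with label 0, so chaining has something to pass along.
Y = np.stack([(X[:, 0] > 0).astype(float),
              ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(float)], axis=1)

# Build the chain: classifier j sees the original attributes plus the
# predictions of classifiers 0..j-1.
chain, preds, Xa = [], [], X
for j in range(Y.shape[1]):
    w = fit_logreg(Xa, Y[:, j])
    p = predict(w, Xa)
    chain.append(w)
    preds.append(p)
    Xa = np.hstack([Xa, p[:, None]])   # augment the attribute space
P = np.stack(preds, axis=1)
```

The augmentation step is the whole trick: by the time the last link trains, its input space encodes the chain's beliefs about every earlier label.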


Author(s): Yiyi Zhou, Rongrong Ji, Jinsong Su, Xiangming Li, Xiaoshuai Sun

In this paper, we uncover the issue of knowledge inertia in visual question answering (VQA), which exists in most VQA models and forces them to rely mainly on the question content to "guess" the answer, without regard to the visual information. This issue not only impairs the performance of VQA models but also greatly reduces the credibility of their answer predictions. Simply highlighting the visual features in the model is infeasible, since the prediction is built upon the joint modeling of the two modalities and is largely influenced by the data distribution. We therefore propose Pairwise Inconformity Learning (PIL) to tackle knowledge inertia. In particular, PIL takes full advantage of the similar image pairs with diverse answers to an identical question provided in the VQA2.0 dataset. It builds a multi-modal embedding space onto which positive/negative feature pairs are projected, with the word vectors of answers modeled as anchors. By doing so, PIL strengthens the importance of visual features in prediction with a novel dynamic-margin-based triplet loss that efficiently increases the semantic discrepancy between positive/negative image pairs. To verify PIL, we plug it into a baseline VQA model as well as a set of recent VQA models and conduct extensive experiments on two benchmark datasets, VQA1.0 and VQA2.0. Experimental results show that PIL boosts the accuracy of existing VQA models (a 1.56%-2.93% gain) with a negligible increase in parameters (0.85%-5.4% more parameters). Qualitative results also reveal the elimination of knowledge inertia in existing VQA models after applying PIL.
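The dynamic-margin triplet loss can be sketched as a standard triplet hinge whose margin scales with the distance between the two answers' word vectors. The 2-D embeddings and the scaling constant below are made up for illustration; only the structure of the loss follows the description above.

```python
import numpy as np

def dynamic_margin_triplet_loss(anchor, pos, neg, ans_pos, ans_neg, base=0.2):
    # Triplet hinge whose margin grows with the semantic distance between the
    # two answers (a sketch of the dynamic-margin idea, not the exact formula).
    margin = base * np.linalg.norm(ans_pos - ans_neg)
    d_pos = np.linalg.norm(anchor - pos)
    d_neg = np.linalg.norm(anchor - neg)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical joint-space embeddings: the anchor is an answer's word vector,
# pos/neg are features of two similar images with different answers.
anchor = np.array([1.0, 0.0])
pos = np.array([0.9, 0.1])       # image whose answer matches the anchor
neg = np.array([0.0, 1.0])       # similar image with a different answer
ans_pos = np.array([1.0, 0.0])
ans_neg = np.array([0.0, 1.0])

loss_ok = dynamic_margin_triplet_loss(anchor, pos, neg, ans_pos, ans_neg)
# Swapping pos/neg violates the margin and produces a positive loss.
loss_violating = dynamic_margin_triplet_loss(anchor, neg, pos, ans_pos, ans_neg)
```

Scaling the margin by the answer distance means image pairs whose answers differ more are pushed further apart in the embedding space, which is the stated goal of increasing semantic discrepancies.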

