scholarly journals Generative Adversarial Networks in Computer Vision

2021 ◽  
Vol 54 (2) ◽  
pp. 1-38
Author(s):  
Zhengwei Wang ◽  
Qi She ◽  
Tomás E. Ward

Generative adversarial networks (GANs) have been extensively studied in the past few years. Arguably their most significant impact has been in the area of computer vision where great advances have been made in challenges such as plausible image generation, image-to-image translation, facial attribute manipulation, and similar domains. Despite the significant successes achieved to date, applying GANs to real-world problems still poses significant challenges, three of which we focus on here. These are as follows: (1) the generation of high quality images, (2) diversity of image generation, and (3) stabilizing training. Focusing on the degree to which popular GAN technologies have made progress against these challenges, we provide a detailed review of the state-of-the-art in GAN-related research in the published scientific literature. We further structure this review through a convenient taxonomy we have adopted based on variations in GAN architectures and loss functions. While several reviews for GANs have been presented to date, none have considered the status of this field based on their progress toward addressing practical challenges relevant to computer vision. Accordingly, we review and critically discuss the most popular architecture-variant, and loss-variant GANs, for tackling these challenges. Our objective is to provide an overview as well as a critical analysis of the status of GAN research in terms of relevant progress toward critical computer vision application requirements. As we do this we also discuss the most compelling applications in computer vision in which GANs have demonstrated considerable success along with some suggestions for future research directions. Codes related to the GAN-variants studied in this work is summarized on https://github.com/sheqi/GAN_Review.

Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1705
Author(s):  
Aziz Alotaibi

Many image processing, computer graphics, and computer vision problems can be treated as image-to-image translation tasks. Such translation entails learning to map one visual representation of a given input to another representation. Image-to-image translation with generative adversarial networks (GANs) has been intensively studied and applied to various tasks, such as multimodal image-to-image translation, super-resolution translation, object transfiguration-related translation, etc. However, image-to-image translation techniques suffer from some problems, such as mode collapse, instability, and a lack of diversity. This article provides a comprehensive overview of image-to-image translation based on GAN algorithms and its variants. It also discusses and analyzes current state-of-the-art image-to-image translation techniques that are based on multimodal and multidomain representations. Finally, open issues and future research directions utilizing reinforcement learning and three-dimensional (3D) modal translation are summarized and discussed.


2022 ◽  
pp. 98-110
Author(s):  
Md Fazle Rabby ◽  
Md Abdullah Al Momin ◽  
Xiali Hei

Generative adversarial networks have been a highly focused research topic in computer vision, especially in image synthesis and image-to-image translation. There are a lot of variations in generative nets, and different GANs are suitable for different applications. In this chapter, the authors investigated conditional generative adversarial networks to generate fake images, such as handwritten signatures. The authors demonstrated an implementation of conditional generative adversarial networks, which can generate fake handwritten signatures according to a condition vector tailored by humans.


2021 ◽  
Vol 54 (6) ◽  
pp. 1-38
Author(s):  
Zhipeng Cai ◽  
Zuobin Xiong ◽  
Honghui Xu ◽  
Peng Wang ◽  
Wei Li ◽  
...  

Generative Adversarial Networks (GANs) have promoted a variety of applications in computer vision and natural language processing, among others, due to its generative model’s compelling ability to generate realistic examples plausibly drawn from an existing distribution of samples. GAN not only provides impressive performance on data generation-based tasks but also stimulates fertilization for privacy and security oriented research because of its game theoretic optimization strategy. Unfortunately, there are no comprehensive surveys on GAN in privacy and security, which motivates this survey to summarize systematically. The existing works are classified into proper categories based on privacy and security functions, and this survey conducts a comprehensive analysis of their advantages and drawbacks. Considering that GAN in privacy and security is still at a very initial stage and has imposed unique challenges that are yet to be well addressed, this article also sheds light on some potential privacy and security applications with GAN and elaborates on some future research directions.


2020 ◽  
Vol 34 (07) ◽  
pp. 10981-10988
Author(s):  
Mengxiao Hu ◽  
Jinlong Li ◽  
Maolin Hu ◽  
Tao Hu

In conditional Generative Adversarial Networks (cGANs), when two different initial noises are concatenated with the same conditional information, the distance between their outputs is relatively smaller, which makes minor modes likely to collapse into large modes. To prevent this happen, we proposed a hierarchical mode exploring method to alleviate mode collapse in cGANs by introducing a diversity measurement into the objective function as the regularization term. We also introduced the Expected Ratios of Expansion (ERE) into the regularization term, by minimizing the sum of differences between the real change of distance and ERE, we can control the diversity of generated images w.r.t specific-level features. We validated the proposed algorithm on four conditional image synthesis tasks including categorical generation, paired and un-paired image translation and text-to-image generation. Both qualitative and quantitative results show that the proposed method is effective in alleviating the mode collapse problem in cGANs, and can control the diversity of output images w.r.t specific-level features.


Electronics ◽  
2021 ◽  
Vol 10 (10) ◽  
pp. 1216
Author(s):  
Sung-Wook Park ◽  
Jae-Sub Ko ◽  
Jun-Ho Huh ◽  
Jong-Chan Kim

The emergence of deep learning model GAN (Generative Adversarial Networks) is an important turning point in generative modeling. GAN is more powerful in feature and expression learning compared to machine learning-based generative model algorithms. Nowadays, it is also used to generate non-image data, such as voice and natural language. Typical technologies include BERT (Bidirectional Encoder Representations from Transformers), GPT-3 (Generative Pretrained Transformer-3), and MuseNet. GAN differs from the machine learning-based generative model and the objective function. Training is conducted by two networks: generator and discriminator. The generator converts random noise into a true-to-life image, whereas the discriminator distinguishes whether the input image is real or synthetic. As the training continues, the generator learns more sophisticated synthesis techniques, and the discriminator grows into a more accurate differentiator. GAN has problems, such as mode collapse, training instability, and lack of evaluation matrix, and many researchers have tried to solve these problems. For example, solutions such as one-sided label smoothing, instance normalization, and minibatch discrimination have been proposed. The field of application has also expanded. This paper provides an overview of GAN and application solutions for computer vision and artificial intelligence healthcare field researchers. The structure and principle of operation of GAN, the core models of GAN proposed to date, and the theory of GAN were analyzed. Application examples of GAN such as image classification and regression, image synthesis and inpainting, image-to-image translation, super-resolution and point registration were then presented. The discussion tackled GAN’s problems and solutions, and the future research direction was finally proposed.


2020 ◽  
Vol 1 ◽  
pp. 6
Author(s):  
Alexander Hepburn ◽  
Valero Laparra ◽  
Ryan McConville ◽  
Raul Santos-Rodriguez

In recent years there has been a growing interest in image generation through deep learning. While an important part of the evaluation of the generated images usually involves visual inspection, the inclusion of human perception as a factor in the training process is often overlooked. In this paper we propose an alternative perceptual regulariser for image-to-image translation using conditional generative adversarial networks (cGANs). To do so automatically (avoiding visual inspection), we use the Normalised Laplacian Pyramid Distance (NLPD) to measure the perceptual similarity between the generated image and the original image. The NLPD is based on the principle of normalising the value of coefficients with respect to a local estimate of mean energy at different scales and has already been successfully tested in different experiments involving human perception. We compare this regulariser with the originally proposed L1 distance and note that when using NLPD the generated images contain more realistic values for both local and global contrast.


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Yan Gan ◽  
Junxin Gong ◽  
Mao Ye ◽  
Yang Qian ◽  
Kedi Liu ◽  
...  

Unpaired image translation is a challenging problem in computer vision, while existing generative adversarial networks (GANs) models mainly use the adversarial loss and other constraints to model. But the degree of constraint imposed on the generator and the discriminator is not enough, which results in bad image quality. In addition, we find that the current GANs-based models have not yet been implemented by adding an auxiliary domain, which is used to constrain the generator. To solve the problem mentioned above, we propose a multiscale and multilevel GANs (MMGANs) model for image translation. In this model, we add an auxiliary domain to constrain generator, which combines this auxiliary domain with the original domains for modelling and helps generator learn the detailed content of the image. Then we use multiscale and multilevel feature matching to constrain the discriminator. The purpose is to make the training process as stable as possible. Finally, we conduct experiments on six image translation tasks. The results verify the validity of the proposed model.


2019 ◽  
Vol 2 (93) ◽  
pp. 64-68
Author(s):  
I. Konarieva ◽  
D. Pydorenko ◽  
O. Turuta

The given work considers the existing methods of text compression (finding keywords or creating summary) using RAKE, Lex Rank, Luhn, LSA, Text Rank algorithms; image generation; text-to-image and image-to-image translation including GANs (generative adversarial networks). Different types of GANs were described such as StyleGAN, GauGAN, Pix2Pix, CycleGAN, BigGAN, AttnGAN. This work aims to show ways to create illustrations for the text. First, key information should be obtained from the text. Second, this key information should be transformed into images. There were proposed several ways to transform keywords to images: generating images or selecting them from a dataset with further transforming like generating new images based on selected ow combining selected images e.g. with applying style from one image to another. Based on results, possibilities for further improving the quality of image generation were also planned: combining image generation with selecting images from a dataset, limiting topics of image generation.


Sign in / Sign up

Export Citation Format

Share Document