An Ensemble of Generation- and Retrieval-Based Image Captioning With Dual Generator Generative Adversarial Network

2020 ◽  
Vol 29 ◽  
pp. 9627-9640
Author(s):  
Min Yang ◽  
Junhao Liu ◽  
Ying Shen ◽  
Zhou Zhao ◽  
Xiaojun Chen ◽  
...  

2020 ◽  
Vol 28 (5) ◽  
pp. 975-988
Author(s):  
Sivamurugan Vellakani ◽  
Indumathi Pushbam

The human eye is affected by different diseases, including choroidal neovascularization (CNV), diabetic macular edema (DME) and age-related macular degeneration (AMD). This work aims to design an artificial intelligence (AI) based clinical decision support system for eye disease detection and classification, assisting ophthalmologists in more effectively detecting and classifying CNV, DME and drusen from Optical Coherence Tomography (OCT) images depicting different tissues. The methodology combines deep learning convolutional neural network (CNN) models with long short-term memory (LSTM) networks. The best image captioning model is selected after a performance comparison of nine different image captioning systems for assisting ophthalmologists in detecting and classifying eye diseases. Quantitative analysis shows that the image captioning model built on DenseNet201 with LSTM achieves superior performance, with an overall accuracy of 0.969, a positive predictive value of 0.972 and a true-positive rate of 0.969, using OCT images enhanced by a generative adversarial network (GAN). The corresponding values for the Xception-with-LSTM image captioning model are 0.969, 0.969 and 0.938, respectively. These two models thus yield superior performance and have the potential to assist ophthalmologists in making optimal diagnostic decisions.
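The reported accuracy, positive predictive value and true-positive rate are standard confusion-matrix metrics. A minimal pure-Python sketch of how such values are computed for a multi-class classifier (the 4-class matrix below is illustrative only, not the study's data):

```python
# Hypothetical 4-class confusion matrix (CNV, DME, drusen, normal);
# rows are true classes, columns are predictions. Counts are invented
# for illustration.
cm = [
    [242,   3,   3,   2],
    [  4, 240,   4,   2],
    [  3,   4, 241,   2],
    [  2,   2,   2, 244],
]

def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def macro_ppv(cm):
    """Positive predictive value (precision) averaged over classes."""
    n = len(cm)
    col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    return sum(cm[c][c] / col_sums[c] for c in range(n)) / n

def macro_tpr(cm):
    """True-positive rate (sensitivity) averaged over classes."""
    return sum(cm[r][r] / sum(cm[r]) for r in range(len(cm))) / len(cm)

print(round(overall_accuracy(cm), 3))  # 0.967 for this toy matrix
```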


Author(s):  
Shiyang Yan ◽  
Jun Xu ◽  
Yuai Liu ◽  
Lin Xu

Person re-identification (re-ID) aims to recognize a person-of-interest across different cameras despite notable appearance variance. Existing research has focused on the capability and robustness of visual representations. In this paper, instead, we propose a novel hierarchical offshoot recurrent network (HorNet) for improving person re-ID via image captioning. Image captions are semantically richer and more consistent than visual attributes, which can significantly alleviate this variance. We use the similarity preserving generative adversarial network (SPGAN) and an image captioner to perform domain transfer and generate language descriptions. The proposed HorNet then learns visual and language representations jointly from both the images and the captions, and thus enhances person re-ID performance. Extensive experiments on several benchmark datasets with or without image captions, i.e., CUHK03, Market-1501, and Duke-MTMC, demonstrate the superiority of the proposed method. Our method can generate and extract meaningful image captions while achieving state-of-the-art performance.
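HorNet learns the joint visual-language representation end-to-end; as a loose illustration of why caption features can help re-ID matching, here is a late-fusion sketch in which a query is ranked against a gallery by cosine similarity over concatenated embeddings (the fusion weight and the concatenation itself are assumptions for illustration, not the paper's method):

```python
from math import sqrt

def fuse(visual, caption, w=0.5):
    """Hypothetical late fusion: weighted concatenation of a visual
    embedding and a caption embedding."""
    return [w * v for v in visual] + [(1.0 - w) * c for c in caption]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending similarity to the query."""
    sims = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: -sims[i])
```

A caption embedding that stays consistent across cameras can pull the correct gallery entry to the top even when the visual embedding drifts.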


2020 ◽  
Vol 387 ◽  
pp. 91-99 ◽  
Author(s):  
Yiwei Wei ◽  
Leiquan Wang ◽  
Haiwen Cao ◽  
Mingwen Shao ◽  
Chunlei Wu

2020 ◽  
Vol 417 ◽  
pp. 419-431
Author(s):  
Shan Cao ◽  
Gaoyun An ◽  
Zhenxing Zheng ◽  
Qiuqi Ruan

2020 ◽  
Vol 34 (07) ◽  
pp. 11588-11595 ◽  
Author(s):  
Junhao Liu ◽  
Kai Wang ◽  
Chunpu Xu ◽  
Zhou Zhao ◽  
Ruifeng Xu ◽  
...  

Image captioning is usually built on either generation-based or retrieval-based approaches. Both have certain strengths but suffer from their own limitations. In this paper, we propose an Interactive Dual Generative Adversarial Network (IDGAN) for image captioning, which combines the retrieval-based and generation-based methods to learn a better image captioning ensemble. IDGAN consists of two generators and two discriminators, where the generation- and retrieval-based generators mutually benefit from each other's complementary targets, learned from two dual adversarial discriminators. Specifically, the generation- and retrieval-based generators produce improved synthetic and retrieved candidate captions using informative feedback from the two respective discriminators, which are trained to distinguish generated captions from true captions and to assign top rankings to true captions, thus combining the merits of both approaches. Extensive experiments on the MSCOCO dataset demonstrate that the proposed IDGAN model significantly outperforms the compared methods for image captioning.
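The two discriminator roles described above can be sketched as two standard objectives: a real/fake loss for the generation branch and a margin ranking loss that pushes true captions above candidate captions (the margin value and these exact loss forms are assumptions for illustration, not IDGAN's published formulation):

```python
from math import log

def ranking_loss(true_score, candidate_scores, margin=0.2):
    """Hinge loss pushing the true caption's score above every
    generated/retrieved candidate by at least `margin`: a common
    surrogate for 'assign top rankings to true captions'."""
    return sum(max(0.0, margin - true_score + s) for s in candidate_scores)

def adversarial_losses(d_real, d_fake):
    """Standard non-saturating GAN losses, given the discriminator's
    probability that a caption is real (d_real for a true caption,
    d_fake for a generated one)."""
    d_loss = -(log(d_real) + log(1.0 - d_fake))  # discriminator objective
    g_loss = -log(d_fake)                        # generator objective
    return d_loss, g_loss
```

Feedback from both losses flows to the two generators, which is how the ensemble couples the generation and retrieval branches.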


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Li Li ◽  
Zijia Fan ◽  
Mingyang Zhao ◽  
Xinlei Wang ◽  
Zhongyang Wang ◽  
...  

Since underwater images are unclear and difficult to recognize, super-resolution (SR) is needed to obtain clear images for further study. Images obtained with conventional underwater SR methods lack detailed information, which causes errors in subsequent recognition and other processing. We therefore propose an image sequence generative adversarial network (ISGAN) method for super-resolution, based on multi-focus underwater image sequences captured from the same angle, which recovers more detail and improves image resolution. A dual-generator design is used to optimize the network architecture and improve generator stability. The preprocessed images are passed through both generators: the main generator produces the SR image from the sequence images, while the auxiliary generator prevents training from collapsing or generating redundant details. Experimental results show that the proposed method improves both peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) over the traditional GAN method for underwater image SR.
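Of the two reported metrics, PSNR has a direct closed form; a minimal sketch for images given as flat pixel lists in the 0-255 range (SSIM, which also involves local means, variances and covariances, is omitted for brevity):

```python
from math import log10

def psnr(img_a, img_b, peak=255.0):
    """Peak signal-to-noise ratio between two equally sized images,
    each given as a flat list of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * log10(peak * peak / mse)
```

Higher PSNR means the super-resolved output is closer, pixel-wise, to the reference image.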

