An Ensemble of Generation- and Retrieval-Based Image Captioning With Dual Generator Generative Adversarial Network

2020 ◽  
Vol 29 ◽  
pp. 9627-9640
Author(s):  
Min Yang ◽  
Junhao Liu ◽  
Ying Shen ◽  
Zhou Zhao ◽  
Xiaojun Chen ◽  
...  

2020 ◽  
Vol 28 (5) ◽  
pp. 975-988
Author(s):  
Sivamurugan Vellakani ◽  
Indumathi Pushbam

The human eye is affected by different diseases, including choroidal neovascularization (CNV), diabetic macular edema (DME) and age-related macular degeneration (AMD). This work aims to design an artificial intelligence (AI) based clinical decision support system for eye disease detection and classification, assisting ophthalmologists in more effectively detecting and classifying CNV, DME and drusen from Optical Coherence Tomography (OCT) images depicting different tissues. The methodology combines deep learning convolutional neural network (CNN) models with long short-term memory (LSTM) networks. The best image captioning model is selected after a performance comparison of nine different image captioning systems for assisting ophthalmologists in detecting and classifying eye diseases. Quantitative analysis shows that the image captioning model built on DenseNet201 with LSTM achieves superior performance, with an overall accuracy of 0.969, a positive predictive value of 0.972 and a true-positive rate of 0.969, using OCT images enhanced by a generative adversarial network (GAN). The corresponding values for the Xception-with-LSTM image captioning model are 0.969, 0.969 and 0.938, respectively. These two models thus yield superior performance and have the potential to assist ophthalmologists in making optimal diagnostic decisions.
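The reported accuracy, positive predictive value and true-positive rate are standard confusion-matrix metrics. A minimal pure-Python sketch of how such values are computed for a multi-class classifier (the 4-class matrix below is illustrative only, not the study's data):

```python
# Hypothetical 4-class confusion matrix (CNV, DME, drusen, normal);
# rows are true classes, columns are predictions. Counts are invented
# for illustration.
cm = [
    [242,   3,   3,   2],
    [  4, 240,   4,   2],
    [  3,   4, 241,   2],
    [  2,   2,   2, 244],
]

def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def macro_ppv(cm):
    """Positive predictive value (precision) averaged over classes."""
    n = len(cm)
    col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    return sum(cm[c][c] / col_sums[c] for c in range(n)) / n

def macro_tpr(cm):
    """True-positive rate (sensitivity) averaged over classes."""
    return sum(cm[r][r] / sum(cm[r]) for r in range(len(cm))) / len(cm)

print(round(overall_accuracy(cm), 3))  # 0.967 for this toy matrix
```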


Author(s):  
Shiyang Yan ◽  
Jun Xu ◽  
Yuai Liu ◽  
Lin Xu

Person re-identification (re-ID) aims to recognize a person-of-interest across different cameras despite notable appearance variance. Existing research has focused on the capability and robustness of visual representations. In this paper, instead, we propose a novel hierarchical offshoot recurrent network (HorNet) for improving person re-ID via image captioning. Image captions are semantically richer and more consistent than visual attributes, which can significantly alleviate this variance. We use the similarity preserving generative adversarial network (SPGAN) and an image captioner to perform domain transfer and generate language descriptions. The proposed HorNet then learns visual and language representations jointly from both the images and the captions, and thus enhances person re-ID performance. Extensive experiments on several benchmark datasets with or without image captions, i.e., CUHK03, Market-1501, and Duke-MTMC, demonstrate the superiority of the proposed method. Our method can generate and extract meaningful image captions while achieving state-of-the-art performance.
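HorNet learns the joint visual-language representation end-to-end; as a loose illustration of why caption features can help re-ID matching, here is a late-fusion sketch in which a query is ranked against a gallery by cosine similarity over concatenated embeddings (the fusion weight and the concatenation itself are assumptions for illustration, not the paper's method):

```python
from math import sqrt

def fuse(visual, caption, w=0.5):
    """Hypothetical late fusion: weighted concatenation of a visual
    embedding and a caption embedding."""
    return [w * v for v in visual] + [(1.0 - w) * c for c in caption]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending similarity to the query."""
    sims = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: -sims[i])
```

A caption embedding that stays consistent across cameras can pull the correct gallery entry to the top even when the visual embedding drifts.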


2020 ◽  
Vol 387 ◽  
pp. 91-99 ◽  
Author(s):  
Yiwei Wei ◽  
Leiquan Wang ◽  
Haiwen Cao ◽  
Mingwen Shao ◽  
Chunlei Wu

2020 ◽  
Vol 417 ◽  
pp. 419-431
Author(s):  
Shan Cao ◽  
Gaoyun An ◽  
Zhenxing Zheng ◽  
Qiuqi Ruan

2020 ◽  
Vol 34 (07) ◽  
pp. 11588-11595 ◽  
Author(s):  
Junhao Liu ◽  
Kai Wang ◽  
Chunpu Xu ◽  
Zhou Zhao ◽  
Ruifeng Xu ◽  
...  

Image captioning is usually built on either generation-based or retrieval-based approaches. Both have certain strengths but suffer from their own limitations. In this paper, we propose an Interactive Dual Generative Adversarial Network (IDGAN) for image captioning, which combines the retrieval-based and generation-based methods to learn a better image captioning ensemble. IDGAN consists of two generators and two discriminators, where the generation- and retrieval-based generators mutually benefit from each other's complementary targets, learned from two dual adversarial discriminators. Specifically, the generation- and retrieval-based generators produce improved synthetic and retrieved candidate captions using informative feedback from the two respective discriminators, which are trained to distinguish generated captions from true captions and to assign top rankings to true captions, thus combining the merits of both approaches. Extensive experiments on the MSCOCO dataset demonstrate that the proposed IDGAN model significantly outperforms the compared methods for image captioning.
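The two discriminator roles described above can be sketched as two standard objectives: a real/fake loss for the generation branch and a margin ranking loss that pushes true captions above candidate captions (the margin value and these exact loss forms are assumptions for illustration, not IDGAN's published formulation):

```python
from math import log

def ranking_loss(true_score, candidate_scores, margin=0.2):
    """Hinge loss pushing the true caption's score above every
    generated/retrieved candidate by at least `margin`: a common
    surrogate for 'assign top rankings to true captions'."""
    return sum(max(0.0, margin - true_score + s) for s in candidate_scores)

def adversarial_losses(d_real, d_fake):
    """Standard non-saturating GAN losses, given the discriminator's
    probability that a caption is real (d_real for a true caption,
    d_fake for a generated one)."""
    d_loss = -(log(d_real) + log(1.0 - d_fake))  # discriminator objective
    g_loss = -log(d_fake)                        # generator objective
    return d_loss, g_loss
```

Feedback from both losses flows to the two generators, which is how the ensemble couples the generation and retrieval branches.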


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Li Li ◽  
Zijia Fan ◽  
Mingyang Zhao ◽  
Xinlei Wang ◽  
Zhongyang Wang ◽  
...  

Since underwater images are unclear and difficult to recognize, super-resolution (SR) is needed to obtain clear images for further study. Images obtained with conventional underwater SR methods lack detailed information, which causes errors in subsequent recognition and other processing. We therefore propose an image sequence generative adversarial network (ISGAN) method for super-resolution, based on multi-focus underwater image sequences captured from the same angle, which recovers more detail and improves image resolution. A dual-generator design is used to optimize the network architecture and improve generator stability. The preprocessed images are passed through both generators: the main generator produces the SR image from the sequence images, while the auxiliary generator prevents training from collapsing or generating redundant details. Experimental results show that the proposed method improves both peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) over the traditional GAN method for underwater image SR.
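Of the two reported metrics, PSNR has a direct closed form; a minimal sketch for images given as flat pixel lists in the 0-255 range (SSIM, which also involves local means, variances and covariances, is omitted for brevity):

```python
from math import log10

def psnr(img_a, img_b, peak=255.0):
    """Peak signal-to-noise ratio between two equally sized images,
    each given as a flat list of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * log10(peak * peak / mse)
```

Higher PSNR means the super-resolved output is closer, pixel-wise, to the reference image.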

