Image Caption Generation Model Based on Object Detector

Mathematical Problems of Computer Science ◽

10.51408/1963-0016 ◽

2018 ◽

pp. 5-14

Author(s):

Aghasi Poghosyan ◽

Hakob Sarukhanyan

Keyword(s):

Natural Language ◽

Object Detection ◽

Information Extraction ◽

Semantic Information ◽

Generation Model ◽

Single Model ◽

Detection Model ◽

Generator Performance ◽

Image Caption Generation ◽

Image Caption

Automated extraction of semantic information from images is a difficult task. Existing works can either generate an image caption or extract object names and their coordinates. This work presents object detection and automated caption generation implemented in a single model. We built the image caption generation model on top of an object detection model, adding extra layers to the object detector to improve caption generator performance. The resulting single model can detect objects, localize them, and generate an image caption in natural language.

Middle-Level Attribute-Based Language Retouching for Image Caption Generation

Applied Sciences ◽

10.3390/app8101850 ◽

2018 ◽

Vol 8 (10) ◽

pp. 1850 ◽

Cited By ~ 1

Author(s):

Zhibin Guan ◽

Kang Liu ◽

Yan Ma ◽

Xu Qian ◽

Tongkai Ji

Keyword(s):

Natural Language ◽

Language Processing ◽

Middle Level ◽

Generation Model ◽

Image Description ◽

Image Captioning ◽

Benchmark Datasets ◽

Intermediate Image ◽

Image Caption Generation ◽

Image Caption

Image caption generation is an attractive research area that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). Existing image captioning methods mainly generate the final image caption directly, which may lose significant identifying information about the objects contained in the raw image. To solve this problem, we propose a new middle-level attribute-based language retouching (MLALR) method. MLALR uses the middle-level attributes predicted from the object regions to retouch the intermediate image description generated by our language generation model. Its advantage is that it can correct descriptive errors in the intermediate image description and make the final image caption more accurate. Evaluation on the benchmark datasets MSCOCO, Flickr8K, and Flickr30K with the metrics BLEU, METEOR, ROUGE-L, CIDEr, and SPICE validated the performance of our MLALR method.

Natural Language and Semantic Information Extraction

Theories of Geographic Concepts ◽

10.1201/9781420004670-18 ◽

2007 ◽

pp. 189-200

Keyword(s):

Natural Language ◽

Information Extraction ◽

Semantic Information

Natural Language and Semantic Information Extraction

Theories of Geographic Concepts ◽

10.1201/9781420004670.ch12 ◽

2007 ◽

pp. 171-181

Keyword(s):

Natural Language ◽

Information Extraction ◽

Semantic Information

Syntactic and semantic information extraction from NPP procedures utilizing natural language processing integrated with rules

Nuclear Engineering and Technology ◽

10.1016/j.net.2020.08.010 ◽

2020 ◽

Author(s):

Yongsun Choi ◽

Minh Duc Nguyen ◽

Thomas N. Kerr

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Semantic Information

An Overview of Image Caption Generation Methods

Computational Intelligence and Neuroscience ◽

10.1155/2020/3062706 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13 ◽

Cited By ~ 2

Author(s):

Haoran Wang ◽

Yue Zhang ◽

Xiaosheng Yu

Keyword(s):

Artificial Intelligence ◽

Computer Vision ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Rapid Development ◽

Evaluation Criteria ◽

Arduous Task ◽

Image Caption Generation ◽

Image Caption

In recent years, with the rapid development of artificial intelligence, image captioning has attracted the attention of many researchers and has become an interesting and arduous task. Image captioning, the automatic generation of natural language descriptions of the content observed in an image, is an important part of scene understanding and combines knowledge from computer vision and natural language processing. Its applications are extensive and significant, for example in human-computer interaction. This paper summarizes the related methods and focuses on the attention mechanism, which plays an important role in computer vision and has recently been widely used in image caption generation tasks. It also discusses the advantages and shortcomings of these methods and presents the commonly used datasets and evaluation criteria in this field. Finally, the paper highlights some open challenges in the image caption task.

Natural Language and Semantic Information Extraction

Theories of Geographic Concepts ◽

10.1201/9780849330896.ch12 ◽

2007 ◽

pp. 171-181

Keyword(s):

Natural Language ◽

Information Extraction ◽

Semantic Information

Deep learning for ultrasound image caption generation based on object detection

Neurocomputing ◽

10.1016/j.neucom.2018.11.114 ◽

2020 ◽

Vol 392 ◽

pp. 132-141 ◽

Cited By ~ 3

Author(s):

Xianhua Zeng ◽

Li Wen ◽

Banggui Liu ◽

Xiaojun Qi

Keyword(s):

Deep Learning ◽

Object Detection ◽

Ultrasound Image ◽

Image Caption Generation ◽

Image Caption

An Image Caption Generation Model Based on Visual Concept Attention and Residual Connection

Journal of Computer-Aided Design & Computer Graphics ◽

10.3724/sp.j.1089.2018.16825 ◽

2018 ◽

Vol 30 (8) ◽

pp. 1536

Author(s):

Zhiping Zhou ◽

Wei Zhang

Keyword(s):

Generation Model ◽

Visual Concept ◽

Model Based ◽

Image Caption Generation ◽

Image Caption

Augment BERT with average pooling layer for Chinese summary generation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211229 ◽

2021 ◽

pp. 1-10

Author(s):

Shuai Zhao ◽

Fucheng You ◽

Wen Chang ◽

Tianyu Zhang ◽

Man Hu

Keyword(s):

Experimental Data ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Semantic Information ◽

Chinese Language ◽

Language Model ◽

Fine Tuning ◽

Generation Model ◽

Expected Effect

The BERT pre-trained language model has achieved good results in various subtasks of natural language processing, but its performance in generating Chinese summaries is not ideal. The most intuitive reason is that the BERT model is based on character-level composition, while the Chinese language is mostly in the form of phrases, so directly fine-tuning the BERT model cannot achieve the expected effect. This paper proposes a novel summary generation model in which BERT is augmented with a pooling layer. In our model, we perform an average pooling operation on the token embeddings to improve the model’s ability to capture phrase-level semantic information. We use LCSTS and NLPCC2017 to verify our proposed method. Experimental results show that introducing average pooling effectively improves the quality of the generated summaries. Furthermore, comparative analysis shows that different datasets require different pooling kernel sizes to achieve the best results. In addition, our proposed method has strong generalizability: it can be applied not only to summary generation, but also to other natural language processing tasks.
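The average pooling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `average_pool_tokens`, the sliding-window layout, the `stride` parameter, and the absence of padding are all assumptions made for illustration. It averages each window of `kernel_size` consecutive token embeddings so that the result reflects phrase-level rather than character-level information.

```python
from typing import List


def average_pool_tokens(
    embeddings: List[List[float]],
    kernel_size: int,
    stride: int = 1,
) -> List[List[float]]:
    """Average consecutive token embeddings over a sliding window.

    Hypothetical helper: the window layout (no padding, configurable
    stride) is an assumption, not taken from the paper.
    """
    if kernel_size < 1 or stride < 1:
        raise ValueError("kernel_size and stride must be positive")
    if len(embeddings) < kernel_size:
        return []  # sequence is shorter than one window

    dim = len(embeddings[0])
    pooled = []
    # Slide the window over the token axis and average each dimension.
    for start in range(0, len(embeddings) - kernel_size + 1, stride):
        window = embeddings[start:start + kernel_size]
        pooled.append(
            [sum(vec[d] for vec in window) / kernel_size for d in range(dim)]
        )
    return pooled


# Three 2-dimensional token embeddings pooled with a kernel of size 2.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
phrase_level = average_pool_tokens(tokens, kernel_size=2)
print(phrase_level)  # [[2.0, 3.0], [4.0, 5.0]]
```

A larger `kernel_size` merges more neighboring tokens into each pooled vector, which is the knob the abstract says must be tuned per dataset.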

Image Caption Generation with Local Semantic Information and Global Information

2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) ◽

10.1109/smartworld-uic-atc-scalcom-iop-sci.2019.00152 ◽

2019 ◽

Author(s):

Xing Liu ◽

Weibin Liu ◽

Weiwei Xing

Keyword(s):

Semantic Information ◽

Global Information ◽

Image Caption Generation ◽

Image Caption
