Dense Model for Automatic Image Description Generation with Game Theoretic Optimization

Sreela S R; Sumam Mary Idicula

doi:10.3390/info10110354

Dense Model for Automatic Image Description Generation with Game Theoretic Optimization

Information ◽

10.3390/info10110354 ◽

2019 ◽

Vol 10 (11) ◽

pp. 354

Author(s):

Sreela S R ◽

Sumam Mary Idicula

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Natural Language ◽

Language Modeling ◽

Beam Search ◽

Image Description ◽

Image Encoding ◽

Proposed Model ◽

Game Theoretic ◽

Evaluation Measure

Due to the rapid growth of deep learning technologies, automatic image description generation is an interesting problem in computer vision and natural language generation. It helps to improve access to photo collections on social media and gives guidance for visually impaired people. Currently, deep neural networks play a vital role in computer vision and natural language processing tasks. The main objective of the work is to generate the grammatically correct description of the image using the semantics of the trained captions. An encoder-decoder framework using the deep neural system is used to implement an image description generation task. The encoder is an image parsing module, and the decoder is a surface realization module. The framework uses Densely connected convolutional neural networks (Densenet) for image encoding and Bidirectional Long Short Term Memory (BLSTM) for language modeling, and the outputs are given to bidirectional LSTM in the caption generator, which is trained to optimize the log-likelihood of the target description of the image. Most of the existing image captioning works use RNN and LSTM for language modeling. RNNs are computationally expensive with limited memory. LSTM checks the inputs in one direction. BLSTM is used in practice, which avoids the problem of RNN and LSTM. In this work, the selection of the best combination of words in caption generation is made using beam search and game theoretic search. The results show the game theoretic search outperforms beam search. The model was evaluated with the standard benchmark dataset Flickr8k. The Bilingual Evaluation Understudy (BLEU) score is taken as the evaluation measure of the system. A new evaluation measure called GCorrectwas used to check the grammatical correctness of the description. The performance of the proposed model achieves greater improvements over previous methods on the Flickr8k dataset. The proposed model produces grammatically correct sentences for images with a GCorrect of 0.040625 and a BLEU score of 69.96%

Download Full-text

Automatic Image Captioning Using Neural Networks

Journal of Innovations in Engineering Education ◽

10.3126/jiee.v3i1.34335 ◽

2020 ◽

Vol 3 (1) ◽

pp. 138-146

Author(s):

Subash Pandey ◽

Rabin Kumar Dhamala ◽

Bikram Karki ◽

Saroj Dahal ◽

Rama Bastola

Keyword(s):

Neural Network ◽

Artificial Intelligence ◽

Neural Networks ◽

Computer Vision ◽

Natural Language ◽

Language Processing ◽

Model Performance ◽

Image Description ◽

Image Captioning ◽

Top Down

Automatically generating a natural language description of an image is a major challenging task in the field of artificial intelligence. Generating description of an image bring together the fields: Natural Language Processing and Computer Vision. There are two types of approaches i.e. top-down and bottom-up. For this paper, we approached top-down that starts from the image and converts it into the word. Image is passed to Convolutional Neural Network (CNN) encoder and the output from it is fed further to Recurrent Neural Network (RNN) decoder that generates meaningful captions. We generated the image description by passing the real time images from the camera of a smartphone as well as tested with the test images from the dataset. To evaluate the model performance, we used BLEU (Bilingual Evaluation Understudy) score and match predicted words to the original caption.

Download Full-text

A Hindi Image Caption Generation Framework Using Deep Learning

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3432246 ◽

2021 ◽

Vol 20 (2) ◽

pp. 1-19

Author(s):

Santosh Kumar Mishra ◽

Rijul Dhir ◽

Sriparna Saha ◽

Pushpak Bhattacharyya

Keyword(s):

Computer Vision ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

English Language ◽

Image Captioning ◽

Textual Description ◽

Proposed Model ◽

Hindi Language ◽

The Given

Image captioning is the process of generating a textual description of an image that aims to describe the salient parts of the given image. It is an important problem, as it involves computer vision and natural language processing, where computer vision is used for understanding images, and natural language processing is used for language modeling. A lot of works have been done for image captioning for the English language. In this article, we have developed a model for image captioning in the Hindi language. Hindi is the official language of India, and it is the fourth most spoken language in the world, spoken in India and South Asia. To the best of our knowledge, this is the first attempt to generate image captions in the Hindi language. A dataset is manually created by translating well known MSCOCO dataset from English to Hindi. Finally, different types of attention-based architectures are developed for image captioning in the Hindi language. These attention mechanisms are new for the Hindi language, as those have never been used for the Hindi language. The obtained results of the proposed model are compared with several baselines in terms of BLEU scores, and the results show that our model performs better than others. Manual evaluation of the obtained captions in terms of adequacy and fluency also reveals the effectiveness of our proposed approach. Availability of resources : The codes of the article are available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language ; The dataset will be made available: http://www.iitp.ac.in/∼ai-nlp-ml/resources.html .

Download Full-text

Multimodal Feature Learning for Video Captioning

Mathematical Problems in Engineering ◽

10.1155/2018/3125879 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Sujin Lee ◽

Incheol Kim

Keyword(s):

Neural Networks ◽

Natural Language ◽

Feature Learning ◽

Visual Features ◽

Semantic Features ◽

Video Captioning ◽

Video Feature ◽

Proposed Model ◽

Generation Network ◽

Benchmark Datasets

Video captioning refers to the task of generating a natural language sentence that explains the content of the input video clips. This study proposes a deep neural network model for effective video captioning. Apart from visual features, the proposed model learns additionally semantic features that describe the video content effectively. In our model, visual features of the input video are extracted using convolutional neural networks such as C3D and ResNet, while semantic features are obtained using recurrent neural networks such as LSTM. In addition, our model includes an attention-based caption generation network to generate the correct natural language captions based on the multimodal video feature sequences. Various experiments, conducted with the two large benchmark datasets, Microsoft Video Description (MSVD) and Microsoft Research Video-to-Text (MSR-VTT), demonstrate the performance of the proposed model.

Download Full-text

Conversation Modeling with Neural Network

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2018/v1i324738 ◽

2018 ◽

pp. 1-10

Author(s):

Jivan Y. Patil ◽

Girish P. Potdar

Keyword(s):

Neural Networks ◽

Intelligent System ◽

Large Data ◽

Learning Problems ◽

Beam Search ◽

Training Set ◽

Data Set ◽

Domain Specific ◽

Proposed Model ◽

End To End

The ability to process, understand and interact in natural language carries high importance for building a Intelligent system, as it will greatly affect the way of communicating with the system. Deep Neural Networks (DNNs) have achieved excellent performance for many of machine learning problems and are widely accepted for applications in the field of computer vision and supervised learning. Although DNNs work well with availability of large labeled training set, it cannot be used to map complex structures like sentences end-to-end. Existing approaches for conversational modeling are domain specific and require handcrafted rules. This paper proposes a simple approach based on use of neural networks’ recently proposed sequence to sequence framework. The proposed model generates reply by predicting sentence using chained probability for given sentence(s) in conversation. This model is trained end-to-end on large data set. Proposed approach uses Attention to focus text generation on intent of conversation as well as beam search to generate optimum output with some diversity.Primary findings show that model shows common sense reasoning on movie transcript data set.

Download Full-text

Image Description using Encoder and Decoder LSTM Methods: Some Issues

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k7729.0991120 ◽

2020 ◽

Vol 9 (11) ◽

pp. 167-172

Keyword(s):

Neural Networks ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Recurrent Neural Networks ◽

Image Mining ◽

Image Description ◽

Other Information ◽

Image Descriptions ◽

Insight Into

Description of images has an important role in image mining. The description of images provides an insight into the location, its surroundings and other information related to it. Different procedures of describing the images exist in literature. However, a well trained description of images is still a tedious task to achieve. Several researchers have come up with solutions to this problem using various techniques. Herein, the concept of LSTM is used in generating a trained description of images. The said process is achieved through encoders and decoders. Encoders use techniques of maxpooling and convolution, while the decoders use the concept of recurrent neural networks. The combined architecture of encoders and decoders result in trained classifiers, which enable reliable description of images. The working has been implemented by considering a sample image. It has been found that slight variations with regard to accuracy, naturalness, missing concepts, deficiency of sufficient semantics and incomplete description of image still exist. Hence, it can be inferred that, with reasonable amount of enhancement in the technique and using the techniques of natural language processing, more accuracy in image descriptions could be achieved.

Download Full-text

Neural Language Modeling for Molecule Generation

10.26434/chemrxiv.14700831 ◽

2021 ◽

Author(s):

Sanjar Adilov

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Natural Language Processing ◽

Drug Design ◽

Natural Language ◽

Language Processing ◽

De Novo ◽

Language Modeling ◽

Machine Learning Methods

Generative neural networks have shown promising results in <i>de novo</i> drug design. Recent studies suggest that one of the efficient ways to produce novel molecules matching target properties is to model SMILES sequences using deep learning in a way similar to language modeling in natural language processing. In this paper, we present a survey of various machine learning methods for SMILES-based language modeling and propose our benchmarking results on a standardized subset of ChEMBL database.

Download Full-text

Neural Language Modeling for Molecule Generation

10.26434/chemrxiv.14700831.v1 ◽

2021 ◽

Author(s):

Sanjar Adilov

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Natural Language Processing ◽

Drug Design ◽

Natural Language ◽

Language Processing ◽

De Novo ◽

Language Modeling ◽

Machine Learning Methods

Download Full-text

Deep learning: A beginners guide to start coding neural networks and apply in areas of computer vision and natural language processing

Journal of Computer Science & Systems Biology ◽

10.4172/0974-7230-c1-017 ◽

2018 ◽

Vol 11 ◽

Author(s):

Tanya Dixit ◽

Sriharsha Allenki

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing

Download Full-text

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures (Extended Abstract)

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/704 ◽

2017 ◽

Cited By ~ 7

Author(s):

Raffaella Bernardi ◽

Ruket Cakici ◽

Desmond Elliott ◽

Aykut Erdem ◽

Erkut Erdem ◽

...

Keyword(s):

Computer Vision ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Challenging Problem ◽

Image Description ◽

Future Directions ◽

Evaluation Measures ◽

Advantages And Disadvantages

Automatic image description generation is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the known approaches based on how they conceptualise this problem and provide a review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image-text datasets and the evaluation measures that have been developed to assess the quality of machine-generated descriptions. Finally we explore future directions in the area of automatic image description.

Download Full-text

Physics Inspired Deep Neural Networks for Top Quark Reconstruction

EPJ Web of Conferences ◽

10.1051/epjconf/202024506029 ◽

2020 ◽

Vol 245 ◽

pp. 06029

Author(s):

Kevin Greif ◽

Kevin Lannon

Keyword(s):

Neural Network ◽

Neural Networks ◽

Computer Vision ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Top Quark ◽

Deep Neural Networks ◽

Great Success ◽

Fully Connected

Deep neural networks (DNNs) have been applied to the fields of computer vision and natural language processing with great success in recent years. The success of these applications has hinged on the development of specialized DNN architectures that take advantage of specific characteristics of the problem to be solved, namely convolutional neural networks for computer vision and recurrent neural networks for natural language processing. This research explores whether a neural network architecture specific to the task of identifying t → Wb decays in particle collision data yields better performance than a generic, fully-connected DNN. Although applied here to resolved top quark decays, this approach is inspired by an DNN technique for tagging boosted top quarks, which consists of defining custom neural network layers known as the combination and Lorentz layers. These layers encode knowledge of relativistic kinematics applied to combinations of particles, and the output of these specialized layers can then be fed into a fully connected neural network to learn tasks such as classification. This research compares the performance of these physics inspired networks to that of a generic, fully-connected DNN, to see if there is any advantage in terms of classification performance, size of the network, or ease of training.

Download Full-text