Parallel Image Captioning Using 2D Masked Convolution

Chanrith Poleak; Jangwoo Kwon

doi:10.3390/app9091871

Parallel Image Captioning Using 2D Masked Convolution

Applied Sciences ◽

10.3390/app9091871 ◽

2019 ◽

Vol 9 (9) ◽

pp. 1871 ◽

Cited By ~ 1

Author(s):

Chanrith Poleak ◽

Jangwoo Kwon

Keyword(s):

Language Processing ◽

Short Term Memory ◽

State Of The Art ◽

Graphics Processing Unit ◽

Language Model ◽

Processing Unit ◽

Image Captioning ◽

Convolutional Network ◽

Training Time ◽

Parallel Graphics

Automatically generating a novel description of an image is a challenging and important problem that brings together advanced research in both computer vision and natural language processing. In recent years, image captioning has significantly improved its performance by using long short-term memory (LSTM) as a decoder for the language model. However, despite this improvement, LSTM itself has its own shortcomings as a model because the structure is complicated and its nature is inherently sequential. This paper proposes a model using a simple convolutional network for both encoder and decoder functions of image captioning, instead of the current state-of-the-art approach. Our experiment with this model on a Microsoft Common Objects in Context (MSCOCO) captioning dataset yielded results that are competitive with the state-of-the-art image captioning model across different evaluation metrics, while having a much simpler model and enabling parallel graphics processing unit (GPU) computation during training, resulting in a faster training time.

Download Full-text

Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing

2010 Ninth International Workshop on Parallel and Distributed Methods in Verification, and Second International Workshop on High Performance Computational Systems Biology ◽

10.1109/pdmc-hibi.2010.23 ◽

2010 ◽

Cited By ~ 5

Author(s):

Alhadi Bustamam ◽

Kevin Burrage ◽

Nicholas A. Hamilton

Keyword(s):

Graphics Processing Unit ◽

Massively Parallel ◽

Processing Unit ◽

Markov Clustering ◽

Parallel Graphics ◽

Graphics Processing

Download Full-text

Improved Framework using Rider Optimization Algorithm for Precise Image Caption Generation

International Journal of Image and Graphics ◽

10.1142/s0219467822500218 ◽

2021 ◽

pp. 2250021

Author(s):

Chaitrali Prasanna Chaudhari ◽

Satish Devane

Keyword(s):

Language Processing ◽

Optimization Algorithm ◽

Short Term Memory ◽

Activation Function ◽

Input Image ◽

Image Captioning ◽

Word Generation ◽

Adopted Model ◽

Conventional Models ◽

Image Caption

“Image Captioning is the process of generating a textual description of an image”. It deploys both computer vision and natural language processing for caption generation. However, the majority of the image captioning systems offer unclear depictions regarding the objects like “man”, “woman”, “group of people”, “building”, etc. Hence, this paper intends to develop an intelligent-based image captioning model. The adopted model comprises of few steps like word generation, sentence formation, and caption generation. Initially, the input image is subjected to the Deep learning classifier called Convolutional Neural Network (CNN). Since the classifier is already trained in the relevant words that are related to all images, it can easily classify the associated words of the given image. Further, a set of sentences is formed with the generated words using Long-Short Term Memory (LSTM) model. The likelihood of the formed sentences is computed using the Maximum Likelihood (ML) function, and the sentences with higher probability are taken, which is further used for generating the visual representation of the scene in terms of image caption. As a major novelty, this paper aims to enhance the performance of CNN by optimally tuning its weight and activation function. This paper introduces a new enhanced optimization algorithm Rider with Randomized Bypass and Over-taker update (RR-BOU) for this optimal selection. In the proposed RR-BOU is the enhanced version of the Rider Optimization Algorithm (ROA). Finally, the performance of the proposed captioning model is compared over other conventional models with respect to statistical analysis.

Download Full-text

Segment convolutional neural networks (Seg-CNNs) for classifying relations in clinical notes

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocx090 ◽

2017 ◽

Vol 25 (1) ◽

pp. 93-98 ◽

Cited By ~ 31

Author(s):

Yuan Luo ◽

Yu Cheng ◽

Özlem Uzuner ◽

Peter Szolovits ◽

Justin Starren

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Graphics Processing Unit ◽

Medical Problem ◽

Feature Engineering ◽

Processing Unit ◽

Clinical Notes ◽

Overall Evaluation ◽

Relation Classification

Abstract We propose Segment Convolutional Neural Networks (Seg-CNNs) for classifying relations from clinical notes. Seg-CNNs use only word-embedding features without manual feature engineering. Unlike typical CNN models, relations between 2 concepts are identified by simultaneously learning separate representations for text segments in a sentence: preceding, concept1, middle, concept2, and succeeding. We evaluate Seg-CNN on the i2b2/VA relation classification challenge dataset. We show that Seg-CNN achieves a state-of-the-art micro-average F-measure of 0.742 for overall evaluation, 0.686 for classifying medical problem–treatment relations, 0.820 for medical problem–test relations, and 0.702 for medical problem–medical problem relations. We demonstrate the benefits of learning segment-level representations. We show that medical domain word embeddings help improve relation classification. Seg-CNNs can be trained quickly for the i2b2/VA dataset on a graphics processing unit (GPU) platform. These results support the use of CNNs computed over segments of text for classifying medical relations, as they show state-of-the-art performance while requiring no manual feature engineering.

Download Full-text

An Optimized Abstractive Text Summarization Model Using Peephole Convolutional LSTM

Symmetry ◽

10.3390/sym11101290 ◽

2019 ◽

Vol 11 (10) ◽

pp. 1290 ◽

Cited By ~ 2

Author(s):

Rahman ◽

Siddiqui

Keyword(s):

Language Processing ◽

Short Term Memory ◽

State Of The Art ◽

Text Summarization ◽

Short Term ◽

Term Memory ◽

Semantic Coherence ◽

Long Short Term Memory ◽

Central Composite ◽

Convolutional Lstm

Abstractive text summarization that generates a summary by paraphrasing a long text remains an open significant problem for natural language processing. In this paper, we present an abstractive text summarization model, multi-layered attentional peephole convolutional LSTM (long short-term memory) (MAPCoL) that automatically generates a summary from a long text. We optimize parameters of MAPCoL using central composite design (CCD) in combination with the response surface methodology (RSM), which gives the highest accuracy in terms of summary generation. We record the accuracy of our model (MAPCoL) on a CNN/DailyMail dataset. We perform a comparative analysis of the accuracy of MAPCoL with that of the state-of-the-art models in different experimental settings. The MAPCoL also outperforms the traditional LSTM-based models in respect of semantic coherence in the output summary.

Download Full-text

Accelerating Monte Carlo Simulation for Radiotherapy Dose Calculation using a Massively Parallel Graphics Processing Unit

International Journal of Radiation Oncology*Biology*Physics ◽

10.1016/j.ijrobp.2010.07.1863 ◽

2010 ◽

Vol 78 (3) ◽

pp. S804-S805

Author(s):

Y. Zhuge ◽

H. Xie ◽

R.W. Miller

Keyword(s):

Monte Carlo Simulation ◽

Monte Carlo ◽

Graphics Processing Unit ◽

Dose Calculation ◽

Massively Parallel ◽

Processing Unit ◽

Parallel Graphics ◽

Radiotherapy Dose ◽

Graphics Processing

Download Full-text

Accelerating Event Detection with DGCNN and FPGAs

Electronics ◽

10.3390/electronics9101666 ◽

2020 ◽

Vol 9 (10) ◽

pp. 1666

Author(s):

Zhe Han ◽

Jingfei Jiang ◽

Linbo Qiao ◽

Yong Dou ◽

Jinwei Xu ◽

...

Keyword(s):

Language Processing ◽

Event Detection ◽

Graphics Processing Unit ◽

Network Size ◽

Processing Unit ◽

Sigmoid Function ◽

Pipelined Architecture ◽

Proposed Model ◽

Field Programmable ◽

Graphics Processing

Recently, Deep Neural Networks (DNNs) have been widely used in natural language processing. However, DNNs are often computation-intensive and memory-expensive. Therefore, deploying DNNs in the real world is very difficult. In order to solve this problem, we proposed a network model based on the dilate gated convolutional neural network, which is very hardware-friendly. We further expanded the word representations and depth of the network to improve the performance of the model. We replaced the Sigmoid function to make it more friendly for hardware computation without loss, and we quantized the network weights and activations to compress the network size. We then proposed the first FPGA (Field Programmable Gate Array)-based event detection accelerator based on the proposed model. The accelerator significantly reduced the latency with the fully pipelined architecture. We implemented the accelerator on the Xilinx XCKU115 FPGA. The experimental results show that our model obtains the highest F1-score of 84.6% in the ACE 2005 corpus. Meanwhile, the accelerator achieved 95.2 giga operations (GOP)/s and 13.4 GOPS/W in performance and energy efficiency, which is 17/158 times higher than the Graphics Processing Unit (GPU).

Download Full-text

Image Caption Generation and Comprehensive Comparison of Image Encoders

10.54216/fpa.040202 ◽

2021 ◽

pp. 42-55

Author(s):

Shitiz Gupta ◽

◽

...

Keyword(s):

Language Processing ◽

State Of The Art ◽

Image Feature ◽

Image Captioning ◽

Interactive Machine Learning ◽

Learning Techniques ◽

Comprehensive Comparison ◽

Image Caption Generation ◽

Image Caption ◽

Made In

Image caption generation is a stimulating multimodal task. Substantial advancements have been made in thefield of deep learning notably in computer vision and natural language processing. Yet, human-generated captions are still considered better, which makes it a challenging application for interactive machine learning. In this paper, we aim to compare different transfer learning techniques and develop a novel architecture to improve image captioning accuracy. We compute image feature vectors using different state-of-the-art transferlearning models which are fed into an Encoder-Decoder network based on Stacked LSTMs with soft attention,along with embedded text to generate high accuracy captions. We have compared these models on severalbenchmark datasets based on different evaluation metrics like BLEU and METEOR.

Download Full-text

A Tweet Sentiment Classification Approach Using a Hybrid Stacked Ensemble Technique

Information ◽

10.3390/info12090374 ◽

2021 ◽

Vol 12 (9) ◽

pp. 374

Author(s):

Babacar Gaye ◽

Dezheng Zhang ◽

Aziguli Wulamu

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Deep Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Short Term Memory ◽

State Of The Art ◽

Accuracy Score ◽

Learning Models ◽

Proposed Model

With the extensive availability of social media platforms, Twitter has become a significant tool for the acquisition of peoples’ views, opinions, attitudes, and emotions towards certain entities. Within this frame of reference, sentiment analysis of tweets has become one of the most fascinating research areas in the field of natural language processing. A variety of techniques have been devised for sentiment analysis, but there is still room for improvement where the accuracy and efficacy of the system are concerned. This study proposes a novel approach that exploits the advantages of the lexical dictionary, machine learning, and deep learning classifiers. We classified the tweets based on the sentiments extracted by TextBlob using a stacked ensemble of three long short-term memory (LSTM) as base classifiers and logistic regression (LR) as a meta classifier. The proposed model proved to be effective and time-saving since it does not require feature extraction, as LSTM extracts features without any human intervention. We also compared our proposed approach with conventional machine learning models such as logistic regression, AdaBoost, and random forest. We also included state-of-the-art deep learning models in comparison with the proposed model. Experiments were conducted on the sentiment140 dataset and were evaluated in terms of accuracy, precision, recall, and F1 Score. Empirical results showed that our proposed approach manifested state-of-the-art results by achieving an accuracy score of 99%.

Download Full-text

Transformer-Based Decoder Designs for Semantic Segmentation on Remotely Sensed Images

Remote Sensing ◽

10.3390/rs13245100 ◽

2021 ◽

Vol 13 (24) ◽

pp. 5100

Author(s):

Teerapong Panboonyuen ◽

Kulsawasd Jitkajornwanich ◽

Siam Lawawirojwong ◽

Panu Srestasathiern ◽

Peerapon Vateekul

Keyword(s):

Image Processing ◽

Deep Learning ◽

Natural Language Processing ◽

Language Processing ◽

State Of The Art ◽

Semantic Segmentation ◽

Landsat 8 ◽

Convolutional Network ◽

Image Labeling ◽

Feature Pyramid

Transformers have demonstrated remarkable accomplishments in several natural language processing (NLP) tasks as well as image processing tasks. Herein, we present a deep-learning (DL) model that is capable of improving the semantic segmentation network in two ways. First, utilizing the pre-training Swin Transformer (SwinTF) under Vision Transformer (ViT) as a backbone, the model weights downstream tasks by joining task layers upon the pretrained encoder. Secondly, decoder designs are applied to our DL network with three decoder designs, U-Net, pyramid scene parsing (PSP) network, and feature pyramid network (FPN), to perform pixel-level segmentation. The results are compared with other image labeling state of the art (SOTA) methods, such as global convolutional network (GCN) and ViT. Extensive experiments show that our Swin Transformer (SwinTF) with decoder designs reached a new state of the art on the Thailand Isan Landsat-8 corpus (89.8% F1 score), Thailand North Landsat-8 corpus (63.12% F1 score), and competitive results on ISPRS Vaihingen. Moreover, both our best-proposed methods (SwinTF-PSP and SwinTF-FPN) even outperformed SwinTF with supervised pre-training ViT on the ImageNet-1K in the Thailand, Landsat-8, and ISPRS Vaihingen corpora.

Download Full-text

State of the Art and Development of Wearable Computer Graphics Processing Unit

Open Journal of Circuits and Systems ◽

10.12677/ojcs.2015.43007 ◽

2015 ◽

Vol 04 (03) ◽

pp. 47-54

Author(s):

继业焦

Keyword(s):

Computer Graphics ◽

State Of The Art ◽

Graphics Processing Unit ◽

Processing Unit ◽

Wearable Computer ◽

Graphics Processing

Download Full-text