Image Captioning Using Motion-CNN with Object Detection

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1270
Author(s):  
Kiyohiko Iwamura ◽  
Jun Younes Louhi Kasahara ◽  
Alessandro Moro ◽  
Atsushi Yamashita ◽  
Hajime Asama

Automatic image captioning has many important applications, such as describing visual content for visually impaired people or indexing images on the internet. Recently, deep learning-based image captioning models have been researched extensively. To generate captions, they learn the relation between image features and the words included in the captions. However, image features might not be relevant for certain words, such as verbs. Our earlier reported method therefore used motion features along with image features to generate captions that include verbs. However, that method used all of the motion features; because not every motion feature contributes positively to captioning, the unnecessary ones decreased captioning accuracy. Here, we conduct experiments with motion features to thoroughly analyze the reasons for this decline in accuracy, and we propose a novel, end-to-end trainable method for image caption generation that alleviates it. Our proposed model was evaluated using three datasets: MSR-VTT2016-Image, MSCOCO, and several copyright-free images. Results demonstrate that our proposed method improves caption generation performance.
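
The abstract does not detail how the unhelpful motion features are suppressed; the sketch below is one minimal, hypothetical way to fuse image and motion features with a learned gate in PyTorch, offered only as an illustration of the general idea (all module names and dimensions are assumptions, not the authors' implementation).

```python
# A minimal sketch (not the authors' implementation) of fusing image and
# motion features for captioning, with a learned gate that can suppress
# motion features that do not help. Dimensions are illustrative.
import torch
import torch.nn as nn

class GatedFusionCaptioner(nn.Module):
    def __init__(self, img_dim=2048, motion_dim=1024, hidden=512, vocab=10000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.motion_proj = nn.Linear(motion_dim, hidden)
        # The gate decides, per dimension, how much motion information to pass.
        self.gate = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Sigmoid())
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, img_feat, motion_feat, captions):
        v = torch.relu(self.img_proj(img_feat))        # (B, hidden)
        m = torch.relu(self.motion_proj(motion_feat))  # (B, hidden)
        g = self.gate(torch.cat([v, m], dim=-1))       # gate values in [0, 1]
        fused = v + g * m                              # gated motion contribution
        # Condition the decoder LSTM on the fused features via its initial state.
        h0 = fused.unsqueeze(0)                        # (1, B, hidden)
        c0 = torch.zeros_like(h0)
        x = self.embed(captions)                       # (B, T, hidden)
        y, _ = self.lstm(x, (h0, c0))
        return self.out(y)                             # (B, T, vocab) logits
```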

Author(s):  
Sruthi M. ◽  
Rajasekaran R.

The Internet of Things (IoT) is a paradigm in which everyday objects are connected to the internet and communicate with each other. The IoT has many applications in fields such as healthcare, agriculture, industry, and logistics, and even in empowering people with disabilities. Much previous work has used the IoT to help blind people detect obstacles, and many navigation applications have been developed. In this chapter, a system is proposed to assist blind people in reading books. The method is based on capturing textbook pages as images, converting them to text, and producing speech as output.
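
The chapter describes a capture-to-OCR-to-speech pipeline without naming specific tools; a minimal Python sketch of such a pipeline, assuming the off-the-shelf pytesseract (OCR) and pyttsx3 (text-to-speech) libraries, might look like this:

```python
# A minimal sketch of the capture-to-speech pipeline described above.
# The library choices (pytesseract, pyttsx3) are illustrative assumptions;
# the chapter does not name specific tools.
from PIL import Image
import pytesseract
import pyttsx3

def read_page_aloud(image_path: str) -> None:
    # Extract text from the captured page image via OCR.
    text = pytesseract.image_to_string(Image.open(image_path))
    # Speak the recognized text through the default TTS engine.
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

read_page_aloud("page.jpg")  # hypothetical captured page image
```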


Author(s):  
Gennadi B. Pronchev ◽  
Inna V. Goncharova ◽  
Nadezhda G. Proncheva ◽  
Danila N. Monakhov ◽  
Irina V. Vasenina ◽  
...  

The chapter deals with issues related to the social adaptation of visually impaired people in techno-social systems of the internet. The current legislation providing visually impaired people access to such techno-social systems is analyzed, along with the way the legislation is implemented. Traditions and innovations in the accessibility of techno-social systems for visually impaired people in Russia are discussed. The opportunities that the electronic banking systems of the European Union and Great Britain offer the visually impaired are also analyzed.


Author(s):  
Chengxi Li ◽  
Brent Harrison

In this paper, we build a multi-style generative model for stylish image captioning that uses multi-modality image features, ResNeXt features, and text features generated by DenseCap. We propose the 3M model, a Multi-UPDOWN caption model that encodes multi-modality features and decodes them into captions. We demonstrate the effectiveness of our model at generating human-like captions by examining its performance on two datasets: the PERSONALITY-CAPTIONS dataset and the FlickrStyle10K dataset. We compare against a variety of state-of-the-art baselines on automatic NLP metrics such as BLEU, ROUGE-L, CIDEr, and SPICE (code will be available at https://github.com/cici-ai-club/3M). A qualitative study has also been conducted to verify that our 3M model can generate captions in different styles.
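
The abstract names the two input modalities (ResNeXt image features and DenseCap text features) but not the encoder internals; the following is a speculative PyTorch sketch of one way to encode both modalities into a shared sequence that a caption decoder could attend to (all dimensions and layer choices are assumptions, not the released 3M code):

```python
# A speculative sketch (assumptions, not the released 3M code) of encoding
# the two modalities the abstract names: a ResNeXt image feature vector and
# DenseCap text phrases, fused into one sequence for a caption decoder.
import torch
import torch.nn as nn

class MultiModalEncoder(nn.Module):
    def __init__(self, visual_dim=2048, vocab=10000, hidden=512):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, hidden)
        self.phrase_embed = nn.Embedding(vocab, hidden)
        self.phrase_rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, resnext_feat, densecap_tokens):
        # resnext_feat: (B, visual_dim); densecap_tokens: (B, T) word ids
        v = torch.relu(self.visual_proj(resnext_feat)).unsqueeze(1)  # (B, 1, hidden)
        e = self.phrase_embed(densecap_tokens)                       # (B, T, hidden)
        t, _ = self.phrase_rnn(e)                                    # (B, T, hidden)
        # Concatenate along the sequence axis so a decoder can attend
        # over both the visual slot and the textual feature slots.
        return torch.cat([v, t], dim=1)                              # (B, 1+T, hidden)
```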


Author(s):  
Najd Al-Mouh ◽  
Hend S. Al-Khalifa

Purpose – This paper aims to investigate the accessibility and usage of mobile smartphones by Arabic-speaking visually impaired people in Saudi Arabia. Design/methodology/approach – In total, 104 participants with visual impairments were interviewed about their use of mobile phones with the following questions: What is the most commonly used mobile phone? For which domains do they most often use mobile phones? What are their favorite applications? What accessibility challenges do they usually face while using mobile phones? How often do they use the Internet via mobile phones, and what are the reasons behind that? Findings – This research is the first study of this magnitude to investigate smartphone usage by Arabic-speaking visually impaired people. The survey revealed that Arabic-speaking visually impaired people use mobile phones in a variety of ways and with different strategies. Getting assistance in performing daily tasks and navigating independently are two of the most common uses of mobile phones. Originality/value – Based on the findings, the authors will propose guidelines to help developers improve smartphone accessibility, application design, and Internet usage for visually impaired people.


2021 ◽  
Vol 2107 (1) ◽  
pp. 012030
Author(s):  
F S Kamaruddin ◽  
N H Mahmood ◽  
M A Abdul Razak ◽  
N A Zakaria

Abstract Visually impaired people usually face many difficulties in interacting with their environment. Physical movement is a major challenge for them, because it can be tricky to determine where they are and how to move from one place to another. In this project, smart assistive shoes with an Internet of Things (IoT) implementation are designed. The shoes are equipped with ultrasonic sensors and vibration motors that can warn users about obstacles. The IoT system is implemented using Adafruit IO and If This, Then That (IFTTT) to transfer data between Google Assistant and a buzzer so that the shoes can be located. A NodeMCU allows the buzzer on the shoes to be controlled over the Internet through its WiFi module, which connects to a mobile phone hotspot. The result is a pair of shoes with an obstacle detection system that detects obstacles within 20 cm and a position finder operated through Google Assistant. These shoes will hopefully become one alternative for aiding the independent movement of visually impaired people in the future.
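
As a rough illustration of the obstacle-detection loop described above, here is a minimal MicroPython sketch for a NodeMCU driving an HC-SR04-style ultrasonic sensor and a vibration motor; the pin wiring and threshold handling are assumptions, not the authors' firmware:

```python
# A minimal MicroPython sketch (illustrative assumptions, not the authors'
# firmware) for the obstacle-detection part on a NodeMCU: an HC-SR04-style
# ultrasonic sensor triggers a vibration motor when an obstacle is closer
# than 20 cm. Pin numbers are hypothetical.
from machine import Pin, time_pulse_us
import time

trig = Pin(12, Pin.OUT)   # ultrasonic trigger (assumed wiring)
echo = Pin(14, Pin.IN)    # ultrasonic echo
motor = Pin(5, Pin.OUT)   # vibration motor driver

def distance_cm():
    # A 10 us trigger pulse starts one measurement.
    trig.off()
    time.sleep_us(2)
    trig.on()
    time.sleep_us(10)
    trig.off()
    # Echo pulse width in microseconds; roughly 58 us per centimeter.
    duration = time_pulse_us(echo, 1, 30000)
    return duration / 58 if duration > 0 else None

while True:
    d = distance_cm()
    # Vibrate while an obstacle is within the 20 cm threshold.
    motor.value(1 if d is not None and d < 20 else 0)
    time.sleep_ms(100)
```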


Author(s):  
Chang Liu ◽  
Fuchun Sun ◽  
Changhu Wang ◽  
Feng Wang ◽  
Alan Yuille

In this work, we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural network (RNN) model for image caption generation. Unlike most existing work, in which the whole image is represented by a convolutional neural network (CNN) feature, we propose to represent the input image as a sequence of detected objects, which serves as the source sequence of the RNN model. In this way, the sequential representation of an image can be naturally translated into a sequence of words as the target sequence of the RNN model. To represent the image sequentially, we extract the object features in the image and arrange them in an order using convolutional neural networks. To further leverage the visual information from the encoded objects, a sequential attention layer is introduced to selectively attend to the objects relevant to the word being generated. Extensive experiments validate the proposed approach on the popular MS COCO benchmark dataset, where the proposed model surpasses state-of-the-art methods on all metrics under the dataset splits of previous work. The proposed approach was also evaluated by the evaluation server of the MS COCO captioning challenge and achieves very competitive results, e.g., a CIDEr of 1.029 (c5) and 1.064 (c40).
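
The sequential attention layer is not specified in the abstract; the sketch below shows a generic additive attention over detected-object features at a single decoding step, offered as an illustration of the idea rather than the paper's exact layer:

```python
# A generic additive-attention sketch over a sequence of detected-object
# features at one decoding step (an illustration of the idea, not the
# paper's exact layer). Dimensions are assumptions.
import torch
import torch.nn as nn

class ObjectAttention(nn.Module):
    def __init__(self, obj_dim=2048, hidden=512):
        super().__init__()
        self.w_obj = nn.Linear(obj_dim, hidden)
        self.w_hid = nn.Linear(hidden, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, objects, h):
        # objects: (B, N, obj_dim) detected-object features
        # h: (B, hidden) current decoder hidden state
        e = self.score(torch.tanh(self.w_obj(objects) + self.w_hid(h).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)          # (B, N, 1) attention weights
        context = (alpha * objects).sum(dim=1)   # (B, obj_dim) attended context
        return context, alpha.squeeze(-1)
```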


RENOTE ◽  
2018 ◽  
Vol 15 (2) ◽  
Author(s):  
Rodrigo Prestes Machado ◽  
Débora Conforto ◽  
Eliseo Reategui ◽  
Lucila Santarosa

This paper discusses the implementation of accessibility features for visually impaired people in synchronous, cooperative systems on the Internet. Based on the sociocultural scenario of affirming difference and on the contexts of cooperation established by Web 2.0, we discuss the application of Sound Workspace Awareness in groupware tools through auditory icons, earcons, synthesized speech, live regions, and screen readers in a Web chat system. The article presents qualitative, exploratory experimental research with internal data validation. Data collection and analysis were performed using protocols applied with five real users. The results of the experiments showed that these technologies were well suited to building Sound Workspace Awareness for screen reader users.

