Image Captioning Using Motion-CNN with Object Detection

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1270
Author(s):  
Kiyohiko Iwamura ◽  
Jun Younes Louhi Kasahara ◽  
Alessandro Moro ◽  
Atsushi Yamashita ◽  
Hajime Asama

Automatic image captioning has many important applications, such as describing visual content for visually impaired people or indexing images on the internet. Recently, deep learning-based image captioning models have been researched extensively. To generate captions, they learn the relation between image features and the words included in the captions. However, image features might not be relevant for certain words, such as verbs. Our earlier reported method therefore used motion features along with image features to generate captions that include verbs. However, that method used all of the motion features; because not every motion feature contributes positively to captioning, the unnecessary ones decreased captioning accuracy. Here, we conduct experiments with motion features to thoroughly analyze the reasons for this decline in accuracy, and we propose a novel, end-to-end trainable method for image caption generation that alleviates it. Our proposed model was evaluated using three datasets: MSR-VTT2016-Image, MSCOCO, and several copyright-free images. Results demonstrate that our proposed method improves caption generation performance.
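
The abstract does not detail how the unhelpful motion features are suppressed; the sketch below is one minimal, hypothetical way to fuse image and motion features with a learned gate in PyTorch, offered only as an illustration of the general idea (all module names and dimensions are assumptions, not the authors' implementation).

```python
# A minimal sketch (not the authors' implementation) of fusing image and
# motion features for captioning, with a learned gate that can suppress
# motion features that do not help. Dimensions are illustrative.
import torch
import torch.nn as nn

class GatedFusionCaptioner(nn.Module):
    def __init__(self, img_dim=2048, motion_dim=1024, hidden=512, vocab=10000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.motion_proj = nn.Linear(motion_dim, hidden)
        # The gate decides, per dimension, how much motion information to pass.
        self.gate = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Sigmoid())
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, img_feat, motion_feat, captions):
        v = torch.relu(self.img_proj(img_feat))        # (B, hidden)
        m = torch.relu(self.motion_proj(motion_feat))  # (B, hidden)
        g = self.gate(torch.cat([v, m], dim=-1))       # gate values in [0, 1]
        fused = v + g * m                              # gated motion contribution
        # Condition the decoder LSTM on the fused features via its initial state.
        h0 = fused.unsqueeze(0)                        # (1, B, hidden)
        c0 = torch.zeros_like(h0)
        x = self.embed(captions)                       # (B, T, hidden)
        y, _ = self.lstm(x, (h0, c0))
        return self.out(y)                             # (B, T, vocab) logits
```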

Author(s):  
Sruthi M. ◽  
Rajasekaran R.

The Internet of Things (IoT) is a paradigm in which everyday objects are connected to the internet and communicate with each other. The IoT has many applications in fields such as healthcare, agriculture, industry, and logistics, and even in empowering people with disabilities. Much previous work has used the IoT to help blind people detect obstacles, and many navigation applications have been developed. In this chapter, a system is proposed to assist blind people in reading books. The method is based on capturing textbook pages as images, converting them to text, and producing speech as output.
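
The chapter describes a capture-to-OCR-to-speech pipeline without naming specific tools; a minimal Python sketch of such a pipeline, assuming the off-the-shelf pytesseract (OCR) and pyttsx3 (text-to-speech) libraries, might look like this:

```python
# A minimal sketch of the capture-to-speech pipeline described above.
# The library choices (pytesseract, pyttsx3) are illustrative assumptions;
# the chapter does not name specific tools.
from PIL import Image
import pytesseract
import pyttsx3

def read_page_aloud(image_path: str) -> None:
    # Extract text from the captured page image via OCR.
    text = pytesseract.image_to_string(Image.open(image_path))
    # Speak the recognized text through the default TTS engine.
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

read_page_aloud("page.jpg")  # hypothetical captured page image
```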


Author(s):  
Gennadi B. Pronchev ◽  
Inna V. Goncharova ◽  
Nadezhda G. Proncheva ◽  
Danila N. Monakhov ◽  
Irina V. Vasenina ◽  
...  

The chapter deals with issues related to the social adaptation of visually impaired people in techno-social systems of the internet. The current legislation providing visually impaired people access to such techno-social systems is analyzed, along with the way the legislation is implemented. Traditions and innovations in the accessibility of techno-social systems for visually impaired people in Russia are discussed. The opportunities that the electronic banking systems of the European Union and Great Britain offer the visually impaired are also analyzed.


Author(s):  
Chengxi Li ◽  
Brent Harrison

In this paper, we build a multi-style generative model for stylish image captioning that uses multi-modality image features, ResNeXt features, and text features generated by DenseCap. We propose the 3M model, a Multi-UPDOWN caption model that encodes multi-modality features and decodes them into captions. We demonstrate the effectiveness of our model at generating human-like captions by examining its performance on two datasets: the PERSONALITY-CAPTIONS dataset and the FlickrStyle10K dataset. We compare against a variety of state-of-the-art baselines on automatic NLP metrics such as BLEU, ROUGE-L, CIDEr, and SPICE (code will be available at https://github.com/cici-ai-club/3M). A qualitative study has also been conducted to verify that our 3M model can generate captions in different styles.
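
The abstract names the two input modalities (ResNeXt image features and DenseCap text features) but not the encoder internals; the following is a speculative PyTorch sketch of one way to encode both modalities into a shared sequence that a caption decoder could attend to (all dimensions and layer choices are assumptions, not the released 3M code):

```python
# A speculative sketch (assumptions, not the released 3M code) of encoding
# the two modalities the abstract names: a ResNeXt image feature vector and
# DenseCap text phrases, fused into one sequence for a caption decoder.
import torch
import torch.nn as nn

class MultiModalEncoder(nn.Module):
    def __init__(self, visual_dim=2048, vocab=10000, hidden=512):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, hidden)
        self.phrase_embed = nn.Embedding(vocab, hidden)
        self.phrase_rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, resnext_feat, densecap_tokens):
        # resnext_feat: (B, visual_dim); densecap_tokens: (B, T) word ids
        v = torch.relu(self.visual_proj(resnext_feat)).unsqueeze(1)  # (B, 1, hidden)
        e = self.phrase_embed(densecap_tokens)                       # (B, T, hidden)
        t, _ = self.phrase_rnn(e)                                    # (B, T, hidden)
        # Concatenate along the sequence axis so a decoder can attend
        # over both the visual slot and the textual feature slots.
        return torch.cat([v, t], dim=1)                              # (B, 1+T, hidden)
```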


Author(s):  
Najd Al-Mouh ◽  
Hend S. Al-Khalifa

Purpose – This paper aims to investigate the accessibility and usage of mobile smartphones by Arabic-speaking visually impaired people in Saudi Arabia. Design/methodology/approach – In total, 104 participants with visual impairments were interviewed about their use of mobile phones with the following questions: What is the most commonly used mobile phone? For which domains do they most often use mobile phones? What are their favorite applications? What accessibility challenges do they usually face while using mobile phones? How often do they use the Internet via mobile phones, and what are the reasons behind that? Findings – This research is the first study of this magnitude to investigate smartphone usage by Arabic-speaking visually impaired people. The survey revealed that Arabic-speaking visually impaired people use mobile phones in a variety of ways and with different strategies. Getting assistance in performing daily tasks and navigating independently are two of the most common uses of mobile phones. Originality/value – Based on the findings, the authors will propose guidelines to help developers improve smartphone accessibility, application design, and Internet usage for visually impaired people.


2021 ◽  
Vol 2107 (1) ◽  
pp. 012030
Author(s):  
F S Kamaruddin ◽  
N H Mahmood ◽  
M A Abdul Razak ◽  
N A Zakaria

Abstract Visually impaired people usually face many difficulties in interacting with their environment. Physical movement is a major challenge for them, because it can be tricky to determine where they are and how to move from one place to another. In this project, smart assistive shoes with an Internet of Things (IoT) implementation are designed. The shoes are equipped with ultrasonic sensors and vibration motors that can warn users about obstacles. The IoT system is implemented using Adafruit IO and If This, Then That (IFTTT) to transfer data between Google Assistant and a buzzer so that the shoes can be located. A NodeMCU allows the buzzer on the shoes to be controlled over the Internet through its WiFi module, which connects to a mobile phone hotspot. The result is a pair of shoes with an obstacle detection system that detects obstacles within 20 cm and a position finder operated through Google Assistant. These shoes will hopefully become one alternative for aiding the independent movement of visually impaired people in the future.
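
As a rough illustration of the obstacle-detection loop described above, here is a minimal MicroPython sketch for a NodeMCU driving an HC-SR04-style ultrasonic sensor and a vibration motor; the pin wiring and threshold handling are assumptions, not the authors' firmware:

```python
# A minimal MicroPython sketch (illustrative assumptions, not the authors'
# firmware) for the obstacle-detection part on a NodeMCU: an HC-SR04-style
# ultrasonic sensor triggers a vibration motor when an obstacle is closer
# than 20 cm. Pin numbers are hypothetical.
from machine import Pin, time_pulse_us
import time

trig = Pin(12, Pin.OUT)   # ultrasonic trigger (assumed wiring)
echo = Pin(14, Pin.IN)    # ultrasonic echo
motor = Pin(5, Pin.OUT)   # vibration motor driver

def distance_cm():
    # A 10 us trigger pulse starts one measurement.
    trig.off()
    time.sleep_us(2)
    trig.on()
    time.sleep_us(10)
    trig.off()
    # Echo pulse width in microseconds; roughly 58 us per centimeter.
    duration = time_pulse_us(echo, 1, 30000)
    return duration / 58 if duration > 0 else None

while True:
    d = distance_cm()
    # Vibrate while an obstacle is within the 20 cm threshold.
    motor.value(1 if d is not None and d < 20 else 0)
    time.sleep_ms(100)
```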


Author(s):  
Chang Liu ◽  
Fuchun Sun ◽  
Changhu Wang ◽  
Feng Wang ◽  
Alan Yuille

In this work, we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural network (RNN) model for image caption generation. Unlike most existing work, in which the whole image is represented by a convolutional neural network (CNN) feature, we propose to represent the input image as a sequence of detected objects, which serves as the source sequence of the RNN model. In this way, the sequential representation of an image can be naturally translated into a sequence of words as the target sequence of the RNN model. To represent the image sequentially, we extract the object features in the image and arrange them in an order using convolutional neural networks. To further leverage the visual information from the encoded objects, a sequential attention layer is introduced to selectively attend to the objects relevant to the word being generated. Extensive experiments validate the proposed approach on the popular MS COCO benchmark dataset, where the proposed model surpasses state-of-the-art methods on all metrics under the dataset splits of previous work. The proposed approach was also evaluated by the evaluation server of the MS COCO captioning challenge and achieves very competitive results, e.g., a CIDEr of 1.029 (c5) and 1.064 (c40).
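
The sequential attention layer is not specified in the abstract; the sketch below shows a generic additive attention over detected-object features at a single decoding step, offered as an illustration of the idea rather than the paper's exact layer:

```python
# A generic additive-attention sketch over a sequence of detected-object
# features at one decoding step (an illustration of the idea, not the
# paper's exact layer). Dimensions are assumptions.
import torch
import torch.nn as nn

class ObjectAttention(nn.Module):
    def __init__(self, obj_dim=2048, hidden=512):
        super().__init__()
        self.w_obj = nn.Linear(obj_dim, hidden)
        self.w_hid = nn.Linear(hidden, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, objects, h):
        # objects: (B, N, obj_dim) detected-object features
        # h: (B, hidden) current decoder hidden state
        e = self.score(torch.tanh(self.w_obj(objects) + self.w_hid(h).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)          # (B, N, 1) attention weights
        context = (alpha * objects).sum(dim=1)   # (B, obj_dim) attended context
        return context, alpha.squeeze(-1)
```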


RENOTE ◽  
2018 ◽  
Vol 15 (2) ◽  
Author(s):  
Rodrigo Prestes Machado ◽  
Débora Conforto ◽  
Eliseo Reategui ◽  
Lucila Santarosa

This paper discusses the implementation of accessibility features for visually impaired people in synchronous, cooperative systems on the Internet. Based on the sociocultural scenario of affirming difference and on the contexts of cooperation established by Web 2.0, we discuss the application of Sound Workspace Awareness in groupware tools through auditory icons, earcons, synthesized speech, live regions, and screen readers in a Web chat system. The article presents qualitative, exploratory experimental research with internal data validation. Data collection and analysis were performed using protocols applied with five real users. The results of the experiments showed that these technologies were well suited to building Sound Workspace Awareness for screen reader users.

