CAPformer: Pedestrian Crossing Action Prediction Using Transformer

Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5694
Author(s):  
Javier Lorenzo ◽  
Ignacio Parra Alonso ◽  
Rubén Izquierdo ◽  
Augusto Luis Ballardini ◽  
Álvaro Hernández Saz ◽  
...  

Anticipating pedestrian crossing behavior in urban scenarios is a challenging task for autonomous vehicles. Earlier this year, a benchmark comprising the JAAD and PIE datasets was released, in which several state-of-the-art methods have been ranked. However, most of the ranked temporal models rely on recurrent architectures. We propose, to the best of our knowledge, the first self-attention alternative, based on the transformer architecture, which has had enormous success in natural language processing (NLP) and, more recently, in computer vision. Our architecture is composed of multiple branches that fuse video and kinematic data. The video branch is based on one of two possible architectures: RubiksNet and TimeSformer. The kinematic branch is based on different configurations of the transformer encoder. Several experiments have been performed, mainly focusing on pre-processing the input data and highlighting problems with two kinematic data sources: pose keypoints and ego-vehicle speed. Our proposed model's results are comparable to those of PCPA, the best-performing model in the benchmark, reaching an F1 score of nearly 0.78 against 0.77. Furthermore, using only bounding box coordinates and image data, our model surpasses PCPA by a larger margin (F1 = 0.75 vs. F1 = 0.72). Our model has proven to be a valid alternative to recurrent architectures, providing advantages such as parallelization and whole-sequence processing, learning relationships between samples that are not possible with recurrent architectures.
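The self-attention mechanism that distinguishes the transformer branches from recurrent models can be sketched as follows. This is a minimal NumPy illustration of scaled dot-product self-attention over a sequence of per-frame kinematic feature vectors, not the authors' implementation; the dimensions and random projection matrices are arbitrary stand-ins.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence.

    x          : (seq_len, d_model) input sequence (e.g. per-frame kinematic features)
    wq, wk, wv : (d_model, d_k) learned projection matrices
    Returns (seq_len, d_k): each output position attends to the whole
    sequence at once, unlike a recurrent model that consumes it step by step.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])           # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 16, 8, 4                      # 16 frames of kinematic data
x = rng.standard_normal((seq_len, d_model))
wq, wk, wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (16, 4)
```

Because every output row is a weighted sum over the full sequence, all positions can be computed in parallel, which is the parallelization advantage the abstract mentions.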

Author(s):  
Noha Ali ◽  
Ahmed H. AbuEl-Atta ◽  
Hala H. Zayed

Deep learning (DL) algorithms have achieved state-of-the-art performance in computer vision, speech recognition, and natural language processing (NLP). In this paper, we enhance the convolutional neural network (CNN) algorithm to classify cancer articles according to cancer hallmarks. The model implements a recent word embedding technique in the embedding layer, which uses the concepts of distributed phrase representation and multi-word phrase embedding. The proposed model enhances the performance of the existing model used for biomedical text classification, outperforming the previous model with an F-score of 83.87% using an unsupervised embedding trained on PubMed abstracts called PMC vectors (PMCVec). We also ran another experiment on the same dataset using the recurrent neural network (RNN) algorithm with two different word embeddings, Google News and PMCVec, achieving F-scores of 74.9% and 76.26%, respectively.
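The multi-word phrase embedding idea, looking up known multi-word biomedical phrases as single units before falling back to single-word vectors, can be sketched as below. The vocabulary, vectors, and greedy longest-match strategy are toy assumptions for illustration; PMCVec itself is a pretrained resource, not reproduced here.

```python
def embed_tokens(tokens, phrase_vocab, word_vecs, max_len=3):
    """Greedy longest-match lookup: prefer multi-word phrases over single words.

    tokens       : list of words from a sentence
    phrase_vocab : dict "multi word phrase" -> vector
    word_vecs    : dict word -> vector fallback
    """
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if n > 1 and phrase in phrase_vocab:
                out.append(phrase_vocab[phrase])   # whole phrase as one unit
                i += n
                break
            if n == 1:
                out.append(word_vecs.get(phrase, [0.0]))  # single-word fallback
                i += 1
    return out

# Toy vocabulary: "cell proliferation" is one cancer-hallmark phrase vector
phrases = {"cell proliferation": [1.0, 0.0]}
words = {"sustained": [0.2], "cell": [0.5], "proliferation": [0.7]}
vecs = embed_tokens(["sustained", "cell", "proliferation"], phrases, words)
print(len(vecs))  # "cell proliferation" matched as a single phrase
```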


Information ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 374
Author(s):  
Babacar Gaye ◽  
Dezheng Zhang ◽  
Aziguli Wulamu

With the extensive availability of social media platforms, Twitter has become a significant tool for acquiring people's views, opinions, attitudes, and emotions towards certain entities. Within this frame of reference, sentiment analysis of tweets has become one of the most fascinating research areas in the field of natural language processing. A variety of techniques have been devised for sentiment analysis, but there is still room for improvement where the accuracy and efficacy of the system are concerned. This study proposes a novel approach that exploits the advantages of a lexical dictionary, machine learning, and deep learning classifiers. We classified the tweets based on the sentiments extracted by TextBlob, using a stacked ensemble of three long short-term memory (LSTM) networks as base classifiers and logistic regression (LR) as a meta-classifier. The proposed model proved to be effective and time-saving, since it does not require manual feature extraction: the LSTMs extract features without any human intervention. We compared our proposed approach with conventional machine learning models such as logistic regression, AdaBoost, and random forest, and also included state-of-the-art deep learning models in the comparison. Experiments were conducted on the Sentiment140 dataset and evaluated in terms of accuracy, precision, recall, and F1 score. Empirical results showed that our proposed approach achieves state-of-the-art results with an accuracy score of 99%.
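The stacking scheme, where a logistic-regression meta-classifier is trained on the base classifiers' outputs, can be illustrated with a minimal NumPy sketch. Random synthetic probabilities stand in for the three base LSTMs here; this is not the authors' implementation, only the general stacked-ensemble idea.

```python
import numpy as np

def fit_logistic_meta(base_probs, labels, lr=0.5, steps=500):
    """Fit a logistic-regression meta-classifier on stacked base outputs.

    base_probs : (n_samples, n_base) positive-class probabilities from the
                 base classifiers (the three LSTMs in the paper)
    labels     : (n_samples,) binary sentiment labels
    """
    n, k = base_probs.shape
    w, b = np.zeros(k), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(base_probs @ w + b)))  # sigmoid
        grad = p - labels                                 # dLoss/dLogit
        w -= lr * base_probs.T @ grad / n
        b -= lr * grad.mean()
    return w, b

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
# simulated base-model probabilities, correlated with the true label
base = np.clip(y[:, None] * 0.6 + rng.random((200, 3)) * 0.4, 0.0, 1.0)
w, b = fit_logistic_meta(base, y)
pred = (1.0 / (1.0 + np.exp(-(base @ w + b))) > 0.5).astype(int)
print("meta accuracy:", (pred == y).mean())
```

The meta-classifier learns how much to trust each base model, which is why stacking can beat any single base learner.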


Author(s):  
Yu Yao ◽  
Ella Atkins ◽  
Matthew Johnson-Roberson ◽  
Ram Vasudevan ◽  
Xiaoxiao Du

Accurate prediction of pedestrian crossing behaviors by autonomous vehicles can significantly improve traffic safety. Existing approaches often model pedestrian behaviors using trajectories or poses but do not offer a deeper semantic interpretation of a person's actions or of how actions influence a pedestrian's intention to cross in the future. In this work, we follow the neuroscience and psychology literature to define pedestrian crossing behavior as a combination of an unobserved inner will (a probabilistic representation of the binary intent of crossing vs. not crossing) and a set of multi-class actions (e.g., walking, standing, etc.). Intent generates actions, and the future actions in turn reflect the intent. We present a novel multi-task network that predicts future pedestrian actions and uses the predicted future action as a prior to detect the present intent and action of the pedestrian. We also design an attention relation network to incorporate external environmental contexts and thus further improve intent and action detection performance. We evaluated our approach on two naturalistic driving datasets, PIE and JAAD; extensive experiments show significantly improved and more explainable results for both intent detection and action prediction over state-of-the-art approaches. Our code is available at: https://github.com/umautobots/pedestrian_intent_action_detection
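The intent-generates-actions relationship can be illustrated with a small Bayesian update: observing crossing-related actions raises the posterior probability of crossing intent. The action likelihoods below are made up for illustration; this is a toy probabilistic view, not the paper's multi-task network.

```python
def update_intent(prior_cross, action, likelihoods):
    """Posterior P(intent = cross | action) via Bayes' rule.

    prior_cross : prior probability that the pedestrian intends to cross
    action      : observed multi-class action label
    likelihoods : dict action -> (P(action | cross), P(action | not cross))
    """
    p_a_cross, p_a_not = likelihoods[action]
    num = p_a_cross * prior_cross
    return num / (num + p_a_not * (1.0 - prior_cross))

# Hypothetical likelihoods: looking at traffic is more typical of crossing intent
likelihoods = {
    "walking": (0.5, 0.4),
    "standing": (0.2, 0.5),
    "looking": (0.3, 0.1),
}
p = 0.5                                   # uninformative prior
for act in ["looking", "walking"]:        # sequence of observed actions
    p = update_intent(p, act, likelihoods)
print(round(p, 3))                        # posterior crossing probability rises
```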


2020 ◽  
Vol 34 (02) ◽  
pp. 1741-1748 ◽  
Author(s):  
Meng-Hsuan Yu ◽  
Juntao Li ◽  
Danyang Liu ◽  
Dongyan Zhao ◽  
Rui Yan ◽  
...  

Automatic storytelling has consistently been a challenging area in the field of natural language processing. Although considerable achievements have been made, the gap between automatically generated stories and human-written stories is still significant. Moreover, the limitations of existing automatic storytelling methods are obvious, e.g., in the consistency of content and in wording diversity. In this paper, we propose a multi-pass hierarchical conditional variational autoencoder model to overcome the challenges and limitations of existing automatic storytelling models. While the conditional variational autoencoder (CVAE) is employed to generate diversified content, the hierarchical structure and multi-pass editing scheme allow the model to generate more consistent content. We conduct extensive experiments on the ROCStories dataset. The results verify the validity and effectiveness of our proposed model, which yields substantial improvement over existing state-of-the-art approaches.
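At the core of any CVAE is sampling a latent code conditioned on context via the reparameterization trick, which is what gives the diversified content. The NumPy sketch below illustrates only that generic mechanism with toy dimensions and a toy encoder; it is not the paper's multi-pass hierarchical model.

```python
import numpy as np

rng = np.random.default_rng(2)

def encode(x, c):
    """Toy encoder q(z | x, c): map content x and condition c to Gaussian params."""
    h = np.tanh(np.concatenate([x, c]))
    d = h.shape[0] // 2
    return h[:d], h[d:]                  # mean, log-variance

def sample_latent(mu, logvar):
    """Reparameterization: z = mu + sigma * eps keeps sampling differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

x = rng.standard_normal(6)               # encoded story content
c = rng.standard_normal(6)               # condition, e.g. title or a previous pass
mu, logvar = encode(x, c)
z = sample_latent(mu, logvar)            # different eps -> diversified stories
print(z.shape)
```

Decoding different samples of `z` under the same condition is what lets a CVAE produce varied wording for the same prompt.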


2021 ◽  
Vol 15 (02) ◽  
pp. 143-160
Author(s):  
Ayşegül Özkaya Eren ◽  
Mustafa Sert

Generating audio captions is a new research area that combines audio and natural language processing to create meaningful textual descriptions for audio clips. To address this problem, previous studies mostly use encoder–decoder-based models without considering semantic information. To fill this gap, we present a novel encoder–decoder architecture using bi-directional Gated Recurrent Units (BiGRU) with audio and semantic embeddings. We extract semantic embeddings by obtaining subjects and verbs from the audio clip captions and combine these embeddings with audio embeddings to feed the BiGRU-based encoder–decoder model. To enable semantic embeddings for the test audios, we introduce a multilayer perceptron classifier to predict the semantic embeddings of those clips. We also present exhaustive experiments showing the efficiency of different features and datasets for our proposed model on the audio captioning task. To extract audio features, we use log Mel energy features, VGGish embeddings, and pretrained audio neural network (PANN) embeddings. Extensive experiments on two audio captioning datasets, Clotho and AudioCaps, show that our proposed model outperforms state-of-the-art audio captioning models across different evaluation metrics and that using the semantic information improves the captioning performance.


2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
Wei Lu ◽  
Yan Tong ◽  
Yue Yu ◽  
Yiqiao Xing ◽  
Changzheng Chen ◽  
...  

With the emergence of unmanned planes, autonomous vehicles, face recognition, and language processing, artificial intelligence (AI) has remarkably revolutionized our lifestyle. Recent studies indicate that AI has astounding potential to perform much better than human beings in some tasks, especially in the image recognition field. As the amount of image data in ophthalmology imaging centers is increasing dramatically, there is an urgent need to analyze and process these data. AI has been applied to decipher medical data and has made extraordinary progress in intelligent diagnosis. In this paper, we present the basic workflow for building an AI model and systematically review applications of AI in the diagnosis of eye diseases. Future work should focus on setting up systematic AI platforms to diagnose general eye diseases based on multimodal data in the real world.


2020 ◽  
Vol 34 (28) ◽  
pp. 2050315
Author(s):  
Himanshu Sharma ◽  
Anand Singh Jalal

Image captioning is a multidisciplinary artificial intelligence (AI) research task that has captured the interest of both image and natural language processing experts. It is a complex problem, as it sometimes requires access to information that may not be directly visible in a given scene, possibly requiring common-sense interpretation or detailed knowledge about the objects present in the image. In this paper, we present a method that utilizes both visual knowledge and external knowledge from knowledge bases such as ConceptNet to better describe images. We demonstrate the usefulness of the method on two publicly available datasets, Flickr8k and Flickr30k. The results show that the proposed model outperforms state-of-the-art approaches for generating image captions. Finally, we discuss possible future prospects in image captioning.


2020 ◽  
Vol 1 (1) ◽  
Author(s):  
Yuan-Cheng Liu ◽  
Kuei-Yuan Chan

Interaction with human drivers is one of the major challenges for autonomous vehicles. In this study, we consider urban crossroads without signals, where driver interactions are indispensable. Crossroads are parameterized to study how drivers pass the crossroad while maintaining a desired speed without collision. We define a probability of yielding for each car as a function of vehicle speed and the distance-to-intersection of both vehicles, while the interactions between vehicles are characterized by a point of action for incoming vehicles from different directions. Driver behaviors in terms of acceleration/deceleration given the current circumstances are also modeled probabilistically. The method is then analyzed and validated with data collected from human drivers in simulated environments. The results show prediction accuracy comparable to the state-of-the-art method, where characteristic parameters of drivers are also shown to be critical for behavior prediction. We also extend our model to two real-world urban crossroad applications: crash analysis and traffic characteristic parameter identification. In both cases, our prediction results are analogous to those acquired in virtual environments. For autonomous vehicles, our method can help build computer-driving logic that matches human behaviors, so that interactions between different drivers will be more intuitive.
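A yielding probability of the kind described, a function of vehicle speed and both vehicles' distance-to-intersection, can be sketched as a logistic function. The functional form and coefficients below are hypothetical, chosen only to illustrate the plausible qualitative behavior; the paper's actual parameterization is not reproduced here.

```python
import math

def yield_probability(speed, dist_self, dist_other, a=0.3, b=-0.5, c=0.4):
    """Hypothetical probability that a driver yields at an unsignalized crossroad.

    speed      : ego vehicle speed (m/s)
    dist_self  : ego distance-to-intersection (m)
    dist_other : other vehicle's distance-to-intersection (m)
    A slower ego vehicle that is farther from the intersection than the
    other car should be more likely to yield; coefficients are illustrative.
    """
    logit = a * (dist_self - dist_other) + b * speed + c
    return 1.0 / (1.0 + math.exp(-logit))

# A slow car far from the intersection is more likely to yield
# than a fast car that is already closer than the other vehicle.
p_far_slow = yield_probability(speed=5.0, dist_self=40.0, dist_other=20.0)
p_near_fast = yield_probability(speed=15.0, dist_self=10.0, dist_other=30.0)
print(p_far_slow > p_near_fast)
```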


Author(s):  
Dmitriy Nemchinov

The article presents an analysis of positive practices for ensuring pedestrian safety at crossings of city-street carriageways, as well as a description of several innovations in regulatory and technical documents. These include an increased number of cases in which a safety island can be arranged at a pedestrian crossing; requirements for providing visibility at a pedestrian crossing, with the minimum visibility distance determined from the time pedestrians require to cross the roadway; recommended options for at-grade unsignalized pedestrian crossings on trapezoidal speed humps according to GOST R 52605, including crossings staggered in the direction of the traffic flow and Z-shaped crossings (also staggered in the direction of the traffic flow); requirements for the size of the safety island, established to allow a bicycle to fit inside it; a recommended set of measures to reduce vehicle speed, with a description of the types of measures and the methods of their application; methods for arranging zones with reduced travel speed, such as residential and school zones; and requirements for turbo-roundabouts and methods of their design.


2021 ◽  
pp. 1-16
Author(s):  
Ibtissem Gasmi ◽  
Mohamed Walid Azizi ◽  
Hassina Seridi-Bouchelaghem ◽  
Nabiha Azizi ◽  
Samir Brahim Belhaouari

A Context-Aware Recommender System (CARS) suggests more relevant services by adapting them to the user's specific context situation. Nevertheless, using many contextual factors can increase data sparsity, while too few context parameters fail to introduce the contextual effects into the recommendations. Moreover, several CARSs are based on similarity algorithms, such as cosine similarity and the Pearson correlation coefficient, which are not very effective on sparse datasets. This paper presents a context-aware model that integrates contextual factors into the prediction process when there are insufficient co-rated items. The proposed algorithm uses Latent Dirichlet Allocation (LDA) to learn the latent interests of users from the textual descriptions of items. Then, it integrates both the explicit contextual factors and their degree of importance into the prediction process by introducing a weighting function, where the PSO algorithm is employed to learn and optimize the weights of these features. The results on the MovieLens 1M dataset show that the proposed model can achieve an F-measure of 45.51% with a precision of 68.64%. Furthermore, the improvement in MAE and RMSE can reach 41.63% and 39.69%, respectively, compared with state-of-the-art techniques.
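The weighting idea, each contextual factor contributing to the prediction in proportion to a learned importance, can be sketched as below. The factor names, effects, and weights are hypothetical stand-ins; in the paper the weights are optimized by particle swarm optimization rather than set by hand.

```python
def weighted_prediction(base_rating, context_effects, weights):
    """Adjust a base rating prediction by weighted contextual effects.

    base_rating     : prediction ignoring context (e.g. from LDA-based interests)
    context_effects : dict factor -> estimated rating deviation in that context
    weights         : dict factor -> learned importance of that factor
                      (hand-set here; learned via PSO in the paper)
    """
    adjustment = sum(weights[f] * context_effects[f] for f in context_effects)
    return base_rating + adjustment

# Hypothetical contextual factors for a movie recommendation
effects = {"time_of_day": 0.3, "companion": -0.2, "mood": 0.5}
weights = {"time_of_day": 0.2, "companion": 0.5, "mood": 0.8}
rating = weighted_prediction(3.6, effects, weights)
print(rating)  # base rating shifted by the weighted contextual effects
```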

