CAPformer: Pedestrian Crossing Action Prediction Using Transformer

Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5694
Author(s):  
Javier Lorenzo ◽  
Ignacio Parra Alonso ◽  
Rubén Izquierdo ◽  
Augusto Luis Ballardini ◽  
Álvaro Hernández Saz ◽  
...  

Anticipating pedestrian crossing behavior in urban scenarios is a challenging task for autonomous vehicles. Earlier this year, a benchmark comprising the JAAD and PIE datasets was released, in which several state-of-the-art methods have been ranked. However, most of the ranked temporal models rely on recurrent architectures. We propose, to the best of our knowledge, the first self-attention alternative, based on the transformer architecture, which has had enormous success in natural language processing (NLP) and, more recently, in computer vision. Our architecture is composed of multiple branches that fuse video and kinematic data. The video branch is based on one of two possible architectures: RubiksNet and TimeSformer. The kinematic branch is based on different configurations of the transformer encoder. Several experiments have been performed, mainly focusing on pre-processing the input data and highlighting problems with two kinematic data sources: pose keypoints and ego-vehicle speed. Our proposed model's results are comparable to those of PCPA, the best-performing model in the benchmark, reaching an F1 score of nearly 0.78 against 0.77. Furthermore, using only bounding box coordinates and image data, our model surpasses PCPA by a larger margin (F1 = 0.75 vs. F1 = 0.72). Our model has proven to be a valid alternative to recurrent architectures, providing advantages such as parallelization and whole-sequence processing, learning relationships between samples that are not possible with recurrent architectures.
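The self-attention mechanism that distinguishes the transformer branches from recurrent models can be sketched as follows. This is a minimal NumPy illustration of scaled dot-product self-attention over a sequence of per-frame kinematic feature vectors, not the authors' implementation; the dimensions and random projection matrices are arbitrary stand-ins.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence.

    x          : (seq_len, d_model) input sequence (e.g. per-frame kinematic features)
    wq, wk, wv : (d_model, d_k) learned projection matrices
    Returns (seq_len, d_k): each output position attends to the whole
    sequence at once, unlike a recurrent model that consumes it step by step.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])           # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 16, 8, 4                      # 16 frames of kinematic data
x = rng.standard_normal((seq_len, d_model))
wq, wk, wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (16, 4)
```

Because every output row is a weighted sum over the full sequence, all positions can be computed in parallel, which is the parallelization advantage the abstract mentions.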

Author(s):  
Noha Ali ◽  
Ahmed H. AbuEl-Atta ◽  
Hala H. Zayed

Deep learning (DL) algorithms have achieved state-of-the-art performance in computer vision, speech recognition, and natural language processing (NLP). In this paper, we enhance the convolutional neural network (CNN) algorithm to classify cancer articles according to cancer hallmarks. The model implements a recent word embedding technique in the embedding layer, which uses the concepts of distributed phrase representation and multi-word phrase embedding. The proposed model enhances the performance of the existing model used for biomedical text classification, outperforming the previous model with an F-score of 83.87% using an unsupervised embedding trained on PubMed abstracts called PMC vectors (PMCVec). We also ran another experiment on the same dataset using the recurrent neural network (RNN) algorithm with two different word embeddings, Google News and PMCVec, achieving F-scores of 74.9% and 76.26%, respectively.
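The multi-word phrase embedding idea, looking up known multi-word biomedical phrases as single units before falling back to single-word vectors, can be sketched as below. The vocabulary, vectors, and greedy longest-match strategy are toy assumptions for illustration; PMCVec itself is a pretrained resource, not reproduced here.

```python
def embed_tokens(tokens, phrase_vocab, word_vecs, max_len=3):
    """Greedy longest-match lookup: prefer multi-word phrases over single words.

    tokens       : list of words from a sentence
    phrase_vocab : dict "multi word phrase" -> vector
    word_vecs    : dict word -> vector fallback
    """
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if n > 1 and phrase in phrase_vocab:
                out.append(phrase_vocab[phrase])   # whole phrase as one unit
                i += n
                break
            if n == 1:
                out.append(word_vecs.get(phrase, [0.0]))  # single-word fallback
                i += 1
    return out

# Toy vocabulary: "cell proliferation" is one cancer-hallmark phrase vector
phrases = {"cell proliferation": [1.0, 0.0]}
words = {"sustained": [0.2], "cell": [0.5], "proliferation": [0.7]}
vecs = embed_tokens(["sustained", "cell", "proliferation"], phrases, words)
print(len(vecs))  # "cell proliferation" matched as a single phrase
```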


Information ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 374
Author(s):  
Babacar Gaye ◽  
Dezheng Zhang ◽  
Aziguli Wulamu

With the extensive availability of social media platforms, Twitter has become a significant tool for acquiring people's views, opinions, attitudes, and emotions towards certain entities. Within this frame of reference, sentiment analysis of tweets has become one of the most fascinating research areas in the field of natural language processing. A variety of techniques have been devised for sentiment analysis, but there is still room for improvement where the accuracy and efficacy of the system are concerned. This study proposes a novel approach that exploits the advantages of a lexical dictionary, machine learning, and deep learning classifiers. We classified the tweets based on the sentiments extracted by TextBlob, using a stacked ensemble of three long short-term memory (LSTM) networks as base classifiers and logistic regression (LR) as a meta-classifier. The proposed model proved to be effective and time-saving, since it does not require manual feature extraction: the LSTMs extract features without any human intervention. We compared our proposed approach with conventional machine learning models such as logistic regression, AdaBoost, and random forest, and also included state-of-the-art deep learning models in the comparison. Experiments were conducted on the Sentiment140 dataset and evaluated in terms of accuracy, precision, recall, and F1 score. Empirical results showed that our proposed approach achieves state-of-the-art results with an accuracy score of 99%.
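The stacking scheme, where a logistic-regression meta-classifier is trained on the base classifiers' outputs, can be illustrated with a minimal NumPy sketch. Random synthetic probabilities stand in for the three base LSTMs here; this is not the authors' implementation, only the general stacked-ensemble idea.

```python
import numpy as np

def fit_logistic_meta(base_probs, labels, lr=0.5, steps=500):
    """Fit a logistic-regression meta-classifier on stacked base outputs.

    base_probs : (n_samples, n_base) positive-class probabilities from the
                 base classifiers (the three LSTMs in the paper)
    labels     : (n_samples,) binary sentiment labels
    """
    n, k = base_probs.shape
    w, b = np.zeros(k), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(base_probs @ w + b)))  # sigmoid
        grad = p - labels                                 # dLoss/dLogit
        w -= lr * base_probs.T @ grad / n
        b -= lr * grad.mean()
    return w, b

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
# simulated base-model probabilities, correlated with the true label
base = np.clip(y[:, None] * 0.6 + rng.random((200, 3)) * 0.4, 0.0, 1.0)
w, b = fit_logistic_meta(base, y)
pred = (1.0 / (1.0 + np.exp(-(base @ w + b))) > 0.5).astype(int)
print("meta accuracy:", (pred == y).mean())
```

The meta-classifier learns how much to trust each base model, which is why stacking can beat any single base learner.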


Author(s):  
Yu Yao ◽  
Ella Atkins ◽  
Matthew Johnson-Roberson ◽  
Ram Vasudevan ◽  
Xiaoxiao Du

Accurate prediction of pedestrian crossing behaviors by autonomous vehicles can significantly improve traffic safety. Existing approaches often model pedestrian behaviors using trajectories or poses but do not offer a deeper semantic interpretation of a person's actions or of how actions influence a pedestrian's intention to cross in the future. In this work, we follow the neuroscience and psychology literature to define pedestrian crossing behavior as a combination of an unobserved inner will (a probabilistic representation of the binary intent of crossing vs. not crossing) and a set of multi-class actions (e.g., walking, standing, etc.). Intent generates actions, and the future actions in turn reflect the intent. We present a novel multi-task network that predicts future pedestrian actions and uses the predicted future action as a prior to detect the present intent and action of the pedestrian. We also design an attention relation network to incorporate external environmental contexts and thus further improve intent and action detection performance. We evaluated our approach on two naturalistic driving datasets, PIE and JAAD; extensive experiments show significantly improved and more explainable results for both intent detection and action prediction over state-of-the-art approaches. Our code is available at: https://github.com/umautobots/pedestrian_intent_action_detection
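The intent-generates-actions relationship can be illustrated with a small Bayesian update: observing crossing-related actions raises the posterior probability of crossing intent. The action likelihoods below are made up for illustration; this is a toy probabilistic view, not the paper's multi-task network.

```python
def update_intent(prior_cross, action, likelihoods):
    """Posterior P(intent = cross | action) via Bayes' rule.

    prior_cross : prior probability that the pedestrian intends to cross
    action      : observed multi-class action label
    likelihoods : dict action -> (P(action | cross), P(action | not cross))
    """
    p_a_cross, p_a_not = likelihoods[action]
    num = p_a_cross * prior_cross
    return num / (num + p_a_not * (1.0 - prior_cross))

# Hypothetical likelihoods: looking at traffic is more typical of crossing intent
likelihoods = {
    "walking": (0.5, 0.4),
    "standing": (0.2, 0.5),
    "looking": (0.3, 0.1),
}
p = 0.5                                   # uninformative prior
for act in ["looking", "walking"]:        # sequence of observed actions
    p = update_intent(p, act, likelihoods)
print(round(p, 3))                        # posterior crossing probability rises
```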


2020 ◽  
Vol 34 (02) ◽  
pp. 1741-1748 ◽  
Author(s):  
Meng-Hsuan Yu ◽  
Juntao Li ◽  
Danyang Liu ◽  
Dongyan Zhao ◽  
Rui Yan ◽  
...  

Automatic storytelling has consistently been a challenging area in the field of natural language processing. Although considerable achievements have been made, the gap between automatically generated stories and human-written stories is still significant. Moreover, the limitations of existing automatic storytelling methods are obvious, e.g., in the consistency of content and in wording diversity. In this paper, we propose a multi-pass hierarchical conditional variational autoencoder model to overcome the challenges and limitations of existing automatic storytelling models. While the conditional variational autoencoder (CVAE) is employed to generate diversified content, the hierarchical structure and multi-pass editing scheme allow the model to generate more consistent content. We conduct extensive experiments on the ROCStories dataset. The results verify the validity and effectiveness of our proposed model, which yields substantial improvement over existing state-of-the-art approaches.
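At the core of any CVAE is sampling a latent code conditioned on context via the reparameterization trick, which is what gives the diversified content. The NumPy sketch below illustrates only that generic mechanism with toy dimensions and a toy encoder; it is not the paper's multi-pass hierarchical model.

```python
import numpy as np

rng = np.random.default_rng(2)

def encode(x, c):
    """Toy encoder q(z | x, c): map content x and condition c to Gaussian params."""
    h = np.tanh(np.concatenate([x, c]))
    d = h.shape[0] // 2
    return h[:d], h[d:]                  # mean, log-variance

def sample_latent(mu, logvar):
    """Reparameterization: z = mu + sigma * eps keeps sampling differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

x = rng.standard_normal(6)               # encoded story content
c = rng.standard_normal(6)               # condition, e.g. title or a previous pass
mu, logvar = encode(x, c)
z = sample_latent(mu, logvar)            # different eps -> diversified stories
print(z.shape)
```

Decoding different samples of `z` under the same condition is what lets a CVAE produce varied wording for the same prompt.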


2021 ◽  
Vol 15 (02) ◽  
pp. 143-160
Author(s):  
Ayşegül Özkaya Eren ◽  
Mustafa Sert

Generating audio captions is a new research area that combines audio and natural language processing to create meaningful textual descriptions for audio clips. To address this problem, previous studies mostly use encoder–decoder-based models without considering semantic information. To fill this gap, we present a novel encoder–decoder architecture using bi-directional Gated Recurrent Units (BiGRU) with audio and semantic embeddings. We extract semantic embeddings by obtaining subjects and verbs from the audio clip captions and combine these embeddings with audio embeddings to feed the BiGRU-based encoder–decoder model. To enable semantic embeddings for the test audios, we introduce a multilayer perceptron classifier to predict the semantic embeddings of those clips. We also present exhaustive experiments showing the efficiency of different features and datasets for our proposed model on the audio captioning task. To extract audio features, we use log Mel energy features, VGGish embeddings, and pretrained audio neural network (PANN) embeddings. Extensive experiments on two audio captioning datasets, Clotho and AudioCaps, show that our proposed model outperforms state-of-the-art audio captioning models across different evaluation metrics and that using the semantic information improves the captioning performance.


2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
Wei Lu ◽  
Yan Tong ◽  
Yue Yu ◽  
Yiqiao Xing ◽  
Changzheng Chen ◽  
...  

With the emergence of unmanned planes, autonomous vehicles, face recognition, and language processing, artificial intelligence (AI) has remarkably revolutionized our lifestyle. Recent studies indicate that AI has astounding potential to perform much better than human beings in some tasks, especially in the image recognition field. As the amount of image data in ophthalmology imaging centers is increasing dramatically, there is an urgent need to analyze and process these data. AI has been applied to decipher medical data and has made extraordinary progress in intelligent diagnosis. In this paper, we present the basic workflow for building an AI model and systematically review applications of AI in the diagnosis of eye diseases. Future work should focus on setting up systematic AI platforms to diagnose general eye diseases based on multimodal data in the real world.


2020 ◽  
Vol 34 (28) ◽  
pp. 2050315
Author(s):  
Himanshu Sharma ◽  
Anand Singh Jalal

Image captioning is a multidisciplinary artificial intelligence (AI) research task that has captured the interest of both image and natural language processing experts. It is a complex problem, as it sometimes requires access to information that may not be directly visible in a given scene, possibly requiring common-sense interpretation or detailed knowledge about the objects present in the image. In this paper, we present a method that utilizes both visual knowledge and external knowledge from knowledge bases such as ConceptNet to better describe images. We demonstrate the usefulness of the method on two publicly available datasets, Flickr8k and Flickr30k. The results show that the proposed model outperforms state-of-the-art approaches for generating image captions. Finally, we discuss possible future prospects in image captioning.


2020 ◽  
Vol 1 (1) ◽  
Author(s):  
Yuan-Cheng Liu ◽  
Kuei-Yuan Chan

Interaction with human drivers is one of the major challenges for autonomous vehicles. In this study, we consider urban crossroads without signals, where driver interactions are indispensable. Crossroads are parameterized to study how drivers pass the crossroad while maintaining a desired speed without collision. We define a probability of yielding for each car as a function of vehicle speed and the distance-to-intersection of both vehicles, while the interactions between vehicles are characterized by a point of action for incoming vehicles from different directions. Driver behaviors in terms of acceleration/deceleration given the current circumstances are also modeled probabilistically. The method is then analyzed and validated with data collected from human drivers in simulated environments. The results show prediction accuracy comparable to the state-of-the-art method, where characteristic parameters of drivers are also shown to be critical for behavior prediction. We also extend our model to two real-world urban crossroad applications: crash analysis and traffic characteristic parameter identification. In both cases, our prediction results are analogous to those acquired in virtual environments. For autonomous vehicles, our method can help build computer-driving logic that matches human behaviors, so that interactions between different drivers will be more intuitive.
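A yielding probability of the kind described, a function of vehicle speed and both vehicles' distance-to-intersection, can be sketched as a logistic function. The functional form and coefficients below are hypothetical, chosen only to illustrate the plausible qualitative behavior; the paper's actual parameterization is not reproduced here.

```python
import math

def yield_probability(speed, dist_self, dist_other, a=0.3, b=-0.5, c=0.4):
    """Hypothetical probability that a driver yields at an unsignalized crossroad.

    speed      : ego vehicle speed (m/s)
    dist_self  : ego distance-to-intersection (m)
    dist_other : other vehicle's distance-to-intersection (m)
    A slower ego vehicle that is farther from the intersection than the
    other car should be more likely to yield; coefficients are illustrative.
    """
    logit = a * (dist_self - dist_other) + b * speed + c
    return 1.0 / (1.0 + math.exp(-logit))

# A slow car far from the intersection is more likely to yield
# than a fast car that is already closer than the other vehicle.
p_far_slow = yield_probability(speed=5.0, dist_self=40.0, dist_other=20.0)
p_near_fast = yield_probability(speed=15.0, dist_self=10.0, dist_other=30.0)
print(p_far_slow > p_near_fast)
```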


Author(s):  
Dmitriy Nemchinov

The article presents an analysis of positive practices for ensuring pedestrian safety at crossings of city-street carriageways, as well as a description of several innovations in regulatory and technical documents. These include an increased number of cases in which a safety island can be arranged at a pedestrian crossing; requirements for providing visibility at a pedestrian crossing, with the minimum visibility distance determined from the time pedestrians require to cross the roadway; recommended options for at-grade unsignalized pedestrian crossings on trapezoidal speed humps according to GOST R 52605, including crossings staggered in the direction of the traffic flow and Z-shaped crossings (also staggered in the direction of the traffic flow); requirements for the size of the safety island, established to allow a bicycle to fit inside it; a recommended set of measures to reduce vehicle speed, with a description of the types of measures and the methods of their application; methods for arranging zones with reduced travel speed, such as residential and school zones; and requirements for turbo-roundabouts and methods of their design.


2021 ◽  
pp. 1-16
Author(s):  
Ibtissem Gasmi ◽  
Mohamed Walid Azizi ◽  
Hassina Seridi-Bouchelaghem ◽  
Nabiha Azizi ◽  
Samir Brahim Belhaouari

A Context-Aware Recommender System (CARS) suggests more relevant services by adapting them to the user's specific context situation. Nevertheless, using many contextual factors can increase data sparsity, while too few context parameters fail to introduce the contextual effects into the recommendations. Moreover, several CARSs are based on similarity algorithms, such as cosine similarity and the Pearson correlation coefficient, which are not very effective on sparse datasets. This paper presents a context-aware model that integrates contextual factors into the prediction process when there are insufficient co-rated items. The proposed algorithm uses Latent Dirichlet Allocation (LDA) to learn the latent interests of users from the textual descriptions of items. Then, it integrates both the explicit contextual factors and their degree of importance into the prediction process by introducing a weighting function, where the PSO algorithm is employed to learn and optimize the weights of these features. The results on the MovieLens 1M dataset show that the proposed model can achieve an F-measure of 45.51% with a precision of 68.64%. Furthermore, the improvement in MAE and RMSE can reach 41.63% and 39.69%, respectively, compared with state-of-the-art techniques.
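The weighting idea, each contextual factor contributing to the prediction in proportion to a learned importance, can be sketched as below. The factor names, effects, and weights are hypothetical stand-ins; in the paper the weights are optimized by particle swarm optimization rather than set by hand.

```python
def weighted_prediction(base_rating, context_effects, weights):
    """Adjust a base rating prediction by weighted contextual effects.

    base_rating     : prediction ignoring context (e.g. from LDA-based interests)
    context_effects : dict factor -> estimated rating deviation in that context
    weights         : dict factor -> learned importance of that factor
                      (hand-set here; learned via PSO in the paper)
    """
    adjustment = sum(weights[f] * context_effects[f] for f in context_effects)
    return base_rating + adjustment

# Hypothetical contextual factors for a movie recommendation
effects = {"time_of_day": 0.3, "companion": -0.2, "mood": 0.5}
weights = {"time_of_day": 0.2, "companion": 0.5, "mood": 0.8}
rating = weighted_prediction(3.6, effects, weights)
print(rating)  # base rating shifted by the weighted contextual effects
```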

