Vehicle Speed Estimation Based on 3D ConvNets and Non-Local Blocks

Huanan Dong; Ming Wen; Zhouwang Yang

doi:10.3390/fi11060123

Future Internet ◽

2019 ◽

Vol 11 (6) ◽

pp. 123

Author(s):

Huanan Dong ◽

Ming Wen ◽

Zhouwang Yang

Keyword(s):

Optical Flow ◽

Camera Calibration ◽

Absolute Error ◽

Speed Estimation ◽

Vehicle Speed ◽

Convolutional Network ◽

Video Footage ◽

Convolutional Networks ◽

Calibration Methods ◽

Non Local

Vehicle speed estimation is an important problem in traffic surveillance. Many existing approaches to this problem are based on camera calibration, which has two shortcomings. First, camera calibration methods are sensitive to the environment, so the accuracy of the results is compromised when the required environmental conditions are not met. Second, camera calibration-based methods rely on vehicle trajectories acquired by a two-stage tracking and detection process. To overcome these shortcomings, we propose an alternative end-to-end method based on 3-dimensional convolutional networks (3D ConvNets). The proposed method estimates average vehicle speed directly from video footage. Our method is characterized by the following three features. First, we use non-local blocks in our model to better capture spatial–temporal long-range dependency. Second, we use optical flow as an input to the model. Optical flow encodes the speed and direction of pixel motion in an image. Third, we construct a multi-scale convolutional network, which extracts information on various characteristics of vehicles in motion. The proposed method achieves promising experimental results on a commonly used dataset, with a mean absolute error (MAE) of 2.71 km/h and a mean square error (MSE) of 14.62.
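The abstract does not spell out how a non-local block works, so the sketch below may help. It is a minimal pure-Python illustration of the embedded-Gaussian form commonly described in the non-local networks literature, not the authors' implementation. The function names, the flattening of the video feature map into a list of per-position vectors, and the weight matrices `w_theta`, `w_phi`, `w_g`, and `w_z` are all assumptions made for illustration.

```python
import math


def project(rows, weights):
    """Multiply each row vector (length C_in) by a C_in x C_out weight matrix."""
    return [
        [sum(row[k] * weights[k][j] for k in range(len(row)))
         for j in range(len(weights[0]))]
        for row in rows
    ]


def non_local_block(features, w_theta, w_phi, w_g, w_z):
    """Apply one non-local block (assumed embedded-Gaussian form).

    features: list of N vectors, one per (time, row, column) position of a
              flattened video feature map. Every position attends to every
              other position, which is what makes the block "non-local".
    """
    theta = project(features, w_theta)  # queries
    phi = project(features, w_phi)      # keys
    g = project(features, w_g)          # values
    output = []
    for i, x_i in enumerate(features):
        # Pairwise similarity between position i and every position j.
        logits = [sum(a * b for a, b in zip(theta[i], p)) for p in phi]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(v - peak) for v in logits]
        total = sum(weights)
        # Weighted average of the value vectors over all positions.
        y_i = [
            sum(w * g_j[c] for w, g_j in zip(weights, g)) / total
            for c in range(len(g[0]))
        ]
        # Project back to the input width and add the residual connection.
        z_i = project([y_i], w_z)[0]
        output.append([z + x for z, x in zip(z_i, x_i)])
    return output
```

Because of the residual connection, setting `w_z` to all zeros makes the block an identity mapping, which is one reason such blocks can be inserted into an existing network without disturbing it at initialization.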

Dynamic camera calibration of roadside traffic management cameras for vehicle speed estimation

IEEE Transactions on Intelligent Transportation Systems ◽

10.1109/tits.2003.821213 ◽

2003 ◽

Vol 4 (2) ◽

pp. 90-98 ◽

Cited By ~ 156

Author(s):

T.N. Schoepflin ◽

D.J. Dailey

Keyword(s):

Camera Calibration ◽

Traffic Management ◽

Speed Estimation ◽

Vehicle Speed

Automated Video Behavior Recognition of Pigs Using Two-Stream Convolutional Networks

Sensors ◽

10.3390/s20041085 ◽

2020 ◽

Vol 20 (4) ◽

pp. 1085

Author(s):

Kaifeng Zhang ◽

Dan Li ◽

Jiayun Huang ◽

Yifei Chen

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Optical Flow ◽

Network Models ◽

Well Being ◽

Motion Information ◽

Behavior Recognition ◽

Convolutional Network ◽

Convolutional Networks ◽

Effective Manner

Detecting pig behavior helps identify abnormal conditions such as diseases and dangerous movements in a timely and effective manner, which plays an important role in ensuring the health and well-being of pigs. Monitoring pig behavior by staff is time consuming, subjective, and impractical. Therefore, there is an urgent need for methods that identify pig behavior automatically. In recent years, deep learning has gradually been applied to pig behavior recognition. Existing studies judge the behavior of a pig based only on its posture in a still image frame, without considering the motion information of the behavior. Optical flow, however, reflects motion information well. Thus, this study took image frames and optical flow from videos as two-stream inputs to fully extract the temporal and spatial behavioral characteristics. Two-stream convolutional network models based on deep learning were proposed to recognize pig behavior, including inflated 3D convnet (I3D) and temporal segment networks (TSN) whose feature extraction network is Residual Network (ResNet) or an Inception architecture (e.g., Inception with Batch Normalization (BN-Inception), InceptionV3, InceptionV4, or InceptionResNetV2). A standard pig video behavior dataset was created, comprising 1000 videos of five behaviors (feeding, lying, walking, scratching, and mounting) recorded under natural conditions. The dataset was used to train and test the proposed models, and a series of comparative experiments were conducted. The experimental results showed that the TSN model whose feature extraction network was ResNet101 was able to recognize pig feeding, lying, walking, scratching, and mounting behaviors with a higher average recognition accuracy of 98.99%, and the average recognition time of each video was 0.3163 s. The TSN model (ResNet101) is superior to the other models in solving the task of pig behavior recognition.
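To make the two-stream idea concrete, the following sketch shows one common way to turn per-segment class scores from an image stream and an optical-flow stream into a single video-level prediction. It is only an illustration: the abstract does not state how segment scores are aggregated or how the streams are fused, so the averaging consensus, the `flow_weight` parameter, and all function names here are assumptions.

```python
BEHAVIORS = ["feeding", "lying", "walking", "scratching", "mounting"]


def segment_consensus(segment_scores):
    """Average per-segment class scores into one video-level score vector."""
    count = len(segment_scores)
    return [sum(column) / count for column in zip(*segment_scores)]


def fuse_streams(rgb_scores, flow_scores, flow_weight=0.5):
    """Weighted average of image-stream and optical-flow-stream scores."""
    return [
        (1.0 - flow_weight) * rgb + flow_weight * flow
        for rgb, flow in zip(rgb_scores, flow_scores)
    ]


def predict(rgb_segments, flow_segments, flow_weight=0.5):
    """Return the behavior label with the highest fused score."""
    fused = fuse_streams(
        segment_consensus(rgb_segments),
        segment_consensus(flow_segments),
        flow_weight,
    )
    best = max(range(len(fused)), key=fused.__getitem__)
    return BEHAVIORS[best]
```

With `flow_weight=0.0` the prediction depends on appearance alone, which corresponds to the still-image setting the abstract criticizes; raising the weight lets motion information influence the result.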

A methodology of vehicle speed estimation based on optical flow

Proceedings of 2014 IEEE International Conference on Service Operations and Logistics, and Informatics ◽

10.1109/soli.2014.6960689 ◽

2014 ◽

Cited By ~ 2

Author(s):

Xu Qimin ◽

Li Xu ◽

Wu Mingming ◽

Li Bin ◽

Song Xianghui

Keyword(s):

Optical Flow ◽

Speed Estimation ◽

Vehicle Speed

Camera calibration and near-view vehicle speed estimation

10.1117/12.765077 ◽

2008 ◽

Cited By ~ 2

Author(s):

Futang Peng ◽

Changsong Liu ◽

Xiaoqing Ding

Keyword(s):

Camera Calibration ◽

Speed Estimation ◽

Vehicle Speed

Densely Connected Graph Convolutional Networks for Graph-to-Sequence Learning

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00269 ◽

2019 ◽

Vol 7 ◽

pp. 297-312 ◽

Cited By ~ 3

Author(s):

Zhijiang Guo ◽

Yan Zhang ◽

Zhiyang Teng ◽

Wei Lu

Keyword(s):

Sequence Learning ◽

Connected Graph ◽

Structural Information ◽

Text Generation ◽

Structural Representation ◽

Convolutional Network ◽

Neural Machine Translation ◽

Convolutional Networks ◽

Deep Architecture ◽

Non Local

We focus on graph-to-sequence learning, which can be framed as transducing graph structures to sequences for text generation. To capture structural information associated with graphs, we investigate the problem of encoding graphs using graph convolutional networks (GCNs). Unlike various existing approaches where shallow architectures were used for capturing local structural information only, we introduce a dense connection strategy, proposing a novel Densely Connected Graph Convolutional Network (DCGCN). Such a deep architecture is able to integrate both local and non-local features to learn a better structural representation of a graph. Our model outperforms the state-of-the-art neural models significantly on AMR-to-text generation and syntax-based neural machine translation.

Vehicle Speed Estimation Using Optical Flow

10.14209/sbrt.2015.68 ◽

2015 ◽

Author(s):

Fábio Crestani ◽

Daniel Pipa ◽

and Carnieri.

Keyword(s):

Optical Flow ◽

Speed Estimation ◽

Vehicle Speed

Optical Flow Estimation Using a Non-Local Convolutional Network

Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence ◽

10.1145/3404555.3404616 ◽

2020 ◽

Author(s):

Liping Zhang ◽

Zongqing Lu

Keyword(s):

Optical Flow ◽

Convolutional Network ◽

Flow Estimation ◽

Optical Flow Estimation ◽

Non Local

Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks

Applied Sciences ◽

10.3390/app11156975 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6975

Author(s):

Tao Zhang ◽

Lun He ◽

Xudong Li ◽

Guoqing Feng

Keyword(s):

Performance Improvement ◽

State Of The Art ◽

Error Rates ◽

Convolutional Network ◽

Convolutional Networks ◽

Sentence Level ◽

End To End ◽

High Level ◽

Improved Accuracy ◽

Talking Face

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing gradients during training and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, Temporal Convolutional Network (TCN), and a CTC objective function as the decoder. More importantly, the proposed architecture incorporates TCN as a feature learner to decode features. It partly eliminates the gradient disappearance and insufficient performance of RNNs (LSTM, GRU), which yields notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
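Since the abstract names CTC as the decoder without describing how frame-level outputs become a sentence, a small sketch of greedy CTC decoding may be useful. This is a generic illustration, not the paper's code: the function name, the toy alphabet, and the choice of greedy (best-path) decoding over beam search are assumptions.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC decoding.

    frame_scores: one score vector per video frame, indexed like `alphabet`.
    alphabet:     symbols for each index; the entry at `blank` is a placeholder.
    Steps: take the best symbol per frame, collapse consecutive repeats,
    then drop blanks.
    """
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    decoded = []
    previous = None
    for index in best_path:
        # A symbol is emitted only when it differs from the previous frame
        # and is not the blank; a blank between repeats keeps both copies.
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)
```

The blank symbol is what allows genuinely repeated characters to survive the collapse step: the frame sequence "a, a, blank, a" decodes to "aa", whereas "a, a, a" decodes to "a".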

Deep Learning-Based Congestion Detection at Urban Intersections

Sensors ◽

10.3390/s21062052 ◽

2021 ◽

Vol 21 (6) ◽

pp. 2052

Author(s):

Xinghai Yang ◽

Fengjiao Wang ◽

Zhiquan Bai ◽

Feifei Xun ◽

Yulin Zhang ◽

...

Keyword(s):

Deep Learning ◽

Optical Flow ◽

Traffic Congestion ◽

Detection Algorithm ◽

Input Image ◽

Vehicle Speed ◽

Position Information ◽

Traffic State ◽

State Discrimination ◽

Discrimination Method

In this paper, a deep learning-based traffic state discrimination method is proposed to detect traffic congestion at urban intersections. The detection algorithm includes two parts: global speed detection and a traffic state discrimination algorithm. First, the road intersection is selected as the region of interest (ROI) in the input image of the You Only Look Once (YOLO) v3 object detection algorithm, which performs vehicle detection. The Lucas-Kanade (LK) optical flow method is employed to calculate the vehicle speed. Then, the corresponding intersection state is obtained from the vehicle speed and the discrimination algorithm. Vehicle speed detection takes the position information obtained by YOLOv3 as the input of the LK optical flow algorithm and forms an optical flow vector. Experimental results show that the detection algorithm can detect the vehicle speed and that the traffic state discrimination method can judge the traffic state accurately; the approach has strong anti-interference ability and meets practical application requirements.
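As a rough illustration of how an LK optical flow vector can be turned into a speed, the sketch below solves the Lucas-Kanade least-squares system for a single image window and converts the resulting pixel displacement to km/h. It is not the paper's pipeline: the function names are hypothetical, and the `metres_per_pixel` scale and `fps` frame rate are assumed inputs that the abstract does not specify.

```python
import math


def lucas_kanade_window(ix, iy, it):
    """Estimate the flow (u, v) in pixels per frame for one window.

    ix, iy: spatial image gradients at each pixel of the window.
    it:     temporal gradient (frame difference) at the same pixels.
    Solves the 2x2 normal equations of  ix*u + iy*v = -it  in closed form.
    """
    sxx = sum(a * a for a in ix)
    syy = sum(b * b for b in iy)
    sxy = sum(a * b for a, b in zip(ix, iy))
    sxt = sum(a * c for a, c in zip(ix, it))
    syt = sum(b * c for b, c in zip(iy, it))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        # Flat or edge-only windows do not constrain both flow components.
        raise ValueError("window has too little texture to estimate flow")
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def speed_kmh(u, v, metres_per_pixel, fps):
    """Convert a flow vector in pixels per frame to a speed in km/h."""
    metres_per_second = math.hypot(u, v) * metres_per_pixel * fps
    return metres_per_second * 3.6
```

For example, a displacement of 5 pixels per frame at an assumed 0.05 m per pixel and 25 frames per second corresponds to 6.25 m/s, or 22.5 km/h. In practice the pixel-to-metre scale varies across the image, so a single constant is only a first approximation.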

Knowledge-aware Multi-modal Adaptive Graph Convolutional Networks for Fake News Detection

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3451215 ◽

2021 ◽

Vol 17 (3) ◽

pp. 1-23

Author(s):

Shengsheng Qian ◽

Jun Hu ◽

Quan Fang ◽

Changsheng Xu

Keyword(s):

Social Media ◽

Visual Information ◽

Representation Learning ◽

Fake News ◽

Unified Framework ◽

Model Learning ◽

Convolutional Network ◽

Textual Information ◽

Convolutional Networks ◽

Real World Datasets

In this article, we focus on the fake news detection task and aim to automatically identify fake news from the vast amount of social media posts. To date, many approaches have been proposed to detect fake news, including traditional learning methods and deep learning-based models. However, three challenges remain: (i) how to represent social media posts effectively, since post content is varied and highly complicated; (ii) how to propose a data-driven method to increase the flexibility of the model to deal with samples in different contexts and news backgrounds; and (iii) how to fully utilize the additional auxiliary information (the background knowledge and multi-modal information) of posts for better representation learning. To tackle these challenges, we propose a novel Knowledge-aware Multi-modal Adaptive Graph Convolutional Network (KMAGCN) that captures semantic representations by jointly modeling textual information, knowledge concepts, and visual information in a unified framework for fake news detection. We model posts as graphs and use a knowledge-aware multi-modal adaptive graph learning principle for effective feature learning. Compared with existing methods, the proposed KMAGCN addresses the challenges from three aspects: (1) it models posts as graphs to capture non-consecutive and long-range semantic relations; (2) it proposes a novel adaptive graph convolutional network to handle the variability of graph data; and (3) it leverages textual information, knowledge concepts, and visual information jointly for model learning. We have conducted extensive experiments on three public real-world datasets, and superior results demonstrate the effectiveness of KMAGCN compared with other state-of-the-art algorithms.
