MME-YOLO: Multi-Sensor Multi-Level Enhanced YOLO for Robust Vehicle Detection in Traffic Surveillance

Jianxiao Zhu; Xu Li; Peng Jin; Qimin Xu; Zhengliang Sun; Xiang Song

doi:10.3390/s21010027

Sensors ◽

10.3390/s21010027 ◽

2020 ◽

Vol 21 (1) ◽

pp. 27

Author(s):

Jianxiao Zhu ◽

Xu Li ◽

Peng Jin ◽

Qimin Xu ◽

Zhengliang Sun ◽

...

Keyword(s):

Network Architecture ◽

Visual Cues ◽

Effective Means ◽

Vehicle Detection ◽

Feature Maps ◽

Convolutional Network ◽

Robust Detection ◽

Tightly Coupled ◽

Multi Level ◽

Composite Module

As an effective means of solving collision problems caused by the limited on-board perspective, the cooperative roadside system is gaining popularity. To improve vehicle detection in such online safety systems, we propose a novel multi-sensor multi-level enhanced convolutional network architecture, MME-YOLO, which accounts for realistic scenes with mixed scales, illumination, and occlusion. MME-YOLO consists of two tightly coupled structures: the enhanced inference head and the LiDAR-Image composite module. The enhanced inference head first equips the network with stronger inference abilities for redundant visual cues through attention-guided feature selection blocks and an anchor-based/anchor-free ensemble head. The LiDAR-Image composite module then cascades the multi-level feature maps from the LiDAR subnet to the image subnet, which strengthens the generalization of the detector in complex scenarios. Compared with YOLOv3, the enhanced inference head achieves mAP improvements of 5.83% and 4.88% on the visual datasets LVSH and UA-DETRAC, respectively. Integrated with the composite module, the overall architecture reaches 91.63% mAP on the collected Road-side Dataset. Experiments show that even under abnormal lighting and inconsistent scales at evening rush hours, the proposed MME-YOLO maintains reliable recognition accuracy and robust detection performance.

ENCODER-DECODER ATTENTION NETWORK (EDANET) FOR POLYP SEGMENTATION IN COLONOSCOPY IMAGES

10.36106/ijsr/4333914 ◽

2021 ◽

pp. 8-10

Author(s):

Madhura Prakash M ◽

Krishnamurthy G. N

Keyword(s):

Neural Network ◽

Network Architecture ◽

Surgical Planning ◽

Vital Role ◽

Colorectal Polyps ◽

Colon Polyps ◽

Feature Maps ◽

Neural Network Architecture ◽

Multi Level ◽

Fully Connected

Colorectal cancer (CRC) is one of the most common malignancies that can develop from high-risk colon polyps. Colonoscopy is a standard for examination and detection of colorectal polyps.[1] Segmentation and distinction of polyps can play a vital role in treatment (e.g., surgical planning) and predictive decision making. This paper proposes a neural network architecture called EDANet, which uses attention gates to effectively combine multi-level features to yield accurate polyp segmentation. The Encoder is a fully connected Convolution Neural Network (CNN) and the Decoder is a Cascaded Partial Decoder. The Encoder and Decoder sub-networks are connected through a series of nested, dense skip pathways. The skip pathways aim at reducing the semantic gap between the feature maps of the Encoder and Decoder sub-networks. The proposed system trains the model over several epochs and unifies the previous epoch's mask with the feature map of the current training epoch. The previous epoch's mask is then used to provide hard attention to the learnt feature maps at different convolutional layers. Experimental results demonstrate that the model trained and tested on the Kvasir-SEG dataset achieves a Dice coefficient of 0.7874, a mean Intersection over Union (mIoU) of 0.7010, a recall of 0.7987, and a precision of 0.8577.
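For readers unfamiliar with the two overlap metrics reported above, the following minimal sketch shows how the Dice coefficient and Intersection over Union are usually computed for a single pair of binary masks. The function name, the flat 0/1 list representation, and the empty-mask convention are illustrative assumptions, not details from the paper.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(p & t for p, t in zip(pred, target))
    pred_area = sum(pred)
    target_area = sum(target)
    union = pred_area + target_area - intersection
    # Convention assumed here: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Hypothetical 4-pixel masks, used only to exercise the function.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

A dataset-level mIoU is typically the mean of per-image (or per-class) IoU values, which is why a reported mean Dice and mIoU need not satisfy the single-mask identity Dice = 2·IoU / (1 + IoU).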

A Densely Connected Network Based on U-Net for Medical Image Segmentation

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3446618 ◽

2021 ◽

Vol 17 (3) ◽

pp. 1-14

Author(s):

Zhenzhen Yang ◽

Pengfei Xu ◽

Yongpeng Yang ◽

Bing-Kun Bao

Keyword(s):

Feature Extraction ◽

Image Segmentation ◽

Loss Function ◽

Network Architecture ◽

Medical Image ◽

Medical Image Segmentation ◽

Cross Entropy ◽

Loss Functions ◽

Feature Maps ◽

Different Levels

The U-Net has become the most popular structure in medical image segmentation in recent years. Although its performance for medical image segmentation is outstanding, a large number of experiments demonstrate that the classical U-Net network architecture seems to be insufficient when the size of segmentation targets changes and when target and background are imbalanced in different forms of segmentation. To improve the U-Net network architecture, we develop a new architecture named densely connected U-Net (DenseUNet) in this article. The proposed DenseUNet network adopts a dense block to improve the feature extraction capability and employs a multi-feature fuse block that fuses feature maps of different levels to increase the accuracy of feature extraction. In addition, in view of the advantages of the cross entropy and the dice loss functions, a new loss function for the DenseUNet network is proposed to deal with the imbalance between target and background. Finally, we test the proposed DenseUNet network and compare it with the multi-resolutional U-Net (MultiResUNet) and the classic U-Net networks on three different datasets. The experimental results show that the DenseUNet network performs significantly better than the MultiResUNet and the classic U-Net networks.
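The abstract does not state the exact form of the new loss. One common way to combine cross entropy with a Dice term is a weighted sum, and the sketch below assumes that form purely for illustration; the weight `alpha`, the smoothing constant `eps`, and all function names are hypothetical and not taken from the paper.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross entropy; probs are predicted foreground probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def soft_dice_loss(probs, targets, eps=1e-7):
    """1 - soft Dice overlap; driven by the foreground, so less swamped by background."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


def combined_loss(probs, targets, alpha=0.5):
    """Weighted sum of both terms; alpha is an assumed weight, not from the paper."""
    return alpha * bce_loss(probs, targets) + (1.0 - alpha) * soft_dice_loss(probs, targets)


# Hypothetical two-pixel example: a good prediction should score lower than a bad one.
good = combined_loss([0.9, 0.1], [1, 0])
bad = combined_loss([0.1, 0.9], [1, 0])
```

The cross entropy term gives stable per-pixel gradients, while the Dice term depends on overlap with the target region, which is the usual motivation for pairing them when the target is small relative to the background.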

Hybrid Graph Neural Networks for Crowd Counting

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6839 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11693-11700 ◽

Cited By ~ 2

Author(s):

Ao Luo ◽

Fan Yang ◽

Xin Li ◽

Dong Nie ◽

Zhicheng Jiao ◽

...

Keyword(s):

Network Architecture ◽

Message Passing ◽

Large Scale ◽

State Of The Art ◽

Density Variation ◽

Feature Maps ◽

Crowd Counting ◽

Multi Scale ◽

Crowd Density ◽

Graph Neural Networks

Crowd counting is an important yet challenging task due to the large scale and density variation. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is still a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn), which aims to relieve the problem by interweaving the multi-scale features for crowd density and its auxiliary task (localization) and performing joint reasoning over a graph. Specifically, HyGnn integrates a hybrid graph to jointly represent the task-specific feature maps of different scales as nodes, and two types of relations as edges: (i) multi-scale relations capturing the feature dependencies across scales and (ii) mutually beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, providing robust and accurate results. Our HyGnn performs well on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF_QNRF, outperforming the state-of-the-art algorithms by a large margin.
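To make the idea of message passing over a graph of feature nodes concrete, here is a deliberately simplified sketch of one synchronous round in which every node blends its own feature vector with the mean of its neighbours' vectors. This is a generic mean-aggregation update chosen for illustration; it is not HyGnn's actual update rule, and the node names, the `mix` parameter, and the tiny feature vectors are all hypothetical.

```python
def message_passing_step(features, edges, mix=0.5):
    """One round: each node blends its vector with the mean of its neighbours' vectors."""
    neighbours = {node: [] for node in features}
    for a, b in edges:  # edges are treated as undirected
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated = {}
    for node, vec in features.items():
        linked = neighbours[node]
        if not linked:
            updated[node] = list(vec)  # isolated nodes keep their features
            continue
        # Aggregate incoming messages as a per-dimension mean.
        mean_msg = [
            sum(features[n][i] for n in linked) / len(linked)
            for i in range(len(vec))
        ]
        updated[node] = [(1 - mix) * v + mix * m for v, m in zip(vec, mean_msg)]
    return updated


# Hypothetical graph: two counting nodes at different scales and one localization node.
features = {
    "count_scale1": [1.0, 0.0],
    "count_scale2": [0.0, 1.0],
    "loc_scale1": [1.0, 1.0],
}
edges = [
    ("count_scale1", "count_scale2"),  # multi-scale relation
    ("count_scale1", "loc_scale1"),    # counting/localization relation
]
updated = message_passing_step(features, edges)
```

In this toy graph the two edge types are handled identically; a model that distinguishes relation types would use separate learned transformations per edge type.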

FetNet: a recurrent convolutional network for occlusion identification in fetoscopic videos

International Journal of Computer Assisted Radiology and Surgery ◽

10.1007/s11548-020-02169-0 ◽

2020 ◽

Vol 15 (5) ◽

pp. 791-801 ◽

Cited By ~ 1

Author(s):

Sophia Bano ◽

Francisco Vasconcelos ◽

Emmanuel Vander Poorten ◽

Tom Vercauteren ◽

Sebastien Ourselin ◽

...

Keyword(s):

Neural Network ◽

Network Architecture ◽

Laser Photocoagulation ◽

Short Term Memory ◽

Total Duration ◽

Frame Rate ◽

Superior Performance ◽

Computer Assisted ◽

Convolutional Network ◽

Spatio Temporal

Purpose: Fetoscopic laser photocoagulation is a minimally invasive surgery for the treatment of twin-to-twin transfusion syndrome (TTTS). Using a lens/fibre-optic scope inserted into the amniotic cavity, the abnormal placental vascular anastomoses are identified and ablated to regulate blood flow to both fetuses. Limited field-of-view, occlusions due to fetus presence and low visibility make it difficult to identify all vascular anastomoses. Automatic computer-assisted techniques may provide better understanding of the anatomical structure during surgery for risk-free laser photocoagulation and may help improve mosaics from fetoscopic videos. Methods: We propose FetNet, a combined convolutional neural network (CNN) and long short-term memory (LSTM) recurrent neural network architecture for the spatio-temporal identification of fetoscopic events. We adapt an existing CNN architecture for spatial feature extraction and integrate it with the LSTM network for end-to-end spatio-temporal inference. We introduce differential learning rates during model training to effectively utilise the pre-trained CNN weights. This may support computer-assisted interventions (CAI) during fetoscopic laser photocoagulation. Results: We perform quantitative evaluation of our method using 7 in vivo fetoscopic videos captured from different human TTTS cases. The total duration of these videos was 5551 s (138,780 frames). To test the robustness of the proposed approach, we perform 7-fold cross-validation where each video is treated as a hold-out or test set and training is performed using the remaining videos. Conclusion: FetNet achieved superior performance compared to the existing CNN-based methods and provided improved inference because of the spatio-temporal information modelling. Online testing of FetNet, using a Tesla V100-DGXS-32GB GPU, achieved a frame rate of 114 fps. These results show that our method could potentially provide a real-time solution for CAI and for automating occlusion and photocoagulation identification during fetoscopic procedures.
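The evaluation protocol described in the Results, where each of the 7 videos serves once as the held-out test set, can be sketched as follows. The video identifiers are placeholders and the helper name is hypothetical; the sketch also checks the average frame rate implied by the reported totals.

```python
def leave_one_video_out(video_ids):
    """Yield (train_ids, test_id) pairs, holding out each video exactly once."""
    for held_out in video_ids:
        train = [v for v in video_ids if v != held_out]
        yield train, held_out


# Placeholder identifiers for the 7 videos; the real names are not given in the abstract.
videos = [f"video_{i}" for i in range(1, 8)]
folds = list(leave_one_video_out(videos))

# Average frame rate implied by the reported totals (138,780 frames over 5551 s).
average_fps = 138_780 / 5551
# Ratio of the reported online inference rate (114 fps) to that average.
speedup_over_capture = 114 / average_fps
```

Splitting by video rather than by frame keeps highly correlated neighbouring frames from appearing in both training and test sets. The reported totals work out to about 25 frames per second of video, so inference at 114 fps is several times faster than that rate.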

A data-driven neural network architecture for sentiment analysis

Data Technologies and Applications ◽

10.1108/dta-03-2018-0017 ◽

2019 ◽

Vol 53 (1) ◽

pp. 2-19 ◽

Cited By ~ 1

Author(s):

Erion Çano ◽

Maurizio Morisio

Keyword(s):

Neural Network ◽

Sentiment Analysis ◽

Network Architecture ◽

Network Models ◽

Data Sets ◽

Feature Maps ◽

Neural Network Architecture ◽

Neural Network Models ◽

Content Type ◽

Max Pooling

Purpose: The impressive results of convolution neural networks in image-related tasks have attracted the attention of researchers in text mining, sentiment analysis and other text analysis fields. It is, however, difficult to find enough data for feeding such networks, to optimize their parameters, and to make the right design choices when constructing network architectures. The purpose of this paper is to present the creation steps of two big data sets of song emotions. The authors also explore the use of convolution and max-pooling neural layers on song lyrics, product and movie review text data sets. Three variants of a simple and flexible neural network architecture are also compared. Design/methodology/approach: The intention was to spot any important patterns that can serve as guidelines for parameter optimization of similar models. The authors also wanted to identify architecture design choices which lead to high-performing sentiment analysis models. To this end, the authors conducted a series of experiments with neural architectures of various configurations. Findings: The results indicate that parallel convolutions of filter lengths up to 3 are usually enough for capturing relevant text features. Also, the max-pooling region size should be adapted to the length of text documents for producing the best feature maps. Originality/value: The top results the authors obtained come from feature maps of lengths 6–18. A possible improvement for future neural network models for sentiment analysis is to predict the sentiment polarity of a document by aggregating predictions on smaller excerpts of the entire text.
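The Findings refer to parallel convolutions of different filter lengths followed by regional max pooling. The sketch below illustrates that pattern on a toy sequence with one scalar per token; real models convolve over embedding vectors with learned filters, so the scalar inputs, the all-ones kernels, and the function names are simplifying assumptions rather than the authors' configuration.

```python
def conv1d(sequence, kernel):
    """Valid 1D convolution (cross-correlation) of a scalar sequence with a kernel."""
    k = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]


def max_pool(values, region):
    """Non-overlapping max pooling; the final region may be shorter than the rest."""
    return [max(values[i:i + region]) for i in range(0, len(values), region)]


def parallel_features(sequence, kernels, region):
    """Apply each kernel in parallel, pool each output, and concatenate the pooled maps."""
    features = []
    for kernel in kernels:
        features.extend(max_pool(conv1d(sequence, kernel), region))
    return features


# Toy input: six tokens, filters of length 1, 2 and 3, pooling region of 2.
tokens = [1, 2, 3, 4, 5, 6]
kernels = [[1], [1, 1], [1, 1, 1]]
feature_map = parallel_features(tokens, kernels, region=2)
```

Because the pooled length is roughly the sequence length divided by the region size, a larger region for longer documents keeps the resulting feature map in a fixed range, which is one way to read the advice to adapt the pooling region to document length.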

Driver inattention monitoring system based on multimodal fusion with visual cues to improve driving safety

Transactions of the Institute of Measurement and Control ◽

10.1177/0142331216670451 ◽

2016 ◽

Vol 40 (3) ◽

pp. 885-895 ◽

Cited By ~ 2

Author(s):

Xuanpeng Li ◽

Emmanuel Seignez

Keyword(s):

Real Time ◽

Monitoring System ◽

Visual Cues ◽

Real Life ◽

Multimodal Fusion ◽

Driving Safety ◽

Driver Drowsiness ◽

Driver Monitoring ◽

Driver Inattention ◽

Multi Level

Driver inattention, either driver drowsiness or distraction, is a major contributor to serious traffic crashes. In general, most research on this topic studies driver drowsiness and distraction separately, and is often conducted in a well-controlled, simulated environment. By considering the reliability and flexibility of real-time driver monitoring systems, it is possible to evaluate driver inattention by the fusion of multiple selected cues in real life scenarios. This paper presents a real-time, visual-cue-based driver monitoring system, which can track both multi-level driver drowsiness and distraction simultaneously. A set of visual cues are adopted via analysis of drivers’ physical behaviour and driving performance. Driver drowsiness is evaluated using a multi-level scale, by applying evidence theory. Additionally, a general framework of extensive hierarchical combinations is used to generate a probabilistic evaluation of driving risk in real time. This driver inattention monitoring system with multimodal fusion has been proven to improve the accuracy of risk evaluation and reduce the rate of false alarms, and acceptance of the system is recommended.
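The abstract says drowsiness is evaluated by applying evidence theory but does not say which combination rule is used. As a point of reference, the sketch below implements Dempster's rule of combination, the standard rule in Dempster-Shafer evidence theory, for two sources of evidence. The hypothesis labels, the two cue names, and the mass values are invented for illustration and do not come from the paper.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of hypotheses -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                # Mass assigned to contradictory hypotheses is discarded and renormalised.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    return {hyp: mass / (1.0 - conflict) for hyp, mass in combined.items()}


ALERT = frozenset({"alert"})
DROWSY = frozenset({"drowsy"})
EITHER = ALERT | DROWSY  # mass here expresses ignorance between the two states

# Hypothetical evidence from two visual cues.
eye_cue = {ALERT: 0.6, EITHER: 0.4}
head_cue = {DROWSY: 0.5, EITHER: 0.5}
fused = dempster_combine(eye_cue, head_cue)
```

Unlike a plain probability average, this formulation lets each cue leave part of its belief uncommitted, which suits cues that are only sometimes informative.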

A new approach for three component seismic array processing

Annals of Geophysics ◽

10.4401/ag-4205 ◽

1994 ◽

Vol 37 (3) ◽

Author(s):

O. K. Kedrov ◽

V. E. Permyakova

Keyword(s):

Noise Suppression ◽

Effective Means ◽

Seismic Arrays ◽

Robust Detection ◽

Detection And Identification ◽

Seismic Vibrations ◽

Band Pass ◽

Polarization Filtering ◽

Detection Of Signals ◽

New Procedures

The new concept and methodology of regional seismic arrays (RSA) equipped with three-component (3C) sensors (Z, NS, EW) are proposed. Such a system could be a better tool for investigating the Earth's interior. This aim can be achieved by introducing polarization filtering of 3C seismic vibrations as an effective means of noise suppression and of robust detection and identification of secondary body phases of the signals. The proposed algorithm is based on: 1) linear-phase band-pass frequency filtering of N 3C records in M bands; 2) polarization filtering of all 3C records in all L directions where array beams are routinely oriented; 3) calculation of L beams in M bands using polarized P, SV and SH traces of individual sensors; 4) detection of signals on the L*M P, SV and SH traces; 5) location of the event. The main new procedures are 2) and 3). Thanks to these new approaches, procedures 4) and 5) will be improved in comparison with those routinely used today at RSAs. This work includes a theoretical consideration of the efficiency of the proposed method and preliminary experimental results.
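Step 3 of the algorithm above computes array beams. In its simplest form a beam is a delay-and-sum stack: each sensor's trace is shifted by the travel-time delay expected for a given direction and the aligned traces are averaged. The sketch below shows only that generic stacking step with integer, non-negative sample delays; it does not include the band-pass or polarization filtering of steps 1 and 2, and the function name and toy traces are hypothetical.

```python
def delay_and_sum(traces, delays):
    """Average traces after advancing each one by its delay (non-negative, in samples)."""
    if len(traces) != len(delays):
        raise ValueError("one delay is required per trace")
    # Only samples available in every shifted trace can be stacked.
    usable = min(len(t) - d for t, d in zip(traces, delays))
    if usable <= 0:
        raise ValueError("delays exceed trace length")
    return [
        sum(t[d + i] for t, d in zip(traces, delays)) / len(traces)
        for i in range(usable)
    ]


# Toy example: the same unit pulse arrives one sample later at each successive sensor.
traces = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
aligned_beam = delay_and_sum(traces, [0, 1, 2])    # delays match the arrival pattern
unaligned_beam = delay_and_sum(traces, [0, 0, 0])  # no steering
```

When the delays match the true arrival pattern the pulse stacks coherently, while uncorrelated noise averages down, which is why beams are formed for each of the L directions before detection.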

3D Vehicle Detection Using Multi-Level Fusion From Point Clouds and Images

IEEE Transactions on Intelligent Transportation Systems ◽

10.1109/tits.2021.3137392 ◽

2022 ◽

pp. 1-9

Author(s):

Kun Zhao ◽

Lingfei Ma ◽

Yu Meng ◽

Li Liu ◽

Junbo Wang ◽

...

Keyword(s):

Vehicle Detection ◽

Point Clouds ◽

Multi Level ◽

Level Fusion

Emergency lane vehicle detection and classification method based on logistic regression and a deep convolutional network

Neural Computing and Applications ◽

10.1007/s00521-021-06468-8 ◽

2021 ◽

Author(s):

Guangming Li ◽

Qingjun Wang ◽

Congrui Zuo

Keyword(s):

Logistic Regression ◽

Vehicle Detection ◽

Classification Method ◽

Convolutional Network ◽

Deep Convolutional Network

Deep Siamese Metric Learning: A Highly Scalable Approach to Searching Unordered Sets of Trajectories

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3465057 ◽

2022 ◽

Vol 13 (1) ◽

pp. 1-23

Author(s):

Christoffer Löffler ◽

Luca Reeb ◽

Daniel Dzibela ◽

Robert Marzilger ◽

Nicolas Witt ◽

...

Keyword(s):

Assignment Problem ◽

Network Architecture ◽

Metric Learning ◽

Representation Learning ◽

Trajectory Data ◽

Convolutional Network ◽

Professional Soccer ◽

Gating Mechanism ◽

Previous State ◽

Low Dimensional

This work proposes metric learning for fast similarity-based scene retrieval of unstructured ensembles of trajectory data from large databases. We present a novel representation learning approach using Siamese Metric Learning that approximates a distance-preserving low-dimensional representation and that learns to estimate reasonable solutions to the assignment problem. To this end, we employ a Temporal Convolutional Network architecture that we extend with a gating mechanism to enable learning from sparse data, leading to solutions to the assignment problem exhibiting varying degrees of sparsity. Our experimental results on professional soccer tracking data provide insights on learned features and embeddings, as well as on generalization, sensitivity, and network architectural considerations. Our low approximation errors for learned representations and the interactive performance, with retrieval times several orders of magnitude smaller, show that we outperform the previous state of the art.
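The assignment problem mentioned above arises because the trajectories in a scene are unordered: comparing two scenes requires deciding which trajectory in one corresponds to which in the other. The sketch below computes such a set distance exactly by brute force over all one-to-one assignments. It is meant only to show the quantity that a learned embedding would approximate, under the assumption that the pairwise cost is a sum of pointwise Euclidean distances; that cost and the function names are illustrative choices, not the paper's definitions.

```python
import math
from itertools import permutations


def trajectory_distance(a, b):
    """Sum of pointwise Euclidean distances between two equal-length 2D trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b))


def set_distance(set_a, set_b):
    """Minimum total distance over all one-to-one assignments (brute force, O(n!))."""
    if len(set_a) != len(set_b):
        raise ValueError("sets must contain the same number of trajectories")
    return min(
        sum(trajectory_distance(set_a[i], set_b[j]) for i, j in enumerate(perm))
        for perm in permutations(range(len(set_b)))
    )


# Two hypothetical scenes with two short trajectories each, listed in different orders.
scene_a = [[(0, 0), (1, 0)], [(0, 5), (1, 5)]]
scene_b = [[(0, 5), (1, 5)], [(0, 1), (1, 1)]]
distance = set_distance(scene_a, scene_b)
```

The factorial cost of the exact search makes it impractical for database-scale retrieval, which is the motivation for replacing it with distances between learned low-dimensional representations.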
