MME-YOLO: Multi-Sensor Multi-Level Enhanced YOLO for Robust Vehicle Detection in Traffic Surveillance

Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 27
Author(s):  
Jianxiao Zhu ◽  
Xu Li ◽  
Peng Jin ◽  
Qimin Xu ◽  
Zhengliang Sun ◽  
...  

Cooperative roadside systems are gaining popularity as an effective means of preventing collisions caused by the limited on-board field of view. To improve vehicle detection in such online safety systems, we propose a novel multi-sensor multi-level enhanced convolutional network architecture, MME-YOLO, designed for realistic scenes with mixed scales, illumination conditions, and occlusion. MME-YOLO consists of two tightly coupled structures: the enhanced inference head and the LiDAR-Image composite module. The enhanced inference head gives the network stronger inference abilities for redundant visual cues through attention-guided feature selection blocks and an anchor-based/anchor-free ensemble head. The LiDAR-Image composite module cascades the multi-level feature maps from the LiDAR subnet to the image subnet, which strengthens the generalization of the detector in complex scenarios. Compared with YOLOv3, the enhanced inference head achieves mAP improvements of 5.83% and 4.88% on the visual datasets LVSH and UA-DETRAC, respectively. Integrated with the composite module, the overall architecture achieves 91.63% mAP on the collected Road-side Dataset. Experiments show that even under abnormal lighting and inconsistent scales at evening rush hours, the proposed MME-YOLO maintains reliable recognition accuracy and robust detection performance.

2021 ◽  
pp. 8-10
Author(s):  
Madhura Prakash M ◽  
Krishnamurthy G. N

Colorectal cancer (CRC) is one of the most common malignancies and can develop from high-risk colon polyps. Colonoscopy is a standard method for the examination and detection of colorectal polyps.[1] Segmentation and distinction of polyps can play a vital role in treatment (e.g., surgical planning) and predictive decision making. This paper proposes a neural network architecture called EDANet, which uses attention gates to effectively combine multi-level features and yield accurate polyp segmentation. The encoder is a fully connected Convolutional Neural Network (CNN) and the decoder is a Cascaded Partial Decoder. The encoder and decoder sub-networks are connected through a series of nested, dense skip pathways, which aim to reduce the semantic gap between the feature maps of the two sub-networks. The proposed system trains the model over several epochs and unifies the previous epoch's mask with the feature map of the current training epoch. The previous epoch's mask is then used to provide hard attention to the learnt feature maps at different convolutional layers. Experimental results demonstrate that the model, trained and tested on the Kvasir-SEG dataset, achieves a Dice coefficient of 0.7874, a mean Intersection over Union (mIoU) of 0.7010, a recall of 0.7987, and a precision of 0.8577.
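The four metrics reported above can all be derived from pixel-wise counts of true positives, false positives, and false negatives. The following is a minimal sketch for a single pair of flattened binary masks; the function names and the toy masks are illustrative assumptions, not taken from the paper, and the reported mIoU would additionally average the IoU over the test images.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives
    for two flat binary masks of equal length (1 = polyp pixel)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def segmentation_metrics(pred, truth):
    """Return Dice, IoU, recall and precision for one mask pair.
    Empty denominators are treated as a perfect score (an assumption)."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    return {
        "dice": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
        "iou": tp / union if union else 1.0,
        "recall": tp / (tp + fn) if (tp + fn) else 1.0,
        "precision": tp / (tp + fp) if (tp + fp) else 1.0,
    }


# Hypothetical toy masks: 2 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
metrics = segmentation_metrics(pred, truth)
```

Because Dice counts the overlap twice in both numerator and denominator, it is never lower than IoU on the same mask pair, which is consistent with the reported 0.7874 versus 0.7010.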


Author(s):  
Zhenzhen Yang ◽  
Pengfei Xu ◽  
Yongpeng Yang ◽  
Bing-Kun Bao

The U-Net has become the most popular structure in medical image segmentation in recent years. Although its performance for medical image segmentation is outstanding, a large number of experiments demonstrate that the classical U-Net architecture is insufficient when the size of the segmentation targets changes and when target and background are imbalanced in different forms of segmentation. To improve the U-Net architecture, we develop a new architecture named the densely connected U-Net (DenseUNet) in this article. The proposed DenseUNet adopts a dense block to improve the feature extraction capability and employs a multi-feature fuse block, which fuses feature maps of different levels, to increase the accuracy of feature extraction. In addition, drawing on the advantages of the cross entropy and the Dice loss functions, a new loss function for the DenseUNet is proposed to deal with the imbalance between target and background. Finally, we test the proposed DenseUNet and compare it with the multi-resolutional U-Net (MultiResUNet) and the classic U-Net on three different datasets. The experimental results show that the DenseUNet performs significantly better than the MultiResUNet and the classic U-Net.
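The abstract does not state how the cross entropy and Dice terms are combined, so the following is only a sketch of one common choice: a weighted sum with a hypothetical weight `alpha`. The smoothing constant and the toy inputs are likewise assumptions made for illustration.

```python
import math


def cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross entropy over pixels; probabilities are
    clamped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def soft_dice_loss(probs, targets, smooth=1.0):
    """One minus the soft Dice score; `smooth` keeps the ratio defined
    when both the prediction and the target are empty."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def combined_loss(probs, targets, alpha=0.5):
    """Weighted sum of the two terms (alpha is a hypothetical weight,
    not a value from the paper)."""
    return alpha * cross_entropy(probs, targets) + (1.0 - alpha) * soft_dice_loss(probs, targets)


# Imbalanced toy example: one target pixel among ten.
targets = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
good = [0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
bad = [0.1] * 10  # predicts "mostly background" everywhere
loss_good = combined_loss(good, targets)
loss_bad = combined_loss(bad, targets)
```

The Dice term depends on the overlap with the target region rather than on the number of background pixels, which is why it is often paired with cross entropy when the target is small.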


2020 ◽  
Vol 34 (07) ◽  
pp. 11693-11700 ◽  
Author(s):  
Ao Luo ◽  
Fan Yang ◽  
Xin Li ◽  
Dong Nie ◽  
Zhicheng Jiao ◽  
...  

Crowd counting is an important yet challenging task due to large variations in scale and density. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is still a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn), which aims to relieve the problem by interweaving the multi-scale features for crowd density and for its auxiliary task (localization) and performing joint reasoning over a graph. Specifically, HyGnn integrates a hybrid graph that jointly represents the task-specific feature maps of different scales as nodes, and two types of relations as edges: (i) multi-scale relations capturing the feature dependencies across scales and (ii) mutually beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, providing robust and accurate results. HyGnn performs well on four challenging datasets (ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF_QNRF), outperforming the state-of-the-art algorithms by a large margin.


Author(s):  
Sophia Bano ◽  
Francisco Vasconcelos ◽  
Emmanuel Vander Poorten ◽  
Tom Vercauteren ◽  
Sebastien Ourselin ◽  
...  

Purpose: Fetoscopic laser photocoagulation is a minimally invasive surgery for the treatment of twin-to-twin transfusion syndrome (TTTS). Using a lens/fibre-optic scope inserted into the amniotic cavity, the abnormal placental vascular anastomoses are identified and ablated to regulate blood flow to both fetuses. Limited field-of-view, occlusions due to the presence of the fetus, and low visibility make it difficult to identify all vascular anastomoses. Automatic computer-assisted techniques may provide a better understanding of the anatomical structure during surgery for risk-free laser photocoagulation and may help improve mosaics from fetoscopic videos. Methods: We propose FetNet, a combined convolutional neural network (CNN) and long short-term memory (LSTM) recurrent neural network architecture for the spatio-temporal identification of fetoscopic events. We adapt an existing CNN architecture for spatial feature extraction and integrate it with the LSTM network for end-to-end spatio-temporal inference. We introduce differential learning rates during model training to effectively utilise the pre-trained CNN weights. This may support computer-assisted interventions (CAI) during fetoscopic laser photocoagulation. Results: We perform a quantitative evaluation of our method using 7 in vivo fetoscopic videos captured from different human TTTS cases. The total duration of these videos was 5551 s (138,780 frames). To test the robustness of the proposed approach, we perform 7-fold cross-validation in which each video is treated as a hold-out or test set and training is performed using the remaining videos. Conclusion: FetNet achieved superior performance compared to the existing CNN-based methods and provided improved inference because of the spatio-temporal information modelling. Online testing of FetNet, using a Tesla V100-DGXS-32GB GPU, achieved a frame rate of 114 fps. These results show that our method could potentially provide a real-time solution for CAI and for automating occlusion and photocoagulation identification during fetoscopic procedures.
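The evaluation protocol described above, in which each of the 7 videos is held out once, can be sketched as follows. The video identifiers are hypothetical placeholders, and the average capture rate is simply derived from the reported totals rather than stated in the abstract.

```python
def leave_one_video_out(video_ids):
    """Yield (training_ids, held_out_id) pairs, one fold per video."""
    for held_out in video_ids:
        training = [v for v in video_ids if v != held_out]
        yield training, held_out


# Hypothetical identifiers for the 7 in vivo videos.
videos = [f"video_{i}" for i in range(1, 8)]
folds = list(leave_one_video_out(videos))

# Reported totals: 5551 s of video, 138,780 frames, 114 fps at inference.
total_seconds = 5551
total_frames = 138_780
capture_fps = total_frames / total_seconds      # average capture rate
inference_fps = 114
realtime_margin = inference_fps / capture_fps   # how much faster than capture
```

Splitting by video rather than by frame keeps frames from the same procedure out of both the training and the test set of a fold. The derived average of roughly 25 frames per second also indicates that the reported 114 fps is several times faster than the capture rate.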


2019 ◽  
Vol 53 (1) ◽  
pp. 2-19 ◽  
Author(s):  
Erion Çano ◽  
Maurizio Morisio

Purpose: The strong results of convolutional neural networks in image-related tasks have attracted the attention of researchers in text mining, sentiment analysis and other areas of text analysis. It is, however, difficult to find enough data for feeding such networks, to optimize their parameters, and to make the right design choices when constructing network architectures. The purpose of this paper is to present the creation steps of two big data sets of song emotions. The authors also explore the use of convolution and max-pooling neural layers on song lyrics, product and movie review text data sets. Three variants of a simple and flexible neural network architecture are also compared. Design/methodology/approach: The intention was to spot any important patterns that can serve as guidelines for parameter optimization of similar models. The authors also wanted to identify architecture design choices that lead to high-performing sentiment analysis models. To this end, the authors conducted a series of experiments with neural architectures of various configurations. Findings: The results indicate that parallel convolutions of filter lengths up to 3 are usually enough for capturing relevant text features. Also, the max-pooling region size should be adapted to the length of text documents to produce the best feature maps. Originality/value: The top results the authors obtained use feature maps of lengths 6–18. A possible improvement for future neural network models for sentiment analysis is to predict the sentiment polarity of a document by aggregating predictions on smaller excerpts of the entire text.
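To make the findings concrete, the sketch below applies parallel convolutions of filter lengths 1, 2 and 3 to a short sequence of token embeddings and then max-pools each result. It is a plain-Python illustration under assumed toy values (all-ones filters, 2-dimensional embeddings, region size 2) and is not the authors' architecture.

```python
def conv1d(sequence, kernel):
    """Slide a filter of len(kernel) tokens over the embedding sequence
    (no padding) and return one activation per window."""
    k = len(kernel)
    outputs = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        outputs.append(sum(w * x
                           for weights, vector in zip(kernel, window)
                           for w, x in zip(weights, vector)))
    return outputs


def max_pool(values, region):
    """Non-overlapping max pooling; the last region may be shorter."""
    return [max(values[i:i + region]) for i in range(0, len(values), region)]


def parallel_conv_features(sequence, kernels, region):
    """Apply every filter to the same input and pool each result."""
    return [max_pool(conv1d(sequence, kernel), region) for kernel in kernels]


# Hypothetical document: 6 tokens with 2-dimensional embeddings.
tokens = [[1, 0], [0, 1], [1, 1], [0, 0], [2, 1], [1, 0]]
# All-ones filters of lengths 1, 2 and 3 (toy weights).
kernels = [[[1, 1]] * length for length in (1, 2, 3)]
feature_maps = parallel_conv_features(tokens, kernels, region=2)
```

The pooled feature-map length is roughly the document length divided by the region size, so keeping feature maps in a fixed range for longer documents requires a larger pooling region.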


2016 ◽  
Vol 40 (3) ◽  
pp. 885-895 ◽  
Author(s):  
Xuanpeng Li ◽  
Emmanuel Seignez

Driver inattention, whether drowsiness or distraction, is a major contributor to serious traffic crashes. In general, most research on this topic studies driver drowsiness and distraction separately, and is often conducted in a well-controlled, simulated environment. By considering the reliability and flexibility of real-time driver monitoring systems, it is possible to evaluate driver inattention by the fusion of multiple selected cues in real-life scenarios. This paper presents a real-time, visual-cue-based driver monitoring system, which can track both multi-level driver drowsiness and distraction simultaneously. A set of visual cues is adopted via analysis of drivers’ physical behaviour and driving performance. Driver drowsiness is evaluated using a multi-level scale, by applying evidence theory. Additionally, a general framework of extensive hierarchical combinations is used to generate a probabilistic evaluation of driving risk in real time. This driver inattention monitoring system with multimodal fusion has been proven to improve the accuracy of risk evaluation and reduce the rate of false alarms, and acceptance of the system is recommended.
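The abstract does not say which combination rule from evidence theory is used. As an illustration only, the sketch below applies Dempster's rule of combination, the standard rule in Dempster-Shafer theory, to two hypothetical cues; the cue names, the two-state frame and the mass values are all assumptions.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset -> mass)
    with Dempster's rule, renormalising by the non-conflicting mass."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {subset: mass / (1.0 - conflict) for subset, mass in combined.items()}


ALERT = frozenset({"alert"})
DROWSY = frozenset({"drowsy"})
UNKNOWN = ALERT | DROWSY  # the whole frame: "cannot tell"

# Hypothetical evidence from an eye-closure cue and a lane-keeping cue.
eye_cue = {DROWSY: 0.6, UNKNOWN: 0.4}
lane_cue = {DROWSY: 0.5, ALERT: 0.2, UNKNOWN: 0.3}
fused = dempster_combine(eye_cue, lane_cue)
```

Mass left on the whole frame expresses ignorance rather than a forced split between the states, which is one reason evidence theory suits the fusion of cues that are only partly reliable.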


1994 ◽  
Vol 37 (3) ◽  
Author(s):  
O. K. Kedrov ◽  
V. E. Permyakova

A new concept and methodology for regional seismic arrays (RSA) equipped with three-component (3C) sensors (Z, NS, EW) are proposed. Such a system could be a better tool for investigating the Earth's interior. This aim can be achieved by introducing polarization filtering of 3C seismic vibrations as an effective means of noise suppression and of robust detection and identification of secondary body phases of the signals. The proposed algorithm is based on: 1) linear-phase band-pass frequency filtering of N 3C records in M bands; 2) polarization filtering of all 3C records in all L directions in which array beams are routinely oriented; 3) calculation of L beams in M bands using polarized P, SV and SH traces of individual sensors; 4) detection of signals on the L*M P, SV and SH traces; 5) location of the event. The main new procedures are 2) and 3). Owing to these new approaches, procedures 4) and 5) will be improved in comparison with those routinely used today at RSAs. This work includes a theoretical consideration of the efficiency of the proposed method and preliminary experimental results.


Author(s):  
Kun Zhao ◽  
Lingfei Ma ◽  
Yu Meng ◽  
Li Liu ◽  
Junbo Wang ◽  
...  

2022 ◽  
Vol 13 (1) ◽  
pp. 1-23
Author(s):  
Christoffer Löffler ◽  
Luca Reeb ◽  
Daniel Dzibela ◽  
Robert Marzilger ◽  
Nicolas Witt ◽  
...  

This work proposes metric learning for fast similarity-based scene retrieval of unstructured ensembles of trajectory data from large databases. We present a novel representation learning approach using Siamese Metric Learning that approximates a distance-preserving low-dimensional representation and that learns to estimate reasonable solutions to the assignment problem. To this end, we employ a Temporal Convolutional Network architecture that we extend with a gating mechanism to enable learning from sparse data, leading to solutions to the assignment problem that exhibit varying degrees of sparsity. Our experimental results on professional soccer tracking data provide insights into learned features and embeddings, as well as into generalization, sensitivity, and network architectural considerations. Our low approximation errors for learned representations and the interactive performance, with retrieval times several orders of magnitude smaller, show that we outperform the previous state of the art.
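For context, the assignment problem mentioned above arises because the trajectories in a scene are unordered: comparing two scenes exactly requires finding the cheapest one-to-one matching between their trajectories. The sketch below solves this by brute force for tiny scenes. It illustrates the reference distance that a learned embedding would approximate under these assumptions, not the authors' method; the distance function and toy scenes are hypothetical.

```python
import math
from itertools import permutations


def trajectory_distance(a, b):
    """Mean Euclidean distance between two equally long 2-D trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def assignment_distance(scene_a, scene_b):
    """Minimum total cost over all one-to-one matchings of trajectories.
    Brute force over permutations, so only practical for small scenes."""
    n = len(scene_a)
    cost = [[trajectory_distance(a, b) for b in scene_b] for a in scene_a]
    return min(sum(cost[i][perm[i]] for i in range(n))
               for perm in permutations(range(n)))


# Hypothetical scene with three short trajectories of (x, y) points.
scene = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(5.0, 5.0), (6.0, 5.0)],
    [(0.0, 9.0), (0.0, 8.0)],
]
shuffled = list(reversed(scene))                                    # same scene, other order
shifted = [[(x + 1.0, y) for x, y in traj] for traj in shuffled]   # moved by 1 in x

same_scene = assignment_distance(scene, shuffled)
moved_scene = assignment_distance(scene, shifted)
```

The exact matching cost grows quickly with the number of trajectories, which is why a distance-preserving embedding that can be compared directly is attractive for retrieval from large databases.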

