Recognition of Taxi Violations Based on Semantic Segmentation of PSPNet and Improved YOLOv3

2021, Vol. 2021, pp. 1-13
Author(s): Qiong Yang, Lifeng Yu

Taxis are highly mobile and widely dispersed, which makes it difficult for law enforcement officers to judge their illegal acts quickly and accurately. With the deployment of intelligent transportation systems, image analysis has become a new way to identify taxi violations, but current image analysis methods still struggle to detect such violations in complex real-world scenes. To solve this problem, this study proposes a taxi violation recognition method based on PSPNet semantic segmentation and an improved YOLOv3. (1) Building on YOLOv3, the method introduces spatial pyramid pooling (SPP) for taxi recognition, which converts vehicle feature maps of different resolutions into feature vectors of the same dimension as the fully connected layer and avoids repeated extraction of vehicle image features in YOLOv3. (2) The method recognizes two different taxi violations (license plate occlusion and illegal parking) rather than only one. (3) Based on the PSPNet semantic segmentation network, a taxi illegal parking detection method is proposed; it collects global information from road condition images and aggregates image information from different regions, improving both the orderly acquisition of global information and the accuracy of illegal parking detection. The experimental results show that the proposed method achieves excellent recognition performance: the detection rate (DR) is 85.3% for license plate occlusion and 96.1% for taxi illegal parking.
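As a point of reference, the spatial pyramid pooling step described in (1) can be sketched as below: feature maps of arbitrary spatial size are pooled over a small set of grid levels and concatenated into one fixed-length vector. The pooling levels, channel count, and PyTorch framing are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(features, levels=(1, 2, 4)):
    """Pool an (N, C, H, W) feature map into a fixed-length vector.

    Each level l splits the map into an l x l grid and max-pools every
    cell, so the output length is C * sum(l * l) for any input H, W.
    """
    n = features.shape[0]
    pooled = [F.adaptive_max_pool2d(features, l).view(n, -1) for l in levels]
    return torch.cat(pooled, dim=1)

# Feature maps of different resolutions map to vectors of the same length.
vec_a = spatial_pyramid_pool(torch.randn(1, 256, 13, 13))
vec_b = spatial_pyramid_pool(torch.randn(1, 256, 19, 19))
assert vec_a.shape == vec_b.shape == (1, 256 * (1 + 4 + 16))
```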

2021, Vol. 11 (1)
Author(s): Matthew D. Guay, Zeyad A. S. Emam, Adam B. Anderson, Maria A. Aronova, Irina D. Pokrovskaya, ...

Biologists who use electron microscopy (EM) images to build nanoscale 3D models of whole cells and their organelles have historically been limited to small numbers of cells and cellular features due to constraints in imaging and analysis. This has been a major factor limiting insight into the complex variability of cellular environments. Modern EM can produce gigavoxel image volumes containing large numbers of cells, but accurate manual segmentation of image features is slow and limits the creation of cell models. Segmentation algorithms based on convolutional neural networks can process large volumes quickly, but achieving EM task accuracy goals often challenges current techniques. Here, we define dense cellular segmentation as a multiclass semantic segmentation task for modeling cells and large numbers of their organelles, and give an example in human blood platelets. We present an algorithm using novel hybrid 2D–3D segmentation networks to produce dense cellular segmentations with accuracy levels that outperform baseline methods and approach those of human annotators. To our knowledge, this work represents the first published approach to automating the creation of cell models with this level of structural detail.
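The abstract does not give the hybrid 2D–3D architecture in detail; the block below is only a rough illustration of the general idea, i.e. applying 2D convolutions slice-wise and a 3D convolution across neighbouring slices, with all layer sizes assumed.

```python
import torch
import torch.nn as nn

class Hybrid2D3DBlock(nn.Module):
    """Toy hybrid block: per-slice 2D convolution followed by a 3D convolution
    that mixes information across slices (illustrative, not the published model)."""

    def __init__(self, channels):
        super().__init__()
        self.conv2d = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv3d = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                               # x: (N, C, D, H, W) EM sub-volume
        n, c, d, h, w = x.shape
        slices = x.permute(0, 2, 1, 3, 4).reshape(n * d, c, h, w)
        slices = self.act(self.conv2d(slices))           # 2D features within each slice
        x2d = slices.reshape(n, d, c, h, w).permute(0, 2, 1, 3, 4)
        return self.act(self.conv3d(x2d))                # mix information across slices in 3D
```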


Author(s): Huimin Lu, Rui Yang, Zhenrong Deng, Yonglin Zhang, Guangwei Gao, ...

Chinese image description generation tasks usually face challenges such as single-feature extraction, a lack of global information, and a lack of detailed description of the image content. To address these limitations, we propose a fuzzy attention-based DenseNet-BiLSTM Chinese image captioning method in this article. In the proposed method, we first improve the densely connected network to extract features of the image at different scales and to enhance the model's ability to capture weak features. At the same time, a bidirectional LSTM is used as the decoder to make better use of context information. The introduction of an improved fuzzy attention mechanism effectively addresses the problem of aligning image features with contextual information. We conduct experiments on the AI Challenger dataset to evaluate the performance of the model. The results show that, compared with other models, our proposed model achieves higher scores on objective quantitative evaluation indicators, including BLEU, METEOR, ROUGE-L, and CIDEr. The generated description sentences accurately express the image content.
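The improved fuzzy attention mechanism itself is not specified in this abstract; for orientation only, the sketch below shows conventional additive soft attention over region features driving a decoder step, which is the baseline that such mechanisms typically modify. All dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Conventional additive soft attention over image region features
    (a generic baseline, not the paper's improved fuzzy attention)."""

    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (N, R, feat_dim) region features, hidden: (N, hidden_dim) decoder state
        e = self.score(torch.tanh(self.feat_proj(feats) +
                                  self.hidden_proj(hidden).unsqueeze(1)))  # (N, R, 1)
        alpha = torch.softmax(e, dim=1)               # attention weight per region
        context = (alpha * feats).sum(dim=1)          # weighted context vector (N, feat_dim)
        return context, alpha.squeeze(-1)
```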


2013, Vol. 694-697, pp. 2336-2340
Author(s): Yun Feng Yang, Feng Xian Tang

To construct a structured MRI (magnetic resonance imaging) image library by extracting and collating information from unstructured literature data, an identification method that fuses image and text information is proposed. The method uses PHOW (Pyramid Histogram Of Words) to represent image features, combines it with the word-frequency characteristics of the embedded figure captions (text), and then applies posterior multiplication fusion to classify and identify MRI images in online biological literature. The experimental results show that this method achieves a higher correct recognition rate and better recognition performance than identification based on PHOW or text features alone. The study can serve as a reference for constructing other structured professional databases from online literature.
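Posterior multiplication fusion itself is simple enough to sketch directly: the per-class posteriors from the image-based and text-based classifiers are multiplied element-wise and renormalised, and the largest fused score gives the label. The example posteriors and class count below are placeholders.

```python
import numpy as np

def fuse_posteriors(p_image, p_text):
    """Combine per-class posteriors from two classifiers by element-wise
    multiplication, then renormalise so the fused scores sum to one."""
    fused = np.asarray(p_image, dtype=float) * np.asarray(p_text, dtype=float)
    return fused / fused.sum()

# Placeholder posteriors over three hypothetical MRI image classes.
p_image = np.array([0.55, 0.30, 0.15])   # from the PHOW-based visual classifier
p_text = np.array([0.40, 0.45, 0.15])    # from the caption-text classifier
fused = fuse_posteriors(p_image, p_text)
predicted_class = int(np.argmax(fused))
```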


2021, Vol. 13 (22), pp. 4518
Author(s): Xin Zhao, Jiayi Guo, Yueting Zhang, Yirong Wu

The semantic segmentation of remote sensing images requires distinguishing local regions of different classes and exploiting a uniform global representation of the same-class instances. Such requirements make it necessary for segmentation methods to extract discriminative local features between different classes and to explore representative features for all instances of a given class. While common deep convolutional neural networks (DCNNs) can effectively focus on local features, they are limited by their receptive field in obtaining consistent global information. In this paper, we propose a memory-augmented transformer (MAT) to effectively model both local and global information. The feature extraction pipeline of the MAT is split into a memory-based global relationship guidance module and a local feature extraction module. The local feature extraction module mainly consists of a transformer, which is used to extract features from the input images. The global relationship guidance module maintains a memory bank for the consistent encoding of global information. Global guidance is performed by memory interaction. Bidirectional information flow between the global and local branches is conducted by a memory-query module and a memory-update module, respectively. Experimental results on the ISPRS Potsdam and ISPRS Vaihingen datasets demonstrate that our method performs competitively with state-of-the-art methods.
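The memory-query step can be pictured as cross-attention from local pixel features to a learned global memory bank; the sketch below is a generic formulation under assumed shapes and slot counts, not the exact MAT module.

```python
import torch
import torch.nn as nn

class MemoryQuery(nn.Module):
    """Generic memory read: local features attend over a bank of global memory
    slots (an illustrative stand-in for a memory-query module)."""

    def __init__(self, dim, num_slots=64, num_heads=4):
        super().__init__()
        # dim must be divisible by num_heads for multi-head attention.
        self.memory = nn.Parameter(torch.randn(num_slots, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, local_feats):                    # local_feats: (N, H*W, dim)
        memory = self.memory.unsqueeze(0).expand(local_feats.size(0), -1, -1)
        global_ctx, _ = self.attn(query=local_feats, key=memory, value=memory)
        return local_feats + global_ctx                # inject global context into local features
```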


2021, Vol. 2021, pp. 1-10
Author(s): Wang Li, Zhang Yong, Yuan Wei, Shi Hongxing

Vehicle reidentification (re-ID) refers to the task of matching vehicles across nonoverlapping cameras, one of the critical problems in intelligent transportation systems. Because vehicles on the road often look alike, traditional methods do not perform well on vehicles with high appearance similarity. In this paper, we use a hypergraph representation to integrate image features and tackle vehicle re-ID via hypergraph learning algorithms. A single feature descriptor can only capture one aspect of an image; to merge multiple feature descriptors, an efficient and appropriate representation is needed, and a hypergraph is naturally suited to modeling such high-order relationships. In addition, the spatiotemporal correlation of traffic status between cameras is a constraint beyond the image itself, which can greatly improve re-ID accuracy for different vehicles with similar appearances. The proposed method uses hypergraph optimization to learn the similarity between the query image and the images in the library. By exploiting pairwise and higher-order relationships between query objects and the image library, the similarity measurement improves over direct matching. Experiments conducted on the image library constructed in this paper demonstrate the effectiveness of multifeature hypergraph fusion and the spatiotemporal correlation model for vehicle reidentification.
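One standard way to realise hypergraph learning for this kind of re-ranking (the classical hypergraph Laplacian formulation, not necessarily the exact optimisation used in the paper) is to propagate the query label over a hypergraph built from the multiple feature descriptors. The incidence matrix, edge weights, and propagation strength below are placeholders.

```python
import numpy as np

def hypergraph_relevance(H, w, y, alpha=0.9):
    """Rank gallery vertices against a query by propagation on a hypergraph.

    H     : (n_vertices, n_edges) incidence matrix, 1 if a vertex lies in a hyperedge
    w     : (n_edges,) hyperedge weights, e.g. one weight per feature descriptor
    y     : (n_vertices,) initial labels, 1 for the query vertex and 0 elsewhere
    alpha : propagation strength in (0, 1)
    Returns one relevance score per vertex (higher means more similar to the query).
    """
    dv = H @ w                                          # weighted vertex degrees
    de = H.sum(axis=0)                                  # hyperedge degrees
    dv_isqrt = np.diag(1.0 / np.sqrt(dv))
    theta = dv_isqrt @ H @ np.diag(w) @ np.diag(1.0 / de) @ H.T @ dv_isqrt
    return np.linalg.solve(np.eye(len(y)) - alpha * theta, y)
```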


Sensors, 2020, Vol. 20 (20), pp. 5765
Author(s): Seiya Ito, Naoshi Kaneko, Kazuhiko Sumi

This paper proposes a novel 3D representation, namely a latent 3D volume, for joint depth estimation and semantic segmentation. Most previous studies encoded an input scene (typically given as a 2D image) into a set of feature vectors arranged over a 2D plane. However, considering that the real world is three-dimensional, this 2D arrangement discards one dimension and may limit the capacity of the feature representation. In contrast, we examine the idea of arranging the feature vectors in 3D space rather than on a 2D plane. We refer to this 3D volumetric arrangement as a latent 3D volume. We show that the latent 3D volume is beneficial to the tasks of depth estimation and semantic segmentation because these tasks require an understanding of the 3D structure of the scene. Our network first constructs an initial 3D volume using image features and then generates the latent 3D volume by passing the initial 3D volume through several 3D convolutional layers. We perform depth regression and semantic segmentation by projecting the latent 3D volume onto a 2D plane. The evaluation results show that our method outperforms previous approaches on the NYU Depth v2 dataset.
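As a schematic of the projection idea only (assumed shapes, not the authors' network), a latent 3D volume can be collapsed along its depth axis and fed to separate depth-regression and segmentation heads:

```python
import torch
import torch.nn as nn

class VolumeProjectionHeads(nn.Module):
    """Illustrative heads: flatten a latent 3D volume (N, C, D, H, W) along its
    depth axis, then predict per-pixel depth and semantic labels from the result."""

    def __init__(self, channels, depth_bins, num_classes):
        super().__init__()
        fused = channels * depth_bins
        self.depth_head = nn.Conv2d(fused, 1, kernel_size=1)           # depth regression
        self.seg_head = nn.Conv2d(fused, num_classes, kernel_size=1)   # semantic logits

    def forward(self, volume):                          # volume: (N, C, D, H, W)
        n, c, d, h, w = volume.shape
        plane = volume.reshape(n, c * d, h, w)          # project the 3D volume onto a 2D plane
        return self.depth_head(plane), self.seg_head(plane)
```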


Sensors, 2020, Vol. 20 (13), pp. 3724
Author(s): Quan Zhou, Mingyue Ding, Xuming Zhang

Image deblurring has been a challenging ill-posed problem in computer vision. Gaussian blur is a common model for image and signal degradation. Deep learning-based deblurring methods have attracted much attention due to their advantages over traditional methods that rely on hand-designed features. However, existing deep learning-based deblurring techniques still struggle to restore fine details and reconstruct sharp edges. To address this issue, we have designed an effective end-to-end deep learning-based non-blind image deblurring algorithm. In the proposed method, a multi-stream bottom-top-bottom attention network (MBANet) with an encoder-to-decoder structure is designed to integrate low-level cues and high-level semantic information, which facilitates more effective extraction of image features and improves the computational efficiency of the network. Moreover, the MBANet adopts a coarse-to-fine multi-scale strategy to process the input images and improve deblurring performance. Furthermore, a global information-based fusion and reconstruction network is proposed to fuse multi-scale output maps, improve global spatial information, and recurrently refine the output deblurred image. Experiments were conducted on the public GoPro dataset and the realistic and dynamic scenes (REDS) dataset to evaluate the effectiveness and robustness of the proposed method. The experimental results show that the proposed method generally outperforms traditional deblurring methods and state-of-the-art deep learning-based deblurring methods, such as the scale-recurrent network (SRN) and the denoising prior driven deep neural network (DPDNN), in terms of quantitative indexes such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), as well as human visual assessment.
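For reference, the two quantitative indexes mentioned can be computed with scikit-image as below; the arrays stand in for a ground-truth sharp image and a deblurred output and are purely illustrative.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder images in [0, 1]; in practice these would be a sharp ground-truth
# frame and the corresponding deblurred network output.
sharp = np.random.rand(256, 256, 3)
deblurred = np.clip(sharp + 0.05 * np.random.randn(256, 256, 3), 0.0, 1.0)

psnr = peak_signal_noise_ratio(sharp, deblurred, data_range=1.0)
ssim = structural_similarity(sharp, deblurred, channel_axis=-1, data_range=1.0)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```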


2020, Vol. 12 (17), pp. 2669
Author(s): Junhao Qian, Min Xia, Yonghong Zhang, Jia Liu, Yiqing Xu

Change detection is a very important technique for remote sensing data analysis. Its mainstream solutions are either supervised or unsupervised. Among supervised methods, most existing deep learning-based change detection methods are related to semantic segmentation. However, these methods only use deep learning models to process the global information of an image and do not carry out specific training on changed and unchanged areas. As a result, many details of local changes cannot be detected. In this work, a trilateral change detection network is proposed. The proposed network has three branches (a main module and two auxiliary modules, all composed of convolutional neural networks (CNNs)), which focus on the overall information of bitemporal Google Earth image pairs, the changed areas, and the unchanged areas, respectively. The proposed method is end-to-end trainable, and no component in the network needs to be trained separately.
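A toy sketch of the trilateral layout (placeholder layer sizes, not the published architecture): three small CNN branches process the concatenated bitemporal pair, with the two auxiliary branches intended to specialise on changed and unchanged regions, and their features are fused into change/no-change logits.

```python
import torch
import torch.nn as nn

def small_branch(in_ch, out_ch):
    """Tiny placeholder CNN branch (illustrative only)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class TrilateralChangeNet(nn.Module):
    """Main branch plus changed/unchanged auxiliary branches over a bitemporal pair."""

    def __init__(self):
        super().__init__()
        self.main_branch = small_branch(6, 16)        # overall information
        self.changed_branch = small_branch(6, 16)     # focuses on changed areas
        self.unchanged_branch = small_branch(6, 16)   # focuses on unchanged areas
        self.head = nn.Conv2d(48, 2, kernel_size=1)   # change / no-change logits

    def forward(self, img_t1, img_t2):                # each: (N, 3, H, W) RGB image
        pair = torch.cat([img_t1, img_t2], dim=1)     # early fusion of the bitemporal pair
        feats = torch.cat([self.main_branch(pair),
                           self.changed_branch(pair),
                           self.unchanged_branch(pair)], dim=1)
        return self.head(feats)
```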

