Pedestrian Attribute Recognition with Graph Convolutional Network in Surveillance Scenarios

Xiangpeng Song; Hongbin Yang; Congcong Zhou

doi:10.3390/fi11110245

Future Internet ◽

10.3390/fi11110245 ◽

2019 ◽

Vol 11 (11) ◽

pp. 245 ◽

Cited By ~ 2

Author(s):

Xiangpeng Song ◽

Hongbin Yang ◽

Congcong Zhou

Keyword(s):

Neural Network ◽

Spatial Distribution ◽

Correlation Matrix ◽

State Of The Art ◽

Recognition Performance ◽

Image Representation ◽

Normal Operation ◽

Convolutional Network ◽

Attribute Recognition ◽

Novel Model

Pedestrian attribute recognition aims to predict a set of attribute labels for a pedestrian in surveillance scenarios. It is a very challenging computer vision task because of poor image quality, continual appearance variations, and the diverse spatial distribution of imbalanced attributes. Since each pedestrian normally possesses many attributes, it is desirable to model the label dependencies between different attributes to improve recognition performance. In this paper, we treat pedestrian attribute recognition as multi-label classification and propose a novel model based on the graph convolutional network (GCN). The model consists of two main parts. First, we use a convolutional neural network (CNN) to extract pedestrian features, a standard image-processing step in deep learning. Second, we convert the attribute labels into word embeddings and construct a correlation matrix between labels to help the GCN propagate information between nodes. The classifiers learned by the GCN are applied to the image representation extracted by the CNN, which makes the model end-to-end trainable. Experiments on a pedestrian attribute recognition dataset show that the approach clearly outperforms other existing state-of-the-art methods.
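A minimal NumPy sketch of the label-graph idea described above, assuming a symmetric-normalized GCN with LeakyReLU between layers, as is common in GCN-based multi-label classifiers. The sizes, the random word vectors, the toy correlation matrix and the function names `normalize_adjacency` and `gcn_label_classifiers` are hypothetical, not taken from the paper. The GCN maps C label embeddings to C classifier vectors, and each attribute score is a dot product between its classifier and the CNN image feature.

```python
import numpy as np

def normalize_adjacency(a):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCNs."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gcn_label_classifiers(word_emb, corr, weights):
    """Propagate label word embeddings over the label-correlation graph.

    word_emb: (C, E) one embedding per attribute label
    corr:     (C, C) label correlation matrix
    weights:  layer weight matrices; the last one maps to the CNN feature size D
    returns:  (C, D) one classifier vector per attribute
    """
    a_norm = normalize_adjacency(corr)
    h = word_emb
    for i, w in enumerate(weights):
        h = a_norm @ h @ w
        if i < len(weights) - 1:      # nonlinearity between layers only
            h = leaky_relu(h)
    return h

rng = np.random.default_rng(0)
C, E, H, D = 5, 300, 64, 128          # hypothetical: 5 attributes, 300-d word vectors
word_emb = rng.normal(size=(C, E))
corr = (rng.random((C, C)) > 0.6).astype(float)
corr = np.maximum(corr, corr.T)       # make the toy label graph undirected
np.fill_diagonal(corr, 0.0)
weights = [rng.normal(scale=0.05, size=(E, H)), rng.normal(scale=0.05, size=(H, D))]

classifiers = gcn_label_classifiers(word_emb, corr, weights)   # (C, D)
image_feature = rng.normal(size=D)    # stand-in for a pooled CNN feature
scores = classifiers @ image_feature  # one logit per attribute
probs = 1.0 / (1.0 + np.exp(-scores)) # independent sigmoids for multi-label output
```

Because the classifiers are produced by the GCN rather than stored as free parameters, gradients from the attribute loss flow into both the CNN and the GCN, which is what makes such a model trainable end to end.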

Visual-Semantic Graph Reasoning for Pedestrian Attribute Recognition

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018634 ◽

2019 ◽

Vol 33 ◽

pp. 8634-8641 ◽

Cited By ~ 4

Author(s):

Qiaozhe Li ◽

Xin Zhao ◽

Ran He ◽

Kaiqi Huang

Keyword(s):

Large Scale ◽

State Of The Art ◽

Relational Learning ◽

Spatial Relations ◽

Semantic Relations ◽

Prediction Problem ◽

Convolutional Network ◽

Semantic Graph ◽

Spatial Graph ◽

Attribute Recognition

Pedestrian attribute recognition in surveillance is a challenging task due to poor image quality, significant appearance variations, and the diverse spatial distribution of different attributes. This paper treats pedestrian attribute recognition as a sequential attribute prediction problem and proposes a novel visual-semantic graph reasoning framework to address it. Our framework contains a spatial graph and a directed semantic graph. By performing reasoning with the Graph Convolutional Network (GCN), one graph captures spatial relations between regions and the other learns potential semantic relations between attributes. An end-to-end architecture is presented to perform mutual embedding between these two graphs so that each guides the relational learning of the other. We verify the proposed framework on three large-scale pedestrian attribute datasets: PETA, RAP, and PA100k. Experiments show the superiority of the proposed method over state-of-the-art methods and the effectiveness of our joint GCN structures for sequential attribute prediction.
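The following sketch illustrates one reasoning step over a spatial graph of image regions. It is not the paper's framework: the 2 x 2 region grid, the softmax dot-product adjacency, the residual update and all function names are assumptions chosen for illustration, and the directed semantic graph and the mutual embedding are omitted.

```python
import numpy as np

def pool_regions(feature_map, grid):
    """Average-pool an (H, W, C) feature map into grid * grid region vectors (row-major)."""
    h, w, _ = feature_map.shape
    row_groups = np.array_split(np.arange(h), grid)
    col_groups = np.array_split(np.arange(w), grid)
    return np.stack([feature_map[np.ix_(r, c)].mean(axis=(0, 1))
                     for r in row_groups for c in col_groups])

def similarity_adjacency(x):
    """Row-stochastic adjacency: softmax over scaled dot-product similarities."""
    s = x @ x.T / np.sqrt(x.shape[1])
    s = s - s.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)

def graph_reasoning_step(x, w):
    """Aggregate neighbouring regions, transform, apply ReLU, and add a residual."""
    return x + np.maximum(0.0, similarity_adjacency(x) @ x @ w)

rng = np.random.default_rng(1)
fmap = rng.normal(size=(8, 6, 32))           # stand-in for a CNN feature map
regions = pool_regions(fmap, grid=2)         # (4, 32): one graph node per region
w = rng.normal(scale=0.1, size=(32, 32))
updated = graph_reasoning_step(regions, w)   # (4, 32)
```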

Deep Fractional Max Pooling Neural Network for COVID-19 Recognition

Frontiers in Public Health ◽

10.3389/fpubh.2021.726144 ◽

2021 ◽

Vol 9 ◽

Author(s):

Shui-Hua Wang ◽

Suresh Chandra Satapathy ◽

Donovan Anderson ◽

Shi-Xin Chen ◽

Yu-Dong Zhang

Keyword(s):

Neural Network ◽

Data Augmentation ◽

State Of The Art ◽

Model Averaging ◽

Community Acquired Pneumonia ◽

Test Set ◽

Max Pooling ◽

L2 Norm ◽

Healthy Control ◽

Novel Model

Aim: Coronavirus disease 2019 (COVID-19) is a disease caused by a new strain of coronavirus. This paper proposes a novel model termed “deep fractional max pooling neural network (DFMPNN)” to diagnose COVID-19 more efficiently. Methods: This 12-layer DFMPNN replaces the max pooling (MP) and average pooling (AP) of ordinary neural networks with a novel pooling method called “fractional max-pooling” (FMP). In addition, multiple-way data augmentation (DA) is employed to reduce overfitting, and model averaging (MA) is used to reduce randomness. Results: We ran our algorithm on a four-category dataset containing COVID-19, community-acquired pneumonia, secondary pulmonary tuberculosis (SPT), and healthy control (HC) cases. Across 10 runs on the test set, the micro-averaged F1 (MAF) score of our DFMPNN is 95.88%. Discussions: The proposed DFMPNN is superior to 10 state-of-the-art models. In addition, FMP outperforms traditional MP, AP, and L2-norm pooling (L2P).
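Fractional max pooling shrinks a map by a non-integer ratio alpha between 1 and 2, using pseudorandom, disjoint pooling regions. The sketch below follows the disjoint, pseudorandom-sequence variant of FMP; the boundary formula shown, the square output size and the helper names are assumptions for illustration, and the 12-layer DFMPNN itself is not reproduced.

```python
import numpy as np

def fmp_boundaries(n_in, n_out, u):
    """Start indices of n_out disjoint pooling intervals covering range(n_in).

    Uses a_i = floor(alpha * (i + u)) - floor(alpha * u) with alpha = n_in / n_out
    in [1, 2) and u in [0, 1); consecutive starts differ by 1 or 2, so every
    interval is non-empty and the last one ends at n_in.
    """
    alpha = n_in / n_out
    assert 1.0 <= alpha < 2.0, "fractional max pooling expects 1 <= alpha < 2"
    i = np.arange(n_out)
    return (np.floor(alpha * (i + u)) - np.floor(alpha * u)).astype(int)

def fractional_max_pool2d(x, n_out, rng):
    """Disjoint fractional max pooling of a 2-D array to shape (n_out, n_out)."""
    rows = fmp_boundaries(x.shape[0], n_out, rng.uniform(0.0, 1.0))
    cols = fmp_boundaries(x.shape[1], n_out, rng.uniform(0.0, 1.0))
    # reduceat takes the max over [start_i, start_{i+1}) and over [last start, end)
    pooled_rows = np.maximum.reduceat(x, rows, axis=0)
    return np.maximum.reduceat(pooled_rows, cols, axis=1)

rng = np.random.default_rng(0)
x = np.arange(25, dtype=float).reshape(5, 5)
y = fractional_max_pool2d(x, 3, rng)   # 5 x 5 -> 3 x 3, alpha = 5/3
```

Drawing a fresh `u` on every forward pass makes the pooling regions vary between passes, which is the source of the randomness that FMP introduces during training.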

Graph Contextualized Self-Attention Network for Session-based Recommendation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/547 ◽

2019 ◽

Cited By ~ 20

Author(s):

Chengfeng Xu ◽

Pengpeng Zhao ◽

Yanchi Liu ◽

Victor S. Sheng ◽

Jiajie Xu ◽

...

Keyword(s):

Neural Network ◽

State Of The Art ◽

Attention Mechanism ◽

The Self ◽

Graph Structure ◽

Convolutional Network ◽

Attention Network ◽

Attention Model ◽

Sequence Modeling ◽

Real World Datasets

Session-based recommendation, which aims to predict a user's immediate next action based on anonymous sessions, is a key task in many online services (e.g., e-commerce, media streaming). Recently, the Self-Attention Network (SAN) has achieved significant success in various sequence modeling tasks without using either recurrent or convolutional networks. However, SAN does not capture the local dependencies that exist between adjacent items, which limits its capacity for learning contextualized representations of items in sequences. In this paper, we propose a graph contextualized self-attention model (GC-SAN), which utilizes both a graph neural network and a self-attention mechanism, for session-based recommendation. In GC-SAN, we dynamically construct a graph structure for session sequences and capture rich local dependencies via a graph neural network (GNN). Each session then learns long-range dependencies by applying the self-attention mechanism. Finally, each session is represented as a linear combination of the global preference and the current interest of that session. Extensive experiments on two real-world datasets show that GC-SAN consistently outperforms state-of-the-art methods.
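A minimal sketch of the per-session graph and the final linear combination, assuming an SR-GNN-style construction (consecutive clicks become directed edges, with out- and in-adjacency normalized by degree) and a fixed mixing weight `omega`. The helper names and the value 0.6 are hypothetical; the GNN and self-attention layers are not shown.

```python
import numpy as np

def session_graph(session):
    """Directed item graph of one session: each consecutive click pair is an edge.

    Returns unique items (first-seen order), the out-adjacency normalised by
    out-degree, and the in-adjacency (row i = items clicked just before i)
    normalised by in-degree.
    """
    items = list(dict.fromkeys(session))
    index = {item: k for k, item in enumerate(items)}
    a = np.zeros((len(items), len(items)))
    for src, dst in zip(session, session[1:]):
        a[index[src], index[dst]] = 1.0
    out_deg = a.sum(axis=1, keepdims=True)
    in_deg = a.sum(axis=0, keepdims=True)
    a_out = np.divide(a, out_deg, out=np.zeros_like(a), where=out_deg > 0)
    a_in = np.divide(a, in_deg, out=np.zeros_like(a), where=in_deg > 0).T
    return items, a_out, a_in

def session_representation(global_pref, current_interest, omega=0.6):
    """Final session vector: a weighted mix of global preference and current interest."""
    return omega * global_pref + (1.0 - omega) * current_interest

items, a_out, a_in = session_graph([1, 2, 3, 2, 4])
```

For the session `[1, 2, 3, 2, 4]`, item 2 has two outgoing edges (to 3 and 4), so its out-adjacency row splits weight 0.5 to each; item 4 ends the session and has no outgoing edges.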

An Input-aware Factorization Machine for Sparse Prediction

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/203 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yantao Yu ◽

Zhen Wang ◽

Bo Yuan

Keyword(s):

Neural Network ◽

Deep Learning ◽

Real World ◽

State Of The Art ◽

Overall Performance ◽

Factorization Machine ◽

The Impact ◽

Novel Model ◽

Individual Input ◽

Better Than

Factorization machines (FMs) are a class of general predictors that work effectively with sparse data by representing features with factorized parameters and weights. However, the accuracy of FMs can be adversely affected by the fixed representation trained for each feature, since the same feature is usually not equally predictive and useful in different instances. In fact, inaccurate feature representations may even introduce noise and degrade overall performance. In this work, we improve FMs by explicitly considering the impact of each individual input on the representation of features. We propose a novel model named Input-aware Factorization Machine (IFM), which learns a unique input-aware factor for the same feature in different instances via a neural network. Comprehensive experiments on three real-world recommendation datasets demonstrate the effectiveness and mechanism of IFM. Empirical results indicate that IFM is significantly better than the standard FM model and consistently outperforms four state-of-the-art deep-learning-based methods.
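The FM's pairwise interaction term can be computed in O(kn) time using a well-known identity. The sketch below implements a standard second-order FM and, as a labeled assumption, models IFM's input-aware factors as per-instance scales `m` applied to each feature's weight and factor vector; in IFM those scales would come from a neural network over the input, which is not shown here. The function name `fm_predict` is hypothetical.

```python
import numpy as np

def fm_predict(x, w0, w, v, m=None):
    """Second-order factorization machine score for one input x.

    x: (n,) feature values; w0: bias; w: (n,) linear weights; v: (n, k) factors.
    m: optional (n,) per-instance scales, a stand-in for input-aware factors
       (assumption: they rescale both w_i and v_i for this instance).
    """
    if m is not None:
        w = w * m
        v = v * m[:, None]
    xv = x[:, None] * v                      # (n, k): v_if * x_i
    # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
    pairwise = 0.5 * np.sum(xv.sum(axis=0) ** 2 - (xv ** 2).sum(axis=0))
    return w0 + w @ x + pairwise

rng = np.random.default_rng(0)
n, k = 6, 3
x = np.array([1.0, 0.0, 0.0, 1.0, 0.5, 0.0])   # sparse toy input
w0, w, v = 0.1, rng.normal(size=n), rng.normal(size=(n, k))
m = rng.uniform(0.5, 1.5, size=n)               # hypothetical input-aware scales
plain_score = fm_predict(x, w0, w, v)
aware_score = fm_predict(x, w0, w, v, m)
```

With `m` all ones the input-aware version reduces to the plain FM, which matches the intuition that IFM generalizes the fixed per-feature representation.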

Latent Opinions Transfer Network for Target-Oriented Opinion Words Extraction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6469 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9298-9305

Author(s):

Zhen Wu ◽

Fei Zhao ◽

Xin-Yu Dai ◽

Shujian Huang ◽

Jiajun Chen

Keyword(s):

Neural Network ◽

Transfer Process ◽

State Of The Art ◽

Transformation Method ◽

Sentiment Classification ◽

Experimental Results ◽

Online Review ◽

Neural Models ◽

Network Methods ◽

Novel Model

Target-oriented opinion words extraction (TOWE) is a new subtask of aspect-based sentiment analysis (ABSA) that aims to extract the corresponding opinion words for a given opinion target in a sentence. Recently, neural network methods have been applied to this task and have achieved promising results. However, because annotation is difficult, TOWE datasets are insufficient, which heavily limits the performance of neural models. By contrast, abundant review sentiment classification data are readily available on online review sites. These reviews contain substantial latent opinion information and semantic patterns. In this paper, we propose a novel model that transfers this opinion knowledge from resource-rich review sentiment classification datasets to the low-resource TOWE task. To address the challenges in the transfer process, we design an effective transformation method to obtain latent opinions and then integrate them into TOWE. Extensive experimental results show that our model achieves better performance than other state-of-the-art methods and significantly outperforms the base model without transferred opinion knowledge. Further analysis validates the effectiveness of our model.
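TOWE output is typically read off a per-token tagging for one opinion target. Assuming a BIO tagging scheme (an assumption here; the abstract does not name the scheme), the hypothetical helper `bio_spans` below converts predicted tags into opinion-word spans. The latent-opinion transfer itself is not shown.

```python
def bio_spans(tokens, tags):
    """Collect opinion-word spans from BIO tags (B = begin, I = inside, O = outside)."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:                     # close the previous span
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:                               # "O", or a stray "I" with no open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = "the service is not good but the fish is fresh".split()
# Tags for the opinion target "service": the opinion words are "not good".
service_tags = ["O", "O", "O", "B", "I", "O", "O", "O", "O", "O"]
# Tags for the opinion target "fish": the opinion word is "fresh".
fish_tags = ["O"] * 9 + ["B"]
```

The same sentence yields different tag sequences for different targets, which is what distinguishes TOWE from target-independent opinion extraction.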

Feedforward Chaotic Neural Network Model for Rotor Rub-Impact Fault Recognition Using Acoustic Emission Method

Journal of Electrical and Computer Engineering ◽

10.1155/2018/9718951 ◽

2018 ◽

Vol 2018 ◽

pp. 1-9

Author(s):

Wei Peng ◽

Weidong Liu ◽

Xinmin Cheng ◽

Liping Shi

Keyword(s):

Neural Network ◽

Acoustic Emission ◽

Recognition Performance ◽

Recognition Rate ◽

Network Models ◽

Normal Operation ◽

Maximum Efficiency ◽

Chaotic Neural Network ◽

BP Network ◽

Fault Recognition

Rubbing faults between the dynamic and static components of large rotary machines are dangerous in the manufacturing process. This paper applies a feedforward chaotic neural network (FCNN) to recognize the acoustic emission (AE) source of rotor rubbing and to diagnose the rotor's operating condition. The method adds dynamic chaotic neurons based on the logistic map to a multilayer perceptron (MLP) to keep the network from falling into a local minimum, and uses delayed and feedback structures to maximize recognition performance. The AE data were sampled during the rotor rubbing process on a rotary machine test rig and classified by fault degree. The experimental results indicate that the recognition rate is superior to that of traditional BP network models. The method is effective for recognizing rubbing faults and thus for keeping the machine in normal operation.
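The abstract does not give the neuron equations, so the following is only an illustrative sketch. It assumes a sigmoid layer whose output is perturbed by a logistic-map signal z_{t+1} = 4 z_t (1 - z_t) with an amplitude `beta` that decays over time, so that chaotic dynamics can help escape local minima early in training and fade later. The class name, `beta` and `decay` are hypothetical, and the delayed and feedback structures are omitted.

```python
import numpy as np

def logistic_map(x0, mu=4.0, steps=5):
    """Iterate z_{t+1} = mu * z_t * (1 - z_t); mu = 4 is fully chaotic on (0, 1)."""
    z, out = x0, []
    for _ in range(steps):
        z = mu * z * (1.0 - z)
        out.append(z)
    return np.array(out)

class ChaoticNeuronLayer:
    """Sigmoid layer whose output is perturbed by a decaying logistic-map signal."""

    def __init__(self, n_in, n_out, beta=0.1, decay=0.95, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=(n_in, n_out))
        self.z = rng.uniform(0.1, 0.9, size=n_out)   # one chaotic state per neuron
        self.beta, self.decay = beta, decay

    def forward(self, x):
        self.z = 4.0 * self.z * (1.0 - self.z)       # advance the logistic map one step
        y = 1.0 / (1.0 + np.exp(-(x @ self.w)))      # ordinary sigmoid activation
        out = y + self.beta * (self.z - 0.5)         # zero-centred chaotic perturbation
        self.beta *= self.decay                      # perturbation fades over time
        return out

layer = ChaoticNeuronLayer(n_in=4, n_out=3)
out = layer.forward(np.ones((2, 4)))   # (2, 3); beta has decayed once
```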

Multilevel Attention Residual Neural Network for Multimodal Online Social Network Rumor Detection

Frontiers in Physics ◽

10.3389/fphy.2021.711221 ◽

2021 ◽

Vol 9 ◽

Author(s):

Zhuang Wang ◽

Jie Sui

Keyword(s):

Neural Network ◽

Social Network ◽

State Of The Art ◽

Online Social Network ◽

Image Representation ◽

Rapid Rise ◽

Text And Image ◽

Detection Model ◽

Fused Image ◽

Rumor Detection

In recent years, with the rapid rise of social networks such as Weibo and Twitter, multimodal social network rumors have also spread. Unlike traditional unimodal rumor detection, the main difficulty of multimodal rumor detection lies in exploiting the complementarity of different modal features without generating noisy information. In this article, we propose a multimodal online social network rumor detection model based on the multilevel attention residual neural network (MARN). First, text and image features are extracted by BERT and ResNet-18, respectively, and a cross-attention residual mechanism uses the text vector to enhance the image representation. Second, the enhanced image vector and the text vector are concatenated and fused by a self-attention residual mechanism. Finally, the fused image–text vectors are classified into two categories. The attention mechanism effectively enhances the image representation and further improves the fusion between image and text, while the residual mechanism retains the unique attributes of each original modal feature when different modal features are combined. To assess the performance of the MARN model, we conduct experiments on the Weibo dataset, and the results show that MARN outperforms state-of-the-art models in terms of accuracy and F1 score.
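A minimal sketch of the two residual attention steps, assuming unprojected scaled dot-product attention, text and image features already mapped to a common dimension d, and concatenation along the token axis. Real encoders such as BERT and ResNet-18 produce different widths and would need learned projections, which are omitted; all names and sizes here are hypothetical.

```python
import numpy as np

def softmax(s, axis=-1):
    s = s - s.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def cross_attention_residual(image, text):
    """Image tokens query text tokens; the residual keeps the original image features."""
    return image + attention(image, text, text)

def self_attention_residual(x):
    """Every fused token attends to all others; the residual keeps each token's input."""
    return x + attention(x, x, x)

rng = np.random.default_rng(0)
d = 16
text = rng.normal(size=(10, d))    # e.g. 10 token vectors from a text encoder
image = rng.normal(size=(4, d))    # e.g. 4 region vectors from an image encoder
enhanced_image = cross_attention_residual(image, text)
fused = self_attention_residual(np.concatenate([enhanced_image, text], axis=0))  # (14, d)
w_cls = rng.normal(scale=0.1, size=(d, 2))
logits = fused.mean(axis=0) @ w_cls   # two classes: rumor / non-rumor
```

Because of the residual connection, if the text carries no signal the enhanced image features reduce to the original image features, which is the "retain the original modal feature" behaviour the abstract describes.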

Bidirectional LSTM with saliency-aware 3D-CNN features for human action recognition

Journal of Engineering Research ◽

10.36909/jer.v9i3a.8383 ◽

2021 ◽

Vol 9 (3A) ◽

Author(s):

Sheeraz Arif ◽

Jing Wang ◽

Adnan Ahmed Siddiqui ◽

Rashid Hussain ◽

...

Keyword(s):

Neural Network ◽

Action Recognition ◽

Temporal Dynamics ◽

Recognition Performance ◽

Research Work ◽

Video Stream ◽

Video Frame ◽

Convolutional Network ◽

Bidirectional LSTM

Deep convolutional neural networks (DCNNs) and recurrent neural networks (RNNs) have become an important research area in multimedia understanding and have achieved remarkable action recognition performance. However, videos contain rich motion information at varying scales. Existing recurrent pipelines fail to capture long-term motion dynamics in videos with various motion scales and complex actions performed by multiple actors. Considering contextual and salient features is more important than mapping a video frame into a static video representation. This work proposes a novel pipeline that analyzes and processes video information using a 3D convolutional (C3D) network and a newly introduced deep bidirectional LSTM. Like the popular two-stream ConvNet, we adopt a two-stream framework with one modification: we replace the optical flow stream with a saliency-aware stream to avoid its computational complexity. First, we generate a saliency-aware video stream by applying a saliency-aware method. Second, a two-stream 3D convolutional network (C3D) is applied to two different types of streams, i.e., the RGB stream and the saliency-aware video stream, to collect both spatial and semantic temporal features. Next, a deep bidirectional LSTM network learns sequential deep temporal dynamics. Finally, a time-series pooling layer and a softmax layer classify human activity and behavior. The introduced system can learn long-term temporal dependencies and predict complex human actions. Experimental results demonstrate a significant improvement in action recognition accuracy on different benchmark datasets.
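The classification head described above (bidirectional LSTM, time-series pooling, softmax) can be sketched in NumPy as follows. The per-clip C3D features are replaced by random stand-ins, the two streams are fused by simple concatenation, and a single BiLSTM layer with mean pooling is used; these choices, the sizes and the helper names are assumptions rather than the paper's configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(xs, w, u, b, hidden):
    """Run one LSTM direction over xs (T, D); gate order: input, forget, cell, output."""
    h, c, outs = np.zeros(hidden), np.zeros(hidden), []
    for x in xs:
        z = x @ w + h @ u + b                 # (4 * hidden,)
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)            # update the cell state
        h = o * np.tanh(c)                    # expose the gated hidden state
        outs.append(h)
    return np.stack(outs)

def bilstm(xs, fwd_params, bwd_params, hidden):
    """Concatenate forward states with backward states re-aligned to forward time."""
    forward = lstm_pass(xs, *fwd_params, hidden)
    backward = lstm_pass(xs[::-1], *bwd_params, hidden)[::-1]
    return np.concatenate([forward, backward], axis=1)   # (T, 2 * hidden)

def make_params(rng, d, hidden):
    return (rng.normal(scale=0.1, size=(d, 4 * hidden)),
            rng.normal(scale=0.1, size=(hidden, 4 * hidden)),
            np.zeros(4 * hidden))

rng = np.random.default_rng(0)
T, D, H, n_classes = 8, 32, 16, 5
rgb = rng.normal(size=(T, D))        # stand-in for per-clip C3D features, RGB stream
sal = rng.normal(size=(T, D))        # stand-in for saliency-aware stream features
clips = np.concatenate([rgb, sal], axis=1)          # fusion by concatenation (assumption)
states = bilstm(clips, make_params(rng, 2 * D, H), make_params(rng, 2 * D, H), H)
pooled = states.mean(axis=0)         # temporal mean pooling over the sequence
logits = pooled @ rng.normal(scale=0.1, size=(2 * H, n_classes))
e = np.exp(logits - logits.max())
probs = e / e.sum()                  # softmax over action classes
```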

Pedestrian attribute recognition using two-branch trainable Gabor wavelets network

PLoS ONE ◽

10.1371/journal.pone.0251667 ◽

2021 ◽

Vol 16 (6) ◽

pp. e0251667

Author(s):

Imran N. Junejo

Keyword(s):

Neural Network ◽

Computer Vision ◽

Building Block ◽

Deep Neural Network ◽

State Of The Art ◽

Gabor Wavelets ◽

Age Group ◽

Clothing Style ◽

Attribute Recognition ◽

Mixed Layers

Surveillance cameras are everywhere, keeping an eye on pedestrians as they navigate through a scene. In this context, our paper addresses the problem of pedestrian attribute recognition (PAR). This problem entails recognizing attributes such as age group, clothing style, accessories, and footwear style. This multi-label problem is extremely challenging even for human observers and has rightly garnered attention from the computer vision community. Towards a solution, in this paper we adopt trainable Gabor wavelet (TGW) layers and cascade them with a convolutional neural network (CNN). Whereas other researchers use fixed Gabor filters with the CNN, the proposed layers are learnable and adapt to the dataset for better recognition. We propose a two-branch neural network in which mixed layers, a combination of TGW and convolutional layers, form the building block of our deep neural network. We test our method on two challenging, publicly available datasets and compare our results with the state of the art.
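The five parameters of a Gabor filter (scale sigma, orientation theta, wavelength lambda, aspect ratio gamma, phase psi) are what a trainable Gabor layer would learn instead of fixing by hand. The sketch below only shows the standard real-valued Gabor kernel and a small bank of orientations; the function name and parameter values are illustrative, and actually learning the parameters would require an autodiff framework.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lambd, gamma, psi):
    """Real part of a 2-D Gabor filter sampled on an odd size x size grid centred at 0."""
    assert size % 2 == 1, "use an odd kernel size so the grid has a centre pixel"
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)     # rotate coordinates by theta
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + gamma ** 2 * y_rot ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / lambd + psi)
    return envelope * carrier

# A bank of 4 orientations; in a trainable layer sigma, theta, lambd, gamma and psi
# would be per-filter parameters updated by backpropagation.
thetas = np.linspace(0.0, np.pi, 4, endpoint=False)
bank = np.stack([gabor_kernel(7, sigma=2.0, theta=t, lambd=4.0, gamma=0.5, psi=0.0)
                 for t in thetas])                     # (4, 7, 7)
```

With `psi = 0` each kernel is an even function, so it is unchanged by a 180-degree rotation and equals 1 at the centre pixel.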

Identifying drug–target interactions based on graph convolutional network and deep neural network

Briefings in Bioinformatics ◽

10.1093/bib/bbaa044 ◽

2020 ◽

Cited By ~ 6

Author(s):

Tianyi Zhao ◽

Yang Hu ◽

Linda R Valsdottir ◽

Tianyi Zang ◽

Jiajie Peng

Keyword(s):

Neural Network ◽

Drug Discovery ◽

Drug Target ◽

Deep Neural Network ◽

State Of The Art ◽

Feature Representation ◽

Convolutional Network ◽

Computational Approaches ◽

Large Margin ◽

Multiple Drugs

Identification of new drug–target interactions (DTIs) is an important but time-consuming and costly step in drug discovery. In recent years, to mitigate these drawbacks, researchers have sought to identify DTIs using computational approaches. However, most existing methods construct drug networks and target networks separately and then predict novel DTIs based on known associations between drugs and targets, without accounting for associations between drug–protein pairs (DPPs). To incorporate the associations between DPPs into DTI modeling, we built a DPP network from multiple drugs and proteins, in which DPPs are the nodes and the associations between DPPs are the edges. We then propose a novel learning-based framework, ‘graph convolutional network (GCN)-DTI’, for DTI identification. The model first uses a graph convolutional network to learn the features of each DPP. Second, using this feature representation as input, it uses a deep neural network to predict the final label. Our results show that the proposed framework outperforms some state-of-the-art approaches by a large margin.
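A toy sketch of the pipeline shape described above: DPP nodes, one GCN layer, then a small feed-forward network that scores each pair. The edge rule used here (two pairs are linked when they share a drug or a protein), the random node features, the layer sizes and the helper names are hypothetical simplifications; the paper's own association edges and features are not reproduced.

```python
import numpy as np
from itertools import product

def dpp_graph(drugs, proteins):
    """Nodes are drug-protein pairs; link two pairs sharing a drug or a protein (toy rule)."""
    nodes = list(product(drugs, proteins))
    n = len(nodes)
    a = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if nodes[i][0] == nodes[j][0] or nodes[i][1] == nodes[j][1]:
                a[i, j] = a[j, i] = 1.0
    return nodes, a

def gcn_layer(a, h, w):
    """ReLU(D^-1/2 (A + I) D^-1/2 H W): one graph convolution over the DPP network."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return np.maximum(0.0, (a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]) @ h @ w)

def dnn_scores(h, w1, w2):
    """Small feed-forward network mapping each DPP embedding to an interaction probability."""
    hidden = np.maximum(0.0, h @ w1)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2)))

rng = np.random.default_rng(0)
nodes, a = dpp_graph(["D1", "D2"], ["P1", "P2", "P3"])   # 6 hypothetical pairs
features = rng.normal(size=(len(nodes), 8))               # stand-in node features
h = gcn_layer(a, features, rng.normal(scale=0.3, size=(8, 16)))
probs = dnn_scores(h, rng.normal(scale=0.3, size=(16, 8)), rng.normal(scale=0.3, size=8))
```

Under this toy rule each pair is linked to the other pairs with the same drug (2 here) and the same protein (1 here), so every node has degree 3.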