Multi-view Clustering of Visual Words Using Canonical Correlation Analysis for Human Action Recognition

Multimodal Learning for Human Action Recognition Via Bimodal/Multimodal Hybrid Centroid Canonical Correlation Analysis

IEEE Transactions on Multimedia ◽ 10.1109/tmm.2018.2875510 ◽ 2019 ◽ Vol 21 (5) ◽ pp. 1317-1331 ◽ Cited By ~ 6

Author(s): Nour El Din Elmadany ◽ Yifeng He ◽ Ling Guan

Keyword(s): Correlation Analysis ◽ Action Recognition ◽ Canonical Correlation Analysis ◽ Canonical Correlation ◽ Human Action Recognition ◽ Human Action ◽ Multimodal Learning

Multimodal Information Fusion for Human Action Recognition

10.32920/ryerson.14648964 ◽ 2021

Author(s): Nour Elmadany

Keyword(s): Deep Learning ◽ Correlation Analysis ◽ Action Recognition ◽ Canonical Correlation Analysis ◽ Canonical Correlation ◽ Human Action Recognition ◽ Human Action ◽ Local Network ◽ Accelerometer Data ◽ Fusion Technique

This thesis presents three frameworks for human action recognition, each designed to improve recognition performance. The first framework fuses handcrafted features from four modalities: RGB, depth, skeleton, and accelerometer data. It also proposes a new descriptor for skeleton data that provides a discriminative representation of the poses in an action. Because the goal of the first framework is to find a more discriminative subspace, a generalized fusion technique, Multimodal Hybrid Centroid Canonical Correlation Analysis (MHCCCA), is proposed for two or more sets of features or modalities. The second framework fuses handcrafted and deep learning features from three modalities: RGB, depth, and skeleton. It introduces a new depth representation whose final form is extracted with a deep ConvNet. The backbone of this framework is the proposed fusion technique, Multiset Globality Locality Preserving Canonical Correlation Analysis (MGLPCCA), which also handles two or more sets of features or modalities. MGLPCCA aims to preserve the local and global structures of the data while maximizing the correlation among the different modalities or sets. The third framework uses deep learning techniques to improve long-term temporal modelling through two proposed techniques: Temporal Relational Network (TRN) and Temporal Second Order Pooling Based Network (T-SOPN). Additionally, Global-Local Network (GLN) and Fuse-Inception Network (FIN) are proposed to encourage the network to learn complementary information about the action and the scene itself. Qualitative and quantitative experiments on nine datasets demonstrate the effectiveness of the proposed frameworks over state-of-the-art methods.
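As a rough illustration of correlation-based feature fusion, the sketch below implements plain two-view Canonical Correlation Analysis in NumPy and concatenates the two projected views. It is not MHCCCA or MGLPCCA, whose formulations are not given in this abstract; the function names (cca_fit, cca_fuse), the regularisation constant, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def cca_fit(X, Y, n_components, reg=1e-3):
    """Plain two-view CCA via whitening and SVD (illustrative only).

    X: (n, p) features from one modality, Y: (n, q) from another.
    reg is an assumed ridge term that keeps the covariances invertible.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the
    # (regularised) canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]
    return mx, my, Wx, Wy, s[:n_components]

def cca_fuse(X, Y, model):
    """Project both views and concatenate them into one fused feature."""
    mx, my, Wx, Wy, _ = model
    return np.hstack([(X - mx) @ Wx, (Y - my) @ Wy])

# Synthetic stand-in for two modalities that share a 2-D latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
X = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))
Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))

model = cca_fit(X, Y, n_components=2)
fused = cca_fuse(X, Y, model)
print(fused.shape, model[-1])
```

The fused matrix could then be passed to any classifier. Concatenating the projected views is only one common fusion choice; summing them is another.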

Human Action Recognition Using Hybrid Centroid Canonical Correlation Analysis

2015 IEEE International Symposium on Multimedia (ISM) ◽ 10.1109/ism.2015.118 ◽ 2015 ◽ Cited By ~ 4

Author(s): Nour El Din El Madany ◽ Yifeng He ◽ Ling Guan

Keyword(s): Correlation Analysis ◽ Action Recognition ◽ Canonical Correlation Analysis ◽ Canonical Correlation ◽ Human Action Recognition ◽ Human Action

Human action recognition by fusing deep features with Globality Locality Preserving Canonical Correlation Analysis

2017 IEEE International Conference on Image Processing (ICIP) ◽ 10.1109/icip.2017.8296807 ◽ 2017 ◽ Cited By ~ 1

Author(s): Nour El Din El Madany ◽ Yifeng He ◽ Ling Guan

Keyword(s): Correlation Analysis ◽ Action Recognition ◽ Canonical Correlation Analysis ◽ Canonical Correlation ◽ Human Action Recognition ◽ Human Action ◽ Locality Preserving

Human Action Recognition using Spatial-Temporal Analysis and Bag of Visual Words

2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS) ◽ 10.1109/sitis.2018.00112 ◽ 2018 ◽ Cited By ~ 1

Author(s): Denver Naidoo ◽ Jules-Raymond Tapamo ◽ Tom Walingo

Keyword(s): Action Recognition ◽ Human Action Recognition ◽ Temporal Analysis ◽ Human Action ◽ Bag Of Visual Words ◽ Visual Words ◽ Spatial Temporal Analysis

A new bag of visual words encoding method for human action recognition

2018 24th International Conference on Pattern Recognition (ICPR) ◽ 10.1109/icpr.2018.8545886 ◽ 2018 ◽ Cited By ~ 2

Author(s): Xavier Cortes ◽ Donatello Conte ◽ Hubert Cardot

Keyword(s): Action Recognition ◽ Human Action Recognition ◽ Human Action ◽ Bag Of Visual Words ◽ Visual Words ◽ Encoding Method

Inter and Intra Class Correlation Analysis (IICCA) for Human Action Recognition in Realistic Scenarios

8th International Conference of Pattern Recognition Systems (ICPRS 2017) ◽ 10.1049/cp.2017.0149 ◽ 2017 ◽ Cited By ~ 2

Author(s): S. Nazir ◽ M.H. Yousaf ◽ S.A. Velastin

Keyword(s): Correlation Analysis ◽ Action Recognition ◽ Human Action Recognition ◽ Human Action ◽ Intra Class Correlation

Multi-Channel MKL for Video Human Action Recognition

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.599-601.1571 ◽ 2014 ◽ Vol 599-601 ◽ pp. 1571-1574

Author(s): Jia Ding ◽ Yang Yi ◽ Ze Min Qiu ◽ Jun Shi Liu

Keyword(s): Action Recognition ◽ Recognition Rate ◽ Multiple Kernel Learning ◽ Human Action Recognition ◽ Human Action ◽ Image Understanding ◽ Kernel Learning ◽ Bag Of Visual Words ◽ Visual Words ◽ Multiple Kernel

Human action recognition in videos plays an important role in computer vision and image understanding. This paper proposes a novel method that combines a multi-channel bag of visual words with multiple kernel learning. Videos are described by a multi-channel bag of visual words, and a multiple kernel learning classifier performs action classification; each kernel function of the classifier corresponds to one video channel, which avoids noise interference from the other channels. The proposed approach improves the ability to distinguish easily confused actions. Experiments on KTH show that the method achieves a strong average recognition rate, comparable to that of state-of-the-art methods.
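The one-kernel-per-channel idea can be sketched as follows. This is a simplified stand-in, not the paper's method: the abstract does not name the kernel function or the weight-learning procedure, so the histogram intersection kernel, the fixed uniform weights, and the random histograms below are all assumptions. A real multiple kernel learning solver would learn the channel weights rather than fix them.

```python
import numpy as np

def histogram_intersection_kernel(A, B):
    """K[i, j] = sum_k min(A[i, k], B[j, k]) for rows that are histograms."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

def combine_kernels(kernels, weights):
    """Convex combination of per-channel kernel matrices.

    The weights are fixed here (an assumption); MKL would learn them.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

rng = np.random.default_rng(0)

def random_histograms(n_videos, vocab_size):
    # Hypothetical L1-normalised bag-of-visual-words histograms.
    h = rng.random((n_videos, vocab_size))
    return h / h.sum(axis=1, keepdims=True)

# Two hypothetical channels with different vocabulary sizes.
channels = [random_histograms(5, 8), random_histograms(5, 12)]

# One kernel per channel, so each channel is compared only with itself.
kernels = [histogram_intersection_kernel(H, H) for H in channels]
K = combine_kernels(kernels, weights=[0.5, 0.5])
print(K.shape)
```

The combined matrix K could be passed to a kernel classifier that accepts precomputed kernels, such as scikit-learn's SVC with kernel="precomputed".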
