Inter and Intra Class Correlation Analysis (IICCA) for Human Action Recognition in Realistic Scenarios

This thesis presents three frameworks of human action recognition to facilitate better recognition performance. The first framework fuses handcrafted features from four different modalities including RGB, depth, skeleton, and accelerometer data. In addition, a new descriptor for skeleton data is proposed that provides a discriminative representation for the poses of an action. Since the goal of the first framework is to find a more discriminative subspace, a generalized fusion technique Multimodal Hybrid Centroid Canonical Correlation Analysis (MHCCCA) is proposed for two or more sets of features or modalities. The second framework fuses handcrafted and deep learning features from three modalities including RGB, depth, and skeleton. In this framework a new depth representation is introduced that extracts the final representation using Deep ConvNet. The proposed fusion technique forms the backbone of the framework: Multiset Globality Locality Preserving Canonical Correlation Analysis (MGLPCCA) for two or more sets of features or modalities. MGLPCCA aims to preserve the local and global structures of data while maximizing the correlation among different modalities or sets. The third framework uses the deep learning techniques to improve the long term temporal modelling through two proposed techniques: Temporal Relational Network (TRN) and Temporal Second Order Pooling Based Network (T-SOPN). Additionally, Global-Local Network (GLN) and Fuse-Inception Network (FIN) are proposed to encourage the network to learn complementary information about the action and scene itself. Qualitative and quantitative experiments are conducted on nine different datasets demonstrating the effectiveness of the proposed framework over state-of-the-art methods.

Download Full-text

Multimodal Learning for Human Action Recognition Via Bimodal/Multimodal Hybrid Centroid Canonical Correlation Analysis

IEEE Transactions on Multimedia ◽

10.1109/tmm.2018.2875510 ◽

2019 ◽

Vol 21 (5) ◽

pp. 1317-1331 ◽

Cited By ~ 6

Author(s):

Nour El Din Elmadany ◽

Yifeng He ◽

Ling Guan

Keyword(s):

Correlation Analysis ◽

Action Recognition ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Human Action Recognition ◽

Human Action ◽

Multimodal Learning

Download Full-text

Multimodal Information Fusion for Human Action Recognition

10.32920/ryerson.14648964 ◽

2021 ◽

Author(s):

Nour Elmadany

Keyword(s):

Deep Learning ◽

Correlation Analysis ◽

Action Recognition ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Human Action Recognition ◽

Human Action ◽

Local Network ◽

Accelerometer Data ◽

Fusion Technique

This thesis presents three frameworks of human action recognition to facilitate better recognition performance. The first framework fuses handcrafted features from four different modalities including RGB, depth, skeleton, and accelerometer data. In addition, a new descriptor for skeleton data is proposed that provides a discriminative representation for the poses of an action. Since the goal of the first framework is to find a more discriminative subspace, a generalized fusion technique Multimodal Hybrid Centroid Canonical Correlation Analysis (MHCCCA) is proposed for two or more sets of features or modalities. The second framework fuses handcrafted and deep learning features from three modalities including RGB, depth, and skeleton. In this framework a new depth representation is introduced that extracts the final representation using Deep ConvNet. The proposed fusion technique forms the backbone of the framework: Multiset Globality Locality Preserving Canonical Correlation Analysis (MGLPCCA) for two or more sets of features or modalities. MGLPCCA aims to preserve the local and global structures of data while maximizing the correlation among different modalities or sets. The third framework uses the deep learning techniques to improve the long term temporal modelling through two proposed techniques: Temporal Relational Network (TRN) and Temporal Second Order Pooling Based Network (T-SOPN). Additionally, Global-Local Network (GLN) and Fuse-Inception Network (FIN) are proposed to encourage the network to learn complementary information about the action and scene itself. Qualitative and quantitative experiments are conducted on nine different datasets demonstrating the effectiveness of the proposed framework over state-of-the-art methods.

Download Full-text

Human Action Recognition Using Hybrid Centroid Canonical Correlation Analysis

2015 IEEE International Symposium on Multimedia (ISM) ◽

10.1109/ism.2015.118 ◽

2015 ◽

Cited By ~ 4

Author(s):

Nour El Din El Madany ◽

Yifeng He ◽

Ling Guan

Keyword(s):

Correlation Analysis ◽

Action Recognition ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Human Action Recognition ◽

Human Action

Download Full-text

Human action recognition by fusing deep features with Globality Locality Preserving Canonical Correlation Analysis

2017 IEEE International Conference on Image Processing (ICIP) ◽

10.1109/icip.2017.8296807 ◽

2017 ◽

Cited By ~ 1

Author(s):

Nour El Din El Madany ◽

Yifeng He ◽

Ling Guan

Keyword(s):

Correlation Analysis ◽

Action Recognition ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Human Action Recognition ◽

Human Action ◽

Locality Preserving

Download Full-text

Human action recognition using simple geometric features and a finite state machine

Image Processing & Communications ◽

10.2478/v10248-012-0079-y ◽

2013 ◽

Vol 18 (2-3) ◽

pp. 49-60 ◽

Cited By ~ 2

Author(s):

Damian Dudzńiski ◽

Tomasz Kryjak ◽

Zbigniew Mikrut

Keyword(s):

Action Recognition ◽

Finite State Machine ◽

Recognition Rate ◽

Human Action Recognition ◽

Human Action ◽

Video Stream ◽

State Machine ◽

Recognition Algorithm ◽

Finite State ◽

Correct Recognition Rate

Abstract In this paper a human action recognition algorithm, which uses background generation with shadow elimination, silhouette description based on simple geometrical features and a finite state machine for recognizing particular actions is described. The performed tests indicate that this approach obtains a 81 % correct recognition rate allowing real-time image processing of a 360 X 288 video stream.

Download Full-text