Complex indoor scene classification based on a new feature descriptor

Author(s):  
C. Toledo Ferraz ◽  
A. Gonzaga ◽  
M. Garcia Manzato

Author(s):  
Yaning Wang ◽  
Weifeng Liu ◽  
Jianning Li ◽  
Zhangming Peng

Author(s):  
Qian Liu ◽  
Feng Yang ◽  
XiaoFen Tang

To address the weak neighborhood relationship among blocks in the histogram of oriented gradients (HOG) descriptor, this paper proposes the neighborhood descriptor of oriented gradients (NDOG), an improved HOG-based feature descriptor for pedestrian detection. To obtain the NDOG feature vector, the algorithm computes a local weight vector that integrates the spatial correlation among blocks of the HOG descriptor, concatenates this weight vector to the tail of the HOG feature vector, and normalizes the resulting vector by its gradient norm. Combining the NDOG feature with a linear SVM classifier yields a complete pedestrian detection approach. Experimental results on the INRIA, Caltech-USA, and ETH pedestrian datasets show that the approach achieves a lower miss rate and a higher average precision than HOG and other advanced pedestrian detection methods, especially when training samples are insufficient.
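The NDOG construction described above can be sketched in a few lines. The abstract does not specify how the local weight vector is computed, so the per-block weight below (each block's share of the total gradient energy) is an illustrative assumption; only the concatenate-then-normalize structure follows the text.

```python
import numpy as np

def ndog_from_hog(hog_blocks):
    """Sketch of the NDOG construction: append a local weight vector to
    the HOG feature and normalize. `hog_blocks` has shape (n_blocks, bins),
    one HOG histogram per block. The weight formula is an assumption."""
    hog_blocks = np.asarray(hog_blocks, dtype=float)
    block_norms = np.linalg.norm(hog_blocks, axis=1)         # per-block gradient energy
    weights = block_norms / (block_norms.sum() + 1e-12)      # assumed local weight vector
    feature = np.concatenate([hog_blocks.ravel(), weights])  # weights appended to HOG tail
    return feature / (np.linalg.norm(feature) + 1e-12)       # normalize by the vector norm

# Example: 4 blocks with 9 orientation bins each -> 4*9 + 4 = 40 dimensions
desc = ndog_from_hog(np.random.rand(4, 9))
```

The resulting vector would then be fed to a linear SVM (e.g., scikit-learn's `LinearSVC`) exactly as a plain HOG feature would be.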


Author(s):  
Jiajia Luo ◽  
Wei Wang ◽  
Hairong Qi

Multi-view human action recognition has gained considerable attention in recent years for its superior performance compared to single-view recognition. In this paper, we propose a new framework for real-time human action recognition in distributed camera networks (DCNs). We first present a new feature descriptor (Mltp-hist) that is tolerant to illumination changes, robust in homogeneous regions, and computationally efficient. Taking advantage of Mltp-hist, non-informative 3-D patches generated from the background can be removed automatically, effectively highlighting the foreground patches. Next, a new feature representation method based on sparse coding is presented to generate the histogram representation of local videos to be transmitted to the base station for classification. Due to the sparse representation of the extracted features, the approximation error is reduced. Finally, at the base station, a probability model fuses the information from the various views and assigns a class label accordingly. Compared with existing algorithms, the proposed framework has three advantages while demanding less memory and bandwidth: 1) no preprocessing is required; 2) communication among cameras is unnecessary; and 3) the positions and orientations of the cameras need not be fixed. We further evaluate the proposed framework on IXMAS, the most popular multi-view action dataset. Experimental results indicate that our framework consistently achieves state-of-the-art results across various numbers of views. In addition, our approach is tolerant to different combinations of views and benefits from introducing more views at the testing stage. Notably, our results remain satisfactory even when large misalignment exists between the training and testing samples.
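The final fusion step, where the base station combines per-view evidence into one class label, can be sketched as follows. The abstract only says a probability model fuses the views; the product rule used here is a common stand-in for such a model and is an assumption, not the paper's exact formulation.

```python
import numpy as np

def fuse_views(view_probs):
    """Fuse per-view class-probability vectors (one row per camera view)
    with a simple product rule, then renormalize. This is an assumed
    stand-in for the base station's probability model."""
    view_probs = np.asarray(view_probs, dtype=float)
    fused = np.prod(view_probs, axis=0)  # multiply evidence across views
    fused /= fused.sum()                 # renormalize to a distribution
    return int(np.argmax(fused)), fused  # predicted label and fused posterior

# Three camera views, four hypothetical action classes
probs = [[0.5, 0.2, 0.2, 0.1],
         [0.4, 0.3, 0.2, 0.1],
         [0.6, 0.1, 0.2, 0.1]]
label, fused = fuse_views(probs)
```

Because each view contributes only a short probability vector, this style of fusion keeps the bandwidth cost per camera low, consistent with the framework's design goals.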

