Research on Vocabulary Sizes and Codebook Universality

2014 ◽  
Vol 2014 ◽  
pp. 1-7
Author(s):  
Wei-Xue Liu ◽  
Jian Hou ◽  
Hamid Reza Karimi

Codebook is an effective image representation method. By clustering local image descriptors, a codebook has been shown to be a distinctive image feature and is widely applied in object classification. In almost all existing work on codebooks, building the visual vocabulary follows a basic routine: extract local image descriptors and cluster them with a user-designated number of clusters. The problem with this routine is that building a separate codebook for every dataset is inefficient. To address this problem, we investigate the influence of vocabulary size on classification performance and vocabulary universality with the kNN classifier. Experimental results indicate that, provided the vocabulary size is large enough, vocabularies built from different datasets are interchangeable and universal.
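The basic routine the abstract describes, clustering descriptors into a user-designated number of visual words, histogramming each image's word assignments, and classifying with kNN, can be sketched as follows. The descriptors, labels, and vocabulary size here are synthetic placeholders, not the paper's data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic "local descriptors": 200 per image, 32-D, for 10 images.
train_desc = [rng.normal(size=(200, 32)) for _ in range(10)]

vocab_size = 50  # the user-designated number of clusters
kmeans = KMeans(n_clusters=vocab_size, n_init=5, random_state=0)
kmeans.fit(np.vstack(train_desc))  # build the codebook

def bow_histogram(desc):
    # Quantize descriptors to visual words; return a normalized histogram.
    words = kmeans.predict(desc)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

X_train = np.array([bow_histogram(d) for d in train_desc])
y_train = rng.integers(0, 2, size=len(X_train))  # placeholder labels

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
```

Swapping the vocabulary built from one dataset into the `bow_histogram` step for another dataset is exactly the exchangeability experiment the abstract reports.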

2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Jian Hou ◽  
Wei-Xue Liu ◽  
Xu E ◽  
Hamid Reza Karimi

Bag-of-visual-words has been shown to be a powerful image representation and has attained great success in many computer vision and pattern recognition applications. Usually, for a given dataset, researchers choose to build a specific visual vocabulary from that dataset, and the problem of deriving a universal visual vocabulary is rarely addressed. Based on previous work on classification performance with respect to visual vocabulary sizes, we arrive at the hypothesis that a universal visual vocabulary can be obtained by taking into account the similarity extent of the keypoints represented by one visual word. We then propose to use a similarity-threshold-based clustering method to calculate the optimal vocabulary size, where the universal similarity threshold can be obtained empirically. With the optimal vocabulary size, the optimal visual vocabularies of limited sizes from three datasets are shown to be interchangeable and therefore universal. This result indicates that a universal and compact visual vocabulary can be built from a not-too-small dataset. Our work narrows the gap between bag-of-visual-words and bag-of-words, where a relatively fixed vocabulary can be used with different text datasets.
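A simple way to see how a similarity threshold, rather than a user-designated cluster count, determines the vocabulary size is leader-style threshold clustering: a descriptor joins an existing visual word only if its cosine similarity to that word's center reaches the threshold, and otherwise seeds a new word. The paper's exact clustering procedure and threshold value are not reproduced here; this is a simplified sketch on synthetic descriptors:

```python
import numpy as np

def threshold_vocab_size(descriptors, tau):
    # Leader clustering: vocabulary size falls out of the threshold tau.
    centers = []
    for d in descriptors:
        d = d / np.linalg.norm(d)  # cosine similarity via unit vectors
        if not centers or max(float(c @ d) for c in centers) < tau:
            centers.append(d)      # no word is similar enough: new word
    return len(centers)

rng = np.random.default_rng(0)
desc = rng.normal(size=(500, 32))
loose = threshold_vocab_size(desc, tau=0.2)   # few, coarse words
strict = threshold_vocab_size(desc, tau=0.6)  # many, fine-grained words
```

A stricter threshold forces each word to cover only highly similar keypoints, so the vocabulary grows; the empirically found universal threshold then yields the optimal size for any sufficiently large dataset.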


Entropy ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. 419
Author(s):  
Zhenbing Zhao ◽  
Hongyu Qi ◽  
Xiaoqing Fan ◽  
Guozhi Xu ◽  
Yincheng Qi ◽  
...  

Deep convolutional neural networks (DCNNs) with alternating convolutional, pooling and decimation layers are widely used in computer vision, yet current work tends to focus on deeper networks with many layers and neurons, which carry a high computational complexity. Moreover, recognition remains challenging for objects, such as infrared insulators, whose appearances and training samples are insufficient and unrepresentative. In view of this, more attention has turned to applying pretrained networks for image feature representation, but rules on how to select the feature representation layer are scarce. In this paper, we propose two new concepts, layer entropy and relative layer entropy, which underpin an image representation method based on relative layer entropy (IRM_RLE) designed to identify the convolution layer best suited to image recognition. First, the image is fed into an ImageNet-pretrained DCNN model and deep convolutional activations are extracted. Then, the appropriate feature layer is selected by calculating the layer entropy and relative layer entropy of each convolution layer. Finally, feature maps of the chosen convolution layer are selected according to their importance degree, then vectorized and pooled by VLAD (vector of locally aggregated descriptors) coding and quantization for the final image representation. The experimental results show that the proposed approach performs competitively against previous methods across all datasets, and on the indoor scenes and actions datasets it outperforms the state-of-the-art methods.
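The paper's precise definitions of layer entropy and relative layer entropy are not reproduced in this abstract. One plausible reading, offered purely as a sketch, is the Shannon entropy of each feature map's activation histogram averaged over the layer, with the relative version taken as a difference between layers:

```python
import numpy as np

def layer_entropy(feature_maps, bins=16):
    # feature_maps: array of shape (C, H, W) from one convolution layer.
    entropies = []
    for fm in feature_maps:
        hist, _ = np.histogram(fm, bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]                  # skip empty bins (0 * log 0 = 0)
        entropies.append(float(-(p * np.log2(p)).sum()))
    return float(np.mean(entropies))

def relative_layer_entropy(layer_a, layer_b, bins=16):
    # Entropy of one layer relative to another.
    return layer_entropy(layer_a, bins) - layer_entropy(layer_b, bins)

# A constant map carries no information; a noisy map carries more.
flat = np.ones((1, 8, 8))
noisy = np.random.default_rng(0).normal(size=(1, 8, 8))
```

Under this reading, a layer whose activations are nearly constant contributes little discriminative information, which is the intuition behind using entropy to rank candidate feature layers.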


Sensors ◽  
2019 ◽  
Vol 19 (2) ◽  
pp. 291 ◽  
Author(s):  
Hamdi Sahloul ◽  
Shouhei Shirafuji ◽  
Jun Ota

Local image features are invariant to in-plane rotations and robust to minor viewpoint changes. However, current detectors and descriptors for local image features fail to accommodate out-of-plane rotations larger than 25°–30°. Invariance to such viewpoint changes is essential for numerous applications, including wide-baseline matching, 6D pose estimation, and object reconstruction. In this study, we present a general embedding that wraps a detector/descriptor pair in order to increase viewpoint invariance by exploiting input depth maps. The proposed embedding locates smooth surfaces within the input RGB-D images and projects them into a viewpoint-invariant representation, enabling the detection and description of more viewpoint-invariant features. Our embedding can be utilized with different combinations of detector/descriptor pairs, according to the desired application. Using synthetic and real-world objects, we evaluated the viewpoint invariance of various detectors and descriptors, for both the standalone and embedded approaches. While standalone local image features fail to accommodate average viewpoint changes beyond 33.3°, our proposed embedding boosted the viewpoint invariance to different levels, depending on the scene geometry. Objects with distinct surface discontinuities were on average invariant up to 52.8°, and the overall average across all evaluated datasets was 45.4°. Similarly, out of a total of 140 combinations involving 20 local image features and various objects with distinct surface discontinuities, only a single standalone local image feature exceeded the goal of 60° viewpoint difference, in just two combinations, compared with 19 different local image features succeeding in 73 combinations when wrapped in the proposed embedding. Furthermore, the proposed approach operates robustly in the presence of input depth noise, at and even beyond the noise levels of low-cost commodity depth sensors.
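The core geometric step, mapping a depth-derived surface back to a fronto-parallel frame before detection and description, can be illustrated on a toy planar patch. This is only an illustration of the idea, not the paper's pipeline; in the real system the surface normal would come from the input depth map:

```python
import numpy as np

def rotation_x(theta):
    # Rotation about the x-axis by theta radians.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

rng = np.random.default_rng(1)
# A flat patch on the z = 0 plane, seen under a 40° out-of-plane rotation.
patch = np.column_stack([rng.uniform(-1, 1, (50, 2)), np.zeros(50)])
observed = patch @ rotation_x(np.deg2rad(40)).T

# Estimate the plane normal as the direction of least variance (SVD).
centered = observed - observed.mean(axis=0)
normal = np.linalg.svd(centered)[2][-1]

# Rotate the normal onto the z-axis (Rodrigues' formula) so the patch
# becomes fronto-parallel; assumes the view is not already frontal.
z = np.array([0.0, 0.0, 1.0])
v, c = np.cross(normal, z), float(normal @ z)
K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
R_align = np.eye(3) + K + K @ K / (1.0 + c)
canonical = centered @ R_align.T  # z-coordinates collapse to ~0
```

Any 2D detector/descriptor run on the canonical view sees the same patch regardless of the original out-of-plane rotation, which is the source of the boosted viewpoint invariance reported above.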


Author(s):  
Siyuan Lu ◽  
Di Wu ◽  
Zheng Zhang ◽  
Shui-Hua Wang

The new coronavirus, COVID-19, has been spreading all over the world for the last six months, and the death toll is still rising. Accurate diagnosis of COVID-19 is therefore an urgent task in stopping the spread of the virus. In this paper, we propose to leverage image feature fusion for the diagnosis of COVID-19 in lung-window computed tomography (CT). Initially, ResNet-18 and ResNet-50 were selected as the backbone deep networks to generate image representations from the CT images. Second, the representative information extracted from the two networks was fused by discriminant correlation analysis to obtain refined image features. Third, three randomized neural networks (RNNs): extreme learning machine, Schmidt neural network and random vector functional-link net, were trained on the refined features, and the predictions of the three RNNs were ensembled for more robust classification performance. Experimental results based on five-fold cross-validation suggest that our method outperforms state-of-the-art algorithms in the diagnosis of COVID-19.
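Of the three randomized neural networks, the extreme learning machine has the simplest form: a random, untrained hidden layer followed by a closed-form least-squares output layer. A minimal sketch on synthetic data follows; the majority-vote ensemble of differently seeded models stands in for the paper's ensemble of three distinct network types:

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine for classification."""
    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        # Random input weights and biases are fixed, never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)     # random hidden features
        T = np.eye(int(y.max()) + 1)[y]      # one-hot targets
        # Ridge-regularized least squares for the output weights.
        self.beta = np.linalg.solve(
            H.T @ H + 1e-3 * np.eye(self.n_hidden), H.T @ T)
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta).argmax(axis=1)

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 20))        # stand-in for the fused CT features
y = (X[:, 0] > 0).astype(int)

models = [ELM(seed=s).fit(X, y) for s in range(3)]
votes = np.stack([m.predict(X) for m in models])
pred = (votes.mean(axis=0) > 0.5).astype(int)  # majority vote
```

Because training reduces to one linear solve, such networks are cheap to retrain under five-fold cross-validation, which fits the evaluation protocol described above.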


Author(s):  
P. Geethanjali

This chapter discusses the design and development of a surface Electromyogram (EMG) signal detection and conditioning system, along with the issue of unwanted spurious components, such as power-line interference and motion artifacts, that contaminate the signals. To recognize hand gestures from EMG signals, Time Domain (TD) as well as Autoregressive (AR) coefficient features are extracted. The extracted features are reduced using Principal Component Analysis (PCA) to alleviate the burden on the classifier. A four-channel continuous EMG signal conditioning system is developed, and EMG signals are acquired from 10 able-bodied subjects to classify 6 distinct movements of the hand and wrist. The reduced statistical TD and AR features are used to classify the signal patterns with a k Nearest Neighbour (kNN) classifier as well as a Neural Network (NN) classifier. Further, EMG signals acquired from a transradial amputee using an 8-channel system for the 6 amenable motions are also classified. Analysis of Variance (ANOVA) on the classification performance of the able-bodied subjects reveals that the TD-PCA features perform significantly better than the AR-PCA features, while no significant difference between the NN and kNN classifiers is observed with the reduced TD features. Since the kNN classifier achieves the lower average classification error with TD features, it is implemented off-line on the TMS2407eZdsp digital signal controller to study the actuation of three low-power DC drives in identifying the intended motion with an able-bodied subject.
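The TD-PCA-kNN pipeline the chapter evaluates can be sketched with a common set of time-domain features computed per channel window. The windows and labels below are synthetic stand-ins for the 4-channel recordings, and the zero-crossing and slope-sign-change dead-band thresholds used in practice are omitted for brevity:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def td_features(window):
    # Classic time-domain feature set for one channel window.
    mav = np.mean(np.abs(window))                         # mean absolute value
    wl = np.sum(np.abs(np.diff(window)))                  # waveform length
    zc = np.sum(np.diff(np.sign(window)) != 0)            # zero crossings
    ssc = np.sum(np.diff(np.sign(np.diff(window))) != 0)  # slope sign changes
    return [mav, wl, zc, ssc]

rng = np.random.default_rng(0)
windows = rng.normal(size=(60, 4, 256))  # 60 windows, 4 channels, 256 samples
y = np.repeat(np.arange(6), 10)          # 6 hand/wrist motions

# Concatenate the per-channel features into one vector per window.
X = np.array([[f for ch in w for f in td_features(ch)] for w in windows])

X_red = PCA(n_components=8).fit_transform(X)  # alleviate classifier burden
clf = KNeighborsClassifier(n_neighbors=5).fit(X_red, y)
```

The same structure applies to the AR features: only `td_features` would be replaced by an AR-coefficient estimator, with PCA and kNN unchanged, which is what makes the TD-vs-AR ANOVA comparison clean.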

