Research on Vocabulary Sizes and Codebook Universality

2014 ◽  
Vol 2014 ◽  
pp. 1-7
Author(s):  
Wei-Xue Liu ◽  
Jian Hou ◽  
Hamid Reza Karimi

Codebook is an effective image representation method. By clustering local image descriptors, a codebook has been shown to be a distinctive image feature and is widely applied in object classification. In almost all existing work on codebooks, building the visual vocabulary follows a basic routine: extract local image descriptors and cluster them with a user-designated number of clusters. The problem with this routine is that building a separate codebook for every dataset is inefficient. To address this problem, we investigate the influence of vocabulary size on classification performance and vocabulary universality with the kNN classifier. Experimental results indicate that, provided the vocabulary size is large enough, vocabularies built from different datasets are interchangeable and universal.
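The basic routine the abstract describes, clustering descriptors into a user-designated number of visual words, histogramming each image's word assignments, and classifying with kNN, can be sketched as follows. The descriptors, labels, and vocabulary size here are synthetic placeholders, not the paper's data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic "local descriptors": 200 per image, 32-D, for 10 images.
train_desc = [rng.normal(size=(200, 32)) for _ in range(10)]

vocab_size = 50  # the user-designated number of clusters
kmeans = KMeans(n_clusters=vocab_size, n_init=5, random_state=0)
kmeans.fit(np.vstack(train_desc))  # build the codebook

def bow_histogram(desc):
    # Quantize descriptors to visual words; return a normalized histogram.
    words = kmeans.predict(desc)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

X_train = np.array([bow_histogram(d) for d in train_desc])
y_train = rng.integers(0, 2, size=len(X_train))  # placeholder labels

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
```

Swapping the vocabulary built from one dataset into the `bow_histogram` step for another dataset is exactly the exchangeability experiment the abstract reports.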

2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Jian Hou ◽  
Wei-Xue Liu ◽  
Xu E ◽  
Hamid Reza Karimi

Bag-of-visual-words has been shown to be a powerful image representation and has attained great success in many computer vision and pattern recognition applications. Usually, for a given dataset, researchers choose to build a specific visual vocabulary from that dataset, and the problem of deriving a universal visual vocabulary is rarely addressed. Based on previous work on classification performance with respect to visual vocabulary sizes, we arrive at the hypothesis that a universal visual vocabulary can be obtained by taking into account the similarity extent of the keypoints represented by one visual word. We then propose to use a similarity-threshold-based clustering method to calculate the optimal vocabulary size, where the universal similarity threshold can be obtained empirically. With the optimal vocabulary size, the optimal visual vocabularies of limited sizes from three datasets are shown to be interchangeable and therefore universal. This result indicates that a universal and compact visual vocabulary can be built from a not-too-small dataset. Our work narrows the gap between bag-of-visual-words and bag-of-words, where a relatively fixed vocabulary can be used with different text datasets.
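A simple way to see how a similarity threshold, rather than a user-designated cluster count, determines the vocabulary size is leader-style threshold clustering: a descriptor joins an existing visual word only if its cosine similarity to that word's center reaches the threshold, and otherwise seeds a new word. The paper's exact clustering procedure and threshold value are not reproduced here; this is a simplified sketch on synthetic descriptors:

```python
import numpy as np

def threshold_vocab_size(descriptors, tau):
    # Leader clustering: vocabulary size falls out of the threshold tau.
    centers = []
    for d in descriptors:
        d = d / np.linalg.norm(d)  # cosine similarity via unit vectors
        if not centers or max(float(c @ d) for c in centers) < tau:
            centers.append(d)      # no word is similar enough: new word
    return len(centers)

rng = np.random.default_rng(0)
desc = rng.normal(size=(500, 32))
loose = threshold_vocab_size(desc, tau=0.2)   # few, coarse words
strict = threshold_vocab_size(desc, tau=0.6)  # many, fine-grained words
```

A stricter threshold forces each word to cover only highly similar keypoints, so the vocabulary grows; the empirically found universal threshold then yields the optimal size for any sufficiently large dataset.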


Entropy ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. 419
Author(s):  
Zhenbing Zhao ◽  
Hongyu Qi ◽  
Xiaoqing Fan ◽  
Guozhi Xu ◽  
Yincheng Qi ◽  
...  

Deep convolutional neural networks (DCNNs) with alternating convolutional, pooling and decimation layers are widely used in computer vision, yet current work tends to focus on deeper networks with many layers and neurons, which carry a high computational complexity. Moreover, recognition remains challenging for objects, such as infrared insulators, whose appearances and training samples are insufficient and unrepresentative. In view of this, more attention has turned to applying pretrained networks for image feature representation, but rules on how to select the feature representation layer are scarce. In this paper, we propose two new concepts, layer entropy and relative layer entropy, which underpin an image representation method based on relative layer entropy (IRM_RLE) designed to identify the convolution layer best suited to image recognition. First, the image is fed into an ImageNet-pretrained DCNN model and deep convolutional activations are extracted. Then, the appropriate feature layer is selected by calculating the layer entropy and relative layer entropy of each convolution layer. Finally, feature maps of the chosen convolution layer are selected according to their importance degree, then vectorized and pooled by VLAD (vector of locally aggregated descriptors) coding and quantization for the final image representation. The experimental results show that the proposed approach performs competitively against previous methods across all datasets, and on the indoor scenes and actions datasets it outperforms the state-of-the-art methods.
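The paper's precise definitions of layer entropy and relative layer entropy are not reproduced in this abstract. One plausible reading, offered purely as a sketch, is the Shannon entropy of each feature map's activation histogram averaged over the layer, with the relative version taken as a difference between layers:

```python
import numpy as np

def layer_entropy(feature_maps, bins=16):
    # feature_maps: array of shape (C, H, W) from one convolution layer.
    entropies = []
    for fm in feature_maps:
        hist, _ = np.histogram(fm, bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]                  # skip empty bins (0 * log 0 = 0)
        entropies.append(float(-(p * np.log2(p)).sum()))
    return float(np.mean(entropies))

def relative_layer_entropy(layer_a, layer_b, bins=16):
    # Entropy of one layer relative to another.
    return layer_entropy(layer_a, bins) - layer_entropy(layer_b, bins)

# A constant map carries no information; a noisy map carries more.
flat = np.ones((1, 8, 8))
noisy = np.random.default_rng(0).normal(size=(1, 8, 8))
```

Under this reading, a layer whose activations are nearly constant contributes little discriminative information, which is the intuition behind using entropy to rank candidate feature layers.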


Sensors ◽  
2019 ◽  
Vol 19 (2) ◽  
pp. 291 ◽  
Author(s):  
Hamdi Sahloul ◽  
Shouhei Shirafuji ◽  
Jun Ota

Local image features are invariant to in-plane rotations and robust to minor viewpoint changes. However, current detectors and descriptors for local image features fail to accommodate out-of-plane rotations larger than 25°–30°. Invariance to such viewpoint changes is essential for numerous applications, including wide-baseline matching, 6D pose estimation, and object reconstruction. In this study, we present a general embedding that wraps a detector/descriptor pair in order to increase viewpoint invariance by exploiting input depth maps. The proposed embedding locates smooth surfaces within the input RGB-D images and projects them into a viewpoint-invariant representation, enabling the detection and description of more viewpoint-invariant features. Our embedding can be utilized with different combinations of detector/descriptor pairs, according to the desired application. Using synthetic and real-world objects, we evaluated the viewpoint invariance of various detectors and descriptors, for both the standalone and embedded approaches. While standalone local image features fail to accommodate average viewpoint changes beyond 33.3°, our proposed embedding boosted the viewpoint invariance to different levels, depending on the scene geometry. Objects with distinct surface discontinuities were on average invariant up to 52.8°, and the overall average across all evaluated datasets was 45.4°. Similarly, out of a total of 140 combinations involving 20 local image features and various objects with distinct surface discontinuities, only a single standalone local image feature exceeded the goal of 60° viewpoint difference, in just two combinations, compared with 19 different local image features succeeding in 73 combinations when wrapped in the proposed embedding. Furthermore, the proposed approach operates robustly in the presence of input depth noise, at and even beyond the noise levels of low-cost commodity depth sensors.
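The core geometric step, mapping a depth-derived surface back to a fronto-parallel frame before detection and description, can be illustrated on a toy planar patch. This is only an illustration of the idea, not the paper's pipeline; in the real system the surface normal would come from the input depth map:

```python
import numpy as np

def rotation_x(theta):
    # Rotation about the x-axis by theta radians.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

rng = np.random.default_rng(1)
# A flat patch on the z = 0 plane, seen under a 40° out-of-plane rotation.
patch = np.column_stack([rng.uniform(-1, 1, (50, 2)), np.zeros(50)])
observed = patch @ rotation_x(np.deg2rad(40)).T

# Estimate the plane normal as the direction of least variance (SVD).
centered = observed - observed.mean(axis=0)
normal = np.linalg.svd(centered)[2][-1]

# Rotate the normal onto the z-axis (Rodrigues' formula) so the patch
# becomes fronto-parallel; assumes the view is not already frontal.
z = np.array([0.0, 0.0, 1.0])
v, c = np.cross(normal, z), float(normal @ z)
K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
R_align = np.eye(3) + K + K @ K / (1.0 + c)
canonical = centered @ R_align.T  # z-coordinates collapse to ~0
```

Any 2D detector/descriptor run on the canonical view sees the same patch regardless of the original out-of-plane rotation, which is the source of the boosted viewpoint invariance reported above.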


Author(s):  
Siyuan Lu ◽  
Di Wu ◽  
Zheng Zhang ◽  
Shui-Hua Wang

The new coronavirus, COVID-19, has been spreading all over the world for the last six months, and the death toll is still rising. Accurate diagnosis of COVID-19 is therefore an urgent task in stopping the spread of the virus. In this paper, we propose to leverage image feature fusion for the diagnosis of COVID-19 in lung-window computed tomography (CT). Initially, ResNet-18 and ResNet-50 were selected as the backbone deep networks to generate image representations from the CT images. Second, the representative information extracted from the two networks was fused by discriminant correlation analysis to obtain refined image features. Third, three randomized neural networks (RNNs): extreme learning machine, Schmidt neural network and random vector functional-link net, were trained on the refined features, and the predictions of the three RNNs were ensembled for more robust classification performance. Experimental results based on five-fold cross-validation suggest that our method outperforms state-of-the-art algorithms in the diagnosis of COVID-19.
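Of the three randomized neural networks, the extreme learning machine has the simplest form: a random, untrained hidden layer followed by a closed-form least-squares output layer. A minimal sketch on synthetic data follows; the majority-vote ensemble of differently seeded models stands in for the paper's ensemble of three distinct network types:

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine for classification."""
    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        # Random input weights and biases are fixed, never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)     # random hidden features
        T = np.eye(int(y.max()) + 1)[y]      # one-hot targets
        # Ridge-regularized least squares for the output weights.
        self.beta = np.linalg.solve(
            H.T @ H + 1e-3 * np.eye(self.n_hidden), H.T @ T)
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta).argmax(axis=1)

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 20))        # stand-in for the fused CT features
y = (X[:, 0] > 0).astype(int)

models = [ELM(seed=s).fit(X, y) for s in range(3)]
votes = np.stack([m.predict(X) for m in models])
pred = (votes.mean(axis=0) > 0.5).astype(int)  # majority vote
```

Because training reduces to one linear solve, such networks are cheap to retrain under five-fold cross-validation, which fits the evaluation protocol described above.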


Author(s):  
P. Geethanjali

This chapter discusses the design and development of a surface Electromyogram (EMG) signal detection and conditioning system, along with the issue of unwanted spurious components, such as power-line interference and motion artifacts, that contaminate the signals. To recognize hand gestures from EMG signals, Time Domain (TD) as well as Autoregressive (AR) coefficient features are extracted. The extracted features are reduced using Principal Component Analysis (PCA) to alleviate the burden on the classifier. A four-channel continuous EMG signal conditioning system is developed, and EMG signals are acquired from 10 able-bodied subjects to classify 6 distinct movements of the hand and wrist. The reduced statistical TD and AR features are used to classify the signal patterns with a k Nearest Neighbour (kNN) classifier as well as a Neural Network (NN) classifier. Further, EMG signals acquired from a transradial amputee using an 8-channel system for the 6 amenable motions are also classified. Analysis of Variance (ANOVA) on the classification performance of the able-bodied subjects reveals that the TD-PCA features perform significantly better than the AR-PCA features, while no significant difference between the NN and kNN classifiers is observed with the reduced TD features. Since the kNN classifier achieves the lower average classification error with TD features, it is implemented off-line on the TMS2407eZdsp digital signal controller to study the actuation of three low-power DC drives in identifying the intended motion with an able-bodied subject.
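The TD-PCA-kNN pipeline the chapter evaluates can be sketched with a common set of time-domain features computed per channel window. The windows and labels below are synthetic stand-ins for the 4-channel recordings, and the zero-crossing and slope-sign-change dead-band thresholds used in practice are omitted for brevity:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def td_features(window):
    # Classic time-domain feature set for one channel window.
    mav = np.mean(np.abs(window))                         # mean absolute value
    wl = np.sum(np.abs(np.diff(window)))                  # waveform length
    zc = np.sum(np.diff(np.sign(window)) != 0)            # zero crossings
    ssc = np.sum(np.diff(np.sign(np.diff(window))) != 0)  # slope sign changes
    return [mav, wl, zc, ssc]

rng = np.random.default_rng(0)
windows = rng.normal(size=(60, 4, 256))  # 60 windows, 4 channels, 256 samples
y = np.repeat(np.arange(6), 10)          # 6 hand/wrist motions

# Concatenate the per-channel features into one vector per window.
X = np.array([[f for ch in w for f in td_features(ch)] for w in windows])

X_red = PCA(n_components=8).fit_transform(X)  # alleviate classifier burden
clf = KNeighborsClassifier(n_neighbors=5).fit(X_red, y)
```

The same structure applies to the AR features: only `td_features` would be replaced by an AR-coefficient estimator, with PCA and kNN unchanged, which is what makes the TD-vs-AR ANOVA comparison clean.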

