A New Approach for Large-Scale Scene Image Retrieval Based on Improved Parallel k-Means Algorithm in MapReduce Environment

2016 ◽  
Vol 2016 ◽  
pp. 1-17 ◽  
Author(s):  
Jianfang Cao ◽  
Min Wang ◽  
Hao Shi ◽  
Guohua Hu ◽  
Yun Tian

The rapid growth of digital images poses new challenges for traditional image retrieval technology. In this paper, we introduce a new approach to large-scale scene image retrieval that addresses the difficulty traditional retrieval methods have in processing massive image collections. First, we improve the traditional k-Means clustering algorithm by optimizing the selection of the initial cluster centers and the iteration procedure. Second, we present a parallel design and implementation of the improved k-Means algorithm and apply it to feature clustering of scene images. Finally, we put forward a storage and retrieval scheme for large-scale scene images that uses the large storage capacity and powerful parallel computing ability of the Hadoop distributed platform. The experimental results demonstrate that the proposed method achieves good performance. Compared with traditional single-node algorithms and the parallel k-Means algorithm, the proposed method has clear advantages for large-scale scene image retrieval in terms of retrieval accuracy, retrieval time overhead, and computational performance (speedup, efficiency, sizeup, and scaleup), which represents a significant improvement from applying parallel processing to intelligent algorithms on large-scale datasets.
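To make the parallel design concrete, here is a minimal sketch of one plain k-Means iteration written as a map step and a reduce step. It runs on a single machine and does not reproduce the authors' improved center selection or their Hadoop implementation; the function names `map_assign`, `reduce_mean`, and `kmeans_iteration` and the sample data are illustrative assumptions.

```python
from collections import defaultdict
from math import dist

def map_assign(point, centers):
    """Map step: emit (index of the nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, point

def reduce_mean(points):
    """Reduce step: average all points that share one center index."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def kmeans_iteration(points, centers):
    """One k-Means iteration expressed as map, shuffle, and reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_assign(p, centers)
        groups[key].append(value)  # the "shuffle": group values by key
    # Keep the old center when no point was assigned to it.
    return [reduce_mean(groups[i]) if groups[i] else centers[i]
            for i in range(len(centers))]

# Hypothetical 2-D feature vectors and starting centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
new_centers = kmeans_iteration(points, centers)
```

In a MapReduce job the map calls would run on separate data splits and each reduce call would receive the points for one center index, so only the small list of centers needs to be shared between iterations.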

Author(s):  
Jane You ◽  
Qin Li ◽  
Jinghua Wang

This paper presents a new approach to content-based image retrieval by using dynamic indexing and guided search in a hierarchical structure, and extending data mining and data warehousing techniques. The proposed algorithms include a wavelet-based scheme for multiple image feature extraction, the extension of a conventional data warehouse and an image database to an image data warehouse for dynamic image indexing. It also provides an image data schema for hierarchical image representation and dynamic image indexing, a statistically based feature selection scheme to achieve flexible similarity measures, and a feature component code to facilitate query processing and guide the search for the best matching. A series of case studies are reported, which include a wavelet-based image color hierarchy, classification of satellite images, tropical cyclone pattern recognition, and personal identification using multi-level palmprint and face features. Experimental results confirm that the new approach is feasible for content-based image retrieval.


2019 ◽  
Vol 5 (3) ◽  
pp. 33 ◽  
Author(s):  
Pouria Sadeghi-Tehran ◽  
Plamen Angelov ◽  
Nicolas Virlet ◽  
Malcolm Hawkesford

Digitalisation has opened a wealth of new data opportunities by revolutionizing how images are captured. Although the cost of data generation is no longer a major concern, the data management and processing have become a bottleneck. Any successful visual trait system requires automated data structuring and a data retrieval model to manage, search, and retrieve unstructured and complex image data. This paper investigates a highly scalable and computationally efficient image retrieval system for real-time content-based searching through large-scale image repositories in the domain of remote sensing and plant biology. Images are processed independently without considering any relevant context between sub-sets of images. We utilize a deep Convolutional Neural Network (CNN) model as a feature extractor to derive deep feature representations from the imaging data. In addition, we propose an effective scheme to optimize data structure that can facilitate faster querying at search time based on the hierarchically nested structure and recursive similarity measurements. A thorough series of tests were carried out for plant identification and high-resolution remote sensing data to evaluate the accuracy and the computational efficiency of the proposed approach against other content-based image retrieval (CBIR) techniques, such as the bag of visual words (BOVW) and multiple feature fusion techniques. The results demonstrate that the proposed scheme is effective and considerably faster than conventional indexing structures.


2021 ◽  
Vol 12 (1) ◽  
pp. 77-94
Author(s):  
Yanxia Jin ◽  
Xin Zhang ◽  
Yao Jia

In image retrieval, a major challenge is that the number of images in the gallery is large and irregular, which results in low retrieval accuracy. This paper analyzes the disadvantages of the PAM (partitioning around medoids) clustering algorithm in image data classification, in particular the excessive time it spends searching for cluster representative objects. A fireworks particle swarm algorithm is used in the optimization process, and PF-PAM, an improved PAM algorithm, is proposed and applied to image retrieval. First, the feature vectors of the images in the gallery are extracted for an initial clustering. Next, based on the clustering results, the optimal cluster centers are searched for with the fireworks particle swarm algorithm to obtain the final clustering result. Finally, for an incoming query image, the related image category is determined and similar images are returned. Experimental comparison with other approaches shows that this method can effectively improve retrieval accuracy.
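The cost of searching for representative objects can be seen in a minimal sketch of an exhaustive medoid search for one cluster. This is a simplified per-cluster update and not the full PAM swap procedure, and it is not the fireworks particle swarm search of PF-PAM; the function name `medoid` and the sample points are assumptions made for illustration.

```python
from math import dist

def medoid(cluster):
    """Return the member with the smallest total distance to all members.

    Every candidate is compared with every other member, so the cost
    grows quadratically with the cluster size.
    """
    return min(cluster, key=lambda c: sum(dist(c, p) for p in cluster))

# Hypothetical 2-D feature vectors belonging to one cluster.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
best = medoid(cluster)
```

Unlike a mean, the medoid is always an actual member of the cluster, which is why the outlier at `(10.0, 0.0)` shifts it less than it would shift a centroid.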


2018 ◽  
Vol 8 (10) ◽  
pp. 1869 ◽  
Author(s):  
Saman Riaz ◽  
Ali Arshad ◽  
Licheng Jiao

Deep learning has been well known for several years, and it shows great promise for unsupervised learning of representations combined with clustering algorithms. Variants of the Convolutional Neural Network (CNN) are now state of the art for many recognition and clustering tasks. However, with the continual growth in the number of digital images, there are more and more redundant, irrelevant, and noisy samples, which gradually slow CNN processing and reduce its clustering accuracy. To overcome these issues, we propose an effective clustering method for large-scale image datasets that combines a CNN with a Fuzzy-Rough C-Mean (FRCM) clustering algorithm. The main idea is that a high-level representation, learned by the multiple layers of a CNN with one clustering layer, first produces the initial cluster centers; then, during training, the image clusters and representations are updated jointly. FRCM is used to update the cluster centers in the forward pass, while the parameters of the proposed CNN are updated in the backward pass using Stochastic Gradient Descent (SGD). The rough-set concepts of lower and boundary approximations deal with uncertainty, vagueness, and incompleteness in cluster definition, and fuzzy sets enable efficient handling of overlapping partitions in a noisy environment. The experimental results show that the proposed FRCM-based unsupervised CNN clustering method outperforms standard K-Mean, Fuzzy C-Mean, and FRCM, as well as other deep-learning-based clustering algorithms, on large-scale image data.
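As background for the fuzzy part of the method, the following sketch computes the membership degrees of standard Fuzzy C-Mean, one of the baselines named above. It is not the FRCM update used in the paper, which adds rough-set lower and boundary approximations; the function name `fcm_memberships`, the fuzzifier default `m=2.0`, and the sample data are assumptions.

```python
from math import dist

def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with fuzzifier m > 1.
    """
    d = [dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if 0.0 in d:
        hits = d.count(0.0)
        return [1.0 / hits if x == 0.0 else 0.0 for x in d]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in d) for di in d]

# Hypothetical feature vector and two cluster centers.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

The memberships sum to one, and the point is assigned mostly, but not entirely, to the nearer center, which is how fuzzy sets represent overlapping partitions.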


2013 ◽  
Vol 5 (3) ◽  
pp. 604-613
Author(s):  
Asmita Bhaskar Shirsath ◽  
M. J. Chouhan ◽  
N. J Uke

Research on content-based image retrieval has gained tremendous momentum during the last decade. Color, texture, and shape information have been the primitive image descriptors in content-based image retrieval systems. To retrieve results faster from a large-scale image database, we propose an image retrieval system in which the image database is first pre-processed with a Wavelet Based Color Histogram (WBCH) and the K-means algorithm; the result is then indexed with a hierarchical clustering algorithm, and images are retrieved from the pre-processed database using similarity measures. Experiments show that the proposed method offers a substantial increase in retrieval speed, but its retrieval results still need to be improved.
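The speed gain from pre-clustering can be illustrated with a minimal sketch in which a query is compared only with the images of its nearest cluster. The wavelet step and the hierarchical index are omitted, Euclidean distance stands in for the unspecified similarity measure, and the function name `retrieve`, the file names, and the 3-bin histograms are hypothetical.

```python
from math import dist

def retrieve(query, centers, clusters, top_k=2):
    """Search only the cluster whose center is nearest to the query."""
    nearest = min(range(len(centers)), key=lambda i: dist(query, centers[i]))
    ranked = sorted(clusters[nearest], key=lambda item: dist(query, item[1]))
    return [name for name, _ in ranked[:top_k]]

# Hypothetical 3-bin color histograms, already grouped by a clustering step.
centers = [(0.8, 0.1, 0.1), (0.1, 0.1, 0.8)]
clusters = [
    [("sunset.jpg", (0.9, 0.05, 0.05)), ("rose.jpg", (0.7, 0.2, 0.1))],
    [("sea.jpg", (0.1, 0.1, 0.8)), ("sky.jpg", (0.2, 0.1, 0.7))],
]
result = retrieve((0.12, 0.1, 0.78), centers, clusters, top_k=1)
```

Skipping the other clusters is what makes the search faster, and it is also one way relevant images can be missed when they fall just across a cluster boundary.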


Author(s):  
Li Yan ◽  
Z. M. Ma

Fuzzy set theory has been extensively applied to the representation and processing of imprecise and uncertain data. Image data is becoming an important data resource with rapid growth in the number of large-scale image repositories. However, image data is fuzzy in nature, and imprecision and vagueness may exist in both image descriptions and query specifications. This chapter reviews major work in the literature on image retrieval with fuzzy logic, including fuzzy content-based image retrieval and database support for fuzzy image retrieval. For fuzzy content-based image retrieval, we present how fuzzy sets are applied to the extraction and representation of visual features (colors, shapes, textures), similarity measures and indexing, relevance feedback, and retrieval systems. For fuzzy image database retrieval, we present how fuzzy sets are applied to fuzzy image query processing based on defined database models, and how various fuzzy database models can support image data management.


Author(s):  
Yueshun He ◽  
Wei Zhang ◽  
Ping Du ◽  
Qiaohe Yang

Because of their complicated data structure, images can present rich information, and so they are widely applied in different fields. Although images offer a lot of convenience, handling such data consumes much time and multi-dimensional space. The disadvantage is especially obvious when users need to retrieve images from larger-scale image datasets. To retrieve larger-scale image data effectively, a scene image retrieval strategy based on the MapReduce parallel programming model is proposed. The strategy first investigates how to store large-scale scene images effectively under a Hadoop cluster parallel processing architecture. Second, a distributed feature clustering algorithm, MeanShift, is introduced to cluster the emotional features of scene images. Finally, several experiments are conducted to verify the effectiveness and efficiency of the proposed strategy in terms of retrieval accuracy, speedup ratio, efficiency, and data scalability.
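For readers unfamiliar with MeanShift, here is a minimal single-machine sketch of the basic procedure with a flat kernel: a point is moved repeatedly to the mean of its neighbors until it stops moving. It does not show the distributed MapReduce version described in the abstract; the function name `mean_shift_point`, the bandwidth, and the sample data are assumptions.

```python
from math import dist

def mean_shift_point(start, points, bandwidth, max_iter=100, tol=1e-6):
    """Move one point to a local density mode using a flat kernel."""
    current = start
    for _ in range(max_iter):
        # Neighbors inside the bandwidth window around the current position.
        window = [p for p in points if dist(p, current) <= bandwidth]
        if not window:
            break
        shifted = tuple(sum(c) / len(window) for c in zip(*window))
        if dist(shifted, current) < tol:
            return shifted
        current = shifted
    return current

# Hypothetical 2-D feature vectors forming two separate groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (8.0, 8.0), (9.0, 9.0)]
mode = mean_shift_point((0.0, 0.0), points, bandwidth=2.0)
```

Points that converge to the same mode form one cluster, so the number of clusters does not have to be fixed in advance as it does with k-Means.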


Author(s):  
Dasong Sun

Complex networks depict the relationships among individuals in a population. Analyzing the interactions of individuals within different groups or clusters helps to mine the characteristics of complex networks in depth and to predict potential collaboration between individuals. However, existing algorithms have high complexity and cost much computational time. In this paper, an efficient graph clustering algorithm based on spectral coarsening is proposed to deal with the large time complexity of the traditional spectral algorithm. We first find subsets of nodes that most likely belong to the same cluster in the original network and merge each of them into a single node. The scale of the network decreases as the network is coarsened. Then, the spectral clustering algorithm is performed on the coarsened network, which maintains its advantages while improving time efficiency. Finally, experimental results on multiple datasets demonstrate that the proposed algorithm has superior performance compared with current state-of-the-art methods.

