Inverted indexing in image fragment retrieval using huge keypoint-based vocabularies

An approach for document retrieval using cluster-based inverted indexing

Journal of Information Science ◽

10.1177/01655515211018401 ◽

2021 ◽

pp. 016555152110184

Author(s):

Gunjan Chandwani ◽

Anil Ahlawat ◽

Gaurav Dubey

Keyword(s):

High Performance ◽

Clustering Algorithm ◽

Pearson Correlation ◽

Relevant Information ◽

Document Retrieval ◽

Bhattacharyya Distance ◽

Data Set ◽

Query Matching ◽

Inverted Indexing ◽

Query Optimisation

Document retrieval plays an important role in knowledge management because it helps us discover relevant information in existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, pre-processing removes unnecessary and redundant words from the documents. Then, the documents are indexed by the cluster-based inverted indexing algorithm, which is developed by integrating the piecewise fuzzy C-means (piFCM) clustering algorithm with inverted indexing. Once the documents are indexed, user queries are matched using the Bhattacharyya distance. Finally, query optimisation is performed with the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is analysed on the WebKB and Twenty Newsgroups data sets. The analysis shows that the proposed algorithm offers high performance, with a precision of 1, a recall of 0.70 and an F-measure of 0.8235. The proposed document retrieval system retrieves the most relevant documents and speeds up the storing and retrieval of information.
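The indexing and query-matching steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the piFCM clustering step and the Pearson-based query optimisation are left out, whitespace tokenisation stands in for the pre-processing, and all function names are hypothetical. It only shows a plain inverted index whose candidates are ranked by the Bhattacharyya distance between term-frequency distributions, plus the standard F-measure formula applied to the reported precision and recall.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def term_distribution(text):
    """Normalised term-frequency distribution of a text (assumed representation)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def bhattacharyya_distance(p, q):
    """D_B = -ln(sum_t sqrt(p_t * q_t)); infinite when the supports do not overlap."""
    coefficient = sum(math.sqrt(p[t] * q[t]) for t in p.keys() & q.keys())
    coefficient = min(1.0, coefficient)  # guard against floating-point overshoot
    return math.inf if coefficient == 0 else -math.log(coefficient)


def search(query, docs, index):
    """Rank documents that share at least one query term, closest first."""
    q = term_distribution(query)
    candidates = set().union(*(index.get(term, set()) for term in q))
    scored = [(bhattacharyya_distance(q, term_distribution(docs[d])), d)
              for d in candidates]
    return [doc_id for _, doc_id in sorted(scored)]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


docs = {
    "d1": "inverted index for document retrieval",
    "d2": "fuzzy clustering of documents",
    "d3": "image retrieval with visual words",
}
index = build_inverted_index(docs)
print(search("document retrieval", docs, index))
print(round(f_measure(1.0, 0.70), 4))
```

The inverted index restricts distance computation to documents that share a term with the query, which is where the speed-up of such a scheme comes from. The F-measure helper reproduces the reported 0.8235 from a precision of 1 and a recall of 0.70.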


Snaptron: querying and visualizing splicing across tens of thousands of RNA-seq samples

10.1101/097881 ◽

2017 ◽

Cited By ~ 2

Author(s):

Christopher Wilks ◽

Phani Gaddipati ◽

Abhinav Nellore ◽

Ben Langmead

Keyword(s):

Tissue Specificity ◽

Rna Seq ◽

Sequencing Data ◽

Transcription Start ◽

Link Type ◽

Alternative Transcription ◽

Web App ◽

Inverted Indexing ◽

Splice Junctions ◽

Splicing Patterns

As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70,000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can also rank and score junctions according to tissue specificity or other criteria. Further, Snaptron can rank and score samples according to the relative frequency of different splicing patterns. We outline biological questions that can be explored with Snaptron queries, including a study of novel exons in annotated genes, of exonization of repetitive element loci, and of a recently discovered alternative transcription start site for the ALK gene. Web app and documentation are at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron under the MIT license.


A Combination of Spatial Pyramid and Inverted Index for Large-Scale Image Retrieval

Computer Vision ◽

10.4018/978-1-5225-5204-8.ch054 ◽

2018 ◽

pp. 1307-1321

Author(s):

Vinh-Tiep Nguyen ◽

Thanh Duc Ngo ◽

Minh-Triet Tran ◽

Duy-Dinh Le ◽

Duc Anh Duong

Keyword(s):

Image Retrieval ◽

Large Scale ◽

Spatial Information ◽

Real Life ◽

Inverted Index ◽

Bag Of Words ◽

Visual Words ◽

Benchmark Datasets ◽

Large Scale Image Retrieval ◽

Inverted Indexing

Large-scale image retrieval has shown remarkable potential in real-life applications. The standard approach is based on Inverted Indexing, given that images are represented using the Bag-of-Words model. However, one major limitation of both the Inverted Index and the Bag-of-Words representation is that they ignore the spatial information of visual words in image representation and comparison. As a result, retrieval accuracy is decreased. In this paper, the authors investigate an approach to integrate spatial information into the Inverted Index to improve accuracy while maintaining short retrieval time. Experiments conducted on several benchmark datasets (Oxford Building 5K, Oxford Building 5K+100K and Paris 6K) demonstrate the effectiveness of the proposed approach.
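One generic way to carry spatial information inside an inverted index can be sketched as follows. This is an illustrative assumption, not the authors' method: each visual word is indexed once per pyramid level together with the grid cell it falls in, and matches in finer cells receive a higher weight. The key layout, the weights and all names are hypothetical, and keypoint coordinates are assumed to be normalised to [0, 1].

```python
from collections import defaultdict


def pyramid_keys(word, x, y, levels=2):
    """Yield (word, level, cell) keys; level l splits the image into 2^l x 2^l cells."""
    for level in range(levels):
        n = 2 ** level
        col = min(int(x * n), n - 1)  # clamp so that x == 1.0 stays in the last cell
        row = min(int(y * n), n - 1)
        yield (word, level, row * n + col)


def build_index(images, levels=2):
    """Inverted index from (word, level, cell) to the ids of images containing it."""
    index = defaultdict(set)
    for image_id, features in images.items():
        for word, x, y in features:
            for key in pyramid_keys(word, x, y, levels):
                index[key].add(image_id)
    return index


def query(index, features, levels=2):
    """Rank images by weighted key matches; a match at level l counts 2^l."""
    scores = defaultdict(float)
    seen = set()
    for word, x, y in features:
        for key in pyramid_keys(word, x, y, levels):
            if key in seen:
                continue  # count each distinct key once per query
            seen.add(key)
            for image_id in index.get(key, ()):
                scores[image_id] += 2.0 ** key[1]
    return sorted(scores, key=lambda image_id: (-scores[image_id], image_id))


# Both images contain visual words 7 and 9, but in opposite corners.
images = {
    "a": [(7, 0.1, 0.1), (9, 0.8, 0.8)],
    "b": [(7, 0.9, 0.9), (9, 0.2, 0.2)],
}
index = build_index(images)
print(query(index, [(7, 0.15, 0.2), (9, 0.7, 0.9)]))
```

In this toy example a plain Bag-of-Words index cannot separate the two images because they share the same visual words. The level-1 cells break the tie in favour of the image whose layout agrees with the query.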


A Combination of Spatial Pyramid and Inverted Index for Large-Scale Image Retrieval

International Journal of Multimedia Data Engineering and Management ◽

10.4018/ijmdem.2015040103 ◽

2015 ◽

Vol 6 (2) ◽

pp. 37-51 ◽

Cited By ~ 2

Author(s):

Vinh-Tiep Nguyen ◽

Thanh Duc Ngo ◽

Minh-Triet Tran ◽

Duy-Dinh Le ◽

Duc Anh Duong

Keyword(s):

Image Retrieval ◽

Large Scale ◽

Spatial Information ◽

Real Life ◽

Inverted Index ◽

Bag Of Words ◽

Visual Words ◽

Benchmark Datasets ◽

Large Scale Image Retrieval ◽

Inverted Indexing

Large-scale image retrieval has shown remarkable potential in real-life applications. The standard approach is based on Inverted Indexing, given that images are represented using the Bag-of-Words model. However, one major limitation of both the Inverted Index and the Bag-of-Words representation is that they ignore the spatial information of visual words in image representation and comparison. As a result, retrieval accuracy is decreased. In this paper, the authors investigate an approach to integrate spatial information into the Inverted Index to improve accuracy while maintaining short retrieval time. Experiments conducted on several benchmark datasets (Oxford Building 5K, Oxford Building 5K+100K and Paris 6K) demonstrate the effectiveness of the proposed approach.


An effective image retrieval framework in invariant feature space merging GeoSOM with modified inverted indexing

Multimedia Tools and Applications ◽

10.1007/s11042-019-7355-4 ◽

2019 ◽

Vol 78 (14) ◽

pp. 19961-19977

Author(s):

S. Priyanka ◽

M. S. Sudhakar

Keyword(s):

Image Retrieval ◽

Feature Space ◽

Invariant Feature ◽

Inverted Indexing


Framework for proficient proof of identity of duplicate and near-duplicate images and image distances using high-disguisable image fragment

2018 Fifth International Conference on Parallel, Distributed and Grid Computing (PDGC) ◽

10.1109/pdgc.2018.8745792 ◽

2018 ◽

Cited By ~ 1

Author(s):

V.A. Narayana ◽

Sreevani Gaddameedhi ◽

Vijaya Kumar Koppula ◽

K. Srujan Raju

Keyword(s):

Image Fragment


Retouching and restoration of missing image fragments by means of the iterative calculation of their spectra

Computer Optics ◽

10.18287/2412-6179-2019-43-6-1030-1040 ◽

2019 ◽

Vol 43 (6) ◽

pp. 1030-1040 ◽

Cited By ~ 5

Author(s):

A.V. Kokoshkin ◽

V.A. Korotkov ◽

K.V. Korotkov ◽

E.P. Novichikhin

Keyword(s):

Image Processing ◽

Human Factor ◽

Fourier Spectrum ◽

Interpolation Method ◽

Virtual Image ◽

Iterative Calculation ◽

Image Fragment ◽

Source Data ◽

Scientific Approach

This paper discusses the use of the interpolation method for the sequential calculation of the Fourier spectrum (IMSCS) for retouching and restoration of missing (shaded) image fragments. The proposed approach can be used with any form of a missing image fragment. Such image processing can give good results even at a significantly high percentage of missing image fragments. The method of digital virtual image reconstruction proposed here is strictly based on a scientific approach; as the source data, it uses all the data available (the image itself is the object to be recovered). Therefore, it is free from the human factor, because of which subjective changes can be introduced in the image under processing. The results presented indicate a significant increase in the quality of digital images (increasing the information content), which can offer helpful auxiliary tools for professionals using these images for their practical purposes.


Efficient set intersection for inverted indexing

ACM Transactions on Information Systems ◽

10.1145/1877766.1877767 ◽

2010 ◽

Vol 29 (1) ◽

pp. 1-25 ◽

Cited By ~ 66

Author(s):

J. Shane Culpepper ◽

Alistair Moffat

Keyword(s):

Efficient Set ◽

Set Intersection ◽

Inverted Indexing


A way to stop the process for searching for an image fragment by analyzing the estimation convergence of position parameters

Pattern Recognition and Image Analysis ◽

10.1134/s1054661811020477 ◽

2011 ◽

Vol 21 (2) ◽

pp. 262-266 ◽

Cited By ~ 3

Author(s):

I. N. Kaveev ◽

A. G. Tashlinskii ◽

R. M. Kurbanaliev

Keyword(s):

Image Fragment


Inverted Indexing for Information Retrieval from Motifs and Domains of Proteins

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8044.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 63-68

Keyword(s):

Information Retrieval ◽

Similarity Coefficient ◽

Huge Amount ◽

Space Model ◽

Hybrid Approaches ◽

Recent Advancement ◽

Stock Pile ◽

Document Vector ◽

Inverted Indexing ◽

Entire Dataset

Recent advances in technology are generating huge amounts of data, and the extraction of information from it is being outpaced by data accumulation. The development of hybrid approaches that combine different algorithms to extract the required information from this stockpile of data is the need of the hour. One such algorithm is the vector space model for inverted indexing, which has traditionally been used for search engine indexing. In bioinformatics it has also been used for the assembly of DNA fragments generated by sequencing. However, it has not been applied to retrieving protein sequences relevant to a query based on the presence or absence of motifs and domains. In this paper, the concept of inverted indexing is applied to the small motif/domain data of proteins contained in the Motivated Proteins database at http://motif.gla.ac.uk/motif/index.html. The index was built using 17 small hydrogen-bonded motifs present in a dataset of 430 proteins. The entire dataset of 430 proteins was divided into 19 classes. Seven classes, for example cyanovirin, antibiotic and concanavalin, had very few instances (1 or 2) and were therefore omitted from further study. The remaining 12 classes, with more than 10 proteins each, were used to test the information retrieval (IR) strategy. The document vectors of all the proteins belonging to one class were averaged, and 12 queries with averaged vectors were prepared for testing. The similarity coefficient (SC) was then computed between each query and all the proteins of the dataset. This approach successfully classified each query as belonging to the class from which it was derived. To further validate the importance of the document vector as a novel attribute for classification, the entire dataset of document vectors was clustered into ten (10) clusters. Testing was then performed using the similarity coefficient (SC) between each query and the clusters obtained. The allocation of clusters to the 12 query sequences followed the same pattern as the relevant-document search using the inverted indexing approach. However, clustering allocated the queries to only four (4) classes. The largest number of query proteins (7 proteins, or 58%) belonged to cluster 5.
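The retrieval strategy described in this abstract (class-averaged document vectors used as queries against a motif index) can be sketched roughly as below. Several things are assumptions made for illustration: the abstract does not define the similarity coefficient, so cosine similarity is used; the motif names, protein ids and class labels are placeholders rather than entries from the Motivated Proteins database; and binary presence/absence vectors stand in for whatever weighting the authors used.

```python
import math
from collections import defaultdict


def build_inverted_index(proteins):
    """Map each motif to the set of protein ids in which it occurs."""
    index = defaultdict(set)
    for protein_id, motifs in proteins.items():
        for motif in motifs:
            index[motif].add(protein_id)
    return index


def to_vector(motifs, vocabulary):
    """Binary document vector: 1.0 where the motif is present, else 0.0."""
    return [1.0 if motif in motifs else 0.0 for motif in vocabulary]


def average(vectors):
    """Component-wise mean of equally long vectors (the class query)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, standing in for the similarity coefficient (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(query_vec, proteins, index, vocabulary):
    """Return the id of the protein most similar to the query vector."""
    # The inverted index limits scoring to proteins sharing a motif with the query.
    candidates = set()
    for motif, weight in zip(vocabulary, query_vec):
        if weight > 0:
            candidates |= index.get(motif, set())
    return max(sorted(candidates),
               key=lambda pid: cosine(query_vec, to_vector(proteins[pid], vocabulary)))


# Placeholder data: four proteins in two classes.
proteins = {"p1": {"a", "b"}, "p2": {"a", "b", "c"}, "p3": {"d"}, "p4": {"d", "e"}}
labels = {"p1": "X", "p2": "X", "p3": "Y", "p4": "Y"}
vocabulary = sorted({m for motifs in proteins.values() for m in motifs})
index = build_inverted_index(proteins)

for cls in ("X", "Y"):
    members = [to_vector(proteins[p], vocabulary) for p in proteins if labels[p] == cls]
    hit = best_match(average(members), proteins, index, vocabulary)
    print(cls, "->", labels[hit])
```

On this toy data each averaged query retrieves a protein from its own class, which mirrors the behaviour the abstract reports for the 12 class queries.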
