Analyzing topic drift in query expansion for Information Retrieval from a large-scale patent DataBase

Cross Language Query Expansion Approach for CIMS Based on Weighted D-S Evidence Theory

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.620.534 ◽

2014 ◽

Vol 620 ◽

pp. 534-543

Author(s):

Xiao Bo Wang ◽

Fan Zhao ◽

Xiao Li ◽

Rong Hui Zhang

Keyword(s):

Information Technology ◽

Information Retrieval ◽

Large Scale ◽

Query Expansion ◽

Semantic Analysis ◽

Rapid Development ◽

Evidence Theory ◽

Computer Integrated Manufacturing ◽

Shafer Theory ◽

Cross Language

With the rapid development of Computer Integrated Manufacturing Systems (CIMS) and information technology, rapid multilingual retrieval has become one of the hot spots in machine translation. Cross-language information retrieval (CLIR) offers a convenient way for users to submit queries in a language they are familiar with and retrieve documents in another language. Query expansion is one of the effective methods for improving recall in information retrieval. Researchers have proposed many expansion methods, but most simply add the expansion terms to the query. If the original query words and the expansion words are not distinguished, the expanded query may deviate from the original semantics, which is inconvenient for mechanical engineers and programmers. Based on the Dempster-Shafer theory of evidence, we propose a query expansion model that treats the original query terms as the main evidence and the expansion terms as secondary evidence. The method uses a Han semantic dictionary and a Uygur-Chinese bilingual dictionary of the synonyms forest to obtain synonyms, near-synonyms and hypernyms of the query words. Latent Semantic Analysis is used to obtain words semantically related to the query words from large-scale text. The two types of evidence are combined using a weighted Dempster-Shafer combination rule. Experimental results show that this method can effectively improve retrieval efficiency in Mechanical Engineering and Information Technology. The research results can provide a reference for rapid multilingual retrieval in CIMS.
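The abstract does not spell out how the weighting enters the combination rule. As a rough illustration only, the sketch below applies Shafer discounting to each source before combining them with the standard Dempster rule; the discount weights (0.9 and 0.5), the two-element frame, and the mass values are assumptions made for this example, not figures from the paper.

```python
from itertools import product


def discount(mass, weight, frame):
    """Shafer discounting: scale every mass by `weight`, move the remainder to the whole frame."""
    out = {focal: weight * m for focal, m in mass.items() if focal != frame}
    out[frame] = weight * mass.get(frame, 0.0) + (1.0 - weight)
    return out


def combine(m1, m2):
    """Dempster's rule: multiply masses, keep non-empty intersections, renormalise by 1 - conflict."""
    joint = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    return {focal: m / (1.0 - conflict) for focal, m in joint.items()}


# Hypothetical frame of discernment for one document.
FRAME = frozenset({"relevant", "irrelevant"})
REL = frozenset({"relevant"})
IRR = frozenset({"irrelevant"})

# Illustrative masses: evidence from the original query terms and from the expansion terms.
original = {REL: 0.7, FRAME: 0.3}
expansion = {REL: 0.4, IRR: 0.2, FRAME: 0.4}

# Assumed weights: the original terms are trusted more than the expansion terms.
combined = combine(discount(original, 0.9, FRAME), discount(expansion, 0.5, FRAME))
```

Discounting the expansion evidence more heavily keeps it from overriding the original query, which is one way to express the "main evidence versus secondary evidence" distinction the abstract describes.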

Neural methods for effective, efficient, and exposure-aware information retrieval

ACM SIGIR Forum ◽

10.1145/3476415.3476434 ◽

2021 ◽

Vol 55 (1) ◽

pp. 1-2

Author(s):

Bhaskar Mitra

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Large Scale ◽

Web Search ◽

Real Life ◽

Inverted Index ◽

Information Need ◽

Product Model ◽

Performance Improvements ◽

Deep Model

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents (or short passages) in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms not seen during training, such as a person's name or a product model number, and avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
Our key contribution towards improving the effectiveness of deep ranking models is developing the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any arbitrary deep model, which enables large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
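The query term independence idea can be sketched in a few lines: if a document's score is the sum of per-term scores, those per-term scores can be computed offline and stored in an inverted index. In the sketch below, `term_doc_score` is a hypothetical stand-in (a log-scaled term count) for the learned model, and the three documents are invented for illustration. A learned scorer could also assign non-zero scores to documents that do not contain the term, which this toy version does not attempt.

```python
import math
from collections import defaultdict


def term_doc_score(term, doc_tokens):
    """Hypothetical stand-in for a learned model that scores one term against one document."""
    return math.log1p(doc_tokens.count(term))


def build_index(docs):
    """Offline step: precompute term -> {doc_id: score} postings."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for term in set(tokens):
            index[term][doc_id] = term_doc_score(term, tokens)
    return index


def search(index, query, k=3):
    """Online step: sum the precomputed per-term scores, touching only the query terms' postings."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, score in index.get(term, {}).items():
            scores[doc_id] += score
    # Sort by descending score, breaking ties by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


docs = {
    "d1": "inverted index structures for fast retrieval",
    "d2": "neural ranking models for web search",
    "d3": "inverted index compression and index maintenance",
}
index = build_index(docs)
results = search(index, "inverted index")
```

Because each query term contributes independently, query-time work is limited to looking up and adding postings, which is what makes precomputation over a large collection practical.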

Large scale disk-based metric indexing structure for approximate information retrieval by content

Proceedings of the 1st Workshop on New Trends in Similarity Search - NTSS '11 ◽

10.1145/1966865.1966869 ◽

2011 ◽

Cited By ~ 2

Author(s):

Stanislav Barton ◽

Valerie Gouet-Brunet ◽

Marta Rukoz

Keyword(s):

Information Retrieval ◽

Large Scale ◽

Indexing Structure

Workshop on large-scale distributed systems for information retrieval

ACM SIGIR Forum ◽

10.1145/1328964.1328979 ◽

2007 ◽

Vol 41 (2) ◽

pp. 83-88

Author(s):

Flavio P. Junqueira ◽

Vassilis Plachouras ◽

Fabrizio Silvestri ◽

Ivana Podnar

Keyword(s):

Information Retrieval ◽

Distributed Systems ◽

Large Scale

Biodiversity Information Retrieval Through Large Scale Content-Based Identification: A Long-Term Evaluation

Information Retrieval Evaluation in a Changing World - The Information Retrieval Series ◽

10.1007/978-3-030-22948-1_16 ◽

2019 ◽

pp. 389-413

Author(s):

Alexis Joly ◽

Hervé Goëau ◽

Hervé Glotin ◽

Concetto Spampinato ◽

Pierre Bonnet ◽

...

Keyword(s):

Information Retrieval ◽

Large Scale ◽

Term Evaluation ◽

Biodiversity Information

A hybrid semantic query expansion approach for Arabic information retrieval

Journal Of Big Data ◽

10.1186/s40537-020-00310-z ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Hiba ALMarwi ◽

Mossa Ghurab ◽

Ibrahim Al-Baltah

Keyword(s):

Information Retrieval ◽

Query Expansion ◽

Semantic Query ◽

Arabic Information Retrieval

A method of query expansion based on topic models and user profile for search in folksonomy

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210508 ◽

2021 ◽

pp. 1-11

Author(s):

Zhinan Gou ◽

Yan Li

Keyword(s):

Information Retrieval ◽

Query Expansion ◽

Information Overload ◽

Topic Model ◽

User Profile ◽

Expansion Method ◽

Collaborative Tagging ◽

Search Query ◽

Tagging System ◽

The Web

With the development of Web 2.0 communities, information retrieval based on collaborative tagging systems has been widely applied. However, users often issue brief queries of only one or two keywords, which leads to problems such as inaccurate query words, information overload and information disorientation. Query expansion addresses this issue by reformulating each search query with additional words. By analyzing the limitations of existing query expansion methods in folksonomy, this paper proposes a novel query expansion method, based on user profile and topic model, for search in folksonomy. In detail, the topic model is first constructed by a variational autoencoder with Word2Vec. Then, query expansion is conducted using the user profile and the topic model. Finally, the proposed method is evaluated on a real dataset. Evaluation results show that the proposed method outperforms the baseline methods.

A Data-Driven Text Mining and Semantic Network Analysis for Design Information Retrieval

Journal of Mechanical Design ◽

10.1115/1.4037649 ◽

2017 ◽

Vol 139 (11) ◽

Cited By ~ 24

Author(s):

Feng Shi ◽

Liuqing Chen ◽

Ji Han ◽

Peter Childs

Keyword(s):

Information Retrieval ◽

Network Analysis ◽

Text Mining ◽

Engineering Design ◽

Large Scale ◽

Semantic Network ◽

Document Retrieval ◽

Design Information ◽

Improve Design ◽

Correlation Degree

With the advent of the big-data era, the massive information stored in electronic and digital forms on the internet has become a valuable resource for knowledge discovery in engineering design. The traditional document retrieval method based on document indexing focuses on retrieving individual documents related to the query, but is incapable of discovering the various associations between individual knowledge concepts. Ontology-based technologies, which can extract the inherent relationships between concepts by using advanced text mining tools, can be applied to improve design information retrieval in large-scale unstructured textual data environments. However, few publicly available ontology databases take a design and engineering perspective in establishing the relations between knowledge concepts. This paper develops a “WordNet” focusing on design and engineering associations by integrating text mining approaches to construct an unsupervised learning ontology network. Subsequent probability and velocity network analyses are applied with different statistical behaviors to evaluate the correlation degree between concepts for design information retrieval. The validation results show that the probability and velocity analysis on the constructed ontology network can help recognize highly related complex design and engineering associations between elements. Finally, an engineering design case study demonstrates the use of the constructed semantic network in a real-world project for design relations retrieval.

Reliability of a Distributed Search Engine for Fresh Information Retrieval in Large-Scale Intranet

New Horizons of Parallel and Distributed Computing ◽

10.1007/0-387-28967-4_14 ◽

2006 ◽

pp. 203-216

Author(s):

Nobuyoshi Sato ◽

Minoru Udagawa ◽

Minoru Uehara ◽

Yoshifumi Sakai ◽

Hideki Mori

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Large Scale ◽

Distributed Search

Context Window Based Co-occurrence Approach for Improving Feedback Based Query Expansion in Information Retrieval

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2015100103 ◽

2015 ◽

Vol 5 (4) ◽

pp. 31-45 ◽

Cited By ~ 11

Author(s):

Jagendra Singh ◽

Aditi Sharan

Keyword(s):

Information Retrieval ◽

Relevance Feedback ◽

Query Expansion ◽

Contextual Information ◽

Optimal Combination ◽

Query Term ◽

Benchmark Data ◽

First Pass ◽

Baseline Approach ◽

Pseudo Relevance Feedback

Pseudo-relevance feedback (PRF) is a relevance feedback approach to query expansion that treats the top-ranked retrieved documents as relevance feedback. In this paper the authors address the limitations of co-occurrence and PRF based query expansion and propose a hybrid method that improves the performance of PRF based query expansion by combining query term co-occurrence with the contextual information of query terms, both drawn from the corpus of top feedback documents retrieved in the first pass. First, the paper suggests a query term co-occurrence approach, based on the top retrieved feedback documents, to select an optimal combination of query terms from a pool of terms obtained using PRF based query expansion. Second, a contextual window based approach is used to select terms related to the query context from the top feedback documents. Third, the baseline, co-occurrence and contextual window based approaches are compared using different performance evaluation metrics. The experiments were performed on benchmark data, and the results show significant improvement over the baseline approach.
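A context-window selection step of this kind might look roughly like the sketch below. It is not the authors' method: the scoring is a plain count of how often a candidate term falls within `window` tokens of a query term, and the function name, window size, and feedback documents are all assumptions made for illustration. The feedback documents stand in for the top-ranked results of a first retrieval pass.

```python
from collections import Counter


def expansion_terms(query, feedback_docs, window=2, top_n=3):
    """Rank candidate terms by how often they occur within `window` tokens of a query term."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for text in feedback_docs:
        tokens = text.lower().split()
        for i, token in enumerate(tokens):
            if token not in query_terms:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Neighbours on both sides of the query term, excluding the term itself.
            for neighbour in tokens[lo:i] + tokens[i + 1:hi]:
                if neighbour not in query_terms:
                    counts[neighbour] += 1
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


# Hypothetical top-ranked documents from a first retrieval pass.
feedback = [
    "query expansion improves recall in retrieval",
    "pseudo relevance feedback drives query expansion",
    "expansion terms improve recall",
]
terms = expansion_terms("query expansion", feedback)
```

A practical version would at least remove stop words and normalise word forms before counting, since raw neighbours such as function words can otherwise dominate the ranking.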
