Generalized ensemble model for document ranking in information retrieval

Yanshan Wang; In-Chan Choi; Hongfang Liu

doi:10.2298/csis160229042w

Computer Science and Information Systems ◽

10.2298/csis160229042w ◽

2017 ◽

Vol 14 (1) ◽

pp. 123-151 ◽

Cited By ~ 2

Author(s):

Yanshan Wang ◽

In-Chan Choi ◽

Hongfang Liu

Keyword(s):

Information Retrieval ◽

Hessian Matrix ◽

Document Retrieval ◽

Stochastic Gradient Descent ◽

Data Sets ◽

Ensemble Model ◽

Retrieval Models ◽

Document Ranking ◽

Optimal Linear ◽

Online Setting

A generalized ensemble model (gEnM) for document ranking is proposed in this paper. The gEnM linearly combines document retrieval models with the aim of placing relevant documents at high ranks. To obtain the optimal linear combination of multiple document retrieval models, or rankers, an optimization program is formulated that directly maximizes the mean average precision. Both supervised and unsupervised learning algorithms are presented to solve this program. For the supervised scheme, two approaches are considered depending on the data setting: batch and online. In the batch setting, we propose a revised Newton's algorithm, gEnM.BAT, which approximates the derivative and the Hessian matrix. In the online setting, we advocate a stochastic gradient descent (SGD) based algorithm, gEnM.ON. For the unsupervised scheme, an unsupervised ensemble model (UnsEnM) that iteratively co-learns from each constituent ranker is presented. An experimental study on benchmark data sets verifies the effectiveness of the proposed algorithms. With appropriate algorithms, the gEnM is therefore a viable option in diverse practical information retrieval applications.
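The core idea of this abstract, a weighted sum of ranker scores chosen to maximize mean average precision (MAP), can be illustrated with a minimal sketch. This is not the paper's gEnM.BAT or gEnM.ON: a coarse grid search over two rankers stands in for the Newton and SGD solvers, and the function names and toy scores below are assumptions made purely for illustration.

```python
from typing import Dict, List, Set, Tuple

Scores = Dict[str, float]               # document id -> score from one ranker
Query = Tuple[List[Scores], Set[str]]   # (per-ranker scores, relevant doc ids)


def combine(ranker_scores: List[Scores], weights: List[float]) -> List[str]:
    """Linearly combine ranker scores and return doc ids, best first."""
    combined: Dict[str, float] = {}
    for scores, w in zip(ranker_scores, weights):
        for doc, s in scores.items():
            combined[doc] = combined.get(doc, 0.0) + w * s
    # Sort by descending score; break ties by doc id for determinism.
    return sorted(combined, key=lambda d: (-combined[d], d))


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average of precision values at the ranks of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(queries: List[Query], weights: List[float]) -> float:
    aps = [average_precision(combine(rs, weights), rel) for rs, rel in queries]
    return sum(aps) / len(aps)


def grid_search(queries: List[Query], steps: int = 4) -> Tuple[List[float], float]:
    """Hypothetical stand-in for the paper's solvers: try weights (w, 1 - w)."""
    best_w: List[float] = []
    best_map = -1.0
    for i in range(steps + 1):
        w = i / steps
        m = mean_average_precision(queries, [w, 1.0 - w])
        if m > best_map:
            best_w, best_map = [w, 1.0 - w], m
    return best_w, best_map


# Toy data (made up): one query, two rankers, documents d2 and d3 are relevant.
TOY_QUERIES: List[Query] = [
    ([{"d1": 0.9, "d2": 0.2, "d3": 0.5},
      {"d1": 0.1, "d2": 0.8, "d3": 0.6}],
     {"d2", "d3"}),
]

best_weights, best_map = grid_search(TOY_QUERIES)
```

MAP is a step function of the weights because it depends only on the induced ordering. That is one plausible reading of why the abstract speaks of approximating the derivative and the Hessian, and it is why this sketch falls back on a derivative-free search.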

Exploring the Importance of Entities in Semantic Ranking

Information ◽

10.3390/info10020039 ◽

2019 ◽

Vol 10 (2) ◽

pp. 39

Author(s):

Zhenyang Li ◽

Guangluan Xu ◽

Xiao Liang ◽

Feng Li ◽

Lei Wang ◽

...

Keyword(s):

Information Retrieval ◽

Experimental Results ◽

Retrieval Models ◽

Document Ranking ◽

Ranking Models ◽

Ranking Model ◽

Dataset Analysis

In recent years, entity-based ranking models have led to exciting breakthroughs in information retrieval research. Compared with traditional retrieval models, entity-based representation enables a better understanding of queries and documents. However, existing entity-based models neglect the importance of entities in a document. This paper explores the effect of entity importance within a document. Specifically, a dataset analysis is conducted that verifies the correlation between the importance of entities in a document and document ranking. This paper then enhances two entity-based models, a toy model and the Explicit Semantic Ranking (ESR) model, by considering the importance of entities. In contrast to the existing models, the enhanced models assign weights to entities according to their importance. Experimental results show that the enhanced toy model and ESR can outperform the two baselines by as much as 4.57% and 2.74% on NDCG@20, respectively. Further experiments reveal that the strength of the enhanced models is more evident on long queries and on queries where ESR fails, confirming the effectiveness of taking the importance of entities into account.
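For readers unfamiliar with the NDCG@20 metric quoted above, the following is a minimal sketch of NDCG@k. It assumes the common exponential-gain form of DCG and assumes that the ranked list contains every judged document for the query. The abstract does not state which variant the authors used, so the function names and conventions here are illustrative only.

```python
import math
from typing import List


def dcg_at_k(gains: List[int], k: int) -> float:
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains: List[int], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at_k([0, 1], 20)` is about 0.63, because the only relevant document sits at rank 2 instead of rank 1.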

When time meets information retrieval: Past proposals, current plans and future trends

Journal of Information Science ◽

10.1177/0165551515607277 ◽

2016 ◽

Vol 42 (6) ◽

pp. 725-747 ◽

Cited By ~ 4

Author(s):

Bilel Moulahi ◽

Lynda Tamine ◽

Sadok Ben Yahia

Keyword(s):

Information Retrieval ◽

Web Search ◽

Specific Information ◽

Time Dimension ◽

Retrieval Models ◽

Document Ranking ◽

Common Information ◽

Retrieval Effectiveness ◽

Ranking Models ◽

Tremendous Amount

With the advent of Web search and the large amount of data published on the Web, a tremendous number of documents have become strongly time-dependent. In this respect, the time dimension has been extensively exploited as a highly important relevance criterion to improve the retrieval effectiveness of document ranking models. As a result, there is strong research interest in temporal information retrieval, which has given rise to several temporal search applications. In this article, we provide a detailed overview of time-aware information retrieval models. We focus specifically on the use of timeliness and its impact on the global value of relevance as well as on retrieval effectiveness. First, we motivate the importance of temporal signals, when combined with other relevance features, in accounting for document relevance. Then, we review the relevant studies standing at the crossroads of information retrieval and time according to three common information retrieval aspects: the query level, the document content level and the document ranking model level. We organize the related temporal approaches around specific information retrieval tasks and, regarding the task at hand, we emphasize the importance of results presentation, particularly timelines, to the end user. We also report a set of relevant research trends and avenues that can be explored in the future.

Determining subject headings of documents using information retrieval models

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v23.i2.pp1049-1058 ◽

2021 ◽

Vol 23 (2) ◽

pp. 1049

Author(s):

Evi Yulianti ◽

Laksmita Rahadianti

Keyword(s):

Public Health ◽

Information Retrieval ◽

Research Collaboration ◽

Language Model ◽

Document Retrieval ◽

Controlled Vocabulary ◽

Retrieval Models ◽

Subject Headings ◽

Library Resources ◽

The Subject

A subject heading is a controlled vocabulary term that describes the topic of a document, which is important for finding and organizing library resources. Assigning appropriate subject headings to a document, however, is a time-consuming process. We therefore conduct a novel study on the effectiveness of information retrieval models, i.e., the language model (LM) and the vector space model (VSM), in automatically generating a ranked list of relevant subject headings, with the aim of giving librarians recommendations that help them determine subject headings effectively and efficiently. Our results show that a high number of our queries (up to 61%) have relevant subject headings in the ten top-ranked recommendations and that, on average, the first relevant subject heading is found at an early position (3rd rank). This indicates that document retrieval methods can help the subject heading assignment process. LM and VSM are shown to have comparable performance, except when the search unit is the title, in which case VSM is superior to LM by 8-22%. Our further analysis identifies three faculty pairs with potential for research collaboration, as their students' theses often have overlapping subject headings: i) economy and business-social and political sciences, ii) nursing-public health and iii) medicine-public health.
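The vector space approach described in this abstract can be sketched as TF-IDF weighting followed by cosine similarity between a query text and each candidate heading. The sketch below is a generic VSM written under stated assumptions, not the authors' system: the smoothed IDF formula, the representation of each heading by a short description, and the toy headings are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def build_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency (one common variant)."""
    n = len(docs)
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Term frequency times IDF; terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def rank_headings(query: str, headings: Dict[str, str]) -> List[str]:
    """Return heading names ordered by cosine similarity to the query."""
    docs = {name: tokenize(text) for name, text in headings.items()}
    idf = build_idf(list(docs.values()))
    qv = tfidf(tokenize(query), idf)
    scored = [(cosine(qv, tfidf(toks, idf)), name) for name, toks in docs.items()]
    # Highest similarity first; ties broken alphabetically by heading name.
    return [name for _, name in sorted(scored, key=lambda x: (-x[0], x[1]))]


# Toy headings (made up), each represented by a short description.
TOY_HEADINGS = {
    "Public health": "public health disease prevention community",
    "Information retrieval": "information retrieval search ranking documents",
    "Nursing": "nursing patient care hospital",
}

ranked = rank_headings("ranking documents with retrieval models", TOY_HEADINGS)
```

In this toy run, `ranked[0]` is "Information retrieval", since it is the only heading whose description shares terms with the query.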

Towards Privacy-Preserving Evaluation for Information Retrieval Models Over Industry Data Sets

Information Retrieval Technology - Lecture Notes in Computer Science ◽

10.1007/978-3-319-70145-5_16 ◽

2017 ◽

pp. 210-221

Author(s):

Peilin Yang ◽

Mianwei Zhou ◽

Yi Chang ◽

Chengxiang Zhai ◽

Hui Fang

Keyword(s):

Information Retrieval ◽

Privacy Preserving ◽

Data Sets ◽

Industry Data ◽

Retrieval Models

A Survey on Information Retrieval Models, Techniques and Applications

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i7.90 ◽

2017 ◽

Vol 7 (7) ◽

pp. 16 ◽

Cited By ~ 1

Author(s):

Ndengabaganizi Tonny James ◽

Rajkumar Kannan

Keyword(s):

Information Retrieval ◽

Retrieval Models ◽

Knowledge Based ◽

Long Time

People have long recognized the importance of archiving and finding information. With the advent of computers, it became possible to store large amounts of information, and finding useful information in such collections became a necessity. Over the last forty years, Information Retrieval (IR) has matured considerably. Several IR systems are used on an everyday basis by a wide variety of users. IR is generally concerned with searching for and retrieving knowledge-based information from a database. In this paper, we discuss the various models and techniques for information retrieval. We also provide an overview of traditional IR models.

Interactive Information Retrieval: Models, Algorithms, and Evaluation

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ◽

10.1145/3404835.3462811 ◽

2021 ◽

Author(s):

Chengxiang Zhai

Keyword(s):

Information Retrieval ◽

Retrieval Models ◽

Interactive Information Retrieval

Interactive Information Retrieval: Models, Algorithms, and Evaluation

Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval ◽

10.1145/3397271.3401424 ◽

2020 ◽

Author(s):

ChengXiang Zhai

Keyword(s):

Information Retrieval ◽

Retrieval Models ◽

Interactive Information Retrieval

Information Retrieval Models: Foundations and Relationships

Synthesis Lectures on Information Concepts Retrieval and Services ◽

10.2200/s00494ed1v01y201304icr027 ◽

2013 ◽

Vol 5 (3) ◽

pp. 1-163 ◽

Cited By ~ 8

Author(s):

Thomas Roelleke

Keyword(s):

Information Retrieval ◽

Retrieval Models

A Data-Driven Text Mining and Semantic Network Analysis for Design Information Retrieval

Journal of Mechanical Design ◽

10.1115/1.4037649 ◽

2017 ◽

Vol 139 (11) ◽

Cited By ~ 24

Author(s):

Feng Shi ◽

Liuqing Chen ◽

Ji Han ◽

Peter Childs

Keyword(s):

Information Retrieval ◽

Network Analysis ◽

Text Mining ◽

Engineering Design ◽

Large Scale ◽

Semantic Network ◽

Document Retrieval ◽

Design Information ◽

Improve Design ◽

Correlation Degree

With the advent of the big-data era, the massive amount of information stored in electronic and digital forms on the internet has become a valuable resource for knowledge discovery in engineering design. Traditional document retrieval methods based on document indexing focus on retrieving individual documents related to the query, but are incapable of discovering the various associations between individual knowledge concepts. Ontology-based technologies, which can extract the inherent relationships between concepts by using advanced text mining tools, can be applied to improve design information retrieval in a large-scale unstructured textual data environment. However, few of the publicly available ontology databases take a design and engineering perspective when establishing the relations between knowledge concepts. This paper develops a “WordNet” focusing on design and engineering associations by integrating text mining approaches to construct an unsupervised learning ontology network. Subsequently, probability and velocity network analyses are applied with different statistical behaviors to evaluate the correlation degree between concepts for design information retrieval. The validation results show that probability and velocity analysis on the constructed ontology network can help recognize highly related, complex design and engineering associations between elements. Finally, an engineering design case study demonstrates the use of the constructed semantic network in a real-world project for design relations retrieval.

Field-Based Information Retrieval Models

Encyclopedia of Database Systems ◽

10.1007/978-0-387-39940-9_927 ◽

2009 ◽

pp. 1129-1132

Author(s):

Vassilis Plachouras

Keyword(s):

Information Retrieval ◽

Retrieval Models
