Generalized ensemble model for document ranking in information retrieval

2017 ◽  
Vol 14 (1) ◽  
pp. 123-151 ◽  
Author(s):  
Yanshan Wang ◽  
In-Chan Choi ◽  
Hongfang Liu

A generalized ensemble model (gEnM) for document ranking is proposed in this paper. The gEnM linearly combines multiple document retrieval models and aims to rank relevant documents at the top positions. To obtain the optimal linear combination of multiple document retrieval models, or rankers, an optimization program is formulated that directly maximizes the mean average precision (MAP). Both supervised and unsupervised learning algorithms are presented to solve this program. For the supervised scheme, two approaches are considered based on the data setting, namely the batch and online settings. In the batch setting, we propose a revised Newton's algorithm, gEnM.BAT, which approximates the derivative and the Hessian matrix. In the online setting, we advocate a stochastic gradient descent (SGD) based algorithm, gEnM.ON. As for the unsupervised scheme, an unsupervised ensemble model (UnsEnM) that iteratively co-learns from each constituent ranker is presented. An experimental study on benchmark data sets verifies the effectiveness of the proposed algorithms. Therefore, with appropriate algorithms, the gEnM is a viable option for diverse practical information retrieval applications.
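To make the objective concrete, here is a minimal sketch of a linear ensemble of rankers evaluated by MAP. It is an illustrative assumption, not gEnM.BAT, gEnM.ON or UnsEnM: the data layout, the function names and the toy scores are hypothetical, and MAP is computed exactly on the combined ranking. Because MAP is piecewise constant in the weights, the sketch uses a coarse grid search over two rankers instead of a gradient method. That property also fits the abstract's mention of approximating the derivative and Hessian, but the paper's approximation is not reproduced here.

```python
from typing import List, Sequence, Tuple

# One query = (scores, labels): scores[k][d] is ranker k's score for document d,
# labels[d] is 1 if document d is relevant and 0 otherwise (hypothetical layout).
Query = Tuple[Sequence[Sequence[float]], Sequence[int]]


def combine_scores(scores: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Linear ensemble: combined(d) = sum_k weights[k] * scores[k][d]."""
    n_docs = len(scores[0])
    return [sum(w * s[d] for w, s in zip(weights, scores)) for d in range(n_docs)]


def average_precision(doc_scores: Sequence[float], labels: Sequence[int]) -> float:
    """Exact AP of the ranking induced by sorting documents by score (descending)."""
    order = sorted(range(len(doc_scores)), key=lambda d: doc_scores[d], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, d in enumerate(order, start=1):
        if labels[d]:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document's rank
    n_relevant = sum(labels)
    return precision_sum / n_relevant if n_relevant else 0.0


def mean_average_precision(queries: Sequence[Query], weights: Sequence[float]) -> float:
    """Average of per-query AP for the weighted combination of rankers."""
    aps = [average_precision(combine_scores(s, weights), labels) for s, labels in queries]
    return sum(aps) / len(aps)


def grid_search_two_rankers(queries: Sequence[Query], steps: int = 10) -> Tuple[float, float]:
    """Coarse search over weights (w, 1 - w); a simple stand-in, NOT gEnM.BAT or gEnM.ON."""
    best_w, best_map = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        m = mean_average_precision(queries, [w, 1.0 - w])
        if m > best_map:  # keep the first weight that reaches the best MAP
            best_w, best_map = w, m
    return best_w, best_map


if __name__ == "__main__":
    query: Query = (
        [[0.9, 0.7, 0.4, 0.2],   # ranker A's scores for documents d0..d3
         [0.3, 0.1, 0.9, 0.6]],  # ranker B's scores for the same documents
        [1, 0, 1, 0],            # d0 and d2 are relevant
    )
    print("A alone:", mean_average_precision([query], [1.0, 0.0]))
    print("B alone:", mean_average_precision([query], [0.0, 1.0]))
    print("equal weights:", mean_average_precision([query], [0.5, 0.5]))
    print("grid search (w, MAP):", grid_search_two_rankers([query]))
```

With these toy scores, either ranker alone places one relevant document at rank 3 (AP = 5/6). Equal weights rank both relevant documents first (AP = 1.0), which illustrates why a weighted combination can beat its constituents.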

Information ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 39
Author(s):  
Zhenyang Li ◽  
Guangluan Xu ◽  
Xiao Liang ◽  
Feng Li ◽  
Lei Wang ◽  
...  

In recent years, entity-based ranking models have led to exciting breakthroughs in information retrieval research. Compared with traditional retrieval models, entity-based representations enable a better understanding of queries and documents. However, existing entity-based models neglect how important each entity is within a document. This paper explores the effects of entity importance in a document. Specifically, a dataset analysis is conducted, which verifies the correlation between the importance of entities in a document and document ranking. This paper then enhances two entity-based models, the toy model and the Explicit Semantic Ranking model (ESR), by taking the importance of entities into account. In contrast to the existing models, the enhanced models weight entities according to their importance. Experimental results show that the enhanced toy model and ESR outperform the two baselines by as much as 4.57% and 2.74% on NDCG@20, respectively. Further experiments reveal that the advantage of the enhanced models is more evident on long queries and on queries where ESR fails, confirming the effectiveness of taking entity importance into account.
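The reported gains are measured in NDCG@20. The following minimal sketch computes NDCG@k with the common exponential-gain formulation, DCG@k = Σ (2^rel_i - 1) / log2(i + 1) over ranks i = 1..k. Other variants exist, such as a linear gain rel_i, and the abstract does not say which one the authors used, so treat this choice as an assumption. The optional `judged_labels` argument (a hypothetical name) builds the ideal ranking from all judged documents for the query instead of only the retrieved ones.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(labels: Sequence[int], k: int) -> float:
    """DCG@k with exponential gain: sum of (2^rel - 1) / log2(rank + 1) over the top k."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(labels[:k], start=1))


def ndcg_at_k(ranked_labels: Sequence[int], k: int = 20,
              judged_labels: Optional[Sequence[int]] = None) -> float:
    """ranked_labels[i] is the graded relevance of the document at rank i + 1.

    The ideal DCG uses judged_labels when given, else the retrieved labels themselves.
    """
    pool = judged_labels if judged_labels is not None else ranked_labels
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    print(ndcg_at_k([2, 0, 1]))  # highly relevant doc first, partially relevant doc third
    print(ndcg_at_k([2, 1, 0]))  # ideal order
```

An official evaluation tool would normally be used to produce the paper's numbers. This sketch only shows what the metric rewards: relevant documents, and especially highly relevant ones, near the top.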


2016 ◽  
Vol 42 (6) ◽  
pp. 725-747 ◽  
Author(s):  
Bilel Moulahi ◽  
Lynda Tamine ◽  
Sadok Ben Yahia

With the advent of Web search and the large amount of data published on the Web, a tremendous number of documents have become strongly time-dependent. In this respect, the time dimension has been extensively exploited as a highly important relevance criterion for improving the retrieval effectiveness of document ranking models. Consequently, temporal information retrieval has attracted strong research interest, giving rise to several temporal search applications. In this article, we provide a thorough overview of time-aware information retrieval models. We specifically focus on the use of timeliness and its impact on the global value of relevance as well as on retrieval effectiveness. First, we motivate the importance of temporal signals, when combined with other relevance features, in accounting for document relevance. Then, we review the relevant studies at the crossroads of information retrieval and time according to three common information retrieval aspects: the query level, the document content level and the document ranking model level. We organize the related temporal approaches around specific information retrieval tasks and, for each task, emphasize the importance of results presentation, particularly timelines, to the end user. We also report a set of relevant research trends and avenues that can be explored in the future.


Author(s):  
Evi Yulianti ◽  
Laksmita Rahadianti

A subject heading is a controlled-vocabulary term that describes the topic of a document and is important for finding and organizing library resources. Assigning appropriate subject headings to a document, however, is a time-consuming process. We therefore conduct a novel study on the effectiveness of information retrieval models, i.e., the language model (LM) and the vector space model (VSM), for automatically generating a ranked list of relevant subject headings, with the aim of recommending headings that help librarians assign subject headings effectively and efficiently. Our results show that a high proportion of our queries (up to 61%) have relevant subject headings among the top ten recommendations and that, on average, the first relevant subject heading is found at an early position (3rd rank). This indicates that document retrieval methods can help the subject heading assignment process. LM and VSM are shown to have comparable performance, except when the search unit is the title, where VSM is superior to LM by 8-22%. Our further analysis reveals three faculty pairs with potential for research collaboration, as their students' theses often have overlapping subject headings: i) economy and business-social and political sciences, ii) nursing-public health and iii) medicine-public health.
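The vector space model compared in this study ranks candidates by the cosine similarity of TF-IDF vectors. The following minimal sketch applies that idea to subject-heading recommendation under several assumptions. Each candidate heading is represented by a hypothetical text profile, for example the pooled titles of documents already assigned that heading. Tokenization is a simple lowercase regex, and IDF is smoothed as log(N/df) + 1. The sketch also computes the two quantities the abstract reports: whether a relevant heading appears in the top ten, and the rank of the first relevant heading. The language-model (LM) side of the comparison is not sketched.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric runs (no stemming or stop-word removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(headings: Dict[str, str]) -> Tuple[Dict[str, Vector], Vector]:
    """TF-IDF vector per subject heading, plus the IDF table reused for queries."""
    counts = {name: Counter(tokenize(text)) for name, text in headings.items()}
    n = len(headings)
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF (log(N/df) + 1) keeps terms found in every profile from vanishing.
    idf = {term: math.log(n / d) + 1.0 for term, d in df.items()}
    vectors = {name: {t: tf * idf[t] for t, tf in c.items()} for name, c in counts.items()}
    return vectors, idf


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_headings(query: str, headings: Dict[str, str]) -> List[str]:
    """Heading names sorted by cosine similarity to the query (ties keep input order)."""
    vectors, idf = build_index(headings)
    # Query terms unseen in any heading profile are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    return sorted(vectors, key=lambda name: cosine(q, vectors[name]), reverse=True)


def first_relevant_rank(ranking: List[str], relevant: Set[str]) -> Optional[int]:
    """1-based rank of the first relevant heading, or None if none is retrieved."""
    return next((i for i, name in enumerate(ranking, start=1) if name in relevant), None)


def hit_at_k(ranking: List[str], relevant: Set[str], k: int = 10) -> bool:
    """True if any relevant heading appears among the top k recommendations."""
    return any(name in relevant for name in ranking[:k])


if __name__ == "__main__":
    headings = {  # hypothetical heading profiles, for illustration only
        "Nursing": "nursing care patient clinical practice",
        "Public health": "public health epidemiology community disease prevention",
        "Economics": "economics market business finance",
    }
    ranking = rank_headings("community disease prevention in rural areas", headings)
    print(ranking, first_relevant_rank(ranking, {"Public health"}),
          hit_at_k(ranking, {"Public health"}))
```

Averaging `first_relevant_rank` and `hit_at_k` over a set of queries gives aggregate figures of the kind quoted in the abstract. How the authors built the heading representations and search units (such as the title) is not stated here.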


Author(s):  
Ndengabaganizi Tonny James ◽  
Rajkumar Kannan

People have long recognized the importance of archiving and finding information. With the advent of computers, it became possible to store large amounts of information, and finding useful information in such collections became a necessity. Over the last forty years, Information Retrieval (IR) has matured considerably, and several IR systems are used every day by a wide variety of users. IR is generally concerned with searching for and retrieving knowledge-based information from databases. In this paper, we discuss various models and techniques for information retrieval and provide an overview of traditional IR models.


2017 ◽  
Vol 139 (11) ◽  
Author(s):  
Feng Shi ◽  
Liuqing Chen ◽  
Ji Han ◽  
Peter Childs

With the advent of the big-data era, the massive amount of information stored in electronic and digital form on the internet has become a valuable resource for knowledge discovery in engineering design. Traditional document retrieval methods based on document indexing focus on retrieving individual documents related to a query, but cannot discover the various associations between individual knowledge concepts. Ontology-based technologies, which can extract the inherent relationships between concepts using advanced text mining tools, can be applied to improve design information retrieval in large-scale unstructured textual data environments. However, few publicly available ontology databases take a design and engineering perspective in establishing the relations between knowledge concepts. This paper develops a “WordNet” focusing on design and engineering associations by integrating text mining approaches to construct an unsupervised learning ontology network. Probability and velocity network analyses with different statistical behaviors are then applied to evaluate the degree of correlation between concepts for design information retrieval. The validation results show that the probability and velocity analyses on our constructed ontology network can help recognize highly related, complex design and engineering associations between elements. Finally, an engineering design case study demonstrates the use of our constructed semantic network in a real-world project for design relations retrieval.

