A syntactically-based query reformulation technique for information retrieval

2008 ◽  
Vol 44 (1) ◽  
pp. 143-162 ◽  
Author(s):  
C. Lioma ◽  
I. Ounis


Data Mining ◽  
2013 ◽  
pp. 503-514
Author(s):  
Ismaïl Biskri ◽  
Louis Rompré

In this paper the authors present research on combining two data mining methods: text classification and maximal association rules. Text classification has long been a focus of interest for many researchers. However, its results take the form of lists of words (classes) that people often do not know how to use. Maximal association rules offer a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, and (2) the extraction of hidden, often relevant, knowledge from a large volume of data. The authors show how this combination can improve the process of information retrieval.


2017 ◽  
Vol 7 (3) ◽  
pp. 38-61 ◽  
Author(s):  
Ameni Yengui ◽  
Mahmoud Neji

In this article, the authors introduce their OSSVIRI information retrieval system, which is composed of three modules. In the analysis module, they propose a statistical technique that exploits word frequency to extract simple, compound and specific terms from the documents. In the indexing module, the authors use an ontology to associate the terms with their concepts, retrieve the relations between them and disambiguate the concepts in order to improve the semantic content of the documents. The concepts and relations are represented as a conceptual graph. In the research module, the authors propose a technique for reformulating users' queries based on external resources and users' profiles, and a matching technique based on the combined expansion of the queries and the documents, guided by the context of the information need and the document contents. The system is validated using information retrieval metrics and comparisons with an existing statistical approach. The authors show that their approach achieves good results.
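
As a rough illustration of frequency-based term extraction, the sketch below counts single words (simple terms) and adjacent word pairs (compound terms) and keeps those that reach a minimum count. It is not the OSSVIRI technique, whose details the abstract does not give; the function name `extract_terms`, the stopword list, the threshold `min_count` and the sample text are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "for"}

def extract_terms(text, min_count=2):
    """Return (simple_terms, compound_terms) whose frequency is >= min_count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Simple terms: individual tokens that are not stopwords.
    simple = Counter(t for t in tokens if t not in STOPWORDS)
    # Compound terms: adjacent token pairs containing no stopword.
    compound = Counter(
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    )

    def keep(counter):
        return {term: n for term, n in counter.items() if n >= min_count}

    return keep(simple), keep(compound)

# Made-up sample text, not taken from the article.
doc = (
    "Query reformulation improves information retrieval. "
    "Information retrieval systems apply query reformulation to short queries."
)
simple_terms, compound_terms = extract_terms(doc)
print(simple_terms)
print(compound_terms)
```

A real system would likely add sentence boundaries, lemmatization and a larger stopword list; the sketch only shows the counting step.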


Research in the field of Information Retrieval (IR) has a long history and has thrived over time, and most studies use the available standard datasets for testing and evaluation. In line with that, new datasets have been created to meet the needs of particular studies. However, to the best of our knowledge, there is no dataset collected from web documents that focuses on the fruit domain. Therefore, in this paper we contribute to this field by publishing a dataset of web documents about fruit, focusing on durian. This durian dataset is suitable for query reformulation experiments, search systems, web information retrieval and any search engine experiment. The dataset contains a collection of web documents about fruit and durian, a collection of queries and a set of relevance judgments. In addition, we publish a list of frequently asked queries regarding durian and an extended list of query characteristic categories.
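
A collection of queries with relevance judgments is typically used to score a ranked result list. The sketch below computes precision, recall and average precision for one query. It is a minimal example under stated assumptions: the query ID `q1`, the document IDs and the dictionary layout are hypothetical and do not reflect the actual file format of the durian dataset.

```python
def precision_recall(ranked, relevant):
    """Precision and recall of a ranked list against a set of relevant doc IDs."""
    hits = sum(1 for doc_id in ranked if doc_id in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical relevance judgments and system output for one query.
qrels = {"q1": {"d1", "d3", "d7"}}
run = {"q1": ["d3", "d2", "d1", "d5"]}

p, r = precision_recall(run["q1"], qrels["q1"])
ap = average_precision(run["q1"], qrels["q1"])
print(f"precision={p:.3f} recall={r:.3f} AP={ap:.3f}")
```

Running the same scoring before and after reformulating a query gives a simple way to compare the two versions on the same judgments.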

