BUAP-UPV TPIRS: A System for Document Indexing Reduction at WebCLEF

Author(s):  
David Pinto ◽  
Héctor Jiménez-Salazar ◽  
Paolo Rosso ◽  
Emilio Sanchis
Author(s):  
Nieves R. Brisaboa ◽  
Yolanda Cillero ◽  
Antonio Farina ◽  
Susana Ladra ◽  
Oscar Pedreira

This paper describes CLAD (Cross-Language Analog Detector), a cross-language plagiarism detection method that compares a test document against indexed documents. Unlike existing methods, it detects plagiarism across multiple languages rather than only two. Terms are translated with a dictionary-based machine-translation method. CLAD works in two phases, document indexing and detection, both of which are described in this paper.
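
The two phases named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the CLAD paper: the toy dictionaries, the use of English as a pivot language, and Jaccard overlap as the similarity score.

```python
from collections import defaultdict

# Hypothetical toy dictionaries mapping source-language terms to English pivot terms.
DICTIONARIES = {
    "es": {"gato": "cat", "negro": "black", "duerme": "sleeps"},
    "de": {"katze": "cat", "schwarze": "black", "schläft": "sleeps"},
}

def to_pivot(text, lang):
    """Translate term by term into the pivot language; unknown terms pass through."""
    table = DICTIONARIES.get(lang, {})
    return {table.get(tok, tok) for tok in text.lower().split()}

def build_index(docs):
    """Indexing phase: map each pivot term to the ids of documents containing it."""
    index = defaultdict(set)
    terms_by_doc = {}
    for doc_id, (lang, text) in docs.items():
        terms = to_pivot(text, lang)
        terms_by_doc[doc_id] = terms
        for term in terms:
            index[term].add(doc_id)
    return index, terms_by_doc

def detect(test_text, test_lang, index, terms_by_doc):
    """Detection phase: rank indexed documents by Jaccard overlap with the test document."""
    query = to_pivot(test_text, test_lang)
    candidates = set()
    for term in query:
        candidates |= index.get(term, set())
    scores = {
        d: len(query & terms_by_doc[d]) / len(query | terms_by_doc[d])
        for d in candidates
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because every document is mapped into one pivot vocabulary before indexing, a German test document can be matched against a Spanish indexed document without a dedicated German-Spanish dictionary, which is one plausible way to support more than two languages.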


2005 ◽  
Vol 13 (4) ◽  
pp. 299 ◽  
Author(s):  
Mladen Kolar ◽  
Igor Vukmirović ◽  
Bojana Dalbelo Bašić ◽  
Jan Šnajder

2012 ◽  
Vol 2 (4) ◽  
pp. 12-30
Author(s):  
Fethi Fkih ◽  
Mohamed Nazih Omri

A collocation is defined as a sequence of lexical tokens that habitually co-occur. This type of information is widely used in applications such as information retrieval, document indexing, machine translation, and lexicography. Many techniques have therefore been developed for the automatic retrieval of collocations from textual documents. These techniques use statistical measures based on a joint frequency calculation to quantify the connection strength between the tokens of a candidate collocation. Relevant and irrelevant collocations are then discriminated using an a priori fixed threshold. Generally, this discrimination threshold is estimated manually by a domain expert. This supervised estimation is an additional cost that reduces system performance. In this paper, the authors propose a new technique for automatically learning the threshold used to retrieve information from web text documents. The technique is mainly based on the usual performance evaluation measures (such as ROC and Precision-Recall curves). The results show that a statistical threshold can be estimated automatically, independently of the treated corpus.
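
As a rough illustration of the idea in this abstract, the sketch below scores adjacent token pairs with a joint-frequency measure and then selects a threshold from labeled candidates. The choices here are assumptions, not the authors' method: pointwise mutual information stands in for the unspecified statistical measure, and maximizing F1 along the precision-recall sweep stands in for the unspecified selection criterion. The function names are hypothetical.

```python
import math
from collections import Counter

def pmi_scores(tokens):
    """Score each adjacent token pair by pointwise mutual information (base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (a, b), joint in bigrams.items():
        p_joint = joint / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return scores

def best_threshold(scored, relevant):
    """Sweep candidate thresholds and keep the one with the highest F1.

    scored:   dict mapping candidate -> association score
    relevant: set of candidates labeled as true collocations
    """
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scored.values())):
        predicted = {c for c, s in scored.items() if s >= t}
        tp = len(predicted & relevant)
        if tp == 0:
            continue  # F1 is zero here, so this threshold cannot win
        precision = tp / len(predicted)
        recall = tp / len(relevant)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Each threshold in the sweep corresponds to one point on the precision-recall curve, so picking the maximum-F1 point replaces the expert's manual choice with a value learned from a small labeled sample.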


