BUAP-UPV TPIRS: A System for Document Indexing Reduction at WebCLEF

This paper describes CLAD (Cross-Language Analog Detector), a method for detecting cross-language plagiarism between a test document and a collection of indexed documents. Its main difference from existing methods is that it detects plagiarism across multiple languages, not just between a pair of languages. Terms are translated using dictionary-based machine translation. CLAD works in two phases, document indexing and detection, and this paper describes both.
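The two phases can be illustrated with a minimal sketch. It assumes that every document is mapped term by term into a single pivot language (English here) through bilingual dictionaries, that indexing builds an inverted index over the translated terms, and that detection scores candidates by Jaccard overlap of translated term sets. The dictionaries, the pivot-language choice, the scoring function and the `threshold` parameter are all hypothetical illustrations, not CLAD's actual design.

```python
from collections import defaultdict

# Hypothetical toy dictionaries mapping source-language terms to a pivot
# language (English). Real systems would use full bilingual lexicons.
DICTIONARIES = {
    "es": {"perro": "dog", "gato": "cat", "come": "eats", "el": "the"},
    "de": {"hund": "dog", "katze": "cat", "frisst": "eats", "der": "the"},
}

def translate_terms(text, lang):
    """Dictionary-based term translation into the pivot language.
    Tokens missing from the dictionary are kept unchanged."""
    tokens = text.lower().split()
    if lang == "en":
        return tokens
    table = DICTIONARIES[lang]
    return [table.get(tok, tok) for tok in tokens]

def build_index(docs):
    """Indexing phase: inverted index from pivot-language term to doc ids."""
    index = defaultdict(set)
    for doc_id, (text, lang) in docs.items():
        for term in translate_terms(text, lang):
            index[term].add(doc_id)
    return index

def detect(test_text, test_lang, docs, index, threshold=0.5):
    """Detection phase: gather candidates sharing at least one translated
    term, then keep those whose Jaccard overlap reaches the threshold."""
    test_terms = set(translate_terms(test_text, test_lang))
    candidates = set()
    for term in test_terms:
        candidates |= index.get(term, set())
    results = {}
    for doc_id in candidates:
        doc_terms = set(translate_terms(*docs[doc_id]))
        score = len(test_terms & doc_terms) / len(test_terms | doc_terms)
        if score >= threshold:
            results[doc_id] = score
    return results
```

For example, with `docs = {"d1": ("el perro come", "es"), "d2": ("der Hund frisst", "de"), "d3": ("a bird sings", "en")}`, calling `detect("the dog eats", "en", docs, build_index(docs))` flags both the Spanish and the German document with a score of 1.0, while the unrelated English document is never even retrieved as a candidate. Translating everything into one pivot language is what lets a single index cover more than two languages.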

Computer-aided Document Indexing System

Journal of Computing and Information Technology ◽

10.2498/cit.2005.04.07 ◽

2005 ◽

Vol 13 (4) ◽

pp. 299 ◽

Author(s):

Mladen Kolar ◽

Igor Vukmirović ◽

Bojana Dalbelo Bašić ◽

Jan Šnajder

Keyword(s):

Document Indexing ◽

Indexing System ◽

Computer Aided

FLPI: An Optimal Algorithm for Document Indexing

Rough Sets and Knowledge Technology - Lecture Notes in Computer Science ◽

10.1007/978-3-540-79721-0_86 ◽

2008 ◽

pp. 644-651

Author(s):

Jian-Wen Tao ◽

Qi-Fu Yao ◽

Jie-Yu Zhao

Keyword(s):

Optimal Algorithm ◽

Document Indexing

Using Bayesian Network for Conceptual Indexing: Application to Medical Document Indexing with UMLS Metathesaurus

Lecture Notes in Computer Science - Advances in Multilingual and Multimodal Information Retrieval ◽

10.1007/978-3-540-85760-0_80 ◽

2008 ◽

pp. 631-636 ◽

Author(s):

Thi Hoang Diem Le ◽

Jean-Pierre Chevallet ◽

Joo Hwee Lim

Keyword(s):

Bayesian Network ◽

Document Indexing ◽

Medical Document

Machine Learning to Support Technical Document Indexing, a Case Study on Seismic Acquisition Reports

80th EAGE Conference and Exhibition 2018 ◽

10.3997/2214-4609.201801219 ◽

2018 ◽

Author(s):

H. Blondelle ◽

P. Neri ◽

J. Micaelli

Keyword(s):

Machine Learning ◽

Technical Document ◽

Document Indexing ◽

Seismic Acquisition

Information Retrieval from Unstructured Web Text Document Based on Automatic Learning of the Threshold

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2012100102 ◽

2012 ◽

Vol 2 (4) ◽

pp. 12-30

Author(s):

Fethi Fkih ◽

Mohamed Nazih Omri

Keyword(s):

Information Retrieval ◽

A Priori ◽

Automatic Learning ◽

Document Indexing ◽

Text Document ◽

Statistical Measures ◽

Automatic Retrieval ◽

Joint Frequency ◽

Frequency Calculation ◽

A New Technique

A collocation is defined as a sequence of lexical tokens that habitually co-occur. This type of information is widely used in applications such as information retrieval, document indexing, machine translation and lexicography. Consequently, many techniques have been developed for automatically extracting collocations from text documents. These techniques use statistical measures based on joint-frequency calculations to quantify the strength of association between the tokens of a candidate collocation. Relevant and irrelevant collocations are then separated using a threshold fixed a priori. This discrimination threshold is generally estimated manually by a domain expert, and this supervised estimation is an additional cost that reduces system performance. In this paper, the authors propose a new technique for automatically learning the threshold in order to retrieve information from web text documents. The technique is mainly based on standard performance evaluation measures such as ROC and precision-recall curves. The results show that a statistical threshold can be estimated automatically, independently of the corpus being processed.
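The pipeline described above (a joint-frequency association score followed by a learned cut-off) can be sketched in a few lines. This is a hedged illustration only: it assumes pointwise mutual information (PMI) over adjacent bigrams as the association measure and, as a stand-in for the authors' ROC/precision-recall analysis, picks the threshold that maximizes F1 over the points of the precision-recall curve on a small labelled set. The function names and the labelled set are hypothetical.

```python
import math
from collections import Counter

def pmi_scores(tokens):
    """Score adjacent bigrams by pointwise mutual information, computed from
    joint (bigram) and marginal (unigram) relative frequencies."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), joint in bigrams.items():
        p_joint = joint / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_joint / (p1 * p2))
    return scores

def learn_threshold(scores, relevant):
    """Try every distinct score as a cut-off (one precision-recall point each)
    and return the cut-off with the highest F1 against the labelled set."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores.values())):
        predicted = {c for c, s in scores.items() if s >= t}
        tp = len(predicted & relevant)
        if tp == 0:
            continue  # precision and recall are both zero here
        precision = tp / len(predicted)
        recall = tp / len(relevant)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

On the toy text `"new york is big new york is old the city is big"` with `{("new", "york")}` as the only relevant collocation, the learned threshold is the PMI of ("new", "york"), about 2.71, with F1 = 0.5. The score is not higher because the single-occurrence bigrams ("old", "the") and ("the", "city") receive even larger PMI values and pass the cut-off as false positives. This shows why the choice of threshold matters so much in practice.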

A New Approach for Document Indexing Using Wavelet Trees

18th International Conference on Database and Expert Systems Applications (DEXA 2007) ◽

10.1109/dexa.2007.118 ◽

2007 ◽

Author(s):

Nieves R. Brisaboa ◽

Yolanda Cillero ◽

Antonio Farina ◽

Susana Ladra ◽

Oscar Pedreira

Keyword(s):

New Approach ◽

Document Indexing

Document indexing framework for retrieval of degraded document images

2015 13th International Conference on Document Analysis and Recognition (ICDAR) ◽

10.1109/icdar.2015.7333966 ◽

2015 ◽

Author(s):

Ritu Garg ◽

Ehtesham Hassan ◽

Santanu Chaudhury

Keyword(s):

Document Images ◽

Document Indexing ◽

Degraded Document

Techniques for Textual Document Indexing and Retrieval via Knowledge Sources and Data Mining

Network Theory and Applications - Clustering and Information Retrieval ◽

10.1007/978-1-4613-0227-8_5 ◽

2004 ◽

pp. 135-159 ◽

Author(s):

Wesley W. Chu ◽

Victor Zhenyu Liu ◽

Wenlei Mao

Keyword(s):

Data Mining ◽

Knowledge Sources ◽

Document Indexing ◽

Indexing And Retrieval
