A Semantic Topic Identification System for Document Retrieval on the Web

Improving Document Retrieval with Spelling Correction for Weak and Fabricated Indonesian-Translated Hadith

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1913 ◽

2020 ◽

Vol 4 (3) ◽

pp. 551-557

Author(s):

Muhammad Zaky Ramadhan ◽

Kemas Muslim Lhaksmana

Keyword(s):

Retrieval System ◽

Islamic Law ◽

Vector Space Model ◽

Document Retrieval ◽

The Internet ◽

Average Precision ◽

Spelling Correction ◽

Space Model ◽

The Mean ◽

The Web

Hadith has several levels of authenticity, among which are weak (dhaif) and fabricated (maudhu) hadith that may not originate from the Prophet Muhammad PBUH and thus should not be considered when deriving Islamic law (sharia). However, ordinary Muslims commonly mistake many such hadiths for authentic ones. To make such hadiths easy to distinguish, this paper proposes a method that checks the authenticity of a hadith by comparing it with a collection of fabricated hadiths in Indonesian. The proposed method applies the vector space model and performs spelling correction using SymSpell to test whether spell checking improves the accuracy of hadith retrieval; this has not been done in previous work, and typos are common in Indonesian-translated hadiths in raw text on the Web and social media. The experimental results show that spell checking improves mean average precision from 73% to 81% and recall from 80% to 89%. This improvement in accuracy makes the hadith retrieval system more feasible and encourages its implementation in future work, because spelling correction can fix the typos that are common in raw text on the Internet.
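
The pipeline this abstract describes (spelling correction of the query, vector space ranking, evaluation by average precision) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the TF-IDF weighting is one common choice, `difflib.get_close_matches` from the Python standard library stands in for SymSpell, and all function names and the toy documents are hypothetical.

```python
import math
from collections import Counter
from difflib import get_close_matches


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def correct(tokens, vocabulary):
    """Replace out-of-vocabulary tokens with the closest vocabulary word.

    Assumption: difflib is only a stand-in for SymSpell here.
    """
    fixed = []
    for tok in tokens:
        if tok in vocabulary:
            fixed.append(tok)
        else:
            match = get_close_matches(tok, vocabulary, n=1, cutoff=0.8)
            fixed.append(match[0] if match else tok)
    return fixed


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs, spell_check=True):
    """Return document indices ordered by decreasing similarity to the query."""
    vectors, idf = tfidf_vectors([tokenize(d) for d in docs])
    q = tokenize(query)
    if spell_check:
        q = correct(q, sorted(idf))
    qvec = {t: c * idf[t] for t, c in Counter(q).items() if t in idf}
    scores = [cosine(qvec, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


def average_precision(ranking, relevant):
    """Average precision of one ranking given the set of relevant doc indices."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


# Toy corpus, invented purely for illustration.
DOCS = [
    "vector space model for ranking",
    "average precision and recall",
    "document retrieval with spelling correction",
]
```

Calling `rank("speling corection", DOCS)` and `rank("speling corection", DOCS, spell_check=False)` on the toy corpus shows the effect of the correction step: without it, the misspelled query shares no terms with any document. Mean average precision is the mean of `average_precision` over a set of queries.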

Conceptual Design of Identification Systems

Volume 5: 13th Design for Manufacturability and the Lifecycle Conference; 5th Symposium on International Design and Design Education; 10th International Conference on Advanced Vehicle and Tire Technologies ◽

10.1115/detc2008-49782 ◽

2008 ◽

Author(s):

Yuan Mao Huang ◽

Chung-Cheng Liao

Keyword(s):

Conceptual Design ◽

Identification System ◽

Paired Comparisons ◽

Relative Importance ◽

Eigenvalues And Eigenvectors ◽

Function Structure ◽

Combined Solution ◽

Physical Effects ◽

Physical Principles ◽

The Web

This student design project presents the procedures and results of conceptual design for identification systems. The sub-function structure of the identification system is created after recognizing the requirements and establishing the specification. The physical effects, physical principles, and solution principles are found based on the sub-functions, and the alternatives, or combined solution principles, are generated. The Saaty method with modified normalized values is used to determine the relative importance, or weighting factors, of the standard evaluation items by paired comparisons. The eigenvalues and eigenvectors of the evaluation items and alternatives are determined. The web method is then used to determine the most preferred design among the alternatives, and the best alternative is recommended. It was found that determining the sub-functions, physical effects, physical principles, solution principles, and combined solution principles, and preparing the surveys of evaluation items and the matrices of evaluation items and alternatives, is very difficult, tedious, and time-consuming.
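
As a rough illustration of how weighting factors can be derived from paired comparisons, the sketch below computes the principal eigenvector of a reciprocal comparison matrix by power iteration, as in the standard Saaty method. It does not reproduce the "modified normalized values" variant mentioned in the abstract, whose details are not given here; the function names and the example matrix are hypothetical.

```python
def principal_eigen(matrix, iterations=100, tol=1e-12):
    """Approximate the principal eigenvector (normalized to sum to 1) and
    eigenvalue of a positive square matrix by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(nxt)  # w sums to 1, so sum(A w) estimates the eigenvalue
        nxt = [x / lam for x in nxt]
        converged = max(abs(a - b) for a, b in zip(nxt, w)) < tol
        w = nxt
        if converged:
            break
    return w, lam


def consistency_index(lam, n):
    """Saaty's consistency index: 0 for a perfectly consistent matrix."""
    return (lam - n) / (n - 1)


# Hypothetical 3x3 comparison matrix: item 0 is judged 2x as important as
# item 1 and 6x as important as item 2; entries below the diagonal are
# the reciprocals.
COMPARISONS = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]

weights, lam_max = principal_eigen(COMPARISONS)
```

Because the example matrix is perfectly consistent, the weights come out proportional to 6 : 3 : 1 and the consistency index is approximately zero; real survey judgments usually give a positive index.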

Topic Identification System to Filter Twitter Feeds

2016 3rd International Conference on Soft Computing & Machine Intelligence (ISCMI) ◽

10.1109/iscmi.2016.14 ◽

2016 ◽

Cited By ~ 1

Author(s):

Shatha Hamad Altammami ◽

Omer F. Rana

Keyword(s):

Identification System ◽

Topic Identification

A Cognitive-Based Approach to Identify Topics in Text Using the Web as a Knowledge Source

Ontology Learning and Knowledge Discovery Using the Web ◽

10.4018/978-1-60960-625-1.ch004 ◽

2011 ◽

pp. 61-78 ◽

Cited By ~ 4

Author(s):

Louis Massey ◽

Wilson Wong

Keyword(s):

Natural Language ◽

Language Processing ◽

Knowledge Bases ◽

Human Cognition ◽

Web Pages ◽

Topic Identification ◽

Unstructured Text ◽

Text Information ◽

Processing Techniques ◽

The Web

This chapter explores the problem of topic identification from text. It is first argued that the conventional representation of text as bag-of-words vectors will always have limited success in arriving at the underlying meaning of text until the more fundamental issues of feature independence in vector-space and ambiguity of natural language are addressed. Next, a groundbreaking approach to text representation and topic identification that deviates radically from current techniques used for document classification, text clustering, and concept discovery is proposed. This approach is inspired by human cognition, which allows ‘meaning’ to emerge naturally from the activation and decay of unstructured text information retrieved from the Web. This paradigm shift allows for the exploitation rather than avoidance of dependence between terms to derive meaning without the complexity introduced by conventional natural language processing techniques. Using the unstructured texts in Web pages as a source of knowledge alleviates the laborious handcrafting of formal knowledge bases and ontologies that are required by many existing techniques. Some initial experiments have been conducted, and the results are presented in this chapter to illustrate the power of this new approach.

Notice of Retraction: Design and Implementation of the Web-Based Real-Time Remote Expert Identification System: Used for Biological Quarantine

2010 2nd International Conference on Information Engineering and Computer Science ◽

10.1109/iciecs.2010.5678123 ◽

2010 ◽

Author(s):

Jinfeng Ma ◽

Zhenzhou Ji ◽

Maosen Cui ◽

Youcai Zhang ◽

Xiaobin Wu ◽

...

Keyword(s):

Real Time ◽

Identification System ◽

Web Based ◽

Design And Implementation ◽

The Web

Candidate document retrieval for Arabic-based text reuse detection on the web

2016 12th International Conference on Innovations in Information Technology (IIT) ◽

10.1109/innovations.2016.7880048 ◽

2016 ◽

Cited By ~ 1

Author(s):

Leena Lulu ◽

Boumediene Belkhouche ◽

Saad Harous

Keyword(s):

Document Retrieval ◽

The Web

Assisting web document retrieval with topic identification in tourism domain

Web Intelligence ◽

10.3233/web-150308 ◽

2015 ◽

Vol 13 (1) ◽

pp. 31-41 ◽

Cited By ~ 3

Author(s):

Rajendra Prasath ◽

Vijai Kumar ◽

Sudeshna Sarkar

Keyword(s):

Document Retrieval ◽

Topic Identification ◽

Web Document

Efficient Top-k Document Retrieval for Long Queries Using Term-Document Binary Matrix: Pursuit of Enhanced Informational Search on the Web

IEICE Transactions on Information and Systems ◽

10.1587/transinf.e96.d.1016 ◽

2013 ◽

Vol E96.D (5) ◽

pp. 1016-1028

Author(s):

Etsuro FUJITA ◽

Keizo OYAMA

Keyword(s):

Document Retrieval ◽

Binary Matrix ◽

The Web

Towards Building an Arabic Plagiarism Detection System: Plagiarism Detection in Arabic

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2019070102 ◽

2019 ◽

Vol 9 (3) ◽

pp. 12-22

Author(s):

Imtiaz Hussain Khan ◽

Muazzam Ahmed Siddiqui ◽

Kamal M. Jambi

Keyword(s):

Detection System ◽

Document Retrieval ◽

Arabic Language ◽

The Other ◽

Similarity Analysis ◽

Plagiarism Detection ◽

Similarity Computation ◽

Main Components ◽

Google Search ◽

The Web

This article describes a plagiarism detection system for the Arabic language that combines different similarity-measure techniques to uncover plagiarism in Arabic documents. The proposed system consists of two main components: one for document retrieval and the other for detailed similarity analysis. The document-retrieval component generates queries from a given suspicious document and uses the Google search API to retrieve candidate source documents from the Web. The similarity-analysis component takes each source document in turn and attempts to identify the plagiarized parts in the suspicious document. The proposed system is thoroughly evaluated using an indigenous corpus. At the document-retrieval level, the system achieved an F-score above 75%, whereas at the detailed similarity-computation level, the F-score is above 70%.
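
For readers unfamiliar with the F-score reported above, the following sketch computes precision, recall, and the balanced F-score (F1) for a set of retrieved candidate documents against a set of true sources. The abstract does not say which F-score variant was used, so F1 is an assumption, and the function name and document identifiers are hypothetical.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for two sets of document identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four candidates retrieved, three true sources,
# two of which were found.
p, r, f1 = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "e"})
```

In this example precision is 2/4, recall is 2/3, and F1, their harmonic mean, is 4/7.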

Ranking Web Search Results Exploiting Wikipedia

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213016500184 ◽

2016 ◽

Vol 25 (03) ◽

pp. 1650018 ◽

Cited By ~ 1

Author(s):

Andreas Kanavos ◽

Christos Makris ◽

Yannis Plegas ◽

Evangelos Theodoridis

Keyword(s):

Web Search ◽

State Of The Art ◽

Web Pages ◽

Ranking Methods ◽

Data Mining Technique ◽

Topic Identification ◽

Probabilistic Network ◽

Current State ◽

The Web ◽

Selection Of

It is widely known that search engines are the dominant tools for finding information on the web. In most cases, these engines return web page references in a global ranking that takes into account either the importance of the web site or the relevance of the web pages to the identified topic. In this paper, we focus on the problem of determining distinct thematic groups in the results that existing web search engines provide. We additionally address the problem of dynamically adapting their ranking according to user selections, incorporating user judgments as implicitly registered in their selection of relevant documents. Our system exploits a state-of-the-art semantic web data mining technique that identifies semantic entities of Wikipedia for grouping the result set into different topic groups, according to the various meanings of the provided query. Moreover, we propose a novel probabilistic network scheme that employs the aforementioned topic identification method in order to modify the ranking of results as users select documents. We evaluated our implemented prototype in practice through extensive experiments with the ClueWeb09 dataset using the TREC 2009, 2010, 2011 and 2012 Web Tracks, where we observed improved retrieval performance compared to current state-of-the-art re-ranking methods.
