Shape-DNA: Effective Character Restoration and Enhancement for Arabic Text Documents

Latent Topic Model for Indexing Arabic Documents

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2014010102 ◽

2014 ◽

Vol 4 (1) ◽

pp. 29-45 ◽

Cited By ~ 3

Author(s):

Rami Ayadi ◽

Mohsen Maraoui ◽

Mounir Zrigui

Keyword(s):

Topic Model ◽

Inflectional Morphology ◽

Arabic Text ◽

Text Representation ◽

Text Documents ◽

Latent Topic ◽

Latent Topics ◽

F Measure

In this paper, the authors present a latent topic model to index and represent Arabic text documents in a way that reflects more of their semantics. Text representation in a language with high inflectional morphology, such as Arabic, is not a trivial task and requires special treatment. The authors describe their approach for analyzing and preprocessing Arabic text, then describe the stemming process. Finally, the latent model (LDA) is adapted to extract Arabic latent topics: the authors extract the significant topics of all texts, each topic is described by a particular distribution of descriptors, and each text is then represented as a vector over these topics. The classification experiment is conducted on an in-house corpus; latent topics are learned with LDA for different numbers of topics K (25, 50, 75, and 100), and the authors compare the results with classification in the full word space. The results show that classification in the reduced topic space outperforms, in terms of precision, recall, and F-measure, both classification in the full word space and classification using LSI reduction.
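
As a rough illustration of the topic-space representation described in this abstract, the following minimal sketch fits LDA by collapsed Gibbs sampling on a toy corpus and returns one topic-proportion vector per document. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the transliterated toy tokens, and the choice of Gibbs sampling are all assumptions made for illustration; the abstract does not say how the authors fit their model.

```python
import random


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of token lists (assumed already preprocessed and stemmed).
    Returns one topic-proportion vector of length num_topics per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]              # tokens of doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens of word v assigned to topic k
    topic_total = [0] * num_topics                             # all tokens assigned to topic k

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token (unnormalized).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions: the reduced representation of each document.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


# Toy corpus of hypothetical transliterated stems, not data from the paper.
toy_docs = [
    ["kitab", "maktaba", "qiraa", "kitab"],
    ["riyada", "kura", "malab", "kura"],
    ["kitab", "qiraa", "maktaba"],
]
topic_vectors = lda_gibbs(toy_docs, num_topics=2)
```

Each row of `topic_vectors` has K entries that sum to 1, so a classifier can be trained on K features per document instead of one feature per vocabulary word.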

The Effect of Stemming on Arabic Text Classification

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2011070104 ◽

2011 ◽

Vol 1 (3) ◽

pp. 54-70 ◽

Cited By ~ 11

Author(s):

Abdullah Wahbeh ◽

Mohammed Al-Kabi ◽

Qasem Al-Radaideh ◽

Emad Al-Shawakfa ◽

Izzat Alsmadi

Keyword(s):

Text Classification ◽

Digital Libraries ◽

Arabic Language ◽

Support Vector ◽

Svm Classifier ◽

Arabic Text ◽

Text Documents ◽

Information Retrieval Systems ◽

Arabic Text Classification ◽

The Web

The information world is rich in documents in different formats or applications, such as databases, digital libraries, and the Web. Text classification is used to aid the search functionality offered by search engines and information retrieval systems in dealing with the large number of documents on the Web. Many research papers in the field of text classification have been applied to English, Dutch, Chinese, and other languages, whereas fewer have been applied to the Arabic language. This paper addresses the issue of automatic classification of Arabic text documents. It applies text classification to Arabic-language text documents using stemming as part of the preprocessing steps. The results show that, without stemming, the support vector machine (SVM) classifier achieved the highest classification accuracy under the two test modes, with 87.79% and 88.54%. Stemming, on the other hand, negatively affected accuracy: the SVM accuracy under the two test modes dropped to 84.49% and 86.35%.
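
To make the preprocessing step concrete, here is a minimal sketch of a light stemmer that strips a few common Arabic prefixes and suffixes. The affix lists, the minimum stem length, and the function name `light_stem` are assumptions chosen for illustration; the abstract does not say which stemmer the authors used.

```python
# Hypothetical affix lists for illustration only; real stemmers use larger,
# linguistically motivated rule sets.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")  # longest prefixes first
SUFFIXES = ("ات", "ون", "ين", "ها", "ية", "ة")
MIN_STEM_LEN = 3  # never shorten a token below three letters


def light_stem(token):
    """Strip at most one prefix and one suffix from an Arabic token."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= MIN_STEM_LEN:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LEN:
            token = token[:-len(suffix)]
            break
    return token


# Five surface forms built around two words ("book" and "library").
tokens = ["الكتاب", "كتاب", "والكتاب", "المكتبة", "مكتبة"]
stems = {light_stem(t) for t in tokens}
```

Here the five surface forms collapse to two stems, which shrinks the feature space. The same merging can also erase distinctions a classifier relies on, which is one plausible reading of the accuracy drop reported above, though the abstract itself does not give a cause.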

An enhanced Kashida-based watermarking approach for Arabic text-documents

2013 International Conference on Electronics, Computer and Computation (ICECCO) ◽

10.1109/icecco.2013.6718288 ◽

2013 ◽

Cited By ~ 8

Author(s):

Yasser M. Alginahi ◽

Muhammad N. Kabir ◽

Omar Tayan

Keyword(s):

Arabic Text ◽

Text Documents

Automatic Extraction of Spatio-Temporal Information from Arabic Text Documents

International Journal of Computer Science and Information Technology ◽

10.5121/ijcsit.2015.7507 ◽

2015 ◽

Vol 7 (5) ◽

pp. 97-107

Author(s):

Abdelkoui Feriel ◽

Kholladi Mohamed Khireddine

Keyword(s):

Temporal Information ◽

Automatic Extraction ◽

Arabic Text ◽

Text Documents ◽

Spatio Temporal

Disguised plagiarism detection in Arabic text documents

2018 2nd International Conference on Natural Language and Speech Processing (ICNLSP) ◽

10.1109/icnlsp.2018.8374395 ◽

2018 ◽

Cited By ~ 3

Author(s):

El Moatez Billah Nagoudi ◽

Hadda Cherroun ◽

Ali Alshehri

Keyword(s):

Arabic Text ◽

Plagiarism Detection ◽

Text Documents

The Effect of Stemming on Arabic Text Classification

Information Retrieval Methods for Multidisciplinary Applications ◽

10.4018/978-1-4666-3898-3.ch013 ◽

2013 ◽

pp. 207-225 ◽

Cited By ~ 3

Author(s):

Abdullah Wahbeh ◽

Mohammed Al-Kabi ◽

Qasem Al-Radaideh ◽

Emad Al-Shawakfa ◽

Izzat Alsmadi

Keyword(s):

Text Classification ◽

Digital Libraries ◽

Arabic Language ◽

Support Vector ◽

Svm Classifier ◽

Arabic Text ◽

Text Documents ◽

Information Retrieval Systems ◽

Arabic Text Classification ◽

The Web

The information world is rich in documents in different formats or applications, such as databases, digital libraries, and the Web. Text classification is used to aid the search functionality offered by search engines and information retrieval systems in dealing with the large number of documents on the Web. Many research papers in the field of text classification have been applied to English, Dutch, Chinese, and other languages, whereas fewer have been applied to the Arabic language. This paper addresses the issue of automatic classification of Arabic text documents. It applies text classification to Arabic-language text documents using stemming as part of the preprocessing steps. The results show that, without stemming, the support vector machine (SVM) classifier achieved the highest classification accuracy under the two test modes, with 87.79% and 88.54%. Stemming, on the other hand, negatively affected accuracy: the SVM accuracy under the two test modes dropped to 84.49% and 86.35%.

Mood Detection Based on Arabic Text Documents using Machine Learning Methods

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/36942020 ◽

2020 ◽

Vol 9 (4) ◽

pp. 4424-4436

Author(s):

Abdelbaset Hussein

Keyword(s):

Machine Learning ◽

Arabic Text ◽

Learning Methods ◽

Text Documents ◽

Machine Learning Methods

An Enhanced Kashida-Based Watermarking Approach for Increased Protection in Arabic Text-Documents Based on Frequency Recurrence of Characters

International Journal of Computer and Electrical Engineering ◽

10.17706/ijcee.2014.v6.857 ◽

2014 ◽

Vol 6 (5) ◽

pp. 381-392 ◽

Cited By ~ 9

Author(s):

Yasser M. Alginahi ◽

Muhammad N. Kabir ◽

Omar Tayan

Keyword(s):

Arabic Text ◽

Text Documents

Arabic Text Documents Recommendation Using Joint Deep Representations Learning

Procedia Computer Science ◽

10.1016/j.procs.2021.08.084 ◽

2021 ◽

Vol 192 ◽

pp. 812-821

Author(s):

Ons Meddeb ◽

Mohsen Maraoui ◽

Mounir Zrigui

Keyword(s):

Arabic Text ◽

Text Documents

A Novel Hybrid Genetic-Whale Optimization Model for Ontology Learning from Arabic Text

Algorithms ◽

10.3390/a12090182 ◽

2019 ◽

Vol 12 (9) ◽

pp. 182 ◽

Cited By ~ 1

Author(s):

Ghoniem ◽

Alhelwa ◽

Shaalan

Keyword(s):

Optimization Algorithm ◽

Machine Learning Algorithms ◽

Whale Optimization Algorithm ◽

Arabic Text ◽

Ontology Learning ◽

Large Space ◽

Text Documents ◽

Basic Task ◽

Whale Optimization ◽

Path Distance

Ontologies are used to model knowledge in several domains of interest, such as the biomedical domain. Conceptualization is the basic task for ontology building. Concepts are identified, and then they are linked through their semantic relationships. Recently, ontologies have constituted a crucial part of modern semantic webs because they can convert a web of documents into a web of things. Although ontology learning generally occupies a large space in computer science, Arabic ontology learning, in particular, is underdeveloped due to the Arabic language’s nature as well as the profundity required in this domain. The previously published research on Arabic ontology learning from text falls into three categories: developing manually hand-crafted rules, using ordinary supervised/unsupervised machine learning algorithms, or a hybrid of these two approaches. The model proposed in this work contributes to Arabic ontology learning in two ways. First, a text mining algorithm is proposed for extracting concepts and their semantic relations from text documents. The algorithm calculates the concept frequency weights using the term frequency weights. Then, it calculates the weights of concept similarity using the information of the ontology structure, involving (1) the concept’s path distance, (2) the concept’s distribution layer, and (3) the mutual parent concept’s distribution layer. Then, feature mapping is performed by assigning the concepts’ similarities to the concept features. Second, a hybrid genetic-whale optimization algorithm (G-WOA) is proposed to optimize ontology learning from Arabic text. The operator of the G-WOA is a hybrid operator integrating the GA’s mutation, crossover, and selection processes with the WOA’s processes (encircling prey, bubble-net attacking, and searching for prey) to balance exploitation and exploration and to find the solutions that exhibit the highest fitness.
To evaluate the performance of the ontology learning approach, extensive comparisons are conducted using different Arabic corpora and bio-inspired optimization algorithms. Furthermore, two publicly available non-Arabic corpora are used to compare the efficiency of the proposed approach with that of approaches for other languages. The results reveal that the proposed genetic-whale optimization algorithm outperforms the other compared algorithms across all the Arabic corpora in terms of precision, recall, and F-score measures. Moreover, the proposed approach outperforms the state-of-the-art methods of ontology learning from Arabic and non-Arabic texts in terms of these three measures.
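
The hybrid operator summarized in this abstract can be sketched, under heavy assumptions, as a loop that applies the standard WOA position updates and then a GA step. Everything below is hypothetical scaffolding: the function name `hybrid_ga_woa`, the order in which the operators are combined, the parameter values, and the sphere function used as a stand-in objective are not taken from the paper, whose fitness function and operator details are not given in the abstract.

```python
import math
import random


def sphere(x):
    """Toy objective to minimize; a stand-in for the paper's fitness function."""
    return sum(v * v for v in x)


def hybrid_ga_woa(fitness, dim=5, pop_size=20, iters=100, bound=5.0, seed=0):
    """Illustrative GA + WOA hybrid. Returns (best solution, best-fitness history)."""
    rng = random.Random(seed)

    def clip(x):
        return [max(-bound, min(bound, v)) for v in x]

    pop = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=fitness)[:]
    history = [fitness(best)]

    for t in range(iters):
        a = 2.0 * (1 - t / iters)  # WOA control parameter, decreasing from 2 toward 0
        moved = []
        for x in pop:
            if rng.random() < 0.5:
                A = 2 * a * rng.random() - a
                C = 2 * rng.random()
                # |A| < 1: encircle the best whale; otherwise search near a random whale.
                ref = best if abs(A) < 1 else rng.choice(pop)
                cand = [ref[j] - A * abs(C * ref[j] - x[j]) for j in range(dim)]
            else:
                # Bubble-net attack: spiral toward the best whale (shape constant b = 1).
                l = rng.uniform(-1, 1)
                cand = [
                    abs(best[j] - x[j]) * math.exp(l) * math.cos(2 * math.pi * l) + best[j]
                    for j in range(dim)
                ]
            moved.append(clip(cand))

        # GA step: binary tournament selection, uniform crossover, Gaussian mutation.
        children = []
        for _ in range(pop_size):
            p1 = min(rng.sample(moved, 2), key=fitness)
            p2 = min(rng.sample(moved, 2), key=fitness)
            child = [p1[j] if rng.random() < 0.5 else p2[j] for j in range(dim)]
            child = [v + rng.gauss(0, 0.1) if rng.random() < 0.1 else v for v in child]
            children.append(clip(child))
        pop = children

        # Keep the best solution seen so far (elitism).
        challenger = min(pop, key=fitness)
        if fitness(challenger) < fitness(best):
            best = challenger[:]
        history.append(fitness(best))

    return best, history


best_solution, best_history = hybrid_ga_woa(sphere)
```

The WOA updates pull candidates toward the best solution found so far (exploitation), while the random-whale branch and the GA mutation keep some diversity in the population (exploration), which is the balance the abstract refers to.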
