Stemming Versus Light Stemming as Feature Selection Techniques for Arabic Text Categorization

Comparative Study of Arabic Text Categorization Using Feature Selection Techniques and Four Classifier Models

Proceedings of the 13th International Conference on Intelligent Systems: Theories and Applications ◽

10.1145/3419604.3419778 ◽

2020 ◽

Author(s):

Said Bahassine ◽

Abdellah Madani ◽

Mohamed Kissi

Keyword(s):

Feature Selection ◽

Comparative Study ◽

Text Categorization ◽

Arabic Text ◽

Feature Selection Techniques

A lazy feature selection method for multi-label classification

Intelligent Data Analysis ◽

10.3233/ida-194878 ◽

2021 ◽

Vol 25 (1) ◽

pp. 21-34

Author(s):

Rafael B. Pereira ◽

Alexandre Plastino ◽

Bianca Zadrozny ◽

Luiz H.C. Merschmann

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Feature Selection Method ◽

Selection Method ◽

Video Classification ◽

Classification Problems ◽

Class Label ◽

New Feature ◽

Feature Selection Techniques ◽

Biomolecular Analysis

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to identify relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and specific to the multi-label context. Experimental results show that the proposed technique is competitive with multi-label feature selection techniques currently used in the literature, and is clearly more scalable as the amount of data increases.

Investigation of Feature Selection Techniques on Performance of Automatic Text Categorization

Data, Engineering and Applications ◽

10.1007/978-981-13-6347-4_7 ◽

2019 ◽

pp. 71-82

Author(s):

Dilip Singh Sisodia ◽

Ankit Shukla

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Feature Selection Techniques ◽

Automatic Text

Enhanced Filter Feature Selection Methods for Arabic Text Categorization

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2018040101 ◽

2018 ◽

Vol 8 (2) ◽

pp. 1-24 ◽

Cited By ~ 1

Author(s):

Abdullah Saeed Ghareb ◽

Azuraliza Abu Bakar ◽

Qasem A. Al-Radaideh ◽

Abdul Razak Hamdan

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Selection Process ◽

High Dimensional Data ◽

Relevant Information ◽

High Dimensional ◽

Arabic Text ◽

Relevant Feature ◽

Associative Classification ◽

Selection Methods

Filtering large amounts of data is an important process in data mining tasks, particularly for the categorization of unstructured high-dimensional data. A feature selection process is therefore needed to reduce the high-dimensional space to a small subset of relevant dimensions that represent the best features for text categorization. In this article, three enhanced filter feature selection methods, Category Relevant Feature Measure, Modified Category Discriminated Measure, and Odd Ratio2, are proposed. These methods combine the relevant information about features at both the inter-category and intra-category levels. The effectiveness of the proposed methods with Naïve Bayes and associative classification is evaluated by traditional measures of text categorization, namely macro-averaged precision, recall, and F-measure. Experiments are conducted on three Arabic text datasets used for text categorization. The experimental results show that the proposed methods achieve better or comparable results when compared to 12 well-known traditional methods.
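The macro-averaged measures named in this abstract can be illustrated with a minimal sketch. It is not taken from the article: the function name `macro_scores` and the toy labels are hypothetical, and the macro F-measure is assumed here to be the mean of per-category F1 values (some authors instead compute F from the macro-averaged precision and recall).

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    # Macro-averaging: every category counts equally, regardless of its size.
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy data: two categories, one misclassified document.
y_true = ["sport", "sport", "politics", "politics"]
y_pred = ["sport", "politics", "politics", "politics"]
macro_p, macro_r, macro_f = macro_scores(y_true, y_pred)
```

Because each category contributes equally to the average, small categories weigh as much as large ones, which is why macro-averaging is common in text categorization with imbalanced classes.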

Efficient n-gram construction for text categorization using feature selection techniques

Intelligent Data Analysis ◽

10.3233/ida-205154 ◽

2021 ◽

Vol 25 (3) ◽

pp. 509-525

Author(s):

Maximiliano García ◽

Sebastián Maldonado ◽

Carla Vairetti

Keyword(s):

Feature Selection ◽

Text Classification ◽

Text Categorization ◽

A Priori ◽

Predictive Performance ◽

Online Reviews ◽

Additional Advantage ◽

Novel Approach ◽

N Gram ◽

Feature Selection Techniques

In this paper, we present a novel approach for n-gram generation in text classification. The a-priori algorithm is adapted to prune word sequences by combining three feature selection techniques. Unlike the traditional two-step approach for text classification in which feature selection is performed after the n-gram construction process, our proposal performs an embedded feature elimination during the application of the a-priori algorithm. The proposed strategy reduces the number of branches to be explored, speeding up the process and making the construction of all the word sequences tractable. Our proposal has the additional advantage of constructing a low-dimensional dataset with only the features that are relevant for classification, that can be used directly without the need for a feature selection step. Experiments on text classification datasets for sentiment analysis demonstrate that our approach yields the best predictive performance when compared with other feature selection approaches, while also facilitating a better understanding of the words and phrases that explain a given task; in our case online reviews and ratings in various domains.
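The idea of pruning word sequences while they are being built can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' algorithm: the relevance score (absolute difference in document frequency between two classes) is a stand-in for the three feature selection techniques the abstract mentions, and the names `ngrams`, `build_ngrams`, and `min_score` are hypothetical. An n-gram is extended only if both of its (n-1)-gram sub-sequences survived the previous level, in the spirit of the a-priori candidate generation step.

```python
def ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in one document."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_ngrams(docs, labels, max_n=2, min_score=0.5):
    """Grow n-grams level by level, discarding low-relevance sequences early."""
    selected = []
    survivors = set()
    for n in range(1, max_n + 1):
        per_doc = [ngrams(d, n) for d in docs]
        # A-priori style candidate generation: keep an n-gram only if its
        # prefix and suffix of length n-1 both survived the previous level.
        candidates = {
            g for grams in per_doc for g in grams
            if n == 1 or (g[:-1] in survivors and g[1:] in survivors)
        }
        scored = {}
        for g in candidates:
            pos = sum(1 for grams, y in zip(per_doc, labels) if y == 1 and g in grams)
            neg = sum(1 for grams, y in zip(per_doc, labels) if y == 0 and g in grams)
            # Placeholder relevance score (an assumption, not the paper's measure).
            scored[g] = abs(pos - neg) / len(docs)
        survivors = {g for g, s in scored.items() if s >= min_score}
        if not survivors:
            break
        selected.extend(sorted(survivors))
    return selected


# Hypothetical toy reviews: label 1 = positive, 0 = negative.
docs = [
    ["battery", "works", "great"],
    ["screen", "works", "great"],
    ["battery", "died"],
    ["screen", "died"],
]
labels = [1, 1, 0, 0]
features = build_ngrams(docs, labels, max_n=2, min_score=0.5)
```

In this toy run, bigrams that start with the uninformative words "battery" or "screen" are never scored at all, which is the kind of branch reduction the abstract describes. Note that a relevance score of this kind is not guaranteed to be anti-monotone, so the pruning is a heuristic rather than an exact search.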

ARABIC TEXT CATEGORIZATION SYSTEM - Using Ant Colony Optimization-based Feature Selection

Proceedings of the Third International Conference on Software and Data Technologies Special Session on Applications in Banking and Finance ◽

10.5220/0001892803840387 ◽

2008 ◽

Keyword(s):

Feature Selection ◽

Ant Colony Optimization ◽

Text Categorization ◽

Ant Colony ◽

Arabic Text ◽

Categorization System

On the value of filter feature selection techniques in homogeneous ensembles effort estimation

Journal of Software Evolution and Process ◽

10.1002/smr.2343 ◽

2021 ◽

Author(s):

Mohamed Hosni ◽

Ali Idri ◽

Alain Abran

Keyword(s):

Feature Selection ◽

Effort Estimation ◽

Feature Selection Techniques

Arabic Named Entity Recognition on Social Media based on feature selection techniques using SVM-RFE

2020 Fourth International Conference On Intelligent Computing in Data Sciences (ICDS) ◽

10.1109/icds50568.2020.9268762 ◽

2020 ◽

Author(s):

Brahim AIT BEN ALI ◽

Soukaina MIHI ◽

Ismail EL BAZI ◽

Nabil LAACHFOUBI

Keyword(s):

Social Media ◽

Feature Selection ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Feature Selection Techniques

Children’s Activity Classification for Domestic Risk Scenarios Using Environmental Sound and a Bayesian Network

Healthcare ◽

10.3390/healthcare9070884 ◽

2021 ◽

Vol 9 (7) ◽

pp. 884

Author(s):

Antonio García-Domínguez ◽

Carlos E. Galván-Tejada ◽

Ramón F. Brena ◽

Antonio A. Aguileta ◽

Jorge I. Galván-Tejada ◽

...

Keyword(s):

Feature Selection ◽

Naive Bayes ◽

Classification Model ◽

Activity Classification ◽

Environmental Sound ◽

Non Invasive ◽

Akaike Criterion ◽

Data Source ◽

Feature Selection Techniques

Children’s healthcare is a relevant issue, especially the prevention of domestic accidents, which has been defined as a global health problem. Children’s activity classification generally uses sensors embedded in children’s clothing, which can produce erroneous measurements because of possible damage or mishandling. A non-invasive data source for a children’s activity classification model makes the monitoring system in which it is applied more reliable. This work proposes the use of environmental sound as a data source for generating children’s activity classification models, implementing feature selection methods and classification techniques based on Bayesian networks. The models focus on recognizing activities that could trigger domestic accidents and are applicable in child monitoring systems. Two feature selection techniques were used: the Akaike criterion and genetic algorithms. Likewise, models were generated using three classifiers: naive Bayes, semi-naive Bayes and tree-augmented naive Bayes. Most of the generated models, which combine the feature selection methods with the classifiers, reach an accuracy greater than 97%, from which we conclude that the proposal is effective at recognizing activities that could trigger domestic accidents.

Effective combining of feature selection techniques for machine learning-enabled IoT intrusion detection

Multimedia Tools and Applications ◽

10.1007/s11042-021-10567-y ◽

2021 ◽

Author(s):

Md Arafatur Rahman ◽

A. Taufiq Asyhari ◽

Ong Wei Wen ◽

Husnul Ajra ◽

Yussuf Ahmed ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Intrusion Detection ◽

Feature Selection Techniques
