An Ant Colony Optimization Based Feature Selection for Web Page Classification

2014 ◽  
Vol 2014 ◽  
pp. 1-16 ◽  
Author(s):  
Esra Saraç ◽  
Selma Ayşe Özel

The increased popularity of the web has caused the inclusion of a huge amount of information on the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines’ performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features used, to improve the runtime and accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k-nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO-based algorithm can select better features than the well-known information gain and chi-square feature selection methods.
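The wrapper loop this abstract describes can be sketched in miniature. Below is a deliberately simplified, pure-Python ACO sketch: each ant samples a feature subset by roulette-wheel selection over pheromone levels, and the best subset found so far reinforces its trail. The toy `score` function stands in for the paper's actual wrapper evaluation (accuracy of a C4.5, naive Bayes, or k-NN classifier); all names and parameter values here are illustrative assumptions, not the authors' implementation.

```python
import random

def aco_feature_selection(n_features, score, n_ants=10, n_iters=20,
                          evaporation=0.2, subset_size=5, seed=0):
    """Simplified ACO wrapper: each ant picks a feature subset by
    roulette-wheel selection over pheromone levels; the best subset
    found so far reinforces its trail once per iteration."""
    rng = random.Random(seed)
    pheromone = [1.0] * n_features
    best_subset, best_score = set(), float("-inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            subset = set()
            while len(subset) < subset_size:
                r = rng.uniform(0, sum(pheromone))
                acc = 0.0
                for f, p in enumerate(pheromone):
                    acc += p
                    if acc >= r:
                        subset.add(f)
                        break
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
        pheromone = [(1 - evaporation) * p for p in pheromone]
        for f in best_subset:
            pheromone[f] += best_score  # deposit on the best trail

    return sorted(best_subset), best_score

# Toy fitness: pretend features 0-4 are the informative ones; in the paper
# this would be a classifier's accuracy on the candidate subset.
informative = {0, 1, 2, 3, 4}
best, quality = aco_feature_selection(
    20, lambda subset: len(subset & informative) / len(subset))
print(best, quality)
```

A full ACO implementation would also use per-feature heuristic information (e.g. an information-gain prior) alongside the pheromone; this sketch omits that to stay short.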

Webology ◽  
2021 ◽  
Vol 18 (2) ◽  
pp. 225-242
Author(s):  
Chaithra ◽  
Dr.G.M. Lingaraju ◽  
Dr.S. Jagannatha

Nowadays, the Internet contains a wide variety of online documents, making it difficult to find useful information about a given subject without also retrieving irrelevant pages. Web document and page recognition software is useful in a variety of fields, including news, medicine, fitness, research, and information technology. To enhance search capability, a large number of web page classification methods have been proposed, especially for news web pages. Furthermore, existing classification approaches seek to distinguish news web pages while reducing the high dimensionality of features derived from these pages. Given the lack of automated classification methods, this paper focuses on the classification of news web pages based on their scarcity and importance. This work establishes different models for the identification and classification of web pages. The datasets used in this paper were collected from popular news websites; we used the BBC dataset, which has five predefined categories. First, the input source is preprocessed and errors are eliminated. Then features are extracted from the web page content using term frequency-inverse document frequency (TF-IDF) vectorization. The 2,225 documents are represented with 15,286 features, the TF-IDF scores of different unigrams and bigrams. This representation is useful not only for the classification task but also for analysing the dataset. Feature selection is performed with the chi-squared test, which finds the terms most correlated with each category; the highest-scoring features are then selected. Finally, the web page is classified by the chosen classifier. The results showed that list obtained the highest percentage, which reflects its effectiveness for the classification of web pages.
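The chi-squared selection step described above can be illustrated directly: for a single (term, category) pair, the statistic is computed from a 2x2 contingency table of term presence against category membership. The tiny document collection below is invented for illustration and is not the BBC data.

```python
def chi2_term(docs, labels, term, category):
    """Chi-squared statistic for one (term, category) pair, computed
    from the 2x2 contingency table of term presence vs. category."""
    A = B = C = D = 0
    for doc, lab in zip(docs, labels):
        if term in doc and lab == category:
            A += 1          # term present, in category
        elif term in doc:
            B += 1          # term present, other category
        elif lab == category:
            C += 1          # term absent, in category
        else:
            D += 1          # term absent, other category
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return 0.0 if denom == 0 else N * (A * D - C * B) ** 2 / denom

# Invented toy collection: "vote" is perfectly correlated with politics,
# while "news" occurs equally in both categories.
docs = [{"election", "vote", "news"}, {"match", "goal", "news"},
        {"vote", "poll"}, {"goal", "league"}]
labels = ["politics", "sport", "politics", "sport"]
print(chi2_term(docs, labels, "vote", "politics"))  # 4.0
print(chi2_term(docs, labels, "news", "politics"))  # 0.0
```

Ranking every term by this score per category and keeping the top-scoring ones is exactly the selection the abstract describes.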


2019 ◽  
Vol 16 (2) ◽  
pp. 384-388 ◽  
Author(s):  
K. S. Ramanujam ◽  
K. David

Web page classification is one of the significant research areas in the web mining domain. The enormous quantity of data on the web demands the development of effective and robust techniques for web mining tasks, which involve categorizing web pages based on data labels. Web mining also includes tasks such as web crawling, analysis of web links, and contextual advertising. Existing machine learning and data mining techniques are used efficiently for various web mining processes, including the classification of web pages. Multiple-classifier techniques are among the most promising research areas in machine learning; they work by merging several classifiers that differ in base classifier and/or dataset distribution, and in this way highly robust classification models are constructed. In this review paper, a comparison has been made between FA, PSO, ACO, GA, and IWT to evaluate the best-fit algorithm for classifying web pages.
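The simplest instance of the multiple-classifier idea described above is majority voting over the base classifiers' label predictions. The classifier outputs below are invented for illustration; the review's actual algorithms (FA, PSO, ACO, GA, IWT) would each produce such a prediction list.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the label predictions of several base classifiers
    (one list per classifier) by per-document majority vote."""
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions)]

# Invented outputs of three base classifiers over the same three pages.
clf_a = ["news", "sport", "news"]
clf_b = ["news", "news", "sport"]
clf_c = ["sport", "sport", "news"]
print(majority_vote([clf_a, clf_b, clf_c]))  # ['news', 'sport', 'news']
```

The ensemble is robust in the sense that any single classifier's error at a given position is outvoted when the other two agree.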


2021 ◽  
Vol 9 (4) ◽  
pp. 963-973
Author(s):  
Suleyman Suleymanzade ◽  
Fargana Abdullayeva

The quality of the web page classification process has a huge impact on information retrieval systems. In this paper, we propose to combine the results of text and image data classifiers to get an accurate representation of web pages. To obtain and analyse the data, we created a composite classifier system consisting of a data miner, a text classifier, and an aggregator. Image and text data classification is carried out by deep learning models. To represent a common view of the web pages, we propose three aggregation techniques that combine the outputs of the classifiers.
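The abstract does not spell out the three aggregation techniques, so the sketch below shows one generic option: late fusion by a weighted average of the per-class probabilities from the text and image classifiers. The weights and probabilities are illustrative assumptions, not the paper's values.

```python
def fuse(text_probs, image_probs, w_text=0.6):
    """Late fusion: weighted average of the per-class probabilities
    produced by the text and image classifiers."""
    return {c: w_text * text_probs[c] + (1 - w_text) * image_probs[c]
            for c in text_probs}

# Invented classifier outputs for one page; the text model is weighted
# slightly higher (w_text = 0.6 is an arbitrary choice).
text_probs = {"shop": 0.8, "blog": 0.2}
image_probs = {"shop": 0.3, "blog": 0.7}
fused = fuse(text_probs, image_probs)
print(max(fused, key=fused.get))  # shop (0.6*0.8 + 0.4*0.3 = 0.60)
```

Other common aggregators in the same spirit are the element-wise maximum and the product of the two probability vectors.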


At present, the world revolves around web technology. Web information grows exponentially every year, and this information is huge and complex. Web users find it difficult to classify and extract useful information from the web, because web information is noisy, redundant, irrelevant, and often misclassified. Many researchers lack strong knowledge of the web page classification process and of the techniques and methods previously used. The objective of this survey is to convey an outline of the modern techniques of web page classification. In this survey, recent papers in this area are selected and explored. Thus this study will help researchers obtain the required knowledge about current trends in web page classification.


2011 ◽  
pp. 1462-1477 ◽  
Author(s):  
K. Selvakuberan ◽  
M. Indra Devi ◽  
R. Rajaram

The World Wide Web serves as a huge, widely distributed, global information service center for news, advertisements, customer information, financial management, education, government, e-commerce, and many other purposes. The Web contains a rich and dynamic collection of hyperlink information, and Web page access and usage information provide rich sources for data mining. Web pages are classified based on the content and/or contextual information embedded in them. As Web pages contain many irrelevant, infrequent, and stop words that reduce the performance of the classifier, selecting relevant representative features from the Web page is the essential preprocessing step. This provides secure access to the required information. Web access and usage information can be mined to predict the authentication of the user accessing the Web page. This information may be used to personalize the information needed by users and to preserve their privacy by hiding personal details. The issue lies in selecting the features that represent the Web pages and in processing the details the user needs. In this article we focus on feature selection, issues in feature selection, and the most important feature selection techniques described and used by researchers.
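The preprocessing the article calls essential, dropping stop words and infrequent terms before classification, can be sketched as follows. The stop-word list and the `min_df` threshold below are illustrative assumptions.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "in"}  # illustrative subset

def build_vocabulary(docs, min_df=2):
    """Drop stop words and terms appearing in fewer than `min_df`
    documents, the pruning step described as essential preprocessing."""
    df = Counter()
    for doc in docs:
        for term in set(doc.split()):  # document frequency, not raw counts
            df[term] += 1
    return sorted(t for t, n in df.items()
                  if n >= min_df and t not in STOP_WORDS)

docs = ["the web page of the university",
        "the web site of the department",
        "admission page of the university"]
print(build_vocabulary(docs))  # ['page', 'university', 'web']
```

Terms like "site" and "admission" are pruned here because they occur in only one document, and "the"/"of" fall to the stop list; only the representative features survive.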


Author(s):  
ALI SELAMAT ◽  
ZHI SAM LEE ◽  
MOHD AIZAINI MAAROF ◽  
SITI MARIYAM SHAMSUDDIN

In this paper, an improved web page classification method (IWPCM) using neural networks to identify the illicit contents of web pages is proposed. The proposed IWPCM approach is based on improving the feature selection of web pages using class-based feature vectors (CPBF). The CPBF feature selection approach is computed by emphasizing the weights of important terms in illicit web documents and reducing the dependency on the weights of less important terms in normal web documents. The IWPCM approach has been examined using the modified term-weighting scheme, comparing it with several traditional term-weighting schemes on non-illicit and illicit web contents available from the web. Precision, recall, and F1 measures have been used to evaluate the effectiveness of the proposed IWPCM approach. The experimental results show that the proposed improved term-weighting scheme is able to identify the non-illicit and illicit web contents in the experimental datasets.
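The abstract does not give the CPBF formula, so the sketch below only illustrates the general idea of class-based term weighting: boost terms frequent in the illicit class and discount terms that also appear widely in normal documents. The ratio used here, and the toy collection, are assumptions, not the authors' scheme.

```python
import math

def class_based_weight(term, docs, labels, target="illicit"):
    """Illustrative class-based weight: document frequency of the term in
    the target class, scaled by how concentrated the term is there. The
    ratio inside the log is an assumed stand-in, not the CPBF formula."""
    in_df = sum(1 for d, l in zip(docs, labels) if l == target and term in d)
    out_df = sum(1 for d, l in zip(docs, labels) if l != target and term in d)
    return in_df * math.log(1 + (1 + in_df) / (1 + out_df))

# Invented collection: "casino" occurs only in illicit pages, "page" in all.
docs = [{"casino", "win", "page"}, {"casino", "bet", "page"},
        {"news", "page"}, {"sport", "page"}]
labels = ["illicit", "illicit", "normal", "normal"]
print(class_based_weight("casino", docs, labels))
print(class_based_weight("page", docs, labels))
```

Any weighting with this shape ranks class-discriminative terms like "casino" above terms such as "page" that are common to both classes, which is the behaviour the abstract attributes to CPBF.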


Author(s):  
Soner Kiziloluk ◽  
Ahmet Bedri Ozer

In recent years, data on the Internet has grown exponentially, attaining enormous dimensions. This situation makes it difficult to obtain useful information from such data. Web mining is the process of using data mining techniques such as association rules, classification, clustering, and statistics to discover and extract information from Web documents. Optimization algorithms play an important role in such techniques. In this work, the parliamentary optimization algorithm (POA), which is one of the latest social-based metaheuristic algorithms, has been adopted for Web page classification. Two different data sets (Course and Student) were selected for experimental evaluation, and HTML tags were used as features. The data sets were tested using different classification algorithms implemented in WEKA, and the results were compared with those of the POA. The POA was found to yield promising results compared to the other algorithms. This study is the first to propose the POA for effective Web page classification.

