Web page classification based on a binary hierarchical classifier for multi-class support vector machines

2013
Author(s): Cunhe Li, Guangqing Wang

A focused crawler collects domain-specific web pages from the internet. However, the performance of a focused web crawler depends on the multidimensional nature of web pages. This paper presents a comprehensive analysis of recent web page classifiers for focused crawlers and explores the impact of web-based features when combined with a web classifier. It evaluates the performance of classification techniques such as Support Vector Machine, Naive Bayes, Linear Regression, and Random Forest on web page classification. It also examines the impact of web features, i.e., anchor text, page content, and links, on web page classification. Finally, the paper yields interesting results about the combined effect of web features and classification techniques when classifying web pages as relevant or irrelevant.


2015, Vol 713-715, pp. 2312-2316
Author(s): Feng Chen, Rong Chen, Gen Cheng Wang

This paper presents a Chinese web page classification algorithm based on SVM, covering the important aspects of text preprocessing, feature selection, and multi-class classification. Based on an analysis of the features of Web documents, the paper studies the classification approach in Support Vector Machine (SVM) and the selection of the kernel function. Furthermore, a web page classification model and algorithm based on Binary Tree SVM are presented. The experiments show that the approach not only reduces the size of the training set but also has very high training efficiency. The experiments also report improved precision and recall.
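
The binary-tree idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the class name `BinaryTreeSVM`, the rule of splitting the sorted class list in half, the linear kernel, and the toy 2-D data are all assumptions made for illustration, and the sketch relies on scikit-learn's `SVC` and NumPy. Each internal node trains one binary SVM on only the samples of the classes that reach it, which is one way the training set can shrink further down the tree.

```python
import numpy as np
from sklearn.svm import SVC


class BinaryTreeSVM:
    """Hypothetical sketch: a tree of binary SVMs for multi-class data."""

    def __init__(self, kernel="linear", C=1.0):
        self.kernel = kernel
        self.C = C

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        classes = sorted(set(y.tolist()))
        self.root_ = self._build(X, y, classes)
        return self

    def _build(self, X, y, classes):
        # A leaf holds a single class label.
        if len(classes) == 1:
            return {"label": classes[0]}
        # Assumption: split the class list in half. A real system might
        # instead group classes by similarity before splitting.
        mid = len(classes) // 2
        left, right = classes[:mid], classes[mid:]
        # Keep only the samples whose class reaches this node.
        mask = np.isin(y, classes)
        X_node, y_node = X[mask], y[mask]
        # Binary target: 1 for the right group, 0 for the left group.
        target = np.isin(y_node, right).astype(int)
        clf = SVC(kernel=self.kernel, C=self.C).fit(X_node, target)
        return {
            "clf": clf,
            "left": self._build(X_node, y_node, left),
            "right": self._build(X_node, y_node, right),
        }

    def _predict_one(self, x):
        # Walk from the root to a leaf, one binary decision per level.
        node = self.root_
        while "label" not in node:
            side = node["clf"].predict(x.reshape(1, -1))[0]
            node = node["right"] if side == 1 else node["left"]
        return node["label"]

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.array([self._predict_one(x) for x in X])


# Toy data (made up for illustration): four well-separated 2-D clusters
# standing in for page feature vectors.
X_toy = [[0, 0], [0.3, 0.1], [0, 5], [0.2, 5.1],
         [5, 0], [5.2, 0.3], [5, 5], [4.9, 5.2]]
y_toy = ["a", "a", "b", "b", "c", "c", "d", "d"]

model = BinaryTreeSVM().fit(X_toy, y_toy)
print(model.predict([[0.1, 0.2], [4.8, 5.1]]))
```

With k classes this sketch trains k - 1 binary SVMs, and a prediction needs about log2(k) SVM evaluations when the tree is balanced. For real web pages, the feature vectors would come from the text preprocessing and feature selection steps the abstract describes, which are not shown here.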

