Building enriched document representations using aggregated anchor text

Author(s):  
Donald Metzler ◽  
Jasmine Novak ◽  
Hang Cui ◽  
Srihari Reddy

Author(s):  
Sebastian Arnold ◽  
Betty van Aken ◽  
Paul Grundmann ◽  
Felix A. Gers ◽  
Alexander Löser

Author(s):  
Vassilis Plachouras

2008 ◽  
Vol 18 (1) ◽  
pp. 123-138 ◽  
Author(s):  
Milos Radovanovic ◽  
Mirjana Ivanovic

Motivated by the application of text categorization to the classification of Web search results, this paper describes an extensive experimental study of the impact of bag-of-words document representations on the performance of five major classifiers: Naïve Bayes, SVM, Voted Perceptron, kNN and C4.5. The texts, short Web-page descriptions sorted into a large hierarchy of topics, are taken from the dmoz Open Directory Web-page ontology, and the classifiers are trained to automatically determine the topics that may be relevant to a previously unseen Web page. Different transformations of the input data (stemming, normalization, logtf and idf), together with dimensionality reduction, are found to have a statistically significant positive or negative effect on classification performance, as measured by the classical metrics accuracy, precision, recall, F1 and F2. The emphasis of the study is not on determining the best document representation for each classifier, but rather on describing the effect of each individual transformation on classification, together with their mutual relationships.
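To make the transformations named in the abstract concrete, here is a minimal Python sketch on a toy corpus. It is an illustration under stated assumptions, not the authors' experimental setup: the whitespace tokenizer, the helper names `bag_of_words` and `transform`, and the exact formulas (logtf as log(1 + tf), idf as log(N / df), normalization as division by the vector's Euclidean length) are all hypothetical choices. Stemming and dimensionality reduction are omitted.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Hypothetical minimal tokenizer: lowercase, split on whitespace, count terms.
    No stemming is applied."""
    return Counter(text.lower().split())


def transform(docs, logtf=True, idf=True, normalize=True):
    """Return one {term: weight} dict per document.

    Assumed formulas (illustrative, not necessarily those used in the paper):
      logtf:     w = log(1 + tf)
      idf:       w = w * log(N / df)
      normalize: divide every weight by the vector's Euclidean (L2) length
    """
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for bag in bags for term in bag)
    vectors = []
    for bag in bags:
        vec = {}
        for term, tf in bag.items():
            w = math.log(1 + tf) if logtf else float(tf)
            if idf:
                # A term present in every document gets idf = log(1) = 0.
                w *= math.log(n_docs / df[term])
            vec[term] = w
        if normalize:
            length = math.sqrt(sum(w * w for w in vec.values()))
            if length > 0:  # skip all-zero vectors to avoid division by zero
                vec = {t: w / length for t, w in vec.items()}
        vectors.append(vec)
    return vectors


# Usage: compare raw counts with the fully transformed representation.
docs = ["open directory web page", "web search results web page", "text categorization"]
raw = transform(docs, logtf=False, idf=False, normalize=False)
weighted = transform(docs)
print(raw[1])
print(weighted[1])
```

Switching each flag on or off independently mirrors the abstract's focus on the effect of every individual transformation rather than on a single best combination.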


2019 ◽  
Vol 477 ◽  
pp. 15-29 ◽  
Author(s):  
Donghwa Kim ◽  
Deokseong Seo ◽  
Suhyoun Cho ◽  
Pilsung Kang

Author(s):  
Bo Xu ◽  
Hongfei Lin ◽  
Lin Wang ◽  
Yuan Lin ◽  
Kan Xu ◽  
...  
