Link Variable TrustRank for Fighting Web Spam

Construction and Reduction Methods of Web Spam Identification Index System

Recent Patents on Computer Science ◽

10.2174/2213275912666181127130120 ◽

2019 ◽

Vol 12 (3) ◽

pp. 202-211

Author(s):

Yuancheng Li ◽

Rong Huang ◽

Xiangqian Nie

Keyword(s):

Index System ◽

Rapid Development ◽

Recognition Rate ◽

Computing Power ◽

Detection Model ◽

Web Spam ◽

Reduction Methods ◽

Massive Scale ◽

The Index System ◽

The Web

Background: With the rapid development of the Internet, the volume of web spam has increased dramatically in recent years, wasting search engine storage and computing power on a massive scale. To identify web spam effectively, the content features, link features, hidden features, and quality features of web pages are integrated to establish a web spam identification index system. However, the indices in this system are high-dimensional and highly correlated. Methods: A stacked autoencoder neural network (SAE), an improved autoencoder method, is used to reduce the web spam identification index system. Results: The experimental results show that our method effectively reduces the number of web spam indices and significantly improves the recognition rate in subsequent detection. Conclusion: An autoencoder-based method for reducing web spam indices is proposed in this paper. The experimental results show that it greatly reduces the time and space complexity of subsequent web spam detection models.

A reference collection for web spam

ACM SIGIR Forum ◽

10.1145/1189702.1189703 ◽

2006 ◽

Vol 40 (2) ◽

pp. 11-24 ◽

Cited By ~ 87

Author(s):

Carlos Castillo ◽

Debora Donato ◽

Luca Becchetti ◽

Paolo Boldi ◽

Stefano Leonardi ◽

...

Keyword(s):

Web Spam ◽

Reference Collection

Incorporating weight properties in detection of web spam

2012 2nd International Conference on Uncertainty Reasoning and Knowledge Engineering ◽

10.1109/urke.2012.6319540 ◽

2012 ◽

Cited By ~ 1

Author(s):

Alex Goh Kwang Leng ◽

Kumar P. Ravi ◽

Ashutosh Kumar Singh

Keyword(s):

Web Spam

Countering Web Spam of Link-based Ranking Based on Link Analysis

Procedia Engineering ◽

10.1016/j.proeng.2011.11.2507 ◽

2011 ◽

Vol 23 ◽

pp. 310-315 ◽

Cited By ~ 3

Author(s):

Hongwei Wang ◽

Yuankai Li ◽

Kaiqiang Guo

Keyword(s):

Link Analysis ◽

Web Spam

Cross-lingual web spam classification

Proceedings of the 22nd International Conference on World Wide Web - WWW '13 Companion ◽

10.1145/2487788.2488139 ◽

2013 ◽

Author(s):

András Garzó ◽

Bálint Daróczy ◽

Tamás Kiss ◽

Dávid Siklósi ◽

András A. Benczúr

Keyword(s):

Web Spam ◽

Cross Lingual

Comparative Study of Web Spam Detection using Data Mining

International Journal of Computer Applications ◽

10.5120/11680-6493 ◽

2013 ◽

Vol 68 (18) ◽

pp. 26-29

Author(s):

Chirag Nathwani ◽

Viralkumar Prajapati ◽

Deven Agravat

Keyword(s):

Data Mining ◽

Comparative Study ◽

Spam Detection ◽

Web Spam ◽

Using Data

Web Spam Detection by Learning from Small Labeled Samples

International Journal of Computer Applications ◽

10.5120/7924-0993 ◽

2012 ◽

Vol 50 (21) ◽

pp. 1-5 ◽

Cited By ~ 5

Author(s):

Jaber Karimpour ◽

Ali A. Noroozi ◽

Somayeh Alizadeh

Keyword(s):

Spam Detection ◽

Web Spam

An Unsupervised Model to detect Web Spam based on Qualified Link Analysis and Language Models

International Journal of Computer Applications ◽

10.5120/10455-5163 ◽

2013 ◽

Vol 63 (4) ◽

pp. 33-37 ◽

Cited By ~ 1

Author(s):

Shrijina Sreenivasan ◽

B. Lakshmipathi

Keyword(s):

Link Analysis ◽

Language Models ◽

Web Spam

Avaliação de Desempenho de Métodos de Classificação Aplicados na Identificação de Spam Hosts [Performance Evaluation of Classification Methods Applied to the Identification of Spam Hosts]

iSys - Brazilian Journal of Information Systems ◽

10.5753/isys.2012.210 ◽

2012 ◽

Vol 5 ◽

Author(s):

Renato Moraes Silva ◽

Alex Garcia Vaz ◽

Tiago Agostinho Almeida ◽

Akebo Yamakami

Keyword(s):

Web Spam

The web has become increasingly important to its users as a source of entertainment, communication, research, news, and commerce. Consequently, websites compete with one another to attract users' attention, and many gain greater visibility through strategies that deceive search engines. These sites, known as web spam, cause personal and economic harm to users. Against this backdrop, this work presents a performance analysis of several machine learning techniques applied to the automatic detection of web hosts that propagate web spam. A statistical validation of the results showed that bagging of decision trees, multilayer perceptron neural networks, random forest, and adaptive boosting of decision trees are promising for the task of detecting spam hosts.

Analysis of Web Spam for Non-English Content: Toward More Effective Language-Based Classifiers

PLoS ONE ◽

10.1371/journal.pone.0164383 ◽

2016 ◽

Vol 11 (11) ◽

pp. e0164383 ◽

Cited By ~ 3

Author(s):

Mansour Alsaleh ◽

Abdulrahman Alarifi

Keyword(s):

Web Spam
