web robot detection
Recently Published Documents


TOTAL DOCUMENTS

17
(FIVE YEARS 1)

H-INDEX

8
(FIVE YEARS 0)

2021 ◽  
Vol 2010 (1) ◽  
pp. 012161
Author(s):  
Xue Chen ◽  
Yang Song ◽  
Wei Xiong ◽  
Yutao Lu ◽  
Xingen Wang

2020 ◽  
Vol 50 (11) ◽  
pp. 4017-4028
Author(s):  
Athanasios Lagopoulos ◽  
Grigorios Tsoumakas

Author(s):  
A. A. Menshchikov ◽  
Yu. A. Gatchin

Recent research suggests that robotic traffic on web resources now exceeds user traffic in both volume and intensity. Web robots threaten data privacy and copyright, and they degrade performance, security, and usage statistics, so efficient methods for detecting and defending against them are needed. Existing techniques rely on syntactic and analytical processing of web server logs. This article proposes analysing the graph of web robot visits, taking into account visit times as well as the topical connectivity of the visited pages. We present an algorithm for data selection and cleansing and for extracting semantic features of the pages of a web resource, together with the proposed detection parameters. We describe in detail how the ground truth is formed and the principles by which existing sessions are labelled as legitimate or robotic, and we propose using the capabilities of a web server to identify sessions uniquely. The clustering procedure and the selection of a suitable classification model are discussed; for each model studied, hyperparameters are selected and the results are cross-validated. We analyse detection performance and accuracy and compare them with existing approaches. Empirical results on real web resources show that the proposed method achieves better web robot detection accuracy and precision than existing approaches.
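The pipeline this abstract describes (grouping log records into sessions, extracting timing and topical-connectivity features, then classifying) can be sketched as follows. This is a minimal illustration, not the authors' method: the feature names, the `topic` field, and the rule thresholds are all assumptions; the paper itself uses clustering plus a trained classifier with tuned hyperparameters rather than a fixed rule.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log record: (session_id, timestamp, path, topic).
# The paper identifies sessions via web server capabilities; here we
# simply assume each record already carries a session identifier.

def extract_features(records):
    """Group log records by session and compute simple per-session features."""
    sessions = defaultdict(list)
    for sid, ts, path, topic in records:
        sessions[sid].append((ts, path, topic))
    features = {}
    for sid, hits in sessions.items():
        hits.sort(key=lambda h: h[0])
        times = [h[0] for h in hits]
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        topics = [h[2] for h in hits]
        # Rough stand-in for the paper's semantic-connectivity idea:
        # fraction of consecutive visits that share a topic.
        same = sum(1 for a, b in zip(topics, topics[1:]) if a == b)
        coherence = same / max(len(topics) - 1, 1)
        features[sid] = {
            "requests": len(hits),
            "mean_gap_s": sum(gaps) / len(gaps) if gaps else 0.0,
            "topic_coherence": coherence,
        }
    return features

def looks_robotic(f, min_requests=10, max_gap_s=1.0, max_coherence=0.3):
    """Toy rule: many rapid requests with low topical coherence look robotic."""
    return (f["requests"] >= min_requests
            and f["mean_gap_s"] <= max_gap_s
            and f["topic_coherence"] <= max_coherence)
```

In the paper these session features would feed a clustering step and a cross-validated classifier; the hand-set thresholds above only stand in for that learned decision boundary.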


2017 ◽  
Vol 87 ◽  
pp. 129-140 ◽  
Author(s):  
Mahdieh Zabihimayvan ◽  
Reza Sadeghi ◽  
H. Nathan Rude ◽  
Derek Doran

2016 ◽  
Vol 33 (6) ◽  
pp. 592-606 ◽  
Author(s):  
Derek Doran ◽  
Swapna S. Gokhale

2016 ◽  
Vol 34 (3) ◽  
pp. 500-520 ◽  
Author(s):  
Joseph W. Greene

Purpose
The purpose of this paper is to investigate the impact of web robots on usage statistics collected by Open Access (OA) institutional repositories (IRs), and techniques for mitigating their effects.

Design/methodology/approach
A close review of the literature provides a comprehensive list of web robot detection techniques. Reviews of system documentation and open source code are carried out, along with personal interviews, to compare the robot detection techniques used in the major IR platforms. An empirical test based on a simple random sample of downloads, with 96.20 per cent certainty, measures the accuracy of an IR's web robot detection at a large Irish university.

Findings
While web robot detection is not ignored in IRs, there are areas where the two main systems could be improved. The technique tested here successfully detected 94.18 per cent of web robots visiting the site over a two-year period (recall), with a precision of 98.92 per cent. Due to the high level of robot activity in repositories, correctly labelling more robots has an outsized effect on the accuracy of usage statistics.

Research limitations/implications
This study is performed on one repository using a single system. Future studies across multiple sites and platforms are needed to determine the accuracy of web robot detection in OA repositories generally.

Originality/value
This is the only study to date to have investigated web robot detection in IRs. It puts forward the first empirical benchmarking of accuracy in IR usage statistics.
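The recall (94.18 per cent) and precision (98.92 per cent) figures reported in the Findings are standard classification metrics over the robot class. As a small illustration of how such a benchmark is computed from labelled download records (the function name and boolean encoding are ours, not the paper's):

```python
def precision_recall(labels, predictions):
    """Precision and recall for the 'robot' class.

    labels, predictions: parallel sequences of booleans, True = robot.
    Precision = TP / (TP + FP): of the downloads flagged as robotic,
    how many truly were. Recall = TP / (TP + FN): of the truly robotic
    downloads, how many were flagged.
    """
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

High precision with slightly lower recall, as reported here, means the detector rarely mislabels human downloads but still misses some robots, and each missed robot inflates the repository's usage counts.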


2012 ◽  
Vol 38 (2) ◽  
pp. 118-126 ◽  
Author(s):  
Shinil Kwon ◽  
Young-Gab Kim ◽  
Sungdeok Cha
