EFFICIENT PREPROCESSING FOR WEB LOG COMPRESSION

2014 ◽  
pp. 35-42
Author(s):  
Sebastian Deorowicz ◽  
Szymon Grabowski

Web log files, which store user activity on a server, may grow at a pace of hundreds of megabytes a day, or even more, on popular sites. They are usually archived, as this enables further analysis, e.g., for detecting attacks or other server-abuse patterns. In this work we present a specialized lossless Apache web log preprocessor and test it in combination with several popular general-purpose compressors. Our method works on the individual fields of the log data (each storing information such as the client’s IP address, date/time, requested file or query, download size in bytes, etc.) and uses compression techniques such as finding and extracting common prefixes and suffixes, dictionary-based phrase-sequence substitution, move-to-front coding, and more. The test results show that the proposed transform improves the average compression ratio 2.70 times for gzip and 1.86 times for bzip2.
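Move-to-front coding, one of the techniques the preprocessor combines, can be sketched as follows; this is an illustrative byte-level implementation, not the authors' field-specific code:

```python
def mtf_encode(data: bytes) -> list:
    """Move-to-front encoding: recently seen symbols map to small
    indices, which back-end entropy coders can exploit."""
    alphabet = list(range(256))
    out = []
    for b in data:
        idx = alphabet.index(b)
        out.append(idx)
        alphabet.pop(idx)
        alphabet.insert(0, b)  # move the symbol to the front
    return out

def mtf_decode(codes: list) -> bytes:
    """Inverse transform: replay the same alphabet updates."""
    alphabet = list(range(256))
    out = bytearray()
    for idx in codes:
        b = alphabet.pop(idx)
        out.append(b)
        alphabet.insert(0, b)
    return bytes(out)
```

After MTF, runs of repeated symbols become runs of zeros, which general-purpose compressors such as gzip or bzip2 encode cheaply.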

Author(s):  
Dheeraj Ahuja

Today, we spend much of our time online using some form of digital technology, such as search engines, news portals, or social media sites. Our online presence keeps us engaged for much of the day and provides a wealth of information about Internet users. The web is growing rapidly: about a million pages are added every day. Due to this massive use of the network, web log files grow at an ever faster rate and reach enormous sizes. Web usage mining applies mining techniques to log data to extract patterns of user behaviour, which are used in applications such as site design support, e-commerce, service modification, prefetching, etc. In this paper, we propose a tool that site owners can use to collect web log data on their websites and then track user interactions, which helps with targeted communication.
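As a rough sketch of the kind of collection such a tool performs, the snippet below parses one line of the Apache common log format into per-request fields; the regular expression and example line are illustrative assumptions, not the proposed tool itself:

```python
import re

# Hypothetical pattern for an Apache common-log-format line:
# ip identd user [time] "method path protocol" status size
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict of request fields, or None if the line is malformed."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

line = ('192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET /index.html HTTP/1.1" 200 2326')
rec = parse_line(line)
```

Fields extracted this way (IP, timestamp, requested path, status, size) are the raw material for the usage-mining steps the paper describes.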


2018 ◽  
Vol 7 (2.12) ◽  
pp. 171
Author(s):  
Jae Kyeong Lee ◽  
Mi Hwan Hyun ◽  
Dong Gu Shin

Background/Objectives: To measure occupancy using a transition probability matrix as a data-analysis method for predicting future web-usage requirements. With this study, executives facing business challenges can improve management decision-making and be provided with quantified evidence.
Methods/Statistical analysis: A transition matrix and a transition probability matrix are estimated from web log data by counting the frequencies of users’ webpage-use patterns. Occupancy is then forecast with a Markov chain model.
Findings: Data analysis for web-log-based marketing mostly focuses on increasing traffic and improving transition rates. However, general-purpose tools such as Google Analytics provide diverse web log data. Under an independence assumption on users’ page transitions, occupancy can easily be estimated from the transition matrix. As a result, we obtained slightly different results from the usual method, which reports only frequencies. In particular, rather than making business decisions on absolute frequencies, we were able to identify the top-priority services through relative percentage values.
Improvements/Applications: Occupancy prediction with a transition matrix forecasts the future from past information. It differs from common marketing techniques, however, in that it is estimated probabilistically, and the probability model enables more accurate prediction.
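The core computation can be sketched as follows; the page names, session data, and the reading of occupancy as the stationary distribution of the chain are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

# Hypothetical sessions: sequences of pages visited by users.
pages = ["home", "search", "product"]
sessions = [
    ["home", "search", "product", "home"],
    ["home", "product", "product"],
    ["search", "product", "home"],
]

idx = {p: i for i, p in enumerate(pages)}
counts = np.zeros((3, 3))
for s in sessions:
    for a, b in zip(s, s[1:]):          # count page-to-page transitions
        counts[idx[a], idx[b]] += 1

# Row-normalise counts into the transition probability matrix P.
P = counts / counts.sum(axis=1, keepdims=True)

# Occupancy read as the stationary distribution pi with pi P = pi,
# i.e. the left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()
```

The relative shares in `pi` are what distinguish this view from raw frequency counts: a page with modest absolute traffic can still carry high long-run occupancy.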


2013 ◽  
Vol 765-767 ◽  
pp. 1092-1097
Author(s):  
Yi Ting Zhang ◽  
Bin Wang ◽  
Zhi Hui Zhang

In order to manage the log information of Windows servers, Linux servers, network devices, and security devices in a unified way, so that log data can be queried, analyzed, and audited conveniently, a scheme is proposed in which the log data of a variety of power-system information devices are converted into a unified relational model and integrated into a database. The data-parsing module uses a Windows Workflow procedure to select, clean, and merge the massive log data. The database is created and operated on the Microsoft SQL Server 2005 development platform. All of the log files are converted into a unified format and saved in centralized storage. Experiments and test results show that the module processes and integrates data efficiently and greatly increases the proportion of valid data, providing support for efficient log auditing and fault diagnosis in the future.
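A minimal sketch of the unification step, assuming two hypothetical device formats and an invented four-column schema (timestamp, source, device_type, message); the paper's actual parsers and relational model are not given:

```python
import csv
import io

def parse_syslog(line):
    """Parse a simplified syslog-style server line into the unified schema."""
    date, time, host, rest = line.split(" ", 3)
    return {"timestamp": f"{date} {time}", "source": host,
            "device_type": "linux", "message": rest}

def parse_firewall_csv(line):
    """Parse a hypothetical CSV-exported firewall event into the same schema."""
    ts, host, action, detail = next(csv.reader(io.StringIO(line)))
    return {"timestamp": ts, "source": host,
            "device_type": "firewall", "message": f"{action}: {detail}"}

records = [
    parse_syslog("2013-05-01 12:00:01 srv01 sshd: failed login for root"),
    parse_firewall_csv("2013-05-01 12:00:05,fw02,DROP,tcp 10.0.0.9:443"),
]
# Every record now shares one schema and could be bulk-inserted
# into a single relational table for querying and auditing.
```

One parser per device format, all converging on a single record shape, is the design choice that makes centralized storage and cross-device auditing straightforward.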


2012 ◽  
Vol 3 (4) ◽  
pp. 92-94
Author(s):  
SUJATHA PADMAKUMAR ◽  
Dr. PUNITHAVALLI ◽  
Dr. RANJITH

2019 ◽  
Vol 161 ◽  
pp. 493-501
Author(s):  
Suleiman Alsaif ◽  
Alice S Li ◽  
Ben Soh ◽  
Sara Alraddady

2017 ◽  
Vol 1 (6) ◽  
pp. 477-482
Author(s):  
K. Srinivasa Rao ◽  
Dr. A. Ramesh Babu ◽  
Dr. M. Krishna Murthy ◽  
Keyword(s):  
Log Data ◽  

2013 ◽  
Vol 80 (17) ◽  
pp. 41-43 ◽  
Author(s):  
Jagriti Chand ◽  
Abhishek Singh Chauhan ◽  
Ashish Kumar Shrivastava
Keyword(s):  
Log Data ◽  
Web Log ◽  
