Temporal and Spatial Attribute Extraction from Web Documents and Time-Specific Regional Web Search System

Author(s):  
Taro Tezuka ◽  
Katsumi Tanaka
2010 ◽  
Vol 42 (02) ◽  
pp. 577-604 ◽  
Author(s):  
Yana Volkovich ◽  
Nelly Litvak

PageRank with personalization is used in Web search as an importance measure for Web documents. The goal of this paper is to characterize the tail behavior of the PageRank distribution in the Web and other complex networks characterized by power laws. To this end, we model the PageRank as a solution of the stochastic fixed-point equation R =_d Σ_{i=1}^{N} A_i R_i + B, where the R_i are independent and distributed as R. This equation is inspired by the original definition of the PageRank. In particular, N models the number of incoming links to a page, and B stands for the user preference. Assuming that N or B is heavy-tailed, we employ the theory of regular variation to obtain the asymptotic behavior of R under quite general assumptions on the involved random variables. Our theoretical predictions show good agreement with experimental data.
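The tail behavior described in the abstract can be probed numerically with a population-dynamics iteration of the fixed-point equation. The sketch below is illustrative only: the Pareto in-degree, uniform preference term, and constant damping weight of 0.3 are assumptions chosen for the demonstration, not the paper's actual model.

```python
import random

random.seed(0)

def iterate_fixed_point(pool_size=2000, iterations=8, weight=0.3):
    """Population-dynamics sketch of R =_d sum_{i=1}^N A_i R_i + B.

    Assumed (not from the paper): N is Pareto-distributed with tail
    index 1.5 (a heavy-tailed in-degree), B is uniform on [0, 1), and
    each A_i equals the constant `weight`, chosen so that the mean
    recursion contracts (weight * E[N] ~ 0.9 < 1).
    """
    pool = [1.0] * pool_size  # arbitrary starting distribution
    for _ in range(iterations):
        new_pool = []
        for _ in range(pool_size):
            n = int(random.paretovariate(1.5))  # heavy-tailed N, mean ~ 3
            b = random.random()                 # user-preference term B
            # One draw from the right-hand side of the equation:
            new_pool.append(b + sum(weight * random.choice(pool)
                                    for _ in range(n)))
        pool = new_pool
    return pool

samples = iterate_fixed_point()
# Fraction of samples far above the mean; a heavy-tailed N makes this
# noticeably larger than it would be for, e.g., a Poisson N.
tail_fraction = sum(1 for r in samples if r > 10) / len(samples)
```

Resampling a pool rather than expanding the recursion tree keeps the cost linear in the pool size per iteration, which matters because a heavy-tailed N makes a naive recursive expansion blow up.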


Author(s):  
Xiannong Meng

This chapter surveys the technologies involved in a Web search engine, with an emphasis on performance analysis issues. The aspects of a general-purpose search engine covered in this survey include system architectures, information retrieval theories as the basis of Web search, indexing and ranking of Web documents, relevance feedback and machine learning, personalization, and performance measurements. The objectives of the chapter are to review the theories and technologies pertaining to Web search, and to help readers understand how Web search engines work and how to use them more effectively and efficiently.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
R. Suganya Devi ◽  
D. Manjula ◽  
R. K. Siddharth

Web crawling has acquired tremendous significance in recent times, in step with the substantial growth of the World Wide Web. Web search engines face new challenges due to the availability of vast amounts of web documents, which makes the retrieved results less relevant to analysts. Currently, however, web crawling focuses solely on obtaining the links of the corresponding documents. Various algorithms and software tools exist for crawling links from the web, but the crawled links must be processed further before use, increasing the analyst's workload. This paper concentrates on crawling the links and retrieving all information associated with them, to facilitate easy processing for other uses. First, the links are crawled from the specified uniform resource locator (URL) using a modified depth-first search algorithm, which allows complete hierarchical scanning of the corresponding web links. The links are then accessed via their source code, and metadata such as title, keywords, and description are extracted. This content is essential for any analysis carried out on the Big Data obtained as a result of web crawling.
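The crawl-and-extract pipeline described in the abstract can be sketched with Python's standard library. This is a hedged illustration, not the paper's implementation: the `fetch` callable is an assumed helper injected by the caller, and the explicit stack stands in for the paper's modified depth-first search.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageMetadataParser(HTMLParser):
    """Collects outgoing links plus title, description, and keywords."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.metadata = {"title": "", "description": "", "keywords": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.metadata[name] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] += data

def crawl_dfs(start_url, fetch, max_pages=50):
    """Depth-first crawl from start_url; `fetch(url)` must return HTML text."""
    visited, results, stack = set(), {}, [start_url]
    while stack:
        url = stack.pop()  # LIFO order gives the depth-first traversal
        if url in visited or len(visited) >= max_pages:
            continue
        visited.add(url)
        parser = PageMetadataParser(url)
        parser.feed(fetch(url))
        results[url] = parser.metadata
        stack.extend(parser.links)  # newest links are explored first
    return results
```

Injecting `fetch` keeps the sketch testable offline (a real crawler would wrap `urllib.request.urlopen` with politeness delays), and the explicit stack avoids Python's recursion limit on deep link hierarchies.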

