Temporal and Spatial Attribute Extraction from Web Documents and Time-Specific Regional Web Search System

Author(s):  
Taro Tezuka ◽  
Katsumi Tanaka
2010 ◽  
Vol 42 (02) ◽  
pp. 577-604 ◽  
Author(s):  
Yana Volkovich ◽  
Nelly Litvak

PageRank with personalization is used in Web search as an importance measure for Web documents. The goal of this paper is to characterize the tail behavior of the PageRank distribution in the Web and other complex networks characterized by power laws. To this end, we model the PageRank as a solution of the stochastic fixed-point equation R =_d Σ_{i=1}^{N} A_i R_i + B, where the R_i are independent and distributed as R. This equation is inspired by the original definition of the PageRank. In particular, N models the number of incoming links to a page, and B stands for the user preference. Assuming that N or B is heavy-tailed, we employ the theory of regular variation to obtain the asymptotic behavior of R under quite general assumptions on the involved random variables. Our theoretical predictions show good agreement with experimental data.
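The tail behavior described in the abstract can be probed numerically with a population-dynamics iteration of the fixed-point equation. The sketch below is illustrative only: the Pareto in-degree, uniform preference term, and constant damping weight of 0.3 are assumptions chosen for the demonstration, not the paper's actual model.

```python
import random

random.seed(0)

def iterate_fixed_point(pool_size=2000, iterations=8, weight=0.3):
    """Population-dynamics sketch of R =_d sum_{i=1}^N A_i R_i + B.

    Assumed (not from the paper): N is Pareto-distributed with tail
    index 1.5 (a heavy-tailed in-degree), B is uniform on [0, 1), and
    each A_i equals the constant `weight`, chosen so that the mean
    recursion contracts (weight * E[N] ~ 0.9 < 1).
    """
    pool = [1.0] * pool_size  # arbitrary starting distribution
    for _ in range(iterations):
        new_pool = []
        for _ in range(pool_size):
            n = int(random.paretovariate(1.5))  # heavy-tailed N, mean ~ 3
            b = random.random()                 # user-preference term B
            # One draw from the right-hand side of the equation:
            new_pool.append(b + sum(weight * random.choice(pool)
                                    for _ in range(n)))
        pool = new_pool
    return pool

samples = iterate_fixed_point()
# Fraction of samples far above the mean; a heavy-tailed N makes this
# noticeably larger than it would be for, e.g., a Poisson N.
tail_fraction = sum(1 for r in samples if r > 10) / len(samples)
```

Resampling a pool rather than expanding the recursion tree keeps the cost linear in the pool size per iteration, which matters because a heavy-tailed N makes a naive recursive expansion blow up.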


Author(s):  
Xiannong Meng

This chapter surveys the technologies involved in a Web search engine, with an emphasis on performance analysis issues. The aspects of a general-purpose search engine covered in this survey include system architectures, information retrieval theories as the basis of Web search, indexing and ranking of Web documents, relevance feedback and machine learning, personalization, and performance measurements. The objectives of the chapter are to review the theories and technologies pertaining to Web search, and to help readers understand how Web search engines work and how to use them more effectively and efficiently.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
R. Suganya Devi ◽  
D. Manjula ◽  
R. K. Siddharth

Web crawling has acquired tremendous significance in recent times, in step with the substantial growth of the World Wide Web. Web search engines face new challenges due to the availability of vast amounts of web documents, which makes the retrieved results less relevant to analysts. Currently, however, web crawling focuses solely on obtaining the links of the corresponding documents. Various algorithms and software tools exist for crawling links from the web, but the crawled links must be processed further before use, increasing the analyst's workload. This paper concentrates on crawling the links and retrieving all information associated with them, to facilitate easy processing for other uses. First, the links are crawled from the specified uniform resource locator (URL) using a modified depth-first search algorithm, which allows complete hierarchical scanning of the corresponding web links. The links are then accessed via their source code, and metadata such as title, keywords, and description are extracted. This content is essential for any analysis carried out on the Big Data obtained as a result of web crawling.
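The crawl-and-extract pipeline described in the abstract can be sketched with Python's standard library. This is a hedged illustration, not the paper's implementation: the `fetch` callable is an assumed helper injected by the caller, and the explicit stack stands in for the paper's modified depth-first search.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageMetadataParser(HTMLParser):
    """Collects outgoing links plus title, description, and keywords."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.metadata = {"title": "", "description": "", "keywords": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.metadata[name] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] += data

def crawl_dfs(start_url, fetch, max_pages=50):
    """Depth-first crawl from start_url; `fetch(url)` must return HTML text."""
    visited, results, stack = set(), {}, [start_url]
    while stack:
        url = stack.pop()  # LIFO order gives the depth-first traversal
        if url in visited or len(visited) >= max_pages:
            continue
        visited.add(url)
        parser = PageMetadataParser(url)
        parser.feed(fetch(url))
        results[url] = parser.metadata
        stack.extend(parser.links)  # newest links are explored first
    return results
```

Injecting `fetch` keeps the sketch testable offline (a real crawler would wrap `urllib.request.urlopen` with politeness delays), and the explicit stack avoids Python's recursion limit on deep link hierarchies.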

