Virtual Web for PageRank Computing

2018 ◽  
Author(s):  
Bo Song

The enormous size and fast-evolving nature of the World Wide Web demand ever more efficient PageRank updating algorithms. Web evolution involves two kinds of change: (1) link structure modification; (2) page insertion/deletion. When the evolution is restricted to link insertion/deletion, we demonstrate, both theoretically and experimentally, the benefit of using the previous PageRank to initialize the current PageRank computation. When page insertion/deletion occurs, how to effectively use the previous PageRank information to facilitate the current PageRank computation has long been a challenge. To tackle the general case, a so-called "virtual web" is introduced by adding the inserted nodes to the previous web along with a specific "in-home" link structure, in which in-links from the previous web and out-links to the previous web are excluded. Through the virtual web, we obtain a virtual initialization that can be used efficiently to calculate the current PageRank. This virtual initialization is "unbiased" in the sense that it assumes the least under the available knowledge. The virtual web is then integrated with the Power-Iteration and Gauss-Southwell methods to solve the node insertion/deletion problem, yielding the Virtual Web Power-Iteration (VWPI) and Virtual Web Gauss-Southwell (VWGS) methods, respectively. Further, we propose an optimized approach based on the VWGS method for handling node insertions. Experimental results show that the VWGS algorithm significantly outperforms conventional PageRank computation based on the original model. On the Twitter-2010 dataset, with 42M nodes and 1.5B edges, for a perturbation of 400k node and 14 million link insertions plus deletions applied at once, our algorithm needs about 20 times fewer iterations and runs about 3 times faster than the Gauss-Southwell method starting from scratch. On the soc-LiveJournal dataset, with up to a 20% node insertion, the optimized VWGS method achieves a further 28% gain over the original VWGS method. Compared with the prior work of Ohsaka et al. [32], our method is about 1800x faster per link insertion/deletion on the Twitter-2010 dataset under a similar experimental environment.
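The abstract does not reproduce the algorithms themselves. As a minimal sketch only, the warm-start idea behind reusing the previous PageRank (and seeding newly inserted nodes with a neutral, uniform share of mass) can be illustrated in plain Python; the function names and the toy graphs below are assumptions for illustration, not the authors' VWPI/VWGS implementation.

```python
import numpy as np

def pagerank_power(out_links, init, d=0.85, tol=1e-8, max_iter=200):
    """Plain power iteration on a dict {node: list of out-neighbours}.

    `init` is the starting rank vector; warm-starting with the previous
    PageRank typically needs far fewer iterations than a uniform start.
    """
    nodes = list(out_links)
    idx = {n: i for i, n in enumerate(nodes)}
    n = len(nodes)
    r = np.array([init.get(u, 1.0 / n) for u in nodes])
    r /= r.sum()
    for it in range(max_iter):
        nxt = np.full(n, (1.0 - d) / n)
        for u, outs in out_links.items():
            if outs:
                share = d * r[idx[u]] / len(outs)
                for v in outs:
                    nxt[idx[v]] += share
            else:                       # dangling node: spread mass uniformly
                nxt += d * r[idx[u]] / n
        if np.abs(nxt - r).sum() < tol:
            return dict(zip(nodes, nxt)), it + 1
        r = nxt
    return dict(zip(nodes, r)), max_iter

# Previous web and its PageRank (cold start from a uniform vector)
old_web = {"a": ["b"], "b": ["c"], "c": ["a"]}
old_pr, _ = pagerank_power(old_web, init={})

# Current web after inserting node "d": reuse the old ranks as the
# initialization, giving the inserted node only a uniform share.
new_web = {"a": ["b", "d"], "b": ["c"], "c": ["a"], "d": ["a"]}
warm_pr, iters = pagerank_power(new_web, init=old_pr)
print(iters, warm_pr)
```

The sketch only conveys the warm-start intuition; the paper's contribution lies in constructing the "in-home" link structure of the virtual web and in combining it with Gauss-Southwell updates rather than full power iterations.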

Author(s):  
Anthony D. Andre

This paper provides an overview of the various human factors and ergonomics (HF/E) resources on the World Wide Web (WWW). A list of the most popular and useful HF/E sites is provided, along with several critical guidelines for using the WWW. The reader will gain a clear understanding of how to find HF/E information on the Web and how to use the Web effectively for various HF/E professional and consulting activities. Finally, we consider the ergonomic implications of surfing the Web.


2005 ◽  
Vol 11 (3) ◽  
pp. 278-281 ◽  

Following is a list of microscopy-related meetings and courses. The editors would greatly appreciate input to this list via the electronic submission form found in the MSA World-Wide Web page at http://www.msa.microscopy.com. We will gladly add hypertext links to the notice on the web and insert a listing of the meeting in the next issue of the Journal. Send comments and questions to JoAn Hudson, [email protected] or Nestor Zaluzec, [email protected]. Please furnish the following information (any additional information provided will be edited as required and printed on a space-available basis):


2017 ◽  
Vol 4 (1) ◽  
pp. 95-110 ◽  
Author(s):  
Deepika Punj ◽  
Ashutosh Dixit

In order to manage the vast amount of information available on the Web, crawlers play a significant role. The working of a crawler should be optimized to obtain maximum and unique information from the World Wide Web. In this paper, an architecture for a migrating crawler is proposed, based on URL ordering, URL scheduling, and a document redundancy elimination mechanism. The proposed ordering technique is based on URL structure, which plays a crucial role in utilizing the Web efficiently. Scheduling ensures that each URL goes to the optimum agent for downloading; to this end, the characteristics of both agents and URLs are taken into consideration. Duplicate documents are also removed to keep the database unique. To reduce matching time, document matching is performed on the basis of meta information only. The agents of the proposed migrating crawler work more efficiently than a traditional single crawler by providing ordering and scheduling of URLs.
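The paper's own redundancy-elimination mechanism is not shown here; the following is only a rough Python sketch of the general idea of fingerprinting a document's meta information instead of its full text, where the helper names, the chosen fields, and the use of SHA-1 are assumptions made for illustration.

```python
import hashlib

seen_meta = set()   # fingerprints of documents already stored

def meta_fingerprint(title, description, keywords):
    """Hash only the meta information, so full-text comparison is avoided."""
    blob = "|".join([title.strip().lower(),
                     description.strip().lower(),
                     keywords.strip().lower()])
    return hashlib.sha1(blob.encode("utf-8")).hexdigest()

def store_if_unique(doc):
    """Keep the document only if its meta fingerprint has not been seen."""
    fp = meta_fingerprint(doc["title"], doc["description"], doc["keywords"])
    if fp in seen_meta:
        return False            # duplicate: skip storage
    seen_meta.add(fp)
    return True

doc = {"title": "Example Page", "description": "demo", "keywords": "web,crawler"}
print(store_if_unique(doc))   # True
print(store_if_unique(doc))   # False (duplicate detected from meta data only)
```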


2021 ◽  
Author(s):  
Michael Dick

Since it was first formally proposed in 1990 (and since the first website was launched in 1991), the World Wide Web has evolved from a collection of linked hypertext documents residing on the Internet, to a "meta-medium" featuring platforms that older media have leveraged to reach their publics through alternative means. However, this pathway towards the modernization of the Web has not been entirely linear, nor will it proceed as such. Accordingly, this paper problematizes the notion of "progress" as it relates to the online realm by illuminating two distinct perspectives on the realized and proposed evolution of the Web, both of which can be grounded in the broader debate concerning technological determinism versus the social construction of technology: on the one hand, the centralized and ontology-driven shift from a human-centred "Web of Documents" to a machine-understandable "Web of Data" or "Semantic Web", which is supported by the Web's inventor, Tim Berners-Lee, and the organization he heads, the World Wide Web Consortium (W3C); on the other, the decentralized and folksonomy-driven mechanisms through which individuals and collectives exert control over the online environment (e.g. through the social networking applications that have come to characterize the contemporary period of "Web 2.0"). Methodologically, the above is accomplished through a sustained exploration of theory derived from communication and cultural studies, which discursively weaves these two viewpoints together with a technical history of recent W3C projects. As a case study, it is asserted that the forward slashes contained in a Uniform Resource Identifier (URI) were a social construct that was eventually rendered extraneous by the end-user community. By focusing on the context of the technology itself, it is anticipated that this paper will contribute to the broader debate concerning the future of the Web and its need to move beyond a deterministic "modernization paradigm" or over-arching ontology, as well as advance the potential connections that can be cultivated with cognate disciplines.


Author(s):  
Punam Bedi ◽  
Neha Gupta ◽  
Vinita Jindal

The World Wide Web is a part of the Internet that provides a data dissemination facility to people. The contents of the Web are crawled and indexed by search engines so that they can be retrieved, ranked, and displayed in response to users' search queries. The contents that can be easily retrieved using Web browsers and search engines comprise the Surface Web. All information that cannot be crawled by search engines' crawlers falls under the Deep Web. Deep Web content never appears in the results displayed by search engines. Though this part of the Web remains hidden, it can be reached using targeted search over normal Web browsers. Unlike the Deep Web, there exists a portion of the World Wide Web that cannot be accessed without special software. This is known as the Dark Web. This chapter describes how the Dark Web differs from the Deep Web and elaborates on the software commonly used to enter the Dark Web. It highlights the illegitimate and legitimate sides of the Dark Web and specifies the role played by cryptocurrencies in the expansion of the Dark Web's user base.


Author(s):  
K.G. Srinivasa ◽  
Anil Kumar Muppalla ◽  
Varun A. Bharghava ◽  
M. Amulya

In this paper, the authors discuss the MapReduce implementation of the crawler, indexer, and ranking algorithms used in search engines to retrieve results from the World Wide Web. A crawler and an indexer in a MapReduce environment are used to improve the speed of crawling and indexing. The proposed ranking algorithm is an iterative method that makes use of the link structure of the Web and is developed using the MapReduce framework to improve the speed of convergence when ranking Web pages. Categorization is used to retrieve and order the results according to user choice, thereby personalizing the search. A new score is introduced in this paper that is associated with each Web page and is calculated from the user's query and the number of occurrences of the query terms in the document corpus. The experiments are conducted on Web graph datasets and the results are compared with the serial versions of the crawler, indexer, and ranking algorithms.
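The paper's actual MapReduce jobs are not reproduced in this abstract. As an illustration only, the map and reduce steps of one link-based ranking iteration of the kind described might look as follows in plain Python; the in-memory "framework", function names, damping factor, and toy graph are all assumptions.

```python
from collections import defaultdict

def map_phase(graph, ranks):
    """Mapper: each page emits its rank share to every out-link, plus its
    adjacency list so the reducer can re-emit the graph structure."""
    emitted = []
    for page, outs in graph.items():
        emitted.append((page, ("links", outs)))
        if outs:
            share = ranks[page] / len(outs)
            for target in outs:
                emitted.append((target, ("rank", share)))
    return emitted

def reduce_phase(emitted, n, d=0.85):
    """Reducer: sum the incoming shares per page and apply damping."""
    grouped = defaultdict(list)
    for key, value in emitted:
        grouped[key].append(value)
    new_ranks, graph = {}, {}
    for page, values in grouped.items():
        incoming = sum(v for tag, v in values if tag == "rank")
        graph[page] = next((v for tag, v in values if tag == "links"), [])
        new_ranks[page] = (1.0 - d) / n + d * incoming
    return graph, new_ranks

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {p: 1.0 / len(graph) for p in graph}
for _ in range(20):                  # in practice, iterate until convergence
    graph, ranks = reduce_phase(map_phase(graph, ranks), n=len(graph))
print(ranks)
```

On a real cluster each map and reduce call would run as a distributed job over partitions of the Web graph, which is where the speed-up over the serial versions comes from.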


2003 ◽  
pp. 299-330 ◽  
Author(s):  
Carmine Scavo

The World Wide Web (Web) has been widely adopted by local governments as a way to interact with local residents. The promise and reality of Web applications are explored in this chapter. Four types of Web utilizations are analyzed—bulletin board applications; promotion applications; service delivery applications; and citizen input applications. A survey of 145 municipal and county government websites originally conducted in 1998 was replicated in 2002. These data are used to examine how local governments are actually using the Web and to examine the evolution of Web usage over the four years between the first and second survey. The chapter concludes that local governments have made progress in incorporating many of the features of the Web but that they have a long way to go in realizing its full promise.


Author(s):  
Adélia Gouveia ◽  
Jorge Cardoso

The World Wide Web (WWW) emerged in 1989, developed by Tim Berners-Lee, who proposed to build a system for sharing information among physicists at CERN (Conseil Européen pour la Recherche Nucléaire), the world's largest particle physics laboratory. Currently, the WWW is primarily composed of documents written in HTML (hypertext markup language), a language that is useful for visual presentation (Cardoso & Sheth, 2005). HTML is a set of "markup" symbols contained in a Web page intended for display in a Web browser. Most of the information on the Web is designed only for human consumption. Humans can read Web pages and understand them, but their inherent meaning is not represented in a way that allows interpretation by computers (Cardoso & Sheth, 2006). Since the visual Web does not allow computers to understand the meaning of Web pages (Cardoso, 2007), the W3C (World Wide Web Consortium) started to work on the concept of the Semantic Web, with the objective of developing approaches and solutions for data integration and interoperability. The goal was to develop ways to allow computers to understand Web information. The aim of this chapter is to present the Web Ontology Language (OWL), which can be used to develop Semantic Web applications that understand information and data on the Web. This language was proposed by the W3C and was designed for publishing and sharing data, and for automating the understanding of data by computers, using ontologies. To fully comprehend OWL we first need to study its origin and the basic building blocks of the language. Therefore, we will start by briefly introducing XML (extensible markup language), RDF (resource description framework), and RDF Schema (RDFS). These concepts are important since OWL is written in XML and is an extension of RDF and RDFS.
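By way of illustration only, a tiny ontology of the kind OWL enables can be assembled in Python with the rdflib library; the namespace, class, and property names below are invented for the example and are not taken from the chapter.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/ontology#")   # hypothetical namespace
g = Graph()
g.bind("ex", EX)
g.bind("owl", OWL)

# Declare two OWL classes and relate them with an RDFS subclass axiom
g.add((EX.Document, RDF.type, OWL.Class))
g.add((EX.WebPage, RDF.type, OWL.Class))
g.add((EX.WebPage, RDFS.subClassOf, EX.Document))

# An OWL object property linking web pages to each other
g.add((EX.linksTo, RDF.type, OWL.ObjectProperty))
g.add((EX.linksTo, RDFS.domain, EX.WebPage))
g.add((EX.linksTo, RDFS.range, EX.WebPage))

# Serialize in Turtle, one of the RDF syntaxes on which OWL builds
print(g.serialize(format="turtle"))
```

The point of the example is simply that OWL statements are themselves RDF triples, which is why RDF and RDFS are introduced before OWL in the chapter.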


Author(s):  
August-Wilhelm Scheer

The emergence of what we call today the World Wide Web, the WWW, or simply the Web, dates back to 1989, when Tim Berners-Lee proposed a hypertext system to manage information overload at CERN, Switzerland (Berners-Lee, 1989). This article outlines how his approaches evolved into the Web that drives today's information society and explores the potential that still lies ahead. What was initially known as a wide-area hypertext information retrieval initiative quickly gained momentum due to the fast adoption of graphical browser programs and the standardization activities of the World Wide Web Consortium (W3C). In the beginning, based only on the standards of HTML, HTTP, and URL, the sites provided by the Web were static, meaning the information stayed unchanged until the original publisher decided to update it. For a long time, the WWW, today referred to as Web 1.0, was understood as a technical means to publish information to a vast audience across time and space. Data was kept locally and Web sites were only occasionally updated by uploading files from the client to the Web server. Application software was limited to local desktops and operated only on local data. With the advent of dynamic concepts on the server side (scripting languages like hypertext preprocessor (PHP) or Perl, and Web applications with JSP or ASP) and the client side (e.g., JavaScript), the WWW became more dynamic. Server-side content management systems (CMS) allowed editing Web sites via the browser at run-time. These systems interact with multiple users through PHP interfaces that push information into server-side databases (e.g., mySQL), which in turn feed Web sites with content. Thus, the Web became accessible and editable not only for programmers and "techies" but also for the common user. Yet, technological limitations such as slow Internet connections, consumer-unfriendly Internet rates, and poor multimedia support still inhibited mass usage of the Web. It needed broadband Internet access, flat rates, and digitalized media processing to catch on.


Author(s):  
Bill Karakostas ◽  
Yannis Zorgios

Chapter II presented the main concepts underlying business services. Ultimately, as this book proposes, business services need to be decomposed into networks of executable Web services. Web services are the primary software technology available today that closely matches the characteristics of business services. To understand the mapping from business to Web services, we need to understand the fundamental characteristics of the latter. This chapter therefore will introduce the main Web services concepts and standards. It does not intend to be a comprehensive description of all standards applicable to Web services, as many of them are still in a state of flux. It focuses instead on the more important and stable standards. All such standards are fully and precisely defined and maintained by the organizations that have defined and endorsed them, such as the World Wide Web Consortium (http://w3c.org), the OASIS organization (http://www.oasis-open.org) and others. We advise readers to periodically visit the Web sites describing the various standards to obtain up-to-date versions.

