A Semantic Focused Web Crawler Based on a Knowledge Representation Schema

2020 ◽  
Vol 10 (11) ◽  
pp. 3837
Author(s):  
Julio Hernandez ◽  
Heidy M. Marin-Castro ◽  
Miguel Morales-Sandoval

The Web has become the main source of information in the digital world, expanding into heterogeneous domains and growing continuously. By means of a search engine, users can systematically search the Web for particular information from a text query; such engines are domain-unaware search tools that maintain near real-time information. One type of web search tool is the semantic focused web crawler (SFWC), which exploits the semantics of the Web, guided by ontology-based heuristics, to determine which web pages belong to the domain defined by a query. An SFWC is highly dependent on its ontological resource, which must be created by human domain experts. This work presents a novel SFWC based on a generic knowledge representation schema that models the crawler's domain, thus reducing the complexity and cost of building the more formal representation required when using ontologies. Furthermore, a similarity measure combining the inverse document frequency (IDF) metric, the standard deviation, and the arithmetic mean is proposed for the SFWC. This measure filters web page content according to the domain of interest during the crawling task. A set of experiments was run over the domains of computer science, politics, and diabetes to validate and evaluate the proposed crawler. The quantitative (harvest ratio) and qualitative (Fleiss' kappa) evaluations demonstrate the suitability of the proposed SFWC for crawling the Web with a knowledge representation schema instead of a domain ontology.
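
The abstract does not give the exact way IDF, the arithmetic mean, and the standard deviation are combined, so the following sketch is only an illustration of how such a filter could look: score a candidate page by the mean IDF of its domain terms, then accept it when that score stays within a standard-deviation band around the mean score of known on-domain pages. All names and the acceptance rule are assumptions, not the paper's formula.

```python
# Illustrative sketch only: the combination of IDF, mean, and standard deviation
# below is an assumption used to make the idea concrete.
import math
from collections import Counter

def idf_table(domain_docs):
    """Compute IDF for every term in a small corpus of on-domain documents."""
    n = len(domain_docs)
    df = Counter()
    for doc in domain_docs:
        df.update(set(doc.split()))
    return {t: math.log(n / df[t]) for t in df}

def page_score(page_text, idf):
    """Score a page as the mean IDF of its terms that appear in the domain vocabulary."""
    weights = [idf[t] for t in page_text.split() if t in idf]
    return sum(weights) / len(weights) if weights else 0.0

def in_domain(page_text, idf, mean_score, std_score, k=1.0):
    """Assumed acceptance rule: keep the page if its score lies within k standard
    deviations of the mean score observed on known on-domain pages."""
    return abs(page_score(page_text, idf) - mean_score) <= k * std_score
```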

Author(s):  
Reinaldo Padilha França ◽  
Ana Carolina Borges Monteiro ◽  
Rangel Arthur ◽  
Yuzo Iano

The Semantic Web concept is an extension of the Web obtained by adding semantics to the current data representation format; it can be thought of as a network of correlated meanings, the result of combining web-based conceptions and technologies with knowledge representation. The internet has passed through several stages in its web versions 1.0, 2.0, and 3.0, the last often called the smart web. Web 3.0 is associated with the Semantic Web because technological advances have allowed the internet to reach beyond the devices that were built expressly to receive a connection, such as computers or smartphones: it embraces reading, writing, and execution off-screen, performed by machines. Therefore, this chapter aims to provide an updated review of the Semantic Web and its technologies, showing its technological origins, tracing its path to success with a concise bibliographic background, and categorizing and synthesizing the potential of its technologies.


Author(s):  
Christopher Walton

At the start of this book we outlined the challenges of automatic, computer-based processing of information on the Web. These numerous challenges are generally referred to as the ‘vision’ of the Semantic Web. From the outset, we have attempted to take a realistic and pragmatic view of this vision. Our opinion is that the vision may never be fully realized, but that it is a useful goal on which to focus. Each step towards the vision has provided new insights on classical problems in knowledge representation, MASs, and Web-based techniques. Thus, we are presently in a significantly better position as a result of these efforts. It is sometimes difficult to see the purpose of the Semantic Web vision behind all of the different technologies and acronyms. However, the fundamental purpose of the Semantic Web is essentially large-scale and automated data integration. The Semantic Web is not just about providing a more intelligent kind of Web search, but also about taking the results of these searches and combining them in interesting and useful ways. As stated in Chapter 1, the possible applications for the Semantic Web include: automated data mining, e-science experiments, e-learning systems, personalized newspapers and journals, and intelligent devices. The current state of progress towards the Semantic Web vision is summarized in Figure 8.1. This figure shows a pyramid with the human-centric Web at the bottom, sometimes termed the Syntactic Web, and the envisioned Semantic Web at the top. Throughout this book, we have been moving upwards on this pyramid, and it should be clear that a great deal of progress has been made towards the goal. This progress is indicated by the various stages of the pyramid, which can be summarized as follows: • The lowest stage of the pyramid is the basic Web that should be familiar to everyone. This Web of information is human-centric and contains very little automation. Nonetheless, the Web provides the basic protocols and technologies on which the Semantic Web is founded. Furthermore, the information which is represented on the Web will ultimately be the source of knowledge for the Semantic Web.


2010 ◽  
Vol 129-131 ◽  
pp. 670-674
Author(s):  
Xu Jing ◽  
Dong Jian He ◽  
Lin Sen Zan ◽  
Jian Liang Li ◽  
Wang Yao

In management-type SaaS, users must be permitted to submit a tenant's business data to the SP's server, and that data may carry embedded web-based malware. In this paper, we propose an automatic method for detecting web-based malware based on behavior analysis, which helps meet the SLA by detecting web-based malware proactively. First, the tenant's update is downloaded to a bastion host by a web crawler. Second, the behavior produced when the tenant's update is opened by IE is observed; to contain any malicious behavior during detection, a monitoring DLL is injected into IE. Finally, if sensitive operations occur, the URL is appended to a malicious-address database and, at the same time, the system administrator is notified by SMS. Test results show that our method can detect web-based malware accurately, helping to improve the service level of management-type SaaS.
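
A minimal orchestration sketch of the described detection loop is given below. Every helper passed in (the downloader, the DLL-instrumented IE runner on the bastion host, the trace classifier, the alert channel) is a hypothetical stand-in, not an API from the paper.

```python
# Hypothetical orchestration of the detection workflow described in the abstract.
def analyze_tenant_update(url, download, open_in_ie, is_malicious_trace,
                          malicious_db, alert):
    """Fetch the tenant's update, open it in the DLL-instrumented IE on the bastion
    host, and record/alert if the behavior trace contains sensitive operations."""
    local_copy = download(url)            # step 1: crawler fetches the update
    trace = open_in_ie(local_copy)        # step 2: open in instrumented IE, record behavior
    if is_malicious_trace(trace):         # step 3: behavior analysis of the trace
        malicious_db.add(url)             # append URL to the malicious-address database
        alert(f"Web-based malware suspected at {url}")   # e.g. via an SMS gateway
        return True
    return False
```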


1999 ◽  
Vol 08 (02) ◽  
pp. 137-156 ◽  
Author(s):  
CHING-CHI HSU ◽  
CHIA-HUI CHANG

This paper describes a Web information search tool called WebYacht. The goal of WebYacht is to solve the problem of imprecise search results in current Web search engines. Due to the incomplete information given by users and the diversified information published on the Web, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when, as in most cases, little information is given. In order to clarify the ambiguity of the short queries given by users, WebYacht adopts a cluster-based browsing model as well as relevance feedback to facilitate Web information search. The idea is to let users give two to three times more feedback in the same amount of time required by conventional feedback mechanisms. With the assistance of the cluster-based representation provided by WebYacht, much browsing labor can be saved. In this paper, we explain the techniques used in the design of WebYacht and compare the performance of the feedback interface designs to that of conventional similarity-ranked search results.
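
The abstract does not state WebYacht's feedback formula; the sketch below uses a standard Rocchio-style update merely to illustrate how feedback on clusters (or documents) can refine a short, ambiguous query vector.

```python
# Not WebYacht's actual method: a generic Rocchio-style relevance-feedback update.
import numpy as np

def rocchio(query_vec, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector from feedback on document (or cluster) vectors."""
    q = alpha * np.asarray(query_vec, dtype=float)
    if len(relevant_docs):
        q += beta * np.mean(relevant_docs, axis=0)    # pull towards relevant items
    if len(nonrelevant_docs):
        q -= gamma * np.mean(nonrelevant_docs, axis=0)  # push away from non-relevant items
    return np.clip(q, 0.0, None)                       # keep term weights non-negative
```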


Author(s):  
GAURAV AGARWAL ◽  
SACHI GUPTA ◽  
SAURABH MUKHERJEE

Today, web servers are the key repositories of information, and the internet is the means of obtaining it. The amount of data on the Internet is enormous, and finding the relevant data is a difficult job in which search engines play a vital role. A search engine follows these steps: web crawling by a crawler, indexing by an indexer, and searching by a searcher. The web crawler retrieves information from web pages by following every link on a site; the content is stored by the search engine and then indexed by the indexer, whose main role is to make data retrievable quickly according to user requirements. When a client issues a query, the search engine retrieves the results corresponding to that query. The goal here is to develop a search engine algorithm that returns the most desirable results for the user's requirements. A ranking method is used by the search engine to rank web pages. Various ranking approaches are discussed in the literature, but in this paper a ranking algorithm based on the parent-child relationship is proposed. The proposed algorithm is based on the priority-assignment phase of the Heterogeneous Earliest Finish Time (HEFT) algorithm, which was designed for multiprocessor task scheduling. The proposed algorithm works on three variables: the keyword density, the number of successors of a node, and the age of the web page. Density captures how often the keyword occurs on a particular page; the number of successors is the count of outgoing links from a page; and age measures the freshness of a page, so the most recently modified page has the smallest age, i.e., the largest freshness value. The proposed technique sets the priority of each page from its downward rank value and arranges pages in ascending or descending order of rank. Experiments show that the algorithm is valuable: in a comparison with Google, the proposed algorithm performed better on 70% of the test problems.
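
The exact combination of keyword density, number of successors, and page age is not given in this summary, so the sketch below is only an assumed illustration: freshness is taken as 1/(1 + age), the three variables are weighted equally, and a page's rank also reflects its highest-ranked parent in the spirit of HEFT's downward rank.

```python
# Assumed weighting and propagation; not the paper's exact ranking formula.
def page_rank_value(density, num_successors, age_days, parent_ranks=(), w=(1.0, 1.0, 1.0)):
    freshness = 1.0 / (1.0 + age_days)            # fresher pages get larger values
    local = w[0] * density + w[1] * num_successors + w[2] * freshness
    # downward-rank style: a page's priority also reflects its highest-ranked parent
    return local + (max(parent_ranks) if parent_ranks else 0.0)

def order_results(pages):
    """pages: list of dicts with 'url', 'density', 'successors', 'age', 'parent_ranks'."""
    return sorted(pages,
                  key=lambda p: page_rank_value(p['density'], p['successors'],
                                                p['age'], p.get('parent_ranks', ())),
                  reverse=True)
```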


2021 ◽  
pp. 54-65
Author(s):  
Khlid M. .. ◽  
...  

Most people are related to the web in one way or another by participating in some kind of social networking site. Semantic Web technology plays a crucial role in these sites, as they contain an enormous amount of data about persons, pages, events, places, corporations, etc. This research is a Semantic Web application designed to create a new semantic social community called Socialpedia. It links already existing public social information to newly published information, and this information is further linked with other data on the web to construct a new, immense data container. The resulting data container can be processed with a variety of Semantic Web techniques to produce machine-understandable content. This content shows the promise of using integrated data to improve Web search and Web-scale data analysis, unlike conventional search engines or social ones. Building this community involves obtaining data from traditional users, known as contributors or participants, linking data from existing social networks, extracting structured data as triples using predefined ontologies, and finally querying and inferring over such data to obtain meaningful pieces of information. Socialpedia supports all popular functionalities of social networking websites besides the enhanced features of the Semantic Web, providing an advanced semantic search that acts as a semantic search engine.
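
As a small illustration of querying linked social data as triples (not Socialpedia's actual code), the sketch below builds a toy FOAF-style graph with rdflib and runs a semantic query over it; the vocabulary and example data are assumptions.

```python
# Toy example of triple storage and semantic querying with rdflib.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF

g = Graph()
alice, bob = URIRef("http://example.org/alice"), URIRef("http://example.org/bob")
g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.name, Literal("Alice")))
g.add((alice, FOAF.knows, bob))
g.add((bob, RDF.type, FOAF.Person))
g.add((bob, FOAF.name, Literal("Bob")))

# Semantic search over the linked data: who does Alice know?
q = """
SELECT ?name WHERE {
  ?a foaf:name "Alice" .
  ?a foaf:knows ?b .
  ?b foaf:name ?name .
}"""
for row in g.query(q, initNs={"foaf": FOAF}):
    print(row.name)   # -> Bob
```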


2017 ◽  
Author(s):  
Don L Jewett

"Publication forms the core structure supporting the development and transmission of scientific knowledge" (Galbraith2015). Yet, with the WorldWideWeb a dominant part of current scientific publication and information-dissemination, internet "publication" is still paper-based in its style and methods. As will become painfully obvious, such a paper-based "publishing model" is NOT adequate for a Web-based world. Consider that in 2011, an estimated 5,000 peer-reviewed scientific articles were published per day (Outsell2013), and that in 2014 the English-language scholarly publications on the Web alone amounted to about 4,900 per day. In 1980, the distinguished scientist Garrett Hardin wrote [Hardin1980]: "Who can keep up with such a torrent? When I was young and foolish I vowed that I would read all the articles in my small field of science. Discovering that this was impossible, I tried to read all the abstracts. That, too, proved too much. Now I know that I cannot even read all the titles." To help reduce scholarly information-overload, this article proposes using Knowledge-Step Forums for the purpose of creating a new type of scholarly publication, Web-based Compendia. Each Compendium is about a very narrow topic and is presented in a MultiLevel Format. When all these features are combined, the scholarly article is called a Knowledge-Step Compendium, and it is posted on the Web by the scholar, either on an institutional server or on one of many web-hosting servers. Web-search engines will be automatically notified about the new posting (and later changes, too). Forum-Compendors need not be senior faculty members (as is the case with traditional literature-reviews), but can be pre-docs, post-docs, and senior medical/surgical residents. These graduate students will be aided by their mentors and online experts to create the Knowledge-Step Compendia. All participants (students and faculty) will be motivated by their own self-interest, and everyone gains from the activity, which self-organizes groups of like-minded scholars. Such groups can be the basis for early reviews of new data, for discovering new ideas, and for finding jobs. Knowledge-Step Forums will speed publication on the Web because they will easily support publication of preprints using the software's automatic collection of online "peer-review" comments. In order for the Internet to be an efficient searchable repository of current and developing knowledge, one additional feature will be needed: ForwardLinks must be available in any given publication to those articles that, in the future, cite the given publication, as fully described in a Supplement to this article. Open-source software for this functionality should be on all Web-servers that contain scholarly articles, so as to make the WWW a distributed web full of linkages, of both ForwardLinks and RetroLinks.

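As a toy illustration of the ForwardLink idea only (the article's Supplement describes the real mechanism), the sketch below keeps an in-memory registry that maps a cited article to the later articles citing it; the store and function names are hypothetical.

```python
# Hypothetical ForwardLink registry: cited article URL -> URLs of later citing articles.
from collections import defaultdict

forward_links = defaultdict(list)

def register_publication(new_url, cited_urls):
    """When a new article is posted, each article it cites gains a ForwardLink to it;
    the RetroLinks (new article -> cited articles) already exist in the article itself."""
    for cited in cited_urls:
        forward_links[cited].append(new_url)

def forward_citations(url):
    """Articles published after `url` that cite it."""
    return forward_links[url]
```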

Author(s):  
Sang Thanh Thi Nguyen ◽  
Tuan Thanh Nguyen

With the rapid advancement of ICT, the World Wide Web (referred to as the Web) has become the biggest information repository, and its volume keeps growing on a daily basis. The challenge is how to find the most wanted information on the Web with minimum effort. This paper presents a novel ontology-based framework for searching for web pages related to a given term within a few given specific websites. With this framework, a web crawler first learns the content of the web pages within the given websites; the topic modeller then finds the relations between web pages and topics via keywords found on the pages, using the Latent Dirichlet Allocation (LDA) technique. After that, the ontology builder establishes an ontology, a semantic network of web pages based on the topic model. Finally, a reasoner can find the web pages related to a given term by making use of the ontology. The framework and the related modelling techniques have been verified using a few test websites, and the results confirm its superiority over existing web search tools.
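
To make the topic-modelling step concrete, the sketch below runs LDA over a few toy pages with gensim, one standard implementation of Latent Dirichlet Allocation; the example pages, the number of topics, and the way the page-topic links would feed the ontology builder are assumptions, not the paper's code.

```python
# Sketch of the topic-modelling step only, using gensim's LDA on toy "pages".
from gensim import corpora, models

pages = [
    "machine learning models for web page classification".split(),
    "crawler downloads web pages and follows links".split(),
    "ontology and semantic reasoning over topics".split(),
]
dictionary = corpora.Dictionary(pages)                      # term <-> id mapping
corpus = [dictionary.doc2bow(tokens) for tokens in pages]   # bag-of-words per page
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      passes=10, random_state=0)

# Relate each page to its topics; these page-topic relations would then be
# encoded as the semantic network built by the ontology builder.
for i, bow in enumerate(corpus):
    print(i, lda.get_document_topics(bow))
```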


Author(s):  
R. Umagandhi ◽  
A. V. Senthil Kumar

The Web is the largest and most voluminous data source in the world. The inconceivable boom of information available on the web brings with it the challenge of retrieving precise and appropriate information at the time of need, and the unpredictable amount of web information makes ambiguity in web search a constant threat. In this scenario, a search engine retrieves significant information from the web based on the query term given by the user. The search queries given by users are usually short and ambiguous, so they may not produce appropriate results, and the retrieved results may not always be relevant; at times, irrelevant and redundant results are retrieved because of the short and ambiguous query keywords. Query recommendation is a technique that provides alternate queries as substitutes for the input query, helping the user frame queries in the future. A methodology was framed to identify similar queries and cluster them; each cluster contains the similar queries that are used to provide the recommendations.
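
The chapter does not fix a particular clustering algorithm, so the sketch below uses a common baseline, TF-IDF vectors with k-means, purely to illustrate recommending the other queries from the cluster of the input query; the example query log is invented.

```python
# Baseline sketch of query clustering for recommendation (TF-IDF + k-means).
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

past_queries = ["cheap flights paris", "paris flight deals", "python list sort",
                "sort a list in python", "flights to paris"]
X = TfidfVectorizer().fit_transform(past_queries)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

def recommend(query_index):
    """Recommend the other queries in the same cluster as the input query."""
    return [q for q, lab in zip(past_queries, labels)
            if lab == labels[query_index] and q != past_queries[query_index]]

print(recommend(0))   # alternate queries similar to "cheap flights paris"
```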

