An Enhanced Web Document Search Engine using a Semantic Network

WEB GRAPH BASED SEARCH BY USING DENSITY OF KEYWORD AND AGE FACTOR

International Journal of Computer Science and Informatics ◽

10.47893/ijcsi.2013.1124 ◽

2013 ◽

pp. 89-93

Author(s):

GAURAV AGARWAL ◽

SACHI GUPTA ◽

SAURABH MUKHERJEE

Keyword(s):

Search Engine ◽

Web Search ◽

Web Pages ◽

Main Role ◽

Ranking Algorithm ◽

Web Page ◽

Web Crawler ◽

User Requirement ◽

Priority Assignment ◽

The Web

Today, web servers are the key repositories of information, and the Internet is the means of accessing it. The volume of data on the Internet is enormous, which makes finding relevant data difficult. Search engines play a vital role in finding relevant data. A search engine follows three steps: web crawling by the crawler, indexing by the indexer, and searching by the searcher. The web crawler retrieves information from web pages by following every link on a site. This information is stored by the search engine, and the content of each web page is then indexed by the indexer. The main role of the indexer is to make data quickly retrievable according to user requirements. When the client submits a query, the search engine retrieves the results corresponding to that query. The aim here is to develop an algorithm for a search engine that returns the most desirable results for the user's requirements. The search engine uses a ranking method to rank web pages. Various ranking approaches are discussed in the literature; this paper proposes a ranking algorithm based on the parent-child relationship. The proposed ranking algorithm is based on the priority assignment phase of the Heterogeneous Earliest Finish Time (HEFT) algorithm, which was designed for multiprocessor task scheduling. The proposed algorithm works on three variables: the density of keywords, the number of successors of a node, and the age of the web page. Density is the occurrence of the keyword on a particular web page. The number of successors represents the outgoing links of a single web page. Age is the freshness value of the web page: the most recently modified page is the freshest and has the smallest age, or largest freshness value. The proposed technique requires that the priority of each page be set using downward rank values, and pages are arranged in ascending or descending order of their rank values. Experiments show that the algorithm is valuable. In a comparison with Google, the authors find that their algorithm performs better on 70% of the problems.
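
The abstract does not give the ranking formula, so the following is only a minimal sketch of the idea under stated assumptions. The page data, the `page_weight` combination of density, freshness, and successor count, and all function names are hypothetical; the downward rank follows the usual HEFT definition (longest weighted path from an entry node), with communication costs omitted.

```python
# Hypothetical pages: keyword density and age in days (illustrative values only).
PAGES = {
    "home": {"density": 0.02, "age_days": 1},
    "docs": {"density": 0.05, "age_days": 30},
    "blog": {"density": 0.01, "age_days": 3},
    "post": {"density": 0.08, "age_days": 2},
}
# Parent -> children links; assumed to form a directed acyclic graph.
LINKS = {"home": ["docs", "blog"], "docs": ["post"], "blog": ["post"], "post": []}


def page_weight(name):
    """Assumed node weight combining density, freshness, and successor count."""
    page = PAGES[name]
    freshness = 1.0 / (1.0 + page["age_days"])  # newer page -> larger value
    return page["density"] * freshness * (1 + len(LINKS[name]))


def parents_of(name):
    """Pages that link to `name`."""
    return [p for p, children in LINKS.items() if name in children]


def downward_rank(name):
    """Longest weighted path from an entry page down to `name` (0 for entries)."""
    return max(
        (downward_rank(p) + page_weight(p) for p in parents_of(name)),
        default=0.0,
    )


def rank_pages():
    """Pages sorted by descending downward rank (ascending is equally possible)."""
    return sorted(PAGES, key=downward_rank, reverse=True)
```

Calling `rank_pages()` returns the page names ordered by their downward rank; whether ascending or descending order is the better presentation is left open, as in the abstract.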

Efficient Web Navigator for Multi-Constrained Spatial Keyword Queries

Journal of Communications Software and Systems ◽

10.24138/jcomss.v11i2.104 ◽

2015 ◽

Vol 11 (2) ◽

pp. 63 ◽

Cited By ~ 1

Author(s):

K.B. Priya Iyer

Keyword(s):

Web Search ◽

Working Hours ◽

Web Pages ◽

Web Crawler ◽

Web Documents ◽

Point Of Interest ◽

Data Objects ◽

Opening Up ◽

New Algorithms ◽

The Web

Mobile technology is revolutionizing communications, opening up new possibilities for applications such as location-based web search. Such search retrieves the user's point of interest (POI) from web documents based on a query relative to a particular place or region. Existing location-based applications on mobile devices find the nearest neighbors from a POI database, not from web documents. Existing query searches are limited to POIs and do not cover attributes of data objects such as brand, model name, color, and price. This paper introduces a Spatial Web Crawler (SWC) for multi-constrained keyword queries. New algorithms were developed that retrieve the desired data objects from web pages based on the working hours of the data objects, keywords, and priorities such as cost, quality, and popularity. The SWC also provides the shortest path to the point of interest based on travel time.
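
As a rough illustration of a multi-constrained keyword query, the sketch below filters data objects by keywords, working hours, and a cost limit, then orders them by travel time. It is not the SWC algorithm itself: the `DataObject` fields, the `query` function, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    """Hypothetical record for a data object extracted from a web page."""
    name: str
    keywords: frozenset
    opens: int            # opening hour, 0-23
    closes: int           # closing hour, assumed later than `opens`
    cost: float
    travel_minutes: float


def query(objects, required_keywords, hour, max_cost):
    """Return objects matching all constraints, nearest (by travel time) first."""
    required = set(required_keywords)
    matches = [
        o for o in objects
        if required <= o.keywords          # every query keyword is present
        and o.opens <= hour < o.closes     # open at the requested hour
        and o.cost <= max_cost             # within the cost priority
    ]
    return sorted(matches, key=lambda o: o.travel_minutes)


SHOPS = [
    DataObject("A", frozenset({"phone", "red"}), 9, 18, 300.0, 12.0),
    DataObject("B", frozenset({"phone", "red"}), 10, 22, 250.0, 7.0),
    DataObject("C", frozenset({"phone", "blue"}), 8, 20, 200.0, 3.0),
    DataObject("D", frozenset({"phone", "red"}), 9, 12, 100.0, 1.0),
]
```

For example, `query(SHOPS, ["phone", "red"], hour=15, max_cost=400)` keeps only the shops that are open at 15:00 and stock a red phone, listed by increasing travel time.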

Search Engine

The Dark Web ◽

10.4018/978-1-5225-3163-0.ch016 ◽

2018 ◽

pp. 359-374

Author(s):

Dilip Kumar Sharma ◽

A. K. Sharma

Keyword(s):

Computer Networks ◽

Search Engines ◽

Web Search ◽

Relevant Information ◽

Vital Role ◽

Deep Web ◽

Telecommunication Networks ◽

Web Pages ◽

Web Crawler ◽

Main Components

ICT plays a vital role in human development through information extraction and includes computer networks and telecommunication networks. One of the important modules of ICT is computer networks, which are the backbone of the World Wide Web (WWW). Search engines are computer programs that browse and extract information from the WWW in a systematic and automatic manner. This paper examines the three main components of search engines: Extractor, a web crawler which starts with a URL; Analyzer, an indexer that processes words on the web page and stores the resulting index in a database; and Interface Generator, a query handler that understands the need and preferences of the user. This paper concentrates on the information available on the surface web through general web pages and the hidden information behind the query interface, called deep web. This paper emphasizes the Extraction of relevant information to generate the preferred content for the user as the first result of his or her search query. This paper discusses the aspect of deep web with analysis of a few existing deep web search engines.
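
The three components named in the abstract (extractor, analyzer, interface generator) can be sketched on a toy in-memory web. This is a simplified illustration only: the `WEB` dictionary, the function names, and the choice of an inverted index with AND semantics are assumptions, not the design described in the paper.

```python
from collections import defaultdict, deque

# Toy web: url -> (page text, outgoing links). Illustrative data only.
WEB = {
    "a": ("deep web search engines", ["b", "c"]),
    "b": ("surface web pages", ["a"]),
    "c": ("query interface for the deep web", []),
}


def crawl(seed):
    """Extractor: breadth-first traversal of links starting from a seed URL."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for nxt in WEB[url][1]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def build_index(urls):
    """Analyzer: inverted index mapping each word to the pages containing it."""
    index = defaultdict(set)
    for url in urls:
        for word in WEB[url][0].lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Query handler: pages containing every word of the query."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []
```

A query is answered by chaining the three steps, for example `search(build_index(crawl("a")), "deep web")`.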

Search Engine

International Journal of Information Communication Technologies and Human Development ◽

10.4018/ijicthd.2011040103 ◽

2011 ◽

Vol 3 (2) ◽

pp. 38-51 ◽

Cited By ~ 6

Author(s):

Dilip Kumar Sharma ◽

A. K. Sharma

Keyword(s):

Computer Networks ◽

Search Engines ◽

Web Search ◽

Relevant Information ◽

Vital Role ◽

Deep Web ◽

Telecommunication Networks ◽

Web Pages ◽

Web Crawler ◽

Main Components

ICT plays a vital role in human development through information extraction and includes computer networks and telecommunication networks. One of the important modules of ICT is computer networks, which are the backbone of the World Wide Web (WWW). Search engines are computer programs that browse and extract information from the WWW in a systematic and automatic manner. This paper examines the three main components of search engines: Extractor, a web crawler which starts with a URL; Analyzer, an indexer that processes words on the web page and stores the resulting index in a database; and Interface Generator, a query handler that understands the need and preferences of the user. This paper concentrates on the information available on the surface web through general web pages and the hidden information behind the query interface, called deep web. This paper emphasizes the Extraction of relevant information to generate the preferred content for the user as the first result of his or her search query. This paper discusses the aspect of deep web with analysis of a few existing deep web search engines.

AntWeb—Web Search Based on Ant Behavior

Emerging Technologies of Text Mining ◽

10.4018/978-1-59904-373-9.ch010 ◽

2008 ◽

pp. 208-222

Author(s):

Li Weigang ◽

Wu Man Qi

Keyword(s):

Web Mining ◽

Web Search ◽

Theory Model ◽

Web Pages ◽

Web Portal ◽

Knowledge Based ◽

Log Files ◽

Ant Behavior ◽

Shortest Route ◽

The Web

This chapter presents a study of Ant Colony Optimization (ACO) applied to the Interlegis Web portal, a Brazilian legislation website. The AntWeb approach is inspired by the foraging behavior of ant colonies and adaptively marks the most significant link by means of the shortest route to the target pages. The system treats the users of the Web portal as artificial ants and the links among the Web pages as the network being searched. To identify groups of visitors, Web mining is applied to extract knowledge from preprocessed Web log files. The chapter describes the theory, model, main utilities, and implementation of the AntWeb prototype in the Interlegis Web portal. The case study covers off-line Web mining, simulations with and without AntWeb, and tests with modified parameters. The results demonstrate the sensibility and accessibility of AntWeb and its benefits for Interlegis Web users.
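
The idea of users acting as artificial ants can be sketched with a basic pheromone update: each visited link gains pheromone, shorter sessions deposit more per link, and old pheromone evaporates. This is a generic ACO-style sketch, not the AntWeb implementation; the function names, the evaporation and deposit parameters, and the sample sessions are assumptions.

```python
from collections import defaultdict


def update_pheromone(pheromone, sessions, evaporation=0.1, deposit=1.0):
    """Evaporate existing pheromone, then reinforce links used in each session."""
    for link in pheromone:
        pheromone[link] *= 1.0 - evaporation
    for path in sessions:
        hops = len(path) - 1
        if hops < 1:
            continue  # a single-page session uses no links
        for link in zip(path, path[1:]):
            pheromone[link] += deposit / hops  # shorter routes deposit more per link
    return pheromone


def best_link(pheromone, page):
    """Most strongly marked outgoing link from `page`, or None if there is none."""
    candidates = {dst: level for (src, dst), level in pheromone.items() if src == page}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical sessions reconstructed from Web log files.
SESSIONS = [
    ["home", "laws", "target"],
    ["home", "news", "laws", "target"],
    ["home", "laws", "target"],
]
trail = update_pheromone(defaultdict(float), SESSIONS)
```

With these sessions, `best_link(trail, "home")` points to the link that most visitors used on the shorter route to the target page.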

Enhancing Web Search through Web Structure Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch084 ◽

2011 ◽

pp. 443-447

Author(s):

Ji-Rong Wen

Keyword(s):

Information Retrieval ◽

Web Search ◽

Product Information ◽

Semantic Representation ◽

Web Pages ◽

Search Performance ◽

Information Display ◽

Web Structure Mining ◽

Free Environment ◽

The Web

The Web is an open and free environment for people to publish and get information. Everyone on the Web can be either an author, a reader, or both. The language of the Web, HTML (Hypertext Markup Language), is mainly designed for information display, not for semantic representation. Therefore, current Web search engines usually treat Web pages as unstructured documents, and traditional information retrieval (IR) technologies are employed for Web page parsing, indexing, and searching. The unstructured essence of Web pages seriously blocks more accurate search and advanced applications on the Web. For example, many sites contain structured information about various products. Extracting and integrating product information from multiple Web sites could lead to powerful search functions, such as comparison shopping and business intelligence. However, these structured data are embedded in Web pages, and there are no proper traditional methods to extract and integrate them. Another example is the link structure of the Web. If used properly, information hidden in the links could be taken advantage of to effectively improve search performance and make Web search go beyond traditional information retrieval (Page, Brin, Motwani, & Winograd, 1998, Kleinberg, 1998).
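
One well-known way to exploit link structure is PageRank, from the Page, Brin, Motwani, and Winograd work cited above. The sketch below is a minimal power-iteration version on a hypothetical four-link graph; the damping factor of 0.85, the iteration count, and the uniform handling of pages without outgoing links are conventional choices assumed here, not details taken from the chapter.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration over a dict mapping each page to its outgoing links."""
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread uniformly.
        dangling = sum(ranks[p] for p in pages if not links[p])
        new_ranks = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, outgoing in links.items():
            for target in outgoing:
                new_ranks[target] += damping * ranks[page] / len(outgoing)
        ranks = new_ranks
    return ranks


# Hypothetical link graph: a -> b, a -> c, b -> c, c -> a.
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(GRAPH)
```

Pages that receive links from many or from highly ranked pages end up with higher scores, which a search engine can combine with text-based relevance.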

A Method of Subtopic Classification of Search Engine Suggests by Integrating a Topic Model and Word Embeddings

International Journal of Software Innovation ◽

10.4018/ijsi.2018070105 ◽

2018 ◽

Vol 6 (3) ◽

pp. 67-78

Author(s):

Tian Nie ◽

Yi Ding ◽

Chen Zhao ◽

Youchao Lin ◽

Takehito Utsuro

Keyword(s):

Search Engine ◽

Information Needs ◽

Web Search ◽

Topic Model ◽

Japanese Version ◽

Word Embedding ◽

Coarse Grained ◽

Web Pages ◽

Word Embeddings

This article addresses how to give an overview of the knowledge associated with a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with a given query keyword. The Web search information needs for a given query keyword are collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, many of which are redundant. They classify redundant search engine suggests based on a topic model. However, one limitation of topic-model-based classification of search engine suggests is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. To overcome this coarse-grained classification, the article further applies the word embedding technique to the webpages used during the training of the topic model, in addition to the text data of the whole Japanese version of Wikipedia. Then, the authors examine the word-embedding-based similarity between search engine suggests and further classify search engine suggests within a single topic into finer-grained subtopics based on the similarity of word embeddings. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggests.
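
The second stage, splitting suggests within one topic into subtopics by embedding similarity, can be illustrated with cosine similarity and a simple greedy grouping. The abstract does not specify the clustering procedure, so the greedy rule, the 0.8 threshold, and the two-dimensional toy vectors below are assumptions chosen only to show the mechanism.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_by_similarity(vectors, threshold=0.8):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []
    for name, vec in vectors.items():
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no similar group found, start a new subtopic
    return groups


# Toy 2-dimensional "embeddings" for suggests in one topic (illustrative values).
SUGGEST_VECTORS = {
    "price": (1.0, 0.1),
    "cost": (0.9, 0.2),
    "recipe": (0.1, 1.0),
}
subtopics = group_by_similarity(SUGGEST_VECTORS)
```

Real embeddings would have hundreds of dimensions and would be learned from a corpus, but the grouping step works the same way.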

Design and implementation of crawling algorithm to collect deep web information for web archiving

Data Technologies and Applications ◽

10.1108/dta-07-2017-0053 ◽

2018 ◽

Vol 52 (2) ◽

pp. 266-277 ◽

Cited By ~ 2

Author(s):

Hyo-Jung Oh ◽

Dong-Hyun Won ◽

Chonghyuck Kim ◽

Sung-Hee Park ◽

Yong Kim

Keyword(s):

Deep Web ◽

Web Crawler ◽

Web Archiving ◽

Web Browser ◽

Web Documents ◽

Content Type ◽

Web Document ◽

Web Information ◽

Web Crawlers ◽

The Web

Purpose: The purpose of this paper is to describe the development of an algorithm for realizing web crawlers that automatically collect dynamically generated webpages from the deep web.

Design/methodology/approach: This study proposes and develops an algorithm to collect web information as if the web crawler gathers static webpages by managing script commands as links. The proposed web crawler actually experiments with the algorithm by collecting deep webpages.

Findings: Among the findings of this study is that if the actual crawling process provides search results as script pages, the outcome only collects the first page. However, the proposed algorithm can collect deep webpages in this case.

Research limitations/implications: To use a script as a link, a human must first analyze the web document. This study uses the web browser object provided by Microsoft Visual Studio as a script launcher, so it cannot collect deep webpages if the web browser object cannot launch the script, or if the web document contains script errors.

Practical implications: The research results show deep webs are estimated to have 450 to 550 times more information than surface webpages, and it is difficult to collect web documents. However, this algorithm helps to enable deep web collection through script runs.

Originality/value: This study presents a new method to be utilized with script links instead of adopting previous keywords. The proposed algorithm is available as an ordinary URL. From the conducted experiment, analysis of scripts on individual websites is needed to employ them as links.
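
The notion of managing script commands as links can be sketched with Python's standard `html.parser` module: ordinary `href` URLs and script commands are collected into one list of crawl targets. This only covers the extraction step. The study launches scripts through a web browser object, which is not reproduced here, and the class name, the `("url" | "script", value)` tuple format, and the sample HTML are assumptions.

```python
from html.parser import HTMLParser


class LinkAndScriptCollector(HTMLParser):
    """Collect ordinary URLs and script commands as crawl targets."""

    def __init__(self):
        super().__init__()
        self.targets = []  # list of (kind, value) tuples

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            if href.lower().startswith("javascript:"):
                # A script command used in place of a URL.
                self.targets.append(("script", href[len("javascript:"):]))
            else:
                self.targets.append(("url", href))
        onclick = attributes.get("onclick")
        if onclick:
            self.targets.append(("script", onclick))


# Hypothetical result page whose pagination is driven by script calls.
SAMPLE_HTML = (
    '<a href="/page2">2</a>'
    '<a href="javascript:goPage(3)">3</a>'
    '<span onclick="goPage(4)">4</span>'
)
collector = LinkAndScriptCollector()
collector.feed(SAMPLE_HTML)
```

After `feed`, `collector.targets` holds both kinds of target in document order; a crawler would then fetch the URLs directly and hand the script commands to a script launcher.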

A New Vector Representation of Short Texts for Classification

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/2/12 ◽

2019 ◽

Vol 17 (2) ◽

pp. 241-249

Author(s):

Yangyang Li ◽

Bo Liu

Keyword(s):

Text Classification ◽

Web Search ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Classification Performance ◽

New Method ◽

Data Sets ◽

Text Data ◽

Short Text ◽

Space Model

Short and sparse characteristics, together with synonyms and homonyms, are the main obstacles for short-text classification. In recent years, research on short-text classification has focused on expanding short texts but has barely guaranteed the validity of expanded words. This study proposes a new method to weaken these effects without external knowledge. The proposed method analyses short texts by using the topic model based on Latent Dirichlet Allocation (LDA), represents each short text by using a vector space model and presents a new method to adjust the vector of short texts. In the experiments, two open short-text data sets composed of Google News and web search snippets are utilised to evaluate the classification performance and prove the effectiveness of the method.

Clustering of the Web Search Results in Educational Recommender Systems

Educational Recommender Systems and Technologies ◽

10.4018/978-1-61350-489-5.ch007 ◽

2012 ◽

pp. 154-181 ◽

Cited By ~ 12

Author(s):

Constanta-Nicoleta Bodea ◽

Maria-Iuliana Dascalu ◽

Adina Lipai

Keyword(s):

Recommender Systems ◽

Clustering Algorithm ◽

Web Search ◽

Web Pages ◽

Lexical Database ◽

Assessment Task ◽

Search Results ◽

Meta Search ◽

Search Approach ◽

The Web

This chapter presents a meta-search approach meant to deliver bibliography from the internet according to trainees’ results in an e-assessment task. The bibliography consists of web pages related to the knowledge gaps of the trainees. The meta-search engine is part of an education recommender system, attached to an e-assessment application for project management knowledge. Meta-search means that, for a specific query (or mistake made by the trainee), several search mechanisms for suitable bibliography (further reading) could be applied. The lists of results delivered by the standard search mechanisms are used to build thematically homogeneous groups using an ontology-based clustering algorithm. The clustering process uses an educational ontology and the WordNet lexical database to create its categories. The research is presented in the context of recommender systems and their various applications to the education domain.
