Mica: A Web-Search Tool for Finding API Components and Examples

Author(s):  
J. Stylos ◽  
B.A. Myers

1999 ◽  
Vol 08 (02) ◽  
pp. 137-156 ◽  
Author(s):  
CHING-CHI HSU ◽  
CHIA-HUI CHANG

This paper describes a Web information search tool called WebYacht. The goal of WebYacht is to solve the problem of imprecise search results in current Web search engines. Because users give incomplete information and the information published on the Web is diverse, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when, as in most cases, little information is given. To resolve the ambiguity of short user queries, WebYacht adopts a cluster-based browsing model together with relevance feedback to facilitate Web information search. The idea is to let users give two to three times more feedback in the time it would take to provide feedback with conventional feedback mechanisms. With the cluster-based representation provided by WebYacht, much browsing labor can be saved. In this paper, we explain the techniques used in the design of WebYacht and compare the performance of the feedback interface designs with that of conventional similarity-ranked search results.
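To make the feedback loop concrete, here is a minimal sketch of cluster-based relevance feedback in the spirit of WebYacht: results are grouped so that one user judgment can cover a whole cluster, and a Rocchio-style update then moves the query toward the clusters marked relevant. The TF-IDF representation, the k-means grouping, and the Rocchio weights are illustrative assumptions, not details taken from the paper.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_results(snippets, k=3):
    # Vectorize result snippets and group them so that one relevance
    # judgment from the user can cover a whole cluster at once.
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(snippets).toarray()
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return vec, X, labels

def rocchio(query_vec, X, relevant_ids, nonrelevant_ids,
            alpha=1.0, beta=0.75, gamma=0.15):
    # Classic Rocchio update (an assumed stand-in for WebYacht's feedback
    # step): move the query toward documents judged relevant and away
    # from those judged non-relevant.
    q = alpha * query_vec
    if relevant_ids:
        q = q + beta * X[relevant_ids].mean(axis=0)
    if nonrelevant_ids:
        q = q - gamma * X[nonrelevant_ids].mean(axis=0)
    return q

A re-ranking pass would then order the results by cosine similarity between the updated query vector and each row of X.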


2017 ◽  
Vol 10 (13) ◽  
pp. 361
Author(s):  
Nilanjana Dev Nath ◽  
Shreekant Jha ◽  
Janki Meena M ◽  
Syedibrahim S.p

Elasticsearch is a search engine built on Lucene. Apache Lucene is a free and open-source information retrieval software library. Elasticsearch provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It is developed in Java and has been released as open source under the terms of the Apache License. Elasticsearch can be used to search all kinds of documents. It provides scalable search, offers near real-time search, and supports multitenancy. It is distributed, which means that indices can be divided into shards and each shard can have zero or more replicas. Each node hosts one or more shards and acts as a coordinator, delegating operations to the correct shard(s). Elasticsearch is, in effect, a wrapper on top of Lucene. This paper gives a detailed description of how Lucene's scoring algorithm works and how Elasticsearch uses it as its "similarity" algorithm.
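The scoring the paper examines can be made concrete with a minimal sketch of Lucene's classic TF-IDF "practical scoring function" (the default similarity in Lucene versions of that era; later Elasticsearch releases default to BM25 instead). Query normalization and boosts are omitted for brevity, and the toy corpus is invented for illustration.

import math

def idf(term, docs):
    # Lucene's classic idf: 1 + ln(numDocs / (docFreq + 1)).
    df = sum(term in d for d in docs)
    return 1.0 + math.log(len(docs) / (df + 1.0))

def score(query_terms, doc, docs):
    matched = [t for t in query_terms if t in doc]
    coord = len(matched) / len(query_terms)        # reward matching more query terms
    length_norm = 1.0 / math.sqrt(len(doc))        # shorter fields score higher
    s = 0.0
    for t in matched:
        tf = math.sqrt(doc.count(t))               # tf = sqrt(term frequency)
        s += tf * idf(t, docs) ** 2 * length_norm  # idf is squared in Lucene's formula
    return coord * s

docs = [["lucene", "scoring"], ["lucene", "index", "shard"], ["json", "http"]]
print(score(["lucene", "scoring"], docs[0], docs))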


2020 ◽  
Vol 10 (11) ◽  
pp. 3837
Author(s):  
Julio Hernandez ◽  
Heidy M. Marin-Castro ◽  
Miguel Morales-Sandoval

The Web has become the main source of information in the digital world, expanding to heterogeneous domains and continuously growing. By means of a search engine, users can systematically search the Web for particular information with a text query, relying on a domain-unaware web search tool that maintains real-time information. One type of web search tool is the semantic focused web crawler (SFWC); it exploits the semantics of the Web, guided by ontology-based heuristics, to determine which web pages belong to the domain defined by the query. An SFWC is highly dependent on its ontological resource, which is created by human domain experts. This work presents a novel SFWC based on a generic knowledge representation schema to model the crawler's domain, thus reducing the complexity and cost of constructing a more formal representation, as is the case when using ontologies. Furthermore, a similarity measure combining the inverse document frequency (IDF) metric, the standard deviation, and the arithmetic mean is proposed for the SFWC. This measure filters web page contents in accordance with the domain of interest during the crawling task. A set of experiments was run over the domains of computer science, politics, and diabetes to validate and evaluate the proposed crawler. The quantitative (harvest ratio) and qualitative (Fleiss' kappa) evaluations demonstrate the suitability of the proposed SFWC for crawling the Web with a knowledge representation schema instead of a domain ontology.
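As a rough illustration of such a filter, the hypothetical sketch below builds an IDF table from seed documents that define the domain and accepts a page when the mean IDF of its shared terms falls within one standard deviation of the domain's mean IDF. The exact way the paper combines IDF, the arithmetic mean, and the standard deviation is not reproduced here, so this thresholding rule is an assumption.

import math
from statistics import mean, stdev

def idf_table(domain_docs):
    # IDF of each term over the seed documents that define the domain.
    n = len(domain_docs)
    vocab = set().union(*domain_docs)
    return {t: math.log(n / sum(t in d for d in domain_docs)) for t in vocab}

def relevant(page_terms, idfs):
    # Assumed rule: keep a page whose shared-term IDF profile stays
    # within one standard deviation of the domain's mean IDF.
    shared = [idfs[t] for t in page_terms if t in idfs]
    if not shared:
        return False
    mu, sigma = mean(idfs.values()), stdev(idfs.values())
    return mu - sigma <= mean(shared) <= mu + sigma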


2020 ◽  
Vol 4 (2) ◽  
pp. 14-25 ◽  
Author(s):  
Sandeep Suri ◽  
Arushi Gupta ◽  
Kapil Sharma

With the evolution of technology, huge amounts of data are being generated, and extracting the necessary data from these large volumes is a significantly complex process. The web largely contains raw data, and a mining process can be performed to convert this data into information. Whenever a user submits a query to a particular search engine, results are produced in response, ranked according to the scores that web information retrieval tools assign to each document. These results are obtained through the calculations of carefully designed ranking algorithms. Well-known search engines such as Google each have their own way of computing page rank, and the same query yields different results on different engines because the method for deciding the importance of sites differs from algorithm to algorithm. This research attempts to analyze well-known page-ranking algorithms on the basis of their strengths and shortcomings. The paper sheds light on some of the most popular ranking algorithms and tries to identify a better arrangement that can reduce the time spent looking through lists of sites.
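For reference, the basic recurrence behind PageRank itself fits in a few lines of power iteration; the sketch below is only illustrative, since the engines the paper surveys layer many further signals on top of it.

def pagerank(links, d=0.85, iters=50):
    # links: {page: [pages it links to]}; returns an importance score per page.
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - d) / n for p in pages}
        for p, outs in links.items():
            targets = outs if outs else pages      # dangling pages spread evenly
            share = rank[p] / len(targets)
            for q in targets:
                new[q] += d * share
        rank = new
    return rank

print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))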

