The Semanference System: Better Search Results through Better Queries

What is popular on Wikipedia and why?

First Monday ◽

10.5210/fm.v12i4.1765 ◽

2007 ◽

Cited By ~ 29

Author(s):

Anselm Spoerri

Keyword(s):

Search Engines ◽

Web Search ◽

Search Behavior ◽

Search Queries ◽

Search Results ◽

The Web

This paper analyzes which pages and topics are the most popular on Wikipedia and why. For the period of September 2006 to January 2007, the 100 most visited Wikipedia pages in a month are identified and categorized in terms of the major topics of interest. The observed topics are compared with search behavior on the Web. Search queries identical to the titles of the most popular Wikipedia pages are submitted to major search engines, and the positions of popular Wikipedia pages in the top 10 search results are determined. The presented data helps to explain how search engines, and Google in particular, fuel the growth of Wikipedia and shape what is popular on it.
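The position check described in this abstract can be illustrated with a minimal sketch. It assumes the top results are already available as a list of URLs; the function name `wikipedia_position` and the sample URLs are hypothetical, and the sketch matches any Wikipedia host rather than the specific page whose title equals the query, which is a simplification of what the paper describes.

```python
from urllib.parse import urlparse


def wikipedia_position(result_urls, top_n=10):
    """Return the 1-based rank of the first Wikipedia result in the top_n, or None."""
    for rank, url in enumerate(result_urls[:top_n], start=1):
        host = urlparse(url).hostname or ""
        # Accept wikipedia.org itself and any language subdomain.
        if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            return rank
    return None


# Hypothetical result list for one query (not data from the paper).
results = [
    "https://example.com/a",
    "https://en.wikipedia.org/wiki/Web_search_engine",
    "https://example.org/b",
]
position = wikipedia_position(results)
```

A return value of `None` means no Wikipedia page appeared within the first `top_n` results.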


Identifying Aspects for Web-Search Queries

Journal of Artificial Intelligence Research ◽

10.1613/jair.3182 ◽

2011 ◽

Vol 40 ◽

pp. 677-700 ◽

Cited By ~ 4

Author(s):

F. Wu ◽

J. Madhavan ◽

A. Halevy

Keyword(s):

Search Engine ◽

Web Search ◽

User Study ◽

Effective Means ◽

Relevant Information ◽

Sources Of Information ◽

Information Need ◽

Search Queries ◽

Mass Collaboration ◽

Query Logs

Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the Aspector system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search query. To serve as an effective means to explore the space, Aspector computes aspects that are orthogonal to each other and have high combined coverage. Aspector combines two sources of information to compute aspects. We discover candidate aspects by analyzing query logs, and cluster them to eliminate redundancies. We then use a mass-collaboration knowledge base (e.g., Wikipedia) to compute candidate aspects for queries that occur less frequently and to group together aspects that are likely to be semantically related. We present a user study that indicates that the aspects we compute are rated favorably against three competing alternatives: related searches proposed by Google, cluster labels assigned by the Clusty search engine, and navigational searches proposed by Bing.


An architecture for non-linear discovery of aggregated multimedia document web search results

PeerJ Computer Science ◽

10.7717/peerj-cs.449 ◽

2021 ◽

Vol 7 ◽

pp. e449

Author(s):

Abdur Rehman Khan ◽

Umer Rashid ◽

Khalid Saleem ◽

Adeel Ahmed

Keyword(s):

Multimedia Information ◽

Search Engines ◽

Web Search ◽

Empirical Evaluation ◽

Multimedia Data ◽

Information Need ◽

Multimedia Document ◽

Search Results ◽

Non Linear ◽

Multimedia Search

The recent proliferation of multimedia information on the web has expanded users’ information needs from simple textual lookup to multi-modal exploration activities. Current search engines act as major gateways to the immense amount of multimedia data. However, access to the multimedia content is provided by aggregating disjoint multimedia search verticals. This aggregation does not consider the relationships among the multimedia search results, which are only partially blended. Additionally, the search results are presented as linear lists, which cannot support users’ non-linear navigation patterns when exploring multimedia search results. Users, in contrast, are demanding more from search engines, including adequate means to navigate, explore, and discover multimedia information. Our discovery approach allows users to explore and discover multimedia information by semantically aggregating disjoint verticals using sentence embeddings and transforming snippets into conceptually similar multimedia document groups. The proposed aggregation approach retains the relationships among the retrieved multimedia search results. A non-linear graph is instantiated to augment the users’ non-linear information navigation and exploration patterns, which leads to discovering new and interesting search results at various aggregated granularity levels. In empirical evaluation, our method achieves 99% accuracy in aggregating disjoint search results at different aggregated search granularity levels. Our approach provides a standard baseline for the exploration of aggregated multimedia search results.


Web Page Recommender System using hybrid of Genetic Algorithm and Trust for Personalized Web Search

Journal of Information Technology Research ◽

10.4018/jitr.2018040107 ◽

2018 ◽

Vol 11 (2) ◽

pp. 110-127 ◽

Cited By ~ 3

Author(s):

Suruchi Chawla

Keyword(s):

Genetic Algorithm ◽

Web Search ◽

Web Pages ◽

Information Need ◽

Web Page ◽

Data Set ◽

Search Results ◽

Page Ranking ◽

Main Challenge ◽

Optimal Ranking

The main challenge to effective information retrieval is to optimize the page ranking in order to retrieve relevant documents for user queries. In this article, a method is proposed which uses a hybrid of genetic algorithms (GA) and trust to generate the optimal ranking of trusted clicked URLs for web page recommendations. The trusted web pages are selected based on clustered query sessions for GA-based optimal ranking, so that more relevant documents move up in the ranking and the precision of search results improves. Thus, the optimal ranking of trusted clicked URLs recommends relevant documents to web users for their search goal and satisfies the information need of the user effectively. The experiment was conducted on a data set captured in three domains (academics, entertainment and sports) to evaluate the performance of GA-based optimal ranking with and without trust, and the results confirm an improvement in the precision of search results.


Searching web documents using a summarization approach

International Journal of Web Information Systems ◽

10.1108/ijwis-11-2015-0039 ◽

2016 ◽

Vol 12 (1) ◽

pp. 83-101 ◽

Cited By ~ 6

Author(s):

Rani Qumsiyeh ◽

Yiu-Kai Ng

Keyword(s):

Search Engines ◽

Web Search ◽

Specific Information ◽

Information Need ◽

Search Query ◽

Content Type ◽

Additional Information ◽

Search Results ◽

User Query ◽

Web Search Engines

Purpose: The purpose of this paper is to introduce a summarization method to enhance the current web-search approaches by offering a summary of each clustered set of web-search results with contents addressing the same topic, which should allow the user to quickly identify the information covered in the clustered search results. Web search engines, such as Google, Bing and Yahoo!, rank the set of documents S retrieved in response to a user query and represent each document D in S using a title and a snippet, which serves as an abstract of D. Snippets, however, are not as useful as intended, i.e., in helping users quickly identify results of interest. These snippets are inadequate at providing distinct information and capturing the main contents of the corresponding documents. Moreover, when the intended information need specified in a search query is ambiguous, it is very difficult, if not impossible, for a search engine to identify precisely the set of documents that satisfy the user’s intended request without requiring additional information. Furthermore, a document title is not always a good indicator of the content of the corresponding document either.

Design/methodology/approach: The authors propose to develop a query-based summarizer, called QSum, to solve the existing problems of Web search engines which use titles and abstracts to capture the contents of retrieved documents. QSum generates a concise, comprehensive summary for each cluster of documents retrieved in response to a user query, which saves the user’s time and effort in searching for specific information of interest by skipping the step of browsing through the retrieved documents one by one.

Findings: Experimental results show that QSum is effective and efficient in creating a high-quality summary for each cluster to enhance Web search.

Originality/value: The proposed query-based summarizer, QSum, is unique in its searching approach. QSum is also a significant contribution to the Web search community, as it handles the ambiguity of a search query by creating summaries in response to different interpretations of the search, which offer a “road map” to help users quickly identify information of interest.


Web Page Recommender System using hybrid of Genetic Algorithm and Trust for Personalized Web Search

Research Anthology on Multi-Industry Uses of Genetic Programming and Algorithms ◽

10.4018/978-1-7998-8048-6.ch034 ◽

2021 ◽

pp. 656-675

Author(s):

Suruchi Chawla

Keyword(s):

Genetic Algorithm ◽

Web Search ◽

Web Pages ◽

Information Need ◽

Web Page ◽

Data Set ◽

Search Results ◽

Page Ranking ◽

Main Challenge ◽

Optimal Ranking

The main challenge to effective information retrieval is to optimize the page ranking in order to retrieve relevant documents for user queries. In this article, a method is proposed which uses a hybrid of genetic algorithms (GA) and trust to generate the optimal ranking of trusted clicked URLs for web page recommendations. The trusted web pages are selected based on clustered query sessions for GA-based optimal ranking, so that more relevant documents move up in the ranking and the precision of search results improves. Thus, the optimal ranking of trusted clicked URLs recommends relevant documents to web users for their search goal and satisfies the information need of the user effectively. The experiment was conducted on a data set captured in three domains (academics, entertainment and sports) to evaluate the performance of GA-based optimal ranking with and without trust, and the results confirm an improvement in the precision of search results.


Constructing Web search queries from the user's information need expressed in a natural language

Proceedings of the 2003 ACM symposium on Applied computing - SAC '03 ◽

10.1145/952532.952758 ◽

2003 ◽

Cited By ~ 8

Author(s):

Jacob Shapiro ◽

Isak Taksa

Keyword(s):

Natural Language ◽

Web Search ◽

Information Need ◽

Search Queries


Neural methods for effective, efficient, and exposure-aware information retrieval

ACM SIGIR Forum ◽

10.1145/3476415.3476434 ◽

2021 ◽

Vol 55 (1) ◽

pp. 1-2

Author(s):

Bhaskar Mitra

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Large Scale ◽

Web Search ◽

Real Life ◽

Inverted Index ◽

Information Need ◽

Product Model ◽

Performance Improvements ◽

Deep Model

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents (or short passages) in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms not seen during training, such as a person's name or a product model number, and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections containing billions of documents, such as the document index of a commercial Web search engine. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
Our key contribution towards improving the effectiveness of deep ranking models is developing the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any arbitrary deep model, which enables large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
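The idea behind query term independence can be illustrated with a small sketch: if a document's score is a sum of per-term scores, each (term, document) score can be precomputed and stored in an inverted index, and query time reduces to lookups and additions. The code below is not the thesis's implementation. The names `build_index`, `term_frequency`, and `search` are hypothetical, and a simple normalized term frequency stands in for the learned per-term score of a deep model.

```python
from collections import defaultdict


def term_frequency(term, tokens):
    # Hypothetical stand-in for a learned model's per-term relevance score.
    return tokens.count(term) / len(tokens)


def build_index(docs, term_score):
    """Precompute one score per (term, document) pair, keyed by term."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for term in set(tokens):
            index[term][doc_id] = term_score(term, tokens)
    return index


def search(index, query, k=10):
    """Score documents as the sum of precomputed per-term scores."""
    totals = defaultdict(float)
    for term in query.lower().split():
        for doc_id, score in index.get(term, {}).items():
            totals[doc_id] += score
    # Highest score first; ties broken by document id for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy collection (illustrative only).
docs = {
    "d1": "neural ranking models for web search",
    "d2": "inverted index structures for fast retrieval",
    "d3": "web search with an inverted index",
}
index = build_index(docs, term_frequency)
results = search(index, "inverted index")
```

Because the expensive scoring happens in `build_index`, only documents that share at least one term with the query are touched at query time, which is what makes the inverted index usable for large collections.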
