Evaluating contents-link coupled web page clustering for web search results

Techniques for Improving Web Search by Understanding Queries

10.26686/wgtn.16985482 ◽

2021 ◽

Author(s):

Daniel Wayne Crabtree

Keyword(s):

Search Engines ◽

Best Practice ◽

Web Search ◽

Special Focus ◽

Clustering Methods ◽

Web Page ◽

Clustering Method ◽

Evaluation Measures ◽

Search Results ◽

Web Page Clustering

This thesis investigates the refinement of web search results with a special focus on the use of clustering and the role of queries. It presents a collection of new methods for evaluating clustering methods, performing clustering effectively, and for performing query refinement. The thesis identifies different types of query, the situations where refinement is necessary, and the factors affecting search difficulty. It then analyses hard searches and argues that many of them fail because users and search engines have different query models. The thesis identifies best practice for evaluating web search results and search refinement methods. It finds that none of the commonly used evaluation measures for clustering meet all of the properties of good evaluation measures. It then presents new quality and coverage measures that satisfy all the desired properties and that rank clusterings correctly in all web page clustering situations. The thesis argues that current web page clustering methods work well when different interpretations of the query have distinct vocabulary, but still have several limitations and often produce incomprehensible clusters. It then presents a new clustering method that uses the query to guide the construction of semantically meaningful clusters. The new clustering method significantly improves performance. Finally, the thesis explores how searches and queries are composed of different aspects and shows how to use aspects to reduce the distance between the query models of search engines and users. It then presents fully automatic methods that identify query aspects, identify underrepresented aspects, and predict query difficulty. Used in combination, these methods have many applications; the thesis describes methods for two of them. The first method improves the search results for hard queries with underrepresented aspects by automatically expanding the query using semantically orthogonal keywords related to the underrepresented aspects. The second method helps users refine hard ambiguous queries by identifying the different query interpretations using a clustering of a diverse set of refinements. Both methods significantly outperform existing methods.

Techniques for Improving Web Search by Understanding Queries

10.26686/wgtn.16985482.v1 ◽

2021 ◽

Author(s):

Daniel Wayne Crabtree

Keyword(s):

Search Engines ◽

Best Practice ◽

Web Search ◽

Special Focus ◽

Clustering Methods ◽

Web Page ◽

Clustering Method ◽

Evaluation Measures ◽

Search Results ◽

Web Page Clustering

Web Page Recommender System using hybrid of Genetic Algorithm and Trust for Personalized Web Search

Journal of Information Technology Research ◽

10.4018/jitr.2018040107 ◽

2018 ◽

Vol 11 (2) ◽

pp. 110-127 ◽

Cited By ~ 3

Author(s):

Suruchi Chawla

Keyword(s):

Genetic Algorithm ◽

Web Search ◽

Web Pages ◽

Information Need ◽

Web Page ◽

Data Set ◽

Search Results ◽

Page Ranking ◽

Main Challenge ◽

Optimal Ranking

The main challenge in effective information retrieval is to optimize page ranking so that relevant documents are retrieved for user queries. This article proposes a method that uses a hybrid of genetic algorithms (GA) and trust to generate the optimal ranking of trusted clicked URLs for web page recommendations. The trusted web pages are selected based on clustered query sessions for GA-based optimal ranking, which moves more relevant documents up the ranking and improves the precision of search results. Thus, the optimal ranking of trusted clicked URLs recommends relevant documents to web users for their search goal and satisfies the user's information need effectively. The experiment was conducted on a data set captured in three domains (academics, entertainment, and sports) to evaluate the performance of GA-based optimal ranking with and without trust, and the results confirm an improvement in the precision of search results.

Web Page Recommender System using hybrid of Genetic Algorithm and Trust for Personalized Web Search

Research Anthology on Multi-Industry Uses of Genetic Programming and Algorithms ◽

10.4018/978-1-7998-8048-6.ch034 ◽

2021 ◽

pp. 656-675

Author(s):

Suruchi Chawla

Keyword(s):

Genetic Algorithm ◽

Web Search ◽

Web Pages ◽

Information Need ◽

Web Page ◽

Data Set ◽

Search Results ◽

Page Ranking ◽

Main Challenge ◽

Optimal Ranking

A Web Search Personalization Based on Probability of Semantic Similarity between User Log and Query with Web Page

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.24.21856 ◽

2018 ◽

Vol 7 (4.24) ◽

pp. 59

Author(s):

Y. Raju ◽

Dr D. Suresh Babu ◽

Dr K. Anuradha

Keyword(s):

Data Collection ◽

Semantic Similarity ◽

Web Search ◽

Web Page ◽

User Interest ◽

Information Requirements ◽

Log Data ◽

Search Results ◽

Search Result ◽

User Query

Web search personalization is recognized as a competent solution to the problem of returning query-relevant results that match the user's interest, since it can present different search results based on the preferences and information requirements of each user. Popular search engines produce their results by interpreting the user query only, which often yields unrelated results because of keyword ambiguity. To obtain satisfying results that interest the user, it is important to personalize the results according to their relevance. In this paper, we propose Web search Personalization based on a Probability of Semantic Similarity (WP-PSS) between the user log and query and the search result web page. The method computes a probability of semantic similarity between the user query and the search result web page snippet, and computes the frequency of the link in the log data. Based on these two factors, a probability of similarity association is computed to group and re-rank the search results for personalization. Experimental evaluation over a multi-domain collection of web search data shows an improvement in accuracy.
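
The two-factor re-ranking idea in this abstract can be illustrated with a minimal sketch. It is not the WP-PSS method itself: the abstract does not give the formulas, so token-overlap (Jaccard) similarity stands in for the semantic similarity, the click share of each URL in the log stands in for the link frequency, and the weighted sum with the hypothetical parameter `alpha` is an assumed way of combining them.

```python
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rerank(query, results, clicked_urls, alpha=0.5):
    """Re-rank (url, snippet) pairs by query similarity and log click share.

    `alpha` (an assumption, not from the paper) weights similarity against
    the fraction of logged clicks that went to each URL.
    """
    clicks = Counter(clicked_urls)
    total = sum(clicks.values())
    scored = []
    for url, snippet in results:
        sim = jaccard(query, snippet)
        freq = clicks[url] / total if total else 0.0
        scored.append((alpha * sim + (1 - alpha) * freq, url))
    # Highest combined score first; Python's sort is stable for ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


results = [
    ("a.com", "python snake care"),
    ("b.com", "python programming tutorial"),
]
print(rerank("python tutorial", results, clicked_urls=[]))
print(rerank("python tutorial", results, clicked_urls=["a.com"] * 3))
```

With an empty log the ordering depends on snippet similarity alone; once the log shows repeated clicks on one URL, that URL can overtake a textually closer result, which is the personalization effect the abstract describes.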

Improved Web page clustering algorithm based on partial tag tree matching

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.00818 ◽

2010 ◽

Vol 30 (3) ◽

pp. 818-820

Author(s):

Rui LI ◽

Jun-yu ZENG ◽

Si-wang ZHOU

Keyword(s):

Clustering Algorithm ◽

Web Page ◽

Web Page Clustering

How the interface design influences users' spontaneous trustworthiness evaluations of web search results

Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications - ETRA '10 ◽

10.1145/1743666.1743736 ◽

2010 ◽

Cited By ~ 16

Author(s):

Yvonne Kammerer ◽

Peter Gerjets

Keyword(s):

Interface Design ◽

Web Search ◽

Search Results

Automated internal web page clustering for improved data extraction

Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics - WIMS '12 ◽

10.1145/2254129.2254209 ◽

2012 ◽

Author(s):

Cornelia Győrödi ◽

Robert Győrödi ◽

Mihai Cornea ◽

George Pecherle

Keyword(s):

Data Extraction ◽

Web Page ◽

Web Page Clustering

The Matter of Chance: Auditing Web Search Results Related to the 2020 U.S. Presidential Primary Elections Across Six Search Engines

Social Science Computer Review ◽

10.1177/08944393211006863 ◽

2021 ◽

pp. 089443932110068

Author(s):

Aleksandra Urman ◽

Mykola Makhortykh ◽

Roberto Ulloa

Keyword(s):

Search Engine ◽

Search Engines ◽

Large Scale ◽

Web Search ◽

Primary Elections ◽

Virtual Agents ◽

Search Results ◽

Presidential Primary ◽

Large Scale Analysis ◽

Algorithmic Information

We examine how six search engines filter and rank information in relation to the queries on the U.S. 2020 presidential primary elections under the default—that is nonpersonalized—conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for “us elections,” “donald trump,” “joe biden,” “bernie sanders” queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex, during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. It highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between the search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters as demonstrated by previous research.
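
One simple way to quantify the kind of discrepancy described here is the average set overlap between the result lists that different agents receive for the same query. The sketch below uses Jaccard overlap as an illustrative measure; the abstract does not state which metrics the authors used, and the function names and sample URLs are hypothetical.

```python
from itertools import combinations


def jaccard(a, b) -> float:
    """Share of distinct URLs that two result lists have in common."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def mean_pairwise_overlap(result_lists) -> float:
    """Average Jaccard overlap over every pair of agents' result lists."""
    pairs = list(combinations(result_lists, 2))
    if not pairs:
        return 1.0  # zero or one list: nothing to disagree with
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical result lists seen by two agents for the same query.
agent_1 = ["u1", "u2", "u3"]
agent_2 = ["u1", "u2", "u4"]
print(mean_pairwise_overlap([agent_1, agent_2]))
```

A value near 1 means agents see nearly the same URLs, while lower values indicate that what a user sees depends more on chance. This measure ignores rank order, so a rank-aware metric would be needed to capture reordering of the same URLs.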

Understanding how people interact with web search results that change in real-time using implicit feedback

Proceedings of the 22nd ACM International Conference on Information & Knowledge Management - CIKM '13 ◽

10.1145/2505515.2505663 ◽

2013 ◽

Cited By ~ 13

Author(s):

Jin Young Kim ◽

Mark Cramer ◽

Jaime Teevan ◽

Dmitry Lagun

Keyword(s):

Real Time ◽

Web Search ◽

Implicit Feedback ◽

Search Results
