Block-Based Similarity Search on the Web Using Manifold-Ranking

Author(s): Xiaojun Wan, Jianwu Yang, Jianguo Xiao
2012, Vol 601, pp. 394-400
Author(s): Taeh Wan Kim, Ho Cheol Jeon, Joong Min Choi

Document similarity search retrieves a ranked list of documents similar to a query document, whether from a text corpus or from pages on the Web. However, most previous research on searching for similar documents has focused on classifying documents by their content. To address this problem, we propose a novel retrieval approach that represents each document in the corpus as an undirected graph. In addition, this study uses a unified graph in conjunction with multiple graphs to improve the quality of similar-document search. Experimental results on the Reuters-21578 dataset demonstrate that the proposed system outperforms the traditional approach.
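The abstract does not say how documents are turned into graphs or how two graphs are compared. Purely as an illustration of graph-based similar-document ranking, the sketch below assumes a word co-occurrence graph (an undirected edge between any two distinct words that appear within a small sliding window) and scores documents by the Jaccard overlap of their edge sets. The functions `cooccurrence_graph`, `graph_similarity`, and `rank_similar`, and the `window` parameter, are hypothetical names chosen for this example and are not the authors' method.

```python
import re


def cooccurrence_graph(text, window=3):
    """Hypothetical graph builder: undirected word co-occurrence edges.

    Two distinct words are linked if they occur within `window`
    consecutive tokens of each other. Edges are stored as frozensets,
    so (a, b) and (b, a) are the same undirected edge.
    """
    words = re.findall(r"[a-z]+", text.lower())
    edges = set()
    for i in range(len(words)):
        for j in range(i + 1, min(i + window, len(words))):
            if words[i] != words[j]:
                edges.add(frozenset((words[i], words[j])))
    return edges


def graph_similarity(g1, g2):
    """Jaccard similarity of two edge sets (assumed measure, 0.0 if both are empty)."""
    union = g1 | g2
    if not union:
        return 0.0
    return len(g1 & g2) / len(union)


def rank_similar(query, corpus, window=3):
    """Return (score, index) pairs for corpus documents, most similar first.

    Ties are broken by the document's position in the corpus.
    """
    q = cooccurrence_graph(query, window)
    scores = [
        (graph_similarity(q, cooccurrence_graph(doc, window)), idx)
        for idx, doc in enumerate(corpus)
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return scores


if __name__ == "__main__":
    corpus = [
        "oil prices rose sharply today",
        "wheat exports fell",
        "crude oil prices rose again",
    ]
    for score, idx in rank_similar("oil prices rose", corpus):
        print(f"{score:.3f}  {corpus[idx]}")
```

Edge-set overlap is used here only because it is simple and symmetric; a real system might instead weight edges, compare graphs structurally, or merge several graphs into a unified one as the abstract mentions.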


Author(s): Qihua Chen, Xiangdong Wang, Yueliang Qian

For cell phone users and blind people using non-visual browsers, browsing the Web with common browsers is quite inefficient because of information overload. This paper presents the TB-WPRO (Title-Block based Web Page Re-Organization) method, which hierarchically segments web pages into blocks using visual and layout information that reflects the web designers’ intent. TB-WPRO segments web pages with the explicit goal of extracting self-described title blocks. To reorganize a web page, the segmentation result is transformed into a series of small web pages that can be easily accessed. Compared to current methods, the proposed approach obtains promising segmentation results in which blocks are visually and semantically consistent with the original web pages.


2008, Vol 11 (2), pp. 83-85
Author(s): Howard Wilson