An Approach for Content Retrieval from Web Pages Using Clustering Techniques

2016 ◽  
Vol 07 (09) ◽  
pp. 2663-2675 ◽  
Author(s):  
R. Manjula ◽  
A. Chilambuchelvan
2011 ◽  
Vol 55-57 ◽  
pp. 1003-1008
Author(s):  
Yong Quan Dong ◽  
Xiang Jun Zhao ◽  
Gong Jie Zhang

A novel approach is proposed to automatically extract data records from detail pages using hierarchical clustering techniques. The approach uses information from the listing pages to identify the content blocks in detail pages, which narrows the scope of Web data extraction. It also makes full use of structure and content features to cluster content feature vectors. Finally, it aligns the data elements of multiple detail pages to extract the data records. Experimental results on test beds of real web pages show that the approach achieves high extraction accuracy and substantially outperforms existing techniques.
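The clustering step mentioned in this abstract can be illustrated with a minimal sketch of agglomerative (bottom-up) hierarchical clustering. The abstract does not define the feature vectors, the distance measure, or the linkage rule, so everything here is an assumption made for illustration: each content block is a hypothetical two-number vector, distance is Euclidean, linkage is single-linkage, and `single_linkage` and `threshold` are invented names.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def single_linkage(vectors, threshold):
    """Agglomerative clustering sketch (single linkage, assumed).

    Starts with one cluster per vector and repeatedly merges the two
    closest clusters until the closest pair is farther apart than
    `threshold`. Returns clusters as lists of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def gap(a, b):
        # Single linkage: distance between the closest members.
        return min(dist(vectors[i], vectors[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters (x < y avoids duplicates).
        d, x, y = min(
            (gap(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical feature vectors for four content blocks.
blocks = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
groups = single_linkage(blocks, threshold=2.0)
```

With these toy vectors, the two blocks near the origin end up in one cluster and the two blocks near (10, 10) in another. A real system would derive the vectors from page structure and content rather than hand-picked numbers.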


2019 ◽  
Vol 8 (2) ◽  
pp. 6392-6395

Web usage mining analyzes user browsing behavior across web pages, and its results can be used in other applications such as recommender systems and personalized web pages, and to provide insight for better business functionality. Since this type of mining does not depend only on the users or only on the web pages, conventional clustering techniques may not be well suited to the analysis. Biclustering techniques discover subsets in the form of submatrices, because objects and the attributes of objects are considered symmetrically. Finding optimal biclusters is a critical research issue. This research proposes a hybrid swarm intelligence-based method that combines Particle Swarm Optimization with the Leader Clustering method and a Uniform Crossover operator. The experimental study shows that the proposed method performs better than traditional biclustering techniques in terms of evaluation metrics.
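Of the components named in this abstract, the uniform crossover operator is the simplest to show in isolation. The sketch below assumes a common encoding that the abstract does not confirm: a candidate bicluster is a list of 0/1 flags, one per user row and one per page column, marking which rows and columns belong to the submatrix. The function name, the `swap_prob` parameter, and the example parents are hypothetical, and the Particle Swarm Optimization and Leader Clustering parts are not shown.

```python
import random


def uniform_crossover(parent_a, parent_b, rng=random, swap_prob=0.5):
    """Uniform crossover sketch for equal-length bit lists.

    At every position the two parents' bits are swapped with
    probability `swap_prob`, so each child inherits each bit from
    one parent or the other independently of neighbouring positions.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    child_a, child_b = [], []
    for a, b in zip(parent_a, parent_b):
        if rng.random() < swap_prob:
            a, b = b, a  # exchange the bits at this position
        child_a.append(a)
        child_b.append(b)
    return child_a, child_b


# Hypothetical encoding: 3 user rows followed by 2 page columns.
p1 = [1, 1, 0, 0, 1]
p2 = [0, 1, 1, 0, 0]
c1, c2 = uniform_crossover(p1, p2, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps the example reproducible. In a full optimizer the children would then be scored by whatever bicluster quality measure the method uses.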


2013 ◽  
Vol 347-350 ◽  
pp. 2666-2672
Author(s):  
Kai Lei ◽  
Guang Yu Sun ◽  
Lian En Huang

Delta compression techniques are commonly used in the context of version control systems and the World Wide Web. They are used to compactly encode the differences between two files or strings in order to reduce communication or storage costs. In this paper, we study the use of delta compression in compressing massive collections of web pages according to the similarity of their templates. We propose a framework for template-based delta compression which uses template-based clustering techniques to find the web pages that have similar templates and then encodes their differences with delta compression techniques to reduce the storage cost. We also propose a filter-based optimization of the Diff algorithm to improve the efficiency of the delta compression approach. To demonstrate the efficiency of our approach, we present experimental results on massive collections of web pages. Our experiments show that template-based delta compression achieves significant improvements in compression ratio as compared to individually compressing each web page.
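The core idea of delta compression, storing one page as a set of differences against a similar base page, can be sketched with Python's standard `difflib` module. This is not the paper's framework or its filter-based Diff optimization. It is a minimal illustration in which `make_delta`, `apply_delta`, and the two sample pages are hypothetical.

```python
import difflib


def make_delta(base, target):
    """Encode `target` as copy/insert operations against `base`."""
    ops = []
    matcher = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared text: store only the range to copy from base.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New text: must be stored literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no operation: that base range is skipped.
    return ops


def apply_delta(base, ops):
    """Rebuild the target page from `base` and the delta operations."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(base[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


# Two hypothetical pages that share a template.
base_page = "<html><head><title>Shop</title></head><body><h1>Red shoes</h1></body></html>"
other_page = "<html><head><title>Shop</title></head><body><h1>Blue hat</h1></body></html>"

delta = make_delta(base_page, other_page)
restored = apply_delta(base_page, delta)
literal_chars = sum(len(op[1]) for op in delta if op[0] == "insert")
```

Only the text that differs from the base page is stored literally, and the shared template is referenced by position. This is why grouping pages with similar templates before encoding can reduce storage.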


Crisis ◽  
2018 ◽  
Vol 39 (3) ◽  
pp. 197-204 ◽  
Author(s):  
Hajime Sueki ◽  
Jiro Ito

Abstract. Background: Gatekeeper training is an effective suicide prevention strategy. However, the appropriate targets of online gatekeeping have not yet been clarified. Aim: We examined the association between the outcomes of online gatekeeping using the Internet and the characteristics of consultation service users. Method: An advertisement to encourage the use of e-mail-based psychological consultation services among viewers was placed on web pages that showed the results of searches using suicide-related keywords. All e-mails received between October 2014 and December 2015 were replied to as part of gatekeeping, and the obtained data (responses to an online questionnaire and the content of the received e-mails) were analyzed. Results: A total of 154 consultation service users were analyzed, 35.7% of whom were male. The median age range was 20–29 years. Online gatekeeping was significantly more likely to be successful when such users faced financial/daily life or workplace problems, or revealed their names (including online names). By contrast, the activity was more likely to be unsuccessful when it was impossible to assess the problems faced by consultation service users. Conclusion: It may be possible to increase the success rate of online gatekeeping by targeting individuals facing financial/daily life or workplace problems with marked tendencies for self-disclosure.

