Research on discovering deep web entries

Ying Wang; Huilai Li; Wanli Zuo; Fengling He; Xin Wang; Kerui Chen

doi:10.2298/csis100322028w

Computer Science and Information Systems ◽

10.2298/csis100322028w ◽

2011 ◽

Vol 8 (3) ◽

pp. 779-799 ◽

Cited By ~ 7

Author(s):

Ying Wang ◽

Huilai Li ◽

Wanli Zuo ◽

Fengling He ◽

Xin Wang ◽

...

Keyword(s):

Experimental Evaluation ◽

Structural Characteristics ◽

Deep Web ◽

Web Page ◽

Focused Crawling ◽

Web Databases ◽

Semantic Level ◽

Domain Specific ◽

Web Contents

Ontology plays an important role in locating Domain-Specific Deep Web contents. This paper therefore presents WFF, a novel framework for efficiently locating Domain-Specific Deep Web databases. WFF combines focused crawling with an ontology and arranges three classifiers hierarchically: a Web Page Classifier (WPC), a Form Structure Classifier (FSC), and a Form Content Classifier (FCC). First, WPC discovers potentially interesting pages using an ontology-assisted focused crawler. Then, FSC analyzes these pages and determines, from structural characteristics, whether they contain searchable forms. Finally, FCC identifies the searchable forms that belong to a given domain at the semantic level and stores the URLs of these Domain-Specific searchable forms in a database. A detailed experimental evaluation shows that the WFF framework not only simplifies the discovery process but also effectively identifies Domain-Specific databases.
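The three-stage cascade described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the flat term set `BOOK_TERMS` stands in for a real ontology, the thresholds are arbitrary, and the "searchable form" test (a text input and no password field) is an assumed heuristic. The function names `wpc`, `fsc`, `fcc`, and `locate_entries` are hypothetical and only mirror the classifier names in the abstract.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a domain ontology: a flat set of domain terms.
BOOK_TERMS = {"book", "author", "title", "isbn", "publisher"}

# Matches each <form>...</form> block, across line breaks.
FORM_RE = re.compile(r"<form\b.*?</form>", re.I | re.S)


@dataclass
class Page:
    url: str
    html: str


def term_overlap(text, terms):
    """Count the distinct domain terms that occur as words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & terms)


def wpc(page, terms, min_hits=2):
    """Stage 1 (page level): keep pages that mention enough domain terms."""
    return term_overlap(page.html, terms) >= min_hits


def fsc(page):
    """Stage 2 (structure): return forms that look searchable.

    Assumed heuristic: a text or search input and no password field.
    """
    searchable = []
    for form in FORM_RE.findall(page.html):
        has_text = re.search(r'type\s*=\s*["\']?(?:text|search)', form, re.I)
        has_password = re.search(r'type\s*=\s*["\']?password', form, re.I)
        if has_text and not has_password:
            searchable.append(form)
    return searchable


def fcc(form, terms, min_hits=1):
    """Stage 3 (content): keep forms whose own markup uses domain terms."""
    return term_overlap(form, terms) >= min_hits


def locate_entries(pages, terms):
    """Run the cascade and return URLs of pages with an in-domain form."""
    found = []
    for page in pages:
        if not wpc(page, terms):
            continue  # off-topic page: never reaches the form classifiers
        if any(fcc(form, terms) for form in fsc(page)):
            found.append(page.url)
    return found


if __name__ == "__main__":
    demo = [
        Page("http://example.org/search",
             '<h1>Book store</h1><form action="/s">'
             '<label>Author</label><input type="text" name="author">'
             '</form>'),
    ]
    print(locate_entries(demo, BOOK_TERMS))
```

Ordering the stages from cheapest to most specific means most pages are discarded before any form is parsed, which is one plausible reading of how a hierarchy like this could simplify the discovery process.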

E-FFC: an enhanced form-focused crawler for domain-specific deep web databases

Journal of Intelligent Information Systems ◽

10.1007/s10844-012-0221-8 ◽

2012 ◽

Vol 40 (1) ◽

pp. 159-184 ◽

Cited By ~ 12

Author(s):

Yanni Li ◽

Yuping Wang ◽

Jintao Du

Keyword(s):

Deep Web ◽

Web Databases ◽

Domain Specific

A New Architecture of an Intelligent Agent-Based Crawler for Domain-Specific Deep Web Databases

2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology ◽

10.1109/wi-iat.2012.103 ◽

2012 ◽

Cited By ~ 1

Author(s):

Yanni Li ◽

Yuping Wang ◽

Erfeng Tian

Keyword(s):

Intelligent Agent ◽

Deep Web ◽

Web Databases ◽

Agent Based ◽

Domain Specific

Domain-Specific Deep Web Sources Discovery

2008 Fourth International Conference on Natural Computation ◽

10.1109/icnc.2008.350 ◽

2008 ◽

Cited By ~ 7

Author(s):

Ying Wang ◽

Wanli Zuo ◽

Tao Peng ◽

Fengling He

Keyword(s):

Deep Web ◽

Domain Specific

Cross-Fertilizing Deep Web Analysis and Ontology Enrichment

10.31219/osf.io/b3fvz ◽

2017 ◽

Author(s):

Marilena Oita ◽

Antoine Amarilli ◽

Pierre Senellart

Keyword(s):

Domain Knowledge ◽

Deep Web ◽

Web Pages ◽

Complete Understanding ◽

Specific Knowledge ◽

Domain Specific ◽

Domain Specific Knowledge ◽

Web Crawlers ◽

New Perspective ◽

The Impact

Deep Web databases, whose content is presented as dynamically-generated Web pages hidden behind forms, have mostly been left unindexed by search engine crawlers. In order to automatically explore this mass of information, many current techniques assume the existence of domain knowledge, which is costly to create and maintain. In this article, we present a new perspective on form understanding and deep Web data acquisition that does not require any domain-specific knowledge. Unlike previous approaches, we do not perform the various steps in the process (e.g., form understanding, record identification, attribute labeling) independently but integrate them to achieve a more complete understanding of deep Web sources. Through information extraction techniques and using the form itself for validation, we reconcile input and output schemas in a labeled graph which is further aligned with a generic ontology. The impact of this alignment is threefold: first, the resulting semantic infrastructure associated with the form can assist Web crawlers when probing the form for content indexing; second, attributes of response pages are labeled by matching known ontology instances, and relations between attributes are uncovered; and third, we enrich the generic ontology with facts from the deep Web.

Clustering Deep Web Databases Semantically

Information Retrieval Technology - Lecture Notes in Computer Science ◽

10.1007/978-3-540-68636-1_35 ◽

2008 ◽

pp. 365-376 ◽

Cited By ~ 4

Author(s):

Ling Song ◽

Jun Ma ◽

Po Yan ◽

Li Lian ◽

Dongmei Zhang

Keyword(s):

Deep Web ◽

Web Databases

Site-Wide Wrapper Induction for Life Science Deep Web Databases

Lecture Notes in Computer Science - Data Integration in the Life Sciences ◽

10.1007/978-3-642-02879-3_9 ◽

2009 ◽

pp. 96-112 ◽

Cited By ~ 2

Author(s):

Saqib Mir ◽

Steffen Staab ◽

Isabel Rojas

Keyword(s):

Life Science ◽

Deep Web ◽

Web Databases ◽

Wrapper Induction

Content-Determined Web Page Segmentation and Navigation for Mobile Web Searching

Result Page Generation for Web Searching - Advances in Web Technologies and Engineering ◽

10.4018/978-1-7998-0961-6.ch007 ◽

2021 ◽

pp. 88-108

Keyword(s):

Web Pages ◽

Web Searching ◽

Web Page ◽

Page Segmentation ◽

Web Browser ◽

Mobile Web ◽

Desktop Computers ◽

Equal Importance ◽

Web Contents ◽

Music Player

Mobile phones are now a widespread part of everyday life; we use them as a camera, a radio, a music player, and even a web browser. Since most web pages are created for desktop computers, navigating them on a mobile device is highly fatiguing. Hence, there is great interest in computer science in adapting such content-rich pages to the small screens of mobile devices. Moreover, a web page has many different parts that are not of equal importance to the end user. Consequently, the authors propose a mechanism that identifies the part of a web page most useful to a user with respect to his or her search query while avoiding information loss. The challenge comes from the fact that long web contents cannot be easily displayed in both vertical and horizontal ways.
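The idea of picking the page part most useful for a query can be illustrated with a small Python sketch. It is not the mechanism proposed in the chapter: it assumes the page has already been split into text segments, and it scores each segment by the fraction of query terms it contains, which is a deliberately simple stand-in for a real relevance measure. The names `tokenize`, `score_segment`, and `best_segment` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_segment(segment, query):
    """Return the fraction of distinct query terms found in the segment."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(segment))) / len(query_terms)


def best_segment(segments, query):
    """Return (index, segment) of the highest-scoring segment.

    Ties go to the earliest segment, because max() keeps the first maximum.
    """
    if not segments:
        raise ValueError("segments must not be empty")
    scores = [score_segment(s, query) for s in segments]
    index = max(range(len(segments)), key=lambda i: scores[i])
    return index, segments[index]


if __name__ == "__main__":
    parts = [
        "Home | Login | Contact",
        "Cheap flights to Paris from London, book today",
        "Copyright 2021",
    ]
    print(best_segment(parts, "flights paris"))
```

A mobile client could show the selected segment first and keep the remaining segments reachable through navigation, so that no content is dropped.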
