Domain-Specific Deep Web Sources Discovery

Deep Web databases, whose content is presented as dynamically generated Web pages hidden behind forms, have mostly been left unindexed by search engine crawlers. In order to automatically explore this mass of information, many current techniques assume the existence of domain knowledge, which is costly to create and maintain. In this article, we present a new perspective on form understanding and deep Web data acquisition that does not require any domain-specific knowledge. Unlike previous approaches, we do not perform the various steps in the process (e.g., form understanding, record identification, attribute labeling) independently but integrate them to achieve a more complete understanding of deep Web sources. Through information extraction techniques and using the form itself for validation, we reconcile input and output schemas in a labeled graph which is further aligned with a generic ontology. The impact of this alignment is threefold: first, the resulting semantic infrastructure associated with the form can assist Web crawlers when probing the form for content indexing; second, attributes of response pages are labeled by matching known ontology instances, and relations between attributes are uncovered; and third, we enrich the generic ontology with facts from the deep Web.
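
The form-based validation and instance-based labeling steps described above can be illustrated with a minimal sketch. It assumes result records have already been extracted into tuples of strings, and it uses a hypothetical toy ontology (`ONTOLOGY`) and hypothetical helpers `align_with_form` and `label_columns`; it is not the authors' implementation. A result column is aligned with a form field when every returned record echoes the value submitted in that field, and columns are labeled with the ontology class whose known instances match enough of their values.

```python
from collections import Counter

# Hypothetical toy ontology: class name -> known instances (lowercased).
ONTOLOGY = {
    "Author": {"jane austen", "leo tolstoy", "mark twain"},
    "Title": {"emma", "war and peace", "persuasion"},
    "Year": {"1815", "1817", "1869"},
}


def align_with_form(records, probe):
    """Map each probed form field to the result column that echoes its value.

    `probe` is the query submitted through the form, e.g. {"author": "Austen"}.
    A column is aligned with a field if every returned record contains the value.
    """
    if not records:
        return {}
    mapping = {}
    for field_name, value in probe.items():
        needle = value.strip().lower()
        for col in range(len(records[0])):
            if all(needle in r[col].lower() for r in records):
                mapping[field_name] = col
                break
    return mapping


def label_columns(records, ontology=ONTOLOGY, min_support=0.5):
    """Label each column with the ontology class whose known instances match
    at least `min_support` of its values (None if no class qualifies)."""
    if not records:
        return []
    labels = []
    for col in range(len(records[0])):
        values = [r[col].strip().lower() for r in records]
        hits = Counter({cls: sum(v in inst for v in values)
                        for cls, inst in ontology.items()})
        best, count = hits.most_common(1)[0]
        labels.append(best if count / len(values) >= min_support else None)
    return labels
```

For example, with the records `("Emma", "Jane Austen", "1815")` and `("Persuasion", "Jane Austen", "1817")` returned for the probe `{"author": "Austen"}`, `align_with_form` maps `author` to column 1 and `label_columns` returns `["Title", "Author", "Year"]`. The `min_support` threshold lets a column be labeled even when some of its values are not yet known instances, which is where new facts could be fed back into the ontology.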

Research on discovering deep web entries

Computer Science and Information Systems ◽

10.2298/csis100322028w ◽

2011 ◽

Vol 8 (3) ◽

pp. 779-799 ◽

Cited By ~ 7

Author(s):

Ying Wang ◽

Huilai Li ◽

Wanli Zuo ◽

Fengling He ◽

Xin Wang ◽

...

Keyword(s):

Experimental Evaluation ◽

Structural Characteristics ◽

Deep Web ◽

Web Page ◽

Focused Crawling ◽

Web Databases ◽

Semantic Level ◽

Domain Specific ◽

Web Contents

Ontology plays an important role in locating Domain-Specific Deep Web contents. This paper therefore presents WFF, a novel framework for efficiently locating Domain-Specific Deep Web databases based on focused crawling and ontology, which constructs a Web Page Classifier (WPC), a Form Structure Classifier (FSC) and a Form Content Classifier (FCC) in a hierarchical fashion. First, the WPC discovers potentially interesting pages using an ontology-assisted focused crawler. Then, the FSC analyzes the interesting pages and determines, based on their structural characteristics, whether they contain searchable forms. Finally, the FCC identifies, at the semantic level, the searchable forms that belong to a given domain and stores the URLs of these Domain-Specific searchable forms in a database. A detailed experimental evaluation shows that the WFF framework not only simplifies the discovery process but also effectively identifies Domain-Specific databases.
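
The hierarchical WPC, FSC, FCC pipeline can be sketched as three filters applied in sequence to a fetched page. In this minimal sketch each classifier is replaced by a simple rule (ontology-term overlap for WPC and FCC, input-type structure for FSC); the term set `BOOK_TERMS`, the thresholds and all function names are hypothetical stand-ins, not the trained classifiers or the focused crawler described in the paper.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical domain ontology for a "books" domain (concept terms only).
BOOK_TERMS = {"book", "author", "title", "isbn", "publisher", "keyword"}


def _words(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    return [w for w in (t.strip(".,:;!?()").lower() for t in text.split()) if w]


@dataclass
class Form:
    action: str
    inputs: list = field(default_factory=list)  # (input type, field name) pairs
    text: list = field(default_factory=list)    # visible words inside the form


class FormExtractor(HTMLParser):
    """Collect <form> elements with their inputs, plus all visible page words."""

    def __init__(self):
        super().__init__()
        self.forms, self.page_text, self._cur = [], [], None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._cur = Form(action=a.get("action") or "")
            self.forms.append(self._cur)
        elif self._cur is not None and tag in ("input", "select", "textarea"):
            kind = (a.get("type") or "text") if tag == "input" else tag
            self._cur.inputs.append((kind.lower(), (a.get("name") or "").lower()))

    def handle_endtag(self, tag):
        if tag == "form":
            self._cur = None

    def handle_data(self, data):
        words = _words(data)
        self.page_text.extend(words)
        if self._cur is not None:
            self._cur.text.extend(words)


def wpc_relevant(words, terms=BOOK_TERMS, threshold=2):
    """Stage 1 (WPC stand-in): page mentions enough domain terms."""
    return len(terms & set(words)) >= threshold


def fsc_searchable(form):
    """Stage 2 (FSC stand-in): structural test for a query form."""
    kinds = [k for k, _ in form.inputs]
    if "password" in kinds:  # login/registration forms are not query forms
        return False
    return any(k in ("text", "search", "select") for k in kinds)


def fcc_in_domain(form, terms=BOOK_TERMS, threshold=2):
    """Stage 3 (FCC stand-in): form labels and field names overlap the ontology."""
    tokens = set(form.text) | {name for _, name in form.inputs}
    return len(terms & tokens) >= threshold


def discover(url, html, store):
    """Apply the three stages in order; store (page URL, form action) pairs."""
    parser = FormExtractor()
    parser.feed(html)
    if not wpc_relevant(parser.page_text):
        return
    for form in parser.forms:
        if fsc_searchable(form) and fcc_in_domain(form):
            store.append((url, form.action))
```

Ordering the stages this way mirrors the hierarchy in the abstract: the cheap page-level filter runs first, so the form-level checks only run on pages that already look relevant. A page with a book search form and a login form would keep only the search form, because the login form fails the structural (password-field) test.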

E-FFC: an enhanced form-focused crawler for domain-specific deep web databases

Journal of Intelligent Information Systems ◽

10.1007/s10844-012-0221-8 ◽

2012 ◽

Vol 40 (1) ◽

pp. 159-184 ◽

Cited By ~ 12

Author(s):

Yanni Li ◽

Yuping Wang ◽

Jintao Du

Keyword(s):

Deep Web ◽

Web Databases ◽

Domain Specific

Data extraction and annotation based on domain-specific ontology evolution for deep web

Computer Science and Information Systems ◽

10.2298/csis101011023k ◽

2011 ◽

Vol 8 (3) ◽

pp. 673-692 ◽

Cited By ~ 4

Author(s):

Chen Kerui ◽

Wanli Zuo ◽

Fengling He ◽

Yongheng Chen ◽

Ying Wang

Keyword(s):

Data Extraction ◽

Deep Web ◽

Query Interface ◽

Mapping Data ◽

Specific Domain ◽

Data Annotation ◽

Domain Specific ◽

Query Result ◽

User Query ◽

Sample Set

Deep Web sources respond to a user query with result records encoded in HTML pages. Data extraction and data annotation, which are important for many applications, extract these records from the HTML pages and annotate them. We propose a domain-specific ontology-based data extraction and annotation technique. First, we construct a mini-ontology for a specific domain from the information in the query interface and the query result pages. Then, we use the constructed mini-ontology to identify data areas and to map data annotations during data extraction. Finally, to adapt to new sample sets, the mini-ontology evolves dynamically based on the results of data extraction and data annotation. Experimental results demonstrate that this method achieves higher precision and recall in data extraction and data annotation.
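
The annotation and evolution steps can be illustrated with a minimal sketch of a mini-ontology that records, per attribute, the label strings seen on the query interface and the values annotated so far. The class `MiniOntology`, its methods and the (label, value) record format are hypothetical; the sketch covers only annotation and evolution, not the identification of data areas, and is not the authors' technique.

```python
from collections import defaultdict


class MiniOntology:
    """Hypothetical mini-ontology: attribute -> label synonyms and known values."""

    def __init__(self, labels):
        # labels: attribute -> label strings taken from the query interface
        self.labels = {a: {l.lower() for l in ls} for a, ls in labels.items()}
        self.values = defaultdict(set)

    @staticmethod
    def _norm(label):
        return label.strip(" :").lower()

    def _match(self, label, value):
        # Prefer an explicit label on the result page ...
        if label:
            for attr, syns in self.labels.items():
                if self._norm(label) in syns:
                    return attr
        # ... otherwise fall back to values annotated in earlier records.
        for attr, vals in self.values.items():
            if value.lower() in vals:
                return attr
        return None

    def annotate(self, record):
        """record: list of (label or None, value) pairs from one result record."""
        out = {}
        for label, value in record:
            attr = self._match(label, value)
            if attr is not None:
                out[attr] = value
        return out

    def evolve(self, record, annotation):
        """Grow the ontology with annotated values and newly seen labels."""
        for label, value in record:
            for attr, v in annotation.items():
                if v == value:
                    self.values[attr].add(value.lower())
                    if label:
                        self.labels.setdefault(attr, set()).add(self._norm(label))
```

Evolution is what lets the sketch handle a new sample set: after a record labeled `Writer:` is annotated as `author` through a previously seen value, the label `writer` is added to the `author` synonyms, so later records using that label are annotated even when their values are new.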

SNPMiner: A Domain-Specific Deep Web Mining Tool

2007 IEEE 7th International Symposium on BioInformatics and BioEngineering ◽

10.1109/bibe.2007.4375564 ◽

2007 ◽

Cited By ~ 8

Author(s):

Fan Wang ◽

Gagan Agrawal ◽

Ruoming Jin ◽

Helen Piontkivska

Keyword(s):

Web Mining ◽

Deep Web ◽

Domain Specific ◽

Mining Tool

A New Architecture of an Intelligent Agent-Based Crawler for Domain-Specific Deep Web Databases

2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology ◽

10.1109/wi-iat.2012.103 ◽

2012 ◽

Cited By ~ 1

Author(s):

Yanni Li ◽

Yuping Wang ◽

Erfeng Tian

Keyword(s):

Intelligent Agent ◽

Deep Web ◽

Web Databases ◽

Agent Based ◽

Domain Specific

DwCB - Architecture Specification of Deep Web Crawler Bot with Rules Based on FORM Values for Domain Specific Web Site

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Signal Processing and Information Technology ◽

10.1007/978-3-319-11629-7_28 ◽

2014 ◽

pp. 191-196

Author(s):

S. G. Shaila ◽

A. Vadivel ◽

R. Devi Mahalakshmi ◽

J. Karthika

Keyword(s):

Web Site ◽

Deep Web ◽

Web Crawler ◽

Domain Specific ◽

Architecture Specification

A Self-Healing Approach for a Domain-Specific Deep Web Search Tool

2010 IEEE International Conference on BioInformatics and BioEngineering ◽

10.1109/bibe.2010.13 ◽

2010 ◽

Author(s):

Fan Wang ◽

Gagan Agrawal

Keyword(s):

Web Search ◽

Deep Web ◽

Self Healing ◽

Domain Specific ◽

Search Tool

Understanding Query Interfaces: Automatic Extraction of Data from Domain-specific Deep Web based on Ontology

Proceedings of the 22nd International Conference on Enterprise Information Systems ◽

10.5220/0009514202410248 ◽

2020 ◽

Author(s):

Li Dong ◽

Zhang Huan ◽

Yu Zitong

Keyword(s):

Deep Web ◽

Automatic Extraction ◽

Web Based ◽

Domain Specific ◽

Query Interfaces

Scientific Problem Solving in a Virtual Laboratory: A Comparison Between Individuals and Pairs

Swiss Journal of Psychology ◽

10.1024/1421-0185.67.2.71 ◽

2008 ◽

Vol 67 (2) ◽

pp. 71-83 ◽

Cited By ~ 5

Author(s):

Yolanda A. Métrailler ◽

Ester Reijnen ◽

Cornelia Kneser ◽

Klaus Opwis

Keyword(s):

Visual Search ◽

Problem Solving ◽

Process Data ◽

Scientific Problem ◽

Domain Specific ◽

Scientific Problem Solving ◽

Before And After ◽

Verbal Data ◽

Psychological Knowledge ◽

Problem Solving Task

This study compared individuals with pairs in a scientific problem-solving task. Participants interacted with a virtual psychological laboratory called Virtue to reason about a visual search theory. To this end, they created hypotheses, designed experiments, and analyzed and interpreted the results of their experiments in order to discover which of five possible factors affected the visual search process. Before and after their interaction with Virtue, participants took a test measuring theoretical and methodological knowledge. In addition, process data reflecting participants’ experimental activities and verbal data were collected. The results showed a significant but equal increase in knowledge for both groups. We found differences between individuals and pairs in the evaluation of hypotheses in the process data, and in descriptive and explanatory statements in the verbal data. Interacting with Virtue helped all students improve their domain-specific and domain-general psychological knowledge.