hidden databases
Recently Published Documents


Total documents: 12 (five years: 2)
H-index: 4 (five years: 0)

Author(s): Stefan Hintzen, Yves Liesy, Christian Zirpins

Abstract: Much data on the web is available in hidden databases. Users browse their contents by sending search queries to form-based interfaces or APIs. However, hidden databases return only the top-k result entries and limit the number of queries per time interval. These access restrictions hinder tasks that require many or specific queries, or that need access to many or all data entries. As a temporary workaround, an unrestricted local snapshot can be created by crawling the hidden database. Keeping that snapshot consistent over time, however, is challenging because of the access restrictions of its origin. In this paper, we propose a replication approach that provides permanent, unrestricted access to a local copy of a hidden database whose contents change dynamically. To this end, we present an algorithm that crawls hidden databases effectively and outperforms the state of the art. Furthermore, we propose a new way to continuously and efficiently control the consistency of the replicated database. We also introduce the cloud-based architecture of a replication service for hidden databases. We show the effectiveness of the approach through a variety of reproducible experimental evaluations.
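The abstract does not describe the crawling algorithm itself. As a rough illustration of the underlying problem only, the sketch below shows one common way to enumerate a top-k interface: issue a range query, and whenever the interface reports more than k matches, bisect the range and query both halves. It is not the authors' algorithm. `Record`, `TopKInterface`, `crawl`, the integer-valued `price` attribute, and the assumption that the interface signals overflow are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    rid: int    # hypothetical record identifier
    price: int  # hypothetical integer-valued, range-queryable attribute


class TopKInterface:
    """Simulated hidden database: a range query returns at most k rows."""

    def __init__(self, records, k):
        self._records = list(records)
        self.k = k
        self.queries_issued = 0

    def query(self, lo, hi):
        """Return (up to k rows with lo <= price < hi, overflow flag)."""
        self.queries_issued += 1
        matches = sorted((r for r in self._records if lo <= r.price < hi),
                         key=lambda r: r.rid)  # fixed "ranking" for determinism
        return matches[: self.k], len(matches) > self.k


def crawl(db, lo, hi):
    """Collect every row in [lo, hi) by bisecting ranges that overflow."""
    found = {}
    incomplete = []  # width-1 ranges that still overflow and cannot be split
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        rows, overflow = db.query(a, b)
        if overflow and b - a > 1:
            mid = (a + b) // 2
            stack.extend([(a, mid), (mid, b)])
            continue
        if overflow:
            incomplete.append((a, b))
        for r in rows:
            found[r.rid] = r
    return found, incomplete


# Usage: 200 rows with distinct prices, interface capped at k = 10.
records = [Record(rid=i, price=(i * 37) % 1000) for i in range(200)]
db = TopKInterface(records, k=10)
found, incomplete = crawl(db, 0, 1000)
print(len(found), incomplete, db.queries_issued)
```

Because sibling ranges are disjoint, every row is collected from exactly one non-overflowing range. The `incomplete` list makes the limitation explicit: if more than k rows share a single attribute value, range splitting alone cannot reach the rows beyond the top k.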


2015, Vol 27 (5), pp. 1192-1204
Author(s): Hui Yan, Zhiguo Gong, Nan Zhang, Tao Huang, Hua Zhong, ...

Author(s): Jian-Wei Tian, Wen-Hui Qi, Xiao-Xiao Liu

A great deal of data on the Web lies in hidden databases, also known as the deep Web. Most deep Web data is not directly available and can only be accessed through query interfaces. Current research on deep Web search has focused on crawling deep Web data via Web interfaces using keyword queries. However, keyword-based methods have inherent limitations because deep Web sources are multi-attribute and return only top-k results. In this paper, we propose a novel approach for siphoning structured data with structured queries. First, to retrieve all the data in a hidden database without duplicates, we model the hidden database as a hierarchy tree. Under this framework, data retrieval becomes a tree traversal problem. We also propose techniques that narrow the query space with a heuristic rule, based on mutual information, that guides the traversal. We conduct extensive experiments over real deep Web sites and controlled databases to demonstrate the coverage and efficiency of our techniques.
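A minimal sketch of the hierarchy-tree idea, assuming a simulated interface that accepts conjunctive equality queries over categorical attributes and returns at most k rows plus an overflow flag. Each tree level binds one more attribute, and a node is expanded only when its query overflows, so the leaf queries partition the data and each record is collected once. The attribute-ordering rule based on empirical mutual information is a hypothetical stand-in, since the abstract does not state the authors' exact heuristic. All function names, attributes, and data are illustrative.

```python
import math
from collections import Counter


def entropy(sample, attr):
    """Empirical Shannon entropy H(attr) in bits."""
    n = len(sample)
    counts = Counter(row[attr] for row in sample)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mutual_information(sample, a, b):
    """Empirical I(A;B) = H(A) + H(B) - H(A,B), in bits."""
    n = len(sample)
    joint = Counter((row[a], row[b]) for row in sample)
    h_joint = -sum(c / n * math.log2(c / n) for c in joint.values())
    return entropy(sample, a) + entropy(sample, b) - h_joint


def _score(sample, attr, chosen):
    # Hypothetical rule: entropy left after subtracting the strongest MI
    # with any attribute already bound higher up in the tree.
    overlap = max((mutual_information(sample, attr, c) for c in chosen), default=0.0)
    return entropy(sample, attr) - overlap


def order_attributes(sample, attrs):
    """Greedily choose which attribute each tree level binds."""
    chosen, remaining = [], list(attrs)
    while remaining:
        best = max(remaining, key=lambda a: _score(sample, a, chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def make_interface(table, k):
    """Simulated form interface: top-k answer plus an overflow flag."""
    def query(bound):
        matches = [row for row in table if all(row[a] == v for a, v in bound.items())]
        return matches[:k], len(matches) > k
    return query


def traverse(query, domains, order):
    """Depth-first walk of the hierarchy tree; level i binds order[i]."""
    results, stack, queries = {}, [{}], 0
    while stack:
        bound = stack.pop()
        rows, overflow = query(bound)
        queries += 1
        if overflow and len(bound) < len(order):
            attr = order[len(bound)]
            stack.extend({**bound, attr: value} for value in domains[attr])
        else:
            # Leaf: sibling subtrees are disjoint, so each record arrives once.
            # A fully bound node that still overflows keeps only k rows.
            for row in rows:
                results[row["id"]] = row
    return results, queries


# Usage: 48 synthetic rows; the whole table stands in for a pre-collected sample.
makes, colors, years = ["a", "b", "c"], ["red", "blue"], [2019, 2020, 2021, 2022]
table = [{"id": i, "make": makes[i % 3], "color": colors[i % 2],
          "year": years[(i // 6) % 4]} for i in range(48)]
domains = {"make": makes, "color": colors, "year": years}
order = order_attributes(table, ["make", "color", "year"])
results, queries = traverse(make_interface(table, k=5), domains, order)
print(order, len(results), queries)
```

Binding a high-entropy attribute first tends to split result sets evenly, so the descent can stop at shallow, non-overflowing nodes; subtracting mutual information avoids spending a level on an attribute that is largely predictable from one already bound.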


2010, Vol 1 (13), pp. 22-25
Author(s): Richa Jindal, Chander Kiran
