web data integration
Recently Published Documents


TOTAL DOCUMENTS

39
(FIVE YEARS 1)

H-INDEX

6
(FIVE YEARS 0)

2013 ◽  
Vol 756-759 ◽  
pp. 1855-1859
Author(s):  
Meng Juan Li ◽  
Lian Yin Jia ◽  
Jin Guo You ◽  
Jia Man Ding ◽  
Hai He Zhou

Deep web data integration has been the focus of many research efforts in recent years. Near-duplicate detection is very important for a deep web integration system, yet little research has combined deep web integration with near-duplicate detection. In this paper, we develop an integration system, DWI-ndfree, to solve this problem. The wrapper of DWI-ndfree consists of four parts: the form filler, the navigator, the extractor, and the near-duplicate detector. To find near-duplicate records, we propose an efficient algorithm, CheckNearDuplicate. DWI-ndfree can integrate deep web data free of near duplicates and has been used to execute several web extraction and integration tasks efficiently.
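The abstract does not describe how CheckNearDuplicate works, so the following is only a generic illustration of near-duplicate filtering, not the authors' algorithm. It assumes records are plain strings and uses Jaccard similarity over word tokens; the function names and the 0.8 threshold are hypothetical choices made for this sketch.

```python
import re


def tokens(record):
    """Lowercase alphanumeric tokens of a record (a deliberately simple normalisation)."""
    return set(re.findall(r"[a-z0-9]+", record.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(records, threshold=0.8):
    """Keep a record only if it is below the threshold against every record kept so far.

    This is a quadratic pairwise comparison, fine for a sketch but not for large result sets.
    """
    kept = []
    kept_tokens = []
    for record in records:
        current = tokens(record)
        if all(jaccard(current, seen) < threshold for seen in kept_tokens):
            kept.append(record)
            kept_tokens.append(current)
    return kept


records = [
    "Deep Web Data Integration, 2013, pp. 1855-1859",
    "deep web data integration 2013 pp 1855-1859",
    "Data extraction from DOM trees",
]
unique_records = drop_near_duplicates(records)
```

Here the second record differs from the first only in case and punctuation, so it is dropped, while the third is kept. A production system would likely replace the pairwise loop with an indexed or hashed comparison.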


2013 ◽  
Vol 718-720 ◽  
pp. 2242-2247
Author(s):  
Tao Lin ◽  
Bao Hua Qiang ◽  
Shi Long ◽  
He Qian

Data extraction is an important issue in deep web data integration. To extract the query results of the deep web, the target data block must first be located correctly. Because the HTML source code of web pages can be parsed into a well-structured DOM, we propose an effective algorithm for discerning the common path based on the hierarchical DOM. Based on the common path and our predefined regular expression, the target data of the deep web can be extracted effectively. Experimental results on real websites show that the proposed algorithm is highly effective.
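The abstract gives no details of the common-path algorithm, so the sketch below is an assumption-laden stand-in rather than the authors' method. It treats the "common path" as the tag path shared by the largest number of text nodes, then filters those nodes with a regular expression. The class `PathCollector`, the function `extract_by_common_path`, the sample HTML, and the price pattern are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag; they are skipped so they do not pollute paths.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Records the tag path (e.g. 'html/body/ul/li') of every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.texts = []  # list of (path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(("/".join(self.stack), text))


def extract_by_common_path(html, pattern):
    """Return text nodes on the most frequent tag path that also match the pattern."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    if not collector.texts:
        return []
    common_path, _ = Counter(path for path, _ in collector.texts).most_common(1)[0]
    return [text for path, text in collector.texts
            if path == common_path and re.search(pattern, text)]


page = (
    "<html><body><h1>Results</h1>"
    "<ul><li>Book A - $10</li><li>Book B - $12</li><li>Book C - $8</li></ul>"
    "<p>Next page</p></body></html>"
)
results = extract_by_common_path(page, r"\$\d+")
```

On this sample page the path `html/body/ul/li` occurs three times, so the three list items are selected and the heading and navigation text are ignored. Real result pages are noisier, and a frequency count alone may pick a navigation menu over the data block, which is presumably where a more careful hierarchical analysis becomes necessary.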


2013 ◽  
Vol 15 (3) ◽  
pp. 371-398
Author(s):  
Rashed Salem ◽  
Omar Boussaïd ◽  
Jérôme Darmont
