A Web Information Extraction Method Based on DOM Tree Structure and Information Entropy

A Web Information Extraction Method Based on HTML Parser

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.774-776.1802 ◽ 2013 ◽ Vol 774-776 ◽ pp. 1802-1806

Author(s): Zhi Ming Zhang ◽ Shuai Shuai Huang ◽ Ping Li

Keyword(s): Information Extraction ◽ Extraction Method ◽ Rapid Development ◽ Extraction Time ◽ The Internet ◽ Regular Expressions ◽ Web Information Extraction ◽ Amount Of Information ◽ Web Information ◽ HTML Parser

With the rapid development of the Internet and the surge in the amount of information online, how to accurately and quickly obtain the information users really need, such as titles, links, and pictures, has become a research hotspot. This paper proposes a fast web information extraction method based on an HTML parser and validates it by extracting commodity information from e-commerce websites. The results show that the method's extraction accuracy is higher than that of an extraction method based on regular expressions, and that extraction time is greatly shortened.
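As a rough illustration of parser-based extraction of titles, links, and pictures, the sketch below uses Python's standard `html.parser` module. It is not the authors' implementation: the class name `ProductPageParser`, the helper `extract`, and the sample markup are all assumptions made for this example.

```python
from html.parser import HTMLParser


class ProductPageParser(HTMLParser):
    """Collects the page title, link targets, and image sources (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is accumulated.
        if self._in_title:
            self.title += data


def extract(html):
    """Parse an HTML string and return the collected fields as a dict."""
    parser = ProductPageParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "links": parser.links,
        "images": parser.images,
    }


# Made-up sample page, not taken from any real site.
SAMPLE = """<html><head><title> Demo Shop </title></head>
<body><a href="/item/1"><img src="/img/1.png" alt="Item 1"></a>
<a href="/item/2">Item 2</a></body></html>"""

result = extract(SAMPLE)
print(result)
```

Because the parser reacts to tag events rather than matching text patterns, attribute order and extra whitespace in the markup do not affect what is collected, which is one plausible reason a parser can be more accurate than regular expressions on this task.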


Research of the Web Information Extraction Technology on Tourism Theme

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.614.503 ◽ 2014 ◽ Vol 614 ◽ pp. 503-506

Author(s): Qi Shen ◽ Qing Ming Song ◽ Bo Chen

Keyword(s): Information Extraction ◽ Extraction Method ◽ Structural Features ◽ Web Pages ◽ Cleaning Efficiency ◽ Extraction Technology ◽ Web Information Extraction ◽ Web Information ◽ Dynamic Web ◽ The Web

With the development of web technology, dynamic web pages and personalized page content have become increasingly common. Page information now varies widely and the structures of different pages differ greatly, so traditional web information extraction techniques have difficulty adapting. Through an analysis of the structural features of tourism-themed web pages, this paper proposes a web information extraction method based on an extended XPath policy. The algorithm avoids the defects of traditional web information extraction techniques: it is simple and practical, offers high cleaning efficiency and accuracy, and reduces system overhead.
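The abstract does not describe the extended XPath policy itself, so the sketch below only shows ordinary path-based extraction, using the limited XPath subset supported by Python's `xml.etree.ElementTree`. It assumes the page is well-formed XHTML, which real pages often are not. The markup, the class names `spot` and `price`, and the function `extract_spots` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample page for a tourism listing.
PAGE = """<html><body>
<div class="menu"><a href="/">Home</a></div>
<div class="spot"><h2>West Lake</h2><span class="price">40</span></div>
<div class="spot"><h2>Yellow Mountain</h2><span class="price">190</span></div>
</body></html>"""


def extract_spots(xhtml):
    """Select every div with class 'spot' and read its name and price."""
    root = ET.fromstring(xhtml)
    spots = []
    for block in root.findall(".//div[@class='spot']"):
        spots.append({
            "name": block.findtext("h2"),
            "price": block.findtext("span[@class='price']"),
        })
    return spots


spots = extract_spots(PAGE)
print(spots)
```

The path expression ties extraction to a structural feature of the page (a repeated block with a known class) instead of to its surrounding text, so navigation and other unrelated blocks are skipped without extra cleaning rules.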


An Adaptive Web Information Extraction Approach Based on STU-DOM Tree

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.397-400.1972 ◽ 2013 ◽ Vol 397-400 ◽ pp. 1972-1978

Author(s): Song Pu Wu ◽ Qing Wang

Keyword(s): Information Extraction ◽ Web Sites ◽ Web Information Extraction ◽ Maintenance Costs ◽ Web Information ◽ DOM Tree ◽ HTML Parser

This paper presents an adaptive web information extraction approach. Most traditional web information extraction approaches depend on the templates of web sites: if a template changes, the extraction rules must be redesigned. To reduce maintenance costs and improve the adaptability of information extractors, an adaptive approach based on the STU-DOM tree is proposed. Each web page is parsed into a DOM tree using HTML Parser, and the DOM tree is then filtered into an STU-DOM tree to identify the blocks that contain keywords of a given topic. The approach is applied to web pages, and the results show that it not only extracts information efficiently but is also independent of site structure.
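The STU-DOM construction is not specified in the abstract, so the following is only a simplified stand-in for the general idea of parsing a page into a tree and keeping the blocks that contain topic keywords. It uses Python's standard `html.parser`; the names `Node`, `TreeBuilder`, `keyword_blocks`, the tag sets, and the sample markup are all assumptions.

```python
from html.parser import HTMLParser

# Assumed sets for this sketch; a real system would tune these.
BLOCK_TAGS = {"div", "section", "article", "table", "ul", "td"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []

    def full_text(self):
        # Own text first, then descendants; enough for keyword matching.
        parts = list(self.text) + [c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Builds a minimal DOM-like tree from parser events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def keyword_blocks(html, keywords):
    """Return the text of the innermost block elements containing any keyword."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    wanted = [k.lower() for k in keywords]
    matches = []

    def visit(node):
        # Prefer matches deeper in the tree over their enclosing blocks.
        found_below = False
        for child in node.children:
            if visit(child):
                found_below = True
        if found_below:
            return True
        if node.tag in BLOCK_TAGS:
            text = node.full_text().lower()
            if any(k in text for k in wanted):
                matches.append(node)
                return True
        return False

    visit(builder.root)
    return [n.full_text() for n in matches]


# Made-up sample page with a navigation bar, an ad, and one topical block.
SAMPLE = """<div id="nav"><a href="/">Home</a> <a href="/about">About</a></div>
<div id="main"><div class="ad">Buy now</div>
<div class="story"><h1>Hotel prices rise</h1><p>Hotel rates in the city went up.</p></div></div>"""

blocks = keyword_blocks(SAMPLE, ["hotel"])
print(blocks)
```

Selecting blocks by their content rather than by a fixed path is what lets this kind of filter survive a template change, at the cost of depending on a good keyword list for the topic.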


Research on Web Information Extraction Based on DOM Tree Statistics Keyword Path

Computer Science and Application ◽ 10.12677/csa.2019.92022 ◽ 2019 ◽ Vol 09 (02) ◽ pp. 181-187

Author(s): 建视赵

Keyword(s): Information Extraction ◽ Web Information Extraction ◽ Web Information ◽ DOM Tree


A Web Information Extraction Method Based on Ontology

International Journal on Advances in Information Sciences and Service Sciences ◽ 10.4156/aiss.vol4.issue8.25 ◽ 2012 ◽ Vol 4 (8) ◽ pp. 199-206

Author(s): Wu Hengliang ◽ Zhang Weiwei

Keyword(s): Information Extraction ◽ Extraction Method ◽ Web Information Extraction ◽ Web Information


Web Information Extraction Algorithm Based on Ontology and DOM Tree

2010 International Conference on Computational Intelligence and Software Engineering ◽ 10.1109/cise.2010.5677052 ◽ 2010

Author(s): Li Liu ◽ Junfan Shi ◽ Xinrui Liu

Keyword(s): Information Extraction ◽ Web Information Extraction ◽ Web Information ◽ Extraction Algorithm ◽ DOM Tree


An Approach of Semi-Supervised Web Information Extraction

2nd International Symposium on Information Technologies and Applications in Education (ISITAE 2008) ◽ 10.1049/ic:20080243 ◽ 2008

Author(s): Xika Lin ◽ Xiufen Fu ◽ H. Aras ◽ Shaohua Teng

Keyword(s): Information Extraction ◽ Web Information Extraction ◽ Web Information


Cross Domain Web Information Extraction with Multi-Level Feature Model

2014 10th International Conference on Natural Computation (ICNC) ◽ 10.1109/icnc.2014.6975936 ◽ 2014

Author(s): Qian Chen ◽ Wenhao Zhu ◽ Chaoyou Ju ◽ Wu Zhang

Keyword(s): Information Extraction ◽ Feature Model ◽ Web Information Extraction ◽ Cross Domain ◽ Web Information ◽ Multi Level


An Agent-Based System Framework for Multi-Slot Web Information Extraction

2010 2nd International Asia Conference on Informatics in Control, Automation and Robotics (CAR 2010) ◽ 10.1109/car.2010.5456664 ◽ 2010

Author(s): Shudong Zhang ◽ Ye Qin ◽ Naiming Yao

Keyword(s): Information Extraction ◽ System Framework ◽ Web Information Extraction ◽ Agent Based ◽ Web Information


Web Information Extraction System

Encyclopedia of Database Systems ◽ 10.1007/978-0-387-39940-9_4001 ◽ 2009 ◽ pp. 3478-3478

Keyword(s): Information Extraction ◽ Extraction System ◽ Web Information Extraction ◽ Web Information ◽ Information Extraction System
