Web Information Extraction Algorithm Based on Ontology and DOM Tree

This paper presents an adaptive web information extraction approach. Most traditional web information extraction approaches depend on the templates of web sites; if a template changes, the extraction rules must be redesigned. To reduce maintenance costs and improve the adaptability of information extractors, an adaptive approach based on the STU-DOM tree is proposed. Each webpage is parsed into a DOM tree using HTML Parser. The DOM trees are then filtered into STU-DOM trees to identify the blocks that contain keywords of a given topic. The approach is applied to webpages, and the results show that it extracts information efficiently and is independent of site structure.
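
The parse-then-filter idea in this abstract can be illustrated roughly as follows. This is only a minimal sketch, not the paper's STU-DOM construction, whose details the abstract does not give. It assumes Python's standard html.parser in place of the HTML Parser library named above, and the names Node, TreeBuilder, keyword_blocks and min_hits are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of a simplified DOM tree (hypothetical helper)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""  # text found directly inside this element

    def full_text(self):
        """Own text followed by the text of all descendants."""
        return self.text + "".join(c.full_text() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag; ignore stray closers.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def keyword_blocks(node, keywords, min_hits=1):
    """Return the deepest elements whose text mentions at least min_hits keywords.

    Assumption: a plain case-insensitive substring test stands in for the
    topic-keyword check described in the abstract.
    """
    text = node.full_text().lower()
    hits = sum(1 for k in keywords if k.lower() in text)
    if hits < min_hits:
        return []
    found = []
    for child in node.children:
        found.extend(keyword_blocks(child, keywords, min_hits))
    return found or [node]


page = ("<html><body><div><a>Home</a><a>About</a></div>"
        "<div><h1>Laptop X</h1><p>Price: $999</p><br><p>Brand: Acme</p></div>"
        "</body></html>")
builder = TreeBuilder()
builder.feed(page)
blocks = keyword_blocks(builder.root, ["price", "brand"], min_hits=2)
print([b.tag for b in blocks])
```

Because the filter looks only at text and nesting, not at class names or fixed paths, the same call can be tried on pages with different layouts, which is the kind of structure independence the abstract claims.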

Research on WEB Information Extraction Based on DOM Tree Statistics Keyword Path

Computer Science and Application ◽

10.12677/csa.2019.92022 ◽

2019 ◽

Vol 09 (02) ◽

pp. 181-187

Author(s):

建视赵

Keyword(s):

Information Extraction ◽

Web Information Extraction ◽

Web Information ◽

Dom Tree

An approach of semi-supervised Web information extraction

2nd International Symposium on Information Technologies and Applications in Education (ISITAE 2008) ◽

10.1049/ic:20080243 ◽

2008 ◽

Author(s):

Xika Lin ◽

Xiufen Fu ◽

H. Aras ◽

Shaohua Teng

Keyword(s):

Information Extraction ◽

Web Information Extraction ◽

Web Information

Cross domain web information extraction with multi-level feature model

2014 10th International Conference on Natural Computation (ICNC) ◽

10.1109/icnc.2014.6975936 ◽

2014 ◽

Author(s):

Qian Chen ◽

Wenhao Zhu ◽

Chaoyou Ju ◽

Wu Zhang

Keyword(s):

Information Extraction ◽

Feature Model ◽

Web Information Extraction ◽

Cross Domain ◽

Web Information ◽

Multi Level

An agent-based system framework for multi-slot Web information extraction

2010 2nd International Asia Conference on Informatics in Control, Automation and Robotics (CAR 2010) ◽

10.1109/car.2010.5456664 ◽

2010 ◽

Author(s):

Shudong Zhang ◽

Ye Qin ◽

Naiming Yao

Keyword(s):

Information Extraction ◽

System Framework ◽

Web Information Extraction ◽

Agent Based ◽

Web Information

Web Information Extraction System

Encyclopedia of Database Systems ◽

10.1007/978-0-387-39940-9_4001 ◽

2009 ◽

pp. 3478-3478

Keyword(s):

Information Extraction ◽

Extraction System ◽

Web Information Extraction ◽

Web Information ◽

Information Extraction System

Intelligent Web Information Extraction Model for Agricultural Product Quality and Safety System

10.54216/jisiot.040203 ◽

2021 ◽

pp. 99-110

Author(s):

Mohammad Ali Tofigh ◽

Zhendong Mu

Keyword(s):

Information Extraction ◽

Product Quality ◽

Hot Spot ◽

Safety System ◽

Agricultural Product ◽

Quality And Safety ◽

Web Information Extraction ◽

Web Information ◽

Product Quality And Safety ◽

The Web

As society develops, people pay increasing attention to food safety, and relevant laws and policies are gradually being introduced and improved. The development of agricultural product quality and safety systems has become a research hot spot, and a central question is how to obtain Web information for such systems effectively and quickly, which makes intelligent Web information extraction essential. This paper addresses how to extract the Web information of an agricultural product quality and safety system efficiently. It surveys the Web information extraction methods used in various systems, analyzes in detail the template-based information extraction algorithms currently in use, and discusses a scheme that automatically extracts the system's Web information according to a template. The results show that the proposed scheme is a dynamically extensible information extraction system in which templates can be configured dynamically for different requirements without changing the code. Compared with the general approach, the Web information extraction speed for the agricultural product quality and safety system increases by 25%, accuracy by 12%, and recall by 30%.
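
The idea of configuring templates without changing the code can be sketched as below. This is an illustrative assumption, not the paper's scheme: the abstract does not describe its template format, so the sketch treats a template as plain data mapping field names to regular expressions. The names TEMPLATES, extract, and the sample fields are all hypothetical.

```python
import re

# Templates are data, not code: each maps a field name to a regular expression
# with a named group "value". The fields shown here are made up for illustration.
TEMPLATES = {
    "inspection_report": {
        "product": r"Product:\s*(?P<value>[^\n<]+)",
        "result": r"Result:\s*(?P<value>\w+)",
    },
}


def extract(page_text, template):
    """Apply every field pattern of a template; missing fields become None."""
    record = {}
    for field, pattern in template.items():
        match = re.search(pattern, page_text)
        record[field] = match.group("value").strip() if match else None
    return record


page = "<p>Product: Green tea</p><p>Result: Pass</p>"
print(extract(page, TEMPLATES["inspection_report"]))

# Supporting a new page type only requires adding data, not editing extract().
TEMPLATES["recall_notice"] = {"batch": r"Batch\s+(?P<value>\d+)"}
print(extract("Batch 42 recalled", TEMPLATES["recall_notice"]))
```

In a real system the template dictionary would more likely be loaded from a configuration file or database, so that new page types can be supported at run time.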
