Data extraction and annotation based on domain-specific ontology evolution for deep web

Chen Kerui; Wanli Zuo; Fengling He; Yongheng Chen; Ying Wang

doi:10.2298/csis101011023k

Computer Science and Information Systems ◽

10.2298/csis101011023k ◽

2011 ◽

Vol 8 (3) ◽

pp. 673-692 ◽

Cited By ~ 4

Author(s):

Chen Kerui ◽

Wanli Zuo ◽

Fengling He ◽

Yongheng Chen ◽

Ying Wang

Keyword(s):

Data Extraction ◽

Deep Web ◽

Query Interface ◽

Mapping Data ◽

Specific Domain ◽

Data Annotation ◽

Domain Specific ◽

Query Result ◽

User Query ◽

Sample Set

Deep web sources respond to a user query with result records encoded in HTML files. Data extraction and data annotation, which are important for many applications, extract and annotate the records from these HTML pages. We propose a data extraction and annotation technique based on a domain-specific ontology. First, we construct a mini-ontology for a specific domain from the information in query interfaces and query result pages. Then, we use the constructed mini-ontology to identify data areas and to map data annotations during data extraction. To adapt to new sample sets, the mini-ontology evolves dynamically based on the results of data extraction and data annotation. Experimental results demonstrate that this method achieves higher precision and recall in data extraction and data annotation.
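
As a rough illustration of the annotate-then-evolve idea in this abstract, the sketch below labels one extracted field with a concept from a tiny ontology and then records the newly seen label as an alias. It is not the authors' algorithm: the `mini_ontology` layout, the example concepts, the regular expressions, and the function names are all assumptions made for this example.

```python
import re

# Hypothetical mini-ontology for a book domain (assumed structure):
# each concept has label aliases seen on interfaces and a value pattern.
mini_ontology = {
    "price": {"aliases": {"price", "our price"}, "pattern": r"\$\d+(\.\d{2})?"},
    "isbn": {"aliases": {"isbn"}, "pattern": r"\d{9}[\dX]|\d{13}"},
}


def normalize(label):
    """Lower-case a field label and drop a trailing colon."""
    return label.strip().lower().rstrip(":").strip()


def annotate(field_label, value, ontology):
    """Return the concept name for one extracted field, or None."""
    label = normalize(field_label)
    # First try the label against the known aliases.
    for concept, entry in ontology.items():
        if label in entry["aliases"]:
            return concept
    # Fall back to the value pattern when the label is unknown.
    for concept, entry in ontology.items():
        if re.fullmatch(entry["pattern"], value.strip()):
            return concept
    return None


def annotate_and_evolve(field_label, value, ontology):
    """Annotate a field and remember its label as a new alias."""
    concept = annotate(field_label, value, ontology)
    if concept is not None:
        ontology[concept]["aliases"].add(normalize(field_label))
    return concept
```

After `annotate_and_evolve("List Price:", "$12.50", mini_ontology)` succeeds through the value pattern, later fields labelled "List Price" are recognized by label alone, which is the kind of adaptation to new sample sets the abstract describes.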

Extracting Data Records Based on Global Schema

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.20-23.553 ◽

2010 ◽

Vol 20-23 ◽

pp. 553-558 ◽

Cited By ~ 1

Author(s):

Ke Rui Chen ◽

Wan Li Zuo ◽

Fan Zhang ◽

Feng Lin He

Keyword(s):

Deep Web ◽

Experimental Results ◽

Web Pages ◽

Query Interface ◽

Web Data ◽

Data Carrier ◽

Query Result ◽

Global Schema ◽

Urgent Task

With the rapid growth of web data, the deep web has become the fastest-growing web data carrier. Research on the deep web, especially on extracting data records from result pages, has therefore become an urgent task. We present a data record extraction method based on a Global Schema, which automatically extracts query result records from web pages. The method first analyzes the query interface and instances of result records to build a Global Schema using an ontology. The Global Schema is then used to extract data records from result pages and to store the data in a table. Experimental results indicate that the method extracts data records accurately and stores them in a table that conforms to the Global Schema.
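
The last step in this abstract, storing extracted records in a table shaped by a Global Schema, can be pictured with the minimal sketch below. The schema columns, the label mapping, and the `to_row` helper are hypothetical; the abstract does not describe how the authors build or apply their schema.

```python
# Hypothetical global schema: the fixed column order of the output table.
GLOBAL_SCHEMA = ["title", "author", "price"]

# Hypothetical mapping from site-specific labels to global attributes.
LABEL_TO_ATTRIBUTE = {
    "title": "title",
    "book title": "title",
    "author": "author",
    "by": "author",
    "price": "price",
    "our price": "price",
}


def to_row(record):
    """Project one extracted record (label -> value) onto the schema."""
    row = {attribute: "" for attribute in GLOBAL_SCHEMA}
    for label, value in record.items():
        attribute = LABEL_TO_ATTRIBUTE.get(label.strip().lower())
        if attribute is not None:
            row[attribute] = value
    # Labels with no mapping are dropped; missing attributes stay empty.
    return [row[attribute] for attribute in GLOBAL_SCHEMA]


def to_table(records):
    """Build a table: a header row followed by one row per record."""
    return [GLOBAL_SCHEMA] + [to_row(record) for record in records]
```

Records from different sites then end up with the same columns in the same order, whatever labels each site used.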

Deep Web query interface schema matching based on matching degree and semantic similarity

Journal of Computer Applications ◽

10.3724/sp.j.1087.2012.01688 ◽

2013 ◽

Vol 32 (6) ◽

pp. 1688-1691

Author(s):

Yong FENG ◽

Yang ZHANG

Keyword(s):

Semantic Similarity ◽

Deep Web ◽

Schema Matching ◽

Query Interface ◽

Matching Degree

Designing a Chat-Bot for College Information using Information Retrieval and Automatic Text Summarization Techniques

Current Chinese Computer Science ◽

10.2174/2665997201999201022191540 ◽

2020 ◽

Vol 01 ◽

Author(s):

Radha Guha

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Text Summarization ◽

The Internet ◽

Specific Domain ◽

User Query ◽

College Information ◽

Chat Bot

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast amount of information available on the internet. Even for a specific domain such as a college or university website, it may be difficult for a user to browse through all the links to get relevant answers quickly. Objective: In this scenario, a chat-bot that can answer questions related to college information and compare colleges would be very useful and novel. Methods: In this paper, a novel conversational chat-bot application with information retrieval and text summarization skills is designed and implemented. First, the chat-bot has a simple dialog skill: when it can understand the intent of the user query, it responds from a stored collection of answers. Second, for unknown queries, the chat-bot can search the internet and then perform text summarization using advanced techniques of natural language processing (NLP) and text mining (TM). Results: Advances in NLP for information retrieval and text summarization using the machine learning techniques Latent Semantic Analysis (LSI), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe) and TextRank are first reviewed and compared, and then implemented in the chat-bot design. The chat-bot improves user experience tremendously by answering specific queries concisely, which takes less time than reading the entire document. Students, parents and faculty can more efficiently get answers on a variety of topics such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers and patents. Conclusion: The purpose of this paper was to follow the advancement in NLP technologies and implement them in a novel application.
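
The two-stage behaviour described under Methods (answer from stored replies when the intent is recognized, otherwise fetch text and summarize it) might be sketched as follows. The summarizer here is a plain word-frequency extractive scorer, a much simpler stand-in for the LSA, LDA, Word2Vec, GloVe and TextRank techniques the abstract lists. `STORED_ANSWERS`, its placeholder reply, and the `fetch_text` callback are assumptions for illustration only.

```python
import re
from collections import Counter

# Hypothetical stored replies, keyed by a keyword that signals intent.
STORED_ANSWERS = {"fees": "Placeholder answer about fees."}


def summarize(text, max_sentences=1):
    """Keep the sentences whose words are most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequency = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        if not words:
            return 0.0
        # Average word frequency, so long sentences are not favoured.
        return sum(frequency[w] for w in words) / len(words)

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)


def answer(query, fetch_text):
    """Reply from stored answers, else summarize fetched text."""
    lowered = query.lower()
    for keyword, reply in STORED_ANSWERS.items():
        if keyword in lowered:
            return reply
    # Unknown query: fetch_text stands in for the web search step.
    return summarize(fetch_text(query))
```

Passing the search step in as `fetch_text` keeps the sketch self-contained and lets the fallback path be exercised without network access.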

Automatic construction of academic profile: A case of information science domain

Journal of Information Science ◽

10.1177/0165551521998048 ◽

2021 ◽

pp. 016555152199804

Author(s):

Qian Geng ◽

Ziang Chuai ◽

Jian Jin

Keyword(s):

Information Science ◽

Binary Classification ◽

Learning To Rank ◽

Semantic Distance ◽

Background Information ◽

Training Dataset ◽

Initial Vector ◽

Specific Domain ◽

AdaBoost Algorithm ◽

Domain Specific

To provide junior researchers with domain-specific concepts efficiently, an automatic approach for academic profiling is needed. First, to obtain personal records of a given scholar, typical supervised approaches often utilise structured data such as Wikipedia infoboxes as a training dataset, but this may lead to a severe mis-labelling problem when the data are used to train a model directly. To address this problem, a new relation embedding method is proposed for fine-grained entity typing, in which the initial vector of entities and a new penalty scheme are considered, based on the semantic distance of entities and relations. Also, to highlight critical concepts relevant to renowned scholars, scholars’ selective bibliographies, which contain massive numbers of academic terms, are analysed by a newly proposed extraction method based on logistic regression, the AdaBoost algorithm and learning-to-rank techniques. It bridges the gap that conventional supervised methods only return binary classification results and fail to help researchers understand the relative importance of selected concepts. Categories of experiments on academic profiling and corresponding benchmark datasets demonstrate that the proposed approaches outperform existing methods notably. The proposed techniques provide an automatic way for junior researchers to obtain organised knowledge in a specific domain, including scholars’ background information and domain-specific concepts.

Domain-Specific Deep Web Sources Discovery

2008 Fourth International Conference on Natural Computation ◽

10.1109/icnc.2008.350 ◽

2008 ◽

Cited By ~ 7

Author(s):

Ying Wang ◽

Wanli Zuo ◽

Tao Peng ◽

Fengling He

Keyword(s):

Deep Web ◽

Domain Specific

Self-Perceived Creativity

European Journal of Psychological Assessment ◽

10.1027/1015-5759.25.3.194 ◽

2009 ◽

Vol 25 (3) ◽

pp. 194-203 ◽

Cited By ~ 10

Author(s):

Shulamith Kreitler ◽

Hernan Casakin

Keyword(s):

The Self ◽

Experimental Task ◽

Self Assessment ◽

Specific Domain ◽

Specific Product ◽

Domain Specific ◽

Design Representation ◽

Design Requirements ◽

Small Museum ◽

Very High

In view of unclear previous findings about the validity of self-assessed creativity, the hypothesis guiding the present study was that validity would be proven if self-assessed creativity was examined with respect to a specific domain, specific product, specific aspects of creativity, and in terms of specific criteria. The participants were 52 architecture students. The experimental task was to design a small museum in a described context. After completing the task, the students self-assessed their creativity in designing with seven open-ended questions, the Self-Assessment of Creative Design questionnaire, and a list of seven items tapping affective metacognitive aspects of the designing process. Thus, 21 creativity indicators were formed. Four expert architects, working independently, assessed the designs on nine creativity indicators: fluency, flexibility, elaboration, functionality, innovation, fulfilling specified design requirements, considering context, mastery of skills concerning the esthetics of the design representation, and overall creativity. The agreement among the architects’ evaluations was very high. The correlations between the nine corresponding indicators in students’ assessment of their design and those of the experts were positive and significant with respect to three indicators: fluency, flexibility, and overall creativity. In contrast, the correlations of the remaining noncorresponding indicators with those of the experts were not significant. The findings support the validity of self-assessed creativity with specific restrictions.

Cross-Fertilizing Deep Web Analysis and Ontology Enrichment

10.31219/osf.io/b3fvz ◽

2017 ◽

Author(s):

Marilena Oita ◽

Antoine Amarilli ◽

Pierre Senellart

Keyword(s):

Domain Knowledge ◽

Deep Web ◽

Web Pages ◽

Complete Understanding ◽

Specific Knowledge ◽

Domain Specific ◽

Domain Specific Knowledge ◽

Web Crawlers ◽

New Perspective ◽

The Impact

Deep Web databases, whose content is presented as dynamically-generated Web pages hidden behind forms, have mostly been left unindexed by search engine crawlers. In order to automatically explore this mass of information, many current techniques assume the existence of domain knowledge, which is costly to create and maintain. In this article, we present a new perspective on form understanding and deep Web data acquisition that does not require any domain-specific knowledge. Unlike previous approaches, we do not perform the various steps in the process (e.g., form understanding, record identification, attribute labeling) independently but integrate them to achieve a more complete understanding of deep Web sources. Through information extraction techniques and using the form itself for validation, we reconcile input and output schemas in a labeled graph which is further aligned with a generic ontology. The impact of this alignment is threefold: first, the resulting semantic infrastructure associated with the form can assist Web crawlers when probing the form for content indexing; second, attributes of response pages are labeled by matching known ontology instances, and relations between attributes are uncovered; and third, we enrich the generic ontology with facts from the deep Web.
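
The second and third effects named in this abstract (labeling response-page attributes by matching known ontology instances, then enriching the ontology) can be illustrated with a simple majority vote over one column of extracted values. This is only a sketch under assumed data structures: `KNOWN_INSTANCES`, `label_column` and `enrich` are hypothetical names, and the article's labeled-graph alignment is not reproduced here.

```python
from collections import Counter

# Hypothetical generic ontology: class name -> known instances.
KNOWN_INSTANCES = {
    "city": {"paris", "berlin", "rome"},
    "country": {"france", "germany", "italy"},
}


def label_column(values, ontology):
    """Label a column with the class that matches most of its values."""
    votes = Counter()
    for value in values:
        key = value.strip().lower()
        for cls, instances in ontology.items():
            if key in instances:
                votes[cls] += 1
    if not votes:
        return None  # no value matched any known instance
    return votes.most_common(1)[0][0]


def enrich(values, cls, ontology):
    """Add the column's values to the chosen class as new instances."""
    ontology[cls].update(value.strip().lower() for value in values)
```

A column such as `["Paris", "Madrid", "Rome"]` is labeled `city` because two of its three values are known instances, and `enrich` then adds the unseen value to that class.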

Query interface schema extracting from deep web using ontology

10.1117/12.2607261 ◽

2021 ◽

Author(s):

Yong Sun ◽

Shang Wang ◽

Zhenyuan Li ◽

Chang Liu ◽

Tao Peng ◽

...

Keyword(s):

Deep Web ◽

Query Interface

Giftedness from the perspective of research on genius: Some precautionary implications

Gifted Education International ◽

10.1177/02614294211046324 ◽

2021 ◽

pp. 026142942110463

Author(s):

Dean Keith Simonton

Keyword(s):

20th Century ◽

19th Century ◽

Research Literature ◽

Early 20th Century ◽

Specific Domain ◽

Domain Specific ◽

Empirical Relations ◽

The 19th Century ◽

IQ Tests

The terms giftedness and genius entered the research literature in the 19th century. Although not synonymous, both terms were defined according to potential or actual achievement in a specific domain. However, in the early 20th century, both terms became defined according to performance on domain-generic IQ tests. Given the empirical relations between achievement and intelligence, this transfer of meaning is unjustified. Both giftedness and genius must be defined with respect to potential or actual domain-specific achievements.

Associating Labels and Elements of Deep Web Query Interface Based on DOM

Web Information Systems and Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-642-33469-6_81 ◽

2012 ◽

pp. 657-663 ◽

Cited By ~ 1

Author(s):

Baohua Qiang ◽

Long Shi ◽

Chunming Wu ◽

Qian He ◽

Chao Shen

Keyword(s):

Deep Web ◽

Query Interface
