Data extraction and annotation based on domain-specific ontology evolution for deep web

2011 ◽  
Vol 8 (3) ◽  
pp. 673-692 ◽  
Author(s):  
Chen Kerui ◽  
Wanli Zuo ◽  
Fengling He ◽  
Yongheng Chen ◽  
Ying Wang

Deep web sources respond to a user query with result records encoded in HTML pages. Data extraction and data annotation, which are important for many applications, extract and annotate the records from these HTML pages. We propose a domain-specific, ontology-based data extraction and annotation technique. We first construct a mini-ontology for a specific domain from the information in the query interface and the query result pages. We then use the constructed mini-ontology to identify data areas and map data annotations during data extraction. To adapt to new sample sets, the mini-ontology evolves dynamically based on the results of data extraction and data annotation. Experimental results demonstrate that this method has higher precision and recall in data extraction and data annotation.
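The annotate-then-evolve loop described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `MINI_ONTOLOGY` contents, the "book" domain, the regular expressions, and the `annotate` function are all hypothetical, chosen only to show how a label seen on a result page might be mapped to a concept and how an unseen label might be absorbed into the ontology.

```python
import re

# Hypothetical mini-ontology for a "book" domain (an assumption for this sketch).
# Each concept has labels seen on query interfaces and an optional value pattern.
MINI_ONTOLOGY = {
    "title": {"labels": {"title", "book title"}, "pattern": None},
    "price": {"labels": {"price", "our price"},
              "pattern": re.compile(r"^\$\d+(\.\d{2})?$")},
    "isbn": {"labels": {"isbn"},
             "pattern": re.compile(r"^(\d{9}[\dX]|\d{13})$")},
}


def annotate(label, value, ontology):
    """Return the concept matching a (label, value) pair, or None."""
    key = label.strip().lower()
    # Step 1: annotate by a label the ontology already knows.
    for concept, entry in ontology.items():
        if key in entry["labels"]:
            return concept
    # Step 2: fall back to the value pattern of each concept.
    for concept, entry in ontology.items():
        pattern = entry["pattern"]
        if pattern is not None and pattern.match(value.strip()):
            # Evolution step: remember the unseen label for next time.
            if key:
                entry["labels"].add(key)
            return concept
    return None
```

Calling `annotate("List cost", "$9.99", MINI_ONTOLOGY)` matches the price pattern and, as a side effect, adds "list cost" to the known price labels, so a later page using the same label is annotated by label alone.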

2010 ◽  
Vol 20-23 ◽  
pp. 553-558 ◽  
Author(s):  
Ke Rui Chen ◽  
Wan Li Zuo ◽  
Fan Zhang ◽  
Feng Lin He

With the rapid growth of web data, the deep web has become the fastest-growing carrier of web data. Research on the deep web, especially on extracting data records from result pages, has therefore become an urgent task. We present a data record extraction method based on a Global Schema, which automatically extracts query result records from web pages. The method first analyzes the query interface and result record instances to build a Global Schema using an ontology. The Global Schema is then used to extract data records from result pages and store them in a table. Experimental results indicate that the method accurately extracts data records and stores them in a table that conforms to the Global Schema.
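As a rough illustration of the two stages in this abstract (building a schema from the query interface and result instances, then storing records in a table), the following Python sketch may help. The function names `build_global_schema` and `to_table` are hypothetical, and simple label normalization stands in for the ontology-based matching that the paper describes.

```python
def build_global_schema(interface_fields, record_instances):
    """Merge attribute names from the query interface and sample records.

    Assumption: attributes are matched by normalized label only; the paper
    uses an ontology for this step, which is not reproduced here.
    """
    schema = []
    for name in interface_fields:
        normalized = name.strip().lower()
        if normalized not in schema:
            schema.append(normalized)
    for record in record_instances:
        for name in record:
            normalized = name.strip().lower()
            if normalized not in schema:
                schema.append(normalized)
    return schema


def to_table(records, schema):
    """Lay each record out as a row whose columns follow the schema order."""
    rows = []
    for record in records:
        normalized = {k.strip().lower(): v for k, v in record.items()}
        # Attributes missing from a record become None in its row.
        rows.append([normalized.get(column) for column in schema])
    return rows
```

With an interface exposing `Title` and `Author` and a sample record that also carries `Price`, the schema gains a third column, and records lacking an author are stored with an empty cell in that position.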


Author(s):  
Radha Guha

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast amount of information available on the internet. Even for a specific domain such as a college or university website, it may be difficult for a user to browse through all the links to find relevant answers quickly. Objective: In this scenario, a chat-bot that can answer questions about college information and compare colleges would be very useful and novel. Methods: In this paper, a novel conversational chat-bot application with information retrieval and text summarization skills is designed and implemented. First, the chat-bot has a simple dialog skill: when it can understand the intent of the user query, it responds from a stored collection of answers. Second, for unknown queries, the chat-bot can search the internet and then perform text summarization using advanced techniques of natural language processing (NLP) and text mining (TM). Results: Advances in NLP for information retrieval and text summarization using the machine learning techniques of Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe) and TextRank are first reviewed and compared before being implemented in the chat-bot design. The chat-bot improves user experience tremendously by answering specific queries concisely, which takes less time than reading the entire document. Students, parents and faculty can more efficiently obtain a variety of information such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers and patents. Conclusion: The purpose of this paper was to follow the advancement in NLP technologies and implement them in a novel application.
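Of the summarization techniques named in this abstract, TextRank is the simplest to sketch. The Python below is a simplified, dependency-free illustration and not the paper's chat-bot code: the function names are hypothetical, sentence splitting is a naive regular expression, and similarity is the word-overlap measure normalized by log sentence lengths.

```python
import math
import re


def split_sentences(text):
    """Naive sentence splitter (an assumption; real systems use a tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word overlap normalized by the log of each sentence's vocabulary size."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, n=2, damping=0.85, iterations=50):
    """Return the n highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    count = len(sentences)
    if count <= n:
        return sentences
    # Weighted sentence graph; no self-loops.
    weights = [[similarity(a, b) if i != j else 0.0
                for j, b in enumerate(sentences)]
               for i, a in enumerate(sentences)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / count] * count
    # Power iteration of the weighted PageRank update.
    for _ in range(iterations):
        scores = [(1 - damping) / count + damping * sum(
                      weights[j][i] / out_sums[j] * scores[j]
                      for j in range(count) if out_sums[j] > 0)
                  for i in range(count)]
    ranked = sorted(range(count), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(ranked)]
```

A sentence that shares no words with the rest of the text receives only the baseline score and is therefore the first to be dropped from the summary.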


2021 ◽  
pp. 016555152199804
Author(s):  
Qian Geng ◽  
Ziang Chuai ◽  
Jian Jin

To provide junior researchers with domain-specific concepts efficiently, an automatic approach for academic profiling is needed. First, to obtain personal records of a given scholar, typical supervised approaches often utilise structured data such as Wikipedia infoboxes as the training dataset, but this may lead to a severe mis-labelling problem when the data are used to train a model directly. To address this problem, a new relation embedding method is proposed for fine-grained entity typing, in which the initial vector of entities and a new penalty scheme are considered, based on the semantic distance of entities and relations. Also, to highlight critical concepts relevant to renowned scholars, scholars’ selective bibliographies, which contain massive numbers of academic terms, are analysed by a newly proposed extraction method based on logistic regression, the AdaBoost algorithm and learning-to-rank techniques. This addresses the limitation that conventional supervised methods return only binary classification results and fail to help researchers understand the relative importance of selected concepts. Several categories of experiments on academic profiling with corresponding benchmark datasets demonstrate that the proposed approaches notably outperform existing methods. The proposed techniques provide an automatic way for junior researchers to obtain organised knowledge in a specific domain, including scholars’ background information and domain-specific concepts.


2009 ◽  
Vol 25 (3) ◽  
pp. 194-203 ◽  
Author(s):  
Shulamith Kreitler ◽  
Hernan Casakin

In view of unclear previous findings about the validity of self-assessed creativity, the hypothesis guiding the present study was that validity would be proven if self-assessed creativity was examined with respect to a specific domain, specific product, specific aspects of creativity, and in terms of specific criteria. The participants were 52 architecture students. The experimental task was to design a small museum in a described context. After completing the task, the students self-assessed their creativity in designing with seven open-ended questions, the Self-Assessment of Creative Design questionnaire, and a list of seven items tapping affective metacognitive aspects of the designing process. Thus, 21 creativity indicators were formed. Four expert architects, working independently, assessed the designs on nine creativity indicators: fluency, flexibility, elaboration, functionality, innovation, fulfilling specified design requirements, considering context, mastery of skills concerning the esthetics of the design representation, and overall creativity. The agreement among the architects’ evaluations was very high. The correlations between the nine corresponding indicators in students’ assessment of their design and those of the experts were positive and significant with respect to three indicators: fluency, flexibility, and overall creativity. In contrast, the correlations of the remaining noncorresponding indicators with those of the experts were not significant. The findings support the validity of self-assessed creativity with specific restrictions.


2017 ◽  
Author(s):  
Marilena Oita ◽  
Antoine Amarilli ◽  
Pierre Senellart

Deep Web databases, whose content is presented as dynamically-generated Web pages hidden behind forms, have mostly been left unindexed by search engine crawlers. In order to automatically explore this mass of information, many current techniques assume the existence of domain knowledge, which is costly to create and maintain. In this article, we present a new perspective on form understanding and deep Web data acquisition that does not require any domain-specific knowledge. Unlike previous approaches, we do not perform the various steps in the process (e.g., form understanding, record identification, attribute labeling) independently but integrate them to achieve a more complete understanding of deep Web sources. Through information extraction techniques and using the form itself for validation, we reconcile input and output schemas in a labeled graph which is further aligned with a generic ontology. The impact of this alignment is threefold: first, the resulting semantic infrastructure associated with the form can assist Web crawlers when probing the form for content indexing; second, attributes of response pages are labeled by matching known ontology instances, and relations between attributes are uncovered; and third, we enrich the generic ontology with facts from the deep Web.
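The idea of using the form itself to validate the link between input and output schemas can be illustrated with a small Python sketch. This is only one plausible reading of the abstract, not the authors' algorithm: the `align_form_field` function, the probe format, and the column names are hypothetical. A form field is probed with several values, and an output column is accepted as its counterpart only if every returned record contains the submitted value in that column.

```python
def align_form_field(probes):
    """Find the response column that corresponds to one form field.

    probes: list of (submitted_value, records), where records is a list of
    dicts mapping column name -> cell text (a hypothetical format).
    Returns the single column consistent with every probe, or None.
    """
    candidates = None
    for submitted, records in probes:
        if not records:
            continue  # an empty result page gives no evidence either way
        needle = submitted.lower()
        matching = {
            column
            for column in records[0]
            if all(needle in str(r.get(column, "")).lower() for r in records)
        }
        candidates = matching if candidates is None else candidates & matching
    if candidates and len(candidates) == 1:
        return next(iter(candidates))
    return None
```

Each accepted (form field, output column) pair would correspond to one labeled edge in the kind of graph the abstract describes; ambiguous or contradictory evidence yields `None` rather than a guess.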


2021 ◽  
Author(s):  
Yong Sun ◽  
Shang Wang ◽  
Zhenyuan Li ◽  
Chang Liu ◽  
Tao Peng ◽  
...  
Keyword(s):  
Deep Web ◽  

2021 ◽  
pp. 026142942110463
Author(s):  
Dean Keith Simonton

The terms giftedness and genius entered the research literature in the 19th century. Although not synonymous, both terms were defined according to potential or actual achievement in a specific domain. However, in the early 20th century, both terms became defined according to performance on domain-generic IQ tests. Given the empirical relations between achievement and intelligence, this transfer of meaning is unjustified. Both giftedness and genius must be defined with respect to potential or actual domain-specific achievements.


Author(s):  
Baohua Qiang ◽  
Long Shi ◽  
Chunming Wu ◽  
Qian He ◽  
Chao Shen
Keyword(s):  
Deep Web ◽  
