Ontology Learning pada Teks Tidak Terstruktur (Ontology Learning on Unstructured Text)

Electrician ◽

10.23960/elc.v14n2.2139 ◽

2020 ◽

Vol 14 (2) ◽

pp. 46-51

Author(s):

Rajif Agung Yunmar

Keyword(s):

Relation Extraction ◽

Semantic Search ◽

Expert Evaluation ◽

Ontology Learning ◽

Concept Extraction

Much of the information spread across sources on the internet is intended only for human readers. Meanwhile, there is a growing need for that information to be readable and understandable not only by humans but also by machines. Information in a machine-understandable format can serve many purposes, for example as a knowledge base for reasoning, for knowledge sharing between machines, for semantic search, and for information visualization. Ontology learning is a method that extracts information from unstructured text in a document or web page and converts it into a knowledge base in a machine-understandable format, namely an ontology. The method consists of several stages: preprocessing, concept extraction, relation extraction, and evaluation. Preprocessing prepares the test corpus for concept extraction, which uses the entropy concept extraction algorithm; relation extraction uses the subcat relation extraction algorithm; and ontology evaluation uses the expert evaluation method. The final results show an accuracy of 89.84% for concept extraction and 93.02% for relation extraction, with a confidence in relation extraction of 71.15%. Keywords: ontology learning, entropy concept extraction, subcat relation extraction.
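The abstract names an entropy-based concept extraction step but does not describe the algorithm. As a generic illustration only (not the author's method), the Python sketch below computes the Shannon entropy of a term's distribution across documents, one common way to separate evenly spread terms from concentrated ones. The function name `term_entropy` and the toy corpus are hypothetical.

```python
import math

def term_entropy(term, docs):
    """Shannon entropy (in bits) of how a term's occurrences spread over docs."""
    counts = [doc.count(term) for doc in docs]
    total = sum(counts)
    if total == 0:
        return 0.0
    # Sum p * log2(1/p) over the documents that contain the term.
    return sum((c / total) * math.log2(total / c) for c in counts if c)

# Hypothetical tokenized corpus (one token list per document).
docs = [
    ["ontology", "concept", "ontology"],
    ["ontology", "relation", "ontology"],
    ["corpus", "corpus"],
]
h_ontology = term_entropy("ontology", docs)  # spread evenly over two documents
h_corpus = term_entropy("corpus", docs)      # concentrated in one document
```

Here `h_ontology` is 1 bit and `h_corpus` is 0 bits. Whether high or low entropy marks a good concept candidate depends on the specific algorithm, which the abstract leaves unspecified.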


Research of Conceptual Relation Extraction Based on Improved Hierarchical Clustering Method

The Open Electrical & Electronic Engineering Journal ◽

10.2174/1874129001408010355 ◽

2014 ◽

Vol 8 (1) ◽

pp. 355-360

Author(s):

Caiyun Xie ◽

Junyun Wu

Keyword(s):

Relation Extraction ◽

Main Task ◽

Semantic Retrieval ◽

Ontology Learning ◽

Clustering Method ◽

Concept Hierarchy ◽

Concept Extraction ◽

Relationship Extraction ◽

Taxonomic Relation ◽

Conceptual Relation

The main tasks of ontology learning are concept extraction and conceptual relation extraction. This paper mainly studies the latter. Conceptual relations consist of taxonomic and non-taxonomic relations. The paper introduces a hierarchical clustering method and uses a concept hierarchy clustering method, which chooses different clustering standards at each level of the hierarchy, to obtain taxonomic relations. This improves the accuracy of relation extraction. To extract non-taxonomic relations, the paper uses an extended association rule; this method can obtain concrete relation names and determine their domain and range. Finally, the paper uses the introduced ontology learning method to construct a domain ontology for law and completes the implementation of an ontology-based semantic retrieval system. The final effect of the system application demonstrates that this ontology learning method is efficient.
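The abstract does not spell out the extended association rule, so the following Python sketch shows only the plain association-rule baseline it presumably builds on: support and confidence for concept pairs that co-occur in the same sentence. The function name `rule_metrics` and the toy legal sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations

def rule_metrics(transactions):
    """Return {(a, b): (support, confidence)} for each rule a -> b.

    Each transaction is the set of concepts found in one sentence.
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for concepts in transactions:
        item_count.update(concepts)
        # Count each unordered pair once per sentence.
        for a, b in combinations(sorted(concepts), 2):
            pair_count[(a, b)] += 1
    metrics = {}
    for (a, b), c in pair_count.items():
        metrics[(a, b)] = (c / n, c / item_count[a])
        metrics[(b, a)] = (c / n, c / item_count[b])
    return metrics

# Hypothetical toy data: concepts detected per sentence.
sentences = [
    {"court", "judge"},
    {"court", "judge", "verdict"},
    {"court", "lawyer"},
    {"judge", "verdict"},
]
metrics = rule_metrics(sentences)
```

With this data the rule `verdict -> judge` has support 0.5 and confidence 1.0, so it would rank as a strong candidate for a non-taxonomic relation. Naming the relation and fixing its domain and range, as the paper describes, would need additional steps not sketched here.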


Method for automatic key concepts extraction

The Electronic Library ◽

10.1108/el-01-2018-0012 ◽

2019 ◽

Vol 37 (1) ◽

pp. 2-15 ◽

Cited By ~ 2

Author(s):

Sudarsana Desul ◽

Madurai Meenachi N. ◽

Thejas Venkatesh ◽

Vijitha Gunta ◽

Gowtham R. ◽

...

Keyword(s):

Coming Out ◽

English Language ◽

Hybrid Approach ◽

Ontology Learning ◽

Specific Domain ◽

Content Type ◽

Concept Extraction ◽

Nuclear Domain ◽

Wide Range ◽

Key Concepts

Purpose: The ontology of a domain mainly consists of a set of concepts and their semantic relations. It is typically constructed and maintained using ontology editors with substantial human intervention. It is desirable to perform the task automatically, which has led to the development of ontology learning techniques. One of the main challenges of ontology learning from text is identifying key concepts in the documents. A wide range of techniques for key concept extraction have been proposed, but they suffer from low accuracy, poor performance, limited flexibility, and applicability to only a specific domain. The purpose of this study is to explore a new method for extracting key concepts and to apply it to literature in the nuclear domain. Design/methodology/approach: In this article, a novel method for key concept extraction is proposed and applied to documents from the nuclear domain. A hybrid approach was used, which combines domain knowledge, syntactic named-entity knowledge, and statistics-based methods. The performance of the developed method was evaluated on 120 documents retrieved from the Scopus database, against data obtained from three domain experts using two-out-of-three voting logic. Findings: The work reported pertains to extracting concepts from the set of selected documents and aids the search for documents relating to given concepts. The results of a case study indicated that the method developed demonstrated better metrics than Text2Onto and CFinder. The method described is capable of extracting valid key concepts from a set of candidates with long phrases. Research limitations/implications: The present study is restricted to English-language literature and was applied to documents from the nuclear domain. It has the potential to be extended to other domains. Practical implications: The work carried out in the current study could lead to updating the International Nuclear Information System thesaurus for an ontology in the nuclear domain. This can lead to efficient search methods. Originality/value: This work is the first attempt to automatically extract key concepts from nuclear documents. The proposed approach addresses most of the problems that exist in current methods and thereby increases performance.
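The two-out-of-three voting logic used to build the evaluation data can be sketched as follows. This is a minimal illustration, assuming each of the three experts marks every candidate term as valid or not; the function name `accepted_concepts` and the sample terms are hypothetical.

```python
def accepted_concepts(votes, threshold=2):
    """Keep candidates that at least `threshold` experts marked as valid."""
    return [term for term, marks in votes.items() if sum(marks) >= threshold]

# Hypothetical judgments: one boolean per expert for each candidate term.
votes = {
    "fast breeder reactor": (True, True, False),
    "sodium coolant": (True, True, True),
    "recent years": (False, True, False),
}
gold = accepted_concepts(votes)
```

Only the first two terms reach the two-vote threshold, so they form the reference set against which an extractor's output would be scored.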


Learning domain taxonomies: the TaxoLine approach

International Journal of Web Information Systems ◽

10.1108/ijwis-04-2017-0024 ◽

2017 ◽

Vol 13 (3) ◽

pp. 281-301 ◽

Cited By ~ 5

Author(s):

Omar El Idrissi Esserhrouchni ◽

Bouchra Frikh ◽

Brahim Ouhbi ◽

Ismail Khalil Ibrahim

Keyword(s):

Execution Time ◽

Design Methodology ◽

State Of The Art ◽

Relation Extraction ◽

Ontology Learning ◽

Conditional Mutual Information ◽

Web Documents ◽

Content Type ◽

Innovative Methodology

Purpose: The aim of this paper is to present TaxoLine, an online framework for automatically building a domain taxonomy from Web documents. Design/methodology/approach: TaxoLine proposes an innovative methodology that combines frequency and conditional mutual information to improve the quality of the domain taxonomy. The system also includes a set of mechanisms that reduce the execution time needed to build the ontology. Findings: The TaxoLine framework was applied to nine different financial corpora. The generated taxonomies are evaluated against a gold-standard ontology and are compared to state-of-the-art ontology learning methods. Originality/value: The experimental results show that TaxoLine achieves higher precision and recall for both concept and relation extraction than well-known ontology learning algorithms. Furthermore, it also shows promising results in terms of the execution time needed to build the domain taxonomy.


Deep learning models in detection of dietary supplement adverse event signals from Twitter

JAMIA Open ◽

10.1093/jamiaopen/ooab081 ◽

2021 ◽

Vol 4 (4) ◽

Author(s):

Yefeng Wang ◽

Yunpeng Zhao ◽

Dalton Schutte ◽

Jiang Bian ◽

Rui Zhang

Keyword(s):

Adverse Event ◽

Deep Learning ◽

Knowledge Base ◽

Dietary Supplement ◽

Relation Extraction ◽

Learning Models ◽

Concept Extraction ◽

Annotation Guideline ◽

End To End ◽

Ae Signals

Objective: The objective of this study is to develop a deep learning pipeline to detect signals on dietary supplement-related adverse events (DS AEs) from Twitter. Materials and Methods: We obtained 247 807 tweets ranging from 2012 to 2018 that mentioned both DS and AE. We designed a tailor-made annotation guideline for DS AEs and annotated biomedical entities and relations on 2000 tweets. For the concept extraction task, we fine-tuned and compared the performance of BioClinical-BERT, PubMedBERT, ELECTRA, RoBERTa, and DeBERTa models with a CRF classifier. For the relation extraction task, we fine-tuned and compared BERT models to BioClinical-BERT, PubMedBERT, RoBERTa, and DeBERTa models. We chose the best-performing models in each task to assemble an end-to-end deep learning pipeline to detect DS AE signals and compared the results to the known DS AEs from a DS knowledge base (ie, iDISK). Results: The DeBERTa-CRF model outperformed other models in the concept extraction task, scoring a lenient microaveraged F1 score of 0.866. The RoBERTa model outperformed other models in the relation extraction task, scoring a lenient microaveraged F1 score of 0.788. The end-to-end pipeline built on these 2 models was able to extract DS indication and DS AEs with a lenient microaveraged F1 score of 0.666. Conclusion: We have developed a deep learning pipeline that can detect DS AE signals from Twitter. We have found DS AEs that were not recorded in an existing knowledge base (iDISK), and our proposed pipeline can assist DS AE pharmacovigilance.
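For readers unfamiliar with the metric reported above, here is a small Python sketch of a microaveraged F1 score: true positives, false positives, and false negatives are pooled across all entity types before precision and recall are computed. The counts and label names are made up for illustration, and the "lenient" matching criterion used in the paper is not modeled.

```python
def micro_f1(counts):
    """counts: {label: (tp, fp, fn)}; pool the counts, then compute F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-label counts, not taken from the study.
counts = {"supplement": (80, 10, 20), "adverse_event": (40, 20, 10)}
score = micro_f1(counts)
```

With these counts the pooled precision and recall are both 0.8, so the microaveraged F1 is 0.8 as well. Because counts are pooled, frequent labels weigh more heavily than rare ones, unlike a macroaverage.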


Intelligent Semantic Search Engines for Opinion and Sentiment Mining

Next Generation Search Engines ◽

10.4018/978-1-4666-0330-1.ch009 ◽

2012 ◽

pp. 191-215

Author(s):

Mona Sleem-Amer ◽

Ivan Bigorgne ◽

Stéphanie Brizard ◽

Leeley Daio Pires Dos Santos ◽

Yacine El Bouhairi ◽

...

Keyword(s):

Search Engine ◽

Search Engines ◽

Visual Analytics ◽

Intermediate Phase ◽

Relation Extraction ◽

Semantic Search ◽

Data Sets ◽

Linguistic Resources ◽

Linguistic Resource ◽

Sentiment Mining

Over the last few years, research and industry players have become increasingly interested in analyzing opinions and sentiments expressed on the social media web for product marketing and business intelligence. To adapt to this need, search engines must be able not only to retrieve lists of documents but also to directly access, analyze, and interpret topics and opinions. This article covers an intermediate phase of the ongoing industrial research project "DoXa", which aims to develop a semantic opinion and sentiment mining search engine for the French language. The DoXa search engine enables topic-related opinion and sentiment extraction beyond positive and negative polarity using rich linguistic resources. Centering the work on two distinct business use cases, the authors analyze both unstructured Web 2.0 content (e.g., blogs and forums) and structured questionnaire data sets. The focus is on discovering hidden patterns in the data. To this end, the authors present work in progress on opinion topic relation extraction and visual analytics, linguistic resource construction, and the combination of OLAP technology with semantic search.


Self-Supervised Chinese Ontology Learning from Online Encyclopedias

The Scientific World Journal ◽

10.1155/2014/848631 ◽

2014 ◽

Vol 2014 ◽

pp. 1-13 ◽

Cited By ~ 6

Author(s):

Fanghuai Hu ◽

Zhiqing Shao ◽

Tong Ruan

Keyword(s):

Machine Learning ◽

Structural Information ◽

Relation Extraction ◽

Knowledge Bases ◽

The Self ◽

Supervised Machine Learning ◽

Ontology Learning ◽

High Coverage ◽

Category Labels ◽

Training Examples

Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised learning based Chinese ontology, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid the errors in encyclopedias and enrich the learnt ontology, we also apply some machine learning based methods. First, we show, statistically and experimentally, that the self-supervised machine learning method is practicable in Chinese relation extraction (at least for synonymy and hyponymy), and we train some self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction. The advantage of our methods is that all training examples are automatically generated from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two aspects, scale and precision. Manual evaluation results show that the ontology has excellent precision, and comparison of SSCO with other well-known ontologies and knowledge bases indicates high coverage. The experimental results also indicate that the self-supervised models clearly enrich SSCO.


Single Term Concepts from English Translated Qur’an Using Statistical Methods

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.14.11144 ◽

2018 ◽

Vol 7 (2.14) ◽

pp. 13

Author(s):

Rohana Ismail ◽

Nurazzah Abd. Rahman ◽

Zainab Abu Bakar

Keyword(s):

Statistical Method ◽

Statistical Methods ◽

Ontology Development ◽

Knowledge Based Systems ◽

Ontology Learning ◽

Concept Extraction ◽

Knowledge Based ◽

Single Term ◽

Development Concept ◽

F Measure

Ontology is essential for the success of knowledge-based systems because it offers the opportunity to share vocabulary, integrate knowledge easily, and discover new instances or relations. However, developing an ontology manually is a time-consuming and tedious task. This is where ontology learning comes into play. Ontology learning tries to extract ontological elements to support ontology development. Concept extraction is one of the important tasks in ontology learning. In the Hajj domain of Quranic study, concepts have not been fully discovered. Hence, this paper tries to discover concepts by extracting single terms from a translated version of the Qur’an. It reports results on extracting single terms as concepts using statistical methods. The experiments were carried out on the English translation of the Qur’an by Hilali Khan. The results show that the performance of tf as a statistical method is significant, with an f-measure value of 0.509. Based on tf, comparisons were made with other statistical methods such as tfidf, Avetf and Ridf.
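To make the statistical measures concrete, the Python sketch below ranks single terms by raw term frequency (tf), computes a textbook tf-idf variant, and scores the top-ranked terms against a reference list with the f-measure. It is a minimal illustration under stated assumptions: the tokenized passages, the gold list, and all function names are hypothetical, and the exact tf-idf, Avetf, and Ridf formulas used in the paper are not given in the abstract.

```python
import math
from collections import Counter

def tf_scores(docs):
    """Raw frequency of each term over the whole collection."""
    total = Counter()
    for doc in docs:
        total.update(doc)
    return total

def tfidf_scores(docs):
    """Collection tf times log(N / df), one common tf-idf variant."""
    n = len(docs)
    tf = tf_scores(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {term: tf[term] * math.log(n / df[term]) for term in tf}

def f_measure(predicted, gold):
    """Harmonic mean of precision and recall over two term sets."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical tokenized passages and expert-chosen concepts.
docs = [
    ["pilgrimage", "house", "people"],
    ["pilgrimage", "sacrifice", "people"],
    ["people"],
]
gold = {"pilgrimage", "sacrifice"}
top2 = [term for term, _ in tf_scores(docs).most_common(2)]
score = f_measure(top2, gold)
tfidf = tfidf_scores(docs)
```

In this toy data the top two terms by tf are `people` and `pilgrimage`, giving an f-measure of 0.5 against the gold list. The tf-idf score of `people` is zero because it appears in every passage, which shows how the two measures can rank the same term very differently.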


Domain concept extraction algorithm based on Semantic Web Ontology learning

2010 International Conference on Educational and Network Technology ◽

10.1109/icent.2010.5532133 ◽

2010 ◽

Author(s):

Li Xueyong ◽

Zhu Yanli ◽

Wang Quanrui ◽

Shan Yongqiang

Keyword(s):

Semantic Web ◽

Ontology Learning ◽

Concept Extraction ◽

Extraction Algorithm ◽

Domain Concept


Axiom-Based Feedback Cycle for Relation Extraction in Ontology Learning from Text

2008 19th International Conference on Database and Expert Systems Applications ◽

10.1109/dexa.2008.134 ◽

2008 ◽

Cited By ~ 1

Author(s):

Witold Abramowicz ◽

Maria Vargas-Vera ◽

Marek Wisniewski

Keyword(s):

Relation Extraction ◽

Ontology Learning ◽

Feedback Cycle ◽

Learning From Text


A Semantic Framework for Extracting Taxonomic Relations from Text Corpus

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/3/6 ◽

2019 ◽

Vol 17 (3) ◽

pp. 325-337

Author(s):

Phuoc Thi Hong Doan ◽

Ngamnij Arch-int ◽

Somjit Arch-int

Keyword(s):

Word Sense Disambiguation ◽

Relation Extraction ◽

Training Data ◽

Word Sense ◽

Text Corpus ◽

Semantic Framework ◽

Concept Extraction ◽

Taxonomic Relation ◽

Important Challenge ◽

Taxonomic Relations

Ontologies are now exploited in many applications owing to their ability to represent knowledge and infer new knowledge. However, the manual construction of ontologies is tedious and time-consuming. Therefore, automated ontology construction from text has been investigated. The extraction of taxonomic relations between concepts is a crucial step in constructing domain ontologies. To obtain taxonomic relations from a text corpus, especially when the data is deficient, the approach of using the web as a source of collective knowledge (a.k.a. the web-based approach) is usually applied. The main challenge of this approach is how to collect relevant knowledge from a large number of web pages. To overcome this issue, we propose a framework that combines Word Sense Disambiguation (WSD) and the web-based approach to extract taxonomic relations from a domain-text corpus. This framework consists of two main stages: concept extraction and taxonomic-relation extraction. Concepts acquired in the concept-extraction stage are disambiguated by the WSD module and then passed to the taxonomic-relation extraction stage. To evaluate the efficiency of the proposed framework, we conduct experiments on datasets from two domains, tourism and sport. The results show that the proposed method is efficient on corpora that have insufficient or no training data. In addition, the proposed method outperforms the state-of-the-art method on corpora with high WSD results.
