Schema Matching Quality: Thesaurus as the Matcher

2014 ◽  
Vol 70 (5) ◽  
Author(s):  
Thabit Sabbah ◽  
Ali Selamat

Thesauri are used in many Information Retrieval (IR) applications, such as data integration, data warehousing, semantic query processing, and classification. They have also been used to solve the schema matching problem. Given that many thesauri exist for any given area of knowledge, the quality of schema matching results obtained with different thesauri from the same field is not predictable. In this paper, we propose a methodology for studying the performance of a thesaurus in schema matching. The paper also presents results of experiments using different thesauri. Precision, recall, F-measure, and average similarity were calculated to show that matching quality changes with the thesaurus used.
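The evaluation described above can be sketched in a few lines. A minimal illustration, assuming a toy thesaurus, toy schemas, and a made-up gold standard (none of which come from the paper): attribute pairs are matched via thesaurus lookup, then scored with precision, recall, and F-measure. Swapping in a different `THESAURUS` changes the scores, which is the effect the paper studies.

```python
# Illustrative sketch: thesaurus-driven schema matching scored with
# precision, recall, and F-measure. All data here is invented.

# A toy thesaurus mapping each term to its synonyms.
THESAURUS = {
    "employee": {"worker", "staff"},
    "salary": {"wage", "pay"},
    "dept": {"department", "division"},
}

def related(a: str, b: str) -> bool:
    """Two attribute names match if they are equal or thesaurus-linked."""
    if a == b:
        return True
    return b in THESAURUS.get(a, set()) or a in THESAURUS.get(b, set())

def match_schemas(schema1, schema2):
    """Return the set of attribute pairs judged equivalent."""
    return {(a, b) for a in schema1 for b in schema2 if related(a, b)}

def evaluate(found, gold):
    """Precision, recall, and F-measure of found matches vs. gold matches."""
    tp = len(found & gold)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

found = match_schemas(["employee", "salary", "dept"],
                      ["worker", "wage", "division", "office"])
gold = {("employee", "worker"), ("salary", "wage"), ("dept", "division")}
print(evaluate(found, gold))
```

Replacing the thesaurus (e.g., one lacking the "dept"/"division" link) would lower recall on the same schemas, illustrating why matching quality is thesaurus-dependent.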

Author(s):  
Yan Qi ◽  
Huiping Cao ◽  
K. Selçuk Candan ◽  
Maria Luisa Sapino

In XML data integration, data/metadata merging and query processing are indispensable. Specifically, merging integrates multiple disparate (heterogeneous and autonomous) input data sources for further usage, while query processing is one main reason why the data need to be integrated in the first place. Moreover, when supported with appropriate user feedback techniques, queries can also provide contexts in which conflicts among the input sources can be interpreted and resolved. The flexibility of XML structure provides opportunities for alleviating some of the difficulties that less flexible data types face in the presence of uncertainty; yet this flexibility also introduces new challenges in merging multiple sources and in query processing over integrated data. In this chapter, the authors discuss two alternative ways XML data/schemas can be integrated: conflict-eliminating (where the result is cleaned of any conflicts the different sources might have with each other) and conflict-preserving (where the resulting XML data or XML schema captures the alternative interpretations of the data). They also present techniques for query processing over integrated, possibly imprecise, XML data, and cover strategies that can be used for resolving underlying conflicts.
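The contrast between the two merge styles can be made concrete. A minimal sketch, assuming two single-element sources and an invented `alt`/`source` convention for recording alternatives (the chapter's actual representation may differ): a conflict-preserving merge keeps both interpretations tagged by source, while a conflict-eliminating merge applies a resolution rule such as "prefer source 1".

```python
# Illustrative sketch of conflict-preserving vs. conflict-eliminating
# XML merging. The element names and resolution rule are invented.
import xml.etree.ElementTree as ET

def merge(e1, e2, preserve_conflicts=True):
    out = ET.Element(e1.tag)
    t1 = (e1.text or "").strip()
    t2 = (e2.text or "").strip()
    if t1 == t2:
        out.text = t1          # no conflict: sources agree
    elif preserve_conflicts:
        # Conflict-preserving: record both interpretations, tagged by source.
        for src, txt in (("s1", t1), ("s2", t2)):
            alt = ET.SubElement(out, "alt", source=src)
            alt.text = txt
    else:
        # Conflict-eliminating: a simple rule, e.g. prefer source 1.
        out.text = t1
    return out

a = ET.fromstring("<price>10</price>")
b = ET.fromstring("<price>12</price>")
merged = merge(a, b)
print(ET.tostring(merged, encoding="unicode"))
```

A query over the conflict-preserving result can then inspect both `alt` children, which is what allows user feedback at query time to resolve the conflict later rather than at merge time.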


2019 ◽  
Vol 9 (1) ◽  
pp. 1
Author(s):  
Rifqi Hammad

Universities are among the institutions that use information technology to support their various business processes. A university requires data integration between its systems so that data available in one system can be used by other systems to support data management. Data integration faces several obstacles; one cause is the schema heterogeneity of the individual information systems. The linguistic method is one of the schema matching methods used to overcome schema heterogeneity. Based on an analysis of the database schemas with the linguistic method, the values of precision, recall, and F-measure are all 0.75. This value indicates that the application of schema matching is reasonably good. However, some data are still duplicated between the schemas, so the resulting data integration is not yet maximal, and further optimization is needed.
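One simple form of linguistic schema matching compares attribute names by string similarity. A minimal sketch, assuming invented university schemas, an arbitrary 0.7 similarity threshold, and a made-up gold mapping (the paper's exact linguistic technique may differ, e.g. it may also use tokenization or dictionaries):

```python
# Illustrative sketch: name-based (linguistic) schema matching using
# string similarity, scored with precision, recall, and F-measure.
# Schemas, threshold, and gold mapping are invented for illustration.
from difflib import SequenceMatcher

def linguistic_match(schema1, schema2, threshold=0.7):
    """Pair attributes whose lowercased name similarity meets the threshold."""
    return {(a, b) for a in schema1 for b in schema2
            if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold}

schema_a = ["student_id", "student_name", "birth_date", "gpa"]
schema_b = ["StudentID", "StudentName", "DateOfBirth", "Grade"]
found = linguistic_match(schema_a, schema_b)
gold = {("student_id", "StudentID"), ("student_name", "StudentName"),
        ("birth_date", "DateOfBirth")}

tp = len(found & gold)
precision = tp / len(found) if found else 0.0
recall = tp / len(gold)
f_measure = 2 * precision * recall / (precision + recall) if tp else 0.0
print(precision, recall, f_measure)
```

Note how `birth_date` and `DateOfBirth` are missed: the names share words but not a long common substring, so pure string similarity fails. This is exactly the kind of gap that leaves the integration "not maximal" and motivates further optimization.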


2016 ◽  
Vol 78 (5-6) ◽  
Author(s):  
Jasman Pardede ◽  
Milda Gustiana Husada

The vector space model (VSM) is an Information Retrieval (IR) model that represents queries and documents as n-dimensional vectors. GVSM is an extension of VSM that represents documents based on the similarity between the query and the minterm vector space of the document collection, where the minterm vectors are defined by the terms in the query; retrieval can therefore be performed based on the meaning of the words in the query. Documents may, however, carry the same information expressed in different words. LSI is a method implemented in IR systems to retrieve documents based on the overall meaning of the user's query rather than on matching each individual word. LSI uses a matrix algebra technique, Singular Value Decomposition (SVD). This study discusses the performance of VSM, GVSM, and LSI implemented in an IR system that retrieves Indonesian-language documents of the .pdf, .doc, and .docx file types, using the Nazief and Adriani stemming algorithm. Each method was implemented both with and without threads; threading was applied in preprocessing, when reading each document from the collection, and in the stemming of both queries and documents. Retrieval performance was evaluated by response time and by recall, precision, and F-measure. The results show that, for each method, execution was fastest on .docx files, followed by .doc and .pdf. For the same document collection, LSI had the fastest response time, followed by GVSM and then VSM. The average recall values for VSM, GVSM, and LSI were 82.86%, 89.68%, and 84.93%, respectively; the average precision values were 64.08%, 67.51%, and 62.08%; and the average F-measure values were 71.95%, 76.63%, and 71.02%.
Multithreaded preprocessing improved the average response time of VSM, GVSM, and LSI by about 30.422%, 26.282%, and 31.821%, respectively.
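The core LSI step the study relies on can be sketched compactly. A minimal illustration, assuming an invented four-term, three-document corpus and a rank of k = 2 (the study's data, stemming, and parameters are not reproduced): the term-document matrix is factored with SVD, truncated, and a query is folded into the latent space for comparison against the documents.

```python
# Illustrative sketch of LSI: truncated SVD of a term-document matrix,
# query folding, and cosine scoring. Corpus and rank are invented.
import numpy as np

# Rows = terms, columns = documents (raw term counts).
terms = ["data", "integration", "retrieval", "query"]
A = np.array([[2, 0, 1],
              [1, 1, 0],
              [0, 2, 1],
              [1, 0, 2]], dtype=float)

k = 2  # latent dimensions kept after truncation
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

query = np.array([1, 0, 0, 1], dtype=float)  # terms "data" and "query"
q_latent = query @ Uk / sk                   # fold query into latent space
scores = [cosine(q_latent, Vtk[:, j]) for j in range(A.shape[1])]
print(scores)
```

Because scoring happens in the k-dimensional latent space rather than over raw terms, a document can rank well even without containing the query's exact (stemmed) words, which is the behavior that distinguishes LSI from plain VSM above.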


2014 ◽  
Vol 25 (4) ◽  
pp. 1-16
Author(s):  
Boris Rabinovich ◽  
Mark Last

In this paper, the authors propose a five-step approach to the problem of identifying semantic correspondences between attributes of two database schemas. It is one of the key challenges in many database applications such as data integration and data warehousing. The authors' research is focused on uninterpreted schema matching, where the column names and column values are uninterpreted or unreliable. The approach implements Bayesian networks, Pearson's correlation and mutual information to identify inter-attribute dependencies. Additionally, the authors propose an extension to their algorithm that allows the user to manually enter the known mappings to improve the automated matching results. The five-step approach also allows data privacy preservation. The authors' evaluation experiments show that the proposed approach enhances the current set of schema matching techniques.
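One ingredient of such uninterpreted matching, mutual information between columns, is easy to illustrate. A minimal sketch, assuming invented categorical columns (the paper's full five-step pipeline, including its Bayesian networks and Pearson's correlation, is not reproduced): two columns whose values are a deterministic relabeling of each other score high mutual information even though their value vocabularies share nothing, which is exactly why the measure works when names and values are uninterpreted.

```python
# Illustrative sketch: mutual information as an inter-attribute
# dependency signal for uninterpreted schema matching. Data is invented.
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X;Y) estimated from two aligned columns of categorical values."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

col_a = ["r", "g", "r", "b", "g", "r"]
col_b = ["x", "y", "x", "z", "y", "x"]  # deterministic relabeling of col_a
col_c = ["x", "x", "y", "z", "x", "y"]  # only loosely related to col_a

mi_related = mutual_information(col_a, col_b)
mi_unrelated = mutual_information(col_a, col_c)
print(mi_related, mi_unrelated)
```

Comparing such dependency profiles across two schemas, rather than the column names or values themselves, is what lets an uninterpreted matcher align attributes whose labels are unreliable.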


Author(s):  
Hilton H. Mollenhauer

Many factors (e.g., resolution of the microscope, type of tissue, and preparation of the sample) affect electron microscope images and alter the amount of information that can be retrieved from a specimen. Of interest in this report are those factors associated with the evaluation of epoxy-embedded tissues. In this context, information retrieval is dependent, in part, on the ability to "see" sample detail (e.g., contrast) and, in part, on the quality of sample preservation. Two aspects of this problem will be discussed: 1) epoxy resins and their effect on image contrast, information retrieval, and sample preservation; and 2) the interaction between some stains commonly used for enhancing contrast and information retrieval.


2006 ◽  
Vol 25 (2) ◽  
pp. 78 ◽  
Author(s):  
Marcia D. Kerchner

In the early years of modern information retrieval, the fundamental way in which we understood and evaluated search performance was by measuring precision and recall. In recent decades, however, models of evaluation have expanded to incorporate the information-seeking task and the quality of its outcome, as well as the value of the information to the user. We have developed a systems engineering-based methodology for improving the whole search experience. The approach focuses on understanding users’ information-seeking problems, understanding who has the problems, and applying solutions that address these problems. This information is gathered through ongoing analysis of site-usage reports, satisfaction surveys, Help Desk reports, and a working relationship with the business owners.

