User-Defined Inverted Index in Boolean, Rule-Based Entity Resolution Systems

Strategies for Large-Scale Entity Resolution Based on Inverted Index Data Partitioning

Information Quality and Governance for Business Intelligence - Advances in Business Strategy and Competitive Advantage

10.4018/978-1-4666-4892-0.ch017

2014

pp. 329-351

Author(s):

Yinle Zhou, John R. Talburt

Keyword(s):

Large Scale, Distributed Processing, Entity Resolution, Large Datasets, Data Partitioning, Inverted Index, Index Data, Inverted Indexing, Partitioning Strategy

Inverted indexing is a commonly used technique for improving the performance of entity resolution algorithms by reducing the number of pairwise comparisons needed to arrive at acceptable results. This chapter describes how inverted indexing can also serve as a data partitioning strategy for performing entity resolution on large datasets in a distributed processing environment. It discusses the importance of index-to-rule alignment, pre-resolution index closure, and post-resolution link closure. It also covers workflows for both record-based and attribute-based identity capture and update in a distributed processing environment.
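The ideas named in this abstract can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the chapter's implementation. The following are all hypothetical:

- the Boolean rule (same last name and same first initial);
- the index key (last name);
- the record layout;
- every function name.

The sketch works as follows:

1. The index key is chosen to align with the rule, so every pair the rule could match shares at least one key.
2. Records connected through shared keys are grouped into one partition. This is one possible reading of pre-resolution index closure.
3. Each partition is resolved independently.
4. A union-find pass takes the transitive closure of the resulting pairwise links. This step plays the role of post-resolution link closure.

```python
from collections import defaultdict
from itertools import combinations


def rule_match(a, b):
    """Hypothetical Boolean rule: same last name AND same first initial."""
    return a["last"] == b["last"] and a["first"][:1] == b["first"][:1]


def index_keys(rec):
    """Hypothetical index keys, aligned with the rule: any pair it can match shares one."""
    return {("last", rec["last"])}


def union_find_clusters(ids, pairs):
    """Group ids into connected components (transitive closure of pairs)."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    groups = defaultdict(set)
    for i in ids:
        groups[find(i)].add(i)
    return sorted(sorted(g) for g in groups.values())


def partition_by_index(records):
    """Index closure: records linked through any shared key share a partition."""
    index = defaultdict(set)
    for rid, rec in records.items():
        for key in index_keys(rec):
            index[key].add(rid)
    key_links = [(min(rids), r) for rids in index.values() for r in rids]
    return union_find_clusters(records.keys(), key_links)


def resolve(records):
    links = []
    # Partitions are independent, so each could be sent to a separate worker.
    for part in partition_by_index(records):
        for a, b in combinations(part, 2):
            if rule_match(records[a], records[b]):
                links.append((a, b))
    # Link closure: merge pairwise matches into entity clusters.
    return union_find_clusters(records.keys(), links)


records = {
    1: {"first": "Ann", "last": "Lee"},
    2: {"first": "Anna", "last": "Lee"},
    3: {"first": "Bob", "last": "Lee"},
    4: {"first": "Ann", "last": "Kim"},
}
print(partition_by_index(records))  # [[1, 2, 3], [4]]
print(resolve(records))             # [[1, 2], [3], [4]]
```

With a single key per record, the partitions coincide with the index blocks. When records emit several keys, the closure step merges overlapping blocks, so no candidate pair is split across partitions.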

Rule based method for entity resolution using distinct tree construction

2016 International Conference on Communication Systems and Networks (ComNet)

10.1109/csn.2016.7824003

2016

Author(s):

P. Ammu Archa, Lekshmy. D. Kumar

Keyword(s):

Entity Resolution, Rule Based, Tree Construction

An Efficient Entity Resolution Method for Large Relations

International Journal of Cooperative Information Systems

10.1142/s0218843013500068

2013

Vol 22 (01)

pp. 1350006

Author(s):

Yakun Li, Hongzhi Wang, Hong Gao, Jianzhong Li

Keyword(s):

Real World, Bloom Filter, Entity Resolution, Complex Data, Resolution Method, Rule Based, Record Matching, Data Objects, Sequence Rule

Entity resolution (ER) is the task of finding the data objects that refer to the same real-world entity. When ER is performed on relations, the crucial operator is record matching, which judges whether two tuples refer to the same real-world entity. Record matching is a longstanding problem. However, with the massive and complex data found in applications, current methods cannot satisfy the requirements. A sequence-rule-based record matching method (SeReMatching) is presented that considers both which attributes should be used in record matching and how important each one is. We modify the Bloom filter, which greatly increases checking speed. In the best case, the algorithm reduces the complexity of entity resolution to O(n). Extensive experiments were performed to evaluate our methods.
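The abstract does not say how the Bloom filter was modified. The Python sketch below therefore uses a standard Bloom filter only, not the SeReMatching variant. It shows the general reason a Bloom filter speeds up checking:

- A key is built from the attributes selected for matching. The key format, the `attributes` list, and all function names are hypothetical.
- Each key is tested against the filter first.
- Only keys the filter reports as possibly seen trigger the slower exact lookup. A plain dictionary stands in for that lookup here.

Each record costs a fixed number of hash operations, so one pass over n records does O(n) filter work.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        return all(self.bits[p] for p in self._positions(item))


def match_key(record, attributes):
    """Hypothetical matching key: normalized values of the chosen attributes."""
    return "|".join(record[a].strip().lower() for a in attributes)


def find_duplicates(records, attributes):
    bloom = BloomFilter()
    seen = {}  # stands in for a slower exact store (e.g., on disk)
    pairs = []
    for rid, rec in enumerate(records):
        key = match_key(rec, attributes)
        # The exact lookup runs only when the filter answers "maybe".
        if bloom.might_contain(key) and key in seen:
            pairs.append((seen[key], rid))
        else:
            bloom.add(key)
            seen[key] = rid
    return pairs


records = [
    {"name": "Alice Smith", "zip": "72204"},
    {"name": "alice smith ", "zip": "72204"},
    {"name": "Bob Jones", "zip": "72201"},
]
print(find_duplicates(records, ["name", "zip"]))  # [(0, 1)]
```

A false positive from the filter only costs an extra exact lookup. It never produces a wrong match, because the dictionary confirms every hit.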

Rule-Based Method for Entity Resolution

IEEE Transactions on Knowledge and Data Engineering

10.1109/tkde.2014.2320713

2015

Vol 27 (1)

pp. 250-263

Cited By ~ 19

Author(s):

Lingli Li, Jianzhong Li, Hong Gao

Keyword(s):

Entity Resolution, Rule Based

Entity Resolution on Single Relation

Advances in Data Mining and Database Management - Innovative Techniques and Applications of Entity Resolution

10.4018/978-1-4666-5198-2.ch005

2014

pp. 87-122

Keyword(s):

Entity Resolution, Basic Process, Rule Based, Resolution Rule, Basic Work, Speed Up, Similarity Computation, Rule Based Approach, Single Relation, Record Similarity

A basic task in entity resolution is detecting duplicate records in a single relation. Many approaches have been proposed to address this problem in different areas. The basic step of entity resolution is attribute similarity computation, and many techniques built on attribute similarity methods have been proposed to carry out entity resolution in different areas. The rule-based approach is one of the main techniques for entity resolution. To speed up duplicate record detection, the authors use techniques such as canopy clustering and blocking. In this chapter, the authors focus on record similarity computation, the rule-based approach, similarity threshold computation, and blocking.
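The chapter focuses on attribute similarity, a rule with similarity thresholds, and blocking. These steps can be combined into a short Python sketch. The following are illustrative assumptions rather than the authors' choices:

- the similarity measure, `difflib.SequenceMatcher.ratio`;
- the rule `RULE` and its thresholds;
- the blocking key.

Canopy clustering is not shown.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def attr_sim(a, b):
    """Attribute similarity in [0, 1], using difflib's ratio as one choice."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical resolution rule: each attribute must reach its threshold.
RULE = {"name": 0.8, "city": 1.0}


def rule_matches(r1, r2, rule=RULE):
    return all(attr_sim(r1[a], r2[a]) >= t for a, t in rule.items())


def block_key(rec):
    """Hypothetical blocking key: first three letters of the city."""
    return rec["city"].lower()[:3]


def detect_duplicates(records):
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[block_key(rec)].append(i)
    dups = []
    for ids in blocks.values():
        # Pairs are compared only within a block, never across blocks.
        for i, j in combinations(ids, 2):
            if rule_matches(records[i], records[j]):
                dups.append((i, j))
    return sorted(dups)


records = [
    {"name": "Jon Smith", "city": "Boston"},
    {"name": "John Smith", "city": "Boston"},
    {"name": "Jane Doe", "city": "Boston"},
    {"name": "John Smith", "city": "Denver"},
]
print(detect_duplicates(records))  # [(0, 1)]
```

Here "Jon Smith" and "John Smith" score 18/19 (about 0.95) on name and match. The record in Denver is never compared with the Boston records because it falls in a different block. Blocking trades some recall for fewer comparisons, which is why the choice of blocking key matters.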

Rule-Based Entity Resolution on Database with Hidden Temporal Information (Extended Abstract)

2019 IEEE 35th International Conference on Data Engineering (ICDE)

10.1109/icde.2019.00266

2019

Author(s):

Hongzhi Wang, Xiaoou Ding, Jianzhong Li, Hong Gao

Keyword(s):

Entity Resolution, Temporal Information, Rule Based

An effective weighted rule-based method for entity resolution

Distributed and Parallel Databases

10.1007/s10619-018-7240-6

2018

Vol 36 (3)

pp. 593-612

Cited By ~ 4

Author(s):

Hiba Abu Ahmad, Hongzhi Wang

Keyword(s):

Entity Resolution, Rule Based

Rule-based Entity Resolution on Database with hidden temporal Information

IEEE Transactions on Knowledge and Data Engineering

10.1109/tkde.2018.2816018

2018

pp. 1-1

Cited By ~ 2

Author(s):

Hongzhi Wang, Xiaoou Ding, Jianzhong Li, Hong Gao

Keyword(s):

Entity Resolution, Temporal Information, Rule Based

Entity Resolution Using Logistic Regression as an extension to the Rule-Based Oyster System

2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)

10.1109/mipr.2018.00033

2018

Cited By ~ 1

Author(s):

Fumiko Kobayashi, Aziz Eram, John Talburt

Keyword(s):

Logistic Regression, Entity Resolution, Rule Based

Using Conventional Articulation Tests With Highly Unintelligible Children

Language, Speech, and Hearing Services in Schools

10.1044/0161-1461.2301.52

1992

Vol 23 (1)

pp. 52-60

Cited By ~ 1

Author(s):

Pamela G. Garn-Nunn, Vicki Martin

Keyword(s):

Test Results, Phonological Processes, Rule Based, Severity Level, Error Sensitivity, Conventional Tests, Severity Levels, Conventional Test, Impaired Children

This study explored whether or not standard administration and scoring of conventional articulation tests accurately identified children as phonologically disordered and whether or not information from these tests established severity level and programming needs. Results of standard scoring procedures from the Assessment of Phonological Processes-Revised, the Goldman-Fristoe Test of Articulation, the Photo Articulation Test, and the Weiss Comprehensive Articulation Test were compared for 20 phonologically impaired children. All tests identified the children as phonologically delayed/disordered, but the conventional tests failed to clearly and consistently differentiate varying severity levels. Conventional test results also showed limitations in error sensitivity, ease of computation for scoring procedures, and implications for remediation programming. The use of some type of rule-based analysis for phonologically impaired children is highly recommended.