User-Defined Inverted Index in Boolean, Rule-Based Entity Resolution Systems

Author(s): Yinle Zhou, John R. Talburt, Eric Nelson
Author(s): Yinle Zhou, John R. Talburt

Inverted indexing is a commonly used technique for improving the performance of entity resolution algorithms by reducing the number of pairwise comparisons needed to reach acceptable results. This chapter describes how inverted indexing can also serve as a data partitioning strategy for performing entity resolution on large datasets in a distributed processing environment. It discusses the importance of index-to-rule alignment, pre-resolution index closure, post-resolution link closure, and workflows for both record-based and attribute-based identity capture and update in a distributed processing environment.
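To make the interplay of inverted indexing, index-to-rule alignment, and post-resolution link closure concrete, here is a minimal single-machine sketch. The two rules, their field names (`last`, `dob`, `email`), and the helpers `build_index`, `link_closure`, and `resolve` are hypothetical illustrations, not the chapter's implementation. Each rule is paired with an index key that any two records satisfying the rule must share, so comparing only records within the same index entry cannot miss a match that the rule would make. Union-find then computes the transitive closure of the pairwise links. In a distributed setting, each index entry could in principle be handled as an independent partition, but that distribution step is not shown here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rules: (name, index key function, full match predicate).
# Each key is a necessary condition of its predicate (index-to-rule alignment).
RULES = [
    ("dob_last",
     lambda r: r.get("dob"),
     lambda a, b: a["dob"] == b["dob"] and a["last"].lower() == b["last"].lower()),
    ("email",
     lambda r: (r.get("email") or "").lower() or None,
     lambda a, b: a["email"].lower() == b["email"].lower()),
]

def build_index(records):
    """Inverted index: (rule name, key value) -> set of record ids."""
    index = defaultdict(set)
    for rid, rec in records.items():
        for name, key_fn, _ in RULES:
            key = key_fn(rec)
            if key is not None:
                index[(name, key)].add(rid)
    return index

def link_closure(ids, links):
    """Transitive closure of pairwise links via union-find; returns sorted clusters."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    clusters = defaultdict(list)
    for i in ids:
        clusters[find(i)].append(i)
    return sorted(sorted(c) for c in clusters.values())

def resolve(records):
    """Compare only records sharing an index entry, then close the resulting links."""
    predicates = {name: pred for name, _, pred in RULES}
    links = set()
    for (name, _key), ids in build_index(records).items():
        for a, b in combinations(sorted(ids), 2):
            if predicates[name](records[a], records[b]):
                links.add((a, b))
    return link_closure(list(records), links)

if __name__ == "__main__":
    records = {
        1: {"last": "Smith", "dob": "1980-01-02", "email": "js@x.org"},
        2: {"last": "SMITH", "dob": "1980-01-02", "email": None},
        3: {"last": "Jones", "dob": "1980-01-02", "email": "JS@x.org"},
        4: {"last": "Brown", "dob": "1975-05-05", "email": None},
    }
    print(resolve(records))  # [[1, 2, 3], [4]]
```

Records 1 and 3 never satisfy the same-name rule, but link closure places them in one cluster because each links to record 1 under a different rule. With these four records, only 4 candidate pairs are evaluated instead of all 6.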


2013, Vol 22 (01), pp. 1350006
Author(s): Yakun Li, Hongzhi Wang, Hong Gao, Jianzhong Li

Entity resolution (ER) is the task of finding data objects that refer to the same real-world entity. When ER is performed on relations, the crucial operator is record matching, which judges whether two tuples refer to the same real-world entity. Record matching is a longstanding problem; however, with the massive and complex data found in current applications, existing methods cannot satisfy the requirements. A sequence-rule-based record matching method (SeReMatching) is presented that considers both which attributes should be used in record matching and how important each one is. We modified the Bloom filter, which greatly increases checking speed. In the best case, the algorithm reduces the complexity of entity resolution to O(n). Extensive experiments were performed to evaluate our methods.
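The abstract does not describe how the Bloom filter was modified, so the sketch below uses a standard Bloom filter only to illustrate why such a filter helps a single-pass, O(n) duplicate check. The key attributes, the `find_duplicates` helper, and the filter parameters are hypothetical. The filter quickly rejects keys that have definitely never been seen, so the exact store (a plain dict here, standing in for a slower store) is consulted only on possible hits; false positives are caught by that exact lookup.

```python
import hashlib

class BloomFilter:
    """Standard Bloom filter (not the modified variant described in the abstract)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def find_duplicates(records, attrs):
    """One pass over (id, record) pairs; returns (first id, duplicate id) pairs."""
    bloom = BloomFilter()
    first_seen = {}  # exact store, consulted only when the filter reports a possible hit
    duplicates = []
    for rid, rec in records:
        key = "|".join(str(rec[a]).strip().lower() for a in attrs)
        if bloom.might_contain(key) and key in first_seen:
            duplicates.append((first_seen[key], rid))
        else:
            bloom.add(key)
            first_seen[key] = rid
    return duplicates
```

For example, calling `find_duplicates` on records with a hypothetical `name` and `zip` key flags "Ann Lee"/"ann lee " in the same ZIP code as a duplicate pair while passing every record through the filter exactly once.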


2015, Vol 27 (1), pp. 250-263
Author(s): Lingli Li, Jianzhong Li, Hong Gao

A basic task of entity resolution is detecting duplicate records in a single relation. Many approaches have been proposed to address this problem in different domains. The fundamental step of entity resolution is attribute similarity computation, and many techniques built on attribute similarity methods have been proposed for different domains to carry out the rest of the process. The rule-based approach is one of the main techniques for entity resolution. To speed up duplicate record detection, the authors use techniques such as canopy clustering and blocking. In this chapter, the authors focus on record similarity computation, the rule-based approach, similarity threshold computation, and blocking.
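The following minimal sketch shows how attribute similarity, a threshold-based rule, and canopy clustering used as blocking can fit together. It is an illustration under stated assumptions, not the authors' method: token-set Jaccard similarity, the helpers `canopies` and `detect_duplicates`, and the threshold values are all hypothetical choices. It follows a common formulation of canopy clustering: pick a remaining record as a center, put every remaining record within the loose distance `t1` into its canopy, then drop records within the tight distance `t2` from further consideration. The more expensive threshold rule is applied only to pairs that share a canopy.

```python
from itertools import combinations

def tokens(text):
    """Lower-cased whitespace tokens of an attribute value."""
    return set(text.lower().split())

def jaccard_similarity(a, b):
    """Token-set Jaccard similarity in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def canopies(texts, t1, t2):
    """Canopy clustering on distance = 1 - similarity; t1 is loose, t2 is tight."""
    if not t1 > t2:
        raise ValueError("canopy clustering requires t1 > t2")
    toks = {rid: tokens(t) for rid, t in texts.items()}
    remaining = list(texts)
    result = []
    while remaining:
        center = remaining.pop(0)
        dist = {r: 1.0 - jaccard_similarity(toks[center], toks[r]) for r in remaining}
        # Loose threshold: remaining records closer than t1 join this canopy.
        result.append([center] + [r for r in remaining if dist[r] < t1])
        # Tight threshold: records closer than t2 leave the pool of future canopies.
        remaining = [r for r in remaining if dist[r] >= t2]
    return result

def detect_duplicates(texts, t1, t2, threshold):
    """Apply a similarity-threshold rule only to pairs that share a canopy."""
    candidates = set()
    for canopy in canopies(texts, t1, t2):
        candidates.update(combinations(sorted(canopy), 2))
    toks = {rid: tokens(t) for rid, t in texts.items()}
    return sorted(
        (a, b) for a, b in candidates
        if jaccard_similarity(toks[a], toks[b]) >= threshold
    )

if __name__ == "__main__":
    names = {1: "John A Smith", 2: "john smith", 3: "Jon Smith", 4: "Mary Jones"}
    print(canopies(names, 0.8, 0.4))                 # [[1, 2, 3], [3], [4]]
    print(detect_duplicates(names, 0.8, 0.4, 0.6))   # [(1, 2)]
```

Here only 3 of the 6 possible pairs reach the threshold rule. Choosing `t1`, `t2`, and the match threshold is exactly the similarity threshold computation problem the chapter highlights; the values above are arbitrary.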


2018, Vol 36 (3), pp. 593-612
Author(s): Hiba Abu Ahmad, Hongzhi Wang

1992, Vol 23 (1), pp. 52-60
Author(s): Pamela G. Garn-Nunn, Vicki Martin

This study explored whether or not standard administration and scoring of conventional articulation tests accurately identified children as phonologically disordered and whether or not information from these tests established severity level and programming needs. Results of standard scoring procedures from the Assessment of Phonological Processes-Revised, the Goldman-Fristoe Test of Articulation, the Photo Articulation Test, and the Weiss Comprehensive Articulation Test were compared for 20 phonologically impaired children. All tests identified the children as phonologically delayed/disordered, but the conventional tests failed to clearly and consistently differentiate varying severity levels. Conventional test results also showed limitations in error sensitivity, ease of computation for scoring procedures, and implications for remediation programming. The use of some type of rule-based analysis for phonologically impaired children is highly recommended.

