Data preparation for biomedical knowledge domain visualization: a probabilistic record linkage and information fusion approach to citation data

2021 ◽  
Author(s):  
Marie B. Synnestvedt
2000 ◽  
Vol 16 (2) ◽  
pp. 439-447 ◽  
Author(s):  
Kenneth R. de Camargo Jr. ◽  
Cláudia M. Coeli

This paper presents a database linkage system based on the probabilistic record linkage technique, developed in C++ in the Borland C++ Builder version 3.0 programming environment. The system was tested on data sources of different sizes and evaluated for processing time and for sensitivity in identifying true pairs. Processing the records took less time with the program than by hand, especially for larger databases. The sensitivities of the manual and automated processes were equivalent for databases with fewer records; however, as the databases grew, sensitivity tended to decline only in the manual process. Despite being at an early stage of development, the system performed well in both speed and sensitivity. Although the algorithms used performed satisfactorily, the aim is to evaluate other routines in order to further improve the system's performance.
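The abstract does not describe its algorithms, so the following is only a minimal Python sketch of the general Fellegi–Sunter scoring that probabilistic record linkage systems typically use, not the authors' C++ implementation. The field names, the m/u probabilities, and the two decision thresholds are all illustrative assumptions.

```python
import math

# Illustrative m- and u-probabilities per field (assumed values, not from the paper).
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "birth_date": (0.90, 0.005),
    "mother_name": (0.85, 0.01),
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 agreement weight if the field agrees, disagreement weight otherwise."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Total weight of a record pair, using exact equality as the field comparator."""
    return sum(
        field_weight(rec_a[f] == rec_b[f], m, u) for f, (m, u) in FIELD_PARAMS.items()
    )


def classify(score: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Three-way decision rule; the thresholds are assumptions to be tuned."""
    if score >= upper:
        return "link"
    if score <= lower:
        return "non-link"
    return "possible link"
```

With these assumed parameters, a pair that agrees only on name scores about 0.53 and falls into the "possible link" band, which is where manual clerical review would normally take over.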


2014 ◽  
Vol 30 (2) ◽  
pp. 433-438 ◽  
Author(s):  
Silvano Barbosa de Oliveira ◽  
Edgar Merchan-Hamann ◽  
Leila Denise Alves Ferreira Amorim

The aim of this study is to estimate the prevalence of HIV/HBV and HIV/HCV coinfections among AIDS cases reported in Brazil, and to describe the epidemiological profile of these cases. Coinfection was identified through probabilistic record linkage between the records of all HIV-infected patients reported as AIDS cases and the records of patients reported as carriers of hepatitis B or C virus in various Brazilian Ministry of Health databases from 1999 to 2010. In this period 370,672 AIDS cases were reported, of which 3,724 were HIV/HBV coinfections. Women are less likely than men to be coinfected, and the chance of coinfection increases with age. This study provided an important evaluation of HIV/HBV and HIV/HCV coinfections in Brazil by merging secondary databases from the Ministry of Health, without conducting a seroprevalence survey. Its findings may help Brazilian epidemiological surveillance agencies plan their activities.
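As a quick check on the reported counts, the crude proportion of AIDS cases linked to an HBV notification can be computed directly. This is only the raw ratio of the two numbers in the abstract; it ignores linkage errors and any adjustment the authors may have applied, and it does not cover HIV/HCV, whose count the abstract does not give.

```python
# Counts reported in the abstract (Brazil, 1999-2010).
aids_cases = 370_672
hiv_hbv_coinfections = 3_724

# Crude proportion of reported AIDS cases linked to an HBV notification.
proportion = hiv_hbv_coinfections / aids_cases
print(f"{proportion:.2%}")  # 1.00%
```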


Author(s):  
Colin Babyak ◽  
Abdelnasser Saidi

ABSTRACT
Objectives
The objectives of this talk are to introduce Statistics Canada’s Social Data Linkage Environment (SDLE) and to explain the methodology behind the creation of the central depository and how both deterministic and probabilistic record linkage techniques are used to maintain and expand the environment.
Approach
We will start with a brief overview of the SDLE and then discuss how deterministic linkages and probabilistic linkages (using Statistics Canada’s generalized record linkage software, G-Link) have been combined to create and maintain a very large central depository, which can in turn be linked to virtually any social data source for the end goal of analysis.
Results
Although Canada has a population of about 36 million people, the central depository contains some 300 million records to represent them, because of multiple addresses, names, etc. Although this significantly reduces missing links, it raises the spectre of additional false-positive matches and adds computational complexity that we have had to overcome.
Conclusion
The combination of deterministic and probabilistic record linkage strategies has been effective in creating the central depository for the SDLE. As more and more data are linked to the environment and we continue to refine our methodology, we can now move on to the ultimate goal of the SDLE, which is to analyze this vast wealth of linked data.
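The abstract does not detail how the deterministic and probabilistic passes are combined, so the following is a hedged sketch of one common pattern: an exact match on a shared key first, with probabilistic scoring only for records the first pass leaves unlinked. It does not use G-Link; the record layout (`id`, `key`), the scoring function, and the threshold are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional

Record = dict  # hypothetical layout: {"id": ..., "key": ..., plus comparison fields}


def cascade_link(
    incoming: list[Record],
    depository: list[Record],
    prob_score: Callable[[Record, Record], float],
    threshold: float,
) -> dict[str, Optional[str]]:
    """Map each incoming record id to a depository record id, or None.

    Pass 1: deterministic exact match on a shared key, when one is present.
    Pass 2: probabilistic scoring for records that pass 1 left unlinked.
    """
    by_key = {r["key"]: r["id"] for r in depository if r.get("key")}
    links: dict[str, Optional[str]] = {}
    for rec in incoming:
        key = rec.get("key")
        if key and key in by_key:
            links[rec["id"]] = by_key[key]
            continue
        # Fall back to the best-scoring candidate at or above the threshold.
        best_id, best_score = None, threshold
        for cand in depository:
            score = prob_score(rec, cand)
            if score >= best_score:
                best_id, best_score = cand["id"], score
        links[rec["id"]] = best_id
    return links
```

The second pass compares every incoming record with every depository record, which would be infeasible at the scale of 300 million records; production systems restrict comparisons with blocking (for example, only comparing records that share a birth year), which this sketch omits for brevity.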


2015 ◽  
Vol 45 (3) ◽  
pp. 954-964 ◽  
Author(s):  
Adrian Sayers ◽  
Yoav Ben-Shlomo ◽  
Ashley W Blom ◽  
Fiona Steele

Author(s):  
Yinghao Zhang ◽  
Senlin Xu ◽  
Mingfan Zheng ◽  
Xinran Li

Record linkage is the task of identifying which records refer to the same entity. When records in different data sources share no common key and their identifier fields contain typographical errors, the extended Fellegi–Sunter probabilistic record linkage method proposed by Winkler, which takes field similarity into account, is to our knowledge one of the most effective approaches. However, this method cannot efficiently handle missing values in the fields: an inappropriate weight is assigned to any record pair containing missing data. Therefore, to improve the performance of Winkler’s probabilistic record linkage method in the presence of missing values, we propose a solution that adjusts a record pair’s weight when missing data occur, which improves the accuracy of Winkler’s record linkage decisions without substantially increasing computation time.
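The abstract does not state the authors' adjustment, so this is a minimal sketch of the general idea under two labeled assumptions. First, a field's weight is interpolated linearly between its disagreement and agreement weights according to string similarity; this is a simplified form of Winkler's similarity-based partial agreement, and `difflib.SequenceMatcher.ratio()` stands in here for the Jaro–Winkler comparator. Second, a missing value contributes a neutral weight of 0 instead of the disagreement weight; this is a simple convention, not necessarily the authors' method.

```python
import math
from difflib import SequenceMatcher
from typing import Optional


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; difflib's ratio stands in for Jaro-Winkler."""
    return SequenceMatcher(None, a, b).ratio()


def field_weight(a: Optional[str], b: Optional[str], m: float, u: float) -> float:
    """Similarity-adjusted Fellegi-Sunter weight for one field (log2 scale).

    Assumption: a missing value (None or empty string) contributes 0, so a
    blank field neither supports nor penalises the pair.
    """
    if not a or not b:
        return 0.0
    w_agree = math.log2(m / u)
    w_disagree = math.log2((1 - m) / (1 - u))
    # Linear interpolation: similarity 1 gives w_agree, similarity 0 gives w_disagree.
    return w_disagree + similarity(a, b) * (w_agree - w_disagree)
```

Under this convention a pair such as "SMITH" and "SMYTH" receives a weight between full disagreement and full agreement, while a pair with a blank surname is scored only on its remaining fields.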


Author(s):  
Jana Asher ◽  
Dean Resnick ◽  
Jennifer Brite ◽  
Robert Brackbill ◽  
James Cone

Since its post-World War II inception, the science of record linkage has grown exponentially and is now used across industry, government, and academia. The academic fields that rely on record linkage are diverse, ranging from history to public health to demography. In this paper, we introduce the different types of data linkage and give a historical context to their development. We then introduce the three types of underlying models for probabilistic record linkage: Fellegi-Sunter-based methods, machine learning methods, and Bayesian methods. Practical considerations, such as data standardization and privacy concerns, are then discussed. Finally, recommendations are given for organizations developing or maintaining record linkage programs, with an emphasis on organizations measuring long-term complications of disasters, such as 9/11.
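Data standardization, one of the practical considerations mentioned above, usually means normalizing identifiers before any comparison so that trivial formatting differences do not count as disagreements. The sketch below shows one plausible set of name-cleaning steps using only the Python standard library; the exact steps are an assumption, not the paper's recommendation, and characters that do not decompose (such as "ø") are simply dropped.

```python
import re
import unicodedata


def standardize_name(raw: str) -> str:
    """Normalise a personal name before comparison.

    Steps: strip accents, upper-case, replace punctuation with spaces,
    collapse runs of whitespace.
    """
    # Decompose accented letters (e.g. "á" -> "a" + combining accent), then drop the marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    upper = no_accents.upper()
    # Anything other than A-Z, digits, or spaces becomes a space.
    letters_only = re.sub(r"[^A-Z0-9 ]+", " ", upper)
    return " ".join(letters_only.split())
```

For example, `standardize_name("Cláudia  M. Coeli")` yields `"CLAUDIA M COELI"`, so it compares equal to an unaccented spelling of the same name.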

