Estimating Measurement Error in SIPP Annual Job Earnings: A Comparison of Census Bureau Survey and SSA Administrative Data

Author(s):  
John M. Abowd ◽  
Martha Harrison Stinson
2021 ◽  
pp. 1-22
Author(s):  
Emily Berg ◽  
Jongho Im ◽  
Zhengyuan Zhu ◽  
Colin Lewis-Beck ◽  
Jie Li

Statistical and administrative agencies often collect information on related parameters. Discrepancies between estimates from distinct data sources can arise due to differences in definitions, reference periods, and data collection protocols. Integrating statistical data with administrative data is appealing for saving data collection costs, reducing respondent burden, and improving the coherence of estimates produced by statistical and administrative agencies. Model-based techniques for combining multiple data sources, such as small area estimation and measurement error models, offer transparency, reproducibility, and the ability to provide an estimate of uncertainty. Issues associated with integrating statistical data with administrative data are discussed in the context of data from Namibia. The national statistical agency in Namibia produces estimates of crop area using data from probability samples. Simultaneously, the Namibia Ministry of Agriculture, Water, and Forestry obtains crop area estimates through extension programs. We illustrate the use of a structural measurement error model for the purpose of synthesizing the administrative and survey data to form a unified estimate of crop area. Limitations on the available data preclude us from conducting a genuine, thorough application. Nonetheless, our illustration of the methodology is potentially useful to general practitioners.
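As a rough illustration of how a model-based combination yields both a unified estimate and an uncertainty measure, the sketch below uses inverse-variance weighting. This is a deliberately simplified special case and not the structural measurement error model described in the abstract: it assumes both sources measure the same quantity without bias and with known, independent error variances. The function name and the numbers in the usage lines are hypothetical.

```python
def combine_estimates(survey, survey_var, admin, admin_var):
    """Inverse-variance weighted combination of two estimates.

    Assumption (not from the abstract): both inputs are unbiased for the
    same target and have known, independent error variances.
    Returns the combined estimate and its variance.
    """
    w_survey = 1.0 / survey_var
    w_admin = 1.0 / admin_var
    combined = (w_survey * survey + w_admin * admin) / (w_survey + w_admin)
    combined_var = 1.0 / (w_survey + w_admin)
    return combined, combined_var


# Hypothetical crop-area figures (arbitrary units), for illustration only.
estimate, variance = combine_estimates(100.0, 4.0, 110.0, 4.0)
print(estimate, variance)
```

With equal variances the combined value is the simple average of the two sources, and its variance is half that of either source. A structural measurement error model relaxes the no-bias assumption by letting the administrative figure be a shifted or scaled version of the true value.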


Author(s):  
Jennifer Auer

Federal administrative data is a low-cost and low-burden data source for evidence-based policy making. By linking information from different surveys, or over time, researchers can achieve the sample size and variation needed for advanced econometric methods. However, the personally identifying information (PII) needed to link information means that these data are not available to the public. One solution is to provide technical specifications to the requisite agency or agencies to execute the research. This paper outlines the process and pitfalls of drafting specifications for an implementing party who knows more about the data than you do. Drawing on experience from working with the U.S. Census Bureau and knowledge gained from related literatures, such as open-source coding, this paper recommends the depth of description, order of data manipulation and analysis, and requested output to make these collaborative projects successful. A federal administrative data project proposal template is offered. The paper also advises on information that federal agencies can supply to facilitate the use of these important data sources.


Author(s):  
Charles Hokayem ◽  
Trivellore Raghunathan ◽  
Jonathan Rothbaum

We test an improved imputation technique, sequential regression multivariate imputation (SRMI), for the Current Population Survey Annual Social and Economic Supplement to address match bias. Furthermore, we augment the model with administrative tax data to test for nonignorable nonresponse. Using data from 2009, 2011, and 2013, we find that the current hot deck imputation used by the Census Bureau produces different distribution statistics, downward for poverty and inequality and upward for median income, relative to the SRMI model-based estimates. Our results suggest that these differences are a result of match bias, not nonignorable nonresponse. Nearly all poverty, median income, and inequality estimates are not significantly different when comparing imputation models with and without administrative data. However, there are clear efficiency gains from using administrative data.
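For readers unfamiliar with hot deck imputation, the following is a minimal sketch of the general idea: a missing value is filled with the observed value of a randomly chosen donor from the same imputation cell. It is not the Census Bureau's production procedure, which is considerably more elaborate, and the record layout (`cell`, `income`) and function name are hypothetical. Because donors are matched only on the variables that define the cell, any characteristic left out of the cell definition is ignored when choosing a donor.

```python
import random


def hot_deck_impute(records, cell_key, target, seed=0):
    """Fill missing `target` values with a random donor value from the same cell.

    `records` is a list of dicts; a missing value is represented by None.
    Records in cells without any donor are left missing.
    The input list is not modified; a new list of dicts is returned.
    """
    rng = random.Random(seed)

    # Collect observed (donor) values for each imputation cell.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            donors.setdefault(rec[cell_key], []).append(rec[target])

    imputed = []
    for rec in records:
        new_rec = dict(rec)
        if new_rec[target] is None:
            pool = donors.get(new_rec[cell_key])
            if pool:
                new_rec[target] = rng.choice(pool)
        imputed.append(new_rec)
    return imputed


# Hypothetical toy data, for illustration only.
people = [
    {"cell": "A", "income": 10},
    {"cell": "A", "income": None},
    {"cell": "B", "income": 50},
    {"cell": "B", "income": None},
    {"cell": "C", "income": None},  # no donor available in cell C
]
completed = hot_deck_impute(people, "cell", "income")
```

A model-based approach such as SRMI instead draws each missing value from a regression model fitted to the other variables, so it can condition on more information than a cell definition allows.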


Author(s):  
Maxime Lavigne ◽  
Robert Goulden ◽  
Bettina Habib ◽  
Lawrence Joseph ◽  
Nadyne Girard ◽  
...  

Introduction: Data from electronic medical records (EMR) are now readily available and capture information needed in pharmacoepidemiological studies that is not usually found in administrative data, such as risk factors and biometrics. Yet EMR data are subject to measurement error due to primary non-adherence. Bayesian bias correction could provide corrected estimates from administrative data. Objectives and Approach: We present a method for correcting risk estimates from EMR data using linked data. In our example, we estimate the risk of cardiovascular events from oral hypoglycemics in patients with type 2 diabetes in Boston, Quebec, and the UK between 2009 and 2012. Using linked EMR and administrative data in Quebec, we compute the positive and negative predictive values (PPV and NPV) of prescription for dispensation for each class of oral hypoglycemics. The cardiovascular risk is then analysed using a Bayesian Weibull survival model adjusted for potential confounders. A similar model that accounts for exposure measurement error using the PPV and NPV is then fitted. Results: The Quebec and Boston cohorts were of comparable size, with 1197 and 2346 patients, whereas the UK cohort was much larger at 41370 patients. In the Quebec data, there were important differences in PPV and NPV by class of oral hypoglycemics, with PPVs of 0.81 for Biguanides, 0.65 for Sulphonylureas, and 0.50 for others. The pattern for NPV differed, with the same classes having values of 0.56, 0.97, and 0.99, respectively. Estimates from the naïve model were typical of similar analyses but, compared with the corrected estimates, were generally overprecise and biased towards the null. The adjusted estimates adequately represented the increased uncertainty, with hazard ratios for Sulphonylureas going from 1.72 (1.22, 2.41) to 3.19 (1.36, 5.93), and from 1.09 (0.86, 1.39) to 1.05 (0.45, 2.16) for no drugs. Conclusion/Implications: Bayesian adjustment for measurement error allowed us to use linked data to restore appropriate uncertainty and to correct the bias in our risk estimates. Our approach was limited by the observed low predictive value of prescribing, by reduced transportability of our PPV and NPV estimates, and by other sources of bias.
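To make the role of the predictive values concrete, the sketch below converts the PPV and NPV figures quoted in the abstract into the probability that a drug was actually dispensed, given what the EMR records about prescription. This is only the basic building block and not the authors' Bayesian Weibull model; the function names and the cohort counts in the usage lines are hypothetical.

```python
# (PPV, NPV) of prescription for dispensation by drug class,
# as quoted in the abstract for the Quebec data.
PREDICTIVE_VALUES = {
    "biguanides": (0.81, 0.56),
    "sulphonylureas": (0.65, 0.97),
    "others": (0.50, 0.99),
}


def prob_dispensed(prescribed, ppv, npv):
    """P(drug actually dispensed | prescription status recorded in the EMR).

    If a prescription is recorded, the probability is the PPV.
    If none is recorded, it is 1 - NPV.
    """
    return ppv if prescribed else 1.0 - npv


def expected_dispensed(n_prescribed, n_not_prescribed, ppv, npv):
    """Expected number of truly exposed patients in a group."""
    return n_prescribed * ppv + n_not_prescribed * (1.0 - npv)


# Hypothetical group of 100 prescribed and 100 non-prescribed patients.
ppv, npv = PREDICTIVE_VALUES["biguanides"]
print(expected_dispensed(100, 100, ppv, npv))
```

For Biguanides, a recorded prescription implies dispensation with probability 0.81, while the absence of a prescription still leaves a probability of 0.44, which shows how far recorded prescribing can diverge from true exposure. A measurement error model treats true exposure as unknown and weights it by these probabilities instead of taking the prescription record at face value.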


2020 ◽  
pp. 127-144
Author(s):  
T. Yu. Cherkashina

Income is one of the most obvious and frequently used indicators of economic status and living standards. Surveys of households and individuals are the main sources of income data for sociologists and economists, and they are increasingly supplemented by administrative data. Comparing data obtained from different sources, or from surveys using different methods, allows us to estimate biases and sources of error, and demonstrates the absence of “ideal” income data in general. The review of foreign studies on this problem is supplemented by an example of calculations on data from the Russia Longitudinal Monitoring Survey - Higher School of Economics (RLMS-HSE): we compare the compositional individual income, calculated as the sum of types of income, with the total personal income reported by respondents. The first measure of individual income has turned out to be more consistent and definite and less prone to measurement error, but it gives lower values of individual income. The differences between the total personal income reported by respondents and the compositional individual income are due not so much to inaccuracy in summation and rounding as to “conceptual” features of how some respondents understand personal income. Such comparisons are necessary in order to understand the limitations of various measurements of income and to make a grounded and reflexive choice of its specific indicators.

