Disparity of imputed data from small area estimate approaches – A case study on diabetes prevalence at the county level in the U.S.

Lung Chang Chien*, Ge Lin, Xiao Li, Xingyou Zhang

*Corresponding author for this work

Research output: Contribution to journal › Journal Article › peer-review

Abstract

This paper assesses concordance and inconsistency among three small area estimation methods currently used to provide county-level health indicators in the United States. The three methods are multi-level logistic regression, spatial logistic regression, and spatial Poisson regression, all proposed since 2010. Diabetes prevalence is estimated for each county in the continental United States from the 2012 sample of the Behavioral Risk Factor Surveillance System. The mapping results show that all three methods display elevated diabetes prevalence in the South. The Pearson correlation coefficients among the three model-based estimates were all above 0.60, with the highest, 0.80, between the multi-level and spatial logistic methods. Although point estimates differ among the three small area estimation methods, the top and bottom quintiles of their distributions are fairly consistent based on Bangdiwala’s B-statistic, suggesting that output from each method would support consistent policy making in terms of identifying the highest- and lowest-ranked counties.

Original language: English
Article number: 8
Journal: Data Science Journal
Volume: 17
Publication status: Published - 1 Apr 2018
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2018, Ubiquity Press Ltd. All rights reserved.

Keywords

  • Diabetes prevalence
  • Multi-level logistic regression
  • Small area estimate
  • Spatial Poisson regression
  • Spatial logistic regression
