Hospital Ratings Are Deeply Flawed. Can They Be Fixed?
The most influential rating system rests on some faulty calculations, affecting millions of people and billions of dollars.
Bayesian models are increasingly fit to large administrative datasets and then used to make individualized recommendations. In particular, Medicare’s Hospital Compare webpage gives patients hospital-specific mortality rates for heart attack, or acute myocardial infarction (AMI). Hospital Compare’s current recommendations are based on a random-effects logit model with a random hospital indicator and patient risk factors. Except at the largest hospitals, these individual recommendations or predictions cannot be checked against data, because data from smaller hospitals are too limited to provide a meaningful check. Before individualized Bayesian recommendations, people derived general advice from empirical studies of many hospitals: for example, prefer hospitals of Type 1 to Type 2 because the risk is lower at Type 1 hospitals. Here, we calibrate these Bayesian recommendation systems by checking, out of sample, whether their predictions aggregate to give correct general advice derived from another sample. This process of calibrating individualized predictions against general empirical advice leads to substantial revisions in the Hospital Compare model for AMI mortality. To make appropriately calibrated predictions, our revised models incorporate information about hospital volume, nursing staff, medical residents, and the hospital’s ability to perform cardiovascular procedures. For the ultimate purpose of comparisons, hospital mortality rates must be standardized to adjust for variation in patient mix across hospitals. We find that indirect standardization, as currently used by Hospital Compare, fails to adequately control for differences in patient risk factors and systematically underestimates mortality rates at low-volume hospitals. To provide good control and correctly calibrated rates, we propose direct standardization instead.
Supplementary materials for this article are available online.
Published in: Journal of the American Statistical Association
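The abstract's contrast between indirect and direct standardization can be illustrated with a small sketch. This is the classical textbook, stratum-based version of the two methods, not the model-based procedure used by Hospital Compare or by the paper. The two hospitals, the two risk strata, and every number below are invented for illustration, and the function names are hypothetical.

```python
# Toy comparison of indirect vs. direct standardization.
# All figures are invented; this is NOT the Hospital Compare model.

def indirect_rate(deaths, patients, ref_rates, ref_overall):
    """Observed/expected ratio times the reference overall rate.

    Expected deaths apply the reference stratum rates to the
    hospital's OWN patient mix.
    """
    observed = sum(deaths)
    expected = sum(n * r for n, r in zip(patients, ref_rates))
    return observed / expected * ref_overall

def direct_rate(deaths, patients, ref_mix):
    """Hospital's own stratum rates applied to a COMMON reference mix."""
    rates = [d / n for d, n in zip(deaths, patients)]
    return sum(w * r for w, r in zip(ref_mix, rates))

# Strata: [low-risk patients, high-risk patients]
ref_rates = [0.05, 0.20]   # assumed reference mortality per stratum
ref_mix = [0.5, 0.5]       # assumed reference patient mix
ref_overall = sum(w * r for w, r in zip(ref_mix, ref_rates))  # 0.125

# Both hospitals have the SAME stratum-specific mortality (10% and 20%),
# but hospital A treats mostly low-risk and B mostly high-risk patients.
patients_a, deaths_a = [800, 200], [80, 40]
patients_b, deaths_b = [200, 800], [20, 160]

indirect_a = indirect_rate(deaths_a, patients_a, ref_rates, ref_overall)
indirect_b = indirect_rate(deaths_b, patients_b, ref_rates, ref_overall)
direct_a = direct_rate(deaths_a, patients_a, ref_mix)
direct_b = direct_rate(deaths_b, patients_b, ref_mix)

print(f"indirect: A={indirect_a:.4f}  B={indirect_b:.4f}")
print(f"direct:   A={direct_a:.4f}  B={direct_b:.4f}")
```

In this toy setup the two hospitals perform identically within each risk stratum, so the direct rates agree, while the indirect rates differ only because the patient mixes differ. That is one simple way to see why indirectly standardized rates may not be comparable across hospitals with different patient populations.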