Background: An important consideration for studies that derive utility scores using multi-attribute utility measures is the psychometric integrity of the measurement instrument. Of particular importance is the requirement to establish the empirical validity of multi-attribute utility measures; that is, whether they generate utility scores that, in practice, reflect people's preferences. We compared the empirical validity of EQ-5D versus SF-6D utility scores based on hypothetical preferences in a large, representative sample of the English population.

Methods: Adult participants in the 1996 Health Survey for England (n = 16443) formed the basis of the investigation. The subjects were asked to complete the EQ-5D and SF-36 measures. Their responses were converted into utility scores using the York A1 tariff set and the SF-6D utility algorithm, respectively. One-way analysis of variance was used to test the hypothetically constructed preference rule that each set of utility scores differs significantly by self-reported health status (categorised as very good, good, fair, bad or very bad). The degree to which EQ-5D and SF-6D utility scores reflect alternative configurations of self-reported health status, illness, disability or infirmity, and medication use was tested using the relative efficiency statistic and receiver operating characteristic (ROC) curves.

Results: The mean utility score for the EQ-5D was 0.845 (95% CI: 0.842, 0.849), whilst the mean utility score for the SF-6D was 0.799 (95% CI: 0.797, 0.802), representing a mean difference in utility score of 0.046 (95% CI: 0.044, 0.049; p < 0.001). Bland-Altman plots displayed considerable lack of agreement between the two measures, particularly at the lower end of the utility scale. Both measures demonstrated statistically significant differences between subjects who described their health status as very good, good, fair, bad or very bad (p < 0.001), as well as monotonically decreasing utility scores (test for linear trend: p < 0.001). The SF-6D was between 30.9 and 100.4% more efficient than the EQ-5D at detecting differences in self-reported health status, and between 10.4 and 45.6% more efficient at detecting differences in illness, disability or infirmity, and medication use. The area under the curve (AUC) scores generated by the ROC curves were significantly higher for the SF-6D at the 0.1% significance level when self-reported health status was dichotomised as very good versus good, fair, bad or very bad. However, the AUC scores did not reveal any significant differences in the discriminatory powers of the measures when alternative configurations of illness, disability or infirmity, and medication use were examined.

Conclusions: This study provides evidence that the SF-6D is an empirically valid and efficient alternative multi-attribute utility measure to the EQ-5D, and is capable of discriminating between external indicators of health status. However, health economists should also consider other psychometric properties, such as practicality and reliability, when selecting either measure for evaluative purposes. Copyright © 2005 John Wiley & Sons, Ltd.
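For readers unfamiliar with the two discrimination statistics named in the Methods, the following is a minimal illustrative sketch, not the authors' analysis code. It assumes one common definition of relative efficiency (the ratio of the one-way ANOVA F statistic of one measure to that of a reference measure, so a value of 1.309 would read as "30.9% more efficient") and computes the AUC as the probability that a randomly chosen subject from the healthier group has the higher utility score. The function names and all scores below are hypothetical and exist only to make the example runnable; the abstract does not state the exact formula used.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of utility scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)        # number of groups
    n = len(all_scores)    # total number of subjects
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_efficiency(groups_measure, groups_reference):
    """Assumed definition: ratio of F statistics (measure / reference)."""
    return f_statistic(groups_measure) / f_statistic(groups_reference)


def auc(positives, negatives):
    """Probability that a random 'positive' outscores a random 'negative'.

    Ties count as one half. This equals the area under the ROC curve.
    """
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical utility scores, NOT data from the study.
# First group: subjects reporting very good health; second group: all others.
measure_a = [[0.90, 0.85, 0.88], [0.60, 0.65, 0.62]]
measure_ref = [[0.95, 0.80, 0.90], [0.70, 0.55, 0.85]]

re = relative_efficiency(measure_a, measure_ref)
print(f"Relative efficiency of A versus reference: {re:.2f}")
print(f"AUC, measure A:   {auc(*measure_a):.3f}")
print(f"AUC, reference:   {auc(*measure_ref):.3f}")
```

A relative efficiency above 1 indicates that the first measure separates the groups more sharply than the reference, while an AUC of 0.5 indicates no discrimination and 1.0 perfect discrimination. Testing whether two AUCs differ significantly, as the study reports, requires an additional procedure that is not shown here.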

Journal article

Health Economics

Pages

1169 - 1189