External validation of multivariable prediction models: A systematic review of methodological conduct and reporting
Background: Before considering whether to use a multivariable (diagnostic or prognostic) prediction model, it is essential that its performance be evaluated in data that were not used to develop the model (referred to as external validation). We critically appraised the methodological conduct and reporting of external validation studies of multivariable prediction models. Methods. We conducted a systematic review of articles describing some form of external validation of one or more multivariable prediction models indexed in PubMed core clinical journals published in 2010. Study data were extracted in duplicate on design, sample size, handling of missing data, reference to the original study developing the prediction models and predictive performance measures. Results: 11,826 articles were identified and 78 were included for full review, which described the evaluation of 120 prediction models. in participant data that were not used to develop the model. Thirty-three articles described both the development of a prediction model and an evaluation of its performance on a separate dataset, and 45 articles described only the evaluation of an existing published prediction model on another dataset. Fifty-seven percent of the prediction models were presented and evaluated as simplified scoring systems. Sixteen percent of articles failed to report the number of outcome events in the validation datasets. Fifty-four percent of studies made no explicit mention of missing data. Sixty-seven percent did not report evaluating model calibration whilst most studies evaluated model discrimination. It was often unclear whether the reported performance measures were for the full regression model or for the simplified models. Conclusions: The vast majority of studies describing some form of external validation of a multivariable prediction model were poorly reported with key details frequently not presented. The validation studies were characterised by poor design, inappropriate handling and acknowledgement of missing data and one of the most key performance measures of prediction models i.e. calibration often omitted from the publication. It may therefore not be surprising that an overwhelming majority of developed prediction models are not used in practice, when there is a dearth of well-conducted and clearly reported (external validation) studies describing their performance on independent participant data. © 2014 Collins et al.; licensee BioMed Central Ltd.