Risk models for recurrence and survival after kidney cancer: a systematic review
Usher-Smith JA., Li L., Roberts L., Harrison H., Rossi SH., Sharp SJ., Coupland C., Hippisley-Cox J., Griffin SJ., Klatte T., Stewart GD.
Objective: To systematically identify and compare the performance of prognostic models providing estimates of survival or recurrence of localized renal cell cancer (RCC) in patients treated with surgery with curative intent.

Materials and Methods: We performed a systematic review (PROSPERO CRD42019162349). We searched Medline, EMBASE and the Cochrane Library from 1 January 2000 to 12 December 2019 to identify studies reporting the performance of one or more prognostic model(s) that predict recurrence-free survival (RFS), cancer-specific survival (CSS) or overall survival (OS) in patients who have undergone surgical resection for localized RCC. For each outcome we summarized the discrimination of each model using the C-statistic and performed multivariate random-effects meta-analysis of the logit-transformed C-statistics to rank the models.

Results: Of a total of 13 549 articles, 57 included data on the performance of 22 models in external populations. C-statistics ranged from 0.59 to 0.90. Several risk models were assessed in two or more external populations and had similarly high discriminative performance. For RFS, these were the Sorbellini, Karakiewicz, Leibovich and Kattan models, and the UCLA Integrated Staging System model also performed similarly in European/US populations. All had C-statistics ≥0.75 in at least half of the validations. For CSS, the models with the highest discriminative performance in two or more external validation studies were the Zisman, Stage, Size, Grade and Necrosis (SSIGN), Karakiewicz, Leibovich and Sorbellini models (C-statistic ≥0.80 in at least half of the validations); for OS they were the Leibovich, Karakiewicz, Sorbellini and SSIGN models. For all outcomes, the models based on clinical features at presentation alone (Cindolo and Yaycioglu) had consistently lower discrimination. Calibration was reported only infrequently; where it was, most models underestimated survival.
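To make the pooling step more concrete, the sketch below shows how C-statistics from several external validations of a single model can be combined on the logit scale. It is a simplified illustration, not the review's method: the review used a multivariate random-effects model to rank models jointly, whereas this sketch uses the simpler univariate DerSimonian-Laird estimator for one model at a time. The function names are hypothetical, and it assumes each validation reports a standard error for its C-statistic, which is converted to the logit scale with the delta method.

```python
import math


def logit(p):
    """Log-odds transform; maps a C-statistic in (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit; maps a value on the real line back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_c_statistics(c_stats, std_errors):
    """Univariate DerSimonian-Laird random-effects pooling of logit C-statistics.

    Hypothetical helper for illustration.
    c_stats: C-statistics from independent external validations of one model.
    std_errors: their standard errors on the original (0-1) scale.
    Returns (pooled C-statistic, between-study variance tau^2 on the logit scale).
    """
    if len(c_stats) != len(std_errors) or len(c_stats) < 2:
        raise ValueError("need at least two validations with matching SEs")
    # Delta method (assumption): SE(logit C) ~= SE(C) / (C * (1 - C)).
    y = [logit(c) for c in c_stats]
    v = [(se / (c * (1.0 - c))) ** 2 for c, se in zip(c_stats, std_errors)]
    # Fixed-effect (inverse-variance) weights, pooled mean and Cochran's Q.
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    # DerSimonian-Laird moment estimator of between-study variance, floored at 0.
    k = len(y)
    denom = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / denom)
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    # Back-transform the pooled logit to the C-statistic scale.
    return inv_logit(y_re), tau2


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not data from this review.
    pooled, tau2 = pool_c_statistics([0.78, 0.82, 0.75], [0.02, 0.03, 0.025])
    print(f"pooled C = {pooled:.3f}, tau^2 = {tau2:.4f}")
```

Working on the logit scale keeps the pooled estimate and any confidence limits inside (0, 1) after back-transformation, which is the usual motivation for transforming C-statistics before meta-analysis.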
Conclusion: Several models had good discriminative ability, but no single ‘best’ model emerged. The choice among these models in each setting should be informed by both their comparative performance and the availability of the factors they include. All would need recalibration if used to provide absolute survival estimates.