Predicting 10-year breast cancer mortality risk in the general female population in England: a model development and validation study
Clift AK., Collins GS., Lord S., Petrou S., Dodwell D., Brady M., Hippisley-Cox J.
BACKGROUND: Identifying female individuals at highest risk of developing life-threatening breast cancers could inform novel stratified early detection and prevention strategies to reduce breast cancer mortality, rather than only considering cancer incidence. We aimed to develop a prognostic model that accurately predicts the 10-year risk of breast cancer mortality in female individuals without breast cancer at baseline. METHODS: In this model development and validation study, we used an open cohort study from the QResearch primary care database, which was linked to secondary care and national cancer and mortality registers in England, UK. The data extracted were from female individuals aged 20-90 years without previous breast cancer or ductal carcinoma in situ who entered the cohort between Jan 1, 2000, and Dec 31, 2020. The primary outcome was breast cancer-related death, which was assessed in the full dataset. Cox proportional hazards, competing risks regression, XGBoost, and neural network modelling approaches were used to predict the risk of breast cancer death within 10 years using routinely collected health-care data. Death due to causes other than breast cancer was the competing risk. Internal-external validation was used to evaluate prognostic model performance (using Harrell's C, calibration slope, and calibration in the large), performance heterogeneity, and transportability. Internal-external validation involved dataset partitioning by time period and geographical region. Decision curve analysis was used to assess clinical utility. FINDINGS: We identified data for 11 626 969 female individuals, with 70 095 574 person-years of follow-up. There were 142 712 (1·2%) diagnoses of breast cancer, 24 043 (0·2%) breast cancer-related deaths, and 696 106 (6·0%) deaths from other causes. Meta-analysis pooled estimates of Harrell's C were highest for the competing risks model (0·932, 95% CI 0·917-0·946). The competing risks model was well calibrated overall (slope 1·011, 95% CI 0·978-1·044), and across different ethnic groups. Decision curve analysis suggested favourable clinical utility across all age groups. The XGBoost and neural network models had variable performance across age and ethnic groups. INTERPRETATION: A model that predicts the combined risk of developing and then dying from breast cancer at the population level could inform stratified screening or chemoprevention strategies. Further evaluation of the competing risks model should comprise effect and health economic assessment of model-informed strategies. FUNDING: Cancer Research UK.