Electronic healthcare records are increasingly used for epidemiological research, in particular for cohort studies of the risk of disease or death over time. Data within electronic healthcare records are recorded by the healthcare professionals and administrative staff responsible for patient care, with minimal (if any) data quality checks. Recording of chronic disease diagnoses (in terms of presence/absence of disease) has been shown to be accurate, and this is likely driven by incentive schemes within the NHS for chronic disease management, such as the Quality and Outcomes Framework. However, the specific dates of chronic disease diagnoses are known to be recorded less accurately. Recorded diagnosis dates are typically later than the “true” diagnosis dates, or are arbitrarily recorded as the 1st of the month, or as 1st January, where the true day and/or month of diagnosis is unknown. The extent to which these types of errors affect data recorded across different clinical systems or for different diseases is currently unknown.

These types of errors have implications for how dates of diagnosis are defined using routine data. They may also introduce biases into analyses of time-to-event outcomes, either where the time to event is measured from the point of diagnosis (for example, survival following a diagnosis of heart failure) or where the time of diagnosis is itself of interest (for example, time to diagnosis of cardiovascular disease after age 50). The impact of these errors can be reduced by utilising data from multiple sources to define dates of diagnosis more accurately or, more recently, by using statistical methods to account for errors in survival time measurement. However, the best approach for dealing with these errors in different clinical scenarios is unclear.
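As a rough illustration of the date-recording pattern described above (dates defaulting to the 1st of a month or to 1st January), the following minimal Python sketch shows one way such records might be flagged. The function names and the example dates are hypothetical and chosen only for illustration; they are not part of the project or of any particular dataset, and a flagged date is not necessarily an erroneous one.

```python
from collections import Counter
from datetime import date

CATEGORIES = ("1 January", "1st of month", "other")

def heaping_flag(d: date) -> str:
    """Classify a recorded date by whether it falls on a common default day."""
    if d.month == 1 and d.day == 1:
        return "1 January"
    if d.day == 1:
        return "1st of month"  # 1st of any month other than January
    return "other"

def heaping_summary(dates):
    """Return the proportion of recorded dates in each category."""
    n = len(dates)
    if n == 0:
        return {k: 0.0 for k in CATEGORIES}
    counts = Counter(heaping_flag(d) for d in dates)
    return {k: counts.get(k, 0) / n for k in CATEGORIES}

# Hypothetical recorded diagnosis dates (assumption: not real patient data).
example_dates = [
    date(2010, 1, 1),
    date(2011, 6, 1),
    date(2012, 3, 17),
    date(2013, 1, 1),
]
summary = heaping_summary(example_dates)
```

If diagnoses were spread evenly across the calendar year, roughly 1 in 365 dates would fall on 1st January and about 11 in 365 on the 1st of another month, so proportions well above these may suggest that default dates have been entered.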


This DPhil project will generate evidence for data-driven and statistical methods for handling errors in dates of diagnoses in time-to-event analyses of electronic healthcare records data. The specific chronic conditions to be studied are flexible, but should fit within the expertise of the proposed supervisors (cardiovascular disease, diabetes or cancer). Specific projects are likely to include: 

  • Literature review to identify studies of the accuracy of dates of diagnoses in electronic healthcare records.
  • Literature review to identify statistical methods for handling measurement error in survival times.
  • Evaluation of primary care electronic health records combined with practitioner survey data to examine how the accuracy of diagnosis dates varies according to, for example, clinical system, presence of incentive schemes and disease presentation.
  • Time-to-event analyses in simulated and electronic healthcare record data to compare statistical and data-driven methods for dealing with errors.

This project will rely on access to records from primary care (e.g. CPRD), an enhanced understanding of the processes that generate these records, and familiarity with time-to-event analyses. Given this, the research student is likely to be embedded in the work carried out by the CPRD and Statistics groups.