OpenSAFELY: A platform for analysing electronic health records designed for reproducible research
Nab L., Schaffer AL., Hulme W., DeVito NJ., Dillingham I., Wiedemann M., Andrews CD., Curtis H., Fisher L., Green A., Massey J., Walters CE., Higgins R., Cunningham C., Morley J., Mehrkar A., Hart L., Davy S., Evans D., Hickman G., Inglesby P., Morton CE., Smith RM., Ward T., O'Dwyer T., Maude S., Bridges L., Butler-Cole BFC., Stables CL., Stokes P., Bates C., Cockburn J., Hester F., Parry J., Bhaskaran K., Schultze A., Rentsch CT., Mathur R., Tomlinson LA., Williamson EJ., Smeeth L., Walker A., Bacon S., MacKenna B., Goldacre B.
Electronic health records (EHRs) and other administrative health data are increasingly used in research to generate evidence on the effectiveness, safety, and utilisation of medical products and services, and to inform public health guidance and policy. Reproducibility is fundamental to research credibility and promotes trust in evidence generated from EHRs. At present, however, ensuring that research using EHRs is reproducible can be challenging for researchers. Research software platforms can provide technical solutions that enhance the reproducibility of research conducted using EHRs. In response to the COVID-19 pandemic, we developed OpenSAFELY, a secure, transparent, open-source analytic software platform designed with reproducible research in mind. OpenSAFELY mitigates common barriers to reproducible research by: standardising key workflows around data preparation; removing barriers to code-sharing in secure analysis environments; enforcing public sharing of programming code and codelists; ensuring the same computational environment is used everywhere; integrating new and existing tools that encourage and enable reproducible working practices; and providing an audit trail of all code run against the real data to increase transparency. This paper describes OpenSAFELY's reproducibility-by-design approach in detail.