We are developing a data-driven risk prediction model for COVID-19, supported by NHS Digital. 

In the UK, government guidance on COVID-19 identifies individuals based on three broad categories of risk, with those who are ‘clinically extremely vulnerable’ to the disease previously being advised to shield themselves from the virus. 

This new model could be applied in a variety of health and care settings, including supporting GPs and specialists in consultations with their patients to provide more targeted advice based on individual levels of risk.

The project team

The project was commissioned by the Office of the Chief Medical Officer for England from NERVTAG (New and Emerging Respiratory Virus Threats Advisory Group), which established the parameters and brought together the team as a sub-group of NERVTAG.

This team is led by Professor Julia Hippisley-Cox and the Primary Care Epidemiology Group in the University of Oxford's Nuffield Department of Primary Care Health Sciences. 

The research is funded by the National Institute for Health Research Oxford Biomedical Research Centre, and the University of Oxford COVID-19 Rapid Response Fund with support from Wellcome and Cancer Research UK.

Partner institutions include:

University of Cambridge
University of Edinburgh
University of Swansea
University of Leicester
University of Nottingham
University of Liverpool
London School of Hygiene and Tropical Medicine
Queen’s University Belfast
Queen Mary, University of London
University College London
Department of Health and Social Care
NHS Digital
NHS England


Routinely collected, anonymised electronic health records of 8 million adults in the UK, accessed through the University of Oxford’s QResearch database and linked datasets, will be analysed to identify factors that can be used to predict those at highest risk of infection and serious illness from COVID-19. These include age, sex, ethnicity, deprivation, smoking status, body mass index, pre-existing medical conditions and current medications.

Algorithms derived from the data analysis will be developed in conjunction with clinical and data experts at NHS Digital and will drive a clinical risk prediction model that can be applied across various health and care settings.

The research team are planning to utilise other datasets from across all four nations of the UK to validate their model and offer a unified approach to evidence-based risk stratification policy.

Intended use

Individualised risk assessment could be used to improve shared decision-making between clinicians and patients based on more accurate information as well as discussions on how to reduce risk from COVID-19.

The model could also be used to inform mathematical modelling of the potential impact of national public health policies on shielding and preventing infection, and could help identify those at highest risk to be vaccinated when a vaccine becomes available.


This study is using QResearch, a large database of over 35 million people that has collected data from a general practice computer system called EMIS since 1989 ( 

Where available, the GP data held in QResearch for individuals in the database has been linked to data in the Public Health England COVID-19 test registry, hospital admissions data and the Office for National Statistics death registry. The research involves looking at data on age, sex, ethnicity, medical conditions, prescribed medications, lifestyle factors such as smoking and alcohol use, and social circumstances such as deprivation.


All data is anonymised (meaning that individuals cannot be identified) and is handled and analysed on secure servers that only some of the researchers can access. Data privacy and ethics are taken extremely seriously by all researchers, and best practice has been and will be complied with throughout the project. The QResearch database itself has a long-standing ethical approval and individual patients can choose to opt out of it.

This research is reviewing a very wide range of factors, including dozens of demographic, medical, prescription and lifestyle ‘data points’. The medical conditions considered are diverse in terms of the parts of the body they affect, how rare they are, their ‘type’ (cancer, degenerative, inflammatory, genetic) and the age groups they affect. These have been identified by a team of medical, epidemiological, statistical and public health experts, based on evidence that has emerged during the pandemic and knowledge from other conditions that may be helpful.   

The voices of patients, advocacy groups, charities and other expert groups are incredibly important, as are their lived experiences. Due to time constraints and the confidential nature of the project until now, we have not yet been able to consult with a wide range of patient groups or charities. As such, we are starting a consultation process with a range of patient groups and charities following the publication of the study protocol.  

We are also liaising with clinical experts and other leaders across the spectrum of conditions and social groups under consideration, to learn how to be inclusive, scientifically robust and fair when deciding which factors to include in the research. This will help to build a tool that is meaningful, deployable and relevant to the population. QResearch has had patient representation on its advisory group for many years.

Starting the development of a risk tool/calculator is an important first step.  Deciding how the finished tool will be used across the health and care system will involve discussions with experts in clinical practice, public health and policy as well as patient groups, charities and user group research.

One of the areas we are exploring is whether we can turn it into a clinical tool to support conversations between patients and doctors/nurses, to provide more targeted advice to the clinically vulnerable, based on levels of individual risk.   

At the start of the pandemic, very little was known about COVID-19 and the virus that causes the disease. As the pandemic has progressed, more data and understanding have emerged to guide decision making.

This study started after a period in which enough clinical data had been collected for a more robust and reliable analysis. Although the UK may be past the initial peak of the pandemic, as we move to ease lockdown measures and restore health services and economic activity, our understanding of the virus means that doctors, nurses and policy makers can be better guided by the evidence collected so far to keep people safe now and in the future.


Since emerging data suggests that ethnicity, age and several health conditions are associated with the risk of severe COVID-19, many of these factors, including ethnicity, are included in this research.

This study is focused on developing a statistical model that takes multiple factors into account to estimate any given individual's risk of catching the virus that causes COVID-19 and then dying from it, or of catching the virus and being admitted to hospital. As such, it is not looking to identify the causes of severe COVID-19. Instead, it is using multiple forms of information to provide a meaningful basis for clinicians and patients to understand and potentially act on their level of risk.
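For readers who want a concrete picture of what "taking multiple factors into account" can mean, the short Python sketch below combines a few factors into a single estimated probability using a logistic function. It is an illustration only and makes no claim about the method used in this study: the function name, the factor names, the weights and the intercept are all hypothetical placeholders, not results from the research.

```python
import math
from typing import Mapping

# Hypothetical placeholder weights (log-odds per unit of each factor).
# These numbers are invented for illustration and are NOT study estimates.
ILLUSTRATIVE_WEIGHTS = {
    "age_decades_over_50": 0.6,
    "bmi_units_over_25": 0.05,
    "has_condition": 0.4,
}
ILLUSTRATIVE_INTERCEPT = -9.0  # hypothetical baseline log-odds


def illustrative_risk(factors: Mapping[str, float]) -> float:
    """Combine several factors into one probability between 0 and 1."""
    unknown = set(factors) - set(ILLUSTRATIVE_WEIGHTS)
    if unknown:
        raise KeyError(f"unknown factors: {sorted(unknown)}")
    # Each factor adds its weighted contribution to a single score.
    score = ILLUSTRATIVE_INTERCEPT + sum(
        ILLUSTRATIVE_WEIGHTS[name] * value for name, value in factors.items()
    )
    # The logistic function maps the score onto a probability.
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    baseline = illustrative_risk({})
    combined = illustrative_risk(
        {"age_decades_over_50": 2, "bmi_units_over_25": 5, "has_condition": 1}
    )
    print(f"baseline: {baseline:.6f}, combined: {combined:.6f}")
```

The point of the sketch is that such a model describes association rather than cause: each factor shifts the estimated risk, but the model says nothing about why.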


Our team