Big data in healthcare: problems and potential
28 April 2016
The term “big data” has become familiar in recent years. In our digital age, vast amounts of information are being generated and stored by companies and governments alike. The simple act of buying a pint of milk can now be documented, tracked and forecast. For better or worse, the healthcare sector and our own NHS have not escaped this trend. So what does “big data” look like when it comes to healthcare, what are the problems for researchers and what are the potential benefits to patients?
What data is available?
I spend a large proportion of my time working with data about consultations in general practice, hospital admissions and causes of death; from diagnoses to treatments, referrals and tests. Given the troubles and subsequent back-tracking surrounding the government’s “care.data” plan back in 2014, some people may be surprised to learn that any data of this nature is available to researchers like me. The reality is that the data I receive looks very different to the information a doctor or nurse might see when they are with their patients.
Firstly, any information that could be used to identify individual patients is removed, including names, addresses, NHS numbers and full dates of birth. So-called “free-text” information from GP notes and letters is also removed. This means that Joe Bloggs, aged 52, from Southampton and John Smith, aged 52, from Portsmouth cannot be told apart.
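For readers who like to see this in code, here is a minimal sketch of the idea in Python. It is not the process any data provider actually uses: the field names, the made-up patient records and the choice to keep only the year of birth are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical field names; a real extract will be structured differently.
DIRECT_IDENTIFIERS = {"name", "address", "nhs_number", "free_text"}


def deidentify(record):
    """Return a copy of the record without direct identifiers.

    Assumption: the full date of birth is reduced to a year of birth.
    """
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "date_of_birth"
    }
    dob = record.get("date_of_birth")
    if dob is not None:
        cleaned["birth_year"] = dob.year
    return cleaned


# Two made-up patients who differ only in their identifying details.
joe = {
    "name": "Joe Bloggs",
    "address": "Southampton",
    "nhs_number": "0000000000",  # placeholder, not a real NHS number
    "date_of_birth": date(1964, 3, 2),
    "free_text": "Patient mentioned a recent house move.",
    "diagnosis": "type 2 diabetes",
}
john = {
    "name": "John Smith",
    "address": "Portsmouth",
    "nhs_number": "1111111111",  # placeholder, not a real NHS number
    "date_of_birth": date(1964, 1, 15),
    "free_text": "Discussed diet and exercise.",
    "diagnosis": "type 2 diabetes",
}

# Once the identifiers are gone, the two records look the same.
print(deidentify(joe) == deidentify(john))
```

In this toy example the two cleaned records are identical, which is the point: what reaches the researcher describes the care given, not the person who received it.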
There are also significant controls on who can access the data and what data they can access. Researchers must be trained in using the data and agree to store it securely. They must also decide what specific questions they want to answer, explain how they are going to answer them and be clear about what data is needed. If I want to study 10,000 patients with diabetes, I’m not going to receive information about 100,000 people with heart failure. Data is also only available from a few hundred GP practices compared to the nearly 9000 practices across the UK.
As the name suggests, the first problem with “big data” is the size. Even simple consultations, such as a contraceptive pill check, can generate lots of information. If a woman has her blood pressure and weight measured, is asked about her smoking and drinking habits and is then given a prescription, that’s five pieces of information already. Before you know it, you’re dealing with hundreds of thousands of pieces of information and hundreds of gigabytes of data.
The second problem is that of data quality and there are many issues here. Firstly, some people are more likely to see their GP than others, meaning it’s very difficult to find data on young, healthy men, for example. Secondly, GPs aren’t always very good at recording information in a way that is useful to researchers (they like those free text boxes that researchers don’t get to see). For example, some women with a record of pregnancy have no record of a subsequent birth or termination. Can we assume the original record of pregnancy is correct? If so, what do we assume was the outcome? There are plenty of opportunities for research regarding how to deal with problems like this.
In terms of treatment, research has shown that only 59% of patients who are treated for high blood pressure take their medication as prescribed. GP data may tell us that a prescription was given, but how can we tell who’s in that 59%? Linking GP and pharmacy data could help with this but we will never know how many patients collect their pills only to flush them down the toilet when nobody is watching.
Similarly, if the data shows that a patient was given the same prescription 11 times in one day, how much can we trust that information? Is it likely that there was a problem with the printer and a frustrated GP clicked “re-issue” 11 times instead of “re-print”? Researchers and doctors must work together to understand how what is recorded in practice relates to the data that people like me see every day.
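A simple first step is to flag records like this so that somebody can look at them, rather than silently trusting or deleting them. The Python sketch below assumes each prescription is stored as a (patient ID, drug, date) tuple and treats two or more identical entries on one day as suspicious. The record layout, the threshold and the example drugs are illustrative assumptions, not a recognised cleaning rule.

```python
from collections import Counter


def flag_same_day_repeats(prescriptions, threshold=2):
    """Count identical (patient, drug, date) entries and return those
    that appear at least `threshold` times."""
    counts = Counter(prescriptions)
    return {entry: n for entry, n in counts.items() if n >= threshold}


# Made-up data: one prescription recorded 11 times on the same day,
# plus an ordinary single prescription for another patient.
records = [("patient_1", "amlodipine 5mg", "2016-04-01")] * 11
records.append(("patient_2", "metformin 500mg", "2016-04-01"))

suspicious = flag_same_day_repeats(records)
for (patient, drug, day), n in suspicious.items():
    print(f"{patient}: {drug} recorded {n} times on {day}")
```

Flagging only tells us that something looks odd. Whether the patient really received 11 prescriptions or the printer jammed is a question that still needs a conversation with the people who recorded the data.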
For all its faults, healthcare data still presents huge opportunities. Firstly, it can be used to identify problems in our healthcare systems and inform government policy for improvements. Not a week goes by without a news story telling us how our hospitals and GP surgeries are full to bursting with sick and elderly patients. Without healthcare data, the studies behind these stories would not be possible. Once new policies have been introduced (such as GP payments for meeting certain targets), researchers can measure how effective they have been in improving patient care.
Secondly, there are many disease areas that are under-researched because conventional studies would be too complex or costly to set up. When diseases are rare, simply finding enough patients with the relevant disease can take several months, if not years, and require international collaboration. Studies using healthcare data can replace conventional studies, or the data can be used to help identify people with diseases of interest more quickly.
A further benefit over conventional studies is that we now have access to several years’ worth of data. Some side effects of treatment take time to develop and are not picked up by studies lasting a year or two. Although unexpected side effects are already reported to the authorities using the Yellow Card Scheme, healthcare data could help to identify potentially unsafe drugs more quickly.
Finally, even when conventional studies are possible, using healthcare data can be a cheaper and quicker alternative. This frees up time and money for other areas of research that might not be funded otherwise, allowing more to be done with the same precious resources.
It seems inevitable that “big data” in healthcare will only get bigger in the coming years. First and foremost, patients need to be reassured that appropriate safeguards exist to limit the risks of data leaks. If the government can get that right and researchers can overcome some of the challenges I’ve mentioned, the potential of “big data” in healthcare is huge. It would be a shame to waste it.