A recent BBC news article reported that the NHS was “confusing the public by using ‘gobbledygook’”. This is particularly true when it comes to the words used to describe patient data. Words like ‘secondary uses’, ‘pseudonymised’, ‘key-coded’, and ‘de-identified for limited disclosure’ are confusing and unnecessarily technical. They may sound like ‘gobbledygook’, acting as a significant barrier to honest discussions.

We think that an important part of improving conversations about patient data is getting the language right, using words that are accurate but also clear and meaningful. We commissioned the consultancy Good Business to find the best words and imagery for talking about the use of data in care, treatment and research, and to describe different levels of identifiability.

Good Business began with a creative workshop with people who are good with words – verbal branding specialists, speechwriters, journalists and linguists. The outputs from the workshop were checked with expert stakeholders, and then taken through rounds of quick-fire testing in focus groups with the public and healthcare professionals. This led to a set of preferred terms, which we have continued to test and refine with patients and information governance (IG) experts over the last few months. We are publishing the findings today.

The research findings

The results are illuminating. For example, although many healthcare professionals and policy makers routinely use the phrase ‘direct care’, we found that it was meaningless to the public. As one participant said, “it makes me think of health insurance or credit cards”. The phrase ‘individual care’ seems a more effective way to help people understand the concept of personally tailored care. By extension, ‘purposes beyond direct care’ clearly will not be meaningful. Instead, we found that the phrase “improving health, care and services, through research and planning”, while long, was well received and helped people to understand the full range of purposes covered by the vague phrase ‘secondary uses’. The phrase helped ensure that there were “no surprises” about the types of data use, as people recognised that use by academics or commercial organisations would fall under this umbrella.

Finding words to describe different levels of identifiability was, unsurprisingly, much more difficult. We found that using images is more powerful. A very simple concept – using a photo of someone, a pixelated picture and a group silhouette – seems to explain the concept of anonymisation clearly, in a way that words have not previously achieved.

A clear photo of a woman, a blurred photo of a woman, and a silhouette of a crowd of people

What does ‘anonymised’ mean?

Using these images as a starting point, we need to get clearer on the word ‘anonymised’, to ensure it is only used to describe data that has been anonymised in line with the ICO’s anonymisation code of practice. This is really important because one of the biggest concerns people have about the use of data is whether the information could be traced back to them personally. The best way to address that concern is to be clear that there are two different types of anonymised information: one individual-level, one grouped. It’s important to distinguish between them, because the risks of re-identification are different, and therefore the data has to be protected in different ways.

People are familiar with the idea of aggregate or grouped data, where data is presented as statistics or general trends so it would not be possible to identify an individual – it is effectively “anonymous”. But the concept of data that has been de-identified but is still at an individual level has never been well explained or understood. We therefore think it is helpful to use a new word here to help introduce this concept. Our testing suggests that the word ‘de-personalised’ helps convey the meaning effectively. The data has been through a process to remove personal identifiers – it is “de-personalised” – but it would still theoretically be possible to reverse that process and re-identify someone if you had access to enough additional information.

Perhaps understandably, not everyone likes the word ‘de-personalised’ – to some people, it sounded “not human” – but the important thing is that it seems to make sense. During the testing, an exercise to map different levels of identifiability along a spectrum suggested a surprisingly high level of understanding after very little discussion. The word ‘de-identified’ did not mean anything to people, and other alternatives, such as ‘non-identified’, ‘non-personal’ or ‘un-identifiable’, did not convey the idea of a process that could be reversed.

Next steps

We want these words and images to be used as widely as possible, and not just in the health sector. The accompanying resources that we are publishing today (under a Creative Commons licence) are aimed at those who are having conversations with the public about data.

We are continuing to develop other resources including an animation explaining the spectrum of identifiability, which will be aimed at the public directly.

We fully recognise that technical conversations between professionals will, and should, continue to use technical language. The Information Governance Alliance (IGA) are currently developing a glossary of technical terms, and we are working with them to ensure that our terms for the public map against the agreed technical definitions.

If we are going to change the way we talk about data, and explain technical concepts in a meaningful way while avoiding confusing jargon, we need to work together to use consistent words and images. We hope that, with your help, we can make that happen.