What are Secure Data Environments?
A Secure Data Environment is a new method of storing and accessing health and care data, and is sometimes known as a Trusted Research Environment, Data Safe Haven, Databank, Research Library, etc.
These are spaces where approved users can access health and care data that is not identifiable, for use in research that can save and improve lives.
Getting access is subject to strict approval processes, and only approved users with a legitimate need can access the data. Those who could access the data include NHS providers, local authority planners, researchers (both public and private sector), and charities.
During our “What words to use when talking about Secure Data Environments” project, we talked to members of the public across the UK to co-design an animation that explains Secure Data Environments.
There isn’t yet a standard definition of what a Secure Data Environment is when it comes to health data. This is likely because they have developed in an organic way. The UK Statistics Authority has set out standards for data environments that use other data, and it’s possible a similar approach might be taken for health, particularly in England.
Why Secure Data Environments?
Patient data has been used for research and analysis for a long time. The way data has been shared between the organisation that is responsible for the data (e.g. a hospital) and the organisation wanting to use it (e.g. a university) has varied, but it has often involved creating a copy of the data and giving it to the user. The copy might have been sent by secure email and downloaded, or put onto an encrypted device like a USB stick or CD. In other cases, the user needed to come into the hospital and access the data from one of its computers. The rules around using the data were put into a contract, and the copy of the data would need to be destroyed when the work or contract ended. More information on how data is kept safe can be found here.
Secure Data Environments offer a different way of allowing approved users to access health and care data. Data is taken from the ‘source’ (where it originally comes from) and is all put into one place. The users then come to the data, rather than the data going to them. The environment is controlled, so its uses can be checked and audited – e.g., who did what, when, and whether it was in line with the agreement. Access can be strictly limited to the data that is needed, and then removed when the work is done. There still needs to be a contract or agreement, but the user’s organisation doesn’t need to receive the data.
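For readers who find code easier to follow, the idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real Secure Data Environment is built: the names `Agreement`, `SecureEnvironment` and `query`, the column names, and the dates are all made up for this example. It shows the three ideas from the paragraph: access limited to the data that is needed, access that ends on an agreed date, and a log of who asked for what and when.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Agreement:
    """Hypothetical record of what one approved user may see, and until when."""
    user: str
    allowed_columns: set
    expires: date


@dataclass
class SecureEnvironment:
    """Toy environment: the data stays in one place and every request is logged."""
    records: list
    agreements: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def query(self, user, columns, today):
        agreement = self.agreements.get(user)
        allowed = (
            agreement is not None
            and today <= agreement.expires
            and set(columns) <= agreement.allowed_columns
        )
        # Record who asked for what, when, and whether it was permitted.
        self.audit_log.append((today, user, ",".join(columns), allowed))
        if not allowed:
            raise PermissionError(f"{user} is not approved for this request")
        # Hand back only the requested columns, never the full records.
        return [{c: row[c] for c in columns} for row in self.records]


# Made-up data and a made-up agreement, for illustration only.
env = SecureEnvironment(
    records=[{"age_band": "40-49", "diagnosis": "asthma", "postcode_area": "CF"}]
)
env.agreements["researcher_a"] = Agreement(
    "researcher_a", {"age_band", "diagnosis"}, expires=date(2025, 3, 31)
)

rows = env.query("researcher_a", ["age_band", "diagnosis"], today=date(2025, 1, 10))

try:
    # The same request after the agreement has ended is refused, but still logged.
    env.query("researcher_a", ["age_band", "diagnosis"], today=date(2025, 4, 1))
except PermissionError:
    pass
```

In this sketch the audit log ends up with two entries, one permitted and one refused, which is the kind of record that lets an environment's operator check afterwards whether use was in line with the agreement.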
These environments have become more popular in the last few years. This is likely because there have been more requests to use health data on a large scale, particularly during and after Covid-19, which needs to be done safely and securely.
Benefits and drawbacks of Secure Data Environments
Overall, Understanding Patient Data (UPD) thinks that Secure Data Environments are a positive step forwards in the safe use of health data, and they have many benefits. However, they aren’t a silver bullet, and there are some drawbacks.
Benefits:
- Limit the number of places where sensitive data goes, reducing the risk of sharing the data with others or keeping it for longer than necessary or permissible
- Better ability to check and audit what users have done to prevent and detect misuse
- There are usually stronger cyber security controls in place compared to more traditional ways of sharing data
- Access to data can be restricted to what is necessary for a particular project, for a specified amount of time
- Allow for better transparency – e.g. more public information about what data has been used, by whom, why and when. This is useful for members of the public who want to know what’s happening with their health data, as well as data users who want to know what other users are doing and whether there are opportunities to work together, learn from each other, etc
- Datasets can be created and reused, rather than being created once for a particular project and then thrown away. This saves time and effort, and people can share their knowledge on how to improve them
Drawbacks:
- They still don't completely mitigate privacy or data misuse risks, particularly when the environments are accessed remotely (e.g. from home), because there are fewer safeguards
- Can be more expensive to run for the data-holding organisations, and more expensive to access for the users if these costs are passed to them, particularly for smaller or not-for-profit teams
- They might not have all the data a user needs for their work, so some users might need to use multiple environments, move data into an SDE, or have data shared with them separately
- The data might need to be analysed using methods that are new to some users, which might make their work more difficult or restrictive
- Because many SDEs have developed without an agreed set of rules, they aren’t all built to the same standard. For users, this means they don’t all work in the same way, with the same tools. For members of the public, this can make it confusing to know how different SDEs work.
Examples of Secure Data Environments
It can be hard to find public-facing information on all these environments in one place. The closest thing might be Health Data Research UK’s gateway, which is mainly aimed at researchers.
During our “What words to use when talking about Secure Data Environments” project, our public participants wanted to know about examples of Secure Data Environments, so we’ve pulled together some information on some of the most well-known across the UK. This information was collected in June and July 2024, and is a mixture of content taken from websites and from talking to people from the organisations responsible for them.
The document below contains the detailed information that was collected at the time. This now has its own page, here, where it will be added to.
Secure Anonymised Information Linkage (SAIL) Databank
Organisation responsible: Swansea University
Short description of the data it holds: SAIL Databank contains anonymised (at the point of access to the researcher) health and administrative data for the population of Wales, some dating back roughly 30 years.
List of outputs: Here
List of approved projects: Here
Who funds it: the Welsh Government’s Health and Care Research Wales and UK Research and Innovation’s Economic and Social Research Council (ESRC)
Further information: SAIL Databank website
National Safe Haven (for Scotland)
Organisation responsible: Public Health Scotland
Short description of the data it holds: Approved data extracted from national health, social care and administrative datasets (in addition to researcher-provided cohorts – i.e. groups of people who have taken part in specific research studies). Data will include, for example, pseudonymised inpatient, outpatient, cancer, census and many other datasets. A full list is here
List of outputs: eDRIS (the electronic Data Research and Innovation Service) has little information on publications, and whilst researchers are encouraged to reference the National Safe Haven in their work, many don’t
List of approved projects: There are records of all projects approved by the Health and Social Care Public Benefit and Privacy Panel here, and the Statistics Public Benefit and Privacy Panel (if using administrative data) here, but this does not specify which projects use the National Safe Haven
Who funds it: Scottish Government (other funders unknown)
Further information: Public Health Scotland's website
Northern Ireland Honest Broker Service
Organisation responsible: Health and Social Care Business Services Organisation (BSO) Northern Ireland
Short description of the data it holds: De-identified health and social care data for the population of Northern Ireland. Data includes:
- Inpatient and outpatient (emergency department, maternity services, labs)
- The integrated health and social care system
- Dental and ophthalmic (eye) treatments
- Mortality (death)
List of outputs: Here
List of approved projects: Here
Who funds it: The Honest Broker Service is funded via the Northern Ireland Trusted Research Environment (NITRE), part of Health and Social Care Data Institute (HSCDI).
Further information: HSC BSO website
UK Biobank Research Analysis Platform
Organisation responsible: UK Biobank
Short description of the data it holds: UK Biobank contains the de-identified genetic, lifestyle, imaging and health information and biological samples of 500,000 UK participants. The participants were recruited between 2006 and 2010, aged between 40 and 69 years, from England, Wales and Scotland.
List of outputs: Researchers using UK Biobank data must put their findings in the public domain. By the end of 2023, there were over 10,000 publications based on UK Biobank data. Some publications are highlighted on the website, and all publications using UK Biobank data can be found by searching databases such as PubMed.
List of approved projects: Here
Who funds it: Wellcome
Further information:
- About the data: https://www.ukbiobank.ac.uk/enable-your-research/about-our-data
- The UK Biobank Research Analysis Platform: https://www.ukbiobank.ac.uk/enable-your-research/research-analysis-platform
- Basis of participation: https://www.ukbiobank.ac.uk/explore-your-participation/basis-of-your-participation
- About our funding: https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/about-us/our-funding
Genomics England National Genomics Research Library (NGRL)
Organisation responsible: Genomics England
Short description of the data it holds: All the data held is de-identified, consented data which participants have agreed to provide. It includes information such as:
- Data collected from genomic sequencing tests
- Diagnoses and medical test results from health records, such as blood test results
- Medical notes and relevant information from local or national disease registries
- Relevant imaging data from health records, such as MRI scans, X-rays
List of outputs: Here
List of approved projects: Here
Who funds it: Genomics England is funded by the Department of Health and Social Care
Further information:
OpenSAFELY (the NHS England OpenSAFELY Service in OpenSAFELY-TPP and OpenSAFELY-EMIS)
Organisation responsible: NHS England is the data controller for the platform and service; GP practices remain data controllers for the electronic health records data; GP system providers (TPP and EMIS) are data processors; and the Bennett Institute at the University of Oxford is also a data processor and leads on platform design, software, and service delivery.
Short description of the data it holds: The tools sit across the primary care records data of nearly all current patients in England. Other data, linked to this data, includes: A&E, inpatients, outpatients, hospital administrative data, mortality (death), Covid tests and vaccinations, and others. See this page for more information.
List of outputs: Here (academic publications and pre-prints), here (reports), and here (code)
List of approved projects: Here
Who funds it: Wellcome, NHS England, National Institute for Health and Care Research, Medical Research Council
Further information:
Clinical Practice Research Datalink Trusted Research Environment
Organisation responsible: The Clinical Practice Research Datalink (CPRD) is part of the Medicines and Healthcare products Regulatory Agency (MHRA), an executive agency of the Department of Health and Social Care (DHSC). DHSC is the data controller.
Short description of the data it holds: A representative sample of pseudonymised primary care data from contributing UK general practices with almost 30% population coverage. Linked healthcare datasets available including secondary care and mortality (death) data.
List of outputs: Here for all CPRD outputs (not specific to the TRE)
List of approved projects: Here (but not specific to the TRE)
Who funds it: Users of the platform, because CPRD operates on a full cost-recovery model via data access licence fees
Further information: CPRD's website
Our Future Health Trusted Research Environment
Organisation responsible: Our Future Health
Short description of the data it holds: There are four datasets available:
- Participant data (registrations, consent and demographic data)
- The baseline health questionnaire data (self-reported health information, and data about their household, socioeconomic status, work, education, and family history)
- Their genotype array data (genetic information from blood tests)
- Linked health records data (hospital admissions, outpatient appointments, accident and emergency attendances) and death data from the Office for National Statistics
Full information can be found here and here
List of outputs: These will be signposted from https://research.ourfuturehealth.org.uk when available.
List of approved projects: Here
Who funds it: UK Research and Innovation, life sciences companies and health research charities
Further information: The Our Future Health website
NHS (England) Research Secure Data Environment Network
The Network is made up of twelve SDEs, one national and eleven sub-national ones.
Organisation responsible:
NHS England is responsible for the “Data for Research and Development” programme, and funds all the SDEs. NHS England is entirely responsible for the NHS England SDE, whereas local / regional NHS organisations are responsible for the other SDEs.
The twelve SDEs are:
- England Secure Data Environment – NHS England
- West Midlands Secure Data Environment
- London Secure Data Environment
- Kent, Medway and Sussex Secure Data Environment
- North West Secure Data Environment
- Yorkshire & Humber Secure Data Environment
- North East and North Cumbria Secure Data Environment
- Thames Valley & Surrey Secure Data Environment
- East Midlands Secure Data Environment
- Wessex Secure Data Environment
- East of England Secure Data Environment
- Great Western Secure Data Environment
Short description of the data it holds: The National SDE holds data such as Covid-19 data, Hospital Episode Statistics, Maternity Services Data Set, Medicines Dispensed in Primary Care, and others. A full list is here. The regional ones vary and contain a range of specialist and local datasets, depending on local agreements. Each SDE will publish information about the data it holds by the end of this financial year (2024-25). The data is:
- Pseudonymised, so that any identifying (or personal) patient information such as names, addresses, or NHS numbers is replaced with artificial, or ‘pseudo’, information
- Minimised, so that only relevant patient information is provided for research projects
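As a rough illustration of what "pseudonymised" and "minimised" can mean in practice, here is a short Python sketch. It is an assumption-laden example, not a description of how NHS England or any SDE actually prepares data: the function names `pseudonymise` and `minimise`, the field names, the secret key, and the choice of a keyed hash (HMAC-SHA256) to produce the 'pseudo' values are all made up for this example. The record uses invented values, not real patient data.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data-holding organisation.
SECRET_KEY = b"example-key-held-by-the-data-holder"

# Hypothetical names of the fields treated as identifying.
IDENTIFIERS = {"name", "address", "nhs_number"}


def pseudonymise(record, key=SECRET_KEY):
    """Replace identifying values with artificial ones derived from a keyed hash."""
    out = {}
    for field_name, value in record.items():
        if field_name in IDENTIFIERS:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            # Keep a short, fixed-length pseudo value in place of the original.
            out[field_name] = digest.hexdigest()[:16]
        else:
            out[field_name] = value
    return out


def minimise(record, needed):
    """Keep only the fields a particular project actually needs."""
    return {k: v for k, v in record.items() if k in needed}


# Invented record for illustration only.
record = {
    "name": "Example Person",
    "address": "1 Example Street",
    "nhs_number": "0000000000",
    "diagnosis": "heart failure",
    "admission_year": 2023,
}

prepared = minimise(pseudonymise(record), needed={"nhs_number", "diagnosis"})
```

Because the same input and key always give the same pseudo value, records about the same person can still be linked together without the original identifier being shown to the researcher.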
List of outputs: Outputs from the National SDE are included in NHS England’s data uses register here, but this does not specify which data sharing requests have been met in the SDE specifically. The regional SDEs are in varying stages of development but all will have published data use registers by the end of this financial year, which will be accessible through their own websites or, for all registers, through the NHS England website. Examples of outputs include:
- Prediction ‘calculators’ to understand patients’ risk of and reduce hospital re-admission following treatment for heart failure in the East of England SDE
- Assessments of the effectiveness of Sotrovimab (a monoclonal antibody) when given with antiviral treatments for Covid-19 in a clinical trial in the London SDE
List of approved projects: Approved projects for the England SDE are here, but this does not yet specify whether the data is being provided via the SDE.
Who funds it: The Network is largely funded through NHS England funding, but in some cases builds on pre-existing infrastructure and development. The programme was funded with £175m in total, which funds the SDE Network, the Digitrials Service, and other programme activity. More information is here.
Further information:
- Research powered by data - NHS England Digital
- East of England Sub-National Secure Data Environment (eoe-securedataenvironment.nhs.uk)
- Homepage - West Midlands Secure Data Environment (WMSDE) (westmidlandssde.nhs.uk)
- Developing the Great Western Secure Data Environment (SDE) - ARC West (nihr.ac.uk)
- Kent, Medway and Sussex Secure Data Environment for research - Sussex Health & Care (ics.nhs.uk)
- Funding awarded to London organisations to improve use of data for research – London Health Data Strategy
- Secure Data Environment | North East and North Cumbria NHS (northeastnorthcumbria.nhs.uk)
- https://thamesvalleyandsurreyhealthandcaredata.nhs.uk/using-patient-data/thames-valley-and-surrey-secure-data-environment/
Others
There are also Secure Data Environments that hold health data as well as other types of data. For example, the Secure Research Service and the Integrated Data Service at the Office for National Statistics. The Integrated Data Service is replacing the Secure Research Service.
The Secure Research Service was one of the first big secure data environments, but it is mainly for analysing administrative data, i.e. data the government holds, for Civil Servants and researchers to analyse. The data it holds includes things like census data, business data, education data, etc. However, it does contain some health data, particularly in relation to Covid-19. More information on how the Office for National Statistics uses health data can be found here. The rules around using health data are different because of the need to satisfy the Duty of Confidentiality. To find out more about the laws that govern the use of health data, click here.
Further information
During our “What words to use when talking about Secure Data Environments” project, our public participants wanted more information on the five specific areas below in relation to Secure Data Environments. We are working on adding this content to the page.
- Details of how approval to access data is granted
- How the data is kept secure and safe from hackers
- The five safes
- The types of data that are used
- Example case studies of how the data has been used