Opening up admin data to research: an interview with Melanie Wright

Written by Oz Flanagan. Posted in Features

In October of last year, the Economic and Social Research Council (ESRC) announced funding for a new Administrative Data Research Network (ADRN) across the UK. The investment was part of a much wider government effort to harness the power of big data held by the public sector.

Since then, a network has been carefully built to help statisticians and researchers gain access to linked administrative datasets that were previously restricted. This wealth of data has the potential to yield profound insights from information compiled by public bodies over decades.

Melanie Wright, principal investigator of the new Administrative Data Service and associate director of the UK Data Service, talked us through how the network will operate and how the process of gaining access to this data will work for researchers.

The Administrative Data Research Network consists of four Administrative Data Research Centres (ADRCs), one in each of the UK’s four nations:

  • ADRC England at the University of Southampton, directed by Peter Smith
  • ADRC Northern Ireland at Queen’s University Belfast, directed by Dermot O'Reilly
  • ADRC Scotland at the University of Edinburgh, directed by Chris Dibben
  • ADRC Wales at Swansea University, directed by David Ford

Melanie explained to us why the ADRCs are spread out in this way. ‘Each of them is linked to their national statistics agency, as well as a number of other universities. They are set up that way for a number of different reasons. One is to provide centres around the country that can offer safe on-site access to the data. Another aspect is to develop relationships with devolved governments to get access to admin data on a regional level. In each nation there are also slightly different legislative frameworks and contexts which each centre will develop knowledge on.’

The Administrative Data Service (ADS), based at the University of Essex, provides the coordinating umbrella for all the other centres, and this is where Melanie is currently based. She says, ‘the ADS runs the public face of the network and also the application procedure for proposals - all applicants to the network will come through the same front door so it’s a level playing field.’

She then took us through the process statisticians and researchers will go through when they look to access datasets. The first step will be for the researcher to begin working with the ADS to put together a proposal on their project. Once the proposal has been refined and assembled, it will then go to an approvals panel of independent experts, who will judge the proposal on a number of criteria.

As Melanie explains, ‘these criteria will include checking their proposed research has been through appropriate ethical review, that it has been reviewed for its privacy impact and it will also be judged on feasibility – that is, can it be done with a reasonable expenditure of resources. Then the last point they will be judged on is does this research have explicit potential public benefit?’

That last point is a vital one for both the ADRN and the public bodies that hold admin data. At the end of a project, researchers will be ‘required to write a two page plain English summary of what they did, how they did it and what they found.’

Melanie expanded on the political reason behind this requirement. ‘Part of what is driving this from the data owner’s side is the government transparency agenda and the impetus to make use of government assets. They need this feedback to justify the time they are spending servicing us with this data and we need it to reassure the public that quality research is going on with the potential for a positive impact on society.’

The kind of admin data that researchers can apply to analyse can range from datasets used previously in a similar context to others that have never before been outside their government department. As Melanie says, ‘We are quite keen to look at proposals for data linkages that have not been done before to establish new pathways. But from a researcher’s point of view, those projects may take a lot longer to get through.’

Staff at the ADS and the centres around the UK, depending on where the researcher is and what dataset they are looking to access, will work with applicants to develop their proposal in its initial stages. The assistance of the network will be crucial to the development of a proposal due to the complex legal and bureaucratic web that needs to be negotiated for data sharing like this.

As well as the proposal being scrutinised, researchers themselves will need to be accredited for suitability. Melanie clarified what this will entail: ‘we are focussing on whether the researcher has the skills to do this and if they have a track record in using data. For researchers who may be at an early stage in their career, we will encourage them to work with a supervisor or mentor who can oversee and vouch for their suitability.’ Researchers will also undergo training that covers the legal context of their research and the statistical disclosure control that they will be conducting later in the project.

There are a multitude of hoops to jump through before any access can be granted to admin data, but Melanie pointed out the sensitive nature of what researchers will be working with. ‘The data we are talking about is unconsented data, in some contexts it may even be collected under duress (for example criminal records), so it was very strongly felt through all the work we have done in public engagement, that there has to be a demonstrable potential public benefit to the research being done.’

Once a research team and proposal have been approved, the network will negotiate with the departments that hold the data for permission to access and link it. When permission is granted and the data linked, the analysis can get underway. However, the data can only be accessed in an agreed location. ‘When we are developing the proposal, we will be working out which ADRC it sits best with,’ Melanie explained.

‘That will depend on what data they want to use but it will also depend on the data owners and where they are happy to, and legally able to, send the data for analysis. This is why we are also currently working with the Cabinet Office to try and develop some primary legislation to make this kind of data linking easier.’

Melanie made a point of stressing how much care they will be taking with the personal information that is held within the data. To protect any personal data, the ADRN will be using the ‘pseudonymisation at source’ principles developed in Scotland.

She expanded on what this will mean in practice. ‘To link the data we will be using trusted third parties, so the data will be split into the identifying data and payload data. The identifying data will be sent with a randomly generated identifying number to the trusted third party. The third party will then match the identifying numbers to one another and create a look up table.’ The ADRC can then compile the linked data with the randomly generated number attached to the ‘payload’ data ready for analysis.

What this ultimately means is, ‘the third party only see the identifying data, so all they know is that an individual entered the tax system or went to a hospital but nothing more. Meanwhile the researcher will see all the information about them, except their identity.’ The third parties will include organisations such as the Office for National Statistics and other regional statistical agencies, which have long track records of safeguarding personal data.
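The separation of roles Melanie describes can be illustrated with a minimal Python sketch. It is not the ADRN's actual implementation: the function names, the field names (`name`, `dob`, `income_band`, `admissions`), the sample records and the use of exact matching on name and date of birth are all assumptions made for illustration. Real linkage is likely to use more robust matching than an exact comparison.

```python
import secrets

ID_FIELDS = ("name", "dob")  # hypothetical identifying fields


def split_at_source(records, id_fields=ID_FIELDS):
    """Data owner: split each record into identifying data and payload,
    tagging both halves with the same randomly generated number."""
    identifying, payload = [], []
    for record in records:
        ref = secrets.token_hex(8)  # random, carries no information itself
        identifying.append({"ref": ref, **{f: record[f] for f in id_fields}})
        payload.append(
            {"ref": ref, **{k: v for k, v in record.items() if k not in id_fields}}
        )
    return identifying, payload


def build_lookup(identifying_a, identifying_b, id_fields=ID_FIELDS):
    """Trusted third party: sees identifying data only and returns a
    look-up table of matched random numbers (exact match is an assumption)."""
    index = {tuple(row[f] for f in id_fields): row["ref"] for row in identifying_a}
    lookup = []
    for row in identifying_b:
        key = tuple(row[f] for f in id_fields)
        if key in index:
            lookup.append((index[key], row["ref"]))
    return lookup


def link_payloads(lookup, payload_a, payload_b):
    """Research centre: sees payloads and the look-up table, never identities."""
    by_ref_a = {row["ref"]: row for row in payload_a}
    by_ref_b = {row["ref"]: row for row in payload_b}
    linked = []
    for ref_a, ref_b in lookup:
        a = {k: v for k, v in by_ref_a[ref_a].items() if k != "ref"}
        b = {k: v for k, v in by_ref_b[ref_b].items() if k != "ref"}
        linked.append({**a, **b})
    return linked


# Invented sample records standing in for two departmental datasets.
tax = [
    {"name": "A. Example", "dob": "1970-01-01", "income_band": "B"},
    {"name": "C. Sample", "dob": "1985-06-15", "income_band": "D"},
]
hospital = [
    {"name": "A. Example", "dob": "1970-01-01", "admissions": 2},
]

tax_ids, tax_payload = split_at_source(tax)
hosp_ids, hosp_payload = split_at_source(hospital)
lookup = build_lookup(tax_ids, hosp_ids)                   # third party's view
linked = link_payloads(lookup, tax_payload, hosp_payload)  # researcher's view
print(linked)
```

In this sketch the third party handles only names, dates of birth and random numbers, while the linked records passed to the researcher contain payload fields and no identifying fields.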

The researcher can then use this data only in a controlled environment, depending on what the data owners require. This could be a room in the ADRC or potentially a secure room at their home institution with a remote secure connection. Finally, before researchers can take any analytical outputs away, those outputs will have to be cleared for statistical disclosure control.
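The article does not say which disclosure checks the network applies. As an illustration only, one widely used check is a minimum cell count rule for frequency tables, sketched below in Python. The threshold of 10, the function name and the sample data are assumptions, not ADRN policy.

```python
from collections import Counter

MIN_CELL_COUNT = 10  # hypothetical threshold, not taken from the article


def safe_frequency_table(values, min_count=MIN_CELL_COUNT):
    """Count each category and suppress (set to None) any cell whose
    count falls below the minimum, so that small groups are not released."""
    counts = Counter(values)
    return {
        category: (n if n >= min_count else None)
        for category, n in counts.items()
    }


# Invented sample: region recorded for 40 linked records.
regions = ["north"] * 25 + ["south"] * 12 + ["island"] * 3
table = safe_frequency_table(regions)
print(table)
```

A rule like this covers only primary suppression. A real clearance process would also need to consider whether suppressed cells can be recovered from totals or from other published tables.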

Melanie expanded on why the network is so keen to be transparent about its processes. ‘The public engagement side of the project is huge because we are being born right in the midst of and other data sharing controversies.’

‘The ESRC recently engaged in a public dialogue where they brought together different groups of people from all around the country and had some in-depth discussion around our plans. It was a fascinating exercise and there are a number of aspects that came out of that which we are implementing into our procedures.’

‘So for example we have lay members on our governing board and on our approvals panel. One of the strongest outcomes from the dialogue was that there should be no commercial gain from the use of admin data, so we are not allowing any commercial access - it will be purely for academic research. Every project that gets through our approvals panel will have details published on our website, so the public will always know who is using their data and for what purpose.’

The network is hoping to officially open its doors in late autumn and Melanie gave us an update as to where the ADRN currently stands. ‘We are currently in a beta testing period, where we are developing and refining all our policies and procedures. We are also taking research projects originating within the four administrative data research centres and using these as test cases to put them through the system and work out all the kinks to make sure we are ready to open the doors to researchers anywhere in the UK.’

Copyright 2019 Royal Statistical Society. All Rights Reserved.
12 Errol Street, London, EC1Y 8LX. UK registered charity in England and Wales. No.306096