Open data should also be about cutting red tape for research

Written by Jen Rogers

Open data has received a lot of attention of late. Data, often from government or public services, is now publicly released and made available to everyone, who are free to use it in any way they like. The government have given three main reasons for making the release of data a priority: making the government more accountable, bringing better public services and feeding economic and social growth. In keeping with the open data trend, one national programme has the aim of linking coded records from general practice with data from other national data systems.

This programme will build on existing data services and expand them to cover all care settings, both in and outside of hospital, so that the quality and safety of services is consistent across the country and so that diseases and conditions that may require more NHS investment can be identified.

But what if the private sector were to get on board with the open data concept too? Making pharmaceutical clinical trial results publicly available is a campaign that is generating a lot of interest at the moment. But what if we were to take it even further, and campaign for the public release of clinical trial raw data? What sort of a difference would this make to my day-to-day work?

Quite simply: a huge one. When you hear the word ‘statistician’, you might think that ‘data’ naturally goes hand in hand, but it isn’t as straightforward as that. As a statistician, data is pretty essential to my job. It may be stating the obvious, but being a data analyst requires actually having data to analyse. As a statistical methodologist I place myself at the forefront of new statistical innovations. To evaluate the performance of any new methods that I develop for the analysis of clinical trial data, it is essential to apply them to existing clinical trial datasets. And here is where I can run into some difficulties.

Let me tell you a little story, the theme of which you may be familiar with. In September 2012 I was in discussion with my collaborators about where we wanted our research to go next. I had been developing some new ideas around the analysis of recurrent events – recurrent hospitalisations in heart failure to be more precise – and we had already published reanalyses of data from two clinical trials to present our ideas. Now we entered into discussions about needing more data to analyse. We identified another trial which we wanted to analyse but there was just one problem: the data was in Glasgow and wasn’t allowed to leave.

The months that followed consisted of a lot of to-ing and fro-ing on e-mail so that I could write the code needed to carry out the analyses. Next thing, I found myself on a plane to Glasgow so that I could get my hands on the data that wasn’t allowed to come to London. What struck me as verging on the ridiculous though is that once I got to Glasgow, I was allowed to do whatever I liked with the dataset and was free to take any results away with me that I wanted. I just couldn’t take the data.

What followed on my return were the inevitable extra little bits of analysis that needed to be done when preparing the manuscript for publication. This meant repeatedly sending updated R scripts to a collaborator who had never used R before and then trying to work out what the error codes that he was receiving meant. Needless to say, there are far easier ways to carry out research, and a lot of time (and air miles) could have been saved had I been allowed access to the data in London.

This is of course an extreme example of where the privatisation of data has bordered on the ridiculous, chosen to prove a point. There are many occasions where I have been given the full datasets to analyse, but not without jumping through a few hoops and going through the relevant people first. The message that I want you to take home is that all the administration, red tape and bureaucracy are frustrating and make for a very painful process. As a statistician I only have one simple request: can I have some data please?

I just want to end by throwing an alternative argument out there. I was discussing the idea of pharmaceutical data becoming open with a colleague and he, surprisingly, wasn’t on board with it. When I probed further, he said he appreciates that as a methodologist I might be very excited by the prospect of being able to access all the data that I want, but he worries that data in the hands of the wrong person could be a dangerous thing, and I can see his point. So I guess we need to ask the question going forward: if we were to move away from regulated datasets, would we need to move to regulated analyses?


