Privacy, data and statistics: time for an honest discussion

Written by Oz Flanagan. Posted in Features

Over the next few months on StatsLife we will be exploring the uneasy relationship between privacy and data in the modern world. If 2013 was the year of big data, 2014 has quickly followed with the pertinent question of what these new realities mean for us as individuals. For statisticians, having vast amounts of data available is an exciting and challenging development. For the individual, it raises a philosophical question: what does it mean, for better or worse, that the data telling our life story is so readily available?

The latest illustration of these concerns came with the logical idea of linking up health data in the UK. When the concept of combining hospital records with GP records was first conceived, few could have imagined how great a challenge to public perception it would present. It shows how well-intentioned data sharing for research and innovation has, inevitably with hindsight, run into the quagmire of individual privacy concerns.

The source of this anxiety about governments using our private information can be traced back to Edward Snowden’s revelations about the activities of the security agencies. Although the opening up and sharing of public data is unrelated to the collection of metadata from digital communications, in the public mind distrust has now fallen on the use of all personal data. The unease about surveillance arose because the government structures involved came into being with little or no public debate about what the limits of privacy intrusion should be. The absence of that debate has cast suspicion on government efforts with all forms of data.

Pseudonymisation is the safeguard generally offered when concerns about open data and personal information are raised. The process removes identifying information from a dataset while leaving in place the information from which findings can be extracted. However, it is not infallible: it is still technically possible to link a pseudonymised dataset with others and identify an individual through a process of elimination.
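To make the point concrete, here is a minimal sketch in Python of both halves of that paragraph: replacing names with pseudonyms, and then re-identifying some people by linking the released records to a second dataset. Everything in it is invented for illustration, including the field names, the people, the secret key and the choice of a keyed hash as the pseudonymisation method. It is not a description of how any real health dataset is processed.

```python
import hashlib
import hmac

# Assumption: a secret key held only by the data custodian (hypothetical value).
SECRET_KEY = b"hypothetical-secret-key"

# Assumption: fields that are not names, but can narrow down who someone is.
QUASI_IDENTIFIERS = ("postcode_district", "birth_year", "sex")


def pseudonymise(record, key=SECRET_KEY):
    """Drop the name and replace it with a keyed-hash pseudonym."""
    token = hmac.new(key, record["name"].encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    released = {field: value for field, value in record.items() if field != "name"}
    released["pseudonym"] = token
    return released


def link(released_records, auxiliary_records):
    """Map pseudonym -> name wherever the quasi-identifiers match exactly one person."""
    matches = {}
    for row in released_records:
        profile = tuple(row[q] for q in QUASI_IDENTIFIERS)
        candidates = [
            person["name"]
            for person in auxiliary_records
            if tuple(person[q] for q in QUASI_IDENTIFIERS) == profile
        ]
        if len(candidates) == 1:  # process of elimination leaves a single person
            matches[row["pseudonym"]] = candidates[0]
    return matches


# Invented health records held by the custodian.
health_records = [
    {"name": "Alice Example", "postcode_district": "EC1Y", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"name": "Bob Example", "postcode_district": "EC1Y", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"name": "Carol Example", "postcode_district": "N1", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
]

# Invented auxiliary dataset an outsider might hold (for example, a public register).
auxiliary = [
    {"name": "Alice Example", "postcode_district": "EC1Y", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "postcode_district": "EC1Y", "birth_year": 1975, "sex": "M"},
    {"name": "Brian Sample", "postcode_district": "EC1Y", "birth_year": 1975, "sex": "M"},
    {"name": "Carol Example", "postcode_district": "N1", "birth_year": 1975, "sex": "F"},
]

released = [pseudonymise(record) for record in health_records]
reidentified = link(released, auxiliary)

for pseudonym, name in reidentified.items():
    print(pseudonym, "->", name)
```

In this toy data, the released records contain no names, yet two of the three people can still be singled out because their combination of district, birth year and sex is unique in the auxiliary dataset. The third is protected only because someone else happens to share the same profile, which is the intuition behind coarsening or suppressing such fields before release.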

If such re-identification does occur, it is argued that data protection laws are in place to prosecute anyone who breaches privacy. But the open data agenda is relying on laws that pre-date its inception, and if these laws prove inadequate, the result could make the techniques employed by the tabloid press in the phone hacking scandal look primitive in comparison.

And it’s not just governments that have fallen short on informing and listening to the general public. As the documentary filmmaker Cullen Hoback has previously written in Significance, private companies have wrapped data collection in vast swathes of terms and conditions. Rather than informing us, these approaches lead to a damaging level of suspicion and reticence. Private technology and communication companies now hold as much sensitive information on us as any government does. The business model of social media companies relies on users giving away their personal data for free in return for the use of a communication platform.

It should be remembered that most government agencies are quite capable of preventing data privacy breaches, despite the odd misplaced laptop. Organisations with a professional statistical capacity, which understand the importance of the data they hold, have a good track record of protecting information before releasing it into the public domain. A good example is the Office for National Statistics and its expertise in releasing non-disclosive census data. The know-how exists in the statistical world to ensure that data can be secured from illicit uses, and this should be explored with the same enthusiasm that exists for the economic and research benefits of open data.

Although scepticism remains amongst the public, the fact is that governments need to be the facilitators in opening up and linking data together. As with the Snowden revelations, what has upset many in the health data debacle is the lack of a proper public debate that fully explained and explored all the ramifications of the proposals.

What these issues demonstrate is that making the case for why statisticians need this data cannot be left to politicians and the PR offices of research organisations. What we hope to explore is how statisticians can be more vocal about why they need this data and how it can be safeguarded. Taking delivery of the data and getting to work on it will no longer be enough; statisticians need to join the debate and make their case.



Copyright 2019 Royal Statistical Society. All Rights Reserved.
12 Errol Street, London, EC1Y 8LX. UK registered charity in England and Wales. No.306096
