The source of this anxiety about governments using our private information can be traced back to Edward Snowden’s revelations about the activities of the security agencies. Although the opening up and sharing of public data is unrelated to the collection of metadata from digital communications, in the public mind distrust has now fallen on the use of all personal data. The unease about surveillance arose because the government structures that carry it out were set up with little or no public debate about where the limits of privacy intrusion should lie. The absence of that debate has cast suspicion on government efforts with all forms of data.
Pseudonymisation is the general safeguard offered when concerns about open data and personal information are raised. This process removes identifying information from a dataset while leaving in place the important information from which findings can be extracted. However, it is not infallible: it is still technically possible to link a pseudonymised dataset with others and identify an individual through a process of elimination.
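The linkage risk described above can be made concrete with a small sketch. Everything in it is invented for illustration: the records, the field names (`postcode_district`, `birth_year`, `sex`) and the helper `link` are assumptions, not taken from any real release. It simply shows how a dataset stripped of names can be matched against a second, named dataset on the attributes the two have in common.

```python
from collections import Counter, defaultdict

# Hypothetical pseudonymised release: names replaced by opaque IDs.
pseudonymised = [
    {"pid": "a91f", "postcode_district": "SW1", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"pid": "c07b", "postcode_district": "SW1", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"pid": "e3d2", "postcode_district": "N7", "birth_year": 1982, "sex": "F", "diagnosis": "migraine"},
    {"pid": "f5a8", "postcode_district": "N7", "birth_year": 1982, "sex": "F", "diagnosis": "eczema"},
]

# Hypothetical second dataset that does carry names.
public_register = [
    {"name": "Alice Example", "postcode_district": "SW1", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "postcode_district": "SW1", "birth_year": 1975, "sex": "M"},
    {"name": "Carol Example", "postcode_district": "N7", "birth_year": 1982, "sex": "F"},
]

# Attributes shared by both datasets (assumed for this sketch).
QUASI_IDENTIFIERS = ("postcode_district", "birth_year", "sex")


def key(record):
    """Return the combination of shared attributes for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(released, register):
    """Map names to diagnoses wherever the shared attributes match uniquely."""
    released_by_key = defaultdict(list)
    for row in released:
        released_by_key[key(row)].append(row)
    register_counts = Counter(key(person) for person in register)

    matches = {}
    for person in register:
        candidates = released_by_key.get(key(person), [])
        # Only an unambiguous one-to-one match counts as re-identification.
        if len(candidates) == 1 and register_counts[key(person)] == 1:
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


reidentified = link(pseudonymised, public_register)
print(reidentified)
```

In this toy data two of the three named people are matched to a single record, while the third is protected only because another record happens to share her attributes. That dependence on how many records share a combination of attributes is the reason removing names is not, on its own, a guarantee.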
Should this happen, it is argued, there are data protection laws in place to prosecute anyone who breaches privacy. The open data agenda is relying on laws that pre-date its inception, and if these laws prove inadequate, the resulting breaches could make the techniques employed by the tabloid press in the phone hacking scandal look primitive in comparison.
And it’s not just governments that have fallen short on informing and listening to the general public. As the documentary filmmaker Cullen Hoback has previously written in Significance, private companies have wrapped data collection in vast swathes of terms and conditions. Rather than informing us, these approaches lead to a damaging level of suspicion and reticence. Private technology and communication companies now hold as much sensitive information on us as any government does. The business model of social media companies relies on users giving away their personal data for free in return for the use of a communication platform.
It should be remembered that most government agencies are perfectly capable of preventing data privacy breaches, despite the odd misplaced laptop. Organisations with a professional statistical capacity, which understand the importance of the data they hold, have a good track record of protecting information before releasing it into the public domain. A good example is the Office for National Statistics and its expertise in releasing non-disclosive census data. The know-how exists in the statistical world to ensure that data can be secured against illicit uses, and this should be explored with the same enthusiasm that exists for the economic and research benefits of open data.
Although scepticism remains among the public, the fact is that governments need to be the facilitators in opening up and linking data together. As with the Snowden revelations, what has upset many in the health data debacle is the lack of a proper public debate that fully explained and explored all the ramifications of the proposals.
What these issues demonstrate is that making the case for why statisticians need this data cannot be left to politicians and the PR offices of research organisations. What we hope to explore is how statisticians can be more vocal about why they need this data and how it can be safeguarded. Taking delivery of the data and getting to work on it will no longer be enough; statisticians need to join the debate and make their case.