Why aren’t more statisticians involved in data privacy?

Written by David Smith. Posted in Opinion

In a poll conducted among statisticians in August 2013 by Revolution Analytics, 88% of respondents answered 'yes' to the question, 'Should consumers worry about privacy issues related to the data that is being collected on them?' Given recent media stories about discomfiting uses of consumer data and predictive models - not to mention the recent rash of data breaches - it’s not exactly surprising that such levels of concern for consumers were reflected in the survey.

Now, our poll wasn’t a scientific, randomized survey: the 865 respondents were self-selected from the attendees of the 2013 ASA Joint Statistical Meetings held in Montreal. (Users of the Wi-Fi service provided at the conference were invited to take the survey.) But given that the participants in the survey were predominantly statisticians, the responses to the question, 'Should there be an ethical framework in place for collecting and using data?' were rather surprising (see the results in the graph below).

Figure: Survey of 865 respondents at the 2013 Joint Statistical Meetings

While 51% of respondents agreed and said that 'ethics already plays a big part in my research', 28% called for an industry standard on ethics to be implemented - implying they were unaware of any such standard and how it might affect their work. The remainder either aren’t routinely influenced by ethical considerations, or simply answered 'I don’t know'.

So do such standards for privacy and ethical use of data exist? Not long after I wrote about the survey results on Revolution Analytics' blog, I was contacted by Dr Ann Cavoukian, Information and Privacy Commissioner of Ontario, Canada. Dr Cavoukian pointed out to me that privacy laws in many jurisdictions are governed by a set of Fair Information Practice Principles, which provide an ethical framework for the collection and use of personal information. These principles were first codified by the Organisation for Economic Co-operation and Development (OECD) in the 1980s, and later informed laws and frameworks including the EU Directive on Data Protection, the Canadian Standards Association’s Privacy Code, the Asia-Pacific Economic Cooperation (APEC) Privacy Framework, the US Safe Harbor Principles and the Global Privacy Standard. Dr Cavoukian later developed those principles further to become 'Privacy by Design', recognised as an international standard for privacy protection in 2010.

Nonetheless, the fact remains that such privacy and ethics standards are largely government and/or industry-led, and not driven by statistical organisations - a surprise, given our profession’s intimate connection with data. Search for the words 'privacy' or 'ethics' on the websites of the Royal Statistical Society or the Statistical Society of Canada, and you’ll find little of relevance. The American Statistical Association does provide Ethical Guidelines for Statistical Practice that address privacy issues as they relate to individual research subjects, but they haven’t been updated since 1999.

All of this raises the question: why don’t statisticians have more of a voice when it comes to data privacy, especially given the implications of data sharing, data security, and the large-scale collection of data enabled by the technological revolutions of the last decade? Perhaps statisticians feel that data governance and ethics are the domain of the primary researcher. But statisticians should have a pivotal role to play in the process of understanding and implementing data privacy.

This is a pressing issue, given the growing movement to share more data in conjunction with research. Since March 1, authors who publish papers in the open-access PLOS journals have been required to 'make all data underlying the findings described in their manuscript fully available without restriction'. Refusal to share data is grounds for rejection, with exceptions granted only in rare cases (such as when providing the data would be unethical or illegal). In my opinion, this effort is a welcome one: I support a policy that promotes the principles of reproducible research, and this is an area where the R community has taken a leadership role. Following the principles of reproducible research doesn’t just facilitate progress in science: it can help detect fundamental errors in data analysis before they have major, even life-threatening, consequences.

But even outside the cases where data sharing is clearly unethical, data privacy issues can easily manifest themselves when 'open data' is the norm. Famously, Netflix learned this lesson the hard way, after publishing anonymised DVD-rental data for a data-prediction contest. Even though user-identifiable information was anonymised, enough detail remained to allow individuals to be identified. This led to a lawsuit that was ultimately settled, and a planned second iteration of the contest was cancelled.

In 1990, 87% of Americans could be uniquely identified given only their gender, date of birth and the 5-digit ZIP code of their home. US residents can calculate how identifiable they are today based on these data, thanks to the Data Privacy Lab. Now, in this new age of open data, social sharing, and ubiquitous on-line data sources, it’s critical that the statistical community - academics, practitioners, consultants and software companies alike - take a leadership role in promoting the benefits of open data, while working to develop the processes and systems to make such sharing ethical and safe. Balancing the competing challenges of open data and privacy is no simple task, but it’s one that the statistical community is uniquely equipped to address.
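The identifiability figure quoted above comes down to a simple count: group the records by a set of quasi-identifiers (here gender, date of birth and ZIP code) and see how many combinations occur only once. The following is a minimal sketch of that calculation, not the method used in the original study. The function name `fraction_unique`, the field names and the four toy records are all made up for illustration.

```python
from collections import Counter


def fraction_unique(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination occurs exactly once."""
    # Build one key per record from the chosen quasi-identifier fields.
    keys = [tuple(record[field] for field in quasi_identifiers) for record in records]
    if not keys:
        return 0.0
    counts = Counter(keys)
    # A record is "unique" if no other record shares its key.
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys)


# Toy, invented records (assumption: real data would come from a census-style file).
records = [
    {"gender": "F", "dob": "1970-03-14", "zip": "02139"},
    {"gender": "F", "dob": "1970-03-14", "zip": "02139"},
    {"gender": "M", "dob": "1982-11-02", "zip": "02139"},
    {"gender": "F", "dob": "1991-07-30", "zip": "94110"},
]
QUASI_IDENTIFIERS = ("gender", "dob", "zip")

share = fraction_unique(records, QUASI_IDENTIFIERS)
print(f"{share:.0%} of records are unique on {QUASI_IDENTIFIERS}")
```

Running the same check on a dataset before releasing it gives a rough sense of re-identification risk: the closer the share is to 100%, the less protection removing names alone provides.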


The views expressed in the Opinion section of StatsLife are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of The Royal Statistical Society.


People in this conversation

  • Guest - John Bacon-Shone

    I believe part of the issue is that the US, unlike almost all developed countries, does not have a general data protection law that covers the private and government sectors. I am not sure that your title relates well to the piece - I believe that many statisticians ARE involved - I chair my university's human research ethics committee and was a member and later the chair of the Hong Kong law reform committee on privacy that recommended our personal data privacy legislation. John



Copyright 2014 Royal Statistical Society. All Rights Reserved.
12 Errol Street, London, EC1Y 8LX. UK registered charity in England and Wales. No.306096