While 51% of respondents said that 'ethics already plays a big part in my research', 28% called for an industry standard on ethics to be implemented - implying they were unaware of any such standard and how it might affect their work. The remainder either aren’t routinely influenced by ethical considerations, or simply answered 'I don’t know'.
So do such standards for privacy and ethical use of data exist? Not long after I wrote about the survey results on the Revolution Analytics blog, I was contacted by Dr Ann Cavoukian, Information and Privacy Commissioner of Ontario, Canada. Dr Cavoukian pointed out to me that privacy laws in many jurisdictions are governed by a set of Fair Information Practice Principles, which provide an ethical framework for the collection and use of personal information. These principles were first codified by the Organization for Economic Cooperation and Development (OECD) in the 1980s, and later informed laws and frameworks including the EU Directive on Data Protection, the Canadian Standards Association’s Privacy Code, the Asia-Pacific Economic Cooperation (APEC) Privacy Framework, the US Safe Harbor Principles and the Global Privacy Standard. Dr Cavoukian later developed those principles further to become 'Privacy by Design', recognised as an international standard for privacy protection in 2010.
Nonetheless, the fact remains that such privacy and ethics standards are largely government and/or industry-led, and not driven by statistical organisations - a surprise, given our profession’s intimate connection with data. Search for the words 'privacy' or 'ethics' on the websites of the Royal Statistical Society or the Statistical Society of Canada, and you’ll find little of relevance. The American Statistical Association does provide Ethical Guidelines for Statistical Practice that address privacy issues as they relate to individual research subjects, but they haven’t been updated since 1999.
All of this raises the question: why don’t statisticians have more of a voice when it comes to data privacy, especially given the implications of data sharing, data security, and the large-scale collection of data enabled by the technological revolutions of the last decade? Perhaps statisticians feel that data governance and ethics are the domain of the primary researcher. But statisticians should have a pivotal role to play in the process of understanding and implementing data privacy.
This is a pressing issue, given the growing movement to share more data in conjunction with research. Since March 1, authors who publish papers in the open-access journal PLOS have been required to 'make all data underlying the findings described in their manuscript fully available without restriction'. Refusal to share data is grounds for rejection, with exceptions granted only in rare cases (such as when providing the data would be unethical or illegal). In my opinion, this effort is a welcome one: I support a policy that promotes the principles of reproducible research, and this is an area where the R community has taken a leadership role. Following the principles of reproducible research doesn’t just facilitate progress in science: it can help detect fundamental errors in data analysis before they have major, even life-threatening, consequences.
But even outside the cases where data sharing is clearly unethical, data privacy issues can easily arise when 'open data' is the norm. Famously, Netflix learned this lesson the hard way after publishing anonymised DVD-rental data for a prediction contest. Even though user-identifiable information was anonymised, the data retained enough detail to allow individuals to be identified. This led to a lawsuit that was ultimately settled, and a planned second iteration of the contest was cancelled.
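One simple way to gauge this kind of re-identification risk before releasing a dataset is to count how many records are unique on a combination of seemingly harmless attributes. The following is a minimal sketch in Python, not a complete disclosure-risk assessment: the function name `unique_fraction`, the field names and the four records are all made up for illustration.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose combination of
    quasi-identifier values appears exactly once in the data."""
    if not records:
        return 0.0
    # Build one key per record from the chosen attributes.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Made-up records; field names are hypothetical.
records = [
    {"gender": "F", "birth_date": "1975-03-02", "zip": "98101"},
    {"gender": "F", "birth_date": "1975-03-02", "zip": "98101"},
    {"gender": "M", "birth_date": "1980-11-17", "zip": "98101"},
    {"gender": "F", "birth_date": "1962-07-30", "zip": "98052"},
]

print(unique_fraction(records, ["gender"]))                       # 0.25
print(unique_fraction(records, ["gender", "birth_date", "zip"]))  # 0.5
```

In this toy data, only one record in four is unique on gender alone, but half are unique once birth date and ZIP code are added. A high share of unique records suggests the data could be linked to named individuals through other sources, even with names removed.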
In 1990, 87% of Americans could be uniquely identified given only their gender, date of birth and the 5-digit ZIP code of their home. US residents can calculate how identifiable they are today based on these data, thanks to the Data Privacy Lab. Now, in this new age of open data, social sharing, and ubiquitous on-line data sources, it’s critical that the statistical community - academics, practitioners, consultants and software companies alike - take a leadership role in promoting the benefits of open data, while working to develop the processes and systems to make such sharing ethical and safe. Balancing the competing challenges of open data and privacy is no simple task, but it’s one that the statistical community is uniquely equipped to address.
The views expressed in the Opinion section of StatsLife are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of The Royal Statistical Society.