The Social Statistics Section Meeting on The Data Linkage and Anonymization (DLA) Programme, Isaac Newton Institute: New Directions and Challenges, took place at the Royal Statistical Society in London on November 11, 2019.
The DLA programme was held at the Isaac Newton Institute, University of Cambridge, from July 2016 to December 2016 and was supported by the EPSRC grant EP/K032208/1. The focus of the programme was to foster cross-disciplinary exchange between statisticians, mathematicians, computer scientists, social scientists and practitioners, and so to stimulate methodological development on two emerging data challenge themes: anonymisation and data linkage.
Linking databases can considerably enrich data sources and is a necessary step under recent initiatives to replace some of the official statistics data sources with administrative data. In addition, protecting confidentiality and privacy is critical to data access and requires new approaches to data anonymisation under evolving definitions of disclosure risk. Exploring the potential connections between the two themes was a distinctive aim of the DLA programme.
The aim of the Social Statistics Section meeting was to follow up on recent developments and continuing challenges in the two themes and to present ongoing research that was initiated and stimulated by the DLA programme. The meeting was chaired by Nick Moon of Moonlight Research and included two speakers, Professor Rainer Schnell of the University of Duisburg-Essen and Professor Natalie Shlomo of the University of Manchester, with Professor Chris Skinner of the London School of Economics and Political Science as discussant.
Professor Schnell spoke on linking population files with sensitive identifiers and introduced the notion of masking the identifiers using computer science approaches such as Bloom filters, hashing and combinations of these, including probabilistic masking. He covered possible attack scenarios against the different approaches and provided examples. He concluded that more research is needed to develop approaches that withstand such attacks and to develop them for production settings.
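To illustrate the general idea of masking identifiers with Bloom filters, the following is a minimal sketch and not the specific method presented at the meeting. It assumes names are split into padded character bigrams, that each bigram is mapped to bit positions by two keyed hashes combined by double hashing, and that masked names are compared with the Dice coefficient. The function names, the filter length of 1000 bits, the 10 hash functions and the shared secret key are all illustrative assumptions.

```python
import hashlib
import hmac


def bigrams(name):
    """Split a name into character bigrams, padded with '_' at both ends."""
    padded = f"_{name.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(name, key, size=1000, num_hashes=10):
    """Return the set of bit positions set in a Bloom filter of `size` bits.

    Assumption: double hashing with two keyed HMACs; `key` is a secret
    shared by the data owners and unknown to the linkage unit.
    """
    bits = set()
    for gram in bigrams(name):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha512).digest(), "big")
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % size)
    return bits


def dice(a, b):
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    key = b"shared-secret"  # hypothetical key, for illustration only
    smith = bloom_encode("smith", key)
    print(dice(smith, bloom_encode("smyth", key)))  # similar names score high
    print(dice(smith, bloom_encode("jones", key)))  # dissimilar names score low
```

Because similar names share most of their bigrams, their masked versions share most of their bits, which allows error-tolerant linkage without exchanging the names themselves. A basic encoding of this kind is not claimed to withstand the attack scenarios discussed in the talk.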
Professor Shlomo spoke on emerging challenges for anonymising statistical outputs from the National Statistical Institute (NSI) perspective. With increasing demands for more accessible and open data, new ways of thinking about disclosure risks are necessary. She introduced the computer science definition of differential privacy and the probability mechanism for perturbing tabular counts, and showed how this can be used as part of the statistical disclosure control (SDC) tool-kit at NSIs. In particular, an online flexible table builder, where users can generate and download tables of interest from secure microdata, is currently under consideration at the Office for National Statistics for its next census dissemination strategy. Introducing differential privacy to protect data subjects from inferential disclosure, where tables can be linked and manipulated to reproduce microdata, is an important step in securing the online system.
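As an illustration of how a differentially private mechanism can perturb tabular counts, the sketch below adds two-sided geometric (discrete Laplace) noise to each cell. This is one standard choice and is not necessarily the mechanism described in the talk. It assumes each individual contributes to exactly one cell, so that adding or removing a person changes one count by one, and it truncates negative results at zero as a post-processing step. The function names, the example table and the value of `epsilon` are hypothetical.

```python
import math
import random


def two_sided_geometric(epsilon, rng):
    """Draw integer noise Z with P(Z = z) proportional to exp(-epsilon * |z|).

    Generated as the difference of two independent geometric variables.
    """
    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return math.floor(math.log(u) / -epsilon)

    return geometric() - geometric()


def perturb_table(counts, epsilon, seed=None):
    """Return a noisy copy of a table given as {cell label: count}.

    Assumption: each person falls in exactly one cell (sensitivity 1).
    Negative noisy counts are set to zero, which is post-processing and
    does not weaken the privacy guarantee.
    """
    rng = random.Random(seed)
    return {
        cell: max(0, count + two_sided_geometric(epsilon, rng))
        for cell, count in counts.items()
    }


if __name__ == "__main__":
    table = {"area A": 120, "area B": 3, "area C": 0}  # made-up counts
    print(perturb_table(table, epsilon=0.5, seed=2019))
```

Smaller values of `epsilon` give stronger protection and noisier counts, so the choice of `epsilon` expresses the trade-off between disclosure risk and utility that an NSI would need to settle.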
The discussant, Professor Chris Skinner, raised relevant questions regarding the state of the art in linkage and anonymisation and opened the floor for questions from the participants. Although few in number, the participants asked about the need to introduce more perturbative methods for disclosure control in statistical outputs and about how this would affect future dissemination strategies.