During RSS Members' Week, the Official Statistics and Data Science Sections hosted a re-run of 'Data Science for Official Statistics: the story so far' from the 2018 RSS Conference, organised by Owen Abbott from the Office for National Statistics (ONS). The session was held in the fabulous St Martin-in-the-Fields church on Thursday 15th October at 6pm.
Tom Smith, managing director of the ONS Data Science Campus (DSC), chaired the meeting. He opened to a full house by emphasising that this session was about rising to the challenge of using data more effectively. Official statistics affect us all, so we must ensure they are robust, transparent and fit for purpose. The speakers would show how ONS is responding to these challenges by exploring new techniques and data.
The first speaker was Karen Gask, a data scientist within the ONS Big Data team. Karen outlined projects which explored the application of data science to official statistics. First up was the address index, an open-source address matching service designed to enable fast referencing of any source that contains an address. Her second example explored whether online job portals could replace or enrich the statistics on job vacancies currently provided by a survey. The conclusion was that, because of data quality issues, the most likely use of the portal data was nowcasting to improve timeliness. A spin-off from this work was the development of an ONS web-scraping policy. Karen then gave examples of Natural Language Processing (NLP), including automatically coding text data in the Crime Survey for England and Wales and in property description data from Zoopla. In summary, Karen described the journey as an evolution rather than a revolution.
Jasmine Latham, a DSC senior data scientist, then gave the audience a flavour of campus research. The campus works within ONS and externally on data science projects for the public good. Within ONS, projects range from using machine learning with image data to measure street-level vegetation to using unsupervised clustering to improve the search function on the ONS website. Externally, the campus and the Patent Office have analysed emerging technology trends through NLP of global patent applications. The Welsh Government and the campus have explored an application which maps access to services via public transport, allowing analysis of the population that can reach a sports stadium within 90 minutes.
Discussant Professor Suzy Moat from the Warwick Business School Data Science Lab talked about the commonalities between data science research in academia and in official statistics. Both are trying to use new sources of data to measure human behaviour in ways that have not been possible before. She congratulated ONS on its inspiring work, noting the care and passion demonstrated.
Tom then invited questions and discussion. Topics included the risk of manipulated data streams; what questions are being asked and why; what to do when different sources say different things and how they might be combined; error frameworks; the prioritisation of projects; and whether data science was a modern form of archaeology. A lively meeting ended with drinks and canapés.