How do we formulate policy to make the most of the swathes of ‘big data’ now being produced? And how do we use that data to formulate policy? These were the big questions discussed by the three speakers at a session on 'Big data for policy' at the RSS 2016 Conference.
Current policy around innovation makes little use of data, and this is something Stian Westlake, head of policy at Nesta, an innovation foundation, would like to change. In his talk, ‘Towards a strategic brain for industrial and innovation policy’, Stian described how the British government once recorded a great deal of data about industry, at both macro and micro levels. Since the days of the 1965 National Plan, however, we have gradually stopped gathering much of this micro-economic data.
There has been a recent move towards new industries, with the creation of flagship science projects such as the Graphene Innovation Engineering Centre. But data infrastructure has not kept pace with how technology is playing out, and data often fails to meet policymakers’ needs.
Stian gave examples including the Cambridge Cluster Map, a big data resource on businesses around Cambridge that helps government decide where to invest. While government has developed a new interest in data innovation, the ‘strategic brain’ of Stian's talk title is still required to take it forward.*
Peter van den Besselaar (pictured left) of RISIS (Research infrastructure for research and innovation policy studies) spoke about this EU data-linking project, which involves 12 countries. With governments making much of their data open by default, there is now a mass of data that could be linked to make it richer. The project’s Semantically Mapping Science (SMS) infrastructure uses data in different ways, such as integrating heterogeneous datasets to produce larger and richer data for social research. But there are challenges around this, including harmonizing different datasets – for example, standardising on CSV rather than Excel or PDF formats. Ensuring data quality is another priority, Peter said, as is safe data access.
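The kind of integration Peter described can be sketched in miniature: two heterogeneous datasets keyed by slightly different organisation names are harmonized and joined into one richer record. The datasets, field names and `harmonise` rule below are invented for illustration and are not drawn from the RISIS/SMS infrastructure itself.

```python
import csv
import io

# Two hypothetical sources in CSV form, with inconsistent name fields
funding_csv = """organisation,grants_eur
Univ. of Amsterdam ,1200000
TU Delft,800000
"""

publications_csv = """org_name,publications
univ. of amsterdam,340
tu delft,210
"""

def harmonise(name: str) -> str:
    """Normalise an organisation name so the two sources can be joined."""
    return name.strip().lower()

funding = {harmonise(row["organisation"]): int(row["grants_eur"])
           for row in csv.DictReader(io.StringIO(funding_csv))}
pubs = {harmonise(row["org_name"]): int(row["publications"])
        for row in csv.DictReader(io.StringIO(publications_csv))}

# Integrate the two sources into one richer record per organisation
linked = {org: {"grants_eur": funding[org], "publications": pubs.get(org)}
          for org in funding}

for org, record in linked.items():
    print(org, record)
```

Even in this toy case, the join only works because both sources were first pushed through the same harmonisation step – which is the point Peter made about standardised formats and data quality.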
Kaye Husbands Fealing (pictured right) of the Georgia Institute of Technology began her talk by describing the wide range of data that ‘big data’ can refer to – nanodata, microdata, unstructured data, real-time data, and business data such as that emerging from companies like Amazon, the Internet of Things and elsewhere.
This data is different from the data collected by government statistical services – the Bureau of Labor Statistics in the US claims to produce 'gold standard' data, and data from other sources are complementary. There are risks in depending on data from private companies, which could shut down tomorrow. Other risks include privacy issues in using this data.
Kaye talked of how new ‘smart cities’ across the world are trying to use ‘Internet of things’ data to, for example, reduce traffic congestion. However, funding is required to research this properly and to train people to use the data, and again, questions remain around privacy and the use of personal data. Kaye asserted that we need new techniques and strengthened analytics to make the best use of big data and assess its economic benefits.
A short discussion after the talks covered the importance of training statisticians in order to link data. Linking data will throw up errors, so we need ways of working around them. There are also concerns around ethical and privacy controls. The panel agreed that we need to be very concerned about privacy and confidentiality.
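The linkage errors raised in the discussion often come from approximate matching between registers that spell the same entity differently. A minimal sketch of where they creep in, using Python's standard `difflib` and invented register names and cutoff:

```python
import difflib

# Illustrative registers only – not real datasets from the session
register_a = ["Georgia Institute of Technology", "University of Cambridge"]
register_b = ["Georgia Inst. of Technology", "Cambridge University", "Nesta"]

def best_match(name, candidates, cutoff=0.6):
    """Return the closest candidate above the similarity cutoff, or None.

    Approximate matching like this is exactly where linkage errors
    appear: near-misses below the cutoff come back as None and would
    need review by a trained statistician.
    """
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

for name in register_a:
    print(name, "->", best_match(name, register_b))
```

Note that a reordered name such as ‘Cambridge University’ can fall below a character-similarity cutoff even though a human would link it instantly – one concrete reason the panel stressed training people to handle linkage errors.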
* An earlier version of Stian’s talk can be viewed in more detail on Nesta’s blog.