Recently the Open Data User Group published a position paper on the UK National Information Infrastructure (UKNII), in an attempt to engage wider interest and reinvigorate discussion. The UKNII was also covered in a recent all-star open data briefing at techUK in London.
There is one major barrier to this project that Cabinet Office seems to have shied away from in its initial draft of the UKNII. Of course there is still plenty of work to do in persuading government departments to unlock the data they hold. But some important datasets, of relevance to the daily life of the nation, are not held or maintained by government departments or public bodies - they are in the private sector.
I’m not talking here about commercially sensitive data that big businesses use to compete in the marketplace. Neither is this a broadside against the evils of privatisation - we are where we are. But the fact is that, in the UK at the moment, companies and business organisations have primary responsibility for maintaining key reference datasets on transport, utilities, banking, the food supply and in many other areas.
The open data community could make good use of some of those datasets, in apps and as a source of analytic insights. The arguments for release are not quite the same as for government data, because these datasets may not be directly funded from taxation. But selectively, as part of the UKNII conversation, I think we should be on the lookout for particular datasets that are (a) authoritative and irreplaceable in their domain, and (b) important to public transparency or the operation of services across that domain. In other words, datasets that we would normally expect government to make available if that area of the economy were state-run rather than market-led.
Last summer HM Treasury announced that banks and building societies would publish mortgage lending data based on postcode sectors. The outputs are not perfect - the banking industry hasn’t grasped all the nuances of open data release - but in principle this is a useful contribution to the body of national information.
Banking is to some extent a special case, because following the financial crisis of 2007-08 there is a broad consensus that lending practices need to be more transparent. But it is not difficult to identify other datasets held by the private sector that, arguably, should be more widely available for reuse.
Below are five examples of datasets that I personally would like to see released, either by the relevant business organisations on their own initiative or, even better, with coordination from government. These particular examples are datasets with a geographic theme.
1. Supermarket locations
All of the major supermarket chains maintain datasets with the names, locations and opening times of their stores, as the basis of “locator” or “finder” search functions on their websites. (E.g. Tesco, Morrisons, Lidl.)
There is a clear public interest in availability of a single combined source of data on these locations, so that people can more easily find their nearest sources for groceries (irrespective of brand). There is also considerable potential for analytic use of this data, as a basis for study of food supply chains, public health and accessibility of services.
At the moment bulk data on supermarket locations is included in Ordnance Survey’s Points of Interest product, but there is no open source. To see the difference this makes to the UKNII, have a look at DfT’s excellent accessibility statistics release: of the eight key services measured by DfT, food stores is the only one for which the underlying point data is not available to the public (“commercial data not available for release”).
Tony Hirst points out that an open list of supermarket addresses can be extracted from food hygiene ratings data published by the Food Standards Agency. Some of those records are a few years old but, in the absence of open data from the supermarket chains themselves, this is certainly a useful alternative.
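As a rough sketch of that approach, the snippet below filters a set of food hygiene records down to those classed as supermarkets. The field names (BusinessName, BusinessType, AddressLine1 to AddressLine4, PostCode, RatingDate) and the business-type label are my assumptions about the layout of the FSA data and should be checked against the actual download. The sample records are made up for illustration.

```python
# Assumed label for supermarkets in the ratings data; verify against the real file.
SUPERMARKET_TYPE = "Retailers - supermarkets/hypermarkets"


def extract_supermarkets(records):
    """Return name, address, postcode and rating date for supermarket records."""
    stores = []
    for rec in records:
        if rec.get("BusinessType") != SUPERMARKET_TYPE:
            continue
        # Join whichever address lines are present, skipping empty ones.
        parts = [rec.get(f"AddressLine{i}", "") for i in range(1, 5)]
        stores.append({
            "name": rec.get("BusinessName", ""),
            "address": ", ".join(p for p in parts if p),
            "postcode": rec.get("PostCode", ""),
            # Kept so that older records can be flagged or filtered later.
            "rating_date": rec.get("RatingDate", ""),
        })
    return stores


# Made-up records, for illustration only.
sample_records = [
    {"BusinessName": "Example Supermarket",
     "BusinessType": SUPERMARKET_TYPE,
     "AddressLine1": "1 High Street", "AddressLine2": "Anytown",
     "PostCode": "ZZ1 1ZZ", "RatingDate": "2012-03-01"},
    {"BusinessName": "Example Cafe",
     "BusinessType": "Restaurant/Cafe/Canteen",
     "AddressLine1": "2 High Street",
     "PostCode": "ZZ1 1ZZ", "RatingDate": "2013-06-15"},
]

supermarkets = extract_supermarkets(sample_records)
```

Keeping the rating date alongside each address makes it possible to see which records are a few years old, which matters when using this list as a stand-in for data from the chains themselves.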
2. ATM locations
In a similar vein there is no national open data available for ATMs, i.e. cash machine locations. At least one local authority (Sunderland) publishes a local list. However the national dataset is maintained by LINK, an industry-backed scheme. The public can search the dataset from the LINK website, but bulk data is available only on commercial terms.
As with supermarkets, access to free cash machines - particularly in rural and disadvantaged areas - is a matter suitable for public scrutiny. Policy on that issue should be supported by public availability of the data.
3. Post box locations
Release of the Postcode Address File (PAF) has long been a cause célèbre for the UK’s open data community - never more so than in the run-up to last year’s privatisation of Royal Mail. However PAF was not the only piece of information infrastructure lost to the public when Royal Mail moved into the private sector.
Royal Mail holds data on post box locations (and collection times) in two databases, the Central Collections Management Database and the Final Plate Database. Although this location data has never been explicitly released as open data, Royal Mail was compelled to provide access to it (in response to Freedom of Information requests) following an ICO decision in 2011. Third-party developers have made good use of the data.
Post-privatisation, Royal Mail is no longer subject to FOI. Official updates to post box locations are no longer available as free bulk data.
In my view the government should fix that problem by prevailing on Royal Mail to release bulk data on post box locations, under an open licence, as a function of the company’s special status as designated provider of the Universal Service.
4. Water company boundaries
Water company boundaries are a glaring exception. There are a few high-level raster maps around - on the Water UK and Ofwat websites, for example - but the boundary data itself is not readily available. That makes it difficult to analyse water company services and activities in conjunction with other location data, or to build apps that, for example, enable consumers to identify their likely water company by entering a postcode.
Individual water companies maintain their operating boundaries as polygons called Water Resource Zones. The boundaries are provided to the Environment Agency as part of the companies’ Water Resource Management Plans. For internal use the EA collates those polygons into a combined layer called Water Company Boundaries. However due to 'complex rights issues', i.e. water company ownership of the underlying data, the EA cannot release that dataset for external reuse.
It is unlikely that this boundary data has any great commercial value to the individual water companies. In my view the government should negotiate with Water UK, as representative of the water companies, for release of these boundaries as open data.
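If the boundaries were released as polygons, the postcode lookup mentioned above would be simple to build. The sketch below is a minimal illustration, not a production approach: it assumes each postcode has already been resolved to a single point (for example from an open postcode centroid dataset) and that each company boundary is a simple polygon in the same coordinate system. The company names, postcodes and coordinates are hypothetical, and real boundaries with holes or multiple parts would need a proper GIS library.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test. polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def likely_water_company(postcode, postcode_points, boundaries):
    """Return the first company whose boundary contains the postcode point."""
    point = postcode_points.get(postcode)
    if point is None:
        return None
    for company, polygon in boundaries.items():
        if point_in_polygon(point[0], point[1], polygon):
            return company
    return None


# Hypothetical boundaries and postcode points, for illustration only.
boundaries = {
    "Company A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Company B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
postcode_points = {
    "ZZ1 1ZZ": (3, 4),
    "ZZ2 2ZZ": (15, 5),
    "ZZ9 9ZZ": (50, 50),
}

company = likely_water_company("ZZ1 1ZZ", postcode_points, boundaries)
```

The result is only the "likely" company because a single point per postcode is an approximation: a postcode that straddles a boundary can contain addresses served by different companies.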
5. Mobile phone base stations
Information rights aficionados will be familiar with the extremely protracted battle to secure release of Ofcom’s Sitefinder database under the Environmental Information Regulations. This battle raged up and down the court system, from the original information request in 2005 until a final ruling in 2012 that required Ofcom to release the data.
The Sitefinder dataset is now available in bulk as well as via Ofcom’s search interface. I have released an enhanced and tidied-up version of the bulk data via ShareGeo Open.
Unfortunately the EIR outcome was rather a Pyrrhic victory. Ofcom has not updated Sitefinder since May 2012 and some of the mobile network operators had withdrawn their participation well before then.
However there is still substantial public interest in the locations of mobile phone base stations. (That’s why they’re so often built into church towers and steeples, or otherwise disguised within the built environment.) In my view the government should require network operators to make base station locations public, as a regulatory requirement rather than on the previous voluntary basis.
This article first appeared on Owen Boswarva's personal blog.
The views expressed in the Opinion section of StatsLife are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of The Royal Statistical Society.