Is the public’s date with data heading for disaster, or could it be a match made in heaven?
That was the question addressed in the first meeting since the launch of the Royal Statistical Society’s getstats campaign, jointly organised by the Society with the British Academy.
Professor David Hand, RSS president and BA fellow, opened by highlighting how much data is gathered about us each day, describing this as a long ‘data shadow’. Three of the examples he gave were supermarket loyalty cards; the use of data by the credit card company Capital One to run marketing ‘experiments’ to attract new customers; and recommender systems, such as those used by Amazon, which collect data and use it in ways specific to the individual.
Further into his presentation he discussed the potential for statistics to probe oddities in data, such as the distribution of votes between regions in the Russian presidential election, or the spread of the last two digits of votes cast among candidates in the UK general election. Staying with the voting theme, he briefly discussed the potential for political parties to target individual voters according to their very specific interests.
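The last-two-digits check Hand mentioned can be sketched as follows. This is an illustrative test of my own construction, not Hand's actual analysis: for genuine tallies the last two digits of vote counts should be roughly uniform on 00–99, which a Pearson chi-square statistic can probe.

```python
from collections import Counter

def last_two_digit_stat(vote_counts):
    """Pearson chi-square statistic for uniformity of the last two
    digits (00-99) of a list of vote counts.  For genuine tallies the
    statistic should be unremarkable against a chi-square distribution
    with 99 degrees of freedom (mean 99); a very large value suggests
    the digits are far from uniform and may merit a closer look."""
    counts = Counter(n % 100 for n in vote_counts)
    n = len(vote_counts)
    expected = n / 100  # uniform expectation per digit pair
    return sum((counts.get(d, 0) - expected) ** 2 / expected
               for d in range(100))
```

The statistic alone does not prove fraud, of course; it only flags digit patterns worth investigating.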
Hand turned at the end to the problems of data errors and of data falling into the wrong hands, highlighting recent losses by HMRC, the NHS and HSBC among a number of others. Hand concluded that data technology was neither good nor bad.
He was followed by David Spiegelhalter, Winton professor of the public understanding of risk. He described ‘wading’ through Human Fertilisation and Embryology Authority (HFEA) data, from which he produced a ‘funnel plot’ of observed against expected birth events. He noted how funnel plots provide a tool to easily identify which data points might be statistical outliers, in a way that league tables, for example, cannot.
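The idea behind the funnel can be sketched in a few lines. This is a simplified version assuming a Poisson model for the observed counts, not necessarily the method Spiegelhalter applied to the HFEA data: under that model the variance of the observed/expected ratio is roughly 1/E, so the control limits narrow as the expected count grows, producing the funnel shape.

```python
import math

def funnel_limits(expected, z=1.96):
    """Approximate 95% control limits for an observed/expected ratio,
    assuming observed counts are Poisson with the given expectation.
    var(O/E) ~= 1/E, so limits tighten as expected counts grow --
    the 'funnel' shape that league tables lack."""
    half_width = z * math.sqrt(1.0 / expected)
    return 1.0 - half_width, 1.0 + half_width

def flag_outliers(units):
    """units: list of (name, observed, expected) tuples.  Returns the
    names whose O/E ratio falls outside the approximate 95% funnel."""
    flagged = []
    for name, observed, expected in units:
        lo, hi = funnel_limits(expected)
        ratio = observed / expected
        if ratio < lo or ratio > hi:
            flagged.append(name)
    return flagged
```

The key contrast with a league table is that a unit with a small expected count needs a much more extreme ratio before it is flagged.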
He then discussed the work of Professor Sheila Bird and Clive Fairweather on military risks, explaining the development of the ‘micromort’ as a unit of risk of death. As an aside, Spiegelhalter noted that simply staying in hospital exposes a patient to around 75 micromorts per day.
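A micromort is a one-in-a-million chance of death, so converting a raw rate into micromorts is simple arithmetic. A minimal helper, with illustrative numbers only:

```python
def micromorts(deaths, exposed):
    """Express a death rate as micromorts: one micromort is a
    one-in-a-million chance of death.  For example, 75 deaths per
    million days of exposure is 75 micromorts per day."""
    return deaths / exposed * 1_000_000
```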
Finally he warned against ‘data dredging’, where data are over-interpreted: if you test enough things, you will find something significant, he said. As an example he showed funnel plots of ‘excess winter deaths’ (the difference between deaths in the six winter months and in the two three-month periods either side). The variation, while appearing significant, fell within what would be expected when set out on funnel plots, he said. These were good examples, he argued, of the need for a respected, high-profile statistical response to such claims.
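Spiegelhalter's point about testing a lot of things is easy to demonstrate by simulation. The sketch below is my own illustration, not from the talk: run many z-tests on pure noise and, even though every null hypothesis is true, roughly 5% of the tests come out ‘significant’ at the 5% level.

```python
import random

def dredge(n_tests=1000, n=100, seed=42):
    """Simulate data dredging: run n_tests two-sided z-tests of
    'mean = 0' on samples of pure standard-normal noise.  Every null
    hypothesis is true, yet about 5% of tests reject at the 5% level.
    Returns the proportion of 'significant' results."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = sum(sample) / (n ** 0.5)  # sample mean / (sigma / sqrt(n)), sigma = 1
        if abs(z) > 1.96:             # two-sided 5% critical value
            hits += 1
    return hits / n_tests
```

With 1,000 tests the proportion of false positives lands close to the nominal 5%, which is exactly why an isolated ‘significant’ finding dredged from many comparisons deserves scepticism.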
The final speaker was Simon Rogers, editor of the Guardian newspaper’s datablog and datastore. He explained how the Guardian was accessing and analysing publicly available official data, and then reporting it using innovative visualisation in order to stimulate not only public interest but public debate on what the figures meant. As an example of the usefulness of their work, Rogers cited the number of requests from government departments for copies of the Guardian’s graphical depiction of government spending.
It turns out people do want the figures behind the stories, Rogers said. He described what he saw as the ‘mutualisation’ of data – people working with the paper’s data and sharing the results. This had been put to use during the analysis of UK MPs’ expenses data: the sheer amount of data precluded a Guardian-only analysis, so the data had been turned over to its readers to work on.
Finally, he discussed the use of data made available by Wikileaks, with illustrations of data on IEDs (improvised explosive devices) plotted online over time, before turning to the publication by Wikileaks of data from US embassy communications.
Hear the full presentations and the question and comment session afterwards (links to the British Academy web site):