This session was chaired by the executive director of the RSS, Hetan Shah, and included a panel of Peter Diggle, Duncan Ross and Sylvia Richardson. The panel and audience discussed whether data science was a threat to statistics or a friend of it.
Sylvia Richardson (from the MRC Biostatistics Unit in Cambridge) began by asking whether ‘big data’ is really all that different from regular ‘data’. She highlighted the volume, velocity, variety and veracity of big data, which Diego Kuonen had discussed previously. She also asked whether data science is simply a rebranding of data mining.
She then looked at how data science is mostly interested in secondary observational data, whereas statistical science looks at primary data (from trials or surveys). She also revisited the pitfalls that Tim Harford had discussed in his earlier session on the same stage.
She finished by recalling David Hand’s point, made last year, that inference skills are vital in this area and that statisticians need to engage with computer science in the future.
Duncan Ross (from Teradata and the Society of Data Miners) then began his talk by contrasting this conference with the Strata conference he attended in Silicon Valley. He noted that the book stalls at the RSS conference carried many books on R, as opposed to the Strata stalls, where Python textbooks were far more common.
On the difference between data science and statistics, he commented that predictions in data science are not required to meet the rigorous standards expected by statisticians. Data science was more concerned with getting things done, whereas statistics was more concerned with understanding the data and the subject.
Peter Diggle (from Lancaster University and President of the RSS) began his presentation by pointing out that back in the 1970s, new statistical computing software was seen as a threat, much as data science is today, because it meant that anyone could potentially become an amateur statistician. He also argued that the embrace of data science by the worlds of politics and commerce had to be a good thing for evidence-based decision making.
He acknowledged that data science, with its combination of maths, statistics and computing, will change the syllabus of third-level statistics courses, because study design and coding skills will need a more prominent place in the future. He finished by stating that, in his opinion, the generic applied statistician was now an endangered species.
The chair then opened the discussion up for questions. The first question asked whether there was a conspiracy in the IT world to take over statistics. Duncan explained that this perception was down to IT departments having larger budgets. He added that it was difficult for companies to find the ‘unicorn’ who combines expert coding skills with expert statistical knowledge, so teams combining both skill sets needed to be assembled instead.
Robert Grant, in the audience, asked whether silos within academic statistics departments were a problem in this regard. Sylvia commented that statistics departments would need to be more collaborative and outward-looking in the future.
Terry Speed, also in the audience, cited the McKinsey report stating that the US economy ‘faces a shortage of 140,000 to 190,000 people with deep analytical skills’. Duncan acknowledged that, in his experience, there is definitely large demand in places like New York and California, but that these positions are being filled by people from many different scientific backgrounds rather than from pure statistics.
Kevin McConway asked how new modules on coding and other skills could be fitted into the already tight syllabus of statistics degrees. Peter commented that the British three-year degree was at a disadvantage here compared with the four- or five-year courses offered in the US and Europe.
The final questions centred on statistics’ reputation for highlighting uncertainty in results, and on what exactly there was to fear from data science. Peter commented that communicating uncertainty was more about honesty than about being vague. Sylvia said her biggest fear about data science was that dangerous conclusions would be reached by neglecting the necessary statistical thinking.