Did big data kill the statistician?

Written by Martyn Jones on . Posted in Features

Hold this thought: ‘There are big lies, damn big lies and big data science’. Statistics is a science; some argue that it is the oldest of the sciences. Its history can be traced back to the days of Augustus Caesar, and earlier. In 1998, Lynn Billard wrote a paper that laid out the role of the statistician and of statistics. She said, 'no science began until man mastered the concepts and arts of counting, measuring, and weighing'.

I first became aware of the role of the statistician when I was studying a combination of philosophy, politics and economics. Later, my first two managers were also enthusiastic and pedagogic members of the Royal Statistical Society (RSS), whose aim is 'advancing the science and application of statistics, and promoting use and awareness for public benefit'.

The RSS do a good job of raising awareness about statistics and statisticians, but maybe they aren’t getting enough people’s attention. After all, many people seem to think that statistical methods and quantitative analysis were born somewhere around 2001. Sorry to rain on anyone’s parade, but that is simply not the case.

To me a statistician is like a true artist. Let me explain what I mean by that. Picasso was perhaps the greatest painter of the 20th century. He is on record as saying that 'It took me four years to paint like Raphael, but a lifetime to paint like a child'. But that is not the same as a child painting with little or no technique, skill or experience.

Picasso projected the visions of a child through the hands of a genius. He could paint like Raphael, but also like 'a child'. He could paint like anyone. Many would argue that he was a true artist, which isn’t the same as splodging some random, colourful abstract shapes on a canvas. Doing that does not automatically make someone an artist, not in any modern formal sense. That said, in the age of postmodern nonsense, anything can be anything; that still does not make it a fact.

Those who watched the American television medical drama House might also make this connection. In the series, Hugh Laurie played the part of Dr Gregory House. In entertainment terms, Laurie convinced viewers that he was a credible physician. The only thing is, he wasn’t a physician. He was an actor pretending to be a physician, and he did a great job. He learned his lines well, and he knew how to interpret them to perfection. But as an actor, not as a doctor.

So why do we think big data is more than just a new name for a collection of old ideas, and why do we think that data science is forward looking and statistics is just dealing with the past? Why do we lend more credibility to rebranding than to historical fact?

More to the point, why do people clamour to define themselves as data scientists rather than take on the more recognisable, measurable and manageable role of a statistician: a modern statistician who can both interpret the past and try to forecast the future correctly?

I am well aware that there has been a proclivity to hire enthusiastic amateurs or certificate harvesters in place of trained, experienced and qualified professionals, especially if ‘the price is right’. But it is a proclivity firmly planted in the absurd, incoherent and irrational. It is as absurd as the dialectic notion that two-a-halfpenny qualifications are more important than knowledge and experience.

So, call me old fashioned, but when I need a haircut I will go to a barber, and not to a hair artiste or a mop-follicle scientist. When I need a person who really knows how to do a wide range of statistics, I will hire a professional and experienced statistician.

A good statistician will understand that 'not everything that counts can be counted, and not everything that can be counted counts'. The quote is variously attributed to Albert Einstein and to William Bruce Cameron. So, getting down to fundamentals: why would a statistician prefer to call themselves a data scientist, and why are some data scientists oblivious to, or misinformed about, the nature of contemporary statistics?

I think the biggest problem is in the way that the IT industry relentlessly flogs new fads. It’s new lamps for old, but no matter how much obfuscation and marketing is churned into the mixture, it’s still a massive dose of flimflam and hyperbole.

The other ‘big’ problem is how willing so many people are to jump on the flimflam trend wagon in order to wing their way into a ‘data scientist’ niche. Others rebrand themselves as data scientists in reaction to the IT industry’s crude ‘downgrading’ of the role of statistician, a downgrading quite often backed by a long concatenation of meaningless clichés, logical fallacies, inaccuracies and blatant misrepresentation. Using the past to predict or shape the future is nothing new, so why do people pretend that it is?

Finally, I think it’s clear where this is leading. My prediction for 2015 is that big data will not kill the statistician. My prediction for 2025 is that the ‘data scientists’ of the day will be criticising the next big-data-like fad, and especially its evangelists. Hopefully they will be able to make it clear that this is about something with a very long and rich history.

That said, I think the predicament and the ‘challenge’ we face with much of the industry hype and the unquestioning zeal of many big data and data science ‘evangelists’ can be summed up by two absolutely fabulous quotes from Ben Goldacre in his book Bad Science. 'These corporations run our culture, and they riddle it with bullshit', and 'You cannot reason people out of a position that they did not reason themselves into'.


This article first appeared on Martyn Jones' Good Strat Blog.

The views expressed in the Opinion section of StatsLife are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of The Royal Statistical Society.



Copyright 2019 Royal Statistical Society. All Rights Reserved.
12 Errol Street, London, EC1Y 8LX. UK registered charity in England and Wales. No.306096

