Last week, the Prime Minister's chief strategic adviser - Dominic Cummings - wrote a blog post that attracted a huge amount of media attention. He called for a radical new approach to civil service recruitment - suggesting that data scientists (among others) should play increasingly important roles.
But while they were top of Cummings' list, it was his call, later on, for more 'weirdos' in Whitehall which really caught the media's imagination.
Here, our Data Science Section responds by outlining some do's and don'ts when building a data science team.
For anyone kicking off the year with a new data science initiative, we applaud you! Embedding data and technology into decision-making processes can be a wonderful thing. To help you along your way, here are a few do’s and don’ts born of experience.
Don't... Assume R&D is easy
Do... Appoint a technical leader
If you’ve been tasked with managing this initiative, but you’re not an experienced data scientist, then you need someone who is. You need a team leader who lives and breathes selection bias and measurement bias, and who knows when a result is meaningless. Without this experience in your team, you will at best waste time and resources, and at worst create dangerously unsound technology.
Don't... Just hire weirdos and misfits
Do... Carefully craft your team
The notion that data scientists are geniuses who can solve all your problems, armed only with a computer and some data, is flattering - but ridiculous. Data scientists come in many flavours, with different interests and experience, and the problems worth solving require a team effort - with the best ideas coming from diverse teams who can communicate well.
Don't... Trust textbook knowledge alone
Do... Hire for experience too
There is data science knowledge you can glean from a textbook, and then there is the hard-earned stuff you learn from years of building models and algorithms with real data, implemented in the real world. Nothing makes you understand overfitting and the limits of theoretical models like living through that cycle a few (hundred) times.
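For readers who have not yet lived through that cycle, here is a minimal sketch of overfitting in Python. Everything in it is an assumption made for illustration: the toy data (a straight line plus noise), the helper names, and the choice of a nearest-neighbour model. With k=1 the model simply memorises the training data, so it looks perfect on the data it has seen and noticeably worse on data it has not.

```python
import random


def knn_predict(x, train, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(data, train, k):
    """Mean squared error of the k-nearest-neighbour model on `data`."""
    return sum((y - knn_predict(x, train, k)) ** 2 for x, y in data) / len(data)


def make_data(n, rng):
    # Assumed toy process: y = 2x + Gaussian noise with unit variance.
    points = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        points.append((x, 2 * x + rng.gauss(0, 1)))
    return points


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(400, rng)
test = make_data(400, rng)

# k=1 memorises the training set; k=15 averages over neighbours instead.
results = {}
for k in (1, 15):
    results[k] = (mse(train, train, k), mse(test, train, k))
    print(f"k={k:2d}  train MSE={results[k][0]:.2f}  test MSE={results[k][1]:.2f}")
```

The gap between training and test error for k=1 is the thing to watch. A model that scores perfectly on its own training data has told you nothing about how it will behave in the real world.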
Don't... Ignore ethical issues
Do... Take an ethics-first approach
Get ahead of any ethical and legal issues with your work, or the data you are using. Don’t assume it’s OK to do something just because you heard a Silicon Valley start-up does it like that.
Don't... Obsess over the latest academic papers
Do... Identify questions
Normal rules of business apply to data science: you want a return on your investment. Start by identifying the intersection of high-value business problems and the information contained in the data. You could ‘dart about’, trying out ideas from cool papers you’ve read, to see if anything useful comes out. But such unstructured work is akin to randomly digging for treasure on a beach. Get yourself a metal detector: identify business problems first.
Don't... Show off
Do... Keep it simple, stupid
Unless you have been specifically asked to build something superficially clever and incomprehensible (and this is a genuine objective for some), then you should use interpretable models first. Often this will be good enough. Only introduce complexity if you need to, and use a simple model as a baseline against which you can measure improvements.
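As a rough illustration of what a baseline looks like in practice, the Python sketch below fits a straight line (about as interpretable as models get) and scores it on held-out data against the most naive prediction possible, the training mean. The numbers and function names are invented for this example; the point is only the shape of the comparison, which any more complex candidate would then have to beat.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predictions, targets):
    """Mean squared error between predictions and observed targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Made-up illustrative data, split into a training set and a holdout set.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
holdout_x = [6, 7, 8]
holdout_y = [13.2, 14.9, 17.1]

# Naive reference: always predict the mean of the training targets.
train_mean = sum(train_y) / len(train_y)
mean_mse = mse([train_mean] * len(holdout_y), holdout_y)

# Interpretable model: a straight line whose slope and intercept can be read off.
slope, intercept = fit_line(train_x, train_y)
line_mse = mse([slope * x + intercept for x in holdout_x], holdout_y)

# Fraction of the naive model's holdout error removed by the line.
gain = 1 - line_mse / mean_mse
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"mean-only MSE={mean_mse:.2f}, line MSE={line_mse:.3f}, gain={gain:.1%}")
```

Any fancier model should be asked the same question on the same holdout data: how much error does it remove beyond the line, and is that gain worth the loss of interpretability?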
Don't... Propagate hype
Do... Manage expectations
So, you’ve been thrown some resources to set up a data science team and you’re embedded in an organisation that doesn’t necessarily understand what data science is. With such power comes responsibility! Avoid hype. Manage expectations. Help your peers and leaders understand what you are doing, and make sure they have input to it. This is a joint effort and they bring important domain knowledge. Agree on goals, and be transparent about progress.
Don't... Command and control
Do... Create a scientific culture
Do your team feel they can challenge the scientific views of the leadership—or are they scared of being ‘binned’ if they step out of line? Your team is on a mission to solve a problem, and it is unlikely the path will be an easy one. Your data scientists will spend most of their time stuck, navigating a sea of unknowns, while in pursuit of answers. Scientists need to be able to talk freely about what they do and don’t know, and to share ideas with each other without any sense of one-upmanship.
The Data Science Section of the Royal Statistical Society is run by data science leaders from large organisations such as the BBC, Economist, AstraZeneca, Unilever, and Oracle as well as several AI start-ups. To find out more about the section, please sign up to our email list and follow our activities on LinkedIn.