- feel comfortable working with incomplete and messy data
- be able to explore data and draw inferences
- be able to manage datasets
- be able to communicate their findings in ways that enable strategic and operational decisions in enterprises.
More evidence of the booming demand for data scientists has come from both an industry study and sign-ups for seminars at the Strata Conference, an event which majors on the data revolution.
In December, IT service company EMC published its Data Scientist Study 2011, which forecast a shortage of people with suitable skills for at least the next five years.
The study – which EMC said was the largest-ever global survey of the data science community spanning the US, UK, France, Germany, India and China – comes at a time when the volume of data being generated is growing exponentially.
It revealed “a rampant scarcity across the globe for the prerequisite skills necessary … to capitalize on the opportunities found at the intersection of Big Data and data analytics. Only one-third of companies are able to effectively use new data to assist their business decision-making, gain competitive advantage, drive productivity growth, yield innovation and reveal customer insights.”
Lack of training and lack of resources are the two biggest barriers to data science adoption. But the growth in data science is likely to drive demand for more scientists as analysis throws up new possibilities.
According to EMC, 65 per cent of data science professionals believe demand for data science talent will outpace the supply over the next five years, and most feel that new college graduates rather than “today’s business professionals” will be the best source of new talent.
The study found that currently, data scientists are significantly more likely to have advanced degrees than business professionals and come from more varied backgrounds, including computer sciences, engineering and natural sciences rather than business studies.
This diversity is reflected in the mix of skills required by individuals and the teams in which they characteristically work. In particular, those working with ‘big data’ need to:
In a report entitled ‘Career of the Future: Data Scientist’, the website mashable.com summarised EMC’s findings as an infographic.
Meanwhile, in the run-up to one of the data community’s biggest annual events, the O’Reilly Strata Conference from 28 February – 1 March, Tim O’Reilly tweeted that the “Data Science track … is by far the most popular”. The conference, to be held in Santa Clara, California, includes more than 30 sessions on all aspects of data science.
These include Hadoop and other big data technologies and techniques, the R language and environment for statistical computing, bootstrapping, data mining, machine learning, data journalism, visualisation tools and techniques, data cleansing, data linking, disambiguation, inference, microwork, crowdsourcing and one enticingly called ‘From Predictive Modelling to Optimization: The Next Frontier’.
Bookings are open for the Radical Statistics conference taking place on 24 February in London. This year it is hosted by the British Library with a challenging programme covering a range of statistical themes:
- measuring health – history and methods
- deception in medical research – scientific and regulatory failure
- deception in financial statistics – how this contributes to financial mayhem.
The conference gives an opportunity to learn how misleading statistics are used to bolster political preferences and how difficult issues can be demystified with clear statistics.
Anyone interested in research and statistics is welcome. The conference is neither technical nor limited to professional researchers. There are eight speakers and smaller group sessions, with lunch included.
Full details, including the programme, can be found on the conference 2012 page of the Radical Statistics website, where bookings can be made.
Adding her voice to concerns about the credibility of official inflation figures, the economist Kate Barker CBE has urged the UK Statistics Authority to encourage the ONS to conduct its investigation into the Consumer Price Index (CPI) “as swiftly as possible, but without skimping on sound consultation or on statistical resource.”
In a letter dated 6 January 2012, Ms Barker, who served on the Bank of England’s Monetary Policy Committee (MPC) from 2001 to 2010, said “it might have been preferable for the shift towards the wider use of the CPI to have been delayed until the known weaknesses of this measure had been addressed, and the differential with RPI [Retail Price Index] fully understood. Early resolution of, and honesty about, these issues seems vital to retain public confidence.”
Her views echo concerns expressed by the RSS, which wrote to the UK Statistics Authority in August 2010.
In December 2003 the CPI inflation rate replaced one based upon the RPI as the Bank of England’s monetary policy target. Subsequently, it has been applied as the statutory measure of inflation to revisions of public-sector pensions and state benefits.
Ms Barker’s letter highlighted some of the differences in how the two measures are calculated, specifically the CPI’s exclusion of housing costs, inclusion of university accommodation fees and its use of the ‘formula effect’ to reflect consumers’ substitutions between goods and services as relative prices changed. “Overall, there seemed little reason … to object to the use of CPI as a reasonable definition of inflation for use as the target variable for monetary policy,” she commented.
But she went on to catalogue some of the CPI’s disadvantages in the public perception of inflation. “Firstly, the rapid rise in house prices led to a view that the CPI was a misleading target for monetary policy. This view is largely mistaken, but the present composition of the CPI makes it hard to refute. Secondly, it added to existing confusion in commentary about wage increases, due to the unfortunate habit in some parts of the media of referring to any wage increase above the present inflation rate as ‘inflationary’ (which it obviously may or may not be).”
Ms Barker also drew attention to “the widening of the formula effect from 2010, increasing the gap between RPI and CPI has added to concerns”. The formula effect keeps the RPI inflation rate – which uses an arithmetic mean at the lowest level of aggregation – consistently higher than the CPI rate, which uses a geometric mean. At the time of the move of the inflation target to CPI, the RPI was about 0.75–0.8 per cent higher than CPI, but analysis from the Office for Budget Responsibility, published in November 2011, implies this gap could widen to 1.3–1.5 per cent.
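For readers who want to see why the choice of mean matters, the short Python sketch below compares an arithmetic and a geometric mean of the same set of price relatives (new price divided by old price). The figures are invented purely for illustration and the function names are our own; this is a simplified picture of the formula effect, not the ONS’s actual calculation, which involves further steps and weighting.

```python
from math import prod


def arithmetic_mean_index(relatives):
    """Arithmetic mean of price relatives (the RPI-style average described above)."""
    return sum(relatives) / len(relatives)


def geometric_mean_index(relatives):
    """Geometric mean of price relatives (the CPI-style average described above)."""
    return prod(relatives) ** (1 / len(relatives))


# Hypothetical price relatives for four items in one product group:
# a value of 1.10 means the price rose by 10 per cent, 0.95 means it fell by 5 per cent.
relatives = [1.10, 0.95, 1.02, 1.30]

arithmetic = arithmetic_mean_index(relatives)
geometric = geometric_mean_index(relatives)

# The "formula effect" in this toy example is the gap between the two averages.
formula_effect = arithmetic - geometric

print(f"Arithmetic mean of relatives: {arithmetic:.4f}")
print(f"Geometric mean of relatives:  {geometric:.4f}")
print(f"Gap (formula effect):         {formula_effect:.4f}")
```

For any set of positive price relatives the arithmetic mean is at least as large as the geometric mean, and the two coincide only when every price moves by the same proportion. The more the price changes within a group vary, the wider the gap becomes.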
“The question of whether the geometric mean is appropriate to use in the aggregation of prices turns on how far products within a particular grouping are substitutes for the consumer,” wrote Barker. She also highlighted the complexities of measuring housing costs, “as the rental and owner-occupied sectors of the housing market are not perfect substitutes.”
Furthermore, “the change to uprating pensions in some defined benefit schemes by CPI has changed the relationship in some actuarial calculations between projections of price and wage inflation. Alterations to the CPI may shift this relationship again, altering estimates of pension deficits or surpluses,” she observed.
Ms Barker acknowledged that “There is of course no perfect way to measure inflation, and … I certainly do not wish to imply that there should be a return to the RPI.”
But her concern was the possible implication that “the shift to the CPI as presently calculated means that the UK has gone from using an inflation rate which overstates the cost of living to one which understates it. It would therefore seem particularly urgent for the ONS to resolve the question of the choice of formula for all product groups. At present the ONS aims to have considered elasticity of substitution estimates by July 2012. It is very important that their conclusions can then be published for wider discussion and consultation before the CPI calculation is amended.”
Dates have been set for some of the most popular courses in the 2012 PDC course programme. Prices and booking details will be available later this month. For more details or to register your interest in a course, email the PDC.
presented by Roland Calcutt
Dates: Tuesday 27 March and Wednesday 28 March
presented by Ed Swires-Hennessy
Dates: Thursday 19 April and Thursday 11 October
Introduction to modelling in R
presented by Paul Baxter
Dates: Wednesday 2 May and Thursday 3 May
Inspirational leadership within a statistical organisation
presented by Duncan Miles and Denis Greer
Dates: Wednesday 12 September and Thursday 13 September
Developing and testing survey questions
presented by Pamela Campanelli
Date: Thursday 15 November
presented by Harvey Goldstein and George Leckie
Dates: Thursday 6 December and Friday 7 December
Other courses are in development and will be publicised when dates are fixed.
More than 70 teams of 11–16-year-old pupils from schools across the North of England have already signed up for a code-breaking competition that is part of the 2012 centennial celebration of the life and career of Alan Turing.
The Alan Turing Centenary Cryptography Competition started formally on 9 January and runs to 16 April; it is open to schoolchildren from anywhere in the UK in years 7-11.
Entrants must solve a series of cryptographic challenges based around the story of Mike and Ellie – two children who get caught up in the search for the missing ‘Turing Treasure’. There are six chapters to the story – a new one with associated puzzle starts every fortnight during the competition.
The competition is organised by the School of Mathematics at the University of Manchester, where Turing helped develop the earliest stored-program computers following his pioneering code-breaking work at Bletchley Park during WW2.
According to one of the organisers, Dr Charles Walkden: “You don’t need to be an expert mathematician or a computer programming whizz to take part, you just need to be good at problem solving and thinking logically. It is a great opportunity for children to solve mathematical puzzles in a fun way.”
The competition is sponsored by travel search site Skyscanner, whose founders Gareth Williams and Bonamy Grimes are graduates of the University of Manchester.
The prizes – Amazon vouchers for each team member – will be awarded at a ceremony on Tuesday 12 June 2012 at The University of Manchester, prior to the public lecture on Alan Turing by Dr Andrew Hodges. The Turing 100 international conference is being held at the same venue.