Online price checking, the ONS way

Written by Brian Tarran. Posted in Conference Blog

If a shopper wants to know how much something will cost, they head online to check. But for the Office for National Statistics (ONS), keeping tabs on the price of goods is a lot more complicated. Professional price collectors submit data once a month, on thousands of products bought from the same shops – month in, month out. That data makes up the Consumer Price Index (CPI), the UK’s headline measure of inflation. But that’s not to say the ONS isn’t experimenting with online price checking.

At RSS Conference this afternoon, Matthew Mayhew explained how the ONS is using web scraping technologies to collect price information from the websites of three of the UK’s major retailers, with a view to considering whether this sort of information could, one day, supplement – or even supersede – existing measures. When scraping the websites of Tesco, Sainsbury’s and Waitrose, the ONS looks for prices of items that match those in the current ‘basket’ of goods that is used to calculate the CPI.

Of course, this isn’t straightforward: out of reams of web code, only a few dozen characters will be relevant. And then there’s the problem of product choice: statisticians might want to know the price of an apple – and only an apple – but the scraper might return information on apple juice, or pre-prepared packs of sliced apple. Decimal points can also, sometimes, go astray, resulting in loaves of bread costing £100.
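To make those two cleaning problems concrete, here is a minimal Python sketch. It is not the ONS's method (Mayhew's talk describes cluster analysis); it is a simple rule-based illustration. The sample records, the exclusion terms and the helper names `matches_item` and `flag_outliers` are all hypothetical.

```python
from statistics import median

# Hypothetical scraped records: (product name, price in pounds).
scraped = [
    ("Braeburn apple", 0.45),
    ("Pink Lady apple", 0.50),
    ("Apple juice 1L", 1.20),
    ("Sliced apple snack pack", 0.80),
    ("Gala apple", 40.0),  # decimal point gone astray (should be 0.40)
]

# Assumed exclusion terms; a real list would need curating per basket item.
EXCLUDE = ("juice", "sliced", "pack")


def matches_item(name, item, exclude=EXCLUDE):
    """True if the name mentions the item and none of the excluded terms."""
    lowered = name.lower()
    return item in lowered and not any(term in lowered for term in exclude)


def flag_outliers(records, factor=10.0):
    """Split records into (plausible, suspect) by distance from the median price."""
    mid = median(price for _, price in records)
    plausible, suspect = [], []
    for name, price in records:
        if mid / factor <= price <= mid * factor:
            plausible.append((name, price))
        else:
            suspect.append((name, price))
    return plausible, suspect


# Keep only products that look like a plain apple, then flag odd prices.
apples = [r for r in scraped if matches_item(r[0], "apple")]
plausible, suspect = flag_outliers(apples)
```

Here the juice and the sliced pack are dropped by name, and the £40 apple lands in `suspect` because it sits more than ten times above the median price. Plain substring matching is crude (it would also accept "pineapple"), which hints at why a rule-based filter alone is unlikely to be enough.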

But despite the problems, and the limitations – such as the fact that the three currently tracked retailers account for only 50% of total market share, and do not include discount chains like Aldi and Lidl – the potential of online price checking is too good to ignore. Whether or not it becomes the dominant means of calculating CPI (and, by extension, inflation), web scraping would allow the ONS to track prices on a daily basis. And thanks to some data science know-how and some machine learning expertise, the accuracy and efficiency of the data collection process should continue to improve.

  • Matthew Mayhew’s talk, ‘Using machine learning techniques to clean web-scraped price data via cluster analysis’, was part of session 3.4 Contributed – Data Science: Applications to Online Data


Copyright 2019 Royal Statistical Society. All Rights Reserved.
12 Errol Street, London, EC1Y 8LX. UK registered charity in England and Wales. No.306096
