On Wednesday 18 April 2018, the RSS North Eastern local group hosted a meeting in the School of Mathematics, Statistics and Physics at Newcastle University. Dr James Sweeney, ICON Lecturer in Business Analytics at University College Dublin, gave an excellent talk titled 'House price modelling of the Dublin metropolitan area: what is the impact of an address?'
James opened his lively, engaging presentation with a thank you to the North Eastern local group, whose tweet advertising the meeting had led to a local company offering an expanded house price data set for analysis. During the talk, James described the limitations of the property price estimators currently used in Dublin, which are based on a small set of predictor variables, such as the number of bedrooms and bathrooms, and which do not quantify the uncertainty in their price predictions. Moreover, they make no allowance for the premium or discount that arises from 'postcode snobbery'. James presented an interpretable statistical model to address each of these issues, fitting it to a house price data set from the Dublin metropolitan area that contained a large number of predictor variables.
After running through the findings of this initial analysis, James went on to show the benefit of incorporating spatial structure in the errors. By allowing valuable information to be shared across neighbouring regions, the effects of various predictor variables were estimated more precisely, which clarified the interpretation of some aspects of the model. As an interesting aside, James demonstrated that estate agents' subjective estimates of the effects on house prices of additional bathrooms, bedrooms, and so on, were closely aligned with the effects his model predicted.
After the talk, there was some animated discussion, during which James took a number of questions from the audience. These were mostly concerned with the value of incorporating additional predictor variables, most notably interactions and a spatial trend. There was also some friendly debate around the relative merits of linear regression and tree-based methods.