On 17 October 2018, during the inaugural Members Week of the Royal Statistical Society (RSS), the RSS Merseyside Local Group hosted an event on Machine Learning. Around 42 people attended, just over a third of whom were RSS members. Attendees came from a range of institutions including Lancaster University, Liverpool Cancer Trials Unit, HMRC, the Health and Safety Executive, the Government Statistical Service, and departments across the University of Liverpool (including Biostatistics, Mathematics, Chemistry, Eye and Vision, and Pharmacology). The event attracted staff and both postgraduate and undergraduate students from the University of Liverpool.
The event started with a presentation from Professor Chris Williams of the University of Edinburgh. He began by defining machine learning as the problem of building computer systems that adapt and learn from experience. Machine learning has its roots in artificial intelligence, neural networks, statistics, statistical pattern recognition and adaptive control theory. Examples of applications include categorising documents using the 'bag of words' approach, handwriting recognition for cheques and handwritten envelopes, and robot inverse dynamics (calculating the torques needed for a robot to move correctly). Other areas defined and discussed included clustering (the discovery of new classes of infrastructures), principal component analysis (used, for example, in animating faces), unsupervised learning, and Quick Medical Reference graphical models (which provide inference for diseases given symptoms observed in GP surgeries).
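As a rough illustration of the 'bag of words' approach mentioned in the talk, the sketch below represents each document by its word counts, ignoring word order, and assigns a new document to the category whose example text it most resembles. This is not taken from the presentation: the function names, the cosine similarity measure and the example texts are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count word occurrences, ignoring order, case and punctuation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorise(document, labelled_examples):
    """Return the label whose example text is most similar to the document.

    labelled_examples is a hypothetical dict mapping label -> example text.
    """
    doc = bag_of_words(document)
    return max(
        labelled_examples,
        key=lambda label: cosine_similarity(doc, bag_of_words(labelled_examples[label])),
    )


# Made-up example texts, for illustration only.
examples = {
    "sport": "the team won the football match",
    "finance": "the bank raised interest rates",
}
label = categorise("interest rates at the bank fell", examples)
```

A practical document classifier would normally train a statistical model on many labelled documents; the nearest-example rule here is only the simplest way to show how word counts alone can carry category information.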
Chris went on to discuss the overlap between statistics and machine learning. He characterised statistics as focusing on models of data and their interpretability, and on the testing of hypotheses, whilst machine learning focuses on prediction and the analysis of learning algorithms. The talk concluded with an in-depth description of a project on recognising a 3D scene, such as coffee cups standing on a table, from a 2D image. He covered the variables involved, including illumination, the location of the observer and the types of objects present, as well as the deep learning methods and neural network architectures used to tackle such problems.
The second presentation was given by Juhi Gupta, a PhD student at the University of Liverpool, who spoke about applying machine learning techniques to preterm birth data. The main aim of the project was to identify biomarkers that predict preterm birth. A random forest was used to analyse transcriptomics data, specifically gene expression levels at 16 and 20 weeks' gestation. The data suggested that levels of selenium might be linked to preterm birth. This was reinforced by hierarchical clustering of patient expression profiles derived from the selenium pathway, which produced clear clustering of preterm versus term birth groups. Planned future work includes completing omics data collection for a validation cohort, prediction modelling of integrated multi-omics data, and a pathway analysis of the integrated data.
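For readers unfamiliar with hierarchical clustering, the following is a minimal sketch of the agglomerative form: each profile starts as its own cluster and the two closest clusters are merged repeatedly. The choice of Euclidean distance and average linkage, the function names and the numbers are assumptions for illustration; the talk did not specify these details, and the values below are invented, not study data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of row indices into profiles.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        pairs = (
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        a, b = min(
            pairs,
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Invented two-gene profiles for four hypothetical patients (not real data).
toy_profiles = [[1.0, 1.1], [0.9, 1.0], [3.0, 3.2], [3.1, 2.9]]
groups = agglomerate(toy_profiles, n_clusters=2)
```

With these toy values the first two and last two profiles end up in separate groups, which is the kind of separation the talk reported between preterm and term births.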
The final speaker was Professor Simon Maskell, who gave an overview of the Big Hypothesis project at Liverpool. The presentation began with a definition of Bayesian inference as a tool for making inferences and supporting decision making. Applications include detecting flu strains by monitoring GPs, port security, satellites crashing into debris, drug safety, and particle physics. In each case a model is needed that relates existing knowledge to the data, but fitting it often requires non-standard integrals, and a tool is needed to calculate them. Traditional Markov chain Monte Carlo (MCMC) generates its samples one after another on a single computer, so in general it cannot be parallelised to make it faster. One alternative is sequential Monte Carlo (SMC) sampling, which can be parallelised across computer cores and so shows promise as a faster option than MCMC. The Big Hypothesis project will integrate an SMC sampler into Stan, with the hope that model fits that would have taken six months can be completed within three minutes.
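To give a feel for why SMC lends itself to parallel computation, here is a minimal sketch of a tempered SMC sampler for a toy model. It is not the Big Hypothesis or Stan implementation: the model (a standard normal prior on a mean, unit-variance normal observations), the tempering schedule, the proposal width, the function names and the data are all assumptions. The reweighting and move steps treat each particle independently, which is the part that could in principle be spread across cores; this sketch runs them serially.

```python
import math
import random


def log_prior(theta):
    """Standard normal prior, up to an additive constant."""
    return -0.5 * theta * theta


def log_lik(theta, data):
    """Unit-variance normal likelihood, up to an additive constant."""
    return sum(-0.5 * (y - theta) ** 2 for y in data)


def smc_posterior_mean(data, n_particles=2000, n_steps=10, seed=0):
    """Estimate the posterior mean by moving particles from prior to posterior."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # prior draws
    betas = [k / n_steps for k in range(n_steps + 1)]  # tempering schedule
    for prev, beta in zip(betas, betas[1:]):
        # Reweight: one independent calculation per particle.
        log_w = [(beta - prev) * log_lik(th, data) for th in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        # Resample in proportion to the weights (multinomial resampling).
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Move: one random-walk Metropolis step per particle, targeting
        # prior * likelihood**beta; again independent across particles.
        moved = []
        for th in particles:
            prop = th + rng.gauss(0.0, 0.5)
            log_ratio = (
                log_prior(prop) + beta * log_lik(prop, data)
                - log_prior(th) - beta * log_lik(th, data)
            )
            if rng.random() < math.exp(min(0.0, log_ratio)):
                th = prop
            moved.append(th)
        particles = moved
    return sum(particles) / n_particles


# Hypothetical observations, for illustration only.
toy_data = [1.2, 0.8, 1.5, 1.1]
estimate = smc_posterior_mean(toy_data)
```

For this conjugate toy model the exact posterior mean is sum(data) / (n + 1) = 4.6 / 5 = 0.92, so the estimate can be checked against a known answer. In the problems described in the talk no such closed form is available, which is why a sampler is needed.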
The next RSS Merseyside Local Group meeting, 'The Hidden Statistician', took place on 11 December, showcasing the use of statistics in 'unexpected' places.