One such paper was published in the Royal Statistical Society's Series B journal in 2009. The paper, titled ‘Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations’, was written by Håvard Rue, Sara Martino and Nicolas Chopin. The approach would later become known by the abbreviation INLA.
Their new INLA approach offered a different way of computationally analysing models that are too complex for standard Markov chain Monte Carlo (MCMC) methods. Subsequently, an R software package was developed, enabling the method to be applied in a diverse range of fields, from econometrics to ecology.
In the years since its publication, new uses and tools for implementing the method have been developed, and new applications are being discovered all the time. An online community has also been set up to foster knowledge in the area.
Håvard Rue, who has led the project from the beginning, talked to us about how the method was developed and how interest in its various uses has grown since then.
Can you tell us a bit about your own education and research work in statistics? When did you begin to focus on Bayesian statistics?
I did an MSc in ocean engineering in 1988 at what is now the Norwegian University of Science and Technology in Trondheim. After 16 months of alternative military service, I started studying statistics and completed a PhD in 1993. My thesis was on the topic of statistical image analysis.
During that time, we used a lot of MCMC analysis, and the Bayesian approach was indeed natural. I also found the Bayesian approach much more convincing than the frequentist approach, as it is based on a set of first principles from which you proceed logically with a problem.
Back in 2009, when you and your co-authors wrote the paper for Series B, can you give me some background as to what prompted you to research the INLA method?
It came quite naturally. I started looking at Gaussian Markov random fields (GMRFs) around 1999, and especially at how to speed up computations using numerical methods for sparse matrices. The motivation was to speed up MCMC-based inference for related models, and also for models in which GMRFs played a part.
After working on various ways to block-update MCMC algorithms using these tools, it became clear that we could construct independence samplers for these models. I thought at that time that this would 'solve' the inferential problem, but the independence samplers were still far too slow.
This was despite using efficient numerical methods for sparse matrices throughout the code. Having a working independence sampler with a good enough acceptance rate, and then computing the posterior marginals from the proposal distribution, seemed like an obvious thing to do.
In fact, the last pages of the GMRF book I wrote with Leonhard Held in 2004 outline the initial ideas of INLA. After finishing the book, in January 2005, I started work on what is now the INLA approach. I did that with Sara Martino, a talented PhD student, and from that point on it was just a lot of work. Nicolas Chopin also joined the project a few months later, during a visit to Trondheim.
When was the R package for INLA first developed? Was this in response to outside demand for a way to implement the method?
The approach is very computational, and there is much more 'under the hood' than appears in the INLA paper. This relates to efficient calculations with GMRFs and various tricks to cut the computational cost. We have been very focused on this, as our intention from the very beginning was for it to be used on 'big models'. Well, 'big' has been redefined since then with the advent of 'big data'.
The INLA code is written in C, and was very tedious to use in the beginning, as the model itself also had to be provided in C. We realised that we needed a tool to help ourselves with this, and started to write an easier interface in which the model could be defined in a text file. We were quite happy with it, and we presented it to a colleague in Oslo, Arnoldo Frigessi. He told us we had to provide an interface in R. I recall we were not very happy with this, as neither Sara Martino nor I was particularly fluent in R. Anyway, Sara worked for two weeks to come up with a prototype of an R interface, and things developed from there.
INLA has since gone on to be used in various research projects around the world. Has the level of interest in the method surprised you?
The use of the software is now quite widespread, both geographically and across application areas. I must admit that others have seen the wider usage potential and encouraged us to add new features, give courses about its use, and so on. Leonhard Held from the University of Zurich was probably the first one to do so.
I think we have been too engrossed in the details to see the big picture. I had no idea that this would catch on so well and have so many widespread uses. We also have a parallel project with Finn Lindgren and Johan Lindström (RSS Series B, 2011) on representing Gaussian fields as GMRFs using stochastic partial differential equations. This means that spatial models can be represented and computed much more efficiently in R-INLA.
As a spin-off effect, this gives access to non-stationary models and spatial models on spheres, for example. These models and tools have now been integrated into R-INLA by Finn Lindgren (from the University of Bath), and they greatly expand the range of models we can easily fit with the package.
I must also mention Janine Illian (from the University of St Andrews), who from very early on promoted its use in the field of ecology and worked closely with us to develop the features needed to do that. These features are, of course, useful in other applications too.
I still get somewhat surprised when I see applications of INLA in areas I have never heard of and outside 'core statistics'. This demonstrates that what we are doing is important and has an impact on how people work with statistics.
One particularly nice example is an article this year in The Lancet, where the authors studied the changing risk of malaria in Africa. They used R-INLA for all the analysis and made clever use of the new methods we have developed. (Credit is due to one of the authors, D. K. Kinyoki, who visited us and learned a lot!)
It also reminds us that there is much more to do. This includes the 'problem' of defining prior distributions for such models, a part which is often neglected in practice, but also something that is very difficult. We have made real progress here, in my opinion, and we will use these results to develop R-INLA further in this direction with the kind support of the Norwegian Research Council.
Can you tell us about the R-INLA courses and lectures that you are now doing around the world?
We decided some years ago to offer 'free R-INLA courses' if there was enough interest, and these have gone on to become very popular. Essentially, if someone can assemble enough people who are interested, we will (try to) come and give a course or lecture series, paying our own expenses.
Although we are not selling anything, since R-INLA is free, this has proved to be a good 'business model'. It enables us to inform people in the research community about the new opportunities that R-INLA offers. The most recent course of this kind was a two-day event in November 2014 preceding the ‘First Sub-Saharan Africa Conference on Spatial and Spatiotemporal Statistics’, where two of our PhD students (Elias Krainski and Geir-Arne Fuglstad) gave a course on R-INLA.
Another highlight was a memorable ‘US tour’ we did in 2013, in which Daniel Simpson and I visited five top universities in the US, starting on the east coast and ending on the west coast. The news section of our R-INLA website lists all the courses we have given, along with announcements of new ones coming up in the future.