But first, a little background from my point of view. When statistics evolved as a distinct academic discipline in the mid-20th century, it was invariably in the shadow of the mathematics department. To be taken seriously as an academic, one had (and still has) to display the trappings of expertise and rigour. Yet this could be done either by lots of abstraction and mathematics, or by lots of application and real-life problems. Some of the greatest names of that era, like Frank Wilcoxon and George Box, learned their skills from application (as did Fisher and Gosset fifty years earlier), but mostly the maths won. It was the path of least resistance in a larger university setting, and that informed the teaching.
However, as the years went by, everybody wanted to learn some stats - economists, ecologists, archaeologists, doctors, you name it. But they typically weren’t so good at maths, so to accommodate these students, many introductory statistics courses for non-statisticians had their mathematical foundations mostly dumped and replaced with recipes. You all know the sort:
- Do a Kolmogorov-Smirnov test.
- If p<0.05, do a Mann-Whitney.
- If not, do a t-test.
- Either way, if p<0.05, you can say the ‘difference’ is ‘significant’ (but not worry too much about what that really means).
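For concreteness, the recipe above might be coded up like this. This is a sketch using scipy on simulated data, intended to show how mechanical the flowchart is, not to endorse it as good practice:

```python
# The cookbook recipe, mechanised (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(10, 2, 30)   # simulated measurements, group A
group_b = rng.normal(11, 2, 30)   # simulated measurements, group B

# Step 1: Kolmogorov-Smirnov test against a fitted normal, per group.
ks_a = stats.kstest(group_a, "norm", args=(group_a.mean(), group_a.std(ddof=1)))
ks_b = stats.kstest(group_b, "norm", args=(group_b.mean(), group_b.std(ddof=1)))

# Steps 2-3: if either KS p-value is below 0.05, do Mann-Whitney; else t-test.
if min(ks_a.pvalue, ks_b.pvalue) < 0.05:
    result = stats.mannwhitneyu(group_a, group_b)
else:
    result = stats.ttest_ind(group_a, group_b)

# Step 4: declare 'significance' either way, understanding optional.
verdict = "significant" if result.pvalue < 0.05 else "not significant"
print(verdict, round(result.pvalue, 4))
```

The code runs and produces an answer, which is exactly the problem: nothing in it requires the student to understand what any of the three tests actually assumes or tests.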
This gets people through exams, but leaves them with no idea what to do in real life, or worse, with inflated notions of their own competence. It's surface learning, not deep understanding. The choice between mathemania and cookbooks is the reason why people feel comfortable telling you that statistics was the only course they failed at university. Or, even more worrying, that they got an A but never understood what was going on.
The movement to reform introductory statistics courses centres on the American Statistical Association's Guidelines for Assessment and Instruction in Statistics Education (GAISE). This is the only set of guidelines on how to teach statistics, yet if you are a British statistics teacher you've probably never heard of them. They are fairly widely used in the USA, Australia and Canada, though not universally by any means. But they are wholeheartedly adopted in New Zealand, where they inform the national policy on teaching statistics. The principles are:
- Use real data, warts and all
- Introduce inference (the hardest bit) with simulation, not asymptotic formulas
- Emphasise computing skills (not a vocational course in one software package)
- Emphasise flexible problem-solving in context (not abstracted recipes)
- Use more active learning (call this ‘flipping’, if you really must)
Reversing the traditional syllabus
Most introductory statistics courses follow an order unchanged since Snedecor's 1937 textbook, the first to be aimed at people studying statistics (rather than at people learning how to analyse their own research data as an aside to their subject of interest). It may begin with probability theory, though sometimes this is removed along with other mathematical content.
But the problem is, without the maths, the role of probability is unclear to the student. It is at best a fun introduction, full of flipping coins, rolling dice and goats hiding behind doors. But the contemporary, vocationally focussed student has less patience for goats and dice than their parents and grandparents did. They want marketable skills.
Next, we deal with distributions and their parameters, which also introduces descriptive statistics (although the distribution is an abstract and subtle concept, and many statistics are not parameters). Again, the argument goes, once the properties of estimators were removed so as not to scare the students, it was no longer obvious why they should learn about parameters and distributions.
Then we move to tests and confidence intervals, though we may not talk much about the meaning or purpose of inference in case it discourages the students. This is where they are in danger of acquiring the usual confusions: that the sampling distribution and the data distribution are the same, that p-values are the chance of being wrong, and that inference can be done without considering the relationship between the sample and the population.
Comparison of multiple groups is then introduced and perhaps some experimental design. There may be some mention of fixed and random effects (but necessarily vague) before we move to the final, ‘advanced’ topic: regression.
The appearance of regression at the end is Snedecor’s choice. If presented mathematically, that’s probably the right order because it depends on other concepts already introduced. But if we drop the maths, we can adopt a different order, one that follows the gradual building of students’ intuition and deeper understanding. The maths can be brought in after the intuition is in place.
Andy Zieffler and colleagues at Minnesota have a programme called CATALST that does this. First it introduces simulation from a model (marginal then conditional), then permutation tests, then bootstrapping. This equates to distributions, then regression, then hypothesis tests, then confidence intervals, but sets aside formal definitions from the early part. This flips around Snedecor’s curriculum and was echoed in a different talk by David Spiegelhalter.
CATALST emphasises model plus data throughout as an overarching framework. Data may be expected to scatter above and below the model value(s), and by examining this, students start to think about the ideas of goodness of fit and model selection, long before these concepts are introduced formally. However, Zieffler noted that after five weeks the students do not yet have a deep concept of quantitative uncertainty (so don’t expect too much too quickly).
Spiegelhalter’s version is focussed on dichotomous variables: start with a problem, represent it physically, do experiments, represent the results as trees or two-way tables or Venn diagrams to get conditional proportions, talk about expectation in future experiments, and finally get to probability. Probability manipulations like Bayes or P(a,b)=P(a|b)P(b) arrive naturally at the end and then lead to abstract notions of probability rather than the other way round. Visual aids are used throughout.
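That progression, from physical counts, to conditional proportions read off a two-way table, to the probability identity arriving last, can be sketched with an invented screening example (all numbers here are hypothetical, chosen only to illustrate the route):

```python
# Hypothetical two-way table of counts: disease status vs test result.
# (Invented numbers, purely to illustrate the counts-first route to Bayes.)
table = {
    ("disease", "positive"): 9,
    ("disease", "negative"): 1,
    ("healthy", "positive"): 99,
    ("healthy", "negative"): 891,
}
total = sum(table.values())  # 1000 people in all

# Conditional proportion read straight off the table:
# of everyone who tested positive, what fraction has the disease?
positives = table[("disease", "positive")] + table[("healthy", "positive")]
p_disease_given_pos = table[("disease", "positive")] / positives

# The same answer via the identity P(a,b) = P(a|b)P(b), rearranged as
# Bayes' rule -- the abstract manipulation arrives only at the end.
p_pos = positives / total
p_joint = table[("disease", "positive")] / total
assert abs(p_disease_given_pos - p_joint / p_pos) < 1e-12

print(round(p_disease_given_pos, 3))  # 9 / 108, about 0.083
```

The student gets the surprising answer (most positives are false positives) from plain counting, long before anyone writes down a probability symbol.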
Inference by simulation
The GAISE recommendation on introducing inference is a particularly hot topic. The notion is that students can get an intuitive grasp with bootstrapping and randomisation tests far more easily than by asking them to envisage a sampling distribution (arising from an infinite number of identical studies, drawing from a population where the null hypothesis is true).
This makes perfect sense to us teachers who have had years to think about it. When you pause to reflect that I have just described something that doesn’t exist, arising from a situation that can never happen, drawn from something you can never know, under circumstances that you know are not true, you see how this might not be the simplest mental somersault to ask of your students.
A common counter-argument is that simulation is an advanced topic. But this is an accident of history: non-parametrics, randomisation tests and bootstrapping were harder to do before computers, so we had to rely on relatively simple asymptotic formulas. That just isn’t true anymore, and it hasn’t been since the advent of the personal computer, which brings home for me the extent of inertia in statistics teaching.
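To make the point about computers concrete, here is a minimal randomisation test, a sketch with invented measurements for the two groups. It would have been a day's hand-cranking in Snedecor's time; now it is a dozen lines:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([12.1, 9.8, 11.4, 10.9, 12.6, 11.8])  # invented group A
b = np.array([9.5, 10.2, 9.9, 10.7, 9.1, 10.4])    # invented group B

observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])

# Shuffle the group labels many times: if the labels don't matter,
# re-labelled differences should often be as large as the observed one.
count = 0
n_shuffles = 10_000
for _ in range(n_shuffles):
    rng.shuffle(pooled)
    diff = pooled[: len(a)].mean() - pooled[len(a):].mean()
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / n_shuffles
print(round(p_value, 4))
```

The p-value here has a direct physical meaning the student can see being built, the proportion of shuffles at least as extreme as the data, with no asymptotic formula in sight.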
Another argument is that the asymptotics are programmed in the software, so all students have to do is choose the right test and they get an answer. But you could also see this as a weakness. For many years statisticians have worried about software making things 'too easy', and this is exactly what that worry is about: that novices can get all manner of results out, pick an exciting p-value, write it up with some technical-sounding words and get it published.
As for bootstrapping, most of us recall thinking it was too good to be true when we first heard about it, and we may fear the same reaction from our students, but that reaction is largely because we had previously been trained in getting confidence intervals the hard way, perhaps via second derivatives of the log-likelihood function. I tell my students we’re doing the next best thing to re-running the experiment many times, and they accept that quite readily.
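That "next best thing to re-running the experiment" framing translates almost literally into code. A minimal percentile-bootstrap sketch, with simulated data standing in for one observed dataset:

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(50, 10, 40)  # stand-in for the one dataset we observed

# Re-run the 'experiment' by resampling the data with replacement,
# and watch how the mean varies from one pseudo-experiment to the next.
boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(5000)
])

# Percentile bootstrap interval for the mean: the middle 95% of
# the resampled means.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(round(low, 1), round(high, 1))
```

No log-likelihoods, no second derivatives: the interval is simply the spread of means across the simulated re-runs, which is exactly the story the students are told.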
Keeping quantitative skills up to scratch
Another new idea for me was that academic statisticians can provide effective refresher courses to colleagues in other academic disciplines, in commercial settings and in schools. I was really impressed by Esther Isabelle Wilder of CUNY’s project NICHE, which aims to boost statistical literacy in further and higher education, cutting across specialisms and silos in an institution.
It acknowledges that many educators outside stats have to teach some of it, that they may be rusty and feel uncomfortable about it, and provides a safe environment for them to boost their stats skills and share good ideas. This is a big and real problem, and it would be great to see a UK version. Pre- and post-testing among the participants showed great improvement in comprehension, and the project has to turn people away each summer because it has become so popular.
This is an edited version of an article that first appeared on Robert's blog.
The views expressed in the Opinion section of StatsLife are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of The Royal Statistical Society.