The current issue of The American Statistician is a special issue on statistics education in the context of undergraduate specialisation in statistics. Tom King looks at some of the findings presented in it, and considers the changing landscape of undergraduate statistical study.
In the US, there is a strong perception that because of data science, many more students are opting for statistics as part of their degree.
The US system of majors and BS specialisation means that many students can opt into some statistics content taught by statisticians. Where courses are not taught by statistics departments, there may well be some quality control or coordination by a college-wide committee. There may even be a compulsory statistics course for all students, similar to the statistical literacy offerings now appearing in the UK.
Many of the individual articles in The American Statistician are about data science or statistical computing, be it simulation, bootstrapping, data wrangling, machine learning or statistical programming. It is the 'analytics' aspect that seems to resonate with students, both because it involves working on real problems and because of its value for future employability.
The rise of statistical computing
According to new curriculum guidelines for undergraduate programmes produced by the American Statistical Association (ASA), computing has supplanted mathematics as the mode of abstraction in the statistics curriculum, and computing-based second courses take a variety of forms.
Nolan and Temple Lang (2010) advanced the case for statistical computing some years ago, and that case is now challenging established programmes. SQL and R are recommended in tandem; Python is acknowledged as a better foundation but is more likely to accentuate disparities in coding skill among students. Together these languages support the development of data-wrangling skills, from working with data as it comes through to visualisation.
Computation allows inference and data to be introduced to students gradually, rather than through proving theorems about sufficient statistics. A critical approach can also show that exact and randomisation- or bootstrap-based inference is a better approach than asymptotic theory (Nolan and Temple Lang now have a website supporting problem-focused teaching of all of these skills).
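To make the contrast concrete, here is a minimal sketch of computation-first inference of the kind described above: a percentile bootstrap interval for a mean and a two-sample randomisation test, with the textbook asymptotic normal interval alongside for comparison. It is written in Python using only the standard library; the articles recommend R and SQL, so the language choice is an assumption, and the function names and the two small samples are hypothetical, invented purely for illustration. It is not taken from Nolan and Temple Lang's materials.

```python
import random
import statistics


def bootstrap_ci(sample, n_resamples=5000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # seeded so classroom runs are reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Read off the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def normal_ci(sample, alpha=0.05):
    """Textbook asymptotic interval: mean +/- z * s / sqrt(n)."""
    n = len(sample)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * statistics.stdev(sample) / n ** 0.5
    m = statistics.fmean(sample)
    return m - half_width, m + half_width


def permutation_test(x, y, n_permutations=5000, seed=1):
    """Two-sided randomisation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(
            statistics.fmean(pooled[:len(x)]) - statistics.fmean(pooled[len(x):])
        )
        if diff >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (count + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical measurements for two small groups, for illustration only.
    x = [12.1, 14.3, 11.8, 15.2, 13.9, 12.7]
    y = [10.2, 11.5, 9.8, 12.0, 10.9, 11.1]
    print("bootstrap 95% CI:", bootstrap_ci(x))
    print("normal 95% CI:   ", normal_ci(x))
    print("randomisation p: ", permutation_test(x, y))
```

With samples this small the bootstrap and normal intervals can differ noticeably, and students can explore that directly by editing the data, which is the graduated, problem-first exposure the paragraph describes rather than a derivation of the asymptotic result.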
Working with real data
Statistical literacy typically focuses on interpretation after the fact, but a great deal of conceptualisation is required beforehand, not just in design and data collection but also in formulating the problem. Working with real data means understanding its context and what we want to learn from it, and embracing its intricacies to solve problems. The pedagogical challenge is to introduce these things to students in a manageable way that develops their ability and confidence over the course of a programme.
The old, established conceptualisation acknowledges the importance of the full data-handling cycle and the significance of substantive context, mathematics and computing. But the importance of inference in an epistemic sense is becoming more apparent. Breiman (2001) called this distinction between approaches the 'two cultures', but until students get to work with data, they may not belong to any culture of statistical practice.
The developing programmes show a clear focus on employability in data-driven businesses, and not just through optional courses or projects. This extends to alumni surveys and to engagement with the private sector to develop courses and lead sessions in summer schools or in what are called ‘capstone’ courses.
A capstone tops off the student experience by consolidating their learning with an in-depth piece of work on a real problem, usually in a group. This combines the dissertation/project element with a statistical consulting project so that the students don’t just advise a client but aim to produce a solution. This means they have to engage with a messy problem, messy data and a pragmatic output.
For all the obvious risks, capstones are well received by students and seem to be delivered successfully. They suit the US model, in which majors may have taken a variety of statistics courses and therefore need something flexible to draw them together. The focus on real problems also sidesteps the challenge of teaching students with widely varying levels of technical sophistication.
Extra-curricular data wrangling
It is possible to leave the more inferential thinking needed for scientific statistical practice until postgraduate study, as Peter Diggle suggested in his Presidential address. But in a contemporary context where data is ubiquitous, not all data science is likely to be left to those with higher qualifications. Indeed, there seems no reason why undergraduates, or even younger students, should not be doing some self-directed data wrangling.
It therefore seems a pity that none of the articles in The American Statistician mention what students do with their skills outside class. The new skills would have direct application in developing apps or in data journalism that students might take up in their spare time. Such uses could help explain the popularity and success of the courses, but if students are putting their skills to extracurricular use, the articles do not say so. Students' data science projects would make interesting showcases of professional prospects for institutional outreach, perhaps something else for a collaborative website.
Some authors in The American Statistician criticise the lack of programme evaluation, while others describe procedures for it. Identifying and mapping learning outcomes may sound obvious or bureaucratic, but it is this sort of exercise that brings employability into focus and pushes statistical computing and data management forward as priorities. It also challenges the use of completely sanitised datasets, or of no data at all, early in programmes.
The view that plenty of cognitive research has already been done perhaps reflects selection into the programmes. However, research for the Nuffield Foundation has shown that very little is known about how children develop conceptions of randomness, sample spaces and the like. It is therefore assumed, rather than known, that these fundamentals are established at entry.
Making basic probability and inference the first course faces a challenge, one on which the UK has led: statistical literacy is an obvious first course that can be open to all (in a way that probability theory never will be) and is far more accessible to students from diverse backgrounds than statistical computing. Indeed, the ubiquity of computing power means that much more statistical output is being produced that requires critical interpretation.
The views expressed in the Opinion section of StatsLife are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of The Royal Statistical Society.