The current issue of The American Statistician <http://www.tandfonline.com/toc/utas20/69/4> is a special issue on statistics education in the context of undergraduate specialisation in statistics. This exposes a tension between the various college types, from the four-year programmes of the elite Ivy League and liberal arts colleges to the associate degrees offered by smaller community colleges.
The system in the US of majors and BS specialisation means that many students can opt into some statistics taught by statisticians. And where the courses are not taught by statistics departments, there may well be some quality control or coordination organised by a college-wide committee. There may even be a compulsory statistics course for all students, similar to the statistical literacy offerings that are appearing in the UK.
The experience in the US is that many more students are opting for statistics as part of their degree, and there is a strong perception that this has a lot to do with data science. Many of the individual articles in the issue are about data science or statistical computing, whether that is simulation, bootstrapping, data wrangling, machine learning or statistical programming. And it is this aspect of ‘analytics’ which seems to be resonating with students, both in terms of utilising real problems and future employability.
There is a clear focus on employability in data-driven businesses as important to the academics developing whole programmes, not just optional courses or projects. This extends to alumni surveys and serious engagement with private sector actors to develop courses and lead sessions in summer schools or what they call ‘capstone’ courses.
A capstone tops off the student experience by consolidating their learning with an in-depth piece of work on a real problem, usually in a group. This combines the dissertation/project element with a statistical consulting project, so that the students don’t just advise a client but go through to produce a solution. This means they have to engage with a messy problem, messy data, and a pragmatic output.
For all the obvious risks, capstones are very well received by students and seem to be successfully delivered. They do of course fit the US model of majors who may have taken a variety of courses in statistics and therefore need something to draw it together in a way which is very flexible. The focus on a real problem sidesteps the challenge of having students who have a range of technical sophistication.
This is the germ of the special issue, which relates to the new (2014) curriculum guidelines for undergraduate programmes <http://www.amstat.org/education/curriculumguidelines.cfm> which the ASA have produced. A change is acknowledged: computing has supplanted mathematics as the mode of abstraction in the statistics curriculum. It follows that the mathematical route is no longer the natural one, and computing-based second courses are presented in diverse forms.
Nolan and Temple Lang (2010) <http://www.stat.berkeley.edu/~statcur/Preprints/ComputingCurric3.pdf> advanced the case for statistical computing some years ago, and it is challenging established programmes. SQL and R are recommended in tandem, with Python acknowledged as a better basis but more likely to emphasise the disparity of coding skills among students. These allow for the development of skills in data wrangling – working with data as it comes – as well as going through to visualisation.
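As a minimal sketch of what ‘working with data as it comes’ can mean at this level – in Python rather than the recommended R/SQL, and with an invented messy extract – even a first exercise involves normalising names, trapping missing values and rejecting non-numeric entries:

```python
import csv
import io

# A hypothetical messy extract: inconsistent case, stray whitespace,
# a missing value and a non-numeric entry in the numeric column.
raw = """name, score
 Alice ,  12
BOB,15
carol,
dave, n/a
"""

rows = []
for record in csv.DictReader(io.StringIO(raw), skipinitialspace=True):
    name = record["name"].strip().title()        # normalise the name
    raw_score = (record["score"] or "").strip()  # guard against blanks
    score = int(raw_score) if raw_score.isdigit() else None
    rows.append({"name": name, "score": score})

# Only the records with a usable score survive for analysis.
complete = [r for r in rows if r["score"] is not None]
print(complete)
```

The point of such an exercise is less the code than the decisions it forces: what counts as missing, and whether dropping incomplete records is defensible for the question at hand.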
In a graduated way, it is feasible for inference and data to be introduced to students through computation rather than proving theorems about sufficient statistics. And critical approaches can show that exact and randomisation/bootstrap-based inference is a better approach than the asymptotic theory. Nolan and Temple Lang now have a website providing for problem-focused teaching of all of these skills, to which they hope others will add: rdatasciencecases.org.
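To make the computational route concrete, here is a sketch of a randomisation test for a difference in two group means, using only the Python standard library; the samples are invented for illustration, and the null distribution is built by simulation rather than appeal to asymptotic theory:

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the simulation is reproducible

# Invented small samples: two groups whose means we want to compare.
group_a = [12.1, 9.8, 11.4, 10.9, 12.7, 11.0]
group_b = [9.2, 10.1, 8.8, 9.9, 10.4, 9.0]

observed = mean(group_a) - mean(group_b)
pooled = group_a + group_b

# Randomisation: repeatedly shuffle the group labels and recompute the
# difference, counting how often chance alone matches the observed gap.
n_reps = 10_000
count = 0
for _ in range(n_reps):
    random.shuffle(pooled)
    diff = mean(pooled[:len(group_a)]) - mean(pooled[len(group_a):])
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / n_reps
print(observed, p_value)
```

A student can see the logic of a p-value directly in the loop – no Central Limit Theorem required – which is precisely the pedagogical case the special issue makes for simulation-based inference.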
Lack of programme evaluation is criticised by some authors, but procedures for it are described by others. Identifying and mapping learning outcomes may sound obvious or bureaucratic, but it is this sort of exercise which makes employability a focus and presses statistical computing and data management to emerge as priorities. It also challenges the use of completely sanitised datasets, or no data at all, early in programmes, if the focus is no longer on mathematics.
The opinion that there is plenty of cognitive research done already, and that little more is needed, is perhaps a conclusion based on selection into the programmes. Research for the Nuffield Foundation <http://www.nuffieldfoundation.org/sites/default/files/files/Nuffield_CuP_FULL_REPORTv_FINAL.pdf> has shown that very little is known about how children develop conceptions of randomness, sample spaces and so on. It is assumed, therefore, that these fundamentals are established at entry.
Thus the presumption that the first course should be basic probability and inference meets a challenge in an area the UK has led. Statistical literacy is an obvious first course, one which can be open to all in a way that probability theory never will be, and which is much more flexible for diverse backgrounds than statistical computing. Indeed, the ubiquity of computing power means much more is being produced which requires critical interpretation.
Components of Practice
Statistical literacy typically focuses on interpretation after the fact, whereas a great deal of conceptualisation is required beforehand, not just in design and data collection but in problem formulation. Working with real data is about understanding its context, what we want to learn from it, and embracing its intricacies to solve problems. The pedagogical challenge is introducing these things to students in a manageable manner, and in a way that develops their ability and confidence over the course of a programme.
The old, established conceptualisation acknowledges the importance of the full data handling cycle and the significance of substantive context, mathematics and computing. But the importance of inference in an epistemic sense is becoming more apparent. It was a distinction between approaches that Breiman (2001) dubbed the two cultures <https://projecteuclid.org/euclid.ss/1009213726>, but until they have to work with some data, students may not have any culture of statistical practice.
Extracurricular Data Wrangling
It is possible to leave the more inferential thinking necessary to scientific statistical practice until postgraduate studies as Peter Diggle has suggested <Presidential Address>. But in a contemporary context where data is ubiquitous, it cannot be expected that all data science is left to those with higher qualifications. Indeed, there seems no reason why undergraduates, or even younger students, would not be doing some self-directed data wrangling.
It therefore seems a pity that none of the articles mentions what students do with their skills when they are not in class. The new skills would have direct application in the apps or data journalism that students are likely to be developing in their spare time. Extracurricular use would help explain the popularity and success of the courses, but if students are finding such uses, they go unmentioned. Data science projects would make interesting adverts for the professional prospects in institutional outreach – perhaps something else for a collaborative website.