There are few more controversial and high-profile applications of statistics than the assessment of secondary school performance. On 13 November, Dr George Leckie of Bristol University’s Centre for Multilevel Modelling nimbly guided a meeting of the Glasgow Local Group through the statistical complexities of this process, focusing on the Expected Progress (EP) measure used in England since 2011. The meeting, titled 'Monitoring school performance: A multilevel value-added modelling alternative to England’s ‘expected progress’ measure', was hosted at the MRC/CSO Social and Public Health Sciences Unit and chaired by Professor Alastair Leyland. It was attended by both RSS members and non-members, and attracted students from the High School of Glasgow accompanied by their maths teacher.
The aim of EP is to gauge the 'effect' of a school on its pupils’ academic attainment, measured by their performance in GCSE exams at around age 16. Judging schools on raw GCSE results would conflate the effect of a school with the prior attainment of its intake, biasing the system in favour of schools with a higher attaining intake. EP attempts to adjust for differences in potential by judging schools on their pupils’ progress, rather than attainment.
Potential on entry to secondary school is measured by Key Stage 2 (KS2) test scores. Both KS2 and GCSE results are converted to ordinal scales, and expected progress in a given subject is defined as achieving a GCSE score equivalent to KS2 level + 3. For example, expected progress is a grade D at GCSE for pupils starting from KS2 level 3, and a grade B for pupils starting at KS2 level 5. A school’s EP score is the percentage of pupils making the expected progress. An additional “floor standard” assessment is made by judging schools to be underperforming if fewer than 40% of pupils achieve at least five GCSE passes at grade C or better, unless the school achieves at least median EP in English or mathematics.
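The EP rule can be sketched in a few lines of code. The ordinal mapping of GCSE grades below is illustrative, chosen only to be consistent with the two worked examples in the text (level 3 → grade D, level 5 → grade B); the official points scales differ in detail.

```python
# Illustrative ordinal scale for GCSE grades, chosen so that
# KS2 level + 3 lands on grade D from level 3 and grade B from level 5.
GCSE_POINTS = {"G": 3, "F": 4, "E": 5, "D": 6, "C": 7, "B": 8, "A": 9, "A*": 10}

def made_expected_progress(ks2_level: int, gcse_grade: str) -> bool:
    """A pupil makes expected progress if their GCSE grade reaches
    at least KS2 level + 3 on the ordinal scale."""
    return GCSE_POINTS[gcse_grade] >= ks2_level + 3

def school_ep_score(pupils) -> float:
    """A school's EP score: the percentage of its pupils making
    expected progress. `pupils` is a list of (ks2_level, gcse_grade)."""
    made = sum(made_expected_progress(k, g) for k, g in pupils)
    return 100.0 * made / len(pupils)

# Hypothetical cohort of five pupils
pupils = [(3, "D"), (3, "C"), (5, "B"), (5, "C"), (4, "E")]
print(school_ep_score(pupils))  # 60.0: three of the five reach the threshold
```

Note how binary the rule is: the pupil at (5, "C") contributes nothing to the score despite a respectable grade, which foreshadows the first criticism below.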
While acknowledging the difficulty of constructing any measure of school effectiveness, Dr Leckie highlighted three statistical weaknesses of EP:
- The binary nature of EP creates perverse incentives by driving schools to invest in pupils close to achieving their expected progress, to keep them above the threshold. The EP system does not reward schools for improvements in the GCSE results of pupils who nevertheless fall short of EP, or for further improving the grades of pupils who easily exceed it.
- The assumption that units of progress have equal value regardless of a pupil’s starting level. For example, progress from KS2 level 3 to GCSE grade D is deemed equivalent to progress from KS2 level 5 to GCSE grade B, an equivalence that holds only if the relationship between GCSE and KS2 attainment is not merely linear but has a slope of one. Given that the KS2 and GCSE scales were not designed to correspond in this simple way, it is hardly surprising that this assumption turns out to be faulty: in fact, expected progress is easier to achieve for children who performed better at KS2, biasing EP in favour of schools with higher-attaining intakes. This is a serious failing given that one of the primary aims of EP is to adjust effectively for prior attainment.
- The government presents EP statistics as point estimates, without any indication of uncertainty due to factors such as sampling error. For example, a parent comparing three schools with small differences in EP has no guidance as to the reliability of the schools’ rank order. Is the top school intrinsically 'better' (in the narrow sense of its contribution to EP) than the other two, or was it simply lucky?
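A toy simulation (all numbers hypothetical) illustrates the last point: three schools with identical underlying effectiveness can report visibly different EP percentages through sampling variation alone, so small differences in published point estimates may carry no signal about rank order.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def simulate_ep(true_rate: float, n_pupils: int) -> float:
    """Observed EP %: each pupil independently makes expected
    progress with probability true_rate."""
    passes = sum(random.random() < true_rate for _ in range(n_pupils))
    return 100.0 * passes / n_pupils

# Three hypothetical schools with the SAME underlying effectiveness
# (a 70% chance per pupil) and 100 pupils each.
observed = [simulate_ep(0.70, 100) for _ in range(3)]
print(observed)  # three different percentages, purely by chance
```

A parent ranking these three schools on their published EP scores would be ranking noise.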
The remedies proposed by Dr Leckie fell into two broad categories: better modelling and more cautious interpretation. He advocates a multilevel value-added (VA) modelling approach to assessing school performance. In this approach, a pupil’s attainment at GCSE is modelled in a multilevel linear regression model as a flexible function of attainment at KS2. Random variation is modelled at two levels (hence the 'multilevel'), between pupils and between schools. This simplified summary of the model views a pupil’s GCSE grade as the sum of fixed and random factors:
- The fixed global mean GCSE grade
- A fixed flexible function of her/his KS2 grade
- Other fixed factors that predict GCSE performance, for example, deprivation
- The pupil’s random deviation from her/his expected GCSE grade
- The school’s random deviation from its expected mean GCSE grade
The last of these factors represents the value added by the school after the other factors (most importantly, prior attainment) have been taken into consideration, and can be straightforwardly estimated and presented as a measure of school performance. This multilevel VA approach goes a long way to addressing the three aforementioned flaws in EP because (1) it recognises that all progress has value, not just progress close to an artificial threshold, (2) the flexibility of the linear regression model framework allows relaxation of the unrealistic assumption of a one-to-one correspondence between GCSE and KS2 grades and (3) it is relatively straightforward to incorporate uncertainty in the VA estimates.
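The decomposition above can be sketched on simulated data. The following is not the full multilevel model Dr Leckie advocates (which would estimate both variance levels jointly and shrink noisy school estimates toward zero); it is a simplified two-step approximation, with all parameter values hypothetical: fit the fixed part by least squares, then take each school’s mean residual as its value-added estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 30 schools x 100 pupils (all parameters hypothetical)
n_schools, n_pupils = 30, 100
school_effect = rng.normal(0, 0.5, n_schools)        # school random deviations
ks2 = rng.normal(0, 1, (n_schools, n_pupils))        # prior attainment
gcse = (5.0                                          # fixed global mean
        + 0.8 * ks2                                  # fixed function of KS2
        + school_effect[:, None]                     # school's random deviation
        + rng.normal(0, 1, (n_schools, n_pupils)))   # pupil's random deviation

# Step 1: estimate the fixed part by regressing GCSE on KS2 over all pupils
X = np.column_stack([np.ones(gcse.size), ks2.ravel()])
beta, *_ = np.linalg.lstsq(X, gcse.ravel(), rcond=None)

# Step 2: a school's VA estimate is its mean residual after the fixed part,
# i.e. how far its pupils sit above or below expectation given KS2
residuals = (gcse.ravel() - X @ beta).reshape(n_schools, n_pupils)
va_estimates = residuals.mean(axis=1)
```

With 100 pupils per school the estimates track the true school effects closely; the standard error of each school mean also gives the uncertainty interval that EP point estimates lack.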
Finally, Dr Leckie stressed the importance of recognising the limitations of all methods for assessing school performance, and the consequent need for caution in interpreting the resulting scores.
Further details of his research can be found at: http://www.bristol.ac.uk/cmm/research/mm-gov-new-school-performance/.