On 26 October 2017, the Social Statistics Section met to consider the Design and Methodology of Large Longitudinal Studies. Around 40 people attended to hear the five speakers and the panel discussion. Copies of the presentations and an audio recording of the proceedings are available from the event page.
The event began with three presentations on how administrative data could be used to fulfil the role that longitudinal surveys had pioneered. These were followed by two presentations on more traditional approaches, based around cohorts and panel surveys respectively. The event concluded with a panel discussion, including news from the Wellcome Trust about its new model for funding such studies.
Piero Falorsi, from the Italian National Institute of Statistics (Istat), spoke about how access to longitudinal administrative records was allowing estimation of error. Istat has been working to integrate social, place and economic concepts in a way that keeps track of the statistical units and population coverage. Assessment shows that there are errors in both directions and that the demographic accounting approach is not reliable. Longitudinal data can borrow strength by looking at latent variables to understand the linkage relations; this is possible because Istat has access to all of the data. However, much more research is still needed.
Lorraine Dearden, who is affiliated to both University College London and the Institute for Fiscal Studies, looked at how cohort survey data could be used in combination with administrative sources. While admin data covers the whole population, it can contain certain biases and has a limited range of covariates. Cohort studies surveyed in waves have to rely on recall of when critical events took place. Putting this information together into a form that supports research is essential. Present practices, such as restrictions on retaining data and putting researchers in competition rather than encouraging them to share, were highlighted as holding back research.
Ruth Gilbert, also from UCL and deputy director of the Administrative Data Research Centre for England, highlighted the potential of service delivery data and the possible pitfalls of linkage processes. Admin data often lacks socio-economic content. There are also problems with social coverage when linkage relies on names, because unusual spellings arising from transcription from other cultures can be poorly matched. Ruth proposed that the NHS population demographic service offered an ideal population spine for England; one is already available in Scotland. However, there are issues around regulation, and a framework needs to be developed that enables analysts to work alongside data custodians.
George Ploubidis, again from UCL, talked about the work of the Centre for Longitudinal Studies in understanding missing data profiles from the national cohort studies. Comparing imputed patterns with those seen in the population showed impressive correspondence in the distributions, and this was sustained in interactions and time-varying patterns. He went on to outline some of the work on causal inference, a particular challenge for observational studies, even those with strong designs. One method is the use of negative controls: variables for which no effect is expected, so that a statistically significant result would suggest a problem with the data. An audience question pointed out that technical understanding needs to be complemented by an understanding of substantive theory in this work, for example in specifying what is suitable as a negative control.
Peter Lynn from the University of Essex drew on his experience with the Understanding Society longitudinal study. He described some of the practicalities of achieving population coverage and representation, such as offering incentives to participants and using online responses to reduce costs. He explained the challenge of maintaining survey quality while preserving the scientific opportunities offered by methodological experiments. He concluded that running these large surveys is intensive work, but that they hold a wealth of analytical potential and are a unique national resource.
The panel discussion that followed covered access to admin data, which has been an ongoing challenge for many. The RSS Social Statistics Section argued that studies need good social coverage of a defined population, and that this is reliably achieved by random sampling and complex analysis.
Erica Pufall from the Wellcome Trust explained how the Trust's new call for large longitudinal population studies would support both data collection and infrastructure projects. A sum of £5m over five years for new and continuing studies (with additional funds possible from the MRC) will put existing ad hoc grants on a more formal footing. An additional £1.5m over a similar period will support other aspects such as networks, infrastructure and complex methods development. The latter clearly offers a new opportunity.
Harvey Goldstein of the University of Bristol reflected on his experiences analysing the effects of smoking in pregnancy. Random sampling is taken for granted in social surveys, but of the large UK cohort studies, only the MCS used this approach. Analyses of maternal smoking showed inconsistent effects on neonatal survival, but this was because the mechanism, a reduction in birthweight, was not known. As both high and low birthweight pose risks, and population averages vary depending on social factors, a simple analysis can show positive, negative or neutral effects. Other panel members said this showed the need for complex analyses to understand mechanisms. Indeed, the need for replication and for repeated specific longitudinal studies to understand heterogeneity across study populations might well be met by ensuring population coverage in the first place.
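The birthweight point can be made concrete with a small calculation. The sketch below is purely illustrative and is not taken from the talk: it assumes a U-shaped mortality risk around a hypothetical optimum birthweight and a fixed hypothetical reduction in birthweight from smoking, and it compares only population means rather than full distributions. All function names and numbers are invented for the example.

```python
# Illustrative only: every number and name below is a hypothetical assumption,
# not an estimate from any cohort study.

def mortality_risk(birthweight_g, optimum_g=3500.0, baseline=0.005, curvature=2e-8):
    """U-shaped risk: lowest at the optimum, rising for lighter and heavier babies."""
    return baseline + curvature * (birthweight_g - optimum_g) ** 2

def smoking_risk_difference(mean_birthweight_g, reduction_g=200.0):
    """Risk for smokers minus risk for non-smokers, evaluated at the population mean.

    Smoking is assumed to act only by lowering birthweight by reduction_g.
    """
    smokers = mortality_risk(mean_birthweight_g - reduction_g)
    non_smokers = mortality_risk(mean_birthweight_g)
    return smokers - non_smokers

# Three hypothetical populations whose mean birthweights differ for social reasons.
for label, mean_g in [("low mean", 3300.0), ("middle mean", 3600.0), ("high mean", 3900.0)]:
    diff = smoking_risk_difference(mean_g)
    if abs(diff) < 1e-12:
        direction = "no apparent effect"
    elif diff > 0:
        direction = "apparently harmful"
    else:
        direction = "apparently protective"
    print(f"{label}: risk difference {diff:+.4f} ({direction})")
```

Under these assumptions the same mechanism yields a harmful, neutral or protective apparent effect depending only on where the population mean sits relative to the optimum, which is why an analysis that ignores the mechanism can give inconsistent answers across study populations.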
Ray Chambers, of the University of Wollongong and a member of the ESRC international review team, highlighted the rich heritage of UK longitudinal studies and the need for sustained investment to maintain it, comparing it with what was available in Australia. Piero Falorsi explained that while Istat had access to administrative data, researchers in universities and elsewhere faced extensive barriers in terms of rights and practicalities. In Denmark, however, all of this information is readily available, supported by strong penalties for inappropriate usage.
Barriers to accessing UK administrative data were identified by many speakers, who had experienced delays or even rejections when seeking permission. Regulations are applied more restrictively than needed. For example, health data may not be used for non-health research, analysts are unable to take part in the linkage process, and data is destroyed after use.
The RSS maintains an interest in the bigger picture of why research using longitudinal data matters. This extends to making the case publicly for what cannot be known without such data, and for the importance of complex analysis in reaching a robust understanding. The Social Statistics Section would be pleased to hear from members about their views and experiences, not least examples of what has been learned from these studies.