The ESRC has developed a wide portfolio of data resources, from major longitudinal studies and the birth cohorts housed by the Centre for Longitudinal Studies, through to cross-sectional studies such as the British Election Study. These are part of an incredibly rich social science data infrastructure available to researchers in the UK, and the goal is to broaden the use of these resources beyond their primary objectives.
Paul Meller is the head of data and resources at the ESRC. As the SDAI enters its third phase, we asked him what the scheme has achieved so far and how it enables secondary data to be researched in innovative ways that develop new methodologies and the capacity to use them within the research community.
What has been the impact of secondary data use, and why would this not have been possible through any other research means?
So far two phases of the SDAI have been commissioned, enabling us to fund 79 successful grants, and phase three is currently underway. Researchers in the first two phases have worked with a variety of partners, from the Department for Education and Bank of England to the World Health Organisation and the OECD. They have drawn on data ranging from the British Election Studies and birth cohorts to the Census and Labour Force Survey.
Alongside addressing key social science research questions, the projects highlight how these data resources can be used to address interdisciplinary challenges at the boundaries with other scientific areas.
The impact from these projects covers a diverse range of areas, from advising the Home Office’s Office for Security and Counter-Terrorism on attitudes to ethnic change to the creation of the first comprehensive Longitudinal Business Structure Database of firms and local units, which uncovered the factors that are influencing economic growth and job creation in England’s 39 Local Enterprise Partnerships. We have just published details of the research topics tackled by the scheme so far.
One of the key features of secondary analysis is that it is an efficient way to conduct research, drawing on existing resources rather than relying on primary data collection for each new project. Without secondary analysis of these kinds of resources, it would not be possible for individual research projects to investigate research questions at the same scale or in the same depth given the breadth of topics, and projects would be much more lengthy and costly. Some of the research funded through SDAI was also made possible by drawing on or combining different secondary data sources.
What constraints are there on the types of secondary data and projects that researchers can pursue?
This has varied across the phases as SDAI has evolved. The first two phases were kept very open in terms of theme and dataset. For phase three, the decision was taken to focus attention on key ESRC data resources due to their richness and to maximise the use of our own data resources – including data accessed through the recent investment in the ESRC Big Data Network.
However, the open approach to research areas and themes was kept for phase three as well. Across all three phases, an open approach to the development of new methodologies and partnership with non-academic organisations has also been maintained.
Have there been any difficulties with engaging external organisations in order to gain access to secondary datasets?
Many projects funded through the SDAI have accessed data through the UK Data Service (UKDS), which plays a major role in supporting researchers wanting to get access to a wide range of social and economic data resources. Much of the work involved in negotiating access to different data resources, working with data owners and other stakeholders, is therefore already done by the UKDS.
However, particularly where this support and structure is not in place, some SDAI award holders have experienced issues getting data from the owners as promised and have received data which does not have the depth of information expected.
Outside of the data accessible through the UKDS, there are also issues relating to data owners charging for their data or imposing different conditions or delays on access. In the case of data held by government departments, some commercial organisations and local authorities, the ESRC’s investment in the Administrative Data Research Network and Business and Local Government Data Research Centres is aimed at helping to address this and support researchers looking to access these resources.
With regard to early career researchers, how successful have you been in developing capacity that is leading to insightful and interdisciplinary analyses?
One of the core aims of the SDAI has been to develop high quality data analysis capacity in the community and use the variety of rich data resources available. In the first two phases, the involvement of early career researchers was encouraged and the projects funded have generated a number of positions for postdoctoral researchers, enabling them to develop experience of secondary data analysis with a view to leading such projects in the future.
To further catalyse this process, there is a requirement for all applications to phase three to include an early career researcher as a principal investigator or co-investigator, and to include a programme of training and development for those researchers. It is too early to tell the long-term effect of this policy and investment, but it is hoped that further cohorts of researchers will be generated with the skills and experience to exploit a wide variety of data resources.
In terms of interdisciplinarity, the range of data resources available and the open approach to topics and themes have meant that the majority of applications (around two-thirds across all three phases) have been interdisciplinary in nature – both between different social science disciplines, and at the boundaries between the social sciences and other scientific areas.
What difficulties have there been in understanding the constraints of secondary data analysis and the complexity of the output inferences?
The SDAI is open to applications drawing on data that have not been collected for a specific purpose or to answer a particular question, such as our portfolio of major longitudinal studies. These studies were developed and designed to enable a wide range of research rather than being theory-bound.
The Initiative is also open to applications using data not originally collected for research purposes, and the ESRC Big Data Network was established to facilitate access to these types of data – whether administrative data within government departments or data held by commercial organisations.
Clearly there are challenges associated with this type of data and applying it to particular research questions, including issues around data quality and bias. However, given the scale and variety of some of these data, and the range of possibilities they provide to look at previous questions in a new light or to pose new questions, the opportunity should not be ignored.
With regard to biosocial science, is there confidence that any of these secondary analyses are going to work, in that we are collecting the right data?
Our framework for biosocial research set out our ambitions for the engagement of the social sciences with biosocial research. We are investing in a rich and diverse range of resources for biosocial research and recognise the need to ensure that social scientists realise the huge potential of this biosocial data emerging from longitudinal studies. The types of biological measures being collected alongside our major longitudinal studies are well used in a wide range of research areas.
For instance, we are already seeing exciting new interdisciplinary research being conducted using the enhancements to major longitudinal studies, such as Understanding Society, through other mechanisms. You can hear more about the Understanding Society biomarker and genetic data by listening to the Understanding Society podcast.
In terms of the SDAI, none of the grants made so far have made use of the various biosocial data resources available. This is clearly a novel area of research and undoubtedly comes with its own methodological and capacity building challenges. However, there is no evidence to date that secondary analysis of these resources is ineffective, and it has exciting potential to underpin high quality, impactful research. The Initiative provides funding for the exploitation of such data resources, encouraging applications in interdisciplinary areas such as biosocial research.