Think back… how often in debate – on the radio, on TV – have you heard people state that “x is linked to y”? ‘y’, for example, could be cancer or the economic slump, and ‘x’ anything from pollution levels to bacon consumption, low confidence to the weather. In saying that the two are linked, they are really only referring to an association, a statistical pattern, between them. But the suggestion, sometimes implicit, sometimes explicit, is that ‘x’ causes ‘y’.
But where is the evidence? The job of statistical tests is to tell us whether a correlation between two measurable things (what statisticians term ‘variables’) could plausibly be down to coincidence or is statistically significant. But even a strong, statistically significant correlation does not prove causation.
The much-used phrase “correlation does not prove causation” is, in itself, correct, but when used as ammunition in debate it often marks the end of informed discussion, as participants struggle to think where and how to move things forward. That’s because they are usually talking about causation but basing their arguments on correlational data, and it’s a big leap from finding that there is an association between two things to knowing that one actually causes the other.
Our advice? Above all, don’t crumble in the face of debate around correlation and causation. Just as statisticians don’t rely on one-off yes/no tests, keep probing and keep building up your evidence incrementally, and you’ll get nearer to the truth of the matter.
Basically, bringing ‘correlation’ into a debate is pointless unless you know something about the strength (how close) and the direction (positive or negative) of the relationship between the two things you are discussing.
In short, you need to know:
whether there is positive correlation (i.e. as one thing increases, so does the other) or negative correlation (as one increases, the other decreases)
the correlation coefficient (symbolised as ‘r’ – a figure which will always be between -1.0 and +1.0), which tells you how close the association is between the two things (when there is a causal link, this measure can begin to help you make predictions and plan interventions… a very simple example would be that of an electricity plant planning for higher output during a cold spell, based on the correlation between cold weather and high demand for electricity)
the regression coefficient: again, provided there is a causal relationship between the two things, this measure tells you how much one thing will change, on average, when the other changes by one unit
It’s also useful to know the effect size, e.g. there might be a strong correlation between two things but, in practice, the actual numbers involved (imports and exports, sick people and so on, depending on the debate) might be very small. Then again, small numbers might still be very important and carry huge impact.
If you know these things then you’re not only in a strong position to talk about correlation and causation, you’ll also have useful evidence to help you begin to plan how you are going to tackle the problem or issue you’re facing.
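To make the correlation coefficient and the regression coefficient less abstract, here is a minimal sketch in Python that works both out by hand for the cold-weather example. The temperature and demand figures are made up purely for illustration, and the function and variable names are our own assumptions rather than anything standard.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r: direction and closeness of a linear association."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def regression_slope(xs, ys):
    """Regression coefficient: average change in y for a one-unit change in x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x


# Illustrative, invented figures: daily temperature (°C) and electricity demand (MW).
temperature_c = [-5, 0, 5, 10, 15, 20]
demand_mw = [900, 820, 760, 690, 610, 540]

r = pearson_r(temperature_c, demand_mw)
slope = regression_slope(temperature_c, demand_mw)

print(f"r = {r:.3f}")                 # negative: demand falls as temperature rises
print(f"slope = {slope:.1f} MW per °C")
```

With these invented numbers, r comes out very close to -1 (a strong negative correlation) and the slope says demand drops by roughly 14 MW for every degree the temperature rises. Neither figure, on its own, shows that cold weather causes the extra demand; that still rests on what else you know.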
So, when you are debating and hear the terms ‘correlation’ and ‘causation’ in the same sentence, don’t crumble: keep probing and ask some or all of the following questions:
how strong is the correlation between the two things being considered? i.e. what is (the measure of) the strength of that relationship? (if you are being technical… what is the correlation coefficient?)
what more do we know about the interaction between the two things? i.e. how much change in one thing is required for a unit of change in the other, be it cancer or the economy? (what is the regression coefficient?)
what do we know about ‘third factors’ which may have a confounding effect? e.g. if you were looking at suicides and unemployment, you might ask about the potential effects of divorce and partnership breakdown, which are also likely to increase as people experience longer-term unemployment…
what do we know about the project/research/survey design? e.g. was it designed to look for a specific effect, or to report whichever of a large number of correlations turned out to be significant?
do we know whether other projects/research/surveys have had similar results?
is there other related evidence out there to support the case being made for cause and effect?
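The ‘third factor’ question can be made concrete with a small simulation. The sketch below, in Python, invents a hidden factor that drives two things which have no direct effect on each other, then compares the raw correlation with the partial correlation (the correlation left over once the third factor is taken into account). Everything here is simulated and the names are hypothetical; it illustrates the idea and makes no claim about any real dataset.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def partial_r(xs, ys, zs):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Simulated data: the third factor feeds into both x and y,
# but x and y never influence each other directly.
third_factor = [rng.gauss(0, 1) for _ in range(n)]
x = [z + rng.gauss(0, 1) for z in third_factor]
y = [z + rng.gauss(0, 1) for z in third_factor]

raw_r = pearson_r(x, y)
adjusted_r = partial_r(x, y, third_factor)

print(f"raw correlation between x and y:      {raw_r:.2f}")
print(f"after allowing for the third factor: {adjusted_r:.2f}")
```

In this set-up the raw correlation is clearly positive, yet it shrinks to around zero once the third factor is allowed for. That is why asking about third factors matters: an apparent link between ‘x’ and ‘y’ can be produced entirely by something that neither side of the debate has mentioned.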