Whenever I write something controversial, and cite some real-world experience to back it up, I inevitably get someone emailing me to declare some variation of:
“Correlation does not imply causation. Your statistics are appalling. And therefore, so is your argument.”
Some general defences:
- Yes, but correlation suggests causation.
- Your statistics are possibly even worse.
- This is not logic; and therefore, yours is not an argument.
Anyway – I’ve given it some thought, and I wanted to share some of said thoughts on the correlation/causation “fallacy”.
The Statistical Definitions
- Correlation is a statistical relationship between two variables: they tend to move together.
- Causation is a relationship where a change in one variable brings about a change in the other.
So to put this in practical terms, I’m going to go back to my Cecil the Lion post/example about the impact of professional hunting on conservation. There were two sets of data points:
- In 1964, South Africa had a national herd of wild game that consisted of around 575,000 wild animals. South Africa then started issuing hunting permits and establishing safari hunting lodges. Today, the wildlife population is up to ±24 million.
- By contrast, Kenya banned all hunting in 1977. Since then, it has lost 85% of its large animal population.
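To put those two data points on a comparable footing, here is a quick back-of-the-envelope calculation. It uses only the figures quoted above; the variable names are mine, and it says nothing about why the numbers moved.

```python
# Figures exactly as quoted in the two data points above; no new data.
sa_herd_1964 = 575_000        # South Africa's national herd in 1964
sa_herd_today = 24_000_000    # South Africa's herd today (approximate)
kenya_loss_fraction = 0.85    # share of large animals Kenya has lost since 1977

# How many times larger is the South African herd now?
sa_multiple = sa_herd_today / sa_herd_1964

# What share of Kenya's 1977 large animal population is left?
kenya_remaining = 1 - kenya_loss_fraction

print(f"South Africa: herd is roughly {sa_multiple:.0f}x its 1964 size")
print(f"Kenya: roughly {kenya_remaining:.0%} of the 1977 population remains")
```

So the one herd is about 42 times its starting size, and the other is down to about 15% of it. The two figures cover different periods and different animal categories, so this is a rough comparison rather than a like-for-like one.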
Here are the two variables:
- Professional hunting permits; and
- Rate of growth in the wildlife herd.
Here is the correlation:
- The banning of professional hunting coincides with the reduction in the numbers of wild game.
- The introduction of professional hunting permits coincides with the growth in the numbers of wild game.
The trouble is, I want to infer a causal relationship between these two things. That is:
- The introduction of professional hunting permits caused the growth in the numbers of wild game; and
- The banning of professional hunting led to the decline in the numbers of wild game.
Under strict and academic statistics, I can’t make that inference. Mainly because I can’t isolate these conditions in a laboratory-esque experiment.
If I Wanted To Demonstrate Causation
So let’s say that I wanted to say something stronger than: “Oh look, I see these things happening together, and the academic statisticians amongst us won’t let me say whether they are related by more than just coincidence – but weird, huh?”
I would need to do something like this:
- Find two countries/areas that are identical (and, I mean, identical – you want the same everything – people, culture, governance systems, economy, population size, environment, etc);
- Ban hunting in the one.
- Permit hunting in the other.
- Make observations about what happens to the wildlife herd in those two places over time.
- Make some conclusions about causality.
Even then, you might get an academic declaring that the sample size is too small. And that you’d need to repeat the experiment in multiple identical areas in order to make any kind of reasonable statistical inference.
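The sample size complaint is easier to see with a toy simulation. This is only a sketch: the "true effect" of 5 percentage points and the noise level of 20 are numbers I have made up purely for illustration, and the function names are hypothetical. It is not a model of any real wildlife data.

```python
import random
import statistics

def simulate_pair(rng, true_effect, noise_sd):
    """Growth difference (in percentage points) for one pair of identical areas.

    One area permits hunting and one bans it. Both are hit by random noise
    (drought, disease, and so on) that has nothing to do with the policy.
    """
    permitted = true_effect + rng.gauss(0, noise_sd)
    banned = rng.gauss(0, noise_sd)
    return permitted - banned

def estimate_effect(n_pairs, true_effect=5.0, noise_sd=20.0, seed=1):
    """Average observed difference across n_pairs repeated experiments.

    true_effect and noise_sd are ASSUMED values, chosen only for illustration.
    """
    rng = random.Random(seed)
    diffs = [simulate_pair(rng, true_effect, noise_sd) for _ in range(n_pairs)]
    return statistics.mean(diffs)

for n in (1, 10, 1000):
    print(f"{n:>5} pair(s): estimated effect = {estimate_effect(n):.1f}")
```

With a single pair, the noise can easily swamp (or even reverse) the assumed effect. The average only settles near the assumed value once the experiment is repeated many times, which is exactly what the academic is asking for.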
I’m sure you can see where this is going: deep into the dark nether regions of philosophy as we approach awkward questions like “Can we know anything?” and “Is all the world a series of inductions based on empirical experience that need not imply repetition or causation at all?” and “Who was this David Hume person?”
So the reason that statisticians are so obsessed with this “correlation does not imply causation” fallacy is that they’re trying to avoid making idiotic assessments like this:
[Chart: cheese consumption plotted against deaths by tangled bedsheets]
That is: we can be pretty sure that death-by-bedsheet-tangling has nothing to do with how much cheese gets eaten by a particular population.
But because we live in a world of infinite variables, you’re always going to get some kind of correlation happening around the edges. And that is more a symptom of living in a world of infinite variables, rather than there being an actual underlying relationship between every pair of variables that demonstrates some statistical correlation.
Having said that, in the full knowledge that you’re going to get some false positives – I still think that allowing the fear of making a false causal inference to preclude people from making any causal links whatsoever is equally idiotic. It would be the equivalent of never making a decision because you never know all the facts. And let’s be serious – you can’t await omniscience before deciding to order poached eggs for breakfast.
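If you want to see how easily chance correlations turn up once you have enough variables, here is a minimal sketch. It generates a few hundred series of pure random noise (200 series of 10 points each, both arbitrary choices of mine) and then goes looking for the most strongly correlated pair. The function names are hypothetical, and the Pearson coefficient is written out by hand so that nothing beyond the standard library is needed.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def strongest_spurious_pair(n_series=200, n_points=10, seed=0):
    """Generate unrelated random series and return the most correlated pair.

    Every series is independent noise, so any correlation found is
    coincidence by construction.
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_series)]
    best = max(combinations(range(n_series), 2),
               key=lambda p: abs(pearson(series[p[0]], series[p[1]])))
    return best, pearson(series[best[0]], series[best[1]])

pair, r = strongest_spurious_pair()
print(f"Series {pair[0]} and {pair[1]} are unrelated, yet |r| = {abs(r):.2f}")
```

With 200 series there are 19,900 pairs to choose from, so a handful will line up almost perfectly by luck alone. That is the false positive problem, and it is also why the question of whether there was any reason to expect a relationship matters so much.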
What To Do Then
We deal with the problem of correlation and causation in the same way that we established the falseness of the cheese-consumption/death-by-tangled-bedsheets correlation:
- Observe whether there is a correlation; and if there is, then:
- Logically establish whether there is a strong underlying reason for the correlation.
In other words:
- Is there a relationship? And
- Was there a reason that we expected there to be a relationship in the first place?
Taking It Back To The Professional Hunting
The correlation is pretty clear: banning hunting is associated with lower game numbers; regulating the professional hunting industry by allowing permits and establishing hunting conservancies is associated with higher game numbers.
But the bigger question is: was there a reason to expect there to be a relationship in the first place?
I guess that’s up to each person, really.
I’ve already explained why I think there are good economic reasons to expect that causal relationship to exist (see here). And perhaps I should have included that in the Cecil post.
But either way, I think it’s a far greater stretch to argue that the two data sets have as little to do with each other as the death-by-tangled-bedsheets and cheese-consumption.
And as a final observation: I think we need to be just as careful about using the “correlation does not imply causality” fallacy to disprove causality. Because it certainly doesn’t do that.