Correlation vs. Causation: Understand the Difference for Better Interventions

If you have taken a research methods course at some point, you may remember the mantra “correlation does not imply causation.” People say they understand the difference between correlation and causation, but when I hear them talk, I can tell that many of them don’t.

As a quick refresher, correlation simply refers to two variables co-varying. As one variable increases, so does the other (positive correlation; see the left panel of Figure 1), or as one variable increases, the other decreases (negative correlation; see the right panel of Figure 1). Causation, on the other hand, can be thought of as a special case of correlation in which the two variables co-vary because a change in one of them produces a change in the other.

Figure 1. Simplified representation of a “positive” and “negative” correlation.
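
To make the two patterns concrete, here is a minimal sketch that computes the Pearson correlation coefficient, one common measure of correlation, for a rising and a falling relationship. The numbers are made up for illustration and the function name pearson_r is my own; nothing here comes from a real dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values, not real data.
x = [1, 2, 3, 4, 5, 6]
y_up = [2, 4, 5, 4, 6, 8]      # tends to rise as x rises
y_down = [9, 7, 8, 5, 4, 2]    # tends to fall as x rises

r_positive = pearson_r(x, y_up)
r_negative = pearson_r(x, y_down)
print(f"positive: {r_positive:.2f}, negative: {r_negative:.2f}")
```

A coefficient above zero corresponds to the left-hand pattern and a coefficient below zero to the right-hand pattern. Neither number says anything about why the two variables move together.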

The distinction between correlation and causation is clearer when we look at variables that are correlated simply by chance. For example, a correlation exists between the number of letters in the winning word in the Scripps National Spelling Bee and the number of deaths due to venomous spiders (Vigen, 2015). Basically, in years when the winning word had more letters, more people died from venomous spider bites.

If your reaction is that the two cannot be correlated because there is no reason for the correlation to occur, what you are actually questioning is a causal relationship. In that case you are correct: the correlation is there in the numbers, but there is no causal relationship between these two variables.
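
One way chance correlations like this can turn up is by comparing enough unrelated series with one another. The sketch below is an illustration under my own assumptions (random noise, 10 yearly values, 1,000 candidate series, an arbitrary seed), not a reconstruction of how the spelling bee example was found. It searches pure noise for the series that best matches another pure-noise series.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(2015)  # arbitrary fixed seed so the run is repeatable

n_years = 10          # hypothetical length of each yearly series
n_candidates = 1000   # hypothetical number of unrelated series to search

# A target series of pure noise, standing in for any yearly statistic.
target = [rng.gauss(0, 1) for _ in range(n_years)]

# Look through many independent noise series for the strongest match.
best_r = 0.0
for _ in range(n_candidates):
    candidate = [rng.gauss(0, 1) for _ in range(n_years)]
    r = pearson_r(target, candidate)
    if abs(r) > abs(best_r):
        best_r = r

print(f"strongest correlation among unrelated series: {best_r:.2f}")
```

Every series here is generated independently, so any strong correlation the search reports is due to chance alone. With short series and many comparisons, a strong-looking match is close to guaranteed.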

The lack of a causal relationship is easy to see when two variables are in no way conceptually related. However, a correlation between two variables that do appear to be related does not establish a causal relationship either.

Take, for example, the obvious correlation between class attendance and course performance. The two variables are correlated such that course performance tends to increase with class attendance. Unless we address the possibility of other variables, however, we cannot say with certainty that class attendance increases performance. Class attendance could, for instance, be a proxy variable for course engagement or for some other circumstance.
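
The proxy-variable idea can be sketched with simulated data. In the hypothetical scenario below, engagement drives both attendance and performance, and attendance has no direct effect on performance at all. The variable names, sample size, and noise levels are assumptions chosen for illustration; this is not a model of real students.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear influence of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)   # arbitrary fixed seed so the run is repeatable
n_students = 5000         # hypothetical sample size

# Assumed causal structure: engagement -> attendance, engagement -> performance.
# Attendance never appears in the formula for performance.
engagement = [rng.gauss(0, 1) for _ in range(n_students)]
attendance = [e + rng.gauss(0, 1) for e in engagement]
performance = [e + rng.gauss(0, 1) for e in engagement]

raw_r = pearson_r(attendance, performance)
controlled_r = partial_r(
    raw_r,
    pearson_r(attendance, engagement),
    pearson_r(performance, engagement),
)

print(f"attendance vs. performance: {raw_r:.2f}")
print(f"same, holding engagement constant: {controlled_r:.2f}")
```

In this simulated world, attendance and performance are clearly correlated, yet the correlation all but disappears once engagement is held constant. An intervention that forced attendance without changing engagement would do nothing for performance here, which is exactly the risk of treating a proxy as a cause.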

Why is it important to disentangle these two concepts?

Disentangling these two concepts is more than just an interesting intellectual exercise; the distinction is important for achieving optimal outcomes. For example, when making big decisions about how to improve student success, we have to be careful that we are pulling the levers that will actually deliver a return on investment. The more we understand about the causal variable itself, the better our interventions will be.

Consider the prevailing understanding that first-generation students are at risk for not completing a degree. To find a better approach for helping students who are the first in their families to attend college persist and complete their degrees, it is critical that we understand what the causal factor is. Being a first-generation student is only “correlated” with not completing a degree; any of the following could be the actual cause: not understanding how to navigate college expectations; not having a strong resource network to troubleshoot issues; or feeling like an “imposter” whose lack of familiarity with campus life leads to the belief that one does not belong in college.

If we understand what is occurring at the causal level and do not oversimplify or misuse the concept of correlation, then we will be in a better position to design more effective interventions.

Vigen, T. (2015). Spurious Correlations. New York, NY: Hachette Books.