Correlation vs. Causation: Understand the Difference for Better Interventions

If you have taken a research methods course at some point, you may remember the mantra “correlation does not imply causation.” People say they understand the difference between correlation and causation, but when I hear them talk, I can often tell that they don’t.

As a quick refresher, correlation occurs when two variables co-vary: as one variable increases, so does the other (a positive correlation, see graph on the left), or as one variable increases, the other decreases (a negative correlation, see graph on the right). Causation, on the other hand, can be thought of as a specialized correlation in which the two variables co-vary because one of them is driving the change in the other.

Figure 1. Simplified representation of a “positive” and “negative” correlation.

The distinction between correlation and causation is clearer when we look at variables that are correlated simply by chance. For example, a correlation exists between the number of letters in the winning word of the Scripps National Spelling Bee and deaths due to venomous spiders (Vigen, 2015): as the number of letters in the winning word increased, so too did the number of deaths by venomous spiders that year.

If your reaction to that correlation is that the two cannot be related because there is no reason for the correlation to occur, what you are actually trying to establish is a causal relationship. And in that case you are correct: there is no causal relationship between these two variables.

The lack of a causal relationship is clearer when two variables are in no way conceptually related. However, even when a correlation exists between two variables that do appear to be related, a causal relationship still has not been established.

Take, for example, the obvious correlation between class attendance and course performance: course performance tends to increase with class attendance. Yet unless we address the possibility of other variables, we cannot say with certainty that attendance increases performance, because attendance could be a proxy for course engagement or some other circumstance.

Why is it important to disentangle these two concepts?

Disentangling these two concepts is more than an interesting intellectual exercise; the distinction matters for achieving optimal outcomes. When making big decisions about how to improve student success, for example, we have to be careful that we are pressing on the levers that will actually yield a return on investment. When we think about interventions, the more we understand about the causal variable itself, the better our interventions will be.

Consider the prevailing understanding that first-generation students are at risk for not completing a degree. It is critical for us to understand what the causal factor is in order to figure out a better approach for helping students who are the first in their families to attend college to persist and complete degrees. Any of the following could be causing the challenge that is “correlated” with a first-generation student not completing a degree: not understanding how to navigate college expectations; not having a strong resource network to troubleshoot issues; or feeling like an “imposter” whose lack of familiarity with campus life can lead to thinking that one does not belong in college.

If we understand what is occurring at the causal level, rather than simplifying or misusing the concept of correlation, then we will be in a better position to design more effective interventions.

Vigen, T. (2015). Spurious Correlations. New York, NY: Hachette Books.

Obtaining Credible Evidence of “Long” Long-Term Outcomes

Another challenge that The Rucks Group team sees across projects is what we call “aspirational goals.” This phrase is how we refer to goals and objectives that will likely not occur until after a project’s grant funding ends. Many projects have them. The question is: How do you measure them?

We struggled with measuring aspirational goals until a conversation with another evaluator produced an “aha” moment: the transitive property from mathematics could be used to address this challenge.

As you may (or may not) recall from math class, the transitive property is this:

If a = b, and b = c, then a = c.

We can apply this mathematical property to the evaluation of grant-funded projects as well.

If, for instance, a college receives a three-year grant to increase the number of underrepresented individuals in a non-traditional field, progress toward the goal (which is unlikely to occur within the three-year time frame when the first year will be dedicated to implementing the grant) can be gauged using a sequence of propositions that follow the logic of the transitive property:

  • Proposition A = Start with a known phenomenon that is linked to the desired outcome.

Green and Green (2003) [1] argue that to increase the number of workers in the field, the pipeline needs to be increased.

  • Proposition B = Establish that the project’s outcomes are linked to Proposition A.

The current project has increased the pipeline by increasing the number of underrepresented individuals declaring this field as a major.

  • Proposition C = Argue that while the project (because of time) has not demonstrated the desired outcome, based on established knowledge it likely will.

If the number of individual majors increased, assuming a similar rate of retention, then there will be more individuals graduating and prepared to work in the field.
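The chain of propositions above can be reduced to a back-of-the-envelope projection. All figures below, including the retention rate, are hypothetical and stand in for whatever baseline and assumptions a real project would document:

```python
# Hypothetical projection following the A -> B -> C chain:
# more majors (Proposition B) x assumed retention -> more graduates (Proposition C).
baseline_majors = 40    # underrepresented majors before the grant (invented)
current_majors = 70     # majors after the project's recruiting efforts (invented)
retention_rate = 0.6    # assumed share of majors who persist to graduation

projected_new_graduates = (current_majors - baseline_majors) * retention_rate
print(projected_new_graduates)  # 18.0 additional graduates entering the field
```

The arithmetic is trivial by design; the persuasive work is done by grounding the retention assumption in established knowledge, as Proposition A requires.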

By using the transitive property, it is possible to create a persuasive, evidence-based projection: by increasing the number of individuals majoring in the field, and thus in the pipeline to become workers, the project has set in motion the changes needed to achieve its aspirational goals.

[1] This is a fictitious citation for illustration purposes only.

When Fuzzy Wuzzy Isn’t a Bear, But What You Need to Measure

I had a professor who believed that you could measure anything, even the impact of prayer. For many, that may seem like an arrogant pronouncement, but what he was illustrating was that in measuring fuzzy constructs you have to think outside of the box (and besides, there are actual studies that have measured the impact of prayer).

In much of the work at The Rucks Group, we encounter things that are difficult to measure. We often deal with clients’ understandable angst about measuring key nebulous variables, such as changes in a coordinated network, the impact of adding a new role like a coach or navigator, or the impact of outreach activities intended to increase interest in a particular field.

One approach to measuring difficult-to-measure constructs is the counterfactual survey (see our earlier blog post on counterfactual surveys).

Whether or not you use a counterfactual survey when measuring difficult-to-measure variables, it is essential to build a case that the intervention is making a difference through the “preponderance of evidence.” There is rarely a single magic bullet. The evidence, instead, usually comes from multiple observable outcomes. In legal terms, it is akin to building a circumstantial case. 

With preponderance of evidence in mind, our team often talks about “telling the story” of a project. Here are two approaches for effectively “telling the story” in an evaluation context.

Incorporate mixed-methods for data gathering

Using a mixed-methods approach in an evaluation can paint a compelling picture. For instance, many of the projects we work with strive to build relationships with industry partners because of those partners’ important role in curriculum development. Measuring changes in industry partners’ involvement, as well as the impact of these relationships, is very challenging. However, we have found three useful ways to measure industry partnerships:

    1. conversations with the project team to obtain information regarding the impact of the industry partnerships (e.g., any stories of donations, assistance in identifying instructors, etc.);
    2. data from industry partners themselves (gathered either through surveys or interviews); and 
    3. rubrics for tallying quantitative changes that result from industry partnerships. 

Incorporating multiple approaches to data gathering is one way to measure otherwise nebulous variables.

Leverage what is easily measurable

Another common challenge is measuring the broader impact of outreach activities. For one client with this goal, our team struggled to find credible evidence because outreach involved two different audiences: individuals within a grant-funded community and the larger general audience of individuals who may be interested in the work of the grant-funded community. 

For some time we wrestled with how to demonstrate successful outreach to the general audience. As we reviewed the available data, it dawned on us that we could leverage the project’s website-visit data, because the grant-funded audience had a known size. We made an assumption about how many visits the website would receive if the known community members were to visit it. By subtracting that number from the total website visitors, we arrived at an estimate of the general audience: individuals outside the grant-funded community who accessed the project’s website.
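The subtraction logic amounts to a few lines of arithmetic. The counts and the visits-per-member assumption below are hypothetical, not figures from the actual client project:

```python
# Sketch of the audience-subtraction estimate; all numbers are invented.
total_visits = 12_000     # total visits to the project's website
community_size = 500      # known size of the grant-funded community
visits_per_member = 4     # assumed visits by each community member

assumed_community_visits = community_size * visits_per_member
estimated_general_audience = total_visits - assumed_community_visits
print(estimated_general_audience)  # 10000 visits attributed to the general audience
```

The estimate is only as good as the visits-per-member assumption, which is why we combine it with other data sources rather than reporting it alone.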

We then employed a mixture of methods to combine our audience calculation with other data to tell a cogent story. We have used this approach for other clients, sometimes using Google searches and literature searches to find a number as a reference point. 

Hopefully these tips (along with a prayer or two to help with insight) will help the next time you’re confronted with difficult-to-measure variables.