Correlation and Causation

Correlation and causation are often confused. A correlation simply shows that two observations occur or change together. Census Explorer allows visualization of various trends in the US population, and several correlations can be drawn from its maps. One example is the percentage of high school graduates:

One can even focus on a particular region. For instance, the counties surrounding Washington, DC are markedly different from the rest of the country:

More than 90% of residents in the counties surrounding Washington, DC are high school graduates.
In fact, these counties also dramatically differ by the percentage of college graduates:

Nearly half of the residents of counties surrounding Washington, DC are college graduates.
The high degree of educational attainment becomes even more pronounced with graduate degrees (Master's degrees and PhDs):

The counties surrounding Washington, DC also have a high percentage of residents with a Master's degree or higher.
One can look at the same counties through a different measure and likewise see how the region stands out. For example, in terms of median income, these counties also stand out for their wealthier households:

The median income in the counties surrounding Washington, DC is above $90,000.
The counties also have a large percentage of workers in professional, scientific and technical fields:

One interesting feature of these counties is their high percentage of foreign-born residents.

About 1 in 4 residents in counties surrounding Washington, DC is an immigrant.
The above are examples of correlations. To establish causation, a mechanism is required to explain how one trend leads to another. One possible mechanism is that immigrants in these counties take the education of their children more seriously and, as a result, educational attainment is higher in these regions. This is only a hypothesis at this point, and it still needs to be tested. Suggesting a causation therefore sometimes amounts to forming a hypothesis. There are instances, however, where the causation is already grounded in a theory or law of nature. These are situations in which the "cause and effect" relationship is clear. Here is an example. Global temperatures have been shown to correlate with carbon dioxide levels:

This figure visually shows the strong linear relation between the radiative forcing and the global temperature response since 1880. It is a simplified version of fig. 3a of [Lovejoy, 2014a, in Climate Dynamics] showing the 5-year running average. The figure above is copied from Yahoo Live Science.
The above is a correlation. The causation is established firmly by theories and laws in physics and chemistry: black-body radiation and how gases like carbon dioxide can absorb infrared light.

It is true that in the physical sciences, it is more common to see a correlation presented and understood as a causation. In the social sciences, including education, this is much more challenging. Sometimes, it even sounds like a "chicken and egg" problem. For instance, do poor neighborhoods lead to poor schools? Or is it the other way around, with poor schools leading to poor neighborhoods? More importantly, is there even a causation?
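One reason the direction is so hard to settle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numbers are synthetic, and the names `neighborhood_income` and `school_score` are hypothetical stand-ins, not Census Explorer data. The data are deliberately generated so that income drives the score, yet the correlation coefficient comes out identical whichever variable is listed first, so the number alone cannot say which one is the cause.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic, made-up data: by construction, income influences the score.
neighborhood_income = [random.gauss(50, 10) for _ in range(200)]
school_score = [0.8 * inc + random.gauss(0, 5) for inc in neighborhood_income]

# The coefficient is symmetric in its two arguments.
r_income_score = pearson(neighborhood_income, school_score)
r_score_income = pearson(school_score, neighborhood_income)

print(f"income vs score: {r_income_score:.3f}")
print(f"score vs income: {r_score_income:.3f}")
```

Both printed values are the same. Even in this toy case, where the direction of influence is known because it was written into the data, the correlation carries no trace of it. Settling the direction requires something beyond the coefficient, such as a tested mechanism.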


  1. So in consideration of the aforementioned problem and misconception on the technicalities of causation and correlation, are you implying that the Social Sciences is a problematic or discredited discipline?

  2. I am not implying that. Establishing correlations is important. One, however, must keep in mind the limitations. Oftentimes, people jump to conclusions.

