Big Data, Big Blindspots
I really liked this post on some of the more subtle problems with sampling.
Now here's where we get to the math. The logician, computer scientist, and fellow UCLA faculty Judea Pearl uses a graph theoretic approach to logic that emphasizes using counter-factual understandings to get at the underlying structure of causation. (His magnum opus is Causality. For an introduction relevant to the social sciences see Morgan and Winship.) One of Pearl's most interesting deductions is the idea of conditioning on a collider. If a case being observed is a function of two variables then this will induce an artifactual negative correlation between the variables. This is true even if in the broader population there is no correlation (or even a mild positive correlation) between the variables.

Totally true. And that's assuming you have clean data to begin with. I have another great example of that, which shows that a correlation in the data need not reflect any causal relationship at all.
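The collider effect in the quoted passage is easy to check with a quick simulation. What follows is only a minimal sketch under assumptions of my own, not anything from the original post: two independent standard normal variables, and a case counts as "observed" only when their sum clears a threshold. The names `pearson` and `simulate`, the threshold of 1.0, and the sample size are all made up for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, threshold=1.0, seed=0):
    """Return (correlation in full population, correlation among observed cases).

    Assumption: x and y are independent, and a case is observed only
    when x + y > threshold, so being observed is the collider.
    """
    rng = random.Random(seed)
    population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    observed = [(x, y) for x, y in population if x + y > threshold]
    full_r = pearson(*zip(*population))
    observed_r = pearson(*zip(*observed))
    return full_r, observed_r


full_r, observed_r = simulate()
print(f"full population r = {full_r:.3f}")
print(f"observed cases r  = {observed_r:.3f}")
```

Under these assumptions the full-population correlation should sit near zero while the correlation among the observed cases comes out clearly negative, even though nothing links the two variables causally. Raising the threshold makes the selection harsher.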