I spent most of my early training thinking I was doing epidemiology wrong.
It started during my first-ever statistics class, an introductory course I took in undergrad as a psych major, taught with the usual disclaimers: correlation is not causation, regression is not proof, and never say “X causes Y” unless you ran an experiment. These warnings were delivered like commandments, and for good reason. But the deeper I got into psychology, the more I noticed something strange: journal articles were full of causal claims dressed up in correlation’s clothing. “Associated with,” “linked to,” “predicts,” but never “causes.” Everyone was doing causal inference, but no one was saying it out loud.
Then I picked up Judea Pearl’s The Book of Why, with just that one stats course under my belt. And honestly? I loved it. Pearl made causal inference feel like the secret physics of the social world. Just draw the right graph, follow the arrows, block the backdoor paths, and you’d unlock the hidden structure of everything from cancer risk to college admissions.
It felt rigorous. It felt powerful. It felt like the future of statistics.
By the time I started my Master’s, I was convinced that any serious epidemiologist needed to speak the language of g-methods, inverse probability weighting, and marginal structural models. I wasn’t just trying to understand these methods; I was trying to figure out which one I was supposed to use and when. I assumed they all did different things: that IPW was for one kind of study, g-computation for another, TMLE for the pros. It took me a long time to realize they’re basically all doing the same thing: trying to estimate counterfactual outcomes under assumptions that rarely hold in observational data.
Do-calculus ≈ g-methods ≈ inverse probability weighting. Different notation, same promise: that if you can adjust for the right variables, you can turn your messy dataset into a “pseudo” RCT. And that’s when the fog rolled in.
Because I started noticing how often the conditions required for these methods just didn’t exist in the real world. You need to have measured every relevant confounder, modeled every relationship correctly, and somehow avoided measurement error, missing data, and sample size limitations. You also need counterfactual thinking, but each counterfactual carries its own version of the confounders, which may or may not match what is actually expressed in the real-world data. But that aspect of these methods seemed to be papered over with mathematical proofs and confidence.
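To see why they all felt interchangeable to me, here is a minimal sketch in R on simulated data (not from any real study): g-computation and inverse probability weighting both target the same marginal effect, and both recover it only because the one confounder is measured and modeled correctly.

```r
# Simulated example: one measured confounder L, binary exposure A,
# true causal effect of A on Y set to 1.0.
set.seed(42)
n <- 5000
L <- rnorm(n)
A <- rbinom(n, 1, plogis(0.8 * L))        # exposure depends on L
Y <- 1.0 * A + 1.5 * L + rnorm(n)         # outcome depends on A and L

# g-computation: fit an outcome model, then predict everyone under A = 1 vs A = 0
fit_y <- lm(Y ~ A + L)
gcomp <- mean(predict(fit_y, data.frame(A = 1, L = L))) -
         mean(predict(fit_y, data.frame(A = 0, L = L)))

# IPW: fit a propensity model, weight, then compare weighted means
fit_a <- glm(A ~ L, family = binomial)
ps    <- fitted(fit_a)
w     <- ifelse(A == 1, 1 / ps, 1 / (1 - ps))
ipw   <- weighted.mean(Y[A == 1], w[A == 1]) -
         weighted.mean(Y[A == 0], w[A == 0])

# Unadjusted comparison, ignoring L entirely
naive <- mean(Y[A == 1]) - mean(Y[A == 0])

round(c(naive = naive, gcomp = gcomp, ipw = ipw), 2)
# Both adjusted estimates should land near 1.0 here; drop L from either
# model and that guarantee disappears, which is the whole point.
```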
That’s when I found Noel Weiss of the University of Washington (and an adjunct at UCLA, where I was studying).
His textbook (written with Thomas Koepsell) didn’t feel like it was selling me a framework or a revolution. It felt like someone handing me a flashlight and saying, “Look here, this is the part that matters.” He didn’t dismiss causal inference, but he didn’t pretend it was magic either. His explanation of inverse probability weighting was the first time I felt like someone was being honest about what these methods could and couldn’t do.
And maybe more importantly: it was the first time I felt like I wasn’t doing epidemiology wrong just because I hadn’t memorized the formula for inverse probability of censoring weighting.
The Allure of Causal Inference
Causal inference tells you that you can finally say what you’ve been wanting to say all along: X causes Y. You no longer have to hide behind “associated with” or carefully wordsmith your abstract to avoid reviewer wrath. With the right graph and the right adjustment set, you can say the thing directly. And when you’re a student learning methods for the first time, especially after years of being told that correlation isn’t causation, it feels like a revolution.
That’s what pulled me in. Not just the elegance of the math that I struggled with at the time, but the clarity of the promise: that we could move beyond inference as a vague gesture and into something that looked like scientific certainty.
Even the names of the methods sound like tools from a high-trust discipline. Inverse probability weighting. Targeted maximum likelihood estimation. Doubly robust estimators. The terminology projects seriousness, like you’re performing an operation on a very fragile truth. Compare that to “linear regression,” which sounds like something you do in Excel.
And in theory, these frameworks are beautiful. They force you to think more carefully about what you’re actually estimating. They help you define a causal question. They ask you to make your assumptions visible (something that traditional regression, to its detriment, often lets you ignore).
But the more I studied them, the more I realized how detached they can become from reality. Not because they’re wrong, but because the assumptions are so brittle. You don’t just need to measure some confounders. You need to measure all of them correctly, precisely, and without error. You need treatment assignment to be conditionally independent of the potential outcomes given your covariates, which is just another way of saying no unmeasured confounding. You need enough overlap between treatment groups to avoid violating positivity. You need a well-defined intervention, one that could exist in the real world, not just in the mind of a statistical modeler.
And here’s the kicker: you have to assume that your model is right. Not just in its structure, but in how it captures reality. Which means that even when you’re doing the “right” thing mathematically, you may be leaning harder on your data than your data can support.
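Positivity is at least one assumption you can inspect directly. A rough sketch of what that looks like, again on simulated data: fit the propensity model, look at whether the exposed and unexposed actually overlap, and look at how extreme the resulting weights get.

```r
# Simulated data where confounding is strong enough to wreck overlap
set.seed(7)
n <- 2000
L <- rnorm(n)
A <- rbinom(n, 1, plogis(3 * L))   # assignment is near-deterministic at the tails

ps <- fitted(glm(A ~ L, family = binomial))

# If these two distributions barely overlap, the "pseudo-RCT" is being
# manufactured out of a handful of people at the edges of the data.
tapply(ps, A, summary)

# Extreme weights are the symptom you see downstream
w <- ifelse(A == 1, 1 / ps, 1 / (1 - ps))
summary(w)
```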
Still, I kept thinking: if I just learned enough of the theory, just internalized the assumptions and formulas and conditions, maybe I’d start to see which method fit which problem. Maybe I’d finally feel like I knew what I was doing.
Spoiler: that clarity never came. Not from the methods themselves.
The Method Anxiety Spiral
I didn’t get clarity from learning causal inference. I got hesitation.
The more I read, the less I felt confident in using any method at all. Regression started to feel naive. Matching felt incomplete. Inverse probability weighting seemed both too fragile and too good to be true. I had learned enough to know how the methods worked—but not enough to believe any of them could actually deliver what they promised, at least not with the kind of data most of us use.
Suddenly, everything felt like a methodological trap.
I’d open a dataset to practice the methods in R and start by asking the basic epidemiologic questions. What’s the exposure, what’s the outcome, what’s confounding the relationship? But before I even got to model building, I’d be in my head: Should I use g-computation? Is there time-varying confounding? Is this estimand even identifiable? Can I emulate a target trial here? Do I need inverse probability of censoring weights on top of the IP treatment weights?
And this was before I even started coding.
I was stuck in a kind of methodological purgatory, convinced that there was a “correct” way to do causal inference, and equally convinced that I didn’t have the data to do it. So instead, I just hovered there, flipping between textbooks and Twitter threads, looking for answers that weren’t coming.
And all the while, I ignored the thing that actually mattered: whether the simpler model I already knew how to use would have answered the question well enough.
The irony is that I didn’t actually need to do a causal inference analysis for most of my work. I wasn’t studying time-varying treatments or trying to recover dynamic treatment effects across marginal subgroups. I wasn’t emulating a trial. I was asking reasonable questions using decent data. Data where the assumptions behind a well-specified regression were, if anything, more transparent and defensible than those hiding inside a multi-layered causal pipeline.
But once you’re immersed in the culture of causal inference, simple starts to feel like failure. Like if you’re not using TMLE, you’re not being rigorous. Like every regression coefficient is suspect until proven robust under every possible DAG you could’ve drawn.
That’s when I realized: causal inference didn’t just make me confused. It made me worse. It made me doubt solid answers because they weren’t complex enough. It made me afraid of being accused of not thinking hard enough when, in fact, I was thinking so hard I had forgotten to just do the damn science.
The Return to Simplicity
Eventually, I gave up trying to find the “right” method and just opened NHANES.
I had a basic question: Was higher sugar-sweetened beverage consumption associated with worse self-reported cognitive functioning in older adults?
Nothing fancy. The exposure was self-reported soda consumption and the outcome was a continuous composite score; neither is perfect, but both are usable.
I had a decent covariate set: age, sex, education, income, smoking, BMI, depression screener scores. All variables that could reasonably confound the relationship between diet and cognition.
I spent way too long debating how to model it. Should I match on the propensity score for high soda intake? Should I construct inverse probability of treatment weights? Should I try to define a hypothetical intervention and emulate a trial where people reduced their intake by one serving per day?
Eventually, I just ran a linear model. And it worked. The results were interpretable. The coefficients made sense. I checked residuals, fit diagnostics, potential multicollinearity. Nothing fancy, but it did the job. No bootstrapped TMLE. No double machine learning. No twenty-page supplement explaining weight truncation.
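For concreteness, here is roughly what that model looks like in R. The variable names and the analytic file are hypothetical stand-ins for the cleaned NHANES variables, and the recodes and survey design handling are omitted; this is a sketch of the shape of the analysis, not the analysis itself.

```r
# Hypothetical analytic file and variable names, standing in for the
# cleaned NHANES data (recodes and survey design handling omitted)
dat <- read.csv("nhanes_analytic.csv")

fit <- lm(cognition_score ~ ssb_servings + age + sex + education +
            income + smoking + bmi + depression_score, data = dat)

summary(fit)            # the ssb_servings coefficient is the association of interest
plot(fit, which = 1:2)  # residuals vs. fitted, normal Q-Q
car::vif(fit)           # multicollinearity check (requires the car package)
```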
It wasn’t a causal estimate in the strict sense. But neither would IPW have magically made it so. Not with noisy self-reported intake. Not with residual confounding from things like early-life adversity, neighborhood-level stressors, or sleep quality dynamics that are either imperfectly measured or missing altogether in most NHANES waves.
What I got from running that model was clarity. And after months of chasing the perfect method, that felt like a win.
Causal Inference Isn’t the Enemy. But It’s Not the Answer Either
This isn’t a rant against causal inference. It’s a rant against the idea that causal inference methods are a simple gateway to causal truth.
Because they’re not. They’re just tools. Sophisticated, flexible, sometimes useful, but also brittle, assumption-heavy, and easily misused. What they don’t do is rescue you from a poorly measured exposure, a vague research question, or a confounded design dressed up in mathematical formalism.
That doesn’t mean we should throw them out. If anything, learning causal inference methods made me better at spotting weak claims masquerading as strong ones. It helped me understand why certain adjustments matter, what kinds of bias can creep in, and when a question simply isn’t answerable with the data at hand.
But it also made me realize something else: you don’t need to turn every study into a pseudo-RCT just to feel like you’re doing “real science.” You don’t need to invoke TMLE or a target trial every time you’re estimating an association. Sometimes, a simple regression (carefully specified, transparently reported, grounded in a clear question) is not only sufficient but superior. Not just because it’s simpler, but because it’s honest.
From my reading of his work, Weiss got this. He never said causal inference was impossible. He just didn’t pretend it was guaranteed. And maybe that’s the lesson here. Not that causal inference is wrong, but that the way it’s often taught, talked about, and deployed can lead smart(ish) people into a fog of overcomplication and performative rigor.
I didn’t need more methods. I needed better judgment. And no method, causal or otherwise, could give me that.
Hey Devin!
I’ve been seeing your blogs and this one definitely piqued my interest (you know me). I’m glad you’ve read the Book of Why, it definitely makes causal inference a lot more intuitive. Everything does feel a bit arbitrary when you don’t have the 1000-foot overview.
What I’ve been thinking about a lot is an issue with some of the language we use in epidemiology. It seems like there isn’t much of a middle ground between association and causation. Association feels too weak (any two things can be associated) but causation feels too strong (there is no way of knowing if there are other confounders that haven’t been measured). The problem with this is that we do all of this work and then there does not seem to be a very nice way of summarizing everything when there still is that unknown bias. What I’m thinking is that we should start concluding that things are “plausibly causal” or that it is “reasonable to believe” that two things can be causally related. In this sense you’re not explicitly saying that two things are causal (because bias is inevitable) but it’s a step beyond association.
At the end of the day, we are all paid to help clinicians and policymakers make decisions about people’s health. Believing in causation is good enough as far as I’m concerned.