This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Interpretability Hackathon research sprint on November 15, 2022

Mechanisms of Causal Reasoning

Causal reasoning is a crucial part of how humans think about the world safely and robustly. Can we identify whether LLMs reason causally? Marius Hobbhahn and Tom Lieberum (2022, Alignment Forum) approached this question with probing. For this hackathon, we follow up on that work with a mechanistic interpretability analysis of causal reasoning in the 80 million parameters of GPT-2 Small, using Neel Nanda’s Easy Transformer package.

Ben Sturgeon, Jacy Reese Anthis, Mark Chimes, Sky Cope
4th place
3rd place
2nd place
1st place
 by peer review