This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Interpretability Hackathon research sprint on November 15, 2022

Algorithmic bit-wise boolean task on a transformer

We set out to interpret single-layer transformers trained to solve boolean logic operations. Attention-layer activations were interpretable for both the single-bit/single-operation task and the 6-bit/3-operation task: attention was paid to the correct tokens, i.e. those that contained information. In the 6-bit/3-operation task the operation was also attended to, though less than we had anticipated. The MLP layers appeared to achieve significant separation in the representations of the tokens provided, which is not a surprising result. However, in the 1-bit task the representations of the beginning-of-sentence token and the operator showed some unexpected results that we did not understand. Our best guess is that the MLP layers separate these tokens from the others because they have no variability and carry no information for the task.
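To make the task setup concrete, here is a minimal sketch of how such a dataset might be generated. It is an illustration under stated assumptions, not the authors' code. The vocabulary, the token order (`<bos>`, first operand, operator, second operand), the choice of AND/OR/XOR as the three operations, and the helper names `make_example` and `make_dataset` are all hypothetical. Reading "6-bit/3-operation" as two 3-bit operands combined by one of three operators is likewise only one possible interpretation.

```python
from itertools import product

# Hypothetical vocabulary: the source does not specify the tokenization.
VOCAB = {"<bos>": 0, "0": 1, "1": 2, "AND": 3, "OR": 4, "XOR": 5}

# Assumed operator set: the source only says "3 operation".
OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}


def make_example(a_bits, op, b_bits):
    """Return (input token ids, target bits) for one bit-wise operation."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same number of bits")
    # Assumed layout: <bos>, bits of a, operator, bits of b.
    tokens = ["<bos>"] + [str(b) for b in a_bits] + [op] + [str(b) for b in b_bits]
    target = [OPS[op](x, y) for x, y in zip(a_bits, b_bits)]
    return [VOCAB[t] for t in tokens], target


def make_dataset(n_bits, ops=("AND", "OR", "XOR")):
    """Enumerate every operand pair and operator for n_bits-wide operands."""
    data = []
    for a_bits in product((0, 1), repeat=n_bits):
        for b_bits in product((0, 1), repeat=n_bits):
            for op in ops:
                data.append(make_example(a_bits, op, b_bits))
    return data


# Single-bit, single-operation variant: four examples in total.
one_bit = make_dataset(1, ops=("AND",))
# Two 3-bit operands (6 input bits) with three operators.
six_bit = make_dataset(3)
```

In a layout like this, the `<bos>` token is identical in every example, and in the single-operation variant so is the operator. That is consistent with the guess above that these tokens carry no task-relevant variability.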

Catalin Mitelut, Jeremy Scheurer, Lukas Petersson, Javier Rando