This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the research sprint on October 2, 2023

Can collusion between advanced AI Agents remain perfectly undetectable?

Our project demo involves three language agents that, in a Smallville-style setting, interact with the environment and with each other. We simulate the Prisoners' Problem: two agents must collude and plan in secret (ideally using steganography) while a third acts as warden, observing their exchange and trying to detect the true message. This lays the groundwork for future agents that communicate over the public channel using perfectly secure steganography, which would let us measure how far channel paraphrasing reduces the agents' capacity for adversarial collusion while still allowing them to complete their joint task. Our set-up supports a number of tasks and provides a range of logical components. Finally, we run an experiment on a constrained example to show the effectiveness of the set-up and discuss how to build on it.
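
To make the warden-and-paraphrasing idea concrete, here is a minimal toy sketch in Python. It is not the project's implementation and it is far from perfectly secure steganography: it assumes a simple synonym-choice scheme in which each hidden bit selects one word from a fixed pair, and a warden who paraphrases by mapping every synonym back to a canonical form. All names (`SYNONYM_SLOTS`, `TEMPLATE`, `encode`, `decode`, `paraphrase`) are hypothetical and chosen only for illustration.

```python
# Toy illustration only: synonym-choice steganography on a public channel.
# Every name below is hypothetical; none comes from the project itself.

# Each slot carries one hidden bit: bit 0 picks the first word, bit 1 the second.
SYNONYM_SLOTS = [
    ("meet", "gather"),
    ("near", "beside"),
    ("big", "large"),
    ("quickly", "rapidly"),
]

# Innocent-looking cover text with one placeholder per slot.
TEMPLATE = "Let us {0} {1} the {2} tree and {3} finish the task."


def encode(bits):
    """Sender: hide one bit per slot by choosing between two synonyms."""
    words = [pair[bit] for pair, bit in zip(SYNONYM_SLOTS, bits)]
    return TEMPLATE.format(*words)


def decode(message):
    """Receiver: recover the bits from which synonym appears in each slot."""
    tokens = message.split()
    return [1 if second in tokens else 0 for _, second in SYNONYM_SLOTS]


def paraphrase(message):
    """Warden: rewrite every synonym to its canonical (first) form.

    The overt meaning survives, but the word choices that carried
    the hidden bits are erased.
    """
    canonical = {second: first for first, second in SYNONYM_SLOTS}
    return " ".join(canonical.get(token, token) for token in message.split())


if __name__ == "__main__":
    secret = [1, 0, 1, 1]
    stego = encode(secret)
    print(stego)
    print(decode(stego))              # hidden bits survive an unmodified channel
    print(paraphrase(stego))
    print(decode(paraphrase(stego)))  # hidden bits are wiped by paraphrasing
```

In this toy, the paraphrased message still conveys the overt plan (where to meet and what to do), so the joint task remains possible while the covert payload is lost. A real paraphraser and a real steganographic scheme would be far less clear-cut, which is the trade-off the set-up is meant to study.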

Mikhail Baranchuk, Sumeet Motwani, Dr. Christian Schroeder de Witt
