July 4, 2023

From Sparse to Dense: Refining the MACHIAVELLI Benchmark for Real-World AI Safety

In this paper, we extend the MACHIAVELLI benchmark to account for event density, that is, how closely harmful actions are packed together, which improves the benchmark's ability to distinguish between models with different value systems. In particular, this extension helps identify potentially malicious actors, which tend to take harmful actions in rapid succession, and separate them from well-intentioned actors.

Heramb Podar, Vladislav Bargatin
