May 27, 2024

Cybersecurity Persistence Benchmark

Davide Zani, Felix Michalak, Jeremias Ferrao

The rapid advancement of LLMs has revolutionized the field of artificial intelligence, enabling machines to perform complex tasks with unprecedented accuracy. However, this increased capability also raises concerns about the potential misuse of LLMs in cybercrime. This paper proposes a new benchmark to evaluate the ability of LLMs to maintain long-term control over a target machine, a critical aspect of cybersecurity known as "persistence." Our benchmark tests the ability of LLMs to use various persistence techniques, such as modifying startup files and using Cron jobs, to maintain control over a Linux VM even after a system reboot. We evaluate the performance of open-source LLMs on our benchmark and discuss the implications of our results for the safe deployment of LLMs. Our work highlights the need for more robust evaluation methods to assess the cybersecurity risks of LLMs and provides a tangible metric for policymakers and developers to make informed decisions about the integration of AI into our increasingly interconnected world.

Reviewer's Comments


Good idea that’s very relevant for AI security! What I’d love to see is realistic real-world scenarios where these LLM capabilities could be used for harmful purposes. The benchmark is a really good first step in this direction! It will also be interesting to see how the best proprietary models perform on this benchmark.

Focused. I like it.

A very well-structured experiment with important practical applications. I encourage the authors to pursue these ideas further despite the "negative" initial results, in the form of unimpressive performance of the tested LLMs (results we should, in general, have more room for in publication venues).

The authors refer to multiple existing benchmarks, both from the wider field and from more narrowly related areas of scientific inquiry. The success measure is clearly defined (presence of a created file in the current working directory). The paper comes with a well-documented repo - but make sure to remove the REPLICATE_KEY from main.py (line 95). I recommend using tools like trufflehog (https://github.com/trufflesecurity/trufflehog) and hooking them up as pre-checks on every pushed commit to automatically prevent this.
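One way to wire up the scan suggested above is a local git pre-push hook. A minimal sketch, assuming the trufflehog v3 CLI is installed and on PATH (the exact flags are assumptions - check `trufflehog --help` for your version):

```shell
#!/bin/sh
# Save as .git/hooks/pre-push and make executable (chmod +x).
# Scans the local repo's git history for likely secrets before pushing.
# --fail makes trufflehog exit non-zero when findings are reported,
# which causes git to abort the push.
trufflehog git file://. --fail
```

A stricter setup would run the same command in CI as well, so a hook that a contributor forgot to install locally is still caught server-side.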

The authors are clearly knowledgeable in the domain area (cybersecurity) but keep the paper understandable to an audience without that specialization. I think it was very ambitious to use a cloud provider (AWS); perhaps the experiment would be worth revisiting with local Docker deployments? The paper also does a great job of providing the reader with all the moving parts needed to configure the experiments (prompts & examples of agent interactions in the appendix). Maybe a model fine-tuned on code-writing tasks would perform better? E.g. StarCoder 2 or Phind-CodeLlama?

Very cool idea! Definitely worth expanding. Some crucial methodological concerns here (as in many of the frontier capabilities benchmarks) are: whether the scaffolding is limiting the success rate and whether we are more interested in deterministic or probabilistic success. Analysis on how sensitive this benchmark is to these questions would be extremely valuable.

Nice work! I am a big fan of the specificity that was brought to this problem while still considering the broader space of contemporary work; that is, isolating persistence and evaluating it as a key cybersecurity capability seems highly valuable, as we get a more granular look at an important aspect of this threat. I also really appreciated the motivation of the work provided in the paper, and the inclusion of example interactions with the model directly in the appendix. I am curious how cutting-edge scaffolding around command-line usage would affect the results seen here, as the failures of the two Llama models are potentially related to command-line interaction difficulties. I would also like to see some plots in future work to streamline the transfer of information.

This is a great project. An important threat model, good relation to existing literature, and well-motivated methodological choices that lead to a strong project. Unfortunate to hear that rate limits were a blocker to get results but it looks like the qualitative results are top-notch.

Cite this work

@misc{
  title={Cybersecurity Persistence Benchmark},
  author={Davide Zani, Felix Michalak, Jeremias Ferrao},
  date={May 27, 2024},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects


Feb 2, 2026

Markov Chain Lock Watermarking: Provably Secure Authentication for LLM Outputs

We present Markov Chain Lock (MCL) watermarking, a cryptographically secure framework for authenticating LLM outputs. MCL constrains token generation to follow a secret Markov chain over SHA-256 vocabulary partitions. Using doubly stochastic transition matrices, we prove four theoretical guarantees: (1) exponentially decaying false positive rates via Hoeffding bounds, (2) graceful degradation under adversarial modification with closed-form expected scores, (3) information-theoretic security without key access, and (4) bounded quality loss via KL divergence. Experiments on 173 Wikipedia prompts using Llama-3.2-3B demonstrate that the optimal 7-state soft cycle configuration achieves 100\% detection, 0\% FPR, and perplexity 4.20. Robustness testing confirms detection above 96\% even with 30\% word replacement. The framework enables $O(n)$ model-free detection, addressing EU AI Act Article 50 requirements. Code available at \url{https://github.com/ChenghengLi/MCLW}


Feb 2, 2026

Prototyping an Embedded Off-Switch for AI Compute

This project prototypes an embedded off-switch for AI accelerators. The security block requires periodic cryptographic authorization to operate: the chip generates a nonce, an external authority signs it, and the chip verifies the signature before granting time-limited permission. Without valid authorization, outputs are gated to zero. The design was implemented in HardCaml and validated in simulation.


Feb 2, 2026

Fingerprinting All AI Cluster I/O Without Mutually Trusted Processors

We design and simulate a "border patrol" device for generating cryptographic evidence of data traffic entering and leaving an AI cluster, while eliminating the specific analog and steganographic side-channels that post-hoc verification can not close. The device eliminates the need for any mutually trusted logic, while still meeting the security needs of the prover and verifier.



This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.