Researching solutions for AI Safety

Explore our projects, publications and pilot experiments

Safe artificial intelligence

Publishing rigorous empirical work for safe AI: evaluations, interpretability and more.

Research index

Novel approaches

Our research is underpinned by novel approaches focused on neglected topics.

Our approach

Pilot experiments

Apart Sprints have kickstarted hundreds of pilot experiments in AI Safety.

Pilot experiments

Highlights

Factual model editing techniques don't edit facts

Model editing techniques can introduce unwanted side effects in neural networks that existing benchmarks fail to detect.
 
J. Hoelscher-Obermaier, J. Persson, E. Kran, I. Konstas, F. Barez. Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark. ACL 2023.

Neuroplasticity in language models

Advances in model editing promise to remove undesirable concepts from LLMs by pruning neurons. But do those concepts stay removed, or can models relearn them after pruning? We take a closer look in this paper.
 
M. Lo, S. B. Cohen, F. Barez. Large Language Models Relearn Removed Concepts. ACL 2024.

Our research focuses on critical research paradigms in AI Safety. We use interpretability to understand how AIs make decisions, process supervision techniques to ensure models behave as intended, benchmark tasks to evaluate safety and alignment, and more. Apart produces foundational research enabling the safe and beneficial development of advanced AI.

Research index

341 Apart Sprint pilot experiments