AI Safety & Security

Apart Research

Artificial intelligence will change the world. Our mission is to ensure this happens safely and to the benefit of everyone.

Finding Deception in Language Models

Apart > Research

Foundational research for safe and beneficial advanced AI

Apart > Sprints

Global hackathons in AI safety for aspiring and established researchers

Apart > Lab

Incubating talented research teams towards real-world impact


Published research

We aim to produce foundational research enabling the safe and beneficial development of advanced AI.

Google Scholar

Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions

Clement Neo*, Shay B. Cohen, Fazl Barez*

Increasing Trust in Language Models through the Reuse of Verified Circuits

Philip Quirke, Clement Neo, Fazl Barez

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Evan Hubinger et al.

Large Language Models Relearn Removed Concepts

Michelle Lo*, Shay Cohen, Fazl Barez

See all publications

Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders

Luke Marks*, Amir Abdullah*, Luna Mendez, Rauno Arike, Philip Torr, Fazl Barez

arXiv

Neuron to Graph: Interpreting Language Model Neurons at Scale

Alex Foote*, Neel Nanda, Esben Kran, Ioannis Konstas, Shay Cohen, Fazl Barez*

RTML workshop at ICLR 2023

DeepDecipher

Albert Garde*, Esben Kran*, Fazl Barez

NeurIPS 2023 XAI in Action Workshop

Understanding Addition in Transformers

Philip Quirke, Fazl Barez

arXiv

Locating Cross-Task Sequence Continuation Circuits in Transformers

Michael Lan, Fazl Barez

arXiv

Neuroplasticity in LLMs

Michelle Lo*, Shay Cohen, Fazl Barez

Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions

Clement Neo*, Shay B. Cohen, Fazl Barez*

arXiv

Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark

Jason Hoelscher-Obermaier, Julia Persson, Esben Kran, Ioannis Konstas, Fazl Barez

ACL 2023

Sleeper Agents

Evan Hubinger et al.

Anthropic

Research Augmentation Hackathon: Supercharging AI Alignment

Independently organized SprintX
📈 Accelerating AI Safety
Jul 26 to Jul 29, 2024

Join us for an exhilarating weekend at the Research Augmentation Hackathon, where we'll develop innovative tools and methods to boost the pace of AI safety research by 5x or even 10x! This event concluded on Jul 29, 2024.

AI Safety Deep Tech Startup Hackathon

Independently organized SprintX
📈 Accelerating AI Safety
Virtual & local
Aug 30, 2024

Agent Security Hackathon: Expanding our Knowledge

Independently organized SprintX
📈 Accelerating AI Safety
Virtual & local
Sep 13, 2024

Welcome to our Accelerating AI Safety Sprint Season! Here, we focus on building concrete solutions and tools that address key questions in AI safety. Join our exciting hackathons to develop new research in deep tech AI safety, agent security, research tooling, and policy.

The Flagging AI Risks Sprint Season ran from March to June 2024 and comprised four research hackathons focused on catastrophic risk evaluations of AI. A cohort of researchers has been invited to continue the research they began there.

What does Apart do?

We solve high-impact, neglected, and tractable problems in AI safety

Field-building for AI safety

Our initiatives allow people from diverse backgrounds to have an impact on AI safety. More than 250 projects have been developed by over 1,200 researchers across 7 continents, and some teams have gone on to publish their research at major academic venues such as NeurIPS, ICML, and ACL.

High-impact research

We conduct both original research and contract research projects that translate academic insights into actionable strategies for mitigating catastrophic risks from AI. We have co-authored papers with researchers from the University of Oxford, DeepMind, Anthropic, and more.

A vision for the future

Our aim is to foster a positive vision and an action-focused approach to AI safety. Drawing on more than 10 years of experimenting with the research process itself, we specialize in solving large problems through fundamental innovation in how research is done.

Apart Research

Get involved

Check out the list below for ways you can get involved or collaborate on research with Apart!

Let's have a meeting!

You can book a meeting here and we can talk about anything between the clouds and the dirt. We're looking forward to meeting you.

I would love to mentor research ideas

Our website is designed so that research ideas are validated by experts. If you would like to be one of these experts, write to us here. Your input can be a huge help for the community!

Get updated on Apart's work

Blog & Mailing list

The blog hosts Apart's public outreach. Sign up for the mailing list below to get future updates.

People

Members

Central committee board

Associate Kranc
Head of Research Department
Commanding Center Management Executive

Partner Associate Juhasz
Head of Global Research
Commanding Cross-Cultural Research Executive

Associate Soha
Commanding Research Executive
Manager of Experimental Design

Partner Associate Lækra
Head of Climate Research Associations
Research Equality- and Diversity Manager

Partner Associate Hvithammar
Honorary Fellow of Data Science and AI
P0rM Deep Fake Expert

Partner Associate Waade
Head of Free Energy Principle Modelling
London Subsidiary Manager

Partner Associate Dankvid
Partner Snus Executive
Bodily Contamination Manager

Partner Associate Nips
Head of Graphics Department
Cake Coding Expert

Honorary members

Associate Professor Formula T.
Honorary Associate Fellow of Research Ethics and Linguistics
Optimal Science Prediction Analyst

Alumni

Partner Associate A.L.T.
Commander of the Internally Restricted CINeMa Research
Keeper of Secrets and Manager of the Internal REC

Contact

Get in touch

Apart Updates

Sign up for our updates

Follow the latest from the Apart Community and stay updated on our research and events.