Blog

February 1, 2024 – AI Security

AI safety needs to scale, and here's how you can do it

AI development attracts more than $67 billion in yearly investment, in sharp contrast to the $250 million allocated to AI safety. This gap suggests a large opportunity for AI safety to tap into the commercial market. The big question, then, is how to close that gap.
July 13, 2023 – Guides

Updated quickstart guide for mechanistic interpretability

Written by Neel Nanda, who previously worked on mechanistic interpretability under Chris Olah at Anthropic and is currently a researcher on the DeepMind mechanistic interpretability team.
February 22, 2023 – Events

Results from the Scale Oversight hackathon

Check out the top projects from the "Scale Oversight" hackathon hosted in February 2023: Playing games with LLMs, scaling of prompt specificity, and more.
January 2, 2023 – Events

Results from the AI testing hackathon

See the winning projects from the AI testing hackathon held in December 2022: Trojan networks, unsupervised latent knowledge representation, and token loss trajectories to target interpretability methods.
November 21, 2022 – Events

Results from the language model hackathon

See the winning projects from the language model hackathon hosted in November 2022: GPT-3 shows sycophancy, OpenAI's flagging is biased, and truthfulness is sensitive to prompt design.
November 17, 2022 – Events

Results from the interpretability hackathon

Read the winning projects from the interpretability hackathon hosted in November 2022: Automatic interpretability, backup backup name mover heads, and "loud facts" in memory editing.