Join us at the AI x Democracy research hackathon
Participate online or in person the weekend of May 3rd to 5th in an exciting and intense AI safety research hackathon focused on demonstrating and extrapolating risks to democracy from real-life threat models. We especially invite researchers, cybersecurity professionals, and governance experts, but the event is open to everyone, and we will provide starter code templates to help kickstart your team's projects. Join at apartresearch.com/event/ai-democracy.
Join the AI Evaluation Tasks Bounty Hackathon with METR
In this collaboration between METR and Apart, you get the chance to contribute directly to model evaluations research. Take part in the Code Red Hackathon, where you can earn money, connect with experts, and help create tasks to evaluate frontier AI systems.
February 1, 2024
–
AI Security
For-profit AI Safety
AI development attracts more than $67 billion in yearly investments, contrasting sharply with the $250 million allocated to AI safety. This gap suggests a large opportunity for AI safety to tap into the commercial market. The big question, then, is: how do you close that gap?
Updated quickstart guide for mechanistic interpretability
Written by Neel Nanda, who previously worked on mechanistic interpretability under Chris Olah at Anthropic and is currently a researcher on the DeepMind mechanistic interpretability team.
February 22, 2023
–
Events
Results from the Scale Oversight hackathon
Check out the top projects from the "Scale Oversight" hackathon hosted in February 2023: Playing games with LLMs, scaling of prompt specificity, and more.
Results from the AI testing hackathon
See the winning projects from the AI testing hackathon held in December 2022: Trojan networks, unsupervised latent knowledge representation, and token loss trajectories to target interpretability methods.
November 21, 2022
–
Events
Results from the language model hackathon
See the winning projects from the language model hackathon hosted in November 2022: GPT-3 shows sycophancy, OpenAI's flagging is biased, and truthfulness is sensitive to prompt design.
November 17, 2022
–
Events
Results from the interpretability hackathon
Read the winning projects from the interpretability hackathon hosted in November 2022: Automatic interpretability, backup backup name mover heads, and "loud facts" in memory editing.