The hackathon is happening right now! Join by signing up below and be a part of our community server.
Apart > Sprints

Agent Security Hackathon

--
Signups
--
Entries
October 4, 2024 4:00 PM
 to
October 7, 2024 3:00 AM
 (UTC)
Hackathon starts in
--
Days
--
Hours
--
Minutes
--
Seconds
Sign upSign up
This event is finished. It occurred between 
October 4, 2024
 and 
October 7, 2024

What is Agent Security?

While much AI safety research focuses on large language models, the AI systems being deployed in the real world are far more complex. Enter the realm of Agents — sophisticated combinations of language models and other programs that are reshaping our digital world.

During this hackathon, we'll put our research skills to the test by diving deep into the world of agents:

  • What are their unique safety properties?
  • Under what conditions do they fail?
  • How do they differ from raw language models?

Join us to uncover the unknowns of agent security by signing up below!

Why Agent Security Matters

The development of AI has brought about systems capable of increasingly autonomous operation. AI agents, which integrate large language models with other programs, represent a significant step in this evolution. These agents can make decisions, execute tasks, and interact with their environment in ways that surpass traditional AI systems.

This progression, while promising, introduces new challenges in ensuring the safety and security of AI systems. The complexity of agents necessitates a reevaluation of existing safety frameworks and the development of novel approaches to security. Agent security research is crucial because it:

  • It ensures AI agents act in alignment with human values and intentions
  • It prevents potential misuse or manipulation of AI systems
  • It protects against unintended consequences of autonomous decision-making
  • It builds trust in AI technologies, helping responsible adoption in society

What to Expect:

During this hackathon, you'll have the opportunity to:

  • Collaborate with like-minded individuals passionate about AI safety
  • Receive continuous feedback from mentors and peers
  • Attend inspiring HackTalks and keynote speeches
  • Participate in office hours with established researchers
  • Develop innovative solutions to real-world AI security challenges
  • Network with experts in the field of AI safety and security
  • Contribute to groundbreaking research in agent security

Submission

You will join in teams to submit a report and code repository of your research from this weekend. Established researchers will judge your submission and provide reviews following the hackathon.

What is it like to participate?

Doroteya Stoyanova, Computer Vision Intern

I learnt so much about AI Safety and Computation Mechanics. It is a field I never heard of, and it combines two of my interests - AI, and Physics. Through the hackathons I gained valuable connections, learnt a lot by researchers, people with a lot of experience and this will help me in my research-oriented career-path.

Kevin Vegda, AI Engineer

I loved taking part in the AI Risk Demo-Jam by Apart Research and LISA. It was my first hackathon ever. I greatly appreciate the ability of the environment to churn out ideas as well as to incentivise you to make demo-able projects that are always good for your CV. Moreover, meeting people from the field gave me an opportunity to network and maybe that will help me with my career.

Mustafa Yasir, The Alan Turing Institute

[The technical AI safety startups hackathon] completely changed my idea of what working on 'AI Safety' means, especially from a for-profit entrepreneurial perspective. I went in with very little idea of how a startup can be a means to tackle AI Safety and left with incredibly exciting ideas to work on. This is the first hackathon in which I've kept thinking about my idea, even after the hackathon ended.

Winning the hackathon!

You have a unique chance to win during this hackathon! With our expert panel of judges, we'll review your submissions on the following criteria:

  1. Agent safety: Does the project move the field of agent safety forward? After reading this, do we know more about how to detect dangerous agents, protect against dangerous agents, or build safer agents than before?
  2. AI safety: Does the project solve a concrete problem in AI safety? If this project is fully realized, would we expect the world with superintelligence to be a safer (even marginally) than yesterday?
  3. Methodology: Is the project well-executed and is the code available so we can review it? Do we expect the results to generalize beyond the specific case(s) presented in the submission?

Join us

This hackathon is for anyone who is passionate about AI safety and secure systems research. Whether you're an AI researcher, developer, entrepreneur, or simply someone with a great idea, we invite you to be part of this ambitious journey. Together, we can build the tools and research needed to ensure that agents develop safely.

By participating in the Agent Security Hackathon, you'll:

  • Gain hands-on experience in cutting-edge AI safety research
  • Develop valuable skills in AI development, security analysis, and collaborative problem-solving
  • Network with leading experts and potential future collaborators in the field
  • Contribute to solving one of the most pressing challenges in AI development
  • Enhance your resume with a unique and highly relevant project
  • Potentially kickstart a career in AI safety and securityWhether you're an AI researcher, developer, cybersecurity expert, or simply passionate about ensuring safe AI, your perspective is valuable. Let's work together to build a safer future for AI!

Image source.

Speakers & Collaborators

Archana Vaidheeswaran

Archana is responsible for organizing the Apart Sprints, research hackathons to solve the most important questions in AI safety.
Organizer

Esben Kran

Esben is the co-director of Apart Research and specializes in organizing research teams on pivotal AI security questions.
Organizer

Jason Schreiber

Jason is co-director of Apart Research and leads Apart Lab, our remote-first AI safety research fellowship.
Organizer

Astha Puri

Astha Puri has over 8 years of experience in the data science field. Astha works on improving the search and recommendation experience for customers for a Fortune 10 company.
Judge

Sachin Dharashivka

Sachin Dharashivkar is the CEO of TrojanVectors, a company specializing in security for RAG chatbots and AI agents.
Judge

Ankush Garg

Ankush is a Research Engineer at Meta AI's Llama team, working on pre-training Large Language Models. Before that he spent 5 years at Google Brain/ Deepmind and brings extensive ex
Judge

Abhishek Harshvardhan Mishra

Abhishek is an independent hacker and consultant. who specializes in building code generation and roleplay models. He is the creator of evolSeeker and codeCherryPop models.
Judge

Andrey Anurin

Andrey is a senior software engineer at Google and a researcher at Apart, working on automated cyber capability evaluation and capability elicitation.
Judge

Pranjal Mehta

Pranjal Mehta, a tech entrepreneur and IIT Madras graduate, is the co-founder of Neohumans.ai, a venture-backed startup. He is an active angel investor and mentor.
Judge

Samuel Watts

Sam is the product manager at Lakera, the leading GenAI security platform. Lakera develops AI security and safety guardrails that best serve startups & enterprises.
Keynote Speaker

Jaime Raldua

Jaime has 8+ years of experience in the tech industry. Started his own data consultancy to support EA Organisations and currently works at Apart Research as Research Engineer.
Organiser and Judge
If [AI] agents advance to a level of intelligence surpassing human capabilities and develop ambitions, they could potentially attempt to seize control of the world, resulting in irreversible consequences for humanity.
-
"The Rise and Potential of LLM-Based Agents"

AI agents are "robots in cyberspace" (He et al. 2024), systems with a brain that orchestrates actions from perception.

  • A Discord bot receives every message on a server (Perception), decides whether the message is spam (Brain), and deletes the message (Action)
  • A cyber operative bot receives orders to find all vulnerabilities on Danish government websites (Perception), decides a course of action to 1) map out government websites, 2) categorize potential vulnerabilities based on the tech stack used, and 3) test each potential vulnerability using a cyber offense tooling suite (Brain), and starts an Action-Perception loop to fulfill the plan (Action -> Perception -> Brain -> Action).
Overview of an agent. Adapted from Xi et al. (2023)

As we explore agent safety during this hackathon, our work will ensure that the world is safe from high-risk agent deployments and one of the main risks to avoid is the possibility that AI agents "go rogue" (Bengio 2023, He et al. 2024) - that they take autonomous actions outside the oversight of humans and cause catastrophic damage or otherwise disenfranchise society.

Required reading:

Optional reading:

Ideas & inspiration

There's a few ways to approach this problem: 1) we implement algorithms that make agents verifiably safe or 2) we evaluate when an agent is worrying us and build automatic capabilities to shut it down. Let's make a few ideas for each of these categories:

With "building safer agents" and "improving the deployment infrastructure to support control and safety" in mind:

  • Agent sandboxes: Create an open source agent sandbox (e.g. with Docker) that logs all interactions with your system. Procedurally implement better and better tracking and automated oversight of actions that an agent takes, e.g. automatically identifying high-vulnerability areas of your computer that it interacts with. This improves local and remote oversight of agent systems.
  • Automatic shutdown - latent space deserts: Create an automated functionality to shut down an agent by guiding reasoning down an "empty" part of the latent space when jailbreaks are used or when it begins acting in a specific domain, such as cyber offense.
  • Automatic shutdown - control systems: Develop a system that uses e.g. the agent sandbox to monitor actions and automatically encrypts and destroys the agent when a verifiable dangerous, misaligned, or otherwise rogue action is taken.
  • High-risk tooling control: Develop a permissions system for state-of-the-art agents similar to user permissions in bash tools. With agents using tools, it is likely that the permissions need to be more complex and we might want to integrate a permissions recommendation system into the permissions tool.
  • Formalize agent action: Right now, agents are very diverse. We don't have standardized frameworks to study or implement them. At best, we can use API standards for agents like LangChain. At worst, every single agent is developed with precise specifications for their domain. Is there a chance that we can formalize how an agent should work to a degree similar to the internet? Communication protocols, packet management, sessions, OpenSSL, SSH, sFTP, etc...
  • Agent-tool interfaces: One of the most dangerous aspects of agents is how we can give them any tool. An agent with a chat interface isn't as dangerous as an agent with a gun. How can we create standardized tool use protocols that defines key permissions and makes it clear what an agent is able to do if it wants.
  • Website agent control tools: Today, thousands of agents crawl the web to scrape content. Can we develop tools that we can easily deploy on our website (e.g. simple JS) that monitors and reports agent activity on our website?

With "detecting and monitoring catastrophic risk from agents" in mind:

  • Agent tech tree: Develop an overview of all the capabilities that agents are currently able to do and help us understand where they fall short of dangerous abilities.
  • Challenge creation: Create challenges that test for abilities that are currently not tested for but that seem worrying. Apart has a cyber capabilities evaluation project and both METR and AISI invite you to send them challenges that are compatible with inspect (a pretty great framework) or the task-standard.
  • Agent testing tooling: Much of agent tooling is currently disparate and primitive with some exceptions such as inspect currently in development. If you develop a tool to interface with inspect even better or a test creation tool, this might speed up agent testing significantly and democratize research (see e.g. Vivaria).
  • Other dangerous capabilities: A lot of work is currently happening in testing for cyber abilities, autonomy, and more. However, we might be missing crucial information in terms of risks from military drones or targeting planning agents.
  • Understanding reasoning: A large part of agent capability is the introduction of reasoning and planning. At the moment, we are severely lacking understanding of what goes on in this process and testing for it will be valuable (read more).
  • Differences in reasoning between agents and chatbots: How does "the brain" (the LLM) of agents change its functionality when it is put in different contexts? This is related to prompt robustness and prompt sensitivity but also involves the action space an agent acts within.
  • Red-teaming agents: Find the most used agent on the market today and try to break it.
  • Detection-in-the-wild - detection agents / detection traps / honeypots: Develop ways for agents in the wild to be detected and monitored by external agents.
  • Detection-in-the-wild - police agents: With agents detected from the project above, we probably want to be able to take action against them. This might involve automatically notifying the website owner, reporting specific agents to the police, or submitting support tickets for dysfunctional agents. Can we create police agents that respect privacy, are benign, and useful while defending the internet against malicious agents?

Let's make agents safe.

See the updated calendar and subscribe

The schedule runs from 4 PM UTC Friday to 3 AM Monday. We start with an introductory talk and end the event during the following week with an awards ceremony. Join the public ICal here.You will also find Explorer events, such as collaborative brainstorming and team match-making before the hackathon begins on Discord and in the calendar.


📍 Registered jam sites

Beside the remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in-person and connect with others in your area.

Agent Security Hackathon @ UC Chile

If your're in Santiago and want to join us message @weibac on Telegram

Agent Security Hackathon @ UC Chile

Agent Security Hackathon: Expanding our Knowledge

Agent Security Hackathon: Expanding our Knowledge

Agent Security Hackathon: Expanding our Knowledge

We look forward to welcoming you at the EA Hotel: York Street 36, Blackpool, UK. Here, you will find a cozy bed, good food, and a merry little community of aspiring effective altruists.

Agent Security Hackathon: Expanding our Knowledge

Apart Research Hackathon at LEAH

Join us for a remote location of the next Apart Research Hackathon on Agent Security at the LEAH office in Farringdon, London.

Apart Research Hackathon at LEAH

iNBest AI safety group

Av. Unión #163 Piso 1 Col. Lafayette Guadalajara, Jalisco, México 44140

iNBest AI safety group

Hanoi AI Safety Network Jam Site

Location will be a coworking space in Hanoi that can accommodate up to 30 people. Exact location is NovaUp 22nd Thành Công St., Thành Công, Ba Đình, Hà Nội, Vietnam. More details are available in the luma link.

Hanoi AI Safety Network Jam Site

AI Agent Security Hackathon

Join us for a weekend hackathon at Skybox 1 and 2 in DTU Skylab . Whether you're new to the topic or have some experience, this hackathon is a great opportunity to get hands-on experience with the security of agent-based systems.

AI Agent Security Hackathon

🏠 Register a location

The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community. Read more about organizing.
Uploading...
fileuploaded.jpg
Upload failed. Max size for files is 10 MB.
Thank you! Your submission has been received! Your event will show up on this page.
Oops! Something went wrong while submitting the form.

📣 Social media images and text snippets

No media added yet
No text snippets added yet

Use this template for your submission [Required]


What to submit?

The report should include:

  • Project title and team members
  • Executive summary (max 250 words)
  • Introduction and problem statement
  • Methodology and approach
  • Results and analysis
  • Discussion of implications for AI safety
  • Conclusion and future work
  • ReferencesAdditionally, teams should provide a link to their code repository (e.g., GitHub) and any demo materials if applicable.

Evaluation Criteria:

Submissions will be judged based on the following criteria:

  1. Agent safety: Does the project move the field of agent safety forward? After reading this, do we know more about how to detect dangerous agents, protect against dangerous agents, or build safer agents than before?
  2. AI safety: Does the project solve a concrete problem in AI safety? If this project is fully realized, would we expect the world with superintelligence to be a safer (even marginally) than yesterday?
  3. Methodology: Is the project well-executed and is the code available so we can review it? Do we expect the results to generalize beyond the specific case(s) presented in the submission?
Uploading...
fileuploaded.jpg
Upload failed. Max size for files is 10 MB.
Uploading...
fileuploaded.jpg
Upload failed. Max size for files is 10 MB.
Uploading...
fileuploaded.jpg
Upload failed. Max size for files is 10 MB.
You have successfully submitted! You should receive an email and your project should appear here. If not, contact operations@apartresearch.com.
Oops! Something went wrong while submitting the form.

Here are the entries for the Agent Security Hackathon 2024