Forum for the Apart Hackathons
Explore all hackathon research and pilot experiments
Omniscient Narrative Agent
November 10, 2024
October 3, 2024
.
The Concordia Contest: Advancing the Cooperative Intelligence of Language Model Agents
. Placed
1
.
Seemingly Human: Dark Patterns in ChatGPT
February 24, 2024
February 12, 2024
.
MASec
. Placed
1
.
Model Cards for AI Algorithm Governance
February 24, 2024
January 7, 2024
.
Governance
. Placed
1
.
Detecting Implicit Gaming through Retrospective Evaluation Sets
February 24, 2024
November 27, 2023
.
Evaluations
. Placed
1
.
EscalAtion: Assessing Multi-Agent Risks in Military Contexts
September 12, 2024
October 2, 2023
.
Multi-agent
. Placed
1
.
Preserving Agency in Reinforcement Learning under Unknown, Evolving and Under-Represented Intentions
February 24, 2024
September 25, 2023
.
Agency
. Placed
1
.
In the Mirror: Using Chess to Simulate Agency Loss in Feedback Loops
February 24, 2024
September 24, 2023
.
Agency
. Placed
1
.
Evaluating Myopia in Large Language Models
February 24, 2024
September 10, 2023
.
Agency
. Placed
1
.
Data Taxation
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
1
.
Relating induction heads in Transformers to temporal context model in human free recall
February 24, 2024
July 17, 2023
.
Interpretability
. Placed
1
.
Exploring the Robustness of Model-Graded Evaluations of Language Models
February 24, 2024
July 2, 2023
.
Safety Benchmarks
. Placed
1
.
Solving the CNN Mech Int Challenge
February 24, 2024
May 10, 2023
.
Interpretability 2.0
. Placed
1
.
Automated Sandwiching: Efficient Self-Evaluations of Conversation-Based Scalable Oversight Techniques
February 24, 2024
February 16, 2023
.
ScaleOversight
. Placed
1
.
We Discovered An Neuron
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
1
.
Discovering Latent Knowledge in Language Models Without Supervision - extensions and testing
February 24, 2024
December 19, 2022
.
AI Testing
. Placed
1
.
Investigating Neuron Behaviour via Dataset Example Pruning and Local Search
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
1
.
Agreeableness vs. Truthfulness
February 24, 2024
October 18, 2022
.
Language Model Hackathon
. Placed
1
.
Beyond Refusal: Scrubbing Hazards from Open-Source Models
May 8, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
1
.
rAInboltBench : Benchmarking user location inference through single images
May 31, 2024
.
AI Security Evaluation Hackathon: Measuring AI Capability
. Placed
1
.
Unsupervised Recovery of Hidden Markov Models from Transformers with Evolutionary Algorithms
June 14, 2024
.
Computational Mechanics Hackathon!
. Placed
1
.
Sandbag Detection through Model Degradation
July 8, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
1
.
AI Alignment Knowledge Graph
November 10, 2024
.
Research Augmentation Hackathon: Supercharging AI Alignment
. Placed
1
.
Speculative Consequences of A.I. Misuse
November 10, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
1
.
DarkForest - Defending the Authentic and Humane Web
September 5, 2024
.
Hackathon for Technical AI Safety Startups
. Placed
1
.
Diamonds are Not All You Need
November 10, 2024
.
Agent Security Hackathon
. Placed
1
.
Robust Machine Unlearning for Dangerous Capabilities
November 7, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
1
.
Promoting School-Level Accountability for the Responsible Deployment of AI and Related Systems in K-12 Education: Mitigating Bias and Increasing Transparency
November 27, 2024
.
Howard University AI Safety Summit & Policy Hackathon
. Placed
1
.
AutoSteer: Weight-Preserving Reinforcement Learning for Interpretable Model Control
December 3, 2024
.
Reprogramming AI Models Hackathon
. Placed
1
.
Very Cooperative Agent
November 10, 2024
October 6, 2024
.
The Concordia Contest: Advancing the Cooperative Intelligence of Language Model Agents
. Placed
2
.
Fishing for the answer: Mapping the flow of information in LLM agent groups using lessons from fish schools
February 24, 2024
February 12, 2024
.
MASec
. Placed
2
.
Obsolescent Souls
February 24, 2024
January 7, 2024
.
Governance
. Placed
2
.
Visual Prompt Injection Detection
February 24, 2024
November 27, 2023
.
Evaluations
. Placed
2
.
Jailbreaking the Overseer
February 24, 2024
October 1, 2023
.
Multi-agent
. Placed
2
.
Discovering Agency Features as Latent Space Directions in LLMs via SVD
February 24, 2024
September 25, 2023
.
Agency
. Placed
2
.
Agency as Shannon information. Unveiling limitations and common misconceptions
February 24, 2024
September 24, 2023
.
Agency
. Placed
2
.
Against Agency
February 24, 2024
September 21, 2023
.
Agency
. Placed
2
.
The AI governance gaps in developing countries
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
2
.
Who cares about brackets?
February 24, 2024
July 17, 2023
.
Interpretability
. Placed
2
.
From Sparse to Dense: Refining the MACHIAVELLI Benchmark for Real-World AI Safety
February 24, 2024
July 4, 2023
.
Safety Benchmarks
. Placed
2
.
Dropout Incentivizes Privileged Bases
February 24, 2024
May 10, 2023
.
Interpretability 2.0
. Placed
2
.
Player Of Games
February 24, 2024
February 16, 2023
.
ScaleOversight
. Placed
2
.
Identifying a Preliminary Circuit for Predicting Gendered Pronouns in GPT-2 Small
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
2
.
Investigating Training Dynamics via Token Loss Trajectories
February 24, 2024
December 19, 2022
.
AI Testing
. Placed
2
.
Backup Transformer Heads are Robust to Ablation Distribution
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
2
.
AI: My Partner in Crime
February 24, 2024
October 18, 2022
.
Language Model Hackathon
. Placed
2
.
Jekyll and HAIde: The Better an LLM is at Identifying Misinformation, the More Effective it is at Worsening It.
May 8, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
2
.
Cybersecurity Persistence Benchmark
May 31, 2024
.
AI Security Evaluation Hackathon: Measuring AI Capability
. Placed
2
.
RNNs represent belief state geometry in hidden state
June 14, 2024
.
Computational Mechanics Hackathon!
. Placed
2
.
Detecting and Controlling Deceptive Representation in LLMs with Representational Engineering
August 29, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
2
.
Grant Application Simulator
November 10, 2024
.
Research Augmentation Hackathon: Supercharging AI Alignment
. Placed
2
.
CoPirate
November 10, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
2
.
Simulation Operators: The Next Level of the Annotation Business
September 5, 2024
.
Hackathon for Technical AI Safety Startups
. Placed
2
.
Dynamic Risk Assessment in Autonomous Agents Using Ontologies and AI
November 10, 2024
.
Agent Security Hackathon
. Placed
2
.
SafeBites
October 29, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
2
.
Classification on Latent Feature Activation for Detecting Adversarial Prompt Vulnerabilities
December 3, 2024
.
Reprogramming AI Models Hackathon
. Placed
2
.
Iterated contract negotiation
February 24, 2024
February 11, 2024
.
MASec
. Placed
3
.
2030 - The CEO Dilemma
February 24, 2024
January 8, 2024
.
Governance
. Placed
3
.
Cross-Lingual Generalizability of the SADDER Benchmark
February 24, 2024
November 27, 2023
.
Evaluations
. Placed
3
.
LLMs With Knowledge of Jailbreaks Will Use Them
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
3
.
Uncertainty about value naturally leads to empowerment
February 24, 2024
September 26, 2023
.
Agency
. Placed
3
.
Comparing truthful reporting, intent alignment, agency preservation and value identification
February 24, 2024
September 24, 2023
.
Agency
. Placed
3
.
Building brakes for a speeding car: A global coordination proposal for AI safety
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
3
.
Embedding and Transformer Synthesis
February 24, 2024
July 16, 2023
.
Interpretability
. Placed
3
.
MAXIAVELLI: Thoughts on improving the MACHIAVELLI benchmark
February 24, 2024
July 2, 2023
.
Safety Benchmarks
. Placed
3
.
OthelloScope
February 24, 2024
May 10, 2023
.
Interpretability 2.0
. Placed
3
.
Reverse Word Wizards: Pitting Language Models Against the Art of Reversal
February 24, 2024
February 16, 2023
.
ScaleOversight
. Placed
3
.
Automated Identification of Potential Feature Neurons
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
3
.
Counting Letters, Chaining Premises & Solving Equations: Exploring Inverse Scaling Problems with GPT-3
February 24, 2024
December 19, 2022
.
AI Testing
. Placed
3
.
Model editing hazards at the example of ROME
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
3
.
All Fish are Trees
February 24, 2024
October 18, 2022
.
Language Model Hackathon
. Placed
3
.
Artificial Advocates: Biasing Democratic Feedback using AI
May 8, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
3
.
Say No to Mass Destruction: Benchmarking Refusals to Answer Dangerous Questions
May 31, 2024
.
AI Security Evaluation Hackathon: Measuring AI Capability
. Placed
3
.
Handcrafting a Network to Predict Next Token Probabilities for the Random-Random-XOR Process
June 14, 2024
.
Computational Mechanics Hackathon!
. Placed
3
.
Investigating the Effect of Model Capacity Constraints on Belief State Representations
June 14, 2024
.
Computational Mechanics Hackathon!
. Placed
3
.
Detecting Deception in GPT-3.5-turbo: A Metadata-Based Approach
July 8, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
3
.
LLM Research Collaboration Recommender
November 10, 2024
.
Research Augmentation Hackathon: Supercharging AI Alignment
. Placed
3
.
AI Safety Collective - Crowdsourcing Solutions for Critical AI Safety Challenges
September 5, 2024
.
Hackathon for Technical AI Safety Startups
. Placed
3
.
Cop N' Shop
November 10, 2024
.
Agent Security Hackathon
. Placed
3
.
Sue-Per GPT: Legal RAG Assistant
November 7, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
3
.
Utilitarian Decision-Making in Models - Evaluation and Steering
December 3, 2024
.
Reprogramming AI Models Hackathon
. Placed
3
.
Towards High-Quality Model-Written Evaluations
February 24, 2024
November 27, 2023
.
Evaluations
. Placed
4
.
Second-order Jailbreaks
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
4
.
ILLUSION OF CONTROL
February 24, 2024
September 25, 2023
.
Agency
. Placed
4
.
Agency, value and empowerment.
February 24, 2024
September 24, 2023
.
Agency
. Placed
4
.
Premortem AI
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
4
.
Interpreting Planning in Transformers
February 24, 2024
July 17, 2023
.
Interpretability
. Placed
4
.
Exploitation of LLM’s to Elicit Misaligned Outputs
February 24, 2024
July 2, 2023
.
Safety Benchmarks
. Placed
4
.
Improving TransformerLens Head Detector
February 24, 2024
May 10, 2023
.
Interpretability 2.0
. Placed
4
.
Soft Prompts are a Convex Set
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
4
.
Trojan detection and implementation on transformers
February 24, 2024
December 19, 2022
.
AI Testing
. Placed
4
.
Probing Conceptual Knowledge on Solved Games
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
4
.
Reducing hindsight neglect with "Let's think step by step"
February 24, 2024
October 18, 2022
.
Language Model Hackathon
. Placed
4
.
Unleashing Sleeper Agents
May 8, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
4
.
Benchmarking Dark Patterns in LLMs
May 31, 2024
.
AI Security Evaluation Hackathon: Measuring AI Capability
. Placed
4
.
Modelling the oversight of automated interpretability against deceptive agents on sparse autoencoders
July 8, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
4
.
PurePrompt - An easy tool for prompt robustness and eval augmentation
November 10, 2024
.
Research Augmentation Hackathon: Supercharging AI Alignment
. Placed
4
.
Phish Tycoon: phishing using voice cloning
November 10, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
4
.
Identity System for AIs
September 5, 2024
.
Hackathon for Technical AI Safety Startups
. Placed
4
.
OCAP Agents
November 10, 2024
.
Agent Security Hackathon
. Placed
4
.
Understanding Incentives To Build Uninterruptible Agentic AI Systems
October 29, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
4
.
AttentionData
February 24, 2024
January 22, 2024
.
ARENA HACK
. Placed
9
.
Gradient Descent Over Interpolated Activation Patches for Circuit Discovery
February 24, 2024
January 22, 2024
.
ARENA HACK
. Placed
9
.
AI Safeguard: Navigating Compliance and Risk in the Era of the EU AI Act
February 24, 2024
January 8, 2024
.
Governance
. Placed
9
.
Boxing AIs - The power of checklists
February 24, 2024
January 8, 2024
.
Governance
. Placed
9
.
Example Documentation of Implementation Guidance for the EU AI Act: a draft proposal to address challenges raised by business and civil society actors
February 24, 2024
January 8, 2024
.
Governance
. Placed
9
.
Trust and Power in the Age of AI
February 24, 2024
January 8, 2024
.
Governance
. Placed
9
.
AI Safety risks: An Infographic Analysis
February 24, 2024
January 7, 2024
.
Governance
. Placed
9
.
The EU AI Act: Caution against a potential "Ultron"
February 24, 2024
January 7, 2024
.
Governance
. Placed
9
.
Multifaceted Benchmarking
February 24, 2024
November 27, 2023
.
Evaluations
. Placed
5
.
Can collusion between advanced AI Agents remain perfectly undetectable?
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
9
.
Balancing Objectives: Ethical Dilemmas and AI's Temptation for Immediate Gains in Team Environments
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
9
.
Cooperative AI is a Double Edged Sword
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
9
.
Emergent Deception from Semi-Cooperative Negotiations
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
9
.
Do many interacting LLMs perform well in the N-Player Prisoner’s Dilemma Game?
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
9
.
Exploring multi-agent interactions in the dollar auction
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
5
.
Exploring Failures: Assessing Large Language Model in General Sum Games with Imperfect Information Against Human Norms
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
9
.
LLM Collectives in Multi-Round Interactions: Truth or Deception?
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
9
.
Jailbreaking is Incentivized in LLM-LLM Interactions
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
9
.
Missing Social Instincts in LLMs
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
9
.
Risk assessment through a small-scale simulation of a chemical laboratory.
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
9
.
The Firemaker
February 24, 2024
October 2, 2023
.
Multi-agent
. Placed
9
.
AI Defect in Low Payoff Multi-Agent Scenarios
February 24, 2024
October 1, 2023
.
Multi-agent
. Placed
9
.
Can Malicious Agents Corrupt the System?
February 24, 2024
October 1, 2023
.
Multi-agent
. Placed
9
.
Escalation and stubbornness caused by hallucination
February 24, 2024
October 1, 2023
.
Multi-agent
. Placed
9
.
LLM agent topic of conversation can be manipulated by external LLM agent
February 24, 2024
October 1, 2023
.
Multi-agent
. Placed
9
.
The artificial wolves of Millers Hollow
February 24, 2024
October 1, 2023
.
Multi-agent
. Placed
9
.
In the Mirror: Using Chess to Simulate Agency Loss in Feedback Loops
February 24, 2024
September 24, 2023
.
Multi-agent
. Placed
9
.
Goal Misgeneralization
February 24, 2024
September 7, 2023
.
Interpretability
. Placed
9
.
Alignment and capability of GPT4 in small languages
February 24, 2024
August 21, 2023
.
Evals
. Placed
9
.
Can Large Language Models Solve Security Challenges?
February 24, 2024
August 21, 2023
.
Evals
. Placed
9
.
GPT-4 May Accelerate Finding and Exploiting Novel Security Vulnerabilities
February 24, 2024
August 21, 2023
.
Evals
. Placed
9
.
Impact of “fear of shutoff” on chatbot advice regarding illegal behavior
February 24, 2024
August 21, 2023
.
Evals
. Placed
9
.
SADDER - Situational Awareness Dataset for Detecting Extreme Risks
February 24, 2024
August 21, 2023
.
Evals
. Placed
9
.
Preliminary measures of faithfulness in least-to-most prompting
February 24, 2024
August 20, 2023
.
Evals
. Placed
9
.
Turing Mirror: Evaluating the ability of LLMs to recognize LLM-generated text
February 24, 2024
August 20, 2023
.
Evals
. Placed
9
.
Problem 9.60 - Dimensionality reduction
February 24, 2024
July 25, 2023
.
Interpretability
. Placed
9
.
Residual Stream Verification via California Housing Prices Experiment
February 24, 2024
July 25, 2023
.
Interpretability
. Placed
9
.
A Blueprint for GPT-6
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
A radical and counterintuitive approach to AI governance
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
A Digest of AI Risk Categories
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
AI Governance
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
AI Sanity
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Beyond the Binary: Balancing VR Governance and Opt-Out Autonomy
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Can we open-source a collective decision-making protocol?
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Can the pin factory defeat the paperclip factory?
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Categorizing the Risks of AI: A guide for policy makers
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Categories of AI Risks
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Code of the Heart
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Catastrophic AGI Risk Prediction
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Conceptualization of the National Artificial Intelligence Regulatory Authority (NAIRA)
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
DemocracyGPT
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Die or Survive in AI era: Guidance on Education
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Directional Infringement: AI Risk Classification
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Evaluating the effectiveness of an AI ranking system market
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
GPT-6 Needs ARC Evals
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
FGA Ratings Syste
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Hybrid Blockchain Networks for Transparency in Artificial Intelligence
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Increased democratic responsiveness: Using AI as a support-tool for Citizen Assemblies?
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Measuring Gender Bias in Text-to-Image Models using Object Detection
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Mapping AI applications onto the political process
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Navigating the GPT-6 Deployment Minefield: Obstacles to Delaying Deployment
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Policy Recommendations to Incentivize Alignment Research
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
OpenAI, two steps ahead of the future’s oracle
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Public Opinion and Its Importance to AI Policy
February 24, 2024
July 21, 2023
.
EAGx LatAm Epoch AI Hackathon
. Placed
.
Preventing Artificial Democracy - a framework to assess risks and benefits
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Redirecting Resources Spent on AI Development
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Slowing down AI through memetics
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Scale up AI safety
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
STA
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
The Marble Puzzle
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
The Risks of Future AI systems
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Towards AI Regulation (?)
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
US Military Involvement in AGI Development: Risks and Opportunities for AI Safety
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Where does AI fit into Democratic System?
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Where will AI fit into the democratic system?
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
What might limit the exponential growth of AI?
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Values-aligned AI through the Lens of Lessig’s Modalities of Regulation
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Will AI Always Undermine Democracy?
February 27, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Why AI Governance in Australia Could Slow Down Capabilities
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
Whose Morals Should AI Have
February 24, 2024
July 21, 2023
.
AI Governance Hackathon
. Placed
.
DPO vs PPO comparative analysis
February 24, 2024
July 17, 2023
.
Interpretability
. Placed
9
.
Experiments in Superposition
February 24, 2024
July 17, 2023
.
Interpretability
. Placed
9
.
Multimodal Similarity Detection in Transformer Models
February 24, 2024
July 17, 2023
.
Interpretability
. Placed
9
.
Preliminary Steps Toward Investigating the “Smearing” Hypothesis for Layer Normalizing in a 1-Layer SoLU Model
February 24, 2024
July 17, 2023
.
Interpretability
. Placed
9
.
One is 1- Analyzing Activations of Numerical Words vs Digits
February 24, 2024
July 17, 2023
.
Interpretability
. Placed
9
.
Toward a Working Deep Dream for LLM's
February 24, 2024
July 17, 2023
.
Interpretability
. Placed
9
.
Factual recall rarely happens in attention layer
February 24, 2024
July 16, 2023
.
Interpretability
. Placed
9
.
Towards Interpretability of 5 digit addition
February 24, 2024
July 16, 2023
.
Interpretability
. Placed
9
.
AI & Cyberdefense
February 24, 2024
July 3, 2023
.
Safety Benchmarks
. Placed
9
.
Manipulative Expression Recognition (MER) and LLM Manipulativeness Benchmark
February 24, 2024
July 3, 2023
.
Safety Benchmarks
. Placed
9
.
Identifying undesirable conduct when interacting with individuals with psychiatric conditions
February 24, 2024
July 2, 2023
.
Safety Benchmarks
. Placed
9
.
ACDC++: Fast automated circuit discovery using attribution patching
February 24, 2024
June 11, 2023
.
ARENA Interpretability
. Placed
.
modiff
February 24, 2024
June 11, 2023
.
ARENA Interpretability
. Placed
.
Swap Graphs with Attribution Patching
February 24, 2024
June 11, 2023
.
ARENA Interpretability
. Placed
.
Understanding How Othello GPT Identifies Valid Moves from its Internal World Model
February 24, 2024
June 11, 2023
.
ARENA Interpretability
. Placed
.
Understanding truthfulness in large language model heads through interpretability
February 24, 2024
June 11, 2023
.
ARENA Interpretability
. Placed
.
Why Might Negative Name Mover Heads Exist?
February 24, 2024
June 11, 2023
.
ARENA Interpretability
. Placed
.
AI Policy Pre-Evaluation Prediction Markets
February 24, 2024
June 4, 2023
.
Democratic Input
. Placed
.
Towards Formally Describing Program Traces from Chains of Language Model Calls with Causal Influence Diagrams: A Sketch
February 24, 2024
May 30, 2023
.
Verification
. Placed
.
Fuzzing Large Language Models
February 24, 2024
May 29, 2023
.
Verification
. Placed
.
It Ain't Much but it's ONNX Work
February 24, 2024
May 29, 2023
.
Verification
. Placed
.
AI and Democracy: Balancing Risks and Opportunities to Maintain Meaningful Human Control
February 24, 2024
May 10, 2023
.
AI Governance Hackathon
. Placed
.
AI Impact Assessments
February 24, 2024
May 10, 2023
.
AI Governance Hackathon
. Placed
.
Algorithmic Explanation: A method for measuring interpretations of neural networks
February 24, 2024
May 10, 2023
.
Interpretability 2.0
. Placed
9
.
AutoAdminsteredAntidotes: Circuit detection in a poisoned model for MNIST classification
February 24, 2024
May 10, 2023
.
Interpretability 2.0
. Placed
5
.
Detecting Phase Transitions
February 24, 2024
May 10, 2023
.
Interpretability 2.0
. Placed
7
.
Exploring OthelloGPT
February 24, 2024
May 10, 2023
.
Interpretability 2.0
. Placed
8
.
Othello Mechint playground
February 24, 2024
May 10, 2023
.
Interpretability 2.0
. Placed
6
.
AI Safety unionization for bottom-up governance
February 24, 2024
March 7, 2023
.
. Placed
.
AI Safety Subproblems for Software Engineering Researchers
February 24, 2024
March 7, 2023
.
. Placed
.
AI Safety Talent Pool Identification
February 24, 2024
March 7, 2023
.
. Placed
.
Analysis of upcoming AGI companies
February 24, 2024
March 7, 2023
.
. Placed
.
Authority bias to ChatGPT
February 24, 2024
March 7, 2023
.
. Placed
.
ChatGPT Alignment Talent Search
February 24, 2024
March 7, 2023
.
. Placed
.
Catalogue of AI safety
February 24, 2024
March 7, 2023
.
. Placed
.
Critique of OpenAI's alignment plan
February 24, 2024
March 7, 2023
.
. Placed
.
Diversity in AI safety
February 24, 2024
March 7, 2023
.
. Placed
.
New AI organization brainstorm
February 24, 2024
March 7, 2023
.
. Placed
.
Risk Defense Initiative
February 24, 2024
March 7, 2023
.
. Placed
.
Simon's Time-Off Newsletter
February 24, 2024
March 7, 2023
.
. Placed
.
Automated Model Oversight Using CoTP
February 24, 2024
February 16, 2023
.
ScaleOversight
. Placed
.
Can you keep a secret?
February 24, 2024
February 16, 2023
.
ScaleOversight
. Placed
.
Physics Guided Deep Learning Interpretation
February 24, 2024
February 16, 2023
.
ScaleOversight
. Placed
.
Sustainable Fashion Brand Language Learning Model 1
February 24, 2024
February 16, 2023
.
ScaleOversight
. Placed
.
Attention Phrenology: A spatial classification of attention heads
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
.
$B$ Confident Bro: Discovering Latent Knowledge In Language Models Without Supervision
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
.
Distillation by duplication: The importance of layer selection
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
.
Interactive Layerscope
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
.
In search of linguistic concepts: investigating BERT's context vectors
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
.
Investigating Agent Behavior In different RL methods
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
.
Iterative summarization interpretability
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
.
One Attention Head Is All You Need for Sorting Fixed-Length Lists
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
.
The Start of Investigating a 1-Layer SoLU Model
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
.
TraCR-Supported Mechanistic Interpretability
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
.
Trafo Mech Int on the web!
February 24, 2024
January 25, 2023
.
Mechanistic Interpretability Hackathon
. Placed
.
Demonstrating LLM agentic capabilities
February 24, 2024
December 19, 2022
.
AI Testing
. Placed
.
Evaluating Critical Level Of Perturbations Required To Achieve Certain Fail Rate
February 24, 2024
December 19, 2022
.
AI Testing
. Placed
.
Formal Verification for Paren-balance checking
February 24, 2024
December 19, 2022
.
AI Testing
. Placed
.
LLM benchmarking through specifically-aligned feedback
February 24, 2024
December 19, 2022
.
AI Testing
. Placed
.
Model Hubris: On the Presumptuousness of Large Language Models
February 24, 2024
December 19, 2022
.
AI Testing
. Placed
.
This Is Fine(-tuning): A benchmark testing LLMs robustness against bad fine-tuning data
February 24, 2024
December 19, 2022
.
AI Testing
. Placed
.
Algorithmic bit-wise boolean task on a transformer
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Alignment Jam : Gradient-based Interpretability of Quantum-inspired neural networks
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
An Intuitive Logic for Understanding Autoregressive Language Models
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
An Informal Investigation of Indirect Object Identification in Mistral GPT2-Small Battlestar
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Caught Red-Bandit
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Finding unusual neuron sets by activation vector distance
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
How to find the minimum of a list - Transformer Edition
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Interpreting Catastrophic Failure Modes in OpenAI’s Whisper
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Interpretability at a glance
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Mechanisms of Causal Reasoning
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Neurons and Attention Heads that Look for Sentence Structure in GPT2
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Natural language descriptions for natural language directions
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Observing and Validating Induction heads in SOLU-8l-old
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Optimising image patches to change RL-agent behaviour
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Regularly Oversimplifying Neural Networks
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Sparsity Lens
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Top-Down Interpretability Through Eigenspectra
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Trying to make GPT2 dream
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Visualizing the effect prompt design has on text-davinci-002 mode collapse and social biases
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
War is 15% conflic, 15% DragonMagazine
February 24, 2024
November 15, 2022
.
Interpretability Hackathon
. Placed
.
Reasoning with Chain of Thought
February 24, 2024
October 18, 2022
.
Language Model Hackathon
. Placed
5
.
Simulating an Alien
February 24, 2024
October 18, 2022
.
Language Model Hackathon
. Placed
7
.
Wording influences truthfulness
February 27, 2024
October 18, 2022
.
Language Model Hackathon
. Placed
6
.
USE OF AI IN POLITICAL CAMPAIGNS: GAP ASSESSMENT AND RECOMMENDATIONS
May 4, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
No place is safe - Automated investigation of private communities
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
Investigating detection of election-influencing Sleeper Agents using probes
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
Digital Diplomacy: Advancing Digital Peace-Building with AI in Africa.
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
A Framework for Centralizing forces in AI
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
AI Politician
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
Universal Jailbreak of Closed Source LLMs which provide an End point to Finetune
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
Democracy and AI: Ensuring Election Efficiency in Nigeria and Africa
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
WMDP-Defense: Weapons of Mass Disruption
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
AI in the Newsroom: Analyzing the Increase in ChatGPT-Favored Words in News Articles
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
Multilingual Bias in Large Language Models: Assessing Political Skew Across Languages
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
Trustworthy or knave? – scoring politicians with AI in real-time
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
AI misinformation threatens the Wisdom of the crowd
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
Assessing Algorithmic Bias in Large Language Models' Predictions of Public Opinion Across Demographics
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
GPT 4 Is Righter Than GPT 3.5: Replicating Findings on Political Bias in LLMs for non-Western Democracies
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
AI Misinformation and Threats to Democratic Rights
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
Political Bias Vulnerabilities in LLMs
May 26, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
Building more democratic institutions with collaboratively constructed debate moderation tools
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
Silent Curriculum
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
LEGISLaiTOR: A tool for jailbreaking the legislative process
May 5, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
THE ROLE OF AI IN COMBATING POLITICAL DEEPFAKES IN AFRICAN DEMOCRACIES
May 6, 2024
.
AI and Democracy Hackathon: Demonstrating the Risks
. Placed
9
.
Detecting Anthropomorphic Tendencies in Language Models via Conversational Probing
May 28, 2024
.
AI Security Evaluation Hackathon: Measuring AI Capability
. Placed
9
.
Manifold Recovery as a Benchmark for Text Embedding Models
May 26, 2024
.
AI Security Evaluation Hackathon: Measuring AI Capability
. Placed
9
.
Black box detection of Sleeper Agents
May 26, 2024
.
AI Security Evaluation Hackathon: Measuring AI Capability
. Placed
9
.
Evaluating the ability of LLMs to follow rules
May 26, 2024
.
AI Security Evaluation Hackathon: Measuring AI Capability
. Placed
9
.
WashBench – A Benchmark for Assessing Softening of Harmful Content in LLM-generated Text Summaries
May 26, 2024
.
AI Security Evaluation Hackathon: Measuring AI Capability
. Placed
9
.
Benchmark for emergent capabilities in high-risk scenarios
May 26, 2024
.
AI Security Evaluation Hackathon: Measuring AI Capability
. Placed
9
.
LLM Benchmarking with Single-Agent Stochastic Dynamic Simulations
May 26, 2024
.
AI Security Evaluation Hackathon: Measuring AI Capability
. Placed
9
.
Exploring Hierarchical Structure Representation in Transformer Models through Computational Mechanics
June 2, 2024
.
Computational Mechanics Hackathon!
. Placed
9
.
Steering Model’s Belief States
June 2, 2024
.
Computational Mechanics Hackathon!
. Placed
9
.
Belief State Representations in Transformer Models on Nonergodic Data
June 3, 2024
.
Computational Mechanics Hackathon!
. Placed
9
.
Looking forward to posterity: what past information is transferred to the future?
June 3, 2024
.
Computational Mechanics Hackathon!
. Placed
9
.
Developing a deception dataset
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Evaluating and inducing steganography in LLMs
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Gradient-Based Deceptive Trigger Discovery
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
From Sycophancy (not) to Sandbagging
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Detection of potentially deceptive attitudes using expression style analysis
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Boosting Language Model Honesty with Truthful Suffixes
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Towards a Benchmark for Self-Correction on Model-Attributed Misinformation
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Sandbagging LLMs using Activation Steering
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
An Exploration of Current Theory of Mind Evals
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Evaluating Steering Methods for Deceptive Behavior Control in LLMs
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Eliciting maximally distressing questions for deceptive LLMs
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Detecting Deception with AI Tics 😉
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
The House Always Wins: A Framework for Evaluating Strategic Deception in LLMs
July 2, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Detecting Lies of (C)omission
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Werewolf Benchmark
July 1, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Deceptive behavior does not seem to be reducible to a single vector
June 30, 2024
.
AI Security Evaluation Hackathon: Measuring AI Capability
. Placed
9
.
Deceptive behavior does not seem to be reducible to a single vector
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Can Language Models Sandbag Manipulation?
June 30, 2024
.
Deception Detection Hackathon: Preventing AI deception
. Placed
9
.
Academic Weapon
July 28, 2024
.
Research Augmentation Hackathon: Supercharging AI Alignment
. Placed
9
.
Reflections on using LLMs to read a paper
July 28, 2024
.
Research Augmentation Hackathon: Supercharging AI Alignment
. Placed
9
.
AI Alignment Toolkit Research Assistant
July 28, 2024
.
Research Augmentation Hackathon: Supercharging AI Alignment
. Placed
9
.
Data Massager
July 28, 2024
.
Research Augmentation Hackathon: Supercharging AI Alignment
. Placed
9
.
Alignment Research Critiquer
July 28, 2024
.
Research Augmentation Hackathon: Supercharging AI Alignment
. Placed
9
.
Alignment Research Critiquer
July 29, 2024
.
Research Augmentation Hackathon: Supercharging AI Alignment
. Placed
9
.
Web App for Interacting with Refusal-Ablated Language Model Agents
August 25, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
VerifyStream
August 25, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
General Pervasiveness
August 27, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
AdGPT
October 5, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
Sleeper Agents Detector
August 25, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
AI Research Paper Processor
August 25, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
Unsolved AI Safety Concepts Explorer
August 25, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
BBC News Impersonator
August 25, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
RedFluence
August 25, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
AI Agents for Personalized Interaction and Behavioral Analysis
August 25, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
GrandSlam usecases not technology
August 25, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
Misinformational AI-Generated Academic Papers
August 26, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
Demonstrating LLM Code Injection Via Compromised Agent Tool
August 27, 2024
.
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
. Placed
9
.
Jailbreaking general purpose robots
September 1, 2024
.
Hackathon for Technical AI Safety Startups
. Placed
9
.
GuardianAI
September 3, 2024
.
Hackathon for Technical AI Safety Startups
. Placed
9
.
ÆLIGN: Aligned Agent-based Workflows via Collaboration & Safety Protocols
September 1, 2024
.
Hackathon for Technical AI Safety Startups
. Placed
9
.
Amplified Wise Simulations for Safe Training and Deployment
September 4, 2024
.
Hackathon for Technical AI Safety Startups
. Placed
9
.
CAMARA: A Comprehensive & Adaptive Multi-Agent framework for Red-Teaming and Adversarial Defense
September 1, 2024
.
Hackathon for Technical AI Safety Startups
. Placed
9
.
WELMA: Open-world environments for Language Model agents
September 1, 2024
.
Hackathon for Technical AI Safety Startups
. Placed
9
.
Devising Effective Benchmarks
September 1, 2024
.
Hackathon for Technical AI Safety Startups
. Placed
9
.
Latent Space Clustering and Summarization
September 17, 2024
.
ARENA 4.0 Interpretability Hackathon
. Placed
9
.
minTranscoders
September 17, 2024
.
ARENA 4.0 Interpretability Hackathon
. Placed
9
.
nnsight transparent debugging
September 17, 2024
.
ARENA 4.0 Interpretability Hackathon
. Placed
9
.
Finding Circular Features in Gemma 2 2B
September 17, 2024
.
ARENA 4.0 Interpretability Hackathon
. Placed
9
.
Interpreting a toy model for finding the maximum element in a list
September 17, 2024
.
ARENA 4.0 Interpretability Hackathon
. Placed
9
.
LLM Agent Security: Jailbreaking Vulnerabilities and Mitigation Strategies
October 6, 2024
.
Agent Security Hackathon
. Placed
9
.
Using ARC-AGI puzzles as CAPTCHA tasks
October 6, 2024
.
Agent Security Hackathon
. Placed
9
.
An Autonomous Agent for Model Attribution
October 6, 2024
.
Agent Security Hackathon
. Placed
9
.
AI Agent Capabilities Evolution
October 6, 2024
.
Agent Security Hackathon
. Placed
9
.
AI Honeypot
October 6, 2024
.
Agent Security Hackathon
. Placed
9
.
Intent Inspector - Protecting Against Prompt Injections for Agent Tool Misuse
October 6, 2024
.
Agent Security Hackathon
. Placed
9
.
Inference-Time Agent Security
October 6, 2024
.
Agent Security Hackathon
. Placed
9
.
Cross-model surveillance for email handling
October 7, 2024
.
Agent Security Hackathon
. Placed
9
.
Pan, your SMART Sustainability Expert
October 29, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
.
Infectious Disease Outbreak Prediction and Dashboard
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Proposal for a Provisional FDA Designation Targeting Biomedical Products Evaluated with Novel Methodologies
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Proposal for U.S.-China Technical Cooperation on AI Safety
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Predictive Analytics & Imagery for Environmental Monitoring
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Policy Framework for Sustainable AI: Repurposing Waste Heat from Data Centers in the USA
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Politicians on AI Safety
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Reprocessing Nuclear Waste From Small Modular Reactors (SMRs)
October 29, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
.
Enhancing Human Verification Systems to Address AI Agent Circumvention and Attributability Concerns
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Enviro - A Comprehensive Environmental Solution Using Policy and Technology
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Towards a Unified Framework for Cybersecurity and AI Safety: Recommendations for Secure Development of Large Language Models
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
EcoNavix
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Mapping Intent: Documenting Policy Adherence with Ontology Extraction
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Hero Journey: Personalized Health Interventions for the Incarcerated
October 29, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
AI and Public Health: TSA Pre Health Check
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
AI ADVISORY COUNCIL FOR SUSTAINABLE ECONOMIC GROWTH AND ETHICAL INNOVATION IN THE DOMINICAN REPUBLIC (CANIA)
October 29, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
5
.
Glia
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Next-Gen AI-Enhanced Epidemic Intelligence
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
mHealth AI
October 28, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
AI Parliament
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Policy Analysis: AI and Sustainability: Climate Impact Monitoring
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Digital Rebellion: Analyzing misaligned AI agent cooperation for virtual labor strikes
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
applai
October 27, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
A Fundamental Rethinking to AI Evaluations: Establishing a Constitution-Based Framework
November 20, 2024
.
Howard University AI Safety Summit & Policy Hackathon
. Placed
9
.
Glia for Healthcare Organisations
November 20, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Grandfather Paradox in AI – Bias Mitigation & Ethical AI
November 20, 2024
.
Howard University AI Safety Summit & Policy Hackathon
. Placed
9
.
AI Monitoring as a Rapid and Scalable Policy Solution: Weekly Global Bulletins on AI Developments
November 20, 2024
.
Howard University AI Safety Summit & Policy Hackathon
. Placed
9
.
A Critical Review of "Chips for Peace": Lessons from "Atoms for Peace"
November 20, 2024
.
Howard University AI Safety Summit & Policy Hackathon
. Placed
9
.
Implementing a Human-centered AI Assessment Framework (HAAF) for Equitable AI Development
November 20, 2024
.
Howard University AI Safety Summit & Policy Hackathon
. Placed
9
.
National Data Privacy and Governance Act
November 20, 2024
.
Howard University AI Safety Summit & Policy Hackathon
. Placed
9
.
Community-First: A Rights-Based Framework for AI Governance in India's Welfare Systems
November 20, 2024
.
Howard University AI Safety Summit & Policy Hackathon
. Placed
9
.
User Transparency Within AI
November 20, 2024
.
Howard University AI Safety Summit & Policy Hackathon
. Placed
9
.
Encouraging Chain-of-Thought Reasoning
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Recovering Goodfire's SAE feature vectors from their API
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Feature based unlearning
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Auto Prompt Injection
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Feature Tuning versus Prompting for Ambiguous Questions
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Let LLM Agents Perform LLM Surgery
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Investigating Feature Effects on Manipulation Susceptibility
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
BBLLM
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Clear Thought and Clear Speech: Reducing Grammatical Scope Ambiguity
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Tentative proposal for AI control with weak supervisors through Mechanistic Inspection
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Investigate arithmetic features in Multi-lingual LLMs
November 25, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Explaining Latents in Turing-LLM-1.0-254M with Pre-Defined Function Types
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Edufire - Personalized Education Platform Using LLM Steering
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Math Speaks All Languages: Enhancing LLM Problem-Solving Across Multilingual Contexts
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Assessing Language Model Cybersecurity Capabilities with Feature Steering
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Sparse Autoencoders and Gemma 2-2B: Pioneering Demographic-Sensitive Language Modeling for Opinion QA
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Improving Llama-3-8b Hallucination Robustness in Medical Q&A Using Feature Steering
November 24, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
9
.
Can we steer a model’s behavior with just one prompt? Investigating SAE-driven auto-steering
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Unveiling Latent Beliefs Using Sparse Autoencoders
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Improving Llama-3-8B-Instruct Hallucination Robustness in Medical Q&A Using Feature Steering
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Analyzing Dataset Bias with SAEs
November 25, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Faithful or Factual? Tuning Mistake Acknowledgment in LLMs
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
SAGE: Safe, Adaptive Generation Engine for Long Form Document Generation in Collaborative, High Stakes Domains
November 24, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Bias Mitigation in LLM by Steering Features
November 25, 2024
.
Reprogramming AI Models Hackathon
. Placed
9
.
Modernizing DC’s Emergency Communications
November 26, 2024
.
AI Policy Hackathon at Johns Hopkins University
. Placed
6
.