Nov 25, 2024

Bias Mitigation in LLM by Steering Features

Akanksha Devkar

To ensure a safe and unbiased path to AGI, we must calibrate the biases in our LLMs. With this goal in mind, I tested the Goodfire SDK and its steering features for mitigating bias during the recently held Apart Research x Goodfire-led hackathon on ‘Reprogramming AI Models’.

Cite this work:

@misc{
  title={Bias Mitigation in LLM by Steering Features},
  author={Akanksha Devkar},
  date={11/25/24},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.