This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the research sprint on November 27, 2023

Towards High-Quality Model-Written Evaluations

We aimed to improve the generation of model-written evaluations for LLMs by building on Evol-Instruct, a method that uses LLMs to create complex instructions. We retargeted Evol-Instruct to generate high-quality model evaluations instead, focusing on evaluations for situational awareness. We then compared these evaluations with those produced by the model-written evaluations approach through few-shot generation. Contrary to our expectations, we observed a consistent decrease in evaluation quality, indicating that our method did not improve the quality of model-generated evaluations.

Jannes Elstner, Jaime Raldua Veuthey
