This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Safety Benchmarks
Accepted at the Safety Benchmarks research sprint on July 3, 2023

Manipulative Expression Recognition (MER) and LLM Manipulativeness Benchmark

A software library for analysing a transcript of a conversation or a single message. The library annotates relevant parts of the text with labels for the manipulative communication styles detected in that conversation or message. One main use case is evaluating the presence of manipulation in responses or conversations generated by large language models. The other main use case is evaluating conversations and responses written by humans. The software does not do fact checking; it focuses on labelling the psychological style of the expressions present in the input text.
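To make the idea of span-level annotation concrete, here is a minimal sketch in Python. It is not the library's actual API or detection method: the `Annotation` type, the `annotate` function, the label names, and the cue phrases are all hypothetical, and simple phrase matching stands in for whatever detection the library performs. It only illustrates what "annotating relevant parts of the text with labels" could look like as data.

```python
from dataclasses import dataclass

# Hypothetical label set and cue phrases, chosen only for illustration.
# They are NOT taken from the library described above.
CUE_PHRASES = {
    "guilt-tripping": ["after all i have done"],
    "minimizing": ["you are overreacting"],
}


@dataclass
class Annotation:
    start: int  # character offset where the labelled span begins
    end: int    # character offset just past the end of the span
    label: str  # name of the communication style assigned to the span
    text: str   # the exact span as it appears in the input


def annotate(message: str) -> list[Annotation]:
    """Return labelled spans found by case-insensitive phrase matching.

    Assumes lowercasing preserves string length (true for ASCII input),
    so offsets in the lowered copy are valid in the original message.
    """
    lowered = message.lower()
    found = []
    for label, phrases in CUE_PHRASES.items():
        for phrase in phrases:
            pos = lowered.find(phrase)
            while pos != -1:
                end = pos + len(phrase)
                found.append(Annotation(pos, end, label, message[pos:end]))
                pos = lowered.find(phrase, end)
    # Report spans in the order they occur in the text.
    return sorted(found, key=lambda a: a.start)


if __name__ == "__main__":
    msg = "You are overreacting. After all I have done for you!"
    for a in annotate(msg):
        print(f"[{a.start}:{a.end}] {a.label}: {a.text!r}")
```

Keeping character offsets alongside each label means the annotations can be overlaid on the original transcript without altering it. Note that, like the software described above, this sketch labels style only and makes no judgement about whether the statements are true.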

Roland Pihlakas