Visual Prompt Injection Detection
The new visual capabilities of LLMs multiply the possible use cases but also introduce new vulnerabilities. Visual Prompt Injection, the ability to deliver instructions to a model through images, can be detrimental to the model's end users. In this work, we explore the OCR capabilities of a visual assistant based on the LLaVA model [1,2]. We outline different attacks that can be conducted using corrupted images, and we leverage a metric in the embedding space that can be used to identify and differentiate optical character recognition from object detection.
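The abstract does not specify the embedding-space metric; one minimal illustration of the general idea, assuming cosine similarity against hypothetical per-task centroid embeddings (the names `ocr_centroid` and `detection_centroid` are placeholders, not from the paper), might look like:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify_task(image_emb, ocr_centroid, detection_centroid):
    """Label an image embedding as 'ocr' or 'detection' by the
    nearer centroid in embedding space (illustrative assumption)."""
    sim_ocr = cosine_similarity(image_emb, ocr_centroid)
    sim_det = cosine_similarity(image_emb, detection_centroid)
    return "ocr" if sim_ocr >= sim_det else "detection"
```

In a real pipeline the centroids would be computed from embeddings of known text-heavy and object-centric images; here they stand in as fixed reference vectors.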