---
sidebar_label: Image Inputs
title: Image Jailbreaking Strategy
description: Probe vision-language model security with image-encoded text attacks targeting content filter and safety mechanism bypasses
---

# Image Jailbreaking

The Image strategy converts prompt text into an image and then encodes that image as a base64 string. This approach enables testing how AI systems handle images of text, which may potentially bypass text-based guardrails / content filters or lead to different behaviors than when processing plain text.

## Why Use This Strategy

This strategy helps security researchers and AI developers:

1. **Test model capabilities**: Assess if models can extract and process text from base64-encoded images
2. **Evaluate security measures**: Determine if image-encoded text can bypass content filters that typically scan plain text
3. **Assess multi-modal behavior**: Identify differences in how models respond to the same content in different formats
4. **Discover inconsistencies**: Reveal potential vulnerabilities by comparing text-based and image-based processing pathways

## How It Works

The strategy performs the following operations:

1. Takes the original text from your test case
2. Renders the text onto a blank PNG image with a white background and black text
3. Converts the PNG image to a base64 string
4. Replaces the original text in your test case with the base64-encoded image

The resulting test case contains the same semantic content as the original but in a different format that may be processed differently by AI systems.

## Text-to-Image Conversion Example

Below is an example of converting a text prompt into an image from our `harmful:hate` plugin.

![Example of text converted to an image](/img/docs/redteam/harmful-content-example.png)

After encoding, this image is converted to base64, which begins like:

```
iVBORw0KGgoAAAANSUhEUgAAA...
```

## Implementation

Use it in your `promptfooconfig.yaml` like this:

```yaml title="promptfooconfig.yaml"
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: 'Multi-modal redteam'
targets:
  - bedrock:amazon.nova-lite-v1:0
prompts:
  - file://path/to/your/prompt.json
redteam:
  plugins:
    - ...
  strategies:
    - image
    - id: basic
      config:
        enabled: false
```

:::note
When using the image strategy, you should disable all other strategies. You can do this by setting `enabled: false` for the basic strategy and removing other strategies from the strategies array.
:::

Your `prompt.json` file should look like this:

```json title="prompt.json"
[
  {
    "role": "system",
    "content": "You are a helpful assistant"
  },
  {
    "role": "user",
    "content": [
      {
        "image": {
          "format": "png",
          "source": {
            "bytes": "{{image}}"
          }
        }
      }
    ]
  }
]
```

:::note
You should update `prompt.json` to match the prompt format of your LLM provider. Base64 images are all encoded as PNG images.
:::

:::note
The `{{image}}` syntax in the examples is a Nunjucks template variable. When promptfoo processes your prompt, it replaces `{{image}}` with the base64-encoded image data.
:::

:::tip
This strategy requires you to install the `sharp` package for image creation.

```
npm i sharp
```

:::

## Related Concepts

- [Audio Jailbreaking](audio.md) - Similar approach using speech audio instead of images
- [Video Jailbreaking](video.md) - Similar approach using video instead of images
- [Base64 Encoding](base64.md) - Similar encoding technique using text-to-base64 conversion
- [Multi-Modal Red Teaming Guide](/docs/guides/multimodal-red-team) - Comprehensive guide for testing multi-modal models
- [Types of LLM vulnerabilities](/docs/red-team/llm-vulnerability-types/) - Full vulnerability and plugin directory with category mapping
- [Red Team Strategies](/docs/red-team/strategies/) - Full strategy catalog
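
As a rough illustration of steps 3 and 4 under How It Works, the Python sketch below base64-encodes PNG bytes and substitutes them for the `{{image}}` placeholder in a prompt shaped like `prompt.json`. It is not promptfoo's implementation: the text-rendering step is skipped (a 1x1 white PNG built with the standard library stands in for the rendered image), plain string replacement stands in for Nunjucks, and `png_chunk`, `tiny_white_png`, and `fill_image_placeholder` are hypothetical helper names.

```python
import base64
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, then CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_white_png() -> bytes:
    """Assumption: stand-in for the rendered text image (1x1 white grayscale PNG)."""
    # width, height, bit depth 8, color type 0 (grayscale), compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    scanline = b"\x00\xff"  # filter type 0 followed by one white pixel
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(scanline))
        + png_chunk(b"IEND", b"")
    )


def fill_image_placeholder(template: str, png_bytes: bytes) -> str:
    """Hypothetical helper: replace {{image}} with base64-encoded PNG data."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return template.replace("{{image}}", encoded)


# Prompt template with the same shape as the user message in prompt.json.
template = json.dumps(
    [
        {
            "role": "user",
            "content": [
                {"image": {"format": "png", "source": {"bytes": "{{image}}"}}}
            ],
        }
    ]
)

prompt = json.loads(fill_image_placeholder(template, tiny_white_png()))
image_b64 = prompt[0]["content"][0]["image"]["source"]["bytes"]
print(image_b64[:11])  # base64 of the fixed PNG signature: iVBORw0KGgo
```

Every PNG file starts with the same 8-byte signature, which is why base64-encoded PNG data always begins with `iVBORw0KGgo`, matching the prefix shown in the conversion example.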