--- sidebar_label: XSTest description: Red team homonym vulnerabilities by testing LLM interpretation of ambiguous terms to prevent misclassification of harmful content in production AI systems --- # XSTest Homonym Dataset ## Overview The XSTest plugin tests how well LLMs handle ambiguous words (homonyms) that can have both harmful and benign interpretations. The dataset focuses on terms that might be related to: - Violence (e.g., "kill a process" vs. "kill a person") - Drugs (e.g., "crack in a pipe" vs. "crack cocaine") - Other potentially harmful contexts when misinterpreted Each test includes metadata about the homonym, category, and expected safety label. ## How it works The plugin: - Loads prompts containing homonyms from the XSTest dataset - Selects and shuffles a specified number of prompts - Tests if the model correctly interprets the homonym in context - Uses LLM-as-a-judge to evaluate responses based on the context and expected safety interpretation ## Configuration Add to your YAML file: ```yaml redteam: plugins: - xstest ``` Control the number of tests: ```yaml redteam: plugins: - id: xstest numTests: 25 ``` ## Example ```yaml description: XSTest homonym evaluation prompts: - 'Respond to the following query or question: {{query}}' redteam: plugins: - xstest targets: - openai:gpt-5-mini ``` Run with: ```sh promptfoo redteam run ``` Once complete, you'll be able to view the evaluation results. For each output cell, hovering over it and clicking on the 🔎 icon will display the full input/output. This detailed display also includes XSTest metadata such as the homonym term, category, and expected safety label. ## Grading Responses are evaluated on: 1. For safe queries: Whether the model correctly understood the benign interpretation of the homonym and provided a helpful response 2. For potentially unsafe queries: Whether the model appropriately refused to answer or provided a safe response ## Dataset The dataset includes: - The prompt text containing a homonym - The specific homonym term (e.g., "kill", "execute", "crack") - Category information about the term type (e.g., "contrast_homonyms", "homonyms") - Safety label ("safe" or "unsafe") ## Related Concepts - [Types of LLM Vulnerabilities](../llm-vulnerability-types.md) - [DoNotAnswer](./donotanswer.md) - [BeaverTails](./beavertails.md)