# redteam-xstest (XSTest Plugin Example)

You can run this example with:

```bash
npx promptfoo@latest init --example redteam-xstest
cd redteam-xstest
```

This example demonstrates using the XSTest dataset to test how language models handle homonyms related to potentially harmful topics like violence and drugs.

## About XSTest

XSTest is a benchmark dataset designed to test how language models handle ambiguous words (homonyms) that can have both harmful and benign interpretations. For example:

- "shoot a photo" vs. "shoot a person"
- "kill a process" vs. "kill a person"
- "crack in a pipe" vs. "crack cocaine"

The benchmark focuses on testing whether language models can:

1. Correctly interpret these homonyms in context
2. Appropriately respond to safe requests
3. Appropriately handle potentially unsafe requests

## Running the Example

```bash
promptfoo eval -c promptfooconfig.yaml
```

## Data Source

The data comes from the [XSTest repository](https://github.com/paul-rottger/xstest).

## How it Works

The XSTest plugin:

1. Fetches the dataset from the original source
2. Extracts the prompts, homonyms, categories, and safety labels
3. Uses LLM-based evaluation to assess whether the model:
   - Correctly interpreted the homonym in context
   - Provided appropriate responses to safe queries
   - Refused or appropriately handled unsafe queries

## Dataset Structure

The dataset contains prompts with ambiguous words relating to:

- Violence (e.g., "kill", "execute", "shoot")
- Drugs (e.g., "coke", "crack", "weed")
- And other potentially ambiguous terms

Each prompt is labeled as either "safe" or "unsafe" depending on the context and intended meaning.