# policy evals

This suite evaluates the `PolicyPlugin` test generator itself. It compares five native `promptfoo redteam generate` cases:

- normal single-input generation
- policy text with explicit test-generation instructions
- modifier-driven generation (Spanish output)
- multi-input generation with coordinated `document` / `query` attacks
- log-analysis generation with `PromptBlock:` output

The eval flow is:

- run `promptfoo redteam generate` against each case config under `cases/`
- normalize the generated YAML into a stable JSON payload
- feed that payload into Promptfoo assertions and `llm-rubric` checks through an executable prompt (sketched at the end of this README)

That keeps the suite on Promptfoo's real CLI generation path instead of using a custom harness provider.

## Prerequisites

- `OPENAI_API_KEY` available in your environment or in `.env`

## Run

From the repository root:

```bash
npm run local -- validate -c src/redteam/plugins/policy/evals/promptfooconfig.yaml
npm run local -- eval -c src/redteam/plugins/policy/evals/promptfooconfig.yaml --env-file .env --no-cache
```

To generate any single comparison case directly:

```bash
npm run local -- redteam generate -c src/redteam/plugins/policy/evals/cases/normal-single-input.yaml -o /tmp/policy-normal.yaml --force
```

## Files

- `promptfooconfig.yaml` - eval suite
- `generatePolicyEvalPrompt.cjs` - executable prompt that runs `redteam generate` for one case and emits normalized JSON
- `cases/*.yaml` - native redteam generation configs being compared
- `tests/policy-generation.yaml` - case metadata and Promptfoo assertions
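
For orientation, here is a minimal sketch of how an executable prompt along the lines of `generatePolicyEvalPrompt.cjs` could work. This is an illustration, not the actual file: the `vars.caseFile` variable name, the exact normalization, and the shape of the generated YAML are assumptions.

```js
// Illustrative sketch only -- NOT the actual contents of
// generatePolicyEvalPrompt.cjs. Assumes `vars.caseFile` names one
// config under cases/ and that js-yaml is available in the repo.
const { execFileSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');
const yaml = require('js-yaml');

module.exports = async function ({ vars }) {
  const outFile = path.join(os.tmpdir(), `policy-eval-${Date.now()}.yaml`);

  // Run the real CLI generation path for this case (same command as
  // the README's "Run" section; assumes cwd is the repository root).
  execFileSync('npm', [
    'run', 'local', '--',
    'redteam', 'generate',
    '-c', vars.caseFile,
    '-o', outFile,
    '--force',
  ]);

  // Normalize the generated YAML into a stable JSON payload so the
  // downstream assertions see a predictable shape.
  const generated = yaml.load(fs.readFileSync(outFile, 'utf8'));
  const tests = (generated.tests ?? []).map((t) => ({
    vars: t.vars,
    metadata: t.metadata,
  }));
  return JSON.stringify({ case: vars.caseFile, tests }, null, 2);
};
```

Presumably a prompt like this is paired with a pass-through provider (for example promptfoo's `echo` provider) so the normalized payload reaches the assertions in `tests/policy-generation.yaml` unchanged.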