# openai-codex-sdk/skill-comparison (Codex Skill Comparison)

You can run this example with:

```bash
npx promptfoo@latest init --example openai-codex-sdk/skill-comparison
cd openai-codex-sdk/skill-comparison
```

## Overview

This example compares two versions of the same Codex skill against identical review tasks.

- `fixtures/v1` contains a narrower `review-standards` skill that only calls out weak password hashing.
- `fixtures/v2` contains a stronger version that also checks timing-safe secret comparison.
- Both providers share an `output_schema` (declared once via a YAML anchor) so each response is guaranteed to match the review JSON shape.
- The eval verifies `skill-used`, scores issue recall and precision, and uses `max-score` to select the best output for each test case.
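To make the recall and precision scoring concrete, here is a minimal sketch of how a review's reported issues might be compared against the expected ones. The function name `score_review` and the issue labels are hypothetical and are not taken from this example's config; the actual assertions may compute these scores differently.

```python
def score_review(reported: set[str], expected: set[str]) -> dict[str, float]:
    """Return recall and precision for a set of reported issue labels."""
    true_positives = len(reported & expected)
    # Recall: share of expected issues that the review actually found.
    recall = true_positives / len(expected) if expected else 1.0
    # Precision: share of reported issues that were really expected.
    # An empty report is scored 0.0 here; that convention is an assumption.
    precision = true_positives / len(reported) if reported else 0.0
    return {"recall": recall, "precision": precision}


# Hypothetical labels for the two issues the fixtures describe.
expected = {"weak-password-hashing", "timing-unsafe-comparison"}

# A v1-style review only flags weak password hashing.
v1_scores = score_review({"weak-password-hashing"}, expected)

# A v2-style review also flags the timing-unsafe comparison.
v2_scores = score_review(
    {"weak-password-hashing", "timing-unsafe-comparison"}, expected
)
```

Under these assumptions the narrower review stays precise but loses recall, which is the gap the comparison is meant to surface.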

Run it from this directory with:

```bash
promptfoo eval --no-cache
```

If you run it from another directory, set these environment variables first:

```bash
export CODEX_SKILL_COMPARE_V1_DIR=/absolute/path/to/fixtures/v1
export CODEX_SKILL_COMPARE_V2_DIR=/absolute/path/to/fixtures/v2
```

The checked-in `sample-codex-home` directory intentionally contains no auth state. To authenticate, use an API key, or set `CODEX_HOME_OVERRIDE="$HOME/.codex"` to reuse a local Codex login.

Because this example uses `max-score`, only the winning output passes each test case. The weaker candidate is therefore expected to fail whenever Promptfoo marks the stronger output as the winner.
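The pass/fail pattern can be illustrated with a simplified sketch of winner selection. This is not Promptfoo's implementation: the function `select_winner` and the score values are hypothetical, and ties are resolved by simply keeping the first candidate.

```python
def select_winner(scores: dict[str, float]) -> dict[str, bool]:
    """Mark the highest-scoring candidate as passing and all others as failing."""
    # max() returns the first key with the highest score, so ties go to
    # whichever candidate was listed first (a simplification).
    winner = max(scores, key=lambda name: scores[name])
    return {name: name == winner for name in scores}


# Hypothetical aggregate scores for one test case.
results = select_winner({"v1": 0.75, "v2": 1.0})
```

In this sketch `results["v1"]` is `False` even though the v1 output scored reasonably well, which mirrors why a failure for the weaker candidate is expected rather than a sign of a broken run.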