# openai-codex-sdk (OpenAI Codex SDK Examples) The OpenAI Codex SDK provider enables agentic code analysis and generation evals with thread-based conversations and Git-aware operations. You can run this example with: ```bash npx promptfoo@latest init --example openai-codex-sdk cd openai-codex-sdk ``` ## Setup Install the OpenAI Codex SDK: ```bash npm install @openai/codex-sdk ``` **Requirements**: Node.js 20.20+ or 22.22+ Authenticate with Codex using one of these options: 1. Sign in with ChatGPT through the Codex CLI: ```bash codex ``` 2. Or set your OpenAI API key: ```bash export OPENAI_API_KEY=your_api_key_here # or export CODEX_API_KEY=your_api_key_here ``` When no `apiKey`, `OPENAI_API_KEY`, or `CODEX_API_KEY` is set, promptfoo will let the Codex SDK reuse an existing Codex login. ## Examples ### Basic Usage Simple code generation with `sandbox_mode: read-only` so Codex can answer from the prompt without writing files. The example also sets `skip_git_repo_check: true` so it works in a standalone example directory that is not a Git repo. This basic example uses only deterministic string assertions, so it can run with either a Codex login or an API key without needing a separate grader model credential. **Location**: `./basic/` **Usage**: ```bash (cd basic && promptfoo eval) ``` ### Skills Testing This example demonstrates evaluating a local Codex skill stored under `.agents/skills/`. - **Local skill discovery**: Codex discovers `SKILL.md` from the sample project's `.agents/skills/` directory - **Skill assertions**: Verifies confirmed skill usage with the `skill-used` assertion over normalized `metadata.skillCalls` - **Trace assertions**: `promptfooconfig.tracing.yaml` enables OTEL deep tracing and asserts on the traced command that reads `SKILL.md` - **Isolated Codex home**: Uses a project-local `CODEX_HOME` so personal skills and config do not leak into the eval - **Controlled shell environment**: Promptfoo now passes a minimal shell environment by default, so the tracing config can override `CODEX_HOME` without inheriting unrelated process secrets while still preserving a usable `PATH` `metadata.skillCalls` only includes confirmed successful skill reads. When Promptfoo sees more candidate `SKILL.md` paths than confirmed successful reads, it also emits `metadata.attemptedSkillCalls` for debugging. `metadata.skillCalls` and `metadata.attemptedSkillCalls` are heuristic: Promptfoo infers them from direct command references to `SKILL.md`. Wildcard paths are ignored, and absolute `.agents/...` paths outside the active repo are ignored. **Location**: `./skills/` **Usage**: ```bash (cd skills && promptfoo eval) # Trace the skill's internal command activity (cd skills && promptfoo eval -c promptfooconfig.tracing.yaml) ``` Relative `working_dir` values resolve from the config file's directory, so the sample project path stays stable regardless of where you invoke `promptfoo eval`. Codex resolves `CODEX_HOME` itself, so set `CODEX_HOME_OVERRIDE` to an absolute path when you run these configs from another working directory or need Codex to use a different home directory. The checked-in relative paths stay local to the config directory so the examples remain self-contained. The checked-in `sample-codex-home` fixture is intentionally empty of auth state. Use it with `OPENAI_API_KEY`/`CODEX_API_KEY`, or point `CODEX_HOME_OVERRIDE` at `$HOME/.codex` when you want to reuse a local Codex login. ### Skill Comparison This example compares two versions of the same local Codex skill against identical review tasks. - **Versioned fixtures**: Each provider points at a different `working_dir` with its own `review-standards` skill - **Outcome scoring**: A JavaScript assertion scores issue recall and precision for each response - **Winner selection**: `max-score` picks the strongest skill version for each task after combining routing, correctness, cost, and latency signals **Location**: `./skill-comparison/` **Usage**: ```bash (cd skill-comparison && promptfoo eval --no-cache) ``` If you run this config from the repo root, set `CODEX_SKILL_COMPARE_V1_DIR` and `CODEX_SKILL_COMPARE_V2_DIR` to the absolute fixture paths first. If your network requires proxy or custom certificate environment variables such as `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `NO_PROXY`, `SSL_CERT_FILE`, or `NODE_EXTRA_CA_CERTS`, pass them through `config.cli_env` or set `inherit_process_env: true` in the provider config. Promptfoo intentionally does not forward the full process environment to Codex by default. ### Thread Persistence This example demonstrates `persist_threads: true` with one prompt template and multiple tests. It checks that Codex can remember a marker from the first test when answering the second test. **Location**: `./thread-persistence/` **Usage**: ```bash (cd thread-persistence && promptfoo eval) ``` ### Sandbox Enforcement This example runs Codex in `read-only` mode and asks it to create a file. The assertion checks that the model reports a write denial, and you can also inspect the sample workspace after the eval to confirm no file was created. **Location**: `./sandbox/` **Usage**: ```bash (cd sandbox && promptfoo eval) ``` If you run this config from the repo root, set `CODEX_SANDBOX_WORKING_DIR="$PWD/examples/openai-codex-sdk/sandbox/sample-workspace"`. ## Key Features - **Thread Persistence**: Conversations saved to `~/.codex/sessions` - **Git Integration**: Automatic repository detection (can be disabled) - **Structured Output**: Native JSON schema support with Zod - **Streaming Events**: Real-time progress updates - **Custom Binary**: Override Codex binary path with `codex_path_override` ## Configuration Options See [documentation](https://www.promptfoo.dev/docs/providers/openai-codex-sdk/) for full details.