# openai-codex-sdk (OpenAI Codex SDK Examples)

The OpenAI Codex SDK provider enables agentic code analysis and generation evals with thread-based conversations and Git-aware operations.

You can run this example with:

```bash
npx promptfoo@latest init --example openai-codex-sdk
cd openai-codex-sdk
```

## Setup

Install the OpenAI Codex SDK:

```bash
npm install @openai/codex-sdk
```

**Requirements**: Node.js 20.20+ or 22.22+
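To check the requirement above before installing, a small helper along these lines may be useful. It is a sketch: `node_version_ok` is a hypothetical name, and it assumes that "20.20+ or 22.22+" means 20.x from 20.20 onward, 22.x from 22.22 onward, and any major above 22. The README does not say how 21.x or newer majors are treated, so adjust the last branch if that assumption is wrong.

```bash
# Return 0 when a Node.js version string satisfies "20.20+ or 22.22+".
# Assumption: 21.x is rejected and majors above 22 are accepted.
node_version_ok() {
  local version="${1#v}"        # strip the leading "v" printed by `node --version`
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"    # text after the first dot
  local minor="${rest%%.*}"     # text before the next dot

  if [ "$major" -eq 20 ]; then
    [ "$minor" -ge 20 ]
  elif [ "$major" -eq 22 ]; then
    [ "$minor" -ge 22 ]
  else
    [ "$major" -gt 22 ]
  fi
}

# Example:
#   node_version_ok "$(node --version)" || echo "Upgrade Node.js first" >&2
```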

Authenticate with Codex using one of these options:

1. Sign in with ChatGPT through the Codex CLI:

   ```bash
   codex
   ```

2. Or set your OpenAI API key:

   ```bash
   export OPENAI_API_KEY=your_api_key_here
   # or
   export CODEX_API_KEY=your_api_key_here
   ```

When no `apiKey`, `OPENAI_API_KEY`, or `CODEX_API_KEY` is set, Promptfoo lets the Codex SDK reuse an existing Codex login.
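If you are unsure which path your shell will take, a quick check like the following can help. This is only a sketch: `codex_auth_mode` is a hypothetical helper, it looks at the two environment variables alone, and it cannot see an `apiKey` set in the provider config.

```bash
# Report which authentication path the current shell environment implies.
# Note: an `apiKey` set in the provider config is not visible from here.
codex_auth_mode() {
  if [ -n "${OPENAI_API_KEY:-}" ] || [ -n "${CODEX_API_KEY:-}" ]; then
    echo "api-key"
  else
    echo "codex-login"
  fi
}

# Example:
#   echo "Codex will authenticate via: $(codex_auth_mode)"
```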

## Examples

### Basic Usage

A simple code generation eval that sets `sandbox_mode: read-only` so Codex answers from the prompt without writing files. The example also sets `skip_git_repo_check: true` so it works in a standalone example directory that is not a Git repo.

This basic example uses only deterministic string assertions, so it can run with either a Codex login or an API key without needing a separate grader model credential.

**Location**: `./basic/`

**Usage**:

```bash
(cd basic && promptfoo eval)
```

### Skills Testing

This example demonstrates evaluating a local Codex skill stored under `.agents/skills/`.

- **Local skill discovery**: Codex discovers `SKILL.md` from the sample project's `.agents/skills/` directory
- **Skill assertions**: Verifies confirmed skill usage with the `skill-used` assertion over normalized `metadata.skillCalls`
- **Trace assertions**: `promptfooconfig.tracing.yaml` enables OTEL deep tracing and asserts on the traced command that reads `SKILL.md`
- **Isolated Codex home**: Uses a project-local `CODEX_HOME` so personal skills and config do not leak into the eval
- **Controlled shell environment**: Promptfoo now passes a minimal shell environment by default, so the tracing config can override `CODEX_HOME` without inheriting unrelated process secrets while still preserving a usable `PATH`

`metadata.skillCalls` only includes confirmed successful skill reads. When Promptfoo sees more candidate `SKILL.md` paths than confirmed successful reads, it also emits `metadata.attemptedSkillCalls` for debugging.

`metadata.skillCalls` and `metadata.attemptedSkillCalls` are heuristic: Promptfoo infers them from direct command references to `SKILL.md`. Wildcard paths are ignored, and absolute `.agents/...` paths outside the active repo are ignored.

**Location**: `./skills/`

**Usage**:

```bash
(cd skills && promptfoo eval)

# Trace the skill's internal command activity
(cd skills && promptfoo eval -c promptfooconfig.tracing.yaml)
```

Relative `working_dir` values resolve from the config file's directory, so the sample project path stays stable regardless of where you invoke `promptfoo eval`. Codex resolves `CODEX_HOME` itself, so set `CODEX_HOME_OVERRIDE` to an absolute path when you run these configs from another working directory or need Codex to use a different home directory.
The checked-in relative paths stay local to the config directory so the examples remain self-contained.

The checked-in `sample-codex-home` fixture is intentionally empty of auth state. Use it with `OPENAI_API_KEY`/`CODEX_API_KEY`, or point `CODEX_HOME_OVERRIDE` at `$HOME/.codex` when you want to reuse a local Codex login.
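Because `CODEX_HOME_OVERRIDE` should be an absolute path, it can help to resolve the directory once before running the eval. The helper below is a minimal sketch; `abs_dir` is a hypothetical name, and the directory must already exist.

```bash
# Print the absolute path of an existing directory.
# Output of `cd` is discarded so a CDPATH match cannot pollute the result.
abs_dir() {
  (cd -- "$1" >/dev/null && pwd)
}

# Example: reuse a local Codex login instead of the empty checked-in fixture.
#   export CODEX_HOME_OVERRIDE="$(abs_dir "$HOME/.codex")"
#   (cd skills && promptfoo eval)
```

The same helper works for any other environment variable in these examples that expects an absolute directory.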

### Skill Comparison

This example compares two versions of the same local Codex skill against identical review tasks.

- **Versioned fixtures**: Each provider points at a different `working_dir` with its own `review-standards` skill
- **Outcome scoring**: A JavaScript assertion scores issue recall and precision for each response
- **Winner selection**: `max-score` picks the strongest skill version for each task after combining routing, correctness, cost, and latency signals

**Location**: `./skill-comparison/`

**Usage**:

```bash
(cd skill-comparison && promptfoo eval --no-cache)
```

If you run this config from the repo root, set `CODEX_SKILL_COMPARE_V1_DIR` and `CODEX_SKILL_COMPARE_V2_DIR` to the absolute fixture paths first.

If your network requires proxy or custom certificate environment variables such as `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `NO_PROXY`, `SSL_CERT_FILE`, or `NODE_EXTRA_CA_CERTS`, pass them through `config.cli_env` or set `inherit_process_env: true` in the provider config. Promptfoo intentionally does not forward the full process environment to Codex by default.
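Before editing the provider config, you may want to see which of these variables are actually exported in your shell. The sketch below uses a hypothetical helper, `proxy_env_names`, and checks only the uppercase names listed above; lowercase variants such as `http_proxy` are not covered.

```bash
# Print the names of proxy and certificate variables that are exported
# and non-empty, one per line, in the order listed in this README.
proxy_env_names() {
  local name
  for name in HTTP_PROXY HTTPS_PROXY ALL_PROXY NO_PROXY SSL_CERT_FILE NODE_EXTRA_CA_CERTS; do
    # printenv only sees exported variables, which is what a child process inherits
    if [ -n "$(printenv "$name")" ]; then
      echo "$name"
    fi
  done
}

# Example:
#   proxy_env_names   # lists the variables worth passing through config.cli_env
```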

### Thread Persistence

This example demonstrates `persist_threads: true` with one prompt template and multiple tests. It checks that Codex can remember a marker from the first test when answering the second test.

**Location**: `./thread-persistence/`

**Usage**:

```bash
(cd thread-persistence && promptfoo eval)
```

### Sandbox Enforcement

This example runs Codex in `read-only` mode and asks it to create a file. The assertion checks that the model reports a write denial, and you can also inspect the sample workspace after the eval to confirm no file was created.

**Location**: `./sandbox/`

**Usage**:

```bash
(cd sandbox && promptfoo eval)
```

If you run this config from the repo root, set `CODEX_SANDBOX_WORKING_DIR="$PWD/examples/openai-codex-sdk/sandbox/sample-workspace"`.
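To confirm that no file was created, one option is to snapshot the workspace file list before and after the eval and compare the two. This is a sketch with a hypothetical helper, `list_files`; it detects added or removed files only, not edits to existing files.

```bash
# List every regular file under a directory, sorted for a stable comparison.
list_files() {
  (cd -- "$1" >/dev/null && find . -type f | LC_ALL=C sort)
}

# Example, run from the sandbox/ directory:
#   before="$(list_files sample-workspace)"
#   promptfoo eval
#   after="$(list_files sample-workspace)"
#   if [ "$before" = "$after" ]; then echo "no files created"; else echo "workspace changed"; fi
```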

## Key Features

- **Thread Persistence**: Conversations saved to `~/.codex/sessions`
- **Git Integration**: Automatic repository detection (can be disabled)
- **Structured Output**: Native JSON schema support with Zod
- **Streaming Events**: Real-time progress updates
- **Custom Binary**: Override Codex binary path with `codex_path_override`

## Configuration Options

See the [documentation](https://www.promptfoo.dev/docs/providers/openai-codex-sdk/) for full details.