---
sidebar_position: 42
title: OpenAI Codex App Server
description: Evaluate Codex app-server with streamed agent events, approvals, sandboxing controls, and thread metadata through the Promptfoo JSON-RPC provider guide.
---

# OpenAI Codex App Server

This provider starts `codex app-server` as a local child process and drives the Codex app-server JSON-RPC protocol from promptfoo. Use it when you need to eval the rich client surface of Codex: streamed agent items, approvals, skills, plugins, app connector events, command/file trajectories, and thread lifecycle metadata.

For CI and straightforward automation, prefer the [OpenAI Codex SDK provider](./openai-codex-sdk.md). The app-server protocol is experimental, broader than the SDK, and designed for rich product integrations.

## Provider IDs

```yaml
providers:
  - openai:codex-app-server
  - openai:codex-app-server:gpt-5.5
  - openai:codex-desktop
  - openai:codex-desktop:gpt-5.5
```

`openai:codex-desktop` is an alias for the same app-server protocol. Promptfoo starts its own `codex app-server` process; it does not attach to an already-running Codex Desktop app process.

## Codex SDK vs App Server vs Desktop App

Keep this provider separate from the Codex SDK provider. They share Codex concepts, but they expose different runtime contracts.

| Surface           | Best for                                      | Runtime                                              | Promptfoo provider                                 |
| ----------------- | --------------------------------------------- | ---------------------------------------------------- | -------------------------------------------------- |
| Codex SDK         | CI, automation, simple agentic coding evals   | `@openai/codex-sdk` library                          | [`openai:codex-sdk`](./openai-codex-sdk.md)        |
| Codex app-server  | Rich-client protocol behavior and event evals | Local `codex app-server` child process over JSON-RPC | `openai:codex-app-server` / `openai:codex-desktop` |
| Codex Desktop app | Interactive human work in the desktop product | Native app process and UI                            | Not attached directly                              |

Use this provider when the thing being tested depends on app-server-only behavior such as approval request payloads, streamed item notifications, app connector events, plugin/skill metadata, or thread lifecycle operations. Use the SDK provider when you only need final Codex output, thread reuse, structured output, and traced shell/MCP/search/file steps.

## What Promptfoo Can and Can't Evaluate

| Eval surface                                    | Supported? | Notes                                                                                            |
| ----------------------------------------------- | ---------- | ------------------------------------------------------------------------------------------------ |
| Final assistant text                            | Yes        | Returned in `response.output` as a string.                                                       |
| Text, image, local image, skill, mention inputs | Yes        | Pass plain text or a JSON array of supported app-server input items.                             |
| JSON schema output                              | Yes        | Pass `output_schema`; assert with `is-json` or parse `output` yourself.                          |
| Token usage and estimated cost                  | Yes        | Token usage is read from `thread/tokenUsage/updated`; cost needs a known model id.               |
| Thread IDs and turn IDs                         | Yes        | Available under `sessionId` and `metadata.codexAppServer`.                                       |
| Approval, permission, MCP, and tool requests    | Yes        | `server_request_policy` gives deterministic responses for non-interactive evals.                 |
| Streamed item metadata                          | Yes        | Command, file, MCP, dynamic tool, web search, reasoning, and agent-message items are normalized. |
| Deep app-server tracing                         | Yes        | Enable `deep_tracing` to inject OTEL env vars into a fresh app-server process per row.           |
| Live partial output in assertions               | No         | Promptfoo receives the final provider response after the turn completes.                         |
| Attaching to an existing Desktop app            | No         | Promptfoo owns a separate app-server child process.                                              |
| WebSocket transport                             | No         | The provider uses stdio; app-server WebSocket mode remains experimental upstream.                |

When `service_tier: fast` is used, Promptfoo still reports only the standard model-rate estimate from the returned token ledger. The app-server payload does not expose enough billing metadata to convert Codex fast-mode credit consumption into an exact spend figure.

## Setup

Install the Codex CLI and sign in:

```bash
npm i -g @openai/codex
codex
```

You can also authenticate with an API key:

```bash
export OPENAI_API_KEY=your_api_key_here
```

Promptfoo also accepts `CODEX_API_KEY` or `config.apiKey`. For reproducible evals, prefer API-key-backed runs or set `cli_env.CODEX_HOME` to a fixture home directory that already contains the intended Codex login state.

## Basic Usage

```yaml title="promptfooconfig.yaml"
providers:
  - id: openai:codex-app-server:gpt-5.5
    config:
      sandbox_mode: read-only
      approval_policy: never

prompts:
  - 'Review this repository and summarize the highest-risk code paths.'
```

The provider returns Codex's final assistant text as `output`. It also records thread ids, turn ids, item counts, command/file/tool metadata, approval decisions, and token usage under `metadata.codexAppServer`.

## Safety Defaults

The app-server protocol can expose shell, filesystem, config, plugin, MCP, and app connector surfaces. Promptfoo defaults to deterministic eval behavior:

| Option                | Default       |
| --------------------- | ------------- |
| `sandbox_mode`        | `read-only`   |
| `approval_policy`     | `never`       |
| `ephemeral`           | `true`        |
| `thread_cleanup`      | `unsubscribe` |
| `reuse_server`        | `true`        |
| `inherit_process_env` | `false`       |

Approval requests are answered without blocking:

| Request type                            | Default response       |
| --------------------------------------- | ---------------------- |
| `item/commandExecution/requestApproval` | `decline`              |
| `item/fileChange/requestApproval`       | `decline`              |
| `item/permissions/requestApproval`      | empty grant            |
| `item/tool/requestUserInput`            | empty answers          |
| `mcpServer/elicitation/request`         | `decline`              |
| `item/tool/call`                        | failed static response |

Use `accept`, `acceptForSession`, permission grants, or MCP elicitation acceptance only in isolated workspaces where side effects are acceptable.

## Configuration

The provider validates top-level provider config strictly. Prompt-level config is parsed more leniently because promptfoo merges generic test options into `prompt.config`; unrelated keys are ignored there, while invalid values for known Codex fields still return a row-level provider error.

| Parameter                 | Type          | Description                                                                                                                                                         | Default              |
| ------------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------- |
| `apiKey`                  | string        | OpenAI API key. Optional when Codex is already signed in.                                                                                                           | Environment variable |
| `base_url`                | string        | Custom OpenAI-compatible base URL. Also passed as `OPENAI_BASE_URL` and `OPENAI_API_BASE_URL`.                                                                      | None                 |
| `working_dir`             | string        | Directory Codex operates in. Relative values resolve from the directory containing the config file.                                                                 | Current process dir  |
| `additional_directories`  | string[]      | Additional directories added to workspace-write sandbox roots.                                                                                                      | None                 |
| `skip_git_repo_check`     | boolean       | Skip the default Git repository safety check.                                                                                                                       | `false`              |
| `codex_path_override`     | string        | Path to a specific `codex` binary.                                                                                                                                  | `codex`              |
| `model`                   | string        | Model id, such as `gpt-5.5`. Can also be set in the provider id.                                                                                                    | Codex default        |
| `model_provider`          | string        | App-server model provider override for `thread/start` and `thread/resume`.                                                                                          | None                 |
| `service_tier`            | string        | `fast` or `flex`.                                                                                                                                                   | App-server default   |
| `sandbox_mode`            | string        | `read-only`, `workspace-write`, or `danger-full-access`.                                                                                                            | `read-only`          |
| `sandbox_policy`          | object        | Raw app-server sandbox policy override for `turn/start`.                                                                                                            | Generated from mode  |
| `network_access_enabled`  | boolean       | Adds network access to generated sandbox policies.                                                                                                                  | `false`              |
| `approval_policy`         | string/object | `never`, `on-request`, `on-failure`, `untrusted`, or granular approval policy object.                                                                               | `never`              |
| `approvals_reviewer`      | string        | `user` or `guardian_subagent`.                                                                                                                                      | App-server default   |
| `model_reasoning_effort`  | string        | `none`, `minimal`, `low`, `medium`, `high`, or `xhigh`.                                                                                                             | App-server default   |
| `reasoning_summary`       | string        | `auto`, `concise`, `detailed`, or `none`.                                                                                                                           | App-server default   |
| `personality`             | string        | `none`, `friendly`, or `pragmatic`.                                                                                                                                 | App-server default   |
| `base_instructions`       | string        | Base instructions passed to `thread/start` and `thread/resume`.                                                                                                     | None                 |
| `developer_instructions`  | string        | Developer instructions passed to `thread/start` and `thread/resume`.                                                                                                | None                 |
| `collaboration_mode`      | object        | Experimental collaboration mode passed to `turn/start`.                                                                                                             | None                 |
| `output_schema`           | object        | JSON Schema passed to `turn/start`.                                                                                                                                 | None                 |
| `thread_id`               | string        | Resume an existing Codex thread.                                                                                                                                    | None                 |
| `persist_threads`         | boolean       | Reuse threads across rows with the same prompt template and config.                                                                                                 | `false`              |
| `thread_pool_size`        | number        | Max cached thread count when `persist_threads` is enabled.                                                                                                          | `1`                  |
| `thread_cleanup`          | string        | `unsubscribe`, `archive`, or `none` for non-persistent threads. Resumed `thread_id` rows unsubscribe by default; `archive` is ignored for user-supplied thread IDs. | `unsubscribe`        |
| `ephemeral`               | boolean       | Create ephemeral threads by default.                                                                                                                                | `true`               |
| `experimental_raw_events` | boolean       | Ask app-server to emit raw Responses API items.                                                                                                                     | `false`              |
| `experimental_api`        | boolean       | Opt into experimental app-server protocol fields during `initialize`.                                                                                               | `true`               |
| `include_raw_events`      | boolean       | Include protocol notifications in `raw`.                                                                                                                            | `false`              |
| `cli_config`              | object        | Extra `codex app-server -c key=value` config overrides.                                                                                                             | None                 |
| `cli_env`                 | object        | Extra environment variables for the app-server process.                                                                                                             | Minimal shell env    |
| `inherit_process_env`     | boolean       | Merge the full Node.js environment into the app-server process.                                                                                                     | `false`              |
| `reuse_server`            | boolean       | Reuse the app-server process across rows. Disabled for `deep_tracing`.                                                                                              | `true`               |
| `deep_tracing`            | boolean       | Inject OTEL env vars into a fresh app-server process per call.                                                                                                      | `false`              |
| `request_timeout_ms`      | number        | JSON-RPC request timeout.                                                                                                                                           | `30000`              |
| `startup_timeout_ms`      | number        | `initialize` timeout.                                                                                                                                               | `30000`              |
| `turn_timeout_ms`         | number        | Overall turn timeout.                                                                                                                                               | None                 |
| `server_request_policy`   | object        | Deterministic responses for approvals, user input, MCP elicitations, and dynamic tools.                                                                             | Safe declines        |

### Granular Approval Policy

```yaml
providers:
  - id: openai:codex-app-server:gpt-5.5
    config:
      approval_policy:
        granular:
          sandbox_approval: true
          rules: true
          skill_approval: false
          request_permissions: true
          mcp_elicitations: true
```

### Collaboration Mode

```yaml
providers:
  - id: openai:codex-app-server:gpt-5.5
    config:
      collaboration_mode:
        mode: plan
        settings:
          model: gpt-5.5
          reasoning_effort: none
          developer_instructions: null
```

`collaboration_mode` is experimental and is sent on `turn/start`. App-server may let the selected mode override model, reasoning effort, or developer instructions for the turn.

## Server Request Policy

Configure deterministic responses when you intentionally want app-server approval flows:

```yaml
providers:
  - id: openai:codex-app-server:gpt-5.5
    config:
      sandbox_mode: workspace-write
      approval_policy: on-request
      server_request_policy:
        command_execution: decline
        file_change: decline
        user_input:
          severity: high
        mcp_elicitation:
          action: accept
          content:
            severity: low
          _meta:
            source: promptfoo
        permissions:
          scope: session
          permissions:
            network:
              enabled: true
            fileSystem:
              read:
                - /tmp/fixture
              write: null
        dynamic_tools:
          classify:
            success: true
            text: '{"label":"safe"}'
```

For command execution approvals, `command_execution` may also be an app-server decision object:

```yaml
server_request_policy:
  command_execution:
    applyNetworkPolicyAmendment:
      network_policy_amendment:
        host: registry.npmjs.org
        action: allow
```

Legacy `execCommandApproval` and `applyPatchApproval` callbacks are also handled for older app-server versions. Advanced command decision objects are only supported on the modern `item/commandExecution/requestApproval` flow.

## Structured Output

```yaml title="promptfooconfig.yaml"
providers:
  - id: openai:codex-app-server:gpt-5.5
    config:
      sandbox_mode: read-only
      output_schema:
        type: object
        properties:
          summary:
            type: string
          risks:
            type: array
            items:
              type: string
        required: [summary, risks]
        additionalProperties: false

prompts:
  - 'Return a JSON review summary for this repo.'

tests:
  - assert:
      - type: is-json
```

The final app-server response is returned as a string. Use `is-json` or a JavaScript assertion to parse it.

## Prompt Inputs

Plain text prompts work as usual. To include images, skills, or mentions, pass a JSON array:

```json
[
  { "type": "text", "text": "$skill-creator Write a test plan for this provider." },
  { "type": "image", "url": "https://example.com/screenshot.png" },
  { "type": "local_image", "path": "/Users/me/screenshots/failure.png" },
  {
    "type": "skill",
    "name": "skill-creator",
    "path": "/Users/me/.codex/skills/skill-creator/SKILL.md"
  },
  {
    "type": "mention",
    "name": "workspace",
    "path": "app://connector/resource"
  }
]
```

Supported input item types are `text`, `image`, `local_image`, `localImage`, `skill`, and `mention`.

## Metadata

The provider records app-server details for assertions and debugging:

```js
providerResponse.metadata.codexAppServer.threadId;
providerResponse.metadata.codexAppServer.turnId;
providerResponse.metadata.codexAppServer.itemCounts;
providerResponse.metadata.codexAppServer.items;
providerResponse.metadata.codexAppServer.serverRequests;
```

Command output, tool arguments, and approval metadata are sanitized before they are placed in metadata or tracing attributes.

## Tracing

Promptfoo wraps each provider call in a GenAI span. The app-server provider also creates item-level spans for completed command, file, MCP, dynamic tool, reasoning, search, and agent-message items.

Enable deeper app-server tracing by setting `deep_tracing: true` with Promptfoo's OpenTelemetry tracing enabled. Deep tracing starts a fresh app-server process for each row so the child process can receive the active trace context. Reusable app-server process and persistent thread pooling are disabled in this mode; explicit `thread_id` resumes are still serialized so parallel rows do not overlap turns on the same Codex thread.

## Local Verification

Run from the repository root:

```bash
npm run local -- eval -c examples/openai-codex-app-server/promptfooconfig.yaml --no-cache
```

Use `--env-file .env` if your API key is stored there.

To validate the provider against your installed Codex CLI schema:

```bash
codex app-server generate-ts --out /tmp/codex-app-server-schema/ts
codex app-server generate-json-schema --out /tmp/codex-app-server-schema/json
```