# claude-agent-sdk (Claude Agent SDK Examples)

The Claude Agent SDK provider (aka Claude Code provider) enables you to run agentic evals with configurable tools, permissions, and environments.

```bash
npx promptfoo@latest init --example claude-agent-sdk
cd claude-agent-sdk
```

## Setup

Install the Claude Agent SDK:

```bash
npm install @anthropic-ai/claude-agent-sdk
```

Export your Anthropic API key as `ANTHROPIC_API_KEY`:

```bash
export ANTHROPIC_API_KEY=your_api_key_here
```

## Examples

### Basic Usage

This example shows the Claude Agent SDK in its simplest form. It runs in a temporary directory with no file system access and no tools enabled, so it behaves much like the standard Anthropic provider.

**Location**: `./basic/`

**Usage**:

```bash
(cd basic && promptfoo eval)
```

### Working Directory

This example gives the Claude Agent SDK read-only access to a sample project containing Python, TypeScript, and JavaScript files with intentional bugs to analyze.

Because `working_dir` is set, the Claude Agent SDK can use the following read-only tools:

- `Read` - Read file contents
- `Grep` - Search file contents
- `Glob` - Find files by pattern
- `LS` - List directory contents

**Location**: `./working-dir/`

**Usage**:

```bash
(cd working-dir && promptfoo eval)
```

### Advanced Editing

This example shows the Claude Agent SDK modifying files, with:

- **File editing tools**: `Write`, `Edit`, and `MultiEdit` are added to the default set of read-only tools by setting `append_allowed_tools`
- **Permission mode**: `permission_mode` is set to `acceptEdits` so file edits are approved automatically
- **Automatic git workspace management**: The working directory (`./workspace`) is managed by `beforeAll`, `afterEach`, and `afterAll` extension hooks defined in `hooks.js`, which:
  - Initialize a git repository before all tests
  - Capture timestamped diffs after each test in a markdown report
  - Reset changes after each test
  - Clean up the `.git` directory after all tests
- **Serial execution**: `maxConcurrency: 1` prevents race conditions between tests that would otherwise edit the shared workspace concurrently

**Location**: `./advanced/`

**Usage**:

```bash
(cd advanced && promptfoo eval)
```

### MCP Integration

This example shows the Claude Agent SDK integrating with:

- **MCP weather server**: Uses `@h1deya/mcp-server-weather` for weather data
- **Tool permissions**: Allows specific MCP tools (`mcp__weather__get-forecast`, `mcp__weather__get-alerts`)
- **External API access**: Fetches live weather data for San Francisco

**Location**: `./mcp/`

**Usage**:

```bash
(cd mcp && promptfoo eval)
```

### Structured Output

This example demonstrates the Claude Agent SDK's structured output feature, which returns validated JSON that conforms to a schema. It includes:

- **JSON schema validation**: Defines the expected output structure with types, enums, and required fields
- **Code analysis task**: The agent analyzes a Python function for bugs
- **Assertion testing**: Validates that the output matches the expected schema and contains the correct analysis

**Location**: `./structured-output/`

**Usage**:

```bash
(cd structured-output && promptfoo eval)
```

### Advanced Options

This example demonstrates advanced Claude Agent SDK configuration options, including sandbox settings, runtime configuration, permission bypass, and CLI arguments.

**Location**: `./advanced-options/`

**Usage**:

```bash
(cd advanced-options && promptfoo eval)
```

**Features demonstrated**:

- **Sandbox configuration**: Run commands in isolated environments with network restrictions
- **Runtime configuration**: Specify the JavaScript runtime (node, bun, or deno)
- **Extra CLI arguments**: Pass additional flags to Claude Code
- **Setting sources**: Control where the SDK loads settings from
- **Permission bypass**: Safely bypass permissions for automated testing

### AskUserQuestion Handling

This example demonstrates handling the `AskUserQuestion` tool in automated evaluations. When Claude asks the user a question during a run, the example shows how to answer it automatically.
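Answering automatically is easiest to reason about as a pure function from questions to answers. The sketch below implements a "pick the first option" policy under assumed data shapes: `Question`, `QuestionOption`, and `answerWithFirstOption` are hypothetical names for illustration, not the SDK's or promptfoo's API, and the example itself relies on built-in configuration rather than custom code.

```typescript
// Hypothetical shapes, assumed for illustration only; check the SDK's
// AskUserQuestion tool input type before relying on them.
interface QuestionOption {
  label: string;
  description?: string;
}

interface Question {
  question: string;
  options: QuestionOption[];
}

// Answer each question with the label of its first option.
// Questions that offer no options are left unanswered.
function answerWithFirstOption(questions: Question[]): Record<string, string> {
  const answers: Record<string, string> = {};
  for (const q of questions) {
    const first = q.options[0];
    if (first !== undefined) {
      answers[q.question] = first.label;
    }
  }
  return answers;
}

// Usage: the second question has no options, so it gets no answer.
const answers = answerWithFirstOption([
  {
    question: "Which test framework should I use?",
    options: [{ label: "pytest" }, { label: "unittest" }],
  },
  { question: "Anything else?", options: [] },
]);
console.log(answers);
```

A fixed policy like this keeps runs deterministic, which matters when the same prompt is evaluated many times.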
**Location**: `./ask-user-question/`

**Usage**:

```bash
(cd ask-user-question && promptfoo eval)
```

**Features demonstrated**:

- **Convenience option**: Use `ask_user_question.behavior` for simple automated responses
- **First option selection**: Automatically select the first available option
- **Tool enablement**: Enable `AskUserQuestion` via `append_allowed_tools`

### Skills Testing

This example demonstrates testing [Agent Skills](https://platform.claude.com/docs/en/agent-sdk/skills) with the Claude Agent SDK. Skills are reusable capabilities defined as `SKILL.md` files that Claude automatically invokes when relevant.

- **Skill discovery**: Uses `setting_sources: ['project']` to load skills from `.claude/skills/`
- **Skill filtering**: Uses `skills: ['code-review']` (SDK 0.2.120+) to scope the test to a single skill and auto-allow the `Skill` tool
- **Skill assertions**: Verifies normalized `metadata.skillCalls` with the `skill-used` assertion
- **Sample skill**: A code review skill that identifies bugs and security issues

**Location**: `./skills/`

**Usage**:

```bash
(cd skills && promptfoo eval)
```

### Skill Comparison

This example compares two versions of the same Claude Agent SDK skill against identical review tasks. It is the Claude companion to [`examples/openai-codex-sdk/skill-comparison`](../openai-codex-sdk/skill-comparison) and the runnable form of the [agent-skill testing guide](https://www.promptfoo.dev/docs/guides/test-agent-skills).
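Comparing two skill versions on identical tasks only works if every output is scored the same way. Below is a minimal sketch of recall-style scoring, assuming each expected issue is a short phrase and that a case-insensitive substring match counts as finding it. `issueRecall`, `gradeReview`, and the 0.5 pass threshold are hypothetical; this is not the example's actual assertion, only an illustration of the idea.

```typescript
// Hypothetical helper: fraction of expected issues mentioned in the review output.
// Matching is a case-insensitive substring check, a deliberately simple assumption.
function issueRecall(output: string, expectedIssues: string[]): number {
  if (expectedIssues.length === 0) {
    return 1; // nothing to find counts as perfect recall
  }
  const text = output.toLowerCase();
  const found = expectedIssues.filter((issue) => text.includes(issue.toLowerCase()));
  return found.length / expectedIssues.length;
}

// Hypothetical wrapper returning a { pass, score, reason } object,
// the shape promptfoo accepts from JavaScript assertions.
// The 0.5 threshold is an arbitrary choice for illustration.
function gradeReview(output: string, expectedIssues: string[]) {
  const score = issueRecall(output, expectedIssues);
  return { pass: score >= 0.5, score, reason: `recall ${score.toFixed(2)}` };
}

// Usage: the same expected issues scored against two skill versions' outputs.
const expected = ["sql injection", "missing null check", "hardcoded secret"];
const v1 = "Found SQL injection in the query builder.";
const v2 = "Found SQL injection and a hardcoded secret in config.";
console.log(gradeReview(v1, expected), gradeReview(v2, expected));
```

Substring matching is brittle against paraphrases; a real assertion might normalize wording or use an LLM grader instead, at the cost of determinism.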
- **Versioned fixtures**: Each provider points at a different `working_dir` with its own `.claude/skills/review-standards/SKILL.md`
- **Skill filter**: Uses `skills: ['review-standards']` (SDK 0.2.120+) to auto-allow the `Skill` tool
- **Structured output**: Shares one `output_format` schema across both providers via a YAML anchor, so JSON results are reliable without prompt gymnastics
- **Outcome scoring**: A JavaScript assertion scores issue recall against `expectedIssues`

**Location**: `./skill-comparison/`

**Usage**:

```bash
(cd skill-comparison && promptfoo eval --no-cache)
```

### Plugins

This example demonstrates loading skills from a [plugin](https://code.claude.com/docs/en/plugins) instead of from `setting_sources`. Plugins are self-contained directories that bundle skills, agents, hooks, and MCP servers together.

- **Plugin loading**: Uses `plugins: [{type: local, path: ./sample-plugin}]` to load a local plugin
- **Skill tool**: Enables the `Skill` tool via `append_allowed_tools`
- **Skill assertions**: Verifies normalized `metadata.skillCalls` with the `skill-used` assertion
- **Sample skill**: A standards-check skill that verifies the project has a `README.md`

**Location**: `./plugins/`

**Usage**:

```bash
(cd plugins && promptfoo eval)
```

### Cyber Espionage Red Team

This example demonstrates testing AI agents against cyber espionage attack patterns based on Anthropic's ["Disrupting AI Espionage"](https://www.anthropic.com/news/disrupting-AI-espionage) blog post.
It includes:

- **Simulated target system**: Workspace with configuration files, credentials, logs, and sensitive data
- **Comprehensive red team plugins**: `harmful:cybercrime`, `harmful:cybercrime:malicious-code`, `ssrf`, `pii`, `excessive-agency`, and more
- **Advanced jailbreak strategies**: `jailbreak:meta`, `jailbreak:hydra`, `crescendo`, and `goat` for sophisticated attacks
- **Reconnaissance testing**: File system access tools (`Read`, `Grep`, `Glob`, `Bash`) to test security boundaries
- **Authorized testing context**: Demonstrates responsible security testing practices

**Location**: `./cyber-espionage/`

**Usage**:

```bash
(cd cyber-espionage && promptfoo eval)
```

> ⚠️ This example is for authorized security testing only. It demonstrates how to identify vulnerabilities in AI agents before malicious actors can exploit them.