---
sidebar_position: 2
description: "Deploy Anthropic's Claude models including Opus, Sonnet, and Haiku for advanced reasoning and conversational AI applications"
---

# Anthropic

This provider supports the [Anthropic Claude](https://www.anthropic.com/claude) series of models.

> **Note:** Anthropic models can also be accessed through [Azure AI Foundry](/docs/providers/azure/#using-claude-models), [AWS Bedrock](/docs/providers/aws-bedrock/), and [Google Vertex](/docs/providers/vertex/).

:::tip Agentic Evals
For agentic evaluations with file access, tool use, and MCP servers, see the [Claude Agent SDK provider](/docs/providers/claude-agent-sdk/).
:::

## Setup

To use Anthropic, set the `ANTHROPIC_API_KEY` environment variable or specify `apiKey` in the provider configuration. Create Anthropic API keys [here](https://console.anthropic.com/settings/keys).

Example of setting the environment variable:

```sh
export ANTHROPIC_API_KEY=your_api_key_here
```

### Authenticating via a Claude Code session

If you already have an active Claude Code session (for example as a Claude Pro or Max subscriber), you can reuse its OAuth credential instead of creating a separate Anthropic Console API key. Set `apiKeyRequired: false` on the provider config:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-6
    config:
      apiKeyRequired: false
```

When `apiKeyRequired` is `false` and no `ANTHROPIC_API_KEY` is available, Promptfoo loads the Claude Code OAuth credential from:

1. The macOS keychain entry `Claude Code-credentials` (macOS only), then
2. `$HOME/.claude/.credentials.json` on Linux and macOS, or `%USERPROFILE%\.claude\.credentials.json` on Windows.

Promptfoo authenticates requests with a Bearer token, sends the `claude-code-20250219,oauth-2025-04-20` beta headers, and prepends the required Claude Code identity system block (`"You are Claude Code, Anthropic's official CLI for Claude."`) to every Messages request.
Your own system prompt is still forwarded as the next system block.

If you haven't logged in yet, run `claude /login` to create a credential. Re-run it if Promptfoo warns that the credential has expired. Requests made this way are expected to count against your Claude subscription the same way calls from the Claude Code CLI do; check [Anthropic's documentation](https://docs.claude.com/en/docs/claude-code/overview) for current billing behavior.

This also enables model-graded assertions such as `llm-rubric` to run without a separate Anthropic Console key; see [the example below](#model-graded-tests).

## Models

The `anthropic` provider supports the following models via the messages API:

| Model ID                                                                   | Description            |
| -------------------------------------------------------------------------- | ---------------------- |
| `anthropic:messages:claude-opus-4-7`                                       | Claude 4.7 Opus        |
| `anthropic:messages:claude-sonnet-4-6`                                     | Claude 4.6 Sonnet      |
| `anthropic:messages:claude-opus-4-6`                                       | Claude 4.6 Opus        |
| `anthropic:messages:claude-opus-4-5-20251101` (claude-opus-4-5-latest)     | Claude 4.5 Opus        |
| `anthropic:messages:claude-opus-4-1-20250805` (claude-opus-4-1-latest)     | Claude 4.1 Opus        |
| `anthropic:messages:claude-opus-4-20250514` (claude-opus-4-latest)         | Claude 4 Opus          |
| `anthropic:messages:claude-sonnet-4-5-20250929` (claude-sonnet-4-5-latest) | Claude 4.5 Sonnet      |
| `anthropic:messages:claude-sonnet-4-20250514` (claude-sonnet-4-latest)     | Claude 4 Sonnet        |
| `anthropic:messages:claude-haiku-4-5-20251001` (claude-haiku-4-5-latest)   | Claude 4.5 Haiku       |
| `anthropic:messages:claude-3-7-sonnet-20250219` (claude-3-7-sonnet-latest) | Claude 3.7 Sonnet      |
| `anthropic:messages:claude-3-5-sonnet-20241022` (claude-3-5-sonnet-latest) | Claude 3.5 Sonnet (v2) |
| `anthropic:messages:claude-3-5-sonnet-20240620`                            | Claude 3.5 Sonnet (v1) |
| `anthropic:messages:claude-3-5-haiku-20241022` (claude-3-5-haiku-latest)   | Claude 3.5 Haiku       |
| `anthropic:messages:claude-3-opus-20240229` (claude-3-opus-latest)         | Claude 3 Opus          |
| `anthropic:messages:claude-3-haiku-20240307`                               | Claude 3 Haiku         |

### Cross-Platform Model Availability

Claude models are available across multiple platforms. Here's how the model names map across different providers:

| Model             | Anthropic API                                         | Azure AI Foundry ([docs](/docs/providers/azure/#using-claude-models)) | AWS Bedrock ([docs](/docs/providers/aws-bedrock)) | GCP Vertex AI ([docs](/docs/providers/vertex)) |
| ----------------- | ----------------------------------------------------- | --------------------------------------------------------------------- | ------------------------------------------------- | ---------------------------------------------- |
| Claude 4.7 Opus   | claude-opus-4-7                                       | claude-opus-4-7                                                       | anthropic.claude-opus-4-7                         | claude-opus-4-7                                |
| Claude 4.6 Sonnet | claude-sonnet-4-6                                     | claude-sonnet-4-6                                                     | anthropic.claude-sonnet-4-6                       | claude-sonnet-4-6                              |
| Claude 4.6 Opus   | claude-opus-4-6                                       | claude-opus-4-6-20260205                                              | anthropic.claude-opus-4-6-v1                      | claude-opus-4-6                                |
| Claude 4.5 Opus   | claude-opus-4-5-20251101 (claude-opus-4-5-latest)     | claude-opus-4-5-20251101                                              | anthropic.claude-opus-4-5-20251101-v1:0           | claude-opus-4-5@20251101                       |
| Claude 4.5 Sonnet | claude-sonnet-4-5-20250929 (claude-sonnet-4-5-latest) | claude-sonnet-4-5-20250929                                            | anthropic.claude-sonnet-4-5-20250929-v1:0         | claude-sonnet-4-5@20250929                     |
| Claude 4.5 Haiku  | claude-haiku-4-5-20251001 (claude-haiku-4-5-latest)   | claude-haiku-4-5-20251001                                             | anthropic.claude-haiku-4-5-20251001-v1:0          | claude-haiku-4-5@20251001                      |
| Claude 4.1 Opus   | claude-opus-4-1-20250805                              | claude-opus-4-1-20250805                                              | anthropic.claude-opus-4-1-20250805-v1:0           | claude-opus-4-1@20250805                       |
| Claude 4 Opus     | claude-opus-4-20250514 (claude-opus-4-latest)         | claude-opus-4-20250514                                                | anthropic.claude-opus-4-20250514-v1:0             | claude-opus-4@20250514                         |
| Claude 4 Sonnet   | claude-sonnet-4-20250514 (claude-sonnet-4-latest)     | claude-sonnet-4-20250514                                              | anthropic.claude-sonnet-4-20250514-v1:0           | claude-sonnet-4@20250514                       |
| Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 (claude-3-7-sonnet-latest) | claude-3-7-sonnet-20250219                                            | anthropic.claude-3-7-sonnet-20250219-v1:0         | claude-3-7-sonnet@20250219                     |
| Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 (claude-3-5-sonnet-latest) | claude-3-5-sonnet-20241022                                            | anthropic.claude-3-5-sonnet-20241022-v2:0         | claude-3-5-sonnet-v2@20241022                  |
| Claude 3.5 Haiku  | claude-3-5-haiku-20241022 (claude-3-5-haiku-latest)   | claude-3-5-haiku-20241022                                             | anthropic.claude-3-5-haiku-20241022-v1:0          | claude-3-5-haiku@20241022                      |
| Claude 3 Opus     | claude-3-opus-20240229 (claude-3-opus-latest)         | claude-3-opus-20240229                                                | anthropic.claude-3-opus-20240229-v1:0             | claude-3-opus@20240229                         |
| Claude 3 Haiku    | claude-3-haiku-20240307                               | claude-3-haiku-20240307                                               | anthropic.claude-3-haiku-20240307-v1:0            | claude-3-haiku@20240307                        |

### Supported Parameters

| Config Property | Environment Variable  | Description                                                                         |
| --------------- | --------------------- | ----------------------------------------------------------------------------------- |
| apiKey          | ANTHROPIC_API_KEY     | Your API key from Anthropic                                                         |
| apiKeyRequired  | -                     | Skip the API key preflight and authenticate via a local Claude Code session         |
| apiBaseUrl      | ANTHROPIC_BASE_URL    | The base URL for requests to the Anthropic API                                      |
| temperature     | ANTHROPIC_TEMPERATURE | Controls the randomness of the output (default: 0). Omitted when `top_p` is set.    |
| max_tokens      | ANTHROPIC_MAX_TOKENS  | The maximum length of the generated text (default: 1024)                            |
| cost            | -                     | Legacy per-token override applied to both input and output pricing                  |
| inputCost       | -                     | Override input token pricing in promptfoo cost estimates                            |
| outputCost      | -                     | Override output token pricing in promptfoo cost estimates                           |
| top_p           | -                     | Controls nucleus sampling. Mutually exclusive with `temperature`.                   |
| top_k           | -                     | Only sample from the top K options for each subsequent token                        |
| stop_sequences  | -                     | Array of strings that will stop generation when encountered                         |
| stream          | -                     | Enable streaming (required when `max_tokens` > 21,333)                              |
| tools           | -                     | An array of tool or function definitions for the model to call                      |
| tool_choice     | -                     | An object specifying the tool to call                                               |
| effort          | -                     | Output effort level: `low`, `medium`, `high`, `xhigh`, or `max`                     |
| output_format   | -                     | JSON schema configuration for structured outputs                                    |
| thinking        | -                     | Configuration for Claude's extended thinking (`enabled`, `adaptive`, or `disabled`) |
| showThinking    | -                     | Whether to include thinking content in the output (default: true)                   |
| cache_control   | -                     | Auto-apply cache_control to the last cacheable block in the request                 |
| metadata        | -                     | Request metadata such as `user_id` for tracking purposes                            |
| service_tier    | -                     | Priority tier: `auto` (default) or `standard_only`                                  |
| headers         | -                     | Additional headers to be sent with the API request                                  |
| extra_body      | -                     | Additional parameters to be included in the API request body                        |

### Prompt Template

To allow for compatibility with the OpenAI prompt template, the following format is supported:

```json title="prompt.json"
[
  {
    "role": "system",
    "content": "{{ system_message }}"
  },
  {
    "role": "user",
    "content": "{{ question }}"
  }
]
```

If the role `system` is specified, it will be automatically added to the API request. All `user` or `assistant` roles will be automatically converted into the right format for the API request. Currently, only type `text` is supported.

The `system_message` and `question` are example variables that can be set with the `var` directive.

### Options

The Anthropic provider supports several options to customize the behavior of the model. These include:

- `temperature`: Controls the randomness of the output.
- `max_tokens`: The maximum length of the generated text.
- `top_p`: Controls nucleus sampling, affecting the randomness of the output.
- `top_k`: Only sample from the top K options for each subsequent token.
- `tools`: An array of tool or function definitions for the model to call.
- `tool_choice`: An object specifying the tool to call.
- `stop_sequences`: An array of strings that stop generation when encountered.
- `metadata`: Request metadata (e.g., `user_id`) passed to the API.
- `extra_body`: Additional parameters to pass directly to the Anthropic API request body.

Example configuration with options and prompts:

```yaml title="promptfooconfig.yaml"
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      temperature: 0.0
      max_tokens: 512
      extra_body:
        custom_param: 'test_value'

prompts:
  - file://prompt.json
```

### Stop Sequences

Use `stop_sequences` to halt generation when Claude encounters specific strings:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      stop_sequences:
        - "\n\nHuman:"
        - 'STOP'
```

### Metadata

Pass request metadata to the API for tracking or auditing purposes:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      metadata:
        user_id: 'user-123'
```

### Tool Calling

The Anthropic provider supports tool calling (function calling). Here's an example configuration for defining tools:

```yaml title="promptfooconfig.yaml"
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - name: get_weather
          description: Get the current weather in a given location
          input_schema:
            type: object
            properties:
              location:
                type: string
                description: The city and state, e.g., San Francisco, CA
              unit:
                type: string
                enum:
                  - celsius
                  - fahrenheit
            required:
              - location
```

#### Web Search and Web Fetch Tools

Anthropic provides specialized tools for web search and web fetching capabilities:

##### Web Fetch Tool

The web fetch tool allows Claude to retrieve full content from web pages and PDF documents.
This is useful when you want Claude to access and analyze specific web content.

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - type: web_fetch_20250910
          name: web_fetch
          max_uses: 5
          allowed_domains:
            - docs.example.com
            - help.example.com
          citations:
            enabled: true
          max_content_tokens: 50000
```

Promptfoo also supports the stable `web_fetch_20260209` variant. A newer version, `web_fetch_20260309`, adds `use_cache` support for controlling whether cached content is used:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - type: web_fetch_20260209
          name: web_fetch
          max_uses: 3
          defer_loading: true
        - type: web_fetch_20260309
          name: web_fetch
          max_uses: 3
          use_cache: false # Bypass cache for fresh content
```

**Web Fetch Tool Configuration Options:**

| Parameter            | Type     | Description                                                                                   |
| -------------------- | -------- | --------------------------------------------------------------------------------------------- |
| `type`               | string   | `web_fetch_20250910` (beta), `web_fetch_20260209`, or `web_fetch_20260309` (adds `use_cache`) |
| `name`               | string   | Must be `web_fetch`                                                                           |
| `max_uses`           | number   | Maximum number of web fetches per request (optional)                                          |
| `allowed_callers`    | string[] | Restrict which tool callers may invoke the server tool (optional)                             |
| `allowed_domains`    | string[] | List of domains to allow fetching from (optional, mutually exclusive with `blocked_domains`)  |
| `blocked_domains`    | string[] | List of domains to block fetching from (optional, mutually exclusive with `allowed_domains`)  |
| `defer_loading`      | boolean  | Load the tool lazily instead of including it in the initial system prompt (optional)          |
| `citations`          | object   | Enable citations with `{ enabled: true }` (optional)                                          |
| `max_content_tokens` | number   | Maximum tokens for web content (optional)                                                     |
| `cache_control`      | object   | Apply Anthropic cache control to the tool definition (optional)                               |
| `strict`             | boolean  | Enable strict schema validation for tool names and inputs (optional)                          |
| `use_cache`          | boolean  | Whether to use cached content (`web_fetch_20260309` only, optional)                           |

##### Web Search Tool

The web search tool allows Claude to search the internet for information:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - type: web_search_20260209
          name: web_search
          max_uses: 3
```

**Web Search Tool Configuration Options:**

| Parameter         | Type     | Description                                                                                |
| ----------------- | -------- | ------------------------------------------------------------------------------------------ |
| `type`            | string   | `web_search_20250305` (beta) or `web_search_20260209`                                      |
| `name`            | string   | Must be `web_search`                                                                       |
| `max_uses`        | number   | Maximum number of searches per request (optional)                                          |
| `allowed_callers` | string[] | Restrict which tool callers may invoke the server tool (optional)                          |
| `allowed_domains` | string[] | Restrict results to specific domains (optional, mutually exclusive with `blocked_domains`) |
| `blocked_domains` | string[] | Exclude domains from results (optional, mutually exclusive with `allowed_domains`)         |
| `cache_control`   | object   | Apply Anthropic cache control to the tool definition (optional)                            |
| `defer_loading`   | boolean  | Load the tool lazily instead of including it in the initial system prompt (optional)       |
| `strict`          | boolean  | Enable strict schema validation for tool names and inputs (optional)                       |
| `user_location`   | object   | Approximate user location to improve search relevance (optional)                           |

##### Combined Web Search and Web Fetch

You can use both tools together for comprehensive web information gathering:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - type: web_search_20260209
          name: web_search
          max_uses: 3
        - type: web_fetch_20260309
          name: web_fetch
          max_uses: 5
          citations:
            enabled: true
```

This configuration allows the model to first search for relevant information, then fetch full content from the most
promising results.

##### Memory Tool

Anthropic's `memory_20250818` tool can be included in `tools`. Promptfoo passes this native tool definition through unchanged, which is useful for evaluating whether a model requests memory operations. Promptfoo does not manage Anthropic memory stores or run local memory handlers for you.

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-6
    config:
      tools:
        - type: memory_20250818
          name: memory
          allowed_callers:
            - direct
```

**Memory Tool Configuration Options:**

| Parameter         | Type     | Description                                                          |
| ----------------- | -------- | -------------------------------------------------------------------- |
| `type`            | string   | Must be `memory_20250818`                                            |
| `name`            | string   | Must be `memory`                                                     |
| `allowed_callers` | string[] | Restrict which tool callers may invoke the memory tool (optional)    |
| `cache_control`   | object   | Apply Anthropic cache control to the tool definition (optional)      |
| `defer_loading`   | boolean  | Load the tool lazily instead of including it in the initial prompt   |
| `input_examples`  | object[] | Example memory commands to include in the tool definition (optional) |
| `strict`          | boolean  | Enable strict schema validation for tool names and inputs (optional) |

**Important Security Notes:**

- The web fetch tool requires trusted environments due to potential data exfiltration risks
- The model cannot dynamically construct URLs; only URLs provided by users or from search results can be fetched
- Use domain filtering to restrict access to specific sites:
  - Use `allowed_domains` to allow only trusted domains (recommended)
  - Use `blocked_domains` to block specific domains
  - **Note:** Only one of `allowed_domains` or `blocked_domains` can be specified, not both

See the [Anthropic Tool Use Guide](https://docs.anthropic.com/en/docs/tool-use) for more information on how to define tools, and the tool use example [here](https://github.com/promptfoo/promptfoo/tree/main/examples/eval-tool-use).
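The `tool_choice` option listed in the parameters above can also force Claude to call a particular tool instead of letting it decide. A minimal sketch, reusing the `get_weather` tool from the earlier example (note that forced tool use is incompatible with extended thinking, so use `auto` there):

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      # type: auto lets the model decide; type: any forces some tool call;
      # type: tool forces the named tool
      tool_choice:
        type: tool
        name: get_weather
      tools:
        - name: get_weather
          description: Get the current weather in a given location
          input_schema:
            type: object
            properties:
              location:
                type: string
            required:
              - location
```

With `type: tool`, the model responds with a `tool_use` block for `get_weather`, which is handy when an eval asserts on tool arguments rather than prose.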
### Images / Vision

You can include images in prompts for Claude 3 and later models. See the [Claude vision example](https://github.com/promptfoo/promptfoo/tree/main/examples/claude-vision).

One important note: the Claude API only supports base64 representations of images. This differs from OpenAI's vision support, which can also fetch images from a URL. As a result, if you are comparing Claude and OpenAI vision capabilities, you will need separate prompts for each. See the [OpenAI vision example](https://github.com/promptfoo/promptfoo/tree/main/examples/openai-vision) to understand the differences.

### Prompt Caching

Claude supports prompt caching to optimize API usage and reduce costs for repetitive tasks. This feature caches portions of your prompts to avoid reprocessing identical content in subsequent requests. Supported on all Claude 3, 3.5, and 4 models.

Basic example:

```yaml title="promptfooconfig.yaml"
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929

prompts:
  - file://prompts.yaml
```

```yaml title="prompts.yaml"
- role: system
  content:
    - type: text
      text: 'System message'
      cache_control:
        type: ephemeral
    - type: text
      text: '{{context}}'
      cache_control:
        type: ephemeral
- role: user
  content: '{{question}}'
```

As a simpler alternative, use the top-level `cache_control` parameter to automatically apply a cache marker to the last cacheable block in the request, without annotating each block individually:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      cache_control:
        type: ephemeral
```

Common use cases for caching:

- System messages and instructions
- Tool/function definitions
- Large context documents
- Frequently used images

Cache read and creation token counts are tracked in the response's token usage details.

See [Anthropic's Prompt Caching Guide](https://docs.anthropic.com/claude/docs/prompt-caching) for more details on requirements, pricing, and best practices.
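Tool definitions are one of the caching use cases listed above. In addition to the top-level `cache_control` config, Anthropic's API accepts a `cache_control` field on an individual tool definition, which caches the prompt prefix up to and including that tool. A sketch, assuming promptfoo forwards the field on custom tool definitions unchanged (it is documented above for the web search, web fetch, and memory tools):

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - name: get_weather
          description: Get the current weather in a given location
          input_schema:
            type: object
            properties:
              location:
                type: string
            required:
              - location
          # assumption: passed through to the API; caches the prefix
          # up to and including this tool definition
          cache_control:
            type: ephemeral
```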
### Citations

Claude can provide detailed citations when answering questions about documents.

Basic example:

```yaml title="promptfooconfig.yaml"
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929

prompts:
  - file://prompts.yaml
```

```yaml title="prompts.yaml"
- role: user
  content:
    - type: document
      source:
        type: text
        media_type: text/plain
        data: 'Your document text here'
      citations:
        enabled: true
    - type: text
      text: 'Your question here'
```

See [Anthropic's Citations Guide](https://docs.anthropic.com/en/docs/build-with-claude/citations) for more details.

### PDF Documents

Claude can process PDF files using document content blocks. Pass the PDF as base64-encoded data:

```yaml
- role: user
  content:
    - type: document
      source:
        type: base64
        media_type: application/pdf
        data: '{{pdf_base64}}'
    - type: text
      text: 'Summarize this document'
```

Use a test var to supply the base64-encoded PDF content:

```yaml
tests:
  - vars:
      pdf_base64: file://document.pdf
```

### Claude Opus 4.7 notes

Opus 4.7 is designed around adaptive thinking and runs with the reasoning stack always on. Promptfoo handles the key differences from earlier Opus models automatically:

- **Temperature is managed for you.** Opus 4.7 samples adaptively and does not accept `temperature`; promptfoo omits the field from every request. Passing `temperature` in config or `ANTHROPIC_TEMPERATURE` logs a one-time heads-up so you can clean the value out of your eval.
- **Adaptive thinking is the default.** Use `thinking: { type: 'adaptive' }` (or leave `thinking` unset) to let the model choose how much to reason per request. Budget-based modes from older models aren't used on 4.7.
- **`xhigh` effort level is available.** It sits between `high` and `max` and is a good starting point for coding and agentic tasks. See the [Effort Level](#effort-level) section.
- **Updated tokenizer.** The same input can map to 1.0–1.35× more tokens than Opus 4.6, so measure real traffic if you're comparing costs.
The same guidance applies when you reach Opus 4.7 through AWS Bedrock, GCP Vertex, or Azure AI Foundry: promptfoo suppresses `temperature` on each of those paths as well.

### Extended Thinking

Claude supports an extended thinking capability that allows you to see the model's internal reasoning process before it provides the final answer. This can be configured using the `thinking` parameter:

```yaml title="promptfooconfig.yaml"
providers:
  # Adaptive thinking (recommended for Claude Opus 4.7)
  - id: anthropic:messages:claude-opus-4-7
    config:
      max_tokens: 20000
      thinking:
        type: 'adaptive'

  # Enabled thinking with explicit budget
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      max_tokens: 20000
      thinking:
        type: 'enabled'
        budget_tokens: 16000 # Must be ≥1024 and less than max_tokens
```

The thinking configuration has three possible values:

1. Adaptive thinking (recommended for Claude Opus 4.7):

   ```yaml
   thinking:
     type: 'adaptive'
   ```

   In adaptive mode, Claude decides when and how much to think based on the complexity of the request. This is the recommended mode for `claude-opus-4-7`.

2. Enabled thinking:

   ```yaml
   thinking:
     type: 'enabled'
     budget_tokens: number # Must be ≥1024 and less than max_tokens
   ```

3. Disabled thinking:

   ```yaml
   thinking:
     type: 'disabled'
   ```

The `display` field controls how thinking content is returned:

- `'summarized'` (default): thinking content is included in the response
- `'omitted'`: thinking content is redacted, but a signature is returned for multi-turn continuity (saves tokens)

```yaml
thinking:
  type: enabled
  budget_tokens: 10000
  display: omitted
```

When thinking is enabled or adaptive:

- Responses will include `thinking` content blocks showing Claude's reasoning process
- Requires a minimum budget of 1,024 tokens
- The `budget_tokens` value must be less than the `max_tokens` parameter
- The tokens used for thinking count towards your `max_tokens` limit
- A specialized 28 or 29 token system prompt is automatically included
- Previous turn thinking blocks are ignored and not counted as input tokens
- `temperature` and `top_k` are incompatible with thinking and will be omitted with a warning
- `top_p` is clamped to the range [0.95, 1.0] when thinking is enabled
- Forced tool use (`tool_choice` type `any` or `tool`) is incompatible with thinking and will be omitted with a warning; use `auto` instead

Example response with thinking enabled:

```json
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis, here is the answer..."
    }
  ]
}
```

#### Controlling Thinking Output

By default, thinking content is included in the response output. You can control this behavior using the `showThinking` parameter:

```yaml title="promptfooconfig.yaml"
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      thinking:
        type: 'enabled'
        budget_tokens: 16000
      showThinking: false # Exclude thinking content from the output
```

When `showThinking` is set to `false`, the thinking content will be excluded from the output and only the final response will be returned.
This is useful when you want to use thinking for better reasoning but don't want to expose the thinking process to end users.

#### Redacted Thinking

Sometimes Claude's internal reasoning may be flagged by safety systems. When this occurs, the thinking block is encrypted and returned as a `redacted_thinking` block:

```json
{
  "content": [
    {
      "type": "redacted_thinking",
      "data": "EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpP..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}
```

Redacted thinking blocks are automatically decrypted when passed back to the API, allowing Claude to maintain context without compromising safety guardrails.

#### Extended Output with Thinking

Claude 4 models provide enhanced output capabilities and extended thinking support:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      max_tokens: 64000 # Claude 4 Sonnet supports up to 64K output tokens
      thinking:
        type: 'enabled'
        budget_tokens: 32000
```

Note: The `output-128k-2025-02-19` beta feature is specific to Claude 3.7 Sonnet and is not needed for Claude 4 models, which have improved output capabilities built in.

When using extended output:

- Streaming is required when `max_tokens` is greater than 21,333
- For thinking budgets above 32K, batch processing is recommended
- The model may not use the entire allocated thinking budget

See [Anthropic's Extended Thinking Guide](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking) for more details on requirements and best practices.

### Effort Level

The `effort` parameter controls the output quality/speed tradeoff. Higher effort levels may produce more thorough responses but take longer:

```yaml
providers:
  - id: anthropic:messages:claude-opus-4-7
    config:
      effort: xhigh # Options: low, medium, high, xhigh, max
```

Claude Opus 4.7 introduces the `xhigh` level between `high` and `max`, giving finer control over the reasoning/latency tradeoff on hard problems.
For coding and agentic use cases, Anthropic recommends starting with `high` or `xhigh`. This can be combined with other features like structured outputs:

```yaml
providers:
  - id: anthropic:messages:claude-opus-4-7
    config:
      effort: high
      output_format:
        type: json_schema
        schema:
          type: object
          properties:
            analysis:
              type: string
          required:
            - analysis
          additionalProperties: false
```

### Structured Outputs

Structured outputs constrain Claude's responses to a JSON schema. Supported on Claude Opus 4.7, Opus 4.6, Sonnet 4.6, and Sonnet 4.5+ / Opus 4.1+.

#### JSON Outputs

Add `output_format` to get structured responses:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      output_format:
        type: json_schema
        schema:
          type: object
          properties:
            name:
              type: string
            email:
              type: string
          required:
            - name
            - email
          additionalProperties: false
```

You can also load the entire `output_format` from an external file:

```yaml
config:
  output_format: file://./schemas/analysis-format.json
```

Nested file references are supported for the schema:

```json title="analysis-format.json"
{
  "type": "json_schema",
  "schema": "file://./schemas/analysis-schema.json"
}
```

Variable rendering is supported in file paths:

```yaml
config:
  output_format: file://./schemas/{{ schema_name }}.json
```

#### Strict Tool Use

Add `strict: true` to tool definitions for schema-validated parameters:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - name: get_weather
          strict: true
          input_schema:
            type: object
            properties:
              location:
                type: string
            required:
              - location
            additionalProperties: false
```

#### Limitations

**Supported:** object, array, string, integer, number, boolean, null, `enum`, `required`, `additionalProperties: false`

**Not supported:** recursive schemas, `minimum`/`maximum`, `minLength`/`maxLength`

**Incompatible with:** citations, message prefilling

See [Anthropic's guide](https://docs.anthropic.com/en/docs/build-with-claude/structured-outputs) and the
[structured outputs example](https://github.com/promptfoo/promptfoo/tree/main/examples/anthropic/structured-outputs).

## Model-Graded Tests

[Model-graded assertions](/docs/configuration/expected-outputs/model-graded/) such as `factuality` or `llm-rubric` will automatically use Anthropic as the grading provider if `ANTHROPIC_API_KEY` is set and `OPENAI_API_KEY` is not set. If both API keys are present, OpenAI will be used by default. You can explicitly override the grading provider in your configuration.

Claude Pro/Max subscribers without a separate Anthropic Console key can wire up `llm-rubric` through a local Claude Code session by pointing the grader at `anthropic:messages:` with `apiKeyRequired: false`:

```yaml
defaultTest:
  options:
    provider:
      id: anthropic:messages:claude-sonnet-4-6
      config:
        apiKeyRequired: false
```

See [Authenticating via a Claude Code session](#authenticating-via-a-claude-code-session) above for how the credential is loaded and what beta headers Promptfoo sets.

Because of how model-graded evals are implemented, **the model must support chat-formatted prompts** (except for embedding or classification models).

You can override the grading provider in several ways:

1. For all test cases using `defaultTest`:

   ```yaml title="promptfooconfig.yaml"
   defaultTest:
     options:
       provider: anthropic:messages:claude-sonnet-4-5-20250929
   ```

2. For individual assertions:

   ```yaml
   assert:
     - type: llm-rubric
       value: Do not mention that you are an AI or chat assistant
       provider:
         id: anthropic:messages:claude-sonnet-4-5-20250929
         config:
           temperature: 0.0
   ```

3. For specific tests:

   ```yaml
   tests:
     - vars:
         question: What is the capital of France?
       options:
         provider:
           id: anthropic:messages:claude-sonnet-4-5-20250929
       assert:
         - type: llm-rubric
           value: Answer should mention Paris
   ```

### Additional Capabilities

- **Caching**: Promptfoo caches previous LLM requests by default.
- **Token Usage Tracking**: Provides detailed information on the number of tokens used in each request, aiding in usage monitoring and optimization.
- **Cost Calculation**: Calculates the cost of each request based on the number of tokens generated and the specific model used.

## See Also

### Examples

We provide several example implementations demonstrating Claude's capabilities:

#### Core Features

- [Tool Use Example](https://github.com/promptfoo/promptfoo/tree/main/examples/eval-tool-use) - Shows how to use Claude's tool calling capabilities
- [Structured Outputs Example](https://github.com/promptfoo/promptfoo/tree/main/examples/anthropic/structured-outputs) - Demonstrates JSON outputs and strict tool use for guaranteed schema compliance
- [Vision Example](https://github.com/promptfoo/promptfoo/tree/main/examples/claude-vision) - Demonstrates using Claude's vision capabilities

#### Model Comparisons & Evaluations

- [Claude vs GPT](https://github.com/promptfoo/promptfoo/tree/main/examples/compare-claude-vs-gpt) - Compares Claude with GPT-5.4 on various tasks
- [Claude vs GPT Image Analysis](https://github.com/promptfoo/promptfoo/tree/main/examples/compare-claude-vs-gpt-image) - Compares Claude's and GPT's image analysis capabilities

#### Cloud Platform Integrations

- [Azure AI Foundry](https://github.com/promptfoo/promptfoo/tree/main/examples/azure/claude) - Using Claude through Azure AI Foundry
- [AWS Bedrock](https://github.com/promptfoo/promptfoo/tree/main/examples/amazon-bedrock) - Using Claude through AWS Bedrock
- [Google Vertex AI](https://github.com/promptfoo/promptfoo/tree/main/examples/google-vertex) - Using Claude through Google Vertex AI

#### Agentic Evaluations

- [Claude Agent SDK](/docs/providers/claude-agent-sdk/) - For agentic evals with file access, tool use, and MCP servers

For more examples and general usage patterns, visit our [examples directory](https://github.com/promptfoo/promptfoo/tree/main/examples) on GitHub.