---
sidebar_label: Search Rubric
---

# Search-Rubric

The `search-rubric` assertion type works like `llm-rubric`, but its grader can search the web. It evaluates outputs against a rubric and looks up current information when the rubric depends on it.

## How it works

1. You provide a rubric that describes what the output should contain
2. The grading provider evaluates the output against the rubric
3. If the rubric requires current information, the provider searches the web
4. The assertion returns pass/fail along with a score from 0.0 to 1.0

## Basic Usage

```yaml
assert:
  - type: search-rubric
    value: 'Provides accurate current Bitcoin price within 5% of market value'
```
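In a complete config file, the assertion sits under a test, next to the prompt and the provider being evaluated. The following is a minimal sketch. It assumes the OpenAI model ID used elsewhere on this page as the system under test, so swap in whichever provider you are evaluating:

```yaml
# promptfooconfig.yaml (illustrative; the model ID is a placeholder)
prompts:
  - 'What is the current price of Bitcoin in USD?'

providers:
  - openai:responses:gpt-5.1 # system under test

tests:
  - assert:
      # Graded by a web-search-capable provider
      - type: search-rubric
        value: 'Provides accurate current Bitcoin price within 5% of market value'
```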

## Comparing to LLM-Rubric

The `search-rubric` assertion behaves exactly like `llm-rubric`, but automatically uses a provider with web search capabilities:

```yaml
# Roughly equivalent (the grader model that search-rubric selects may differ):
assert:
  # Using llm-rubric with a grader that has web search enabled
  - type: llm-rubric
    value: 'Contains current stock price for Apple (AAPL) within $5'
    provider:
      id: openai:responses:gpt-5.1
      config:
        tools:
          - type: web_search_preview

  # Using search-rubric (automatically selects a web-search provider)
  - type: search-rubric
    value: 'Contains current stock price for Apple (AAPL) within $5'
```

## Using Variables in the Rubric

Like `llm-rubric`, you can use test variables:

```yaml
prompts:
  - 'What is the current weather in {{city}}?'

assert:
  - type: search-rubric
    value: 'Provides current temperature in {{city}} with units (F or C)'

tests:
  - vars:
      city: San Francisco
  - vars:
      city: Tokyo
```

## Grading Providers

The `search-rubric` assertion requires a grading provider with web search capabilities:

### 1. Anthropic Claude

Anthropic Claude models support web search through the `web_search_20250305` tool:

```yaml
grading:
  provider: anthropic:messages:claude-opus-4-6
  providerOptions:
    config:
      tools:
        - type: web_search_20250305
          name: web_search
          max_uses: 5
```

### 2. OpenAI with Web Search

OpenAI's Responses API supports web search through the `web_search_preview` tool:

```yaml
grading:
  provider: openai:responses:gpt-5.1
  providerOptions:
    config:
      tools:
        - type: web_search_preview
```

### 3. Perplexity

Perplexity models have built-in web search:

```yaml
grading:
  provider: perplexity:sonar
```

### 4. Google Gemini

Google's Gemini models support web search through the `googleSearch` tool:

```yaml
grading:
  provider: google:gemini-3.1-pro-preview
  providerOptions:
    config:
      tools:
        - googleSearch: {}
```

### 5. xAI Grok

xAI's Grok models can use server-side web search tools through the Responses API:

```yaml
grading:
  provider: xai:responses:grok-4.3
  providerOptions:
    config:
      tools:
        - type: web_search
```
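If you prefer to configure the grader alongside your tests instead of under `grading`, promptfoo's `defaultTest.options.provider` setting is another place to set the grader for model-graded assertions. The following sketch assumes your promptfoo version applies this override to `search-rubric` the same way it does to `llm-rubric`. This setting also changes the grader for every other model-graded assertion in the file:

```yaml
defaultTest:
  options:
    # Grader used by model-graded assertions unless an assertion sets its own provider
    provider:
      id: anthropic:messages:claude-opus-4-6
      config:
        tools:
          - type: web_search_20250305
            name: web_search
            max_uses: 5
```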

## Use Cases

### 1. Current Events Verification

```yaml
prompts:
  - 'Who won the latest Super Bowl?'

assert:
  - type: search-rubric
    value: 'Names the correct winner of the most recent Super Bowl with the final score'
```

### 2. Real-time Price Checking

```yaml
prompts:
  - "What's the current stock price of {{ticker}}?"

assert:
  - type: search-rubric
    value: |
      Provides accurate stock price for {{ticker}} that:
      1. Is within 2% of current market price
      2. Includes currency (USD)
      3. Mentions if market is open or closed
    threshold: 0.8
```
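The prompt and rubric above both reference `{{ticker}}`, so the config also needs test cases that supply it. For example (the tickers are illustrative):

```yaml
tests:
  # Each test substitutes its ticker into both the prompt and the rubric
  - vars:
      ticker: AAPL
  - vars:
      ticker: MSFT
```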

### 3. Weather Information

```yaml
prompts:
  - "What's the weather like in Tokyo?"

assert:
  - type: search-rubric
    value: |
      Describes current Tokyo weather including:
      - Temperature (with units)
      - General conditions (sunny, rainy, etc.)
      - Humidity or precipitation if relevant
```

### 4. Latest Software Versions

```yaml
prompts:
  - "What's the latest version of Node.js?"

assert:
  - type: search-rubric
    value: 'States the correct latest LTS version of Node.js (not experimental or nightly)'
```

## Cost Considerations

Web search adds per-search charges on top of regular token costs. Pricing as of November 2025:

- **Anthropic Claude**: $10 per 1,000 web search calls plus token costs
- **OpenAI**: Web search tools on the Responses API cost $10-25 per 1,000 tool calls in addition to token usage
- **Google Gemini API**: $35 per 1,000 grounded prompts; **Vertex AI Web Grounding**: $45 per 1,000
- **Perplexity**: Per-request plus token-based pricing; see Perplexity or your proxy's pricing page
- **xAI Grok**: $25 per 1,000 sources plus token usage for Live Search
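Here is a rough worked example using the Anthropic figure above. Suppose 200 test cases each run one `search-rubric` assertion, and each grading request is capped at `max_uses: 5` searches, as in the Anthropic configuration earlier on this page. This assumes one grading request per assertion and no cache hits. Under those assumptions, the search fees are at most $10, before token costs:

$$
200 \times 5 \times 10 / 1000 = 10
$$

Actual costs are usually lower. Graders often need fewer than the maximum number of searches, and cached results are not re-run.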

## Threshold Support

Like `llm-rubric`, the `search-rubric` assertion supports thresholds:

```yaml
assert:
  - type: search-rubric
    value: 'Contains accurate information about current US inflation rate'
    threshold: 0.9 # Minimum grader score (0.0-1.0) required to pass
```

## Best Practices

1. **Write clear rubrics**: Be specific about what information you expect
2. **Use thresholds appropriately**: Use higher thresholds when exact facts matter, and lower ones when approximate or partially complete answers are acceptable
3. **Include acceptable ranges**: For volatile data like prices, specify acceptable accuracy (e.g., "within 5%")
4. **Use caching**: Caching is enabled by default; use `promptfoo eval --no-cache` to force fresh searches
5. **Test variable substitution**: Ensure your rubrics work with different variable values

## Expected Behavior

Understanding how `search-rubric` evaluates different scenarios helps you write better tests.

### What the grader catches

The search-enabled grader identifies several types of failures:

| System under test (SUT) response        | Grader verdict   | Reason                                                 |
| --------------------------------------- | ---------------- | ------------------------------------------------------ |
| "I don't have access to real-time data" | **Fail**         | No actual answer provided                              |
| Stale price from training data          | **Fail**         | Value differs from current market price                |
| Correct current price                   | **Pass**         | Matches web search results                             |
| Partially correct answer                | **Pass or fail** | Score reflects completeness; compared to `threshold` if set |

### Models without web search

Models like `gpt-4o-mini` without web search enabled will often refuse to answer real-time questions:

> "I don't have access to real-time stock data. For current prices, please check a financial website."

The `search-rubric` grader correctly flags this as a failure because no actual information was provided. This is expected behavior: the assertion verifies whether your system provides accurate current information, not whether it declines gracefully.

**To test models that confidently answer (and potentially hallucinate):**

- Use a more capable model as the system under test
- Enable web search on your SUT if available
- Test against models known to attempt answers even when uncertain
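To enable web search on the system under test, use the same tool definitions shown in the Grading Providers section, but place them on the provider being evaluated. The following sketch assumes the OpenAI Responses provider. Other providers use the tool formats listed for them above:

```yaml
providers:
  # System under test with web search enabled, so it can answer real-time questions
  - id: openai:responses:gpt-5.1
    config:
      tools:
        - type: web_search_preview
```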

### Partial matches and scoring

The grader returns a score from 0.0 to 1.0 based on how well the output matches the rubric. As a rough guide:

- **1.0**: Fully matches all rubric criteria
- **0.7-0.9**: Matches most criteria, minor issues
- **0.4-0.6**: Partial match, missing key information
- **0.0-0.3**: Significant errors or refusal to answer

Use the `threshold` parameter to set your acceptable score level.

## Troubleshooting

### "No provider with web search capabilities"

Ensure your grading provider supports web search. Default providers without web search configuration will fail. Check the [Grading Providers](#grading-providers) section above.

### Test always fails with refusal

If your SUT consistently refuses to answer real-time questions, this is expected behavior for models without web access. The `search-rubric` grader is correctly identifying that no factual answer was provided.

**Solutions:**

1. Use a model with web search capabilities as your SUT
2. Accept that models without real-time access cannot answer these questions
3. Use `llm-rubric` instead if you only need to verify the response format
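For option 3, the following sketch checks only the shape of the answer. The rubric wording is illustrative. Because the grader does not search the web, a stale number in the correct format would still pass:

```yaml
assert:
  # Format-only check: no web search, so the value itself is not verified
  - type: llm-rubric
    value: 'Either states a numeric stock price with a currency, or clearly explains that real-time data is unavailable'
```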

### Inaccurate results

The grader relies on web search results, which may occasionally be wrong or ambiguous.

**Best practices:**

- Write rubrics that can be verified from multiple reputable sources
- Avoid rubrics about speculative or disputed claims
- Use appropriate thresholds (not 1.0) to allow for minor discrepancies

### High costs

Web search adds cost on top of model tokens.

**Cost reduction strategies:**

- Caching is enabled by default to reduce API calls
- Reserve `search-rubric` for tests that truly need real-time verification
- Use `llm-rubric` for static fact-checking that doesn't require current data
- Consider Perplexity's `sonar` model, which searches natively without a separate tool configuration (it still has per-request fees; see the pricing above)
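The following sketch combines both assertion types in one suite. Static facts use `llm-rubric`, and only the time-sensitive question pays for web search. The `question` variable and the questions themselves are hypothetical:

```yaml
prompts:
  - '{{question}}'

tests:
  # Static fact: a standard llm-rubric grader is enough
  - vars:
      question: 'Who wrote Pride and Prejudice?'
    assert:
      - type: llm-rubric
        value: 'Names Jane Austen as the author'

  # Time-sensitive fact: needs a web-search-capable grader
  - vars:
      question: 'Who won the latest Super Bowl?'
    assert:
      - type: search-rubric
        value: 'Names the correct winner of the most recent Super Bowl'
```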