---
sidebar_position: 51
sidebar_label: Python
description: Create advanced Python validation scripts with complex logic, external APIs, and ML libraries for sophisticated output grading
---

# Python assertions

The `python` assertion allows you to provide a custom Python function to validate the LLM output.

:::tip Python Overview
For an overview of all Python integrations (providers, assertions, test generators, prompts), see the [Python integration guide](/docs/integrations/python).
:::

A variable named `output` is injected into the context. The function should return `True` if the output passes the assertion, and `False` otherwise. If the function returns a number, it will be treated as a score.

Example:

```yaml
assert:
  - type: python
    value: output[5:10] == 'Hello'
```

You may also return a number, which will be treated as a score:

```yaml
assert:
  - type: python
    value: math.log10(len(output)) * 10
```

## Multiline functions

Python assertions support multiline strings:

```yaml
assert:
  - type: python
    value: |
      # Insert your scoring logic here...
      if output == 'Expected output':
        return {
          'pass': True,
          'score': 0.5,
        }
      return {
        'pass': False,
        'score': 0,
      }
```

## Using test context

A `context` object is available in the Python function. Here is its type definition:

```py
from typing import Any, Dict, List, Optional, TypedDict, Union

class TraceSpan(TypedDict):
    spanId: str
    parentSpanId: Optional[str]
    name: str
    startTime: int  # Unix timestamp in milliseconds
    endTime: Optional[int]  # Unix timestamp in milliseconds
    attributes: Optional[Dict[str, Any]]
    statusCode: Optional[int]
    statusMessage: Optional[str]

class TraceData(TypedDict):
    traceId: str
    spans: List[TraceSpan]

class AssertionValueFunctionContext(TypedDict):
    # Raw prompt sent to LLM
    prompt: Optional[str]

    # Test case variables
    vars: Dict[str, Union[str, object]]

    # The complete test case
    test: Dict[str, Any]  # Contains keys like "vars", "assert", "options"

    # Log probabilities from the LLM response, if available
    logProbs: Optional[List[float]]

    # Configuration passed to the assertion
    config: Optional[Dict[str, Any]]

    # The provider that generated the response
    provider: Optional[Any]  # ApiProvider type

    # The complete provider response
    providerResponse: Optional[Any]  # ProviderResponse type

    # OpenTelemetry trace data (when tracing is enabled)
    trace: Optional[TraceData]
```

For example, if the test case has a var `example`, access it in Python like this:

```yaml
tests:
  - description: 'Test with context'
    vars:
      example: 'Example text'
    assert:
      - type: python
        value: 'context["vars"]["example"] in output'
```
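Building on this, here's a minimal multiline sketch that combines `context` with scoring. The `required_terms` var is hypothetical, used purely for illustration; the assertion returns the fraction of required terms found in the output, which is treated as a score:

```yaml
tests:
  - vars:
      required_terms: 'bananas,yellow' # hypothetical comma-separated list
    assert:
      - type: python
        value: |
          # Split the hypothetical var into individual terms
          terms = [t.strip() for t in context['vars']['required_terms'].split(',')]
          found = [t for t in terms if t.lower() in output.lower()]
          # Returning a number treats it as a score (see above)
          return len(found) / len(terms)
```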
## External .py

To reference an external file, use the `file://` prefix:

```yaml
assert:
  - type: python
    value: file://relative/path/to/script.py
    config:
      outputLengthLimit: 10
```

You can specify a particular function to use by appending it after a colon:

```yaml
assert:
  - type: python
    value: file://relative/path/to/script.py:custom_assert
```

If no function is specified, it defaults to `get_assert`.

The function will be called with an `output` string and an `AssertionValueFunctionContext` object (see above). It should return a `bool` (pass/fail), a `float` (score), or a `GradingResult`.

Here's an example `assert.py`:

```py
from typing import Any, Dict, Union

# Default function name
def get_assert(output: str, context) -> Union[bool, float, Dict[str, Any]]:
    print('Prompt:', context['prompt'])
    print('Vars:', context['vars']['topic'])

    # This return is an example GradingResult dict
    return {
        'pass': True,
        'score': 0.6,
        'reason': 'Looks good to me',
    }

# Custom function name
def custom_assert(output: str, context) -> Union[bool, float, Dict[str, Any]]:
    return len(output) > 10
```

This is an example of an assertion that uses data from a configuration defined in the assertion's YAML file:

```py
from typing import Any, Dict, Union

def get_assert(output: str, context) -> Union[bool, float, Dict[str, Any]]:
    return len(output) <= context.get('config', {}).get('outputLengthLimit', 0)
```

You can also return nested metrics and assertions via a `GradingResult` object:

```py
{
    'pass': True,
    'score': 0.75,
    'reason': 'Looks good to me',
    'componentResults': [{
        'pass': 'bananas' in output.lower(),
        'score': 0.5,
        'reason': 'Contains banana',
    }, {
        'pass': 'yellow' in output.lower(),
        'score': 0.5,
        'reason': 'Contains yellow',
    }]
}
```

### GradingResult types

Here's a Python type definition you can use for the [`GradingResult`](/docs/configuration/reference/#gradingresult) object:

```py
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class GradingResult:
    pass_: bool  # 'pass' is a reserved keyword in Python
    score: float
    reason: str
    component_results: Optional[List['GradingResult']] = None
    named_scores: Optional[Dict[str, float]] = None  # Appear as metrics in the UI
```

:::tip Snake case support
Python snake_case fields are automatically mapped to camelCase:

- `pass_` → `pass` (or just use `"pass"` as a dictionary key)
- `named_scores` → `namedScores`
- `component_results` → `componentResults`
- `tokens_used` → `tokensUsed`
:::
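For instance, here's a minimal sketch of an external assertion that reports per-criterion metrics through `named_scores`. The criteria (`brevity`, `politeness`) and their cutoffs are illustrative, not part of promptfoo's API:

```py
from typing import Any, Dict

def get_assert(output: str, context) -> Dict[str, Any]:
    # Illustrative sub-scores; each named score appears as a metric in the UI
    brevity = 1.0 if len(output) < 200 else 0.5
    politeness = 1.0 if 'please' in output.lower() else 0.0
    score = (brevity + politeness) / 2

    return {
        'pass': score >= 0.5,
        'score': score,
        'reason': f'brevity={brevity}, politeness={politeness}',
        'named_scores': {  # mapped to namedScores, per the tip above
            'brevity': brevity,
            'politeness': politeness,
        },
    }
```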
## Using trace data

When [tracing is enabled](/docs/tracing/), OpenTelemetry trace data is available via `context['trace']`. This allows you to write assertions based on the execution flow:

```py
from typing import Any, Dict, Union

def get_assert(output: str, context) -> Union[bool, float, Dict[str, Any]]:
    # Check if trace data is available
    trace = context.get('trace')
    if trace is None:
        # Tracing not enabled, skip trace-based checks
        return True

    # Access trace spans
    spans = trace['spans']

    # Example: Check for errors in any span
    error_spans = [s for s in spans if s.get('statusCode', 0) >= 400]
    if error_spans:
        return {
            'pass': False,
            'score': 0,
            'reason': f"Found {len(error_spans)} error spans"
        }

    # Example: Calculate total trace duration
    if spans:
        duration = max(s.get('endTime', 0) for s in spans) - min(s['startTime'] for s in spans)
        if duration > 5000:  # 5 seconds
            return {
                'pass': False,
                'score': 0,
                'reason': f"Trace took too long: {duration}ms"
            }

    # Example: Check for specific operations
    api_calls = [s for s in spans if 'http' in s['name'].lower()]
    if len(api_calls) > 10:
        return {
            'pass': False,
            'score': 0,
            'reason': f"Too many API calls: {len(api_calls)}"
        }

    return True
```

Example YAML configuration:

```yaml
tests:
  - vars:
      query: "What's the weather?"
    assert:
      - type: python
        value: |
          # Ensure retrieval happened before response generation
          if context.get('trace'):
              spans = context['trace']['spans']
              retrieval_span = next((s for s in spans if 'retrieval' in s['name']), None)
              generation_span = next((s for s in spans if 'generation' in s['name']), None)
              if retrieval_span and generation_span:
                  return retrieval_span['startTime'] < generation_span['startTime']
          return True
```

## Overriding the Python binary

By default, promptfoo will run `python` in your shell. Make sure `python` points to the appropriate executable; if a `python` binary is not present, you will see a "python: command not found" error.

To override the Python binary, set the `PROMPTFOO_PYTHON` environment variable. You may set it to a path (such as `/path/to/python3.11`) or to an executable on your `PATH` (such as `python3.11`).

## Negation

Use `not-python` to invert the final pass/fail result while preserving the returned score. Numeric scores are still compared against `threshold` before the result is inverted:

```yaml
assert:
  - type: not-python
    value: "'error' in output"
```

## Other assertion types

For more info on assertions, see [Test assertions](/docs/configuration/expected-outputs).