---
sidebar_position: 12
sidebar_label: Test Cases
title: Test Case Configuration - Variables, Assertions, and Data
description: Configure test cases for LLM evaluation with variables, assertions, CSV data, and dynamic generation. Learn inline tests, external files, and media support.
keywords:
  [
    test cases,
    LLM testing,
    evaluation data,
    assertions,
    CSV tests,
    variables,
    dynamic testing,
    test automation,
  ]
pagination_prev: configuration/prompts
pagination_next: configuration/scenarios
---

# Test Case Configuration

Define evaluation scenarios with variables, assertions, and test data.

## Inline Tests

The simplest way to define tests is directly in your config:

```yaml title="promptfooconfig.yaml"
tests:
  - vars:
      question: 'What is the capital of France?'
    assert:
      - type: contains
        value: 'Paris'
  - vars:
      question: 'What is 2 + 2?'
    assert:
      - type: equals
        value: '4'
```

### Test Structure

Each test case can include:

```yaml
tests:
  - description: 'Optional test description'
    vars: # Variables to substitute in prompts
      var1: value1
      var2: value2
    assert: # Expected outputs and validations
      - type: contains
        value: 'expected text'
    metadata: # Filterable metadata
      category: math
      difficulty: easy
```

### Filtering Tests by Provider

Control which providers run specific tests using the `providers` field. This allows you to run different test suites against different models in a single evaluation:

```yaml
providers:
  - id: openai:gpt-3.5-turbo
    label: fast-model
  - id: openai:gpt-4
    label: smart-model

tests:
  # Only run on fast-model
  - vars:
      question: 'What is 2 + 2?'
    providers:
      - fast-model
    assert:
      - type: equals
        value: '4'

  # Only run on smart-model
  - vars:
      question: 'Explain quantum entanglement'
    providers:
      - smart-model
    assert:
      - type: llm-rubric
        value: 'Provides accurate physics explanation'
```

**Matching syntax:**

| Pattern        | Matches                                                              |
| -------------- | -------------------------------------------------------------------- |
| `fast-model`   | Exact label match                                                    |
| `openai:gpt-4` | Exact provider ID match                                              |
| `openai:*`     | Wildcard - any provider starting with `openai:`                      |
| `openai`       | Legacy prefix - matches `openai:gpt-4`, `openai:gpt-3.5-turbo`, etc. |

**Apply to all tests using `defaultTest`:**

```yaml
defaultTest:
  providers:
    - openai:* # All tests default to OpenAI providers only

tests:
  - vars:
      question: 'Simple question'
  - vars:
      question: 'Complex question'
    providers:
      - smart-model # Override default for this test
```

**Edge cases:**

- **No filter**: Without the `providers` field, the test runs against all providers (cross-product behavior)
- **Empty array**: `providers: []` means the test runs on no providers and is effectively skipped
- **Stacking with providerPromptMap**: When both `providers` and `providerPromptMap` are set, they filter together—a provider must match both to run (see the sketch after this list)
- **CLI `--filter-providers`**: If you use `--filter-providers` to filter providers at the CLI level, validation only sees the filtered providers. Tests referencing providers excluded by `--filter-providers` will fail validation
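As a minimal sketch of the stacking behavior (the model IDs, prompt labels, and question are illustrative):

```yaml
prompts:
  - id: prompt-basic
    label: Basic Prompt
    raw: 'Answer the question: {{question}}'
  - id: prompt-verbose
    label: Verbose Prompt
    raw: 'Answer in detail: {{question}}'

providers:
  - id: openai:gpt-4o-mini
    label: fast-model

# This provider only ever runs the Basic Prompt
providerPromptMap:
  openai:gpt-4o-mini:
    - Basic Prompt

tests:
  - vars:
      question: 'What is 2 + 2?'
    providers:
      - fast-model # Also subject to the providerPromptMap filter above
```

The test's `providers` filter selects `fast-model`, and `providerPromptMap` further restricts that provider to the Basic Prompt, so only that single combination runs.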
### Filtering Tests by Prompt

By default, each test runs against all prompts (a cartesian product). You can use the `prompts` field to restrict a test to specific prompts:

```yaml
prompts:
  - id: prompt-factual
    label: Factual Assistant
    raw: 'You are a factual assistant. Answer: {{question}}'
  - id: prompt-creative
    label: Creative Writer
    raw: 'You are a creative writer. Answer: {{question}}'

providers:
  - openai:gpt-4o-mini

tests:
  # This test only runs with the Factual Assistant prompt
  - vars:
      question: 'What is the capital of France?'
    prompts:
      - Factual Assistant
    assert:
      - type: contains
        value: 'Paris'

  # This test only runs with the Creative Writer prompt
  - vars:
      question: 'Write a poem about Paris'
    prompts:
      - prompt-creative # You can reference by ID or label
    assert:
      - type: llm-rubric
        value: 'Contains poetic language'

  # This test runs with all prompts (default behavior)
  - vars:
      question: 'Hello'
```

The `prompts` field accepts:

- **Exact labels**: `prompts: ['Factual Assistant']`
- **Exact IDs**: `prompts: ['prompt-factual']`
- **Wildcard patterns**: `prompts: ['Math:*']` matches `Math:Basic`, `Math:Advanced`, etc.
- **Prefix patterns**: `prompts: ['Math']` matches `Math:Basic`, `Math:Advanced` (legacy syntax)

:::note
Invalid prompt references will cause an error at config load time. This strict validation catches typos early.
:::

You can also set a default prompt filter in `defaultTest`:

```yaml
defaultTest:
  prompts:
    - Factual Assistant

tests:
  # Inherits prompts: ['Factual Assistant'] from defaultTest
  - vars:
      question: 'What is 2+2?'

  # Override to use a different prompt
  - vars:
      question: 'Write a story'
    prompts:
      - Creative Writer
```

## External Test Files

For larger test suites, store tests in separate files:

```yaml title="promptfooconfig.yaml"
tests: file://tests.yaml
```

Or load multiple files:

```yaml
tests:
  - file://basic_tests.yaml
  - file://advanced_tests.yaml
  - file://edge_cases/*.yaml
```

## CSV Format

CSV or Excel (XLSX) files are ideal for bulk test data:

```yaml title="promptfooconfig.yaml"
tests: file://test_cases.csv
```

```yaml title="promptfooconfig.yaml"
tests: file://test_cases.xlsx
```

### Basic CSV

```csv title="test_cases.csv"
question,expectedAnswer
"What is 2+2?","4"
"What is the capital of France?","Paris"
"Who wrote Romeo and Juliet?","Shakespeare"
```

Variables are automatically mapped from column headers.
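As a minimal sketch, the prompt below consumes the `question` column from the CSV above:

```yaml title="promptfooconfig.yaml"
prompts:
  - 'Answer concisely: {{question}}'

tests: file://test_cases.csv
```

The `expectedAnswer` column is likewise available as `{{expectedAnswer}}`; the "CSV with defaultTest" section below shows how to reference such columns from shared assertions.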
### Excel (XLSX/XLS) Support

Excel files (.xlsx and .xls) are supported as an optional feature. To use Excel files:

1. Install the `read-excel-file` package as a peer dependency:

   ```bash
   npm install read-excel-file
   ```

2. Use Excel files just like CSV files:

   ```yaml title="promptfooconfig.yaml"
   tests: file://test_cases.xlsx
   ```

**Multi-sheet support:**

By default, only the first sheet is used. To specify a different sheet, use the `#` syntax:

- `file://test_cases.xlsx#Sheet2` - Select sheet by name
- `file://test_cases.xlsx#2` - Select sheet by 1-based index (2 = second sheet)

```yaml title="promptfooconfig.yaml"
# Use a specific sheet by name
tests: file://test_cases.xlsx#DataSheet

# Or by index (1-based)
tests: file://test_cases.xlsx#2
```

### XLSX Example

Your Excel file should have column headers in the first row, with each subsequent row representing a test case:

| question                       | expectedAnswer |
| ------------------------------ | -------------- |
| What is 2+2?                   | 4              |
| What is the capital of France? | Paris          |
| Name a primary color           | blue           |

**Tips for Excel files:**

- First row must contain column headers
- Column names become variable names in your prompts
- Empty cells are treated as empty strings
- Use `__expected` columns for assertions (same as CSV)

### CSV with Assertions

Use special `__expected` columns for assertions:

```csv title="test_cases.csv"
input,__expected
"Hello world","contains: Hello"
"Calculate 5 * 6","equals: 30"
"What's the weather?","llm-rubric: Provides weather information"
```

Values without a type prefix default to `equals`:

| `__expected` value                | Assertion type               |
| --------------------------------- | ---------------------------- |
| `Paris`                           | `equals`                     |
| `contains:Paris`                  | `contains`                   |
| `factuality:The capital is Paris` | `factuality`                 |
| `similar(0.8):Hello there`        | `similar` with 0.8 threshold |

Multiple assertions:

```csv title="test_cases.csv"
question,__expected1,__expected2,__expected3
"What is 2+2?","equals: 4","contains: four","javascript: output.length < 10"
```

:::note
**contains-any** and **contains-all** expect comma-delimited values inside the `__expected` column.

```csv title="test_cases.csv"
translated_text,__expected
"Hola mundo","contains-any: hola,mundo"
```

If you write `"contains-any: hola mundo"`, promptfoo treats `hola mundo` as a single search term rather than two separate search terms.
:::

### Special CSV Columns

| Column                                             | Purpose                                                   | Example                                                            |
| -------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------------------ |
| `__expected`                                       | Single assertion                                          | `contains: Paris`                                                  |
| `__expected1`, `__expected2`, ...                  | Multiple assertions                                       | `equals: 42`                                                       |
| `__description`                                    | Test description                                          | `Basic math test`                                                  |
| `__prefix`                                         | Prepend to prompt                                         | `You must answer: `                                                |
| `__suffix`                                         | Append to prompt                                          | ` (be concise)`                                                    |
| `__metric`                                         | Display name in reports (does not change assertion type)  | `accuracy`                                                         |
| `__threshold`                                      | Pass threshold (applies to all asserts)                   | `0.8`                                                              |
| `__metadata:*`                                     | Filterable metadata                                       | See below                                                          |
| `__config:__expected:` or `__config:__expectedN:`  | Set configuration for all or specific assertions          | `__config:__expected:threshold`, `__config:__expected2:threshold`  |

Using `__metadata` without a key is not supported. Specify the metadata field like `__metadata:category`. If a CSV file includes a `__metadata` column without a key, promptfoo logs a warning and ignores the column.
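For instance, here is a minimal sketch of a `__config` column setting assertion configuration per row; it assumes a `similar` assertion whose threshold is supplied via `__config:__expected:threshold`:

```csv title="test_cases.csv"
question,__expected,__config:__expected:threshold
"What is the capital of France?","similar: Paris is the capital of France","0.85"
```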
### Metadata in CSV

Add filterable metadata:

```csv title="test_cases.csv"
question,__expected,__metadata:category,__metadata:difficulty
"What is 2+2?","equals: 4","math","easy"
"Explain quantum physics","llm-rubric: Accurate explanation","science","hard"
```

Array metadata with `[]`:

```csv
topic,__metadata:tags[]
"Machine learning","ai,technology,data science"
"Climate change","environment,science,global\,warming"
```

Filter tests:

```bash
promptfoo eval --filter-metadata category=math
promptfoo eval --filter-metadata difficulty=easy
promptfoo eval --filter-metadata tags=ai

# Multiple filters use AND logic (tests must match ALL conditions)
promptfoo eval --filter-metadata category=math --filter-metadata difficulty=easy
```

### JSON in CSV

Include structured data:

```csv title="test_cases.csv"
query,context,__expected
"What's the temperature?","{""location"":""NYC"",""units"":""celsius""}","contains: celsius"
```

Access in prompts:

```yaml
prompts:
  - 'Query: {{query}}, Location: {{(context | load).location}}'
```

### CSV with defaultTest

Apply the same assertions to all tests loaded from a CSV file using [`defaultTest`](/docs/configuration/guide#default-test-cases):

```yaml title="promptfooconfig.yaml"
defaultTest:
  assert:
    - type: factuality
      value: '{{reference_answer}}'
      options:
        provider: openai:gpt-5.2

tests: file://tests.csv
```

```csv title="tests.csv"
question,reference_answer
"What does GPT stand for?","Generative Pre-trained Transformer"
"What is the capital of France?","Paris is the capital of France"
```

Use regular column names (like `reference_answer`) instead of `__expected` when referencing values in `defaultTest` assertions. The `__expected` column automatically creates assertions per row.

## Dynamic Test Generation

Generate tests programmatically:

### JavaScript/TypeScript

```yaml title="promptfooconfig.yaml"
tests: file://generate_tests.js
```

```javascript title="generate_tests.js"
module.exports = async function () {
  // Fetch data, compute test cases, etc.
  const testCases = [];

  for (let i = 1; i <= 10; i++) {
    testCases.push({
      description: `Test case ${i}`,
      vars: {
        number: i,
        squared: i * i,
      },
      assert: [
        {
          type: 'contains',
          value: String(i * i),
        },
      ],
    });
  }

  return testCases;
};
```

### Python

```yaml title="promptfooconfig.yaml"
tests: file://generate_tests.py:create_tests
```

```python title="generate_tests.py"
def create_tests():
    test_cases = []

    # Load test data from database, API, etc.
    test_data = load_test_data()

    for item in test_data:
        test_cases.append({
            "vars": {
                "input": item["input"],
                "context": item["context"]
            },
            "assert": [{
                "type": "contains",
                "value": item["expected"]
            }]
        })

    return test_cases
```

### With Configuration

Pass configuration to generators:

```yaml title="promptfooconfig.yaml"
tests:
  - path: file://generate_tests.py:create_tests
    config:
      dataset: 'validation'
      category: 'math'
      sample_size: 100
```

```python title="generate_tests.py"
def create_tests(config):
    dataset = config.get('dataset', 'train')
    category = config.get('category', 'all')
    size = config.get('sample_size', 50)

    # Use configuration to generate tests
    return generate_test_cases(dataset, category, size)
```
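The `generate_test_cases` helper above is left undefined. A hypothetical sketch might sample rows from a local JSON file named after the dataset; the file layout and field names here are assumptions for illustration, not part of promptfoo:

```python title="generate_tests.py"
import json
import random


def generate_test_cases(dataset, category, size):
    # Hypothetical data source: e.g. validation.json next to this script
    with open(f"{dataset}.json") as f:
        rows = json.load(f)

    # Optional category filter, matching the config passed by promptfoo
    if category != "all":
        rows = [row for row in rows if row.get("category") == category]

    # Sample at most `size` rows and convert each to a promptfoo test case
    sample = random.sample(rows, min(size, len(rows)))
    return [
        {
            "vars": {"input": row["input"]},
            "assert": [{"type": "contains", "value": row["expected"]}],
        }
        for row in sample
    ]
```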
## JSON/JSONL Format

### JSON Array

```json title="tests.json"
[
  {
    "vars": { "topic": "artificial intelligence" },
    "assert": [{ "type": "contains", "value": "AI" }]
  },
  {
    "vars": { "topic": "climate change" },
    "assert": [{ "type": "llm-rubric", "value": "Discusses environmental impact" }]
  }
]
```

### JSONL (One test per line)

```jsonl title="tests.jsonl"
{"vars": {"x": 5, "y": 3}, "assert": [{"type": "equals", "value": "8"}]}
{"vars": {"x": 10, "y": 7}, "assert": [{"type": "equals", "value": "17"}]}
```
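Both formats load the same way as other external test files:

```yaml title="promptfooconfig.yaml"
tests: file://tests.jsonl
```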
## Loading Media Files

Include images, PDFs, and other files as variables:

```yaml title="promptfooconfig.yaml"
tests:
  - vars:
      image: file://images/chart.png
      document: file://docs/report.pdf
      data: file://data/config.yaml
```

### Path Resolution

`file://` paths are resolved relative to your **config file's directory**, not the current working directory. This ensures consistent behavior regardless of where you run `promptfoo` from:

```yaml title="src/tests/promptfooconfig.yaml"
tests:
  - vars:
      # Resolved as src/tests/data/input.json
      data: file://./data/input.json

      # Also works - resolved as src/tests/data/input.json
      data2: file://data/input.json

      # Parent directory - resolved as src/shared/context.json
      shared: file://../shared/context.json
```

Without the `file://` prefix, values are passed as plain strings to your provider.

### Supported File Types

| Type                    | Handling            | Usage             |
| ----------------------- | ------------------- | ----------------- |
| Images (png, jpg, etc.) | Converted to base64 | Vision models     |
| Videos (mp4, etc.)      | Converted to base64 | Multimodal models |
| PDFs                    | Text extraction     | Document analysis |
| Text files              | Loaded as string    | Any use case      |
| YAML/JSON               | Parsed to object    | Structured data   |

### Example: Vision Model Test

```yaml
tests:
  - vars:
      image: file://test_image.jpg
      question: 'What objects are in this image?'
    assert:
      - type: contains
        value: 'dog'
```

In your prompt:

```json
[
  {
    "role": "user",
    "content": [
      { "type": "text", "text": "{{question}}" },
      {
        "type": "image_url",
        "image_url": { "url": "data:image/jpeg;base64,{{image}}" }
      }
    ]
  }
]
```

## Best Practices

### 1. Organize Test Data

```
project/
├── promptfooconfig.yaml
├── prompts/
│   └── main_prompt.txt
└── tests/
    ├── basic_functionality.csv
    ├── edge_cases.yaml
    └── regression_tests.json
```

### 2. Use Descriptive Names

```yaml
tests:
  - description: 'Test French translation with formal tone'
    vars:
      text: 'Hello'
      language: 'French'
      tone: 'formal'
```

### 3. Group Related Tests

```yaml
# Use metadata for organization
tests:
  - vars:
      query: 'Reset password'
    metadata:
      feature: authentication
      priority: high
```

### 4. Combine Approaches

```yaml
tests:
  # Quick smoke tests inline
  - vars:
      test: 'quick check'

  # Comprehensive test suite from file
  - file://tests/full_suite.csv

  # Dynamic edge case generation
  - file://tests/generate_edge_cases.js
```

## Common Patterns

### A/B Testing Variables

```csv title="ab_tests.csv"
message_style,greeting,__expected
"formal","Good morning","contains: Good morning"
"casual","Hey there","contains: Hey"
"friendly","Hello!","contains: Hello"
```

### Error Handling Tests

```yaml
tests:
  - description: 'Handle empty input'
    vars:
      input: ''
    assert:
      - type: contains
        value: 'provide more information'
```

### Performance Tests

```yaml
tests:
  - vars:
      prompt: 'Simple question'
    assert:
      - type: latency
        threshold: 1000 # milliseconds
```

### Passing Arrays to Assertions

By default, array variables expand into multiple test cases. To pass an array directly to assertions like `contains-any`, disable variable expansion:

```yaml
defaultTest:
  options:
    disableVarExpansion: true
  assert:
    - type: contains-any
      value: '{{expected_values}}'

tests:
  - description: 'Check for any valid response'
    vars:
      expected_values: ['option1', 'option2', 'option3']
```

## External Data Sources

### Google Sheets

See [Google Sheets integration](/docs/integrations/google-sheets) for details on loading test data directly from spreadsheets.

### SharePoint

See [SharePoint integration](/docs/integrations/sharepoint) for details on loading test data from Microsoft SharePoint document libraries.

### HuggingFace Datasets

See [HuggingFace Datasets](/docs/configuration/huggingface-datasets) for instructions on importing test cases from existing datasets.