# eval-max-score-selection (Max-Score Selection)

You can run this example with:

```bash
npx promptfoo@latest init --example eval-max-score-selection
cd eval-max-score-selection
```

This example demonstrates the `max-score` assertion type, which selects the best output objectively based on scores aggregated from other assertions.

## Overview

The `max-score` assertion provides a deterministic way to select the best output from multiple providers by:

- Aggregating scores from other assertions (correctness, quality, documentation, etc.)
- Applying configurable weights to different assertion types
- Selecting the output with the highest weighted score
- Providing objective, reproducible selection criteria

## Key Differences from `select-best`

- **Objective**: Uses quantifiable scores rather than LLM judgment
- **Deterministic**: The same inputs always produce the same selection
- **Transparent**: Clear scoring methodology based on weighted assertions
- **Cost-effective**: No additional LLM calls are needed for selection

## Configuration

```yaml
- type: max-score
  value:
    method: average # 'average' (default) or 'sum'
    weights:
      python: 3 # Weight for Python code correctness tests
      llm-rubric: 1 # Weight for LLM-evaluated quality rubrics
      javascript: 2 # Weight for JavaScript tests
      contains: 0.5 # Weight for simple string matching
    threshold: 0.7 # Optional minimum score threshold
```

A fuller configuration sketch showing `max-score` alongside the assertions it aggregates appears at the end of this document.

### Options

- **method**: How to aggregate scores
  - `average` (default): Weighted average of assertion scores
  - `sum`: Weighted sum of assertion scores
- **weights**: Map of assertion types to their weights (default: 1.0)
- **threshold**: Minimum score required for selection (optional)

## Usage

### Basic Example

```bash
# Run the main example (requires API keys for OpenAI/Anthropic)
npx promptfoo@latest eval
```

## How It Works

1. **Multiple Outputs Generated**: Each provider generates a solution
2. **Assertions Evaluated**: All assertions run on each output:
   - Python tests verify correctness (pass=1, fail=0)
   - LLM rubrics evaluate quality aspects (0-1 score)
   - Other assertions contribute their scores
3. **Scores Aggregated**: Max-score calculates a weighted score for each output
4. **Best Selected**: The output with the highest score is marked as passing
5. **Results Shown**: Clear indication of which output won and why

## Example Scoring

Given three outputs with these assertion results:

- Output A: python=1.0, documentation=0.5, efficiency=0.7
- Output B: python=1.0, documentation=0.9, efficiency=0.8
- Output C: python=0.0, documentation=1.0, efficiency=1.0

With weights python=3 and llm-rubric=1 (documentation and efficiency are both `llm-rubric` assertions, so the weights sum to 3 + 1 + 1 = 5):

- Output A: (3×1.0 + 1×0.5 + 1×0.7) / 5 = 0.84
- Output B: (3×1.0 + 1×0.9 + 1×0.8) / 5 = 0.94 ✓ (selected)
- Output C: (3×0.0 + 1×1.0 + 1×1.0) / 5 = 0.40

A small script reproducing this arithmetic appears at the end of this document.

## When to Use max-score

Use `max-score` when:

- You have objective criteria (tests, metrics)
- You want reproducible results
- You need to weight different aspects differently
- You want to avoid additional API costs

Use `select-best` when:

- You need subjective judgment
- The criteria are hard to quantify
- You want nuanced evaluation of quality
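## Configuration Sketch

For orientation, here is a sketch of how a `max-score` assertion might sit alongside the assertions it aggregates in a promptfoo config. The prompt, provider IDs, task, and rubric text below are illustrative placeholders, not the exact contents of this example's config.

```yaml
# Hypothetical promptfooconfig.yaml sketch; prompt, providers, and
# assertion values are placeholders for illustration.
prompts:
  - 'Write a Python function that {{task}}'

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      task: 'reverses a string'
    assert:
      # Objective correctness check (given weight 3 below)
      - type: python
        value: "'def ' in output"
      # Subjective quality rubric (given weight 1 below)
      - type: llm-rubric
        value: 'The code is well documented'
      # Select the provider output with the highest weighted score
      - type: max-score
        value:
          method: average
          weights:
            python: 3
            llm-rubric: 1
```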
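## Scoring Arithmetic Sketch

The following standalone Python sketch reproduces the weighted-average arithmetic from the Example Scoring section. It is not promptfoo's implementation; the function and variable names are made up for illustration, under the assumption that `documentation` and `efficiency` are `llm-rubric` assertions sharing that type's weight of 1.

```python
# Standalone sketch of max-score's weighted-average aggregation.
# Not promptfoo's actual code; all names here are illustrative.

def weighted_average(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Aggregate per-assertion scores using per-assertion weights (default 1.0)."""
    total = sum(weights.get(name, 1.0) * score for name, score in scores.items())
    return total / sum(weights.get(name, 1.0) for name in scores)

# 'python' carries weight 3; 'documentation' and 'efficiency' are assumed
# to be llm-rubric assertions, so each carries the llm-rubric weight of 1.
weights = {"python": 3.0, "documentation": 1.0, "efficiency": 1.0}

outputs = {
    "A": {"python": 1.0, "documentation": 0.5, "efficiency": 0.7},
    "B": {"python": 1.0, "documentation": 0.9, "efficiency": 0.8},
    "C": {"python": 0.0, "documentation": 1.0, "efficiency": 1.0},
}

scored = {name: weighted_average(s, weights) for name, s in outputs.items()}
best = max(scored, key=scored.get)

for name, score in scored.items():
    print(f"Output {name}: {score:.2f}")
print(f"Selected: Output {best}")  # Output B at 0.94
```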