# compare-openai-models (OpenAI Model Comparison)

This example compares OpenAI's `gpt-5.4` with `gpt-5.4-mini` across various riddles and reasoning tasks.

You can run this example with:

```bash
npx promptfoo@latest init --example compare-openai-models
cd compare-openai-models
```

## Quick Start

1. Initialize this example by running:

   ```bash
   npx promptfoo@latest init --example compare-openai-models
   ```

2. Navigate to the newly created `compare-openai-models` directory:

   ```bash
   cd compare-openai-models
   ```

3. Set an OpenAI API key directly in your environment:

   ```bash
   export OPENAI_API_KEY="your_openai_api_key"
   ```

   Alternatively, you can set the API key in a `.env` file:

   ```bash
   OPENAI_API_KEY=your_openai_api_key
   ```

4. Run the evaluation with:

   ```bash
   npx promptfoo@latest eval --no-cache
   ```

   Note: the `--no-cache` flag is required because the example uses a [latency assertion](https://www.promptfoo.dev/docs/configuration/expected-outputs/deterministic/#latency), which does not support caching.

5. View the results:

   ```bash
   npx promptfoo@latest view
   ```

The output includes responses from both models for each riddle, letting you compare their performance side by side.

## What this example demonstrates

This example compares OpenAI's GPT-5.4 with GPT-5.4 Mini across various riddles and puzzles. It demonstrates:

- **Model comparison**: Side-by-side evaluation of `gpt-5.4` vs `gpt-5.4-mini`
- **Cost and latency assertions**: Ensuring responses meet performance thresholds
- **Content validation**: Using `contains` assertions to verify specific answers
- **LLM-based grading**: Using `llm-rubric` assertions for nuanced evaluation criteria
- **Diverse test cases**: A variety of riddles testing different reasoning capabilities
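To make these pieces concrete, here is a minimal sketch of what a `promptfooconfig.yaml` for this kind of comparison might look like. The prompt wording, riddle text, and thresholds below are illustrative assumptions rather than the values shipped with this example; the `openai:` provider syntax and the `latency`, `cost`, `contains`, and `llm-rubric` assertion types follow standard promptfoo conventions.

```yaml
# promptfooconfig.yaml — illustrative sketch; the shipped example may differ
description: Compare gpt-5.4 and gpt-5.4-mini on riddles

prompts:
  - 'Solve this riddle and explain your reasoning: {{riddle}}'

providers:
  - openai:gpt-5.4
  - openai:gpt-5.4-mini

# Assertions applied to every test case (hypothetical thresholds)
defaultTest:
  assert:
    - type: latency
      threshold: 10000 # milliseconds — this is why --no-cache is required
    - type: cost
      threshold: 0.01 # maximum USD per response

tests:
  - vars:
      riddle: 'What has keys but opens no locks?'
    assert:
      - type: contains
        value: piano
      - type: llm-rubric
        value: Gives the correct answer and explains the reasoning clearly
```

Placing the cost and latency checks under `defaultTest` applies them to every riddle, while the per-test `contains` and `llm-rubric` assertions validate each specific answer.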