# openai-deep-research (OpenAI Deep Research Models)

You can run this example with:

```bash
npx promptfoo@latest init --example openai-deep-research
cd openai-deep-research
```

This example demonstrates OpenAI's deep research models with web search capabilities via the Responses API.

## Important Notes

⚠️ **Response Times**: Deep research models can take **2-10 minutes** to complete research tasks as they perform extensive web searches and reasoning.

⚠️ **Token Usage**: These models spend a large share of their tokens on internal reasoning. Always set a high `max_output_tokens` value (50,000 or more) to avoid incomplete responses.

⚠️ **Access**: Deep research models may require special access from OpenAI. Check your API access if you encounter persistent 429 errors.

## Setup

1. Set your OpenAI API key:

```bash
export OPENAI_API_KEY=your-key-here
```

2. Run the evaluation with an appropriate timeout:

```bash
# Set a 10-minute timeout for deep research tasks
export PROMPTFOO_EVAL_TIMEOUT_MS=600000
promptfoo eval
```

For local development:

```bash
PROMPTFOO_EVAL_TIMEOUT_MS=600000 npm run local -- eval -c examples/openai-deep-research/promptfooconfig.yaml
```

## What's happening?

This example:

- Tests OpenAI's `o4-mini-deep-research` model with web search tools
- Evaluates research capabilities on machine learning and space exploration topics
- Uses the model's ability to automatically search the web for current information
- Checks that responses contain relevant technical terminology
- Demonstrates handling of web search results and citations

The model automatically decides when to use web search to provide comprehensive, up-to-date answers.
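The terminology check can also be written as a custom assertion. The following is a minimal sketch, assuming it is wired in as a promptfoo Python assertion through a `get_assert` function; the term list and the one-term threshold are hypothetical and are not taken from this example's config.

```python
# Hypothetical term list; replace with the vocabulary you expect for your prompts.
EXPECTED_TERMS = ["transformer", "attention", "neural network"]

# Hypothetical threshold: how many expected terms must appear for a pass.
MIN_TERMS = 1


def get_assert(output, context):
    """Pass when the report mentions at least MIN_TERMS expected terms."""
    text = output.lower()
    found = [term for term in EXPECTED_TERMS if term in text]
    return {
        "pass": len(found) >= MIN_TERMS,
        "score": len(found) / len(EXPECTED_TERMS),
        "reason": f"Found {len(found)} of {len(EXPECTED_TERMS)} expected terms: {found}",
    }
```

The match is a case-insensitive substring test, so it is cheap but cannot tell whether a term is used correctly.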

## Configuration Details

```yaml
providers:
  - id: openai:responses:o4-mini-deep-research
    config:
      max_output_tokens: 50000 # Required for complete research responses
      tools:
        - type: web_search_preview # Required for deep research models
      # Optional parameters:
      # max_tool_calls: 50 # Control number of searches (default: unlimited)
      # background: true # Use background mode for long-running tasks
      # store: true # Store the conversation for 30 days
```

## Available Models

- `o3-deep-research` - Most powerful deep research model ($10/1M input, $40/1M output)
- `o3-deep-research-2025-06-26` - Snapshot version
- `o4-mini-deep-research` - Faster, more affordable ($2/1M input, $8/1M output)
- `o4-mini-deep-research-2025-06-26` - Snapshot version
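
To get a rough sense of what a run costs, the listed prices can be turned into a small estimator. This is a sketch only: the function name is hypothetical, it assumes reasoning tokens are included in the output token count, and it ignores any separate per-tool-call fees.

```python
# USD per 1M tokens, copied from the model list above.
PRICES_PER_MILLION = {
    "o3-deep-research": {"input": 10.0, "output": 40.0},
    "o4-mini-deep-research": {"input": 2.0, "output": 8.0},
}


def estimate_cost_usd(model, input_tokens, output_tokens):
    """Estimate the token cost of one request in USD."""
    prices = PRICES_PER_MILLION[model]
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# Example: 10,000 input tokens and a full 50,000-token output budget.
print(estimate_cost_usd("o4-mini-deep-research", 10_000, 50_000))
```

With these numbers, the output side dominates the bill: the 50,000 output tokens account for $0.40 of the roughly $0.42 total.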

## Advanced Features

### Background Mode (Recommended)

For production use, run deep research tasks in background mode to avoid timeouts:

```yaml
providers:
  - id: openai:responses:o4-mini-deep-research
    config:
      background: true
      webhook_url: https://your-api.com/webhook # Optional: Get notified when complete
```

### Using Code Interpreter

Deep research models can analyze data using code:

```yaml
providers:
  - id: openai:responses:o4-mini-deep-research
    config:
      tools:
        - type: web_search_preview
        - type: code_interpreter
          container:
            type: auto
```

### MCP Server Integration

Connect to private data sources using MCP servers:

```yaml
providers:
  - id: openai:responses:o4-mini-deep-research
    config:
      tools:
        - type: web_search_preview
        - type: mcp
          server_label: mycompany_mcp
          server_url: https://mycompany.com/mcp
          require_approval: never # Required for deep research
```

### Prompt Enhancement

For better results, consider preprocessing user queries:

1. **Clarification**: Use a faster model to gather context
2. **Prompt rewriting**: Expand the query with specific requirements
3. **Deep research**: Pass the enhanced prompt to the research model

See the [OpenAI Deep Research Guide](https://platform.openai.com/docs/guides/deep-research) for detailed examples.
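
The prompt rewriting step might look like the sketch below. It assumes the clarification step has already produced a dictionary of requirements; the function name, the dictionary keys, and the wording of the brief are all hypothetical.

```python
def build_research_prompt(query, clarifications):
    """Expand a short user query into a more specific research brief."""
    lines = [f"Research question: {query.strip()}", "", "Requirements:"]
    for key, value in clarifications.items():
        lines.append(f"- {key}: {value}")
    # Ask for sources explicitly so the report's citations can be checked later.
    lines.append("- Cite a source for every factual claim.")
    return "\n".join(lines)


prompt = build_research_prompt(
    "recent advances in ML",
    {"Time range": "2023-2025", "Audience": "practitioners"},
)
print(prompt)
```

The resulting string is what you would pass to the deep research model as the prompt.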

## Response Format

Deep research responses include:

- **output_text**: The final research report with inline citations
- **annotations**: Citation details with URLs and titles
- **web_search_call**: Details of searches performed
- **code_interpreter_call**: Any code analysis performed
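
As an illustration, the sketch below pulls the report text, the citations, and a search count out of a raw response. It assumes the response is a dict with an `output` list whose `message` items hold `output_text` parts and `url_citation` annotations; check these field names against the actual API response before relying on them.

```python
def extract_report(response):
    """Return (report_text, citations, search_count) from a response dict."""
    text_parts, citations, search_count = [], [], 0
    for item in response.get("output", []):
        if item.get("type") == "web_search_call":
            search_count += 1
        elif item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") != "output_text":
                    continue
                text_parts.append(part.get("text", ""))
                for ann in part.get("annotations", []):
                    # Assumed annotation shape: {"type": "url_citation", "url", "title"}.
                    if ann.get("type") == "url_citation":
                        citations.append(
                            {"url": ann.get("url"), "title": ann.get("title")}
                        )
    return "".join(text_parts), citations, search_count
```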

## Troubleshooting

- **Timeouts**: Increase `PROMPTFOO_EVAL_TIMEOUT_MS` if evaluations time out
- **Incomplete responses**: Increase `max_output_tokens` to 50,000 or higher
- **429 errors**: May indicate rate limits or access restrictions
- **Tool validation errors**: Ensure the `web_search_preview` tool is listed under `tools` in the provider config

## Best Practices

1. **Always use high token limits**: Set `max_output_tokens: 50000` or higher
2. **Handle long response times**: Use background mode or set high timeouts
3. **Monitor costs**: These models use significant tokens for reasoning
4. **Validate citations**: Check that returned URLs are accessible
5. **Consider prompt enhancement**: Preprocess queries for better results
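
Citation validation can be sketched with the standard library. The helper name and the `opener` parameter (there to make the function easy to test) are hypothetical. Some servers reject `HEAD` requests, so a `False` result is a signal to recheck manually, not proof that a link is dead.

```python
import urllib.error
import urllib.request


def is_url_accessible(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if a HEAD request to `url` succeeds with a status below 400."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts; ValueError covers malformed URLs.
        return False
```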

## Learn More

- [OpenAI Deep Research Guide](https://platform.openai.com/docs/guides/deep-research)
- [Promptfoo Documentation](https://promptfoo.dev/docs)
- [MCP Integration Guide](https://platform.openai.com/docs/mcp)
- [Building a Deep Research Compatible MCP Server](mcp-server-example.md)