---
description: Monitor and optimize LLM usage through Helicone's AI gateway with unified access, caching, and comprehensive observability
---

# Helicone AI Gateway

[Helicone AI Gateway](https://github.com/Helicone/ai-gateway) is an open-source, self-hosted AI gateway that provides a unified OpenAI-compatible interface for 100+ LLM providers. The Helicone provider in promptfoo allows you to route requests through a locally running Helicone AI Gateway instance.

## Benefits

- **Unified Interface**: Use OpenAI SDK syntax to access 100+ different LLM providers
- **Load Balancing**: Smart provider selection based on latency, cost, or custom strategies
- **Caching**: Intelligent response caching to reduce costs and improve performance
- **Rate Limiting**: Built-in rate limiting and usage controls
- **Observability**: Optional integration with Helicone's observability platform
- **Self-Hosted**: Run your own gateway instance for full control

## Setup

### Start Helicone AI Gateway

First, start a local Helicone AI Gateway instance:

```bash
# Set your provider API keys
export OPENAI_API_KEY=your_openai_key
export ANTHROPIC_API_KEY=your_anthropic_key
export GROQ_API_KEY=your_groq_key

# Start the gateway
npx @helicone/ai-gateway@latest
```

The gateway will start on `http://localhost:8080` by default.

### Installation

No additional dependencies are required. The Helicone provider is built into promptfoo and works with any running Helicone AI Gateway instance.

## Usage

### Basic Usage

To route requests through your local Helicone AI Gateway:

```yaml
providers:
  - helicone:openai/gpt-5-mini
  - helicone:anthropic/claude-3-5-sonnet
  - helicone:groq/llama-3.1-8b-instant
```

The model format is `provider/model` as supported by the Helicone AI Gateway.

### Custom Configuration

For more advanced configuration:

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      # Gateway configuration
      baseUrl: http://localhost:8080 # Custom gateway URL
      router: production # Use specific router
      # Standard OpenAI options
      temperature: 0.7
      max_tokens: 1500
      headers:
        Custom-Header: 'custom-value'
```

### Using Custom Router

If your Helicone AI Gateway is configured with custom routers:

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      router: production
  - id: helicone:openai/gpt-3.5-turbo
    config:
      router: development
```

## Configuration Options

### Provider Format

The Helicone provider uses the format: `helicone:provider/model`

Examples:

- `helicone:openai/gpt-4o`
- `helicone:anthropic/claude-3-5-sonnet`
- `helicone:groq/llama-3.1-8b-instant`

### Supported Models

The Helicone AI Gateway supports 100+ models from various providers. Some popular examples:

| Provider  | Example Models                                                    |
| --------- | ----------------------------------------------------------------- |
| OpenAI    | `openai/gpt-4o`, `openai/gpt-5-mini`, `openai/o1-preview`         |
| Anthropic | `anthropic/claude-3-5-sonnet`, `anthropic/claude-3-haiku`         |
| Groq      | `groq/llama-3.1-8b-instant`, `groq/llama-3.1-70b-versatile`       |
| Meta      | `meta-llama/Llama-3-8b-chat-hf`, `meta-llama/Llama-3-70b-chat-hf` |
| Google    | `google/gemma-7b-it`, `google/gemma-2b-it`                        |

For a complete list, see the [Helicone AI Gateway documentation](https://github.com/Helicone/ai-gateway).

### Configuration Parameters

#### Gateway Options

- `baseUrl` (string): Helicone AI Gateway URL (defaults to `http://localhost:8080`)
- `router` (string): Custom router name (optional, uses `/ai` endpoint if not specified)
- `model` (string): Override the model name from the provider specification
- `apiKey` (string): Custom API key (defaults to `placeholder-api-key`)

#### OpenAI-Compatible Options

Since the provider extends OpenAI's chat completion provider, all standard OpenAI options are supported:

- `temperature`: Controls randomness (0.0 to 1.0)
- `max_tokens`: Maximum number of tokens to generate
- `top_p`: Nucleus sampling parameter
- `frequency_penalty`: Penalizes frequent tokens
- `presence_penalty`: Penalizes new tokens based on presence
- `stop`: Stop sequences
- `headers`: Additional HTTP headers

## Examples

### Basic OpenAI Integration

```yaml
providers:
  - helicone:openai/gpt-5-mini

prompts:
  - "Translate '{{text}}' to French"

tests:
  - vars:
      text: 'Hello world'
    assert:
      - type: contains
        value: 'Bonjour'
```

### Multi-Provider Comparison with Observability

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      tags: ['openai', 'gpt4']
      properties:
        model_family: 'gpt-4'

  - id: helicone:anthropic/claude-3-5-sonnet-20241022
    config:
      tags: ['anthropic', 'claude']
      properties:
        model_family: 'claude-3'

prompts:
  - 'Write a creative story about {{topic}}'

tests:
  - vars:
      topic: 'a robot learning to paint'
```

### Custom Provider with Full Configuration

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      baseUrl: https://custom-gateway.example.com:8080
      router: production
      apiKey: your_custom_api_key
      temperature: 0.7
      max_tokens: 1000
      headers:
        Authorization: Bearer your_target_provider_api_key
        Custom-Header: custom-value

prompts:
  - 'Answer the following question: {{question}}'

tests:
  - vars:
      question: 'What is artificial intelligence?'
```

### Caching and Performance Optimization

```yaml
providers:
  - id: helicone:openai/gpt-3.5-turbo
    config:
      cache: true
      properties:
        cache_strategy: 'aggressive'
        use_case: 'batch_processing'

prompts:
  - 'Summarize: {{text}}'

tests:
  - vars:
      text: 'Large text content to summarize...'
    assert:
      - type: latency
        threshold: 2000 # Should be faster due to caching
```

## Features

### Request Monitoring

All requests routed through Helicone are automatically logged with:

- Request/response payloads
- Token usage and costs
- Latency metrics
- Custom properties and tags

### Cost Analytics

Track costs across different providers and models:

- Per-request cost breakdown
- Aggregated cost analytics
- Cost optimization recommendations

### Caching

Intelligent response caching:

- Semantic similarity matching
- Configurable cache duration
- Cost reduction through cache hits

### Rate Limiting

Built-in rate limiting:

- Per-user limits
- Per-session limits
- Custom rate limiting rules

## Best Practices

1. **Use Meaningful Tags**: Tag your requests with relevant metadata for better analytics
2. **Track Sessions**: Use session IDs to track conversation flows
3. **Enable Caching**: For repeated or similar requests, enable caching to reduce costs
4. **Monitor Costs**: Regularly review cost analytics in the Helicone dashboard
5. **Custom Properties**: Use custom properties to segment and analyze your usage

## Troubleshooting

### Common Issues

1. **Authentication Failed**: Ensure your `HELICONE_API_KEY` is set correctly
2. **Unknown Provider**: Check that the provider is in the supported list or use a custom `targetUrl`
3. **Request Timeout**: Check your network connection and target provider availability

### Debug Mode

Enable debug logging to see detailed request/response information:

```bash
LOG_LEVEL=debug promptfoo eval
```

## Related Links

- [Helicone Documentation](https://docs.helicone.ai/)
- [Helicone Dashboard](https://helicone.ai/dashboard)
- [Helicone GitHub](https://github.com/Helicone/helicone)
- [promptfoo Provider Guide](/docs/providers/)