# provider-elevenlabs/agents (ElevenLabs Conversational Agents)

You can run this example with:

```bash
npx promptfoo@latest init --example provider-elevenlabs/agents
cd provider-elevenlabs/agents
```

Test and evaluate ElevenLabs voice AI agents with multi-turn conversations.

## What this tests

- **Agent conversation quality**: Multi-turn dialogue handling
- **Evaluation criteria**: Greeting, understanding, accuracy, helpfulness
- **Simulated user behavior**: Automated conversation testing
- **Tool usage**: Agent tool calls and responses
- **Cost and latency**: Usage cost and response latency metrics

## Setup

Set your ElevenLabs API key:

```bash
export ELEVENLABS_API_KEY=your_api_key_here
```

## Run the example

```bash
npx promptfoo@latest eval -c ./promptfooconfig.yaml
```

To browse the results in the web UI, run the eval and then start the viewer:

```bash
npx promptfoo@latest eval -c ./promptfooconfig.yaml
npx promptfoo@latest view
```

## What to look for

1. **Conversation flow**: How well the agent maintains context across turns
2. **Evaluation scores**: Automated grading on multiple criteria (0-1 scale)
3. **Tool usage**: When and how the agent calls available tools
4. **Response quality**: Agent's ability to understand and respond accurately
5. **Cost tracking**: Per-conversation and per-turn costs

## Conversation formats

This example supports multiple input formats:

### 1. Plain text (treated as first user message)

```yaml
prompts:
  - 'Hello, I need help with my order'
```

### 2. Multi-line with role prefixes

```yaml
prompts:
  - |
    User: Hi, what's the weather like?
    Agent: I'd be happy to help! Where are you located?
    User: I'm in San Francisco
```

### 3. Structured JSON

```yaml
prompts:
  - |
    {
      "turns": [
        {"speaker": "user", "message": "Hello"},
        {"speaker": "agent", "message": "Hi! How can I help?"},
        {"speaker": "user", "message": "I need support"}
      ]
    }
```
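To make the difference between the three formats concrete, here is a minimal Python sketch that normalizes each one into the same list of turns. It is an illustration only: `parse_conversation` is a hypothetical helper, not part of promptfoo or the ElevenLabs API, and the provider's real parsing rules may differ (for example, in how it treats lines without a role prefix).

```python
from __future__ import annotations

import json
import re

# Hypothetical helper for illustration; not the provider's actual parser.
ROLE_LINE = re.compile(r"^\s*(user|agent)\s*:\s*(.*)$", re.IGNORECASE)


def parse_conversation(prompt: str) -> list[dict]:
    """Normalize a prompt string into a list of {"speaker", "message"} turns."""
    text = prompt.strip()

    # Format 3: structured JSON with a top-level "turns" array.
    if text.startswith("{"):
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict) and isinstance(data.get("turns"), list):
            return [
                {"speaker": turn["speaker"], "message": turn["message"]}
                for turn in data["turns"]
            ]

    # Format 2: lines prefixed with "User:" or "Agent:".
    turns: list[dict] = []
    for line in text.splitlines():
        match = ROLE_LINE.match(line)
        if match:
            turns.append(
                {"speaker": match.group(1).lower(), "message": match.group(2).strip()}
            )
        elif turns and line.strip():
            # Unprefixed line after a turn: treat it as a continuation.
            turns[-1]["message"] += " " + line.strip()
    if turns:
        return turns

    # Format 1: plain text is treated as the first user message.
    return [{"speaker": "user", "message": text}]
```

Calling `parse_conversation("Hello, I need help with my order")` yields a single `user` turn, while the role-prefixed and JSON examples above each yield three turns in order.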

## Agent configuration

Customize the agent behavior:

```yaml
config:
  agentConfig:
    name: Customer Support Agent
    prompt: You are a helpful, empathetic customer support agent...
    firstMessage: Hi! I'm here to help. What can I do for you today?
    language: en
    voiceId: 21m00Tcm4TlvDq8ikWAM
    llmModel: gpt-4o
    temperature: 0.7
    maxTokens: 500
```

## Evaluation criteria

The following criteria presets are available:

- `greeting` - Professional greeting (weight: 0.8, threshold: 0.8)
- `understanding` - Accurate intent understanding (weight: 1.0, threshold: 0.9)
- `accuracy` - Correct information (weight: 1.0, threshold: 0.9)
- `helpfulness` - Helpful responses (weight: 0.9, threshold: 0.8)
- `professionalism` - Professional tone (weight: 0.7, threshold: 0.8)
- `empathy` - Empathetic responses (weight: 0.8, threshold: 0.7)
- `efficiency` - Concise responses (weight: 0.7, threshold: 0.7)
- `resolution` - Problem resolution (weight: 1.0, threshold: 0.8)
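This README does not say how weights and thresholds are combined, so the sketch below shows one plausible reading rather than the provider's actual logic. It assumes that a criterion passes when its 0-1 score meets or exceeds its threshold, and that the overall score is the weight-normalized mean of the individual scores. The `grade` function and `PRESETS` table are hypothetical names; the numbers are copied from the preset list above.

```python
from __future__ import annotations

# (weight, threshold) per preset, copied from the list above.
PRESETS: dict[str, tuple[float, float]] = {
    "greeting": (0.8, 0.8),
    "understanding": (1.0, 0.9),
    "accuracy": (1.0, 0.9),
    "helpfulness": (0.9, 0.8),
    "professionalism": (0.7, 0.8),
    "empathy": (0.8, 0.7),
    "efficiency": (0.7, 0.7),
    "resolution": (1.0, 0.8),
}


def grade(scores: dict[str, float]) -> dict:
    """Return per-criterion pass/fail flags and a weighted overall score.

    Assumption: pass means score >= threshold, and the overall score is
    sum(weight * score) / sum(weight) over the criteria that were scored.
    """
    passed: dict[str, bool] = {}
    weighted_sum = 0.0
    total_weight = 0.0
    for name, score in scores.items():
        weight, threshold = PRESETS[name]
        passed[name] = score >= threshold
        weighted_sum += weight * score
        total_weight += weight
    overall = weighted_sum / total_weight if total_weight else 0.0
    return {"passed": passed, "overall": overall}
```

Under these assumptions, a conversation scored `{"greeting": 0.9, "accuracy": 0.85}` passes `greeting` but fails `accuracy`, because `accuracy` carries the stricter 0.9 threshold.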

## Simulated user

Configure the simulated user's behavior:

```yaml
simulatedUser:
  prompt: Act as a customer who is frustrated but polite
  temperature: 0.8
  responseStyle: casual # concise | verbose | casual | formal
```

## Available tools

Example tools the agent can call:

- `get_weather` - Get current weather
- `search_knowledge_base` - Search documentation
- `create_ticket` - Create support ticket
- `send_email` - Send email notification
- `get_order_status` - Check order status
- `schedule_callback` - Schedule callback
- `transfer_agent` - Transfer to human agent

## Learn more

- [ElevenLabs Conversational AI Docs](https://elevenlabs.io/docs/conversational-ai)
- [Agent Configuration Guide](https://elevenlabs.io/docs/conversational-ai/agents)
- [Pricing](https://elevenlabs.io/pricing)