---
title: 'ElevenLabs'
description: 'Test ElevenLabs AI audio capabilities: Text-to-Speech, Speech-to-Text, Conversational Agents, and audio processing tools'
---

# ElevenLabs

The ElevenLabs provider integrates multiple AI audio capabilities for comprehensive voice AI testing and evaluation.

:::tip

For a comprehensive step-by-step tutorial, see the [Evaluating ElevenLabs voice AI guide](/docs/guides/evaluate-elevenlabs/).

:::

## Quick Start

Get started with ElevenLabs in 3 steps:

1. **Install and authenticate:**

   ```sh
   npm install -g promptfoo
   export ELEVENLABS_API_KEY=your_api_key_here
   ```

2. **Create a config file** (`promptfooconfig.yaml`):

   ```yaml
   prompts:
     - 'Welcome to our customer service. How can I help you today?'

   providers:
     - id: elevenlabs:tts:rachel

   tests:
     - description: Generate welcome message
       assert:
         - type: cost
           threshold: 0.01
         - type: latency
           threshold: 2000
   ```

3. **Run your first eval:**

   ```sh
   promptfoo eval
   ```

   View results with `promptfoo view` or in the web UI.

## Setup

Set your ElevenLabs API key as an environment variable:

```sh
export ELEVENLABS_API_KEY=your_api_key_here
```

Alternatively, specify the API key directly in your configuration:

```yaml
providers:
  - id: elevenlabs:tts
    config:
      apiKey: your_api_key_here
```

:::tip

Get your API key from [ElevenLabs Settings](https://elevenlabs.io/app/settings/api-keys). Free tier includes 10,000 characters/month.
:::

## Capabilities

The ElevenLabs provider supports multiple capabilities:

### Text-to-Speech (TTS)

Generate high-quality voice synthesis with multiple models and voices:

- `elevenlabs:tts:<voice>` - TTS with specified voice (e.g., `elevenlabs:tts:rachel`)
- `elevenlabs:tts` - TTS with default voice

**Models available:**

- `eleven_flash_v2_5` - Fastest, lowest latency (~200ms)
- `eleven_turbo_v2_5` - High quality, fast
- `eleven_multilingual_v2` - Best for non-English languages
- `eleven_monolingual_v1` - English only, high quality

**Example:**

```yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5
      voiceSettings:
        stability: 0.5
        similarity_boost: 0.75
        speed: 1.0
```

### Speech-to-Text (STT)

Transcribe audio with speaker diarization and accuracy metrics:

- `elevenlabs:stt` - Speech-to-text transcription

**Features:**

- Speaker diarization (identify multiple speakers)
- Word Error Rate (WER) calculation
- Multiple language support

**Example:**

```yaml
providers:
  - id: elevenlabs:stt
    config:
      modelId: scribe_v1
      diarization: true
      maxSpeakers: 3
```

### Conversational Agents

Test voice AI agents with LLM backends and evaluation criteria:

- `elevenlabs:agents` - Voice AI agent testing

**Features:**

- Multi-turn conversation simulation
- Automated evaluation criteria
- Tool calling and mocking
- LLM cascading for cost optimization
- Custom LLM endpoints
- Multi-voice conversations
- Phone integration (Twilio, SIP)

**Example:**

```yaml
providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        name: Customer Support Agent
        prompt: You are a helpful support agent
        voiceId: 21m00Tcm4TlvDq8ikWAM
        llmModel: gpt-4o
      evaluationCriteria:
        - name: helpfulness
          description: Agent provides helpful responses
          weight: 1.0
          passingThreshold: 0.8
```

### Supporting APIs

Additional audio processing capabilities:

- `elevenlabs:history` - Retrieve agent conversation history
- `elevenlabs:isolation` - Remove background noise from audio
- `elevenlabs:alignment` - Generate
  time-aligned subtitles

## Configuration Parameters

All providers support these common parameters:

| Parameter       | Description                                       |
| --------------- | ------------------------------------------------- |
| `apiKey`        | Your ElevenLabs API key                           |
| `apiKeyEnvar`   | Environment variable containing the API key       |
| `baseUrl`       | Custom base URL for API (default: ElevenLabs API) |
| `timeout`       | Request timeout in milliseconds                   |
| `cache`         | Enable response caching                           |
| `cacheTTL`      | Cache time-to-live in seconds                     |
| `enableLogging` | Enable debug logging                              |
| `retries`       | Number of retry attempts for failed requests      |

### TTS-Specific Parameters

| Parameter                 | Description                                                 |
| ------------------------- | ----------------------------------------------------------- |
| `modelId`                 | TTS model (e.g., `eleven_flash_v2_5`)                       |
| `voiceId`                 | Voice ID or name (e.g., `21m00Tcm4TlvDq8ikWAM` or `rachel`) |
| `voiceSettings`           | Voice customization (stability, similarity, style, speed)   |
| `outputFormat`            | Audio format (e.g., `mp3_44100_128`, `pcm_44100`)           |
| `seed`                    | Seed for deterministic output                               |
| `streaming`               | Enable WebSocket streaming for low latency                  |
| `pronunciationDictionary` | Custom pronunciation rules                                  |
| `voiceDesign`             | Generate voice from text description                        |
| `voiceRemix`              | Modify voice characteristics (gender, accent, age)          |

### STT-Specific Parameters

| Parameter     | Description                                |
| ------------- | ------------------------------------------ |
| `modelId`     | STT model (default: `scribe_v1`)           |
| `language`    | ISO 639-1 language code (e.g., `en`, `es`) |
| `diarization` | Enable speaker diarization                 |
| `maxSpeakers` | Expected number of speakers (hint)         |
| `audioFormat` | Input audio format                         |

### Agent-Specific Parameters

| Parameter            | Description                               |
| -------------------- | ----------------------------------------- |
| `agentId`            | Use existing agent ID                     |
| `agentConfig`        | Ephemeral agent configuration             |
| `simulatedUser`      | Automated user simulation settings        |
| `evaluationCriteria` | Evaluation criteria for agent performance |
| `toolMockConfig`     | Mock tool responses for testing           |
| `maxTurns`           | Maximum conversation turns (default: 10)  |
| `llmCascade`         | LLM fallback configuration                |
| `customLLM`          | Custom LLM endpoint configuration         |
| `mcpConfig`          | Model Context Protocol integration        |
| `multiVoice`         | Multi-voice conversation configuration    |
| `postCallWebhook`    | Webhook notification after conversation   |
| `phoneConfig`        | Twilio or SIP phone integration           |

## Examples

### Text-to-Speech: Voice Comparison

```yaml
prompts:
  - 'Welcome to ElevenLabs. Our AI voice technology delivers natural-sounding speech.'

providers:
  - id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5

  - id: elevenlabs:tts:clyde
    config:
      modelId: eleven_turbo_v2_5

tests:
  - description: Audio generation succeeds
    assert:
      - type: cost
        threshold: 0.01
      - type: latency
        threshold: 5000
```

### Speech-to-Text: Accuracy Testing

```yaml
prompts:
  - file://audio/test-recording.mp3

providers:
  - id: elevenlabs:stt
    config:
      diarization: true

tests:
  - description: WER is acceptable
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          return result.wer < 0.05; // Less than 5% error
```

### Conversational Agents: Evaluation

```yaml
prompts:
  - |
    User: I need help with my order
    Agent: I'd be happy to help! What's your order number?
    User: ORDER-12345

providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        prompt: You are a helpful customer support agent
        llmModel: gpt-4o
      evaluationCriteria:
        - name: greeting
          weight: 0.8
          passingThreshold: 0.8
        - name: understanding
          weight: 1.0
          passingThreshold: 0.9

tests:
  - description: Agent meets evaluation criteria
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          const passed = result.analysis.evaluation_criteria_results.filter(r => r.passed);
          return passed.length >= 2;
```

### Audio Processing: Pipeline

```yaml
providers:
  # 1. Remove noise from audio
  - id: elevenlabs:isolation

  # 2. Transcribe cleaned audio
  - id: elevenlabs:stt

  # 3. Generate subtitles
  - id: elevenlabs:alignment
```

## Advanced Features

### Pronunciation Dictionaries

Customize pronunciation for technical terms:

```yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      pronunciationDictionary:
        - word: 'API'
          pronunciation: 'A P I'
        - word: 'OAuth'
          phoneme: 'əʊɔːθ'
```

### Voice Design

Generate custom voices from descriptions:

```yaml
providers:
  - id: elevenlabs:tts
    config:
      voiceDesign:
        name: Custom Voice
        description: A middle-aged American male with a deep, authoritative tone
        gender: male
        age: middle_aged
        accent: american
```

### LLM Cascading

Optimize costs with automatic fallback:

```yaml
providers:
  - id: elevenlabs:agents
    config:
      llmCascade:
        primary: gpt-4o
        fallback:
          - gpt-4o-mini
          - gpt-3.5-turbo
        cascadeOnError: true
        cascadeOnLatency:
          enabled: true
          maxLatencyMs: 5000
```

### Multi-voice Conversations

Different voices for different characters:

```yaml
providers:
  - id: elevenlabs:agents
    config:
      multiVoice:
        characters:
          - name: Agent
            voiceId: 21m00Tcm4TlvDq8ikWAM
            role: Customer support representative
          - name: Customer
            voiceId: 2EiwWnXFnvU5JabPnv8n
            role: Customer seeking help
```

### Phone Integration

Test agents with real phone calls:

```yaml
providers:
  - id: elevenlabs:agents
    config:
      phoneConfig:
        provider: twilio
        twilioAccountSid: ${TWILIO_ACCOUNT_SID}
        twilioAuthToken: ${TWILIO_AUTH_TOKEN}
        twilioPhoneNumber: '+1234567890' # Quoted so YAML keeps it a string
```

## Cost Tracking

ElevenLabs usage is tracked automatically. The rates below are approximate:

**TTS Costs:**

- Flash v2.5: ~$0.015 per 1,000 characters
- Turbo v2.5: ~$0.02 per 1,000 characters
- Multilingual v2: ~$0.03 per 1,000 characters

**STT Costs:**

- ~$0.10 per minute of audio

**Agent Costs:**

- Based on conversation duration (~$0.10-0.50 per minute depending on LLM)

**Supporting API Costs:**

- Audio Isolation: ~$0.10 per minute
- Forced Alignment: ~$0.05 per minute

Costs appear in eval results, and a `cost` assertion enforces a per-test budget:

```yaml
tests:
  - assert:
      - type: cost
        threshold: 0.50 # Max $0.50 per test
```

## Popular Voices

Common voice IDs and names:

| Name   | ID                   | Description        |
| ------ | -------------------- | ------------------ |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Calm, clear female |
| Clyde  | 2EiwWnXFnvU5JabPnv8n | Warm male          |
| Drew   | 29vD33N1CtxCmqQRPOHJ | Well-rounded male  |
| Paul   | 5Q0t7uMcjvnagumLfvZi | Casual male        |
| Domi   | AZnzlk1XvdvUeBnXmlld | Energetic female   |
| Bella  | EXAVITQu4vr4xnSDxMaL | Expressive female  |
| Antoni | ErXwobaYiN019PkySvjV | Deep male          |
| Elli   | MF3mGyEYCl7XYWbV9V6O | Young female       |

## Common Workflows

### Voice Quality Testing

Compare voice quality across models and voices:

```yaml
prompts:
  - 'The quick brown fox jumps over the lazy dog. This sentence contains every letter of the alphabet.'

providers:
  # Flash Model (Fastest)
  - id: elevenlabs:tts
    label: flash-model
    config:
      modelId: eleven_flash_v2_5
      voiceId: rachel

  # Turbo Model (Best Quality)
  - id: elevenlabs:tts
    label: turbo-model
    config:
      modelId: eleven_turbo_v2_5
      voiceId: rachel

tests:
  - description: Flash model completes quickly
    provider: flash-model
    assert:
      - type: latency
        threshold: 1000

  - description: Turbo model stays within cost budget
    provider: turbo-model
    assert:
      - type: cost
        threshold: 0.01
```

### Transcription Accuracy Pipeline

Test end-to-end TTS → STT accuracy:

```yaml
prompts:
  - |
    The meeting is scheduled for Thursday at 2 PM in conference room B.
    Please bring your laptop and quarterly report.

providers:
  - id: elevenlabs:tts:rachel
    label: tts-generator
    config:
      modelId: eleven_flash_v2_5

  - id: elevenlabs:stt
    label: stt-transcriber
    config:
      calculateWER: true

tests:
  - vars:
      referenceText: 'The meeting is scheduled for Thursday at 2 PM in conference room B. Please bring your laptop and quarterly report.'
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          if (result.wer_result) {
            return result.wer_result.wer < 0.03; // Less than 3% error
          }
          return true;
```

### Agent Regression Testing

Ensure agent changes don't degrade performance:

```yaml
prompts:
  - |
    User: I need to cancel my subscription
    User: Yes, I'm sure
    User: Account email is user@example.com

providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        prompt: You are a customer service agent. Always confirm cancellations.
        llmModel: gpt-4o
      evaluationCriteria:
        - name: confirmation_requested
          description: Agent asks for confirmation before canceling
          weight: 1.0
          passingThreshold: 0.9
        - name: professional_tone
          description: Agent maintains professional tone
          weight: 0.8
          passingThreshold: 0.8

tests:
  - description: Agent handles cancellation properly
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          const criteria = result.analysis.evaluation_criteria_results;
          return criteria.every(c => c.passed);
```

## Best Practices

### 1. Choose the Right Model

- **Flash v2.5**: Use for real-time applications, live streaming, or when latency is critical (under 200ms)
- **Turbo v2.5**: Use for high-quality pre-recorded content where quality matters more than speed
- **Multilingual v2**: Use for non-English languages or when switching between languages
- **Monolingual v1**: Use for English-only content requiring the highest quality

### 2. Optimize Voice Settings

**For natural conversation:**

```yaml
voiceSettings:
  stability: 0.5 # More variation
  similarity_boost: 0.75
  speed: 1.0
```

**For consistent narration:**

```yaml
voiceSettings:
  stability: 0.8 # Less variation
  similarity_boost: 0.85
  speed: 0.95
```

**For expressiveness:**

```yaml
voiceSettings:
  stability: 0.3 # High variation
  similarity_boost: 0.5
  style: 0.8 # Amplify style
  speed: 1.1
```

### 3. Cost Optimization

**Use caching for repeated phrases:**

```yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      cache: true
      cacheTTL: 86400 # 24 hours
```

**Implement LLM cascading for agents:**

```yaml
providers:
  - id: elevenlabs:agents
    config:
      llmCascade:
        primary: gpt-4o-mini # Cheaper first
        fallback:
          - gpt-4o # Better fallback
        cascadeOnError: true
```

**Test with shorter prompts during development:**

```yaml
providers:
  - id: elevenlabs:tts:rachel

tests:
  - vars:
      shortPrompt: 'Test' # Use during dev
      fullPrompt: 'Full production message'
```

### 4. Agent Testing Strategy

**Start simple, add complexity incrementally.** Each phase below replaces the previous `evaluationCriteria` list:

```yaml
# Phase 1: Basic functionality
evaluationCriteria:
  - name: responds
    description: Agent responds to user
    weight: 1.0

# Phase 2: Add quality checks
evaluationCriteria:
  - name: responds
    weight: 0.8
  - name: accurate
    description: Response is factually correct
    weight: 1.0

# Phase 3: Add conversation flow
evaluationCriteria:
  - name: responds
    weight: 0.6
  - name: accurate
    weight: 1.0
  - name: natural_flow
    description: Conversation feels natural
    weight: 0.8
```

### 5. Audio Quality Assurance

**Always test on target platforms:**

```yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      outputFormat: mp3_44100_128 # Good for web
      # outputFormat: pcm_44100 # Better for phone systems
      # outputFormat: mp3_22050_32 # Smaller files for mobile
```

**Test with diverse content:**

```yaml
prompts:
  # Numbers and dates
  - 'Your appointment is on March 15th at 3:30 PM. Confirmation number: 4829.'

  # Technical terms
  - 'The API returns a JSON response with OAuth2 authentication tokens.'

  # Multi-language
  - 'Bonjour! Welcome to our multilingual support.'

  # Edge cases
  - 'Hello... um... can you hear me? Testing, 1, 2, 3.'
```

### 6. Monitoring and Observability

**Track key metrics:**

```yaml
tests:
  - assert:
      # Latency thresholds
      - type: latency
        threshold: 2000

      # Cost budgets
      - type: cost
        threshold: 0.50

      # Quality metrics
      - type: javascript
        value: |
          // Track custom metrics
          const result = JSON.parse(output);
          if (result.audio) {
            console.log('Audio size:', result.audio.sizeBytes);
            console.log('Format:', result.audio.format);
          }
          return true;
```

**Use labels for organized results:**

```yaml
providers:
  - label: v1-baseline
    id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5

  - label: v2-improved
    id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5
      voiceSettings:
        stability: 0.6 # Tweaked setting
```

## Troubleshooting

### API Key Issues

**Error: `ELEVENLABS_API_KEY environment variable is not set`**

Solution: Ensure your API key is properly set:

```sh
# Check if key is set
echo $ELEVENLABS_API_KEY

# Set it if missing
export ELEVENLABS_API_KEY=your_key_here

# Or add to your shell profile
echo 'export ELEVENLABS_API_KEY=your_key' >> ~/.zshrc
source ~/.zshrc
```

### Authentication Errors

**Error: `401 Unauthorized`**

Solution: Verify your API key is valid:

```sh
# Test API key directly
curl -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/voices
```

If this fails, regenerate your API key at [ElevenLabs Settings](https://elevenlabs.io/app/settings/api-keys).
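
The key check can also be scripted so that it reports only the HTTP status rather than printing the full voice list. The following is a minimal sketch, not part of promptfoo or the ElevenLabs tooling: `explain_status` and `check_elevenlabs_key` are hypothetical helper names, and only the `/v1/voices` endpoint and the `xi-api-key` header are taken from the command above.

```sh
# Map an HTTP status code to a short hint (hypothetical helper).
explain_status() {
  case "$1" in
    200) echo "ok: API key accepted" ;;
    401) echo "unauthorized: regenerate or re-export your API key" ;;
    429) echo "rate limited: wait and retry" ;;
    *) echo "unexpected status: $1" ;;
  esac
}

# Call the voices endpoint, discard the body, and explain the status code.
# Nothing runs until you invoke this function yourself.
check_elevenlabs_key() {
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "xi-api-key: $ELEVENLABS_API_KEY" \
    https://api.elevenlabs.io/v1/voices)
  explain_status "$status"
}
```

After sourcing the snippet, run `check_elevenlabs_key` in the same shell where `ELEVENLABS_API_KEY` is exported. If the request cannot connect at all, `curl` reports status `000`, which falls through to the "unexpected status" branch.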
### Rate Limiting

**Error: `429 Too Many Requests`**

Solution: Add retry logic and respect rate limits:

```yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      retries: 3 # Retry failed requests
      timeout: 30000 # Allow time for retries
```

For high-volume testing, consider:

- Spreading tests over time
- Upgrading to a paid plan
- Using caching to avoid redundant requests

### Audio File Issues

**Error: `Failed to read audio file` or `Unsupported audio format`**

Solution: Ensure audio files are accessible and in supported formats:

```yaml
providers:
  - id: elevenlabs:stt
    config:
      audioFormat: mp3 # Supported: mp3, wav, flac, ogg, webm, m4a
```

Verify that the file exists and check its type:

```sh
ls -lh /path/to/audio.mp3
file /path/to/audio.mp3
```

### Agent Conversation Timeouts

**Error: `Conversation timeout after X turns`**

Solution: Adjust conversation limits:

```yaml
providers:
  - id: elevenlabs:agents
    config:
      maxTurns: 20 # Increase if needed
      timeout: 120000 # 2 minutes
```

### Memory Issues with Large Evals

**Error: `JavaScript heap out of memory`**

Solution: Increase Node.js memory:

```sh
export NODE_OPTIONS="--max-old-space-size=4096"
promptfoo eval
```

Or run fewer concurrent tests:

```sh
promptfoo eval --max-concurrency 2
```

### Voice Not Found

**Error: `Voice ID not found`**

Solution: Use the correct voice ID or name:

```yaml
providers:
  # Use official voice ID (preferred)
  - id: elevenlabs:tts:21m00Tcm4TlvDq8ikWAM

  # Or use voice name (case-sensitive)
  - id: elevenlabs:tts:Rachel
```

List available voices:

```sh
curl -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/voices
```

### Cost Tracking Inaccuracies

**Issue: Cost estimates don't match billing**

Solution: Reported costs are estimates, calculated as follows:

- TTS: Character count × model rate
- STT: Audio duration × per-minute rate
- Agents: Conversation duration × LLM rates

For exact costs, check your [ElevenLabs billing dashboard](https://elevenlabs.io/app/usage).
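
To sanity-check an estimate by hand, the formulas above reduce to simple multiplications. The sketch below reproduces them with `awk`; the function names are hypothetical, and the rates passed in are the approximate figures from the Cost Tracking section, not authoritative pricing.

```sh
# Estimate TTS cost: character count x rate per 1,000 characters.
# Usage: estimate_tts_cost <characters> <usd_per_1000_chars>
estimate_tts_cost() {
  awk -v chars="$1" -v rate="$2" 'BEGIN { printf "%.4f\n", chars / 1000 * rate }'
}

# Estimate STT or agent cost: duration x rate per minute.
# Usage: estimate_per_minute_cost <seconds> <usd_per_minute>
estimate_per_minute_cost() {
  awk -v secs="$1" -v rate="$2" 'BEGIN { printf "%.4f\n", secs / 60 * rate }'
}

estimate_tts_cost 2000 0.015     # 2,000 characters at the approximate Flash v2.5 rate: prints 0.0300
estimate_per_minute_cost 90 0.10 # 90 seconds at the approximate STT rate: prints 0.1500
```

If these hand estimates and the eval results agree but your invoice differs, the gap most likely comes from plan-specific pricing, which only the billing dashboard reflects.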
## Examples

Complete working examples:

- [TTS Basic](https://github.com/promptfoo/promptfoo/tree/main/examples/provider-elevenlabs/tts) - Simple voice generation
- [TTS Advanced](https://github.com/promptfoo/promptfoo/tree/main/examples/provider-elevenlabs/tts-advanced) - Voice design, streaming, pronunciation
- [STT](https://github.com/promptfoo/promptfoo/tree/main/examples/provider-elevenlabs/stt) - Transcription with diarization
- [Agents Basic](https://github.com/promptfoo/promptfoo/tree/main/examples/provider-elevenlabs/agents) - Simple agent testing

## Learn More

### Promptfoo Resources

- [Evaluating ElevenLabs voice AI](/docs/guides/evaluate-elevenlabs/) - Step-by-step tutorial

### ElevenLabs Resources

- [ElevenLabs API Documentation](https://elevenlabs.io/docs/introduction)
- [Voice Library](https://elevenlabs.io/voice-library) - Browse and preview voices
- [Conversational AI Docs](https://elevenlabs.io/docs/conversational-ai) - Agent setup guide
- [Pricing](https://elevenlabs.io/pricing) - Plan comparison
- [Status Page](https://status.elevenlabs.io/) - API status and incidents