# provider-elevenlabs/tts-advanced (ElevenLabs Advanced TTS Features)

This example demonstrates advanced TTS capabilities:

- **Pronunciation Dictionaries** - Custom pronunciation for technical terms
- **Voice Design** - Generate voices from text descriptions
- **Voice Remixing** - Modify existing voices (style, pacing, gender, age)
- **Streaming with Advanced Features** - Combine streaming with pronunciation control

## Quick Start

```bash
npx promptfoo@latest init --example provider-elevenlabs/tts-advanced
cd provider-elevenlabs/tts-advanced
export ELEVENLABS_API_KEY=your_api_key_here
npx promptfoo@latest eval
```

## Features Demonstrated

### 1. Pronunciation Dictionaries

Control how technical terms, acronyms, and brand names are pronounced.

**Use Case**: Technical documentation, product demos, brand-specific content

```yaml
providers:
  - id: elevenlabs:tts
    config:
      pronunciationRules:
        # Spell out acronyms
        - word: API
          pronunciation: A-P-I

        # Custom pronunciation
        - word: SQL
          pronunciation: sequel

        # Multi-word terms
        - word: PostgreSQL
          pronunciation: post-gres-Q-L

        # Brand names
        - word: OpenAI
          pronunciation: open-A-I
```

**Common Use Cases**:

1. **Technical Content**

   ```yaml
   pronunciationRules:
     - word: JavaScript
       pronunciation: java-script
     - word: TypeScript
       pronunciation: type-script
     - word: Python
       pronunciation: pie-thon
     - word: Node.js
       pronunciation: node-jay-ess
     - word: GraphQL
       pronunciation: graph-Q-L
   ```

2. **Medical/Scientific Terms**

   ```yaml
   pronunciationRules:
     - word: COVID-19
       pronunciation: covid-nineteen
     - word: mRNA
       pronunciation: messenger-R-N-A
     - word: DNA
       pronunciation: D-N-A
   ```

3. **Brand Names & Products**

   ```yaml
   pronunciationRules:
     - word: Anthropic
       pronunciation: an-throw-pick
     - word: Llama
       pronunciation: lama
     - word: ChatGPT
       pronunciation: chat-G-P-T
   ```

### 2. Voice Design

Generate custom voices from natural language descriptions.

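
Before sending a `voiceDesign` block to the API, it can help to check it against the constraints this example documents: a description of at least 10 characters, an `accentStrength` between 0 and 2, and the recommendation to specify `gender` and `age`. The following is a minimal sketch of such a check. `checkVoiceDesign` and its return shape are hypothetical helpers written for illustration; they are not part of promptfoo or the ElevenLabs SDK.

```typescript
// Shape of the `voiceDesign` block as listed in this example's API Reference.
interface VoiceDesign {
  description: string;
  gender?: "male" | "female";
  age?: "young" | "middle_aged" | "old";
  accent?: string;
  accentStrength?: number;
  sampleText?: string;
}

interface VoiceDesignCheck {
  errors: string[]; // values outside the documented limits
  hints: string[]; // optional improvements, not failures
}

// Hypothetical helper: checks a voiceDesign block locally, without calling any API.
function checkVoiceDesign(design: VoiceDesign): VoiceDesignCheck {
  const errors: string[] = [];
  const hints: string[] = [];

  // Troubleshooting section: descriptions need at least 10 characters.
  if (design.description.trim().length < 10) {
    errors.push("description should be at least 10 characters");
  }

  // accentStrength is documented as 0-2 (subtle to strong).
  const strength = design.accentStrength;
  if (strength !== undefined && !(strength >= 0 && strength <= 2)) {
    errors.push("accentStrength should be between 0 and 2");
  }

  // gender and age are optional, but recommended for better results.
  if (design.gender === undefined || design.age === undefined) {
    hints.push("specify both gender and age for better results");
  }

  return { errors, hints };
}

const check = checkVoiceDesign({
  description: "A warm, professional voice with excellent clarity",
  gender: "female",
  age: "middle_aged",
  accentStrength: 0.5,
});
console.log(check.errors.length === 0 ? "voiceDesign looks valid" : check.errors.join("; "));
```

Keeping errors and hints separate lets a script fail fast on out-of-range values while only logging the softer recommendations.
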
**Use Case**: Create unique voices for specific content types or brand identities

```yaml
providers:
  - id: elevenlabs:tts
    config:
      voiceDesign:
        description: A warm, professional voice with excellent clarity and a slight smile in the tone, perfect for technical documentation
        gender: female
        age: middle_aged
        accent: american
        accentStrength: 0.5 # 0-2, subtle to strong
```

**Voice Design Templates**:

#### Professional Voices

```yaml
# Corporate Presenter
voiceDesign:
  description: A confident, authoritative voice with clear articulation, perfect for business presentations
  gender: male
  age: middle_aged
  accent: american

# Educational Instructor
voiceDesign:
  description: A warm, patient voice with excellent clarity, ideal for educational content
  gender: female
  age: middle_aged
  accent: british
```

#### Friendly & Conversational

```yaml
# Customer Service
voiceDesign:
  description: A friendly, approachable voice with a smile in the tone, great for customer interactions
  gender: female
  age: young
  accent: american

# Podcast Host
voiceDesign:
  description: A casual, engaging voice with natural conversational flow, perfect for podcasts
  gender: male
  age: young
  accent: australian
```

#### Narrative & Storytelling

```yaml
# Audiobook Narrator
voiceDesign:
  description: A deep, resonant voice with storytelling quality and emotional range
  gender: male
  age: middle_aged
  accent: british

# Meditation Guide
voiceDesign:
  description: A soothing, tranquil voice with calming tones and gentle pacing
  gender: female
  age: middle_aged
  accent: american
  accentStrength: 0.3
```

### 3. Voice Remixing

Modify existing voices to change their characteristics.

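
When several remixes of one base voice are compared side by side, the provider entries differ only in their label and `voiceRemix` block. The sketch below generates such entries from a map of presets, following the `elevenlabs:tts:<label>` ID pattern used in this example. `remixProviders` and the `ProviderEntry` type are hypothetical names introduced for illustration, and the sketch assumes the config is built from a script rather than written by hand.

```typescript
// Remix options as listed in this example's API Reference.
interface VoiceRemix {
  style?: string;
  pacing?: "slow" | "normal" | "fast";
  gender?: "male" | "female";
  age?: "young" | "middle_aged" | "old";
  accent?: string;
  promptStrength?: "low" | "medium" | "high" | "max";
}

// Hypothetical shape of one entry in the `providers` list.
interface ProviderEntry {
  id: string;
  config: { voiceId: string; voiceRemix: VoiceRemix };
}

// Hypothetical helper: one provider entry per named preset, all sharing one base voice.
function remixProviders(
  voiceId: string,
  presets: Record<string, VoiceRemix>,
): ProviderEntry[] {
  return Object.entries(presets).map(([label, voiceRemix]) => ({
    id: `elevenlabs:tts:${label}`,
    // Copy the preset so later edits to the input do not leak into the config.
    config: { voiceId, voiceRemix: { ...voiceRemix } },
  }));
}

// Base voice ID taken from this example (Rachel).
const providers = remixProviders("21m00Tcm4TlvDq8ikWAM", {
  energetic: { style: "energetic", pacing: "fast", promptStrength: "medium" },
  calm: { style: "calm", pacing: "slow", promptStrength: "high" },
});

console.log(JSON.stringify(providers, null, 2));
```

The printed JSON can be pasted into a config file or returned from a JavaScript-based promptfoo config, so that adding a new remix variant is a one-line change.
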
**Use Case**: Adapt pre-made voices for different contexts or emotions

```yaml
providers:
  # Make a voice more energetic
  - id: elevenlabs:tts:energetic
    config:
      voiceId: 21m00Tcm4TlvDq8ikWAM # Rachel
      voiceRemix:
        style: energetic
        pacing: fast
        promptStrength: medium # low, medium, high, max

  # Make a voice calmer and slower
  - id: elevenlabs:tts:calm
    config:
      voiceId: 21m00Tcm4TlvDq8ikWAM
      voiceRemix:
        style: calm
        pacing: slow
        promptStrength: high
```

**Remix Parameters**:

| Parameter        | Options                                         | Use Case                      |
| ---------------- | ----------------------------------------------- | ----------------------------- |
| `style`          | energetic, calm, professional, casual, dramatic | Match voice to content mood   |
| `pacing`         | slow, normal, fast                              | Adjust speech speed           |
| `gender`         | male, female                                    | Change voice gender           |
| `age`            | young, middle_aged, old                         | Adjust perceived age          |
| `accent`         | american, british, australian, etc.             | Change accent                 |
| `promptStrength` | low, medium, high, max                          | How strongly to apply changes |

**Common Remix Scenarios**:

```yaml
# Sports Commentary (Energetic & Fast)
voiceRemix:
  style: energetic
  pacing: fast
  promptStrength: max

# ASMR Content (Calm & Slow)
voiceRemix:
  style: calm
  pacing: slow
  promptStrength: high

# News Anchor (Professional & Measured)
voiceRemix:
  style: professional
  pacing: normal
  promptStrength: medium

# Storytelling (Dramatic & Expressive)
voiceRemix:
  style: dramatic
  pacing: normal
  promptStrength: high
```

## Advanced Combinations

### Streaming + Pronunciation

Combine real-time streaming with custom pronunciation:

```yaml
providers:
  - id: elevenlabs:tts
    config:
      streaming: true
      pronunciationRules:
        - word: API
          pronunciation: A-P-I
        - word: WebSocket
          pronunciation: web-socket
```

**Benefits**:

- ~75ms first chunk latency
- Custom pronunciation for technical terms
- Ideal for live demos and interactive applications

### Voice Design + Pronunciation

Create a custom voice with domain-specific pronunciation:

```yaml
providers:
  - id: elevenlabs:tts
    config:
      voiceDesign:
        description: A friendly tech educator with clear pronunciation
        gender: female
        age: middle_aged
      pronunciationRules:
        - word: Python
          pronunciation: pie-thon
        - word: JavaScript
          pronunciation: java-script
```

## Cost Optimization

All advanced features use the same character-based pricing as basic TTS:

- ~$0.00002 per character (~$0.02 per 1000 characters)
- Free tier: 10,000 characters/month

**Cost Tracking**:

```yaml
tests:
  - assert:
      - type: cost
        threshold: 0.05 # Max $0.05 per test
```

## Testing Assertions

### Pronunciation Accuracy

```yaml
tests:
  - description: Verify tech terms are included
    vars:
      expectedTerms:
        - API
        - SQL
        - JavaScript
    assert:
      - type: javascript
        value: |
          // Multi-line assertions need an explicit return
          const terms = context.vars.expectedTerms;
          return terms.every(term => output.includes(term));
```

### Voice Quality Comparison

```yaml
tests:
  - description: Compare baseline vs custom pronunciation
    vars:
      baseline: '{{providers[0].output}}'
      custom: '{{providers[1].output}}'
    assert:
      - type: javascript
        value: |
          // Both should succeed
          return !context.vars.baseline.includes('error') &&
            !context.vars.custom.includes('error');
```

### Latency with Advanced Features

```yaml
tests:
  - description: Ensure advanced features don't slow generation
    assert:
      - type: latency
        threshold: 8000 # 8 seconds max
```

## Real-World Use Cases

### 1. Technical Documentation

```yaml
config:
  voiceDesign:
    description: Clear, professional voice for technical content
    gender: female
    age: middle_aged
  pronunciationRules:
    - word: API
      pronunciation: A-P-I
    - word: REST
      pronunciation: rest
    - word: GraphQL
      pronunciation: graph-Q-L
    - word: WebSocket
      pronunciation: web-socket
    - word: JSON
      pronunciation: jay-sawn
    - word: YAML
      pronunciation: yam-mel
```

### 2. Brand-Specific Content

```yaml
config:
  voiceId: your-brand-voice-id
  voiceRemix:
    style: professional
    pacing: normal
  pronunciationRules:
    - word: YourProduct
      pronunciation: your-product
    - word: YourCompany
      pronunciation: your-company
```

### 3. Multi-Language Support

```yaml
# English with British accent
providers:
  - id: elevenlabs:tts:en-gb
    config:
      voiceDesign:
        description: British English speaker
        accent: british
        accentStrength: 1.5

  # English with American accent
  - id: elevenlabs:tts:en-us
    config:
      voiceDesign:
        description: American English speaker
        accent: american
        accentStrength: 1.0
```

### 4. Dynamic Content Adaptation

```yaml
# Morning news (Energetic)
providers:
  - id: elevenlabs:tts:morning
    config:
      voiceId: news-anchor-voice
      voiceRemix:
        style: energetic
        pacing: fast

  # Evening news (Calm)
  - id: elevenlabs:tts:evening
    config:
      voiceId: news-anchor-voice
      voiceRemix:
        style: calm
        pacing: normal
```

## Troubleshooting

### Voice Design Not Working

```text
Error: Voice design failed
```

**Solutions**:

1. Ensure description is detailed (minimum 10 characters)
2. Specify gender and age for better results
3. Check API quota (voice design uses generation credits)

### Pronunciation Not Applied

```text
Warning: Pronunciation dictionary not found
```

**Solutions**:

1. Verify pronunciation rules syntax
2. Ensure words match exactly (case-sensitive)
3. Check that you're not using both `pronunciationDictionaryId` and `pronunciationRules`

### Remix Changes Too Subtle

```text
Issue: Voice sounds the same after remix
```

**Solutions**:

1. Increase `promptStrength` from medium to high or max
2. Make more significant parameter changes
3. Some voices have limited remix range - try a different base voice

## API Reference

### Pronunciation Dictionary Options

| Option                      | Type                  | Description                   |
| --------------------------- | --------------------- | ----------------------------- |
| `pronunciationRules`        | `PronunciationRule[]` | Array of pronunciation rules  |
| `pronunciationDictionaryId` | string                | Use existing dictionary by ID |

**PronunciationRule**:

```typescript
{
  word: string;            // Word to customize
  pronunciation: string;   // Phonetic pronunciation
  phoneme?: string;        // IPA/CMU phoneme (advanced)
  alphabet?: 'ipa' | 'cmu'; // Phonetic alphabet
}
```

### Voice Design Options

```typescript
{
  description: string;      // Natural language description
  gender?: 'male' | 'female';
  age?: 'young' | 'middle_aged' | 'old';
  accent?: string;          // e.g., 'british', 'american'
  accentStrength?: number;  // 0-2, default 1.0
  sampleText?: string;      // Optional sample for preview
}
```

### Voice Remix Options

```typescript
{
  style?: string;           // e.g., 'energetic', 'calm'
  pacing?: 'slow' | 'normal' | 'fast';
  gender?: 'male' | 'female';
  age?: 'young' | 'middle_aged' | 'old';
  accent?: string;
  promptStrength?: 'low' | 'medium' | 'high' | 'max';
}
```

## Related Examples

- [Basic TTS](../tts/) - Voice comparison and basic features
- [STT](../stt/) - Speech-to-Text transcription
- [Streaming TTS](../tts/#streaming) - Real-time voice generation

## Resources

- [ElevenLabs Voice Design Docs](https://elevenlabs.io/docs/voice-design)
- [Pronunciation Dictionary Guide](https://elevenlabs.io/docs/pronunciation)
- [Voice Remixing API](https://elevenlabs.io/docs/voice-remix)
- [Supported Accents](https://elevenlabs.io/voice-library)