---
description: Use Nscale Serverless Inference API with promptfoo for cost-effective AI model evaluation and testing
---

# Nscale

The Nscale provider lets you use [Nscale's Serverless Inference API](https://nscale.com/serverless) models with promptfoo. Nscale offers cost-effective AI inference with up to 80% savings compared to other providers, zero rate limits, and no cold starts.

## Setup

Set your Nscale service token as an environment variable:

```bash
export NSCALE_SERVICE_TOKEN=your_service_token_here
```

Alternatively, you can add it to your `.env` file:

```env
NSCALE_SERVICE_TOKEN=your_service_token_here
```

### Obtaining Credentials

You can obtain service tokens by:

1. Signing up at [Nscale](https://nscale.com/)
2. Navigating to your account settings
3. Going to the "Service Tokens" section

## Configuration

To use Nscale models in your promptfoo configuration, use the `nscale:` prefix followed by the model name:

```yaml
providers:
  - nscale:openai/gpt-oss-120b
  - nscale:meta/llama-3.3-70b-instruct
  - nscale:qwen/qwen-3-235b-a22b-instruct
```

## Model Types

Nscale supports different types of models through specific endpoint formats:

### Chat Completion Models (Default)

For chat completion models, you can use either format:

```yaml
providers:
  - nscale:chat:openai/gpt-oss-120b
  - nscale:openai/gpt-oss-120b # Defaults to chat
```

### Completion Models

For text completion models:

```yaml
providers:
  - nscale:completion:openai/gpt-oss-20b
```

### Embedding Models

For embedding models:

```yaml
providers:
  - nscale:embedding:qwen/qwen3-embedding-8b
  - nscale:embeddings:qwen/qwen3-embedding-8b # Alternative format
```

## Popular Models

Nscale offers a wide range of popular AI models:

### Text Generation Models

| Model                         | Provider Format                                 | Use Case                            |
| ----------------------------- | ----------------------------------------------- | ----------------------------------- |
| GPT OSS 120B                  | `nscale:openai/gpt-oss-120b`                    | General-purpose reasoning and tasks |
| GPT OSS 20B                   | `nscale:openai/gpt-oss-20b`                     | Lightweight general-purpose model   |
| Qwen 3 235B Instruct          | `nscale:qwen/qwen-3-235b-a22b-instruct`         | Large-scale language understanding  |
| Qwen 3 235B Instruct 2507     | `nscale:qwen/qwen-3-235b-a22b-instruct-2507`    | Latest Qwen 3 235B variant          |
| Qwen 3 4B Thinking 2507       | `nscale:qwen/qwen-3-4b-thinking-2507`           | Reasoning and thinking tasks        |
| Qwen 3 8B                     | `nscale:qwen/qwen-3-8b`                         | Mid-size general-purpose model      |
| Qwen 3 14B                    | `nscale:qwen/qwen-3-14b`                        | Enhanced reasoning capabilities     |
| Qwen 3 32B                    | `nscale:qwen/qwen-3-32b`                        | Large-scale reasoning and analysis  |
| Qwen 2.5 Coder 3B Instruct    | `nscale:qwen/qwen-2.5-coder-3b-instruct`        | Lightweight code generation         |
| Qwen 2.5 Coder 7B Instruct    | `nscale:qwen/qwen-2.5-coder-7b-instruct`        | Code generation and programming     |
| Qwen 2.5 Coder 32B Instruct   | `nscale:qwen/qwen-2.5-coder-32b-instruct`       | Advanced code generation            |
| Qwen QwQ 32B                  | `nscale:qwen/qwq-32b`                           | Specialized reasoning model         |
| Llama 3.3 70B Instruct        | `nscale:meta/llama-3.3-70b-instruct`            | High-quality instruction following  |
| Llama 3.1 8B Instruct         | `nscale:meta/llama-3.1-8b-instruct`             | Efficient instruction following     |
| Llama 4 Scout 17B             | `nscale:meta/llama-4-scout-17b-16e-instruct`    | Image-Text-to-Text capabilities     |
| DeepSeek R1 Distill Llama 70B | `nscale:deepseek/deepseek-r1-distill-llama-70b` | Efficient reasoning model           |
| DeepSeek R1 Distill Llama 8B  | `nscale:deepseek/deepseek-r1-distill-llama-8b`  | Lightweight reasoning model         |
| DeepSeek R1 Distill Qwen 1.5B | `nscale:deepseek/deepseek-r1-distill-qwen-1.5b` | Ultra-lightweight reasoning         |
| DeepSeek R1 Distill Qwen 7B   | `nscale:deepseek/deepseek-r1-distill-qwen-7b`   | Compact reasoning model             |
| DeepSeek R1 Distill Qwen 14B  | `nscale:deepseek/deepseek-r1-distill-qwen-14b`  | Mid-size reasoning model            |
| DeepSeek R1 Distill Qwen 32B  | `nscale:deepseek/deepseek-r1-distill-qwen-32b`  | Large reasoning model               |
| Devstral Small 2505           | `nscale:mistral/devstral-small-2505`            | Code generation and development     |
| Mixtral 8x22B Instruct        | `nscale:mistral/mixtral-8x22b-instruct-v0.1`    | Large mixture-of-experts model      |

### Embedding Models

| Model               | Provider Format                            | Use Case                       |
| ------------------- | ------------------------------------------ | ------------------------------ |
| Qwen 3 Embedding 8B | `nscale:embedding:Qwen/Qwen3-Embedding-8B` | Text embeddings and similarity |

### Text-to-Image Models

| Model                 | Provider Format                                         | Use Case                      |
| --------------------- | ------------------------------------------------------- | ----------------------------- |
| Flux.1 Schnell        | `nscale:image:BlackForestLabs/FLUX.1-schnell`           | Fast image generation         |
| Stable Diffusion XL   | `nscale:image:stabilityai/stable-diffusion-xl-base-1.0` | High-quality image generation |
| SDXL Lightning 4-step | `nscale:image:ByteDance/SDXL-Lightning-4step`           | Ultra-fast image generation   |
| SDXL Lightning 8-step | `nscale:image:ByteDance/SDXL-Lightning-8step`           | Balanced speed and quality    |

## Configuration Options

Nscale supports standard OpenAI-compatible parameters:

```yaml
providers:
  - id: nscale:openai/gpt-oss-120b
    config:
      temperature: 0.7
      max_tokens: 1024
      top_p: 0.9
      frequency_penalty: 0.1
      presence_penalty: 0.2
      stop: ['END', 'STOP']
      stream: true
```

### Supported Parameters

- `temperature`: Controls randomness (0.0 to 2.0)
- `max_tokens`: Maximum number of tokens to generate
- `top_p`: Nucleus sampling parameter
- `frequency_penalty`: Reduces repetition based on frequency
- `presence_penalty`: Reduces repetition based on presence
- `stop`: Stop sequences to halt generation
- `stream`: Enable streaming responses
- `seed`: Deterministic sampling seed

## Example Configuration

Here's a complete example configuration:

```yaml
providers:
  - id: nscale:openai/gpt-oss-120b
    config:
      temperature: 0.7
      max_tokens: 512
  - id: nscale:meta/llama-3.3-70b-instruct
    config:
      temperature: 0.5
      max_tokens: 1024

prompts:
  - 'Explain {{concept}} in simple terms'
  - 'What are the key benefits of {{concept}}?'

tests:
  - vars:
      concept: quantum computing
    assert:
      - type: contains
        value: 'quantum'
      - type: llm-rubric
        value: 'Explanation should be clear and accurate'
```

## Pricing

Nscale offers highly competitive pricing:

- **Text Generation**: Starting from $0.01 input / $0.03 output per 1M tokens
- **Embeddings**: $0.04 per 1M tokens
- **Image Generation**: Starting from $0.0008 per megapixel

For the most current pricing information, visit [Nscale's pricing page](https://docs.nscale.com/pricing).

## Key Features

- **Cost-Effective**: Up to 80% savings compared to other providers
- **Zero Rate Limits**: No throttling or request limits
- **No Cold Starts**: Instant response times
- **Serverless**: No infrastructure management required
- **OpenAI Compatible**: Standard API interface
- **Global Availability**: Low-latency inference worldwide

## Error Handling

The Nscale provider includes built-in error handling for common issues:

- Network timeouts and retries
- Rate limiting (though Nscale has zero rate limits)
- Invalid service token errors
- Model availability issues

## Support

For support with the Nscale provider:

- [Nscale Documentation](https://docs.nscale.com/)
- [Nscale Community Discord](https://discord.gg/nscale)
- [promptfoo GitHub Issues](https://github.com/promptfoo/promptfoo/issues)
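
## Reproducible Runs

The `seed` parameter listed under Supported Parameters does not appear in the configuration examples above. The following is a minimal sketch of how it might be set, assuming the value is forwarded to the model as with other OpenAI-compatible endpoints. The seed value `42`, the choice of `temperature: 0`, and the test variable are illustrative, and identical outputs across runs are not guaranteed for every model.

```yaml
providers:
  - id: nscale:openai/gpt-oss-120b
    config:
      temperature: 0 # assumption: low temperature to reduce sampling variance
      seed: 42 # illustrative value; any fixed integer can be used
      max_tokens: 512

prompts:
  - 'Explain {{concept}} in simple terms'

tests:
  - vars:
      concept: quantum computing
    assert:
      - type: contains
        value: 'quantum'
```

Pinning the seed can make it easier to tell the effect of a prompt change apart from sampling noise when comparing eval runs.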