--- sidebar_label: Cerebras description: Configure Cerebras' Llama 4 Scout and Llama 3 models through their OpenAI-compatible API for enterprise-grade inference with advanced MoE architecture support --- # Cerebras This provider enables you to use Cerebras models through their [Inference API](https://docs.cerebras.ai). Cerebras offers an OpenAI-compatible API for various large language models including Llama models, DeepSeek, and more. You can use it as a drop-in replacement for applications currently using the [OpenAI API](/docs/providers/openai/) chat endpoints. ## Setup Generate an API key from the Cerebras platform. Then set the `CEREBRAS_API_KEY` environment variable or pass it via the `apiKey` configuration field. ```bash export CEREBRAS_API_KEY=your_api_key_here ``` Or in your config: ```yaml providers: - id: cerebras:llama3.1-8b config: apiKey: your_api_key_here ``` ## Provider Format The Cerebras provider uses a simple format: - `cerebras:` - Using the chat completion interface for all models ## Available Models The Cerebras Inference API officially supports these models: - `llama-4-scout-17b-16e-instruct` - Llama 4 Scout 17B model with 16 expert MoE - `llama3.1-8b` - Llama 3.1 8B model - `llama-3.3-70b` - Llama 3.3 70B model - `deepSeek-r1-distill-llama-70B` (private preview) To get the current list of available models, use the `/models` endpoint: ```bash curl https://api.cerebras.ai/v1/models -H "Authorization: Bearer your_api_key_here" ``` ## Parameters The provider accepts standard OpenAI chat parameters: - `temperature` - Controls randomness (0.0 to 1.5) - `max_completion_tokens` - Maximum number of tokens to generate - `top_p` - Nucleus sampling parameter - `stop` - Sequences where the API will stop generating further tokens - `seed` - Seed for deterministic generation - `response_format` - Controls the format of the model response (e.g., for JSON output) - `logprobs` - Whether to return log probabilities of the output tokens ## Advanced Capabilities ### Structured Outputs Cerebras models support structured outputs with JSON schema enforcement to ensure your AI-generated responses follow a consistent, predictable format. This makes it easier to build reliable applications that can process AI outputs programmatically. To use structured outputs, set the `response_format` parameter to include a JSON schema: ```yaml providers: - id: cerebras:llama-4-scout-17b-16e-instruct config: response_format: type: 'json_schema' json_schema: name: 'movie_schema' strict: true schema: type: 'object' properties: title: { 'type': 'string' } director: { 'type': 'string' } year: { 'type': 'integer' } required: ['title', 'director', 'year'] additionalProperties: false ``` Alternatively, you can use simple JSON mode by setting `response_format` to `{"type": "json_object"}`. ### Tool Use Cerebras models support tool use (function calling), enabling LLMs to programmatically execute specific tasks. To use this feature, define the tools the model can use: ```yaml providers: - id: cerebras:llama-4-scout-17b-16e-instruct config: tools: - type: 'function' function: name: 'calculate' description: 'A calculator that can perform basic arithmetic operations' parameters: type: 'object' properties: expression: type: 'string' description: 'The mathematical expression to evaluate' required: ['expression'] strict: true ``` When using tool calling, you'll need to process the model's response and handle any tool calls it makes, then provide the results back to the model for the final response. ## Example Configuration ```yaml # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json description: Cerebras model evaluation prompts: - You are an expert in {{topic}}. Explain {{question}} in simple terms. providers: - id: cerebras:llama3.1-8b config: temperature: 0.7 max_completion_tokens: 1024 - id: cerebras:llama-3.3-70b config: temperature: 0.7 max_completion_tokens: 1024 tests: - vars: topic: quantum computing question: Explain quantum entanglement in simple terms assert: - type: contains-any value: ['entangled', 'correlated', 'quantum state'] - vars: topic: machine learning question: What is the difference between supervised and unsupervised learning? assert: - type: contains value: 'labeled data' ``` ## See Also - [OpenAI Provider](/docs/providers/openai) - Compatible API format used by Cerebras - [Configuration Reference](/docs/configuration/reference.md) - Full configuration options for providers - [Cerebras API Documentation](https://docs.cerebras.ai) - Official API reference - [Cerebras Structured Outputs Guide](https://docs.cerebras.ai/capabilities/structured-outputs/) - Learn more about JSON schema enforcement - [Cerebras Tool Use Guide](https://docs.cerebras.ai/capabilities/tool-use/) - Learn more about tool calling capabilities