---
sidebar_label: Replicate
description: "Deploy and run open-source AI models in the cloud using Replicate's scalable API for image, text, and audio generation"
---

# Replicate

Replicate is an API for machine learning models. It currently hosts models like [Llama v2](https://replicate.com/replicate/llama70b-v2-chat), [Gemma](https://replicate.com/google-deepmind/gemma-7b-it), and [Mistral/Mixtral](https://replicate.com/mistralai/mixtral-8x7b-instruct-v0.1).

:::info

The Replicate provider in promptfoo uses direct HTTP requests to the Replicate API, so no additional SDK installation is required.

:::

To run a model, specify the Replicate model name and optionally the version:

```
# With specific version (recommended for consistency)
replicate:replicate/llama70b-v2-chat:e951f18578850b652510200860fc4ea62b3b16fac280f83ff32282f87bbd2e48

# Without version (uses latest)
replicate:meta/meta-llama-3-8b-instruct
```

:::tip

For production use, always specify the version to ensure consistent results. You can find version IDs on the model's page on Replicate.

:::

## Examples

Here's an example of using Llama on Replicate. In the case of Llama, the version hash and everything under `config` are optional:

```yaml
providers:
  - id: replicate:meta/llama-2-7b-chat
    config:
      temperature: 0.01
      max_length: 1024
      prompt:
        prefix: '[INST] '
        suffix: ' [/INST]'
```

Here's an example of using Gemma on Replicate.
Note that unlike Llama, it does not have a default version, so we specify the model version:

```yaml
providers:
  - id: replicate:google-deepmind/gemma-7b-it:2790a695e5dcae15506138cc4718d1106d0d475e6dca4b1d43f42414647993d5
    config:
      temperature: 0.01
      max_new_tokens: 1024
      prompt:
        prefix: "user\n"
        suffix: "\nmodel"
```

## Configuration

The Replicate provider supports several [configuration options](https://github.com/promptfoo/promptfoo/blob/main/src/providers/replicate.ts#L9-L17) that customize model behavior:

| Parameter            | Description                                                   |
| -------------------- | ------------------------------------------------------------- |
| `temperature`        | Controls randomness in the generation process.                |
| `max_length`         | Specifies the maximum length of the generated text.           |
| `max_new_tokens`     | Limits the number of new tokens to generate.                  |
| `top_p`              | Nucleus sampling: a float between 0 and 1.                    |
| `top_k`              | Top-k sampling: number of highest probability tokens to keep. |
| `repetition_penalty` | Penalizes repetition of words in the generated text.          |
| `system_prompt`      | Sets a system-level prompt for all requests.                  |
| `stop_sequences`     | Specifies stopping sequences that halt the generation.        |
| `seed`               | Sets a seed for reproducible results.                         |

:::warning

Not every model supports every completion parameter. Be sure to review the API provided by the model beforehand.

:::

These parameters are supported for all models:

| Parameter       | Description                                                              |
| --------------- | ------------------------------------------------------------------------ |
| `apiKey`        | The API key for authentication with Replicate.                           |
| `prompt.prefix` | String added before each prompt. Useful for instruction/chat formatting. |
| `prompt.suffix` | String added after each prompt. Useful for instruction/chat formatting.  |

Supported environment variables:

- `REPLICATE_API_TOKEN` - Your Replicate API key.
- `REPLICATE_API_KEY` - An alternative to `REPLICATE_API_TOKEN` for your API key.
- `REPLICATE_MAX_LENGTH` - Specifies the maximum length of the generated text.
- `REPLICATE_TEMPERATURE` - Controls randomness in the generation process.
- `REPLICATE_REPETITION_PENALTY` - Penalizes repetition of words in the generated text.
- `REPLICATE_TOP_P` - Controls the nucleus sampling: a float between 0 and 1.
- `REPLICATE_TOP_K` - Controls the top-k sampling: the number of highest probability vocabulary tokens to keep for top-k-filtering.
- `REPLICATE_SEED` - Sets a seed for reproducible results.
- `REPLICATE_STOP_SEQUENCES` - Specifies stopping sequences that halt the generation.
- `REPLICATE_SYSTEM_PROMPT` - Sets a system-level prompt for all requests.

## Images

Image generators such as SDXL can be used like so:

```yaml
prompts:
  - 'Generate an image: {{subject}}'

providers:
  - id: replicate:image:stability-ai/sdxl:7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc
    config:
      width: 768
      height: 768
      num_inference_steps: 50

tests:
  - vars:
      subject: fruit loops
```

## Supported Parameters for Images

These parameters are supported for image generation models:

| Parameter             | Description                                                   |
| --------------------- | ------------------------------------------------------------- |
| `width`               | The width of the generated image.                             |
| `height`              | The height of the generated image.                            |
| `refine`              | Which refine style to use.                                    |
| `apply_watermark`     | Apply a watermark to the generated image.                     |
| `num_inference_steps` | The number of inference steps to use during image generation. |

:::warning

Not every model supports every image parameter. Be sure to review the API provided by the model beforehand.

:::

Supported environment variables for images:

- `REPLICATE_API_TOKEN` - Your Replicate API key.
- `REPLICATE_API_KEY` - An alternative to `REPLICATE_API_TOKEN` for your API key.

:::warning

**Important:** Replicate image URLs are temporary and typically expire after 24 hours.
If you need to preserve generated images, download them immediately or use the automated download hook described below.

:::

## Downloading Generated Images

Since Replicate image URLs expire, you may want to automatically download and save images during evaluation. You can use an `afterEach` hook for this purpose.

Create a file `save-images.js`:

```javascript
const fs = require('fs');
const path = require('path');

// For Node >= 20, fetch is available globally
const { fetch } = globalThis;

/**
 * Downloads and saves Replicate generated images after each test
 */
module.exports = {
  async hook(hookName, context) {
    // Only run for afterEach hook and when we have an output
    if (hookName !== 'afterEach') {
      return;
    }

    // Extract URL from markdown image format
    const output = context.result?.response?.output;
    if (!output || typeof output !== 'string') {
      return;
    }

    const match = output.match(/!\[.*?\]\((.*?)\)/);
    const imageUrl = match?.[1];
    if (!imageUrl || !imageUrl.includes('replicate.delivery')) {
      return;
    }

    try {
      // Create images directory if it doesn't exist
      const imagesDir = path.join(__dirname, 'images');
      await fs.promises.mkdir(imagesDir, { recursive: true });

      // Generate filename from test description and timestamp
      const testDesc = context.test.description || 'unnamed';
      const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
      const sanitizedName = testDesc
        .replace(/[^a-z0-9\s-]/gi, '')
        .trim()
        .replace(/\s+/g, '-')
        .toLowerCase();
      const filename = `${sanitizedName}-${timestamp}.png`;
      const filepath = path.join(imagesDir, filename);

      // Download and save the image
      const response = await fetch(imageUrl);
      if (!response.ok) {
        throw new Error(`HTTP error: ${response.status}`);
      }

      const buffer = await response.arrayBuffer();
      await fs.promises.writeFile(filepath, Buffer.from(buffer));
      console.log(`✓ Saved image: ${filename}`);
    } catch (error) {
      console.error(`❌ Failed to save image: ${error.message}`);
    }
  },
};
```

Then reference it in your promptfoo configuration:
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
extensions:
  - file://save-images.js:hook

prompts:
  - 'Generate an image: {{subject}}'

providers:
  - replicate:image:black-forest-labs/flux-dev

tests:
  - vars:
      subject: a beautiful sunset over mountains
```

This hook will automatically download all generated images to an `images/` directory with descriptive filenames based on the test description and timestamp.
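To preview the filenames the hook produces, the naming and URL-extraction steps can be pulled out into standalone functions. The sketch below is illustrative only: `buildImageFilename` and `extractImageUrl` are hypothetical helper names, not part of promptfoo, and they simply mirror the logic already shown in `save-images.js`.

```javascript
// Hypothetical helper (not a promptfoo API): mirrors the filename logic
// used in save-images.js so it can be checked in isolation.
function buildImageFilename(description, date = new Date()) {
  // ISO timestamps contain ':' and '.', which are awkward in filenames
  const timestamp = date.toISOString().replace(/[:.]/g, '-');
  const sanitizedName = (description || 'unnamed')
    .replace(/[^a-z0-9\s-]/gi, '') // drop punctuation and symbols
    .trim()
    .replace(/\s+/g, '-') // collapse whitespace runs into hyphens
    .toLowerCase();
  return `${sanitizedName}-${timestamp}.png`;
}

// Hypothetical helper: returns the URL from the first markdown image
// (`![alt](url)`) in the output, or undefined if there is none.
function extractImageUrl(output) {
  if (typeof output !== 'string') {
    return undefined;
  }
  return output.match(/!\[.*?\]\((.*?)\)/)?.[1];
}

const name = buildImageFilename(
  'A beautiful sunset, over mountains!',
  new Date('2024-01-02T03:04:05.678Z'),
);
console.log(name);
// a-beautiful-sunset-over-mountains-2024-01-02T03-04-05-678Z.png
```

Because the hook falls back to `unnamed` when a test has no `description`, adding a `description` field to each test case is what makes the saved filenames descriptive.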