---
sidebar_label: Ollama
description: "Run open-source LLMs locally using Ollama's streamlined interface for rapid prototyping and offline model evaluation"
---

# Ollama

The `ollama` provider is compatible with [Ollama](https://github.com/jmorganca/ollama), which enables access to Llama, Mixtral, Mistral, and more.

You can use its `/api/generate` endpoint by specifying any of the following providers from the [Ollama library](https://ollama.ai/library):

- `ollama:completion:llama3.2`
- `ollama:completion:llama3.3`
- `ollama:completion:phi4`
- `ollama:completion:qwen2.5`
- `ollama:completion:granite3.2`
- `ollama:completion:deepcoder`
- `ollama:completion:codellama`
- `ollama:completion:llama2-uncensored`
- ...

Or, use the `/api/chat` endpoint for chat-formatted prompts:

- `ollama:chat:llama3.2`
- `ollama:chat:llama3.2:1b`
- `ollama:chat:llama3.2:3b`
- `ollama:chat:llama3.3`
- `ollama:chat:llama3.3:70b`
- `ollama:chat:phi4`
- `ollama:chat:phi4-mini`
- `ollama:chat:qwen2.5`
- `ollama:chat:qwen2.5:14b`
- `ollama:chat:qwen2.5:72b`
- `ollama:chat:qwq:32b`
- `ollama:chat:granite3.2`
- `ollama:chat:granite3.2:2b`
- `ollama:chat:granite3.2:8b`
- `ollama:chat:deepcoder`
- `ollama:chat:deepcoder:1.5b`
- `ollama:chat:deepcoder:14b`
- `ollama:chat:mixtral:8x7b`
- `ollama:chat:mixtral:8x22b`
- ...

We also support the `/api/embeddings` endpoint via `ollama:embeddings:` for model-graded assertions such as [similarity](/docs/configuration/expected-outputs/similar/).

Supported environment variables:

- `OLLAMA_BASE_URL` - protocol, host name, and port (defaults to `http://localhost:11434`)
- `OLLAMA_API_KEY` - (optional) API key that is passed as the Bearer token in the `Authorization` header when calling the API
- `REQUEST_TIMEOUT_MS` - request timeout in milliseconds
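For example, you might export these in your shell before running an eval against a non-default Ollama host (the values below are placeholders; adjust them for your setup):

```bash
# Placeholder values: point these at your own Ollama instance
export OLLAMA_BASE_URL="http://localhost:11434"
export OLLAMA_API_KEY="your-api-key" # only needed if your endpoint requires bearer auth
export REQUEST_TIMEOUT_MS=120000

promptfoo eval
```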
To pass configuration options to Ollama, use the `config` key like so:

```yaml title="promptfooconfig.yaml"
providers:
  - id: ollama:chat:llama3.3
    config:
      num_predict: 1024
      temperature: 0.7
      top_p: 0.9
      think: true # Enable thinking/reasoning mode (top-level API parameter)
```

You can also pass arbitrary fields directly to the Ollama API using the `passthrough` option:

```yaml title="promptfooconfig.yaml"
providers:
  - id: ollama:chat:llama3.3
    config:
      passthrough:
        keep_alive: '5m'
        format: 'json'
        # Any other Ollama API fields
```

## Function Calling

Ollama chat models that support function calling (like Llama 3.1, Llama 3.3, Qwen, and others) can use tools with the `tools` config:

```yaml title="promptfooconfig.yaml"
prompts:
  - 'What is the weather like in {{city}}?'

providers:
  - id: ollama:chat:llama3.3
    config:
      tools:
        - type: function
          function:
            name: get_current_weather
            description: Get the current weather in a given location
            parameters:
              type: object
              properties:
                location:
                  type: string
                  description: City and state, e.g. San Francisco, CA
                unit:
                  type: string
                  enum: [celsius, fahrenheit]
              required: [location]

tests:
  - vars:
      city: Boston
    assert:
      - type: is-valid-openai-tools-call
```

## Using Ollama as a Local Grading Provider

### Using Ollama for Model-Graded Assertions

Ollama can be used as a local grading provider for assertions that require language model evaluation. When you have tests that use both text-based assertions (like `llm-rubric`, `answer-relevance`) and embedding-based assertions (like `similar`), you can configure different Ollama models for each type:

```yaml title="promptfooconfig.yaml"
defaultTest:
  options:
    provider:
      # Text provider for llm-rubric, answer-relevance, factuality, etc.
      text:
        id: ollama:chat:gemma3:27b
        config:
          temperature: 0.1
      # Embedding provider for similarity assertions
      embedding:
        id: ollama:embeddings:nomic-embed-text
        config:
          # embedding-specific config if needed

providers:
  - ollama:chat:llama3.3
  - ollama:chat:qwen2.5:14b

tests:
  - vars:
      question: 'What is the capital of France?'
    assert:
      # Uses the text provider (gemma3:27b)
      - type: llm-rubric
        value: 'The answer correctly identifies Paris as the capital'
      # Uses the embedding provider (nomic-embed-text)
      - type: similar
        value: 'Paris is the capital city of France'
        threshold: 0.85
```

When running with `--max-concurrency 1` and no per-eval timeout, Promptfoo groups eligible model-graded assertion calls by grading provider ID to reduce local model switching. This is not request batching; each assertion call still runs separately, and report row order is unchanged.

### Using Ollama Embedding Models for Similarity Assertions

Ollama's embedding models can be used with the `similar` assertion to check semantic similarity between outputs and expected values:

```yaml title="promptfooconfig.yaml"
providers:
  - ollama:chat:llama3.2

defaultTest:
  assert:
    - type: similar
      value: 'The expected response should explain the concept clearly'
      threshold: 0.8
      # Override the default embedding provider to use Ollama
      provider: ollama:embeddings:nomic-embed-text

tests:
  - vars:
      question: 'What is photosynthesis?'
    assert:
      - type: similar
        value: 'Photosynthesis is the process by which plants convert light energy into chemical energy'
        threshold: 0.85
```

You can also set the embedding provider globally for all similarity assertions:

```yaml title="promptfooconfig.yaml"
defaultTest:
  options:
    provider:
      embedding:
        id: ollama:embeddings:nomic-embed-text
  assert:
    - type: similar
      value: 'Expected semantic content'
      threshold: 0.75

providers:
  - ollama:chat:llama3.2

tests:
  # Your test cases here
```

Popular Ollama embedding models include:

- `ollama:embeddings:nomic-embed-text` - General purpose embeddings
- `ollama:embeddings:mxbai-embed-large` - High-quality embeddings
- `ollama:embeddings:all-minilm` - Lightweight, fast embeddings
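Whichever embedding model you pick must be available locally before grading runs. As a quick sanity check (assuming a default local install; the prompt text is arbitrary), you can pull the model and call Ollama's `/api/embeddings` endpoint directly:

```bash
# Fetch the embedding model so Ollama can serve it locally
ollama pull nomic-embed-text

# Sanity-check the /api/embeddings endpoint that promptfoo will call
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "Hello, world"}'
```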
## Using a Remote Ollama Server

To connect to Ollama running on another machine (e.g., a more powerful server on your local network), set `OLLAMA_BASE_URL` to the remote address:

```bash
export OLLAMA_BASE_URL="http://192.168.1.100:11434"
```

Or in a `.env` file:

```
OLLAMA_BASE_URL=http://192.168.1.100:11434
```

```bash
promptfoo eval -c promptfooconfig.yaml --env-file .env
```

Make sure the Ollama server is listening on `0.0.0.0` so it accepts remote connections. For Docker Compose, this is typically the default. If running Ollama directly, set `OLLAMA_HOST=0.0.0.0:11434` before starting the server.

## `localhost` and IPv4 vs IPv6

If you are developing locally against `localhost` (promptfoo's default) and Ollama API calls fail with `ECONNREFUSED`, there may be an IPv4 vs IPv6 issue with how `localhost` resolves. Ollama's default host is [`127.0.0.1`](https://github.com/jmorganca/ollama/blob/main/api/client.go#L19), an IPv4 address, but your operating system's `hosts` file may bind `localhost` to an IPv6 address instead.

There are a few possible fixes:

1. Change the Ollama server to use IPv6 addressing by running `export OLLAMA_HOST=":11434"` before starting the Ollama server. Note that this IPv6 support requires Ollama version `0.0.20` or newer.
2. Point promptfoo directly at the IPv4 address by configuring `export OLLAMA_BASE_URL="http://127.0.0.1:11434"`.
3. Update your OS's `hosts` file to bind `localhost` to IPv4.

## Evaluating models serially

By default, promptfoo evaluates all providers concurrently for each prompt. However, you can run evaluations serially using the `-j 1` option:

```bash
promptfoo eval -j 1
```

This sets concurrency to 1, which means:

1. Evaluations happen one provider at a time, then one prompt at a time.
2. Only one model is loaded into memory, conserving system resources.
3. You can easily swap models between evaluations without conflicts.

This approach is particularly useful for:

- Local setups with limited RAM
- Testing multiple resource-intensive models
- Debugging provider-specific issues
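For instance, a config that lines up several of the larger chat models listed earlier (tags are illustrative; substitute whatever you have pulled locally) pairs naturally with `-j 1`, since only one model needs to be resident in memory at a time:

```yaml title="promptfooconfig.yaml"
# Illustrative: compare several large local models one at a time with `promptfoo eval -j 1`
providers:
  - ollama:chat:llama3.3:70b
  - ollama:chat:qwen2.5:72b
  - ollama:chat:mixtral:8x22b
```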