---
sidebar_label: OpenLLM
description: "Deploy and serve open-source LLMs efficiently using BentoML's OpenLLM framework for production-ready model inference"
---

# OpenLLM

To use [OpenLLM](https://github.com/bentoml/OpenLLM) with promptfoo, we take advantage of OpenLLM's support for [OpenAI-compatible endpoints](https://colab.research.google.com/github/bentoml/OpenLLM/blob/main/examples/openllm-llama2-demo/openllm_llama2_demo.ipynb#scrollTo=0G5clTYV_M8J&line=3&uniqifier=1).

1. Start the server using the `openllm start` command.

2. Set environment variables:
   - Set `OPENAI_BASE_URL` to `http://localhost:8001/v1`
   - Set `OPENAI_API_KEY` to a dummy value such as `foo`.

3. Depending on your use case, use the `chat` or `completion` model type.

   **Chat format example**: To run a Llama2 eval using chat-formatted prompts, first start the model:

   ```sh
   openllm start llama --model-id meta-llama/Llama-2-7b-chat-hf
   ```

   Then set the promptfoo configuration:

   ```yaml
   providers:
     - openai:chat:llama2
   ```

   **Completion format example**: To run a Flan eval using completion-formatted prompts, first start the model:

   ```sh
   openllm start flan-t5 --model-id google/flan-t5-large
   ```

   Then set the promptfoo configuration:

   ```yaml
   providers:
     - openai:completion:flan-t5
   ```

4. See [OpenAI provider documentation](/docs/providers/openai) for more details. A complete end-to-end sketch of the chat example is shown below.
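
## Putting it together

Using the chat example above, you might set the environment variables from step 2 in a POSIX shell before invoking promptfoo (the URL and dummy key come straight from the steps above):

```sh
# Point the OpenAI provider at the local OpenLLM server
export OPENAI_BASE_URL=http://localhost:8001/v1
# Any non-empty value works; OpenLLM does not check the key
export OPENAI_API_KEY=foo
```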
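
A minimal `promptfooconfig.yaml` for the chat case might then look like the following sketch. Only the provider ID comes from the steps above; the prompt text and test case are illustrative placeholders:

```yaml
prompts:
  - 'Summarize the following in one sentence: {{text}}'

providers:
  # Matches the chat example above
  - openai:chat:llama2

tests:
  - vars:
      text: 'OpenLLM serves open-source LLMs behind an OpenAI-compatible API.'
```

With the OpenLLM server running and both environment variables set, run the eval with `npx promptfoo@latest eval`.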