---
sidebar_label: Looper
description: Automate LLM testing in CI/CD by integrating Promptfoo with Looper workflows. Configure quality gates, caching, and multi-environment evaluations for production AI pipelines.
---

# Setting up Promptfoo with Looper

This guide shows you how to integrate **Promptfoo** evaluations into a Looper CI/CD workflow so that every pull request (and an optional nightly job) automatically runs your prompt tests.

## Prerequisites

- A working Looper installation with workflow execution enabled
- A build image (or declared tools) that provides **Node 22+** and **jq 1.6+**
- `promptfooconfig.yaml` and your prompt fixtures (`prompts/**/*.json`) committed to the repository

## Create `.looper.yml`

Add the following file to the root of your repo:

```yaml
language: workflow # optional but common

tools:
  nodejs: 22 # Looper provisions Node.js
  jq: 1.7

envs:
  global:
    variables:
      PROMPTFOO_CACHE_PATH: '${HOME}/.promptfoo/cache'

triggers:
  - pr # run on every pull request
  - manual: 'Nightly Prompt Tests' # manual button in the UI
    call: nightly # invokes the nightly flow below

flows:
  # ---------- default PR flow ----------
  default:
    - (name Install Promptfoo) npm install -g promptfoo
    - (name Evaluate Prompts) |
      promptfoo eval \
        -c promptfooconfig.yaml \
        --prompts "prompts/**/*.json" \
        --share \
        -o output.json
    - (name Quality gate) |
      SUCC=$(jq -r '.results.stats.successes' output.json)
      FAIL=$(jq -r '.results.stats.failures' output.json)
      echo "✅ $SUCC ❌ $FAIL"
      test "$FAIL" -eq 0 # a non-zero exit fails the build

  # ---------- nightly scheduled flow ----------
  nightly:
    - call: default # reuse the logic above
    - (name Upload artefacts) |
      aws s3 cp output.json s3://your-bucket/promptfoo/output.json
```

### How it works

| Section                 | Purpose                                                              |
| ----------------------- | -------------------------------------------------------------------- |
| `tools`                 | Declares the tool versions Looper should provision.                  |
| `envs.global.variables` | Environment variables available to every step.                       |
| `triggers`              | Determines when the workflow runs (`pr`, `manual`, `cron`, etc.).    |
| `flows`                 | Ordered shell commands; execution stops on the first non-zero exit.  |

## Caching Promptfoo results

Looper lacks a first-class cache API. Two common approaches:

1. **Persistent volume** – mount `${HOME}/.promptfoo/cache` on a reusable volume.
2. **Persistence tasks** – pull/push the cache at the start and end of the flow, as sketched below.
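A minimal sketch of the second approach. It reuses the S3 bucket from the nightly flow purely for illustration (the bucket and prefix are placeholders); if your Looper installation offers a dedicated persistence task such as `files pull`/`files push`, prefer that instead:

```yaml
flows:
  default:
    - (name Restore cache) |
      # Pull the previous cache; tolerate a miss on the very first run
      aws s3 sync s3://your-bucket/promptfoo-cache "$PROMPTFOO_CACHE_PATH" || true
    # ... Install Promptfoo / Evaluate Prompts / Quality gate steps as above ...
    - (name Save cache) |
      # Push the refreshed cache for the next run
      aws s3 sync "$PROMPTFOO_CACHE_PATH" s3://your-bucket/promptfoo-cache
```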
## Setting quality thresholds

```yaml
- (name Pass-rate gate) |
  TOTAL=$(jq '.results.stats.successes + .results.stats.failures' output.json)
  PASS=$(jq '.results.stats.successes' output.json)
  RATE=$(echo "scale=2; 100*$PASS/$TOTAL" | bc)
  echo "Pass rate: $RATE%"
  test $(echo "$RATE >= 95" | bc) -eq 1 # fail the build if the pass rate drops below 95%
```

## Multi-environment evaluations

Evaluate both the staging and production configs and compare their failure counts:

```yaml
flows:
  compare-envs:
    - (name Eval-prod) |
      promptfoo eval \
        -c promptfooconfig.prod.yaml \
        --prompts "prompts/**/*.json" \
        -o output-prod.json
    - (name Eval-staging) |
      promptfoo eval \
        -c promptfooconfig.staging.yaml \
        --prompts "prompts/**/*.json" \
        -o output-staging.json
    - (name Compare) |
      PROD_FAIL=$(jq '.results.stats.failures' output-prod.json)
      STAGE_FAIL=$(jq '.results.stats.failures' output-staging.json)
      if [ "$STAGE_FAIL" -gt "$PROD_FAIL" ]; then
        echo "⚠️ Staging has more failures than production!"
      fi
```

## Posting evaluation results to GitHub/GitLab

To post evaluation results elsewhere, use one of the following:

- The **GitHub task**:

  ```yaml
  - github --add-comment \
      --repository "$CI_REPOSITORY" \
      --issue "$PR_NUMBER" \
      --body "$(cat comment.md)" # set the comment body as appropriate
  ```

- **cURL** with a Personal Access Token (PAT) against the REST API (see the sketch at the end of this guide).

## Troubleshooting

| Problem                  | Remedy                                                                                   |
| ------------------------ | ---------------------------------------------------------------------------------------- |
| `npm: command not found` | Add `nodejs:` under `tools` or use an image with Node pre-installed.                      |
| Cache not restored       | Verify the path and that the `files pull` task succeeds.                                  |
| Long-running jobs        | Split prompt sets into separate flows or raise `timeoutMillis` in the build definition.   |
| API rate limits          | Enable the Promptfoo cache and/or rotate API keys.                                        |

## Best practices

1. **Incremental testing** – feed the output of `git diff --name-only` into `promptfoo eval` so only changed prompts are re-tested (see the sketch below).
2. **Semantic version tags** – tag prompt sets and configs so you can roll back easily.
3. **Secret management** – store API keys in a secret store and inject them as environment variables.
4. **Reusable library flows** – if multiple repos need the same evaluation, host the flow definition in a central repo and `import` it.
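A minimal sketch of the incremental-testing practice, assuming `origin/main` is your default branch and the checkout has enough history for the diff; adjust both to your setup:

```yaml
- (name Eval changed prompts) |
  # List changed prompt files relative to the default branch;
  # `|| true` keeps an empty grep result from failing the step
  CHANGED=$(git diff --name-only origin/main...HEAD -- prompts/ | grep '\.json$' || true)
  if [ -n "$CHANGED" ]; then
    # $CHANGED is intentionally unquoted so each path becomes its own argument
    promptfoo eval -c promptfooconfig.yaml --prompts $CHANGED -o output.json
  else
    echo "No prompt changes detected – skipping evaluation."
  fi
```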
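And a sketch of the cURL option for posting results as a PR comment via the GitHub REST API. It assumes a PAT is injected as `GITHUB_TOKEN` and that `$CI_REPOSITORY` is in `owner/repo` form; `comment.md` is the rendered summary you want to post:

```yaml
- (name Post PR comment) |
  # Wrap the Markdown body in the JSON envelope the API expects
  curl -sS -X POST \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/$CI_REPOSITORY/issues/$PR_NUMBER/comments" \
    -d "$(jq -n --arg body "$(cat comment.md)" '{body: $body}')"
```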