---
sidebar_label: n8n
title: Using Promptfoo in n8n Workflows
description: Learn how to integrate Promptfoo's LLM evaluation into your n8n workflows for automated testing, security and quality gates, and result sharing
---

# Using Promptfoo in n8n Workflows

This guide shows how to run Promptfoo evaluations from an **n8n** workflow so you can:

- schedule nightly or ad‑hoc LLM tests,
- gate downstream steps (Slack/Teams alerts, merge approvals, etc.) on pass‑rates, and
- publish rich results links generated by Promptfoo.

## Prerequisites

| What                                                                                 | Why                                                   |
| ------------------------------------------------------------------------------------ | ----------------------------------------------------- |
| **Self‑hosted n8n ≥ v1** (Docker or bare‑metal)                                      | Gives access to the “Execute Command” node.           |
| **Promptfoo CLI** available in the container/host                                    | Needed to run `promptfoo eval`.                       |
| (Optional) **LLM provider API keys** set as environment variables or n8n credentials | Example: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, …     |
| (Optional) **Slack / email / GitHub nodes** in the same workflow                     | For notifications or comments once the eval finishes. |

### Shipping a custom Docker image (recommended)

The easiest approach is to bake Promptfoo into your n8n image so every workflow run already has the CLI:

```dockerfile
# Dockerfile
# Pin a fixed tag in production instead of latest
FROM n8nio/n8n:latest

# Gain permissions to install packages
USER root
# Install the Promptfoo CLI system-wide
RUN npm install -g promptfoo
# Drop back to the non-root user
USER node
```

Update **`docker‑compose.yml`**:

```yaml
services:
  n8n:
    build: .
    env_file: .env # where your OPENAI_API_KEY lives
    volumes:
      - ./data:/data # prompts & configs live here
```

If you prefer not to rebuild the image, you _can_ install Promptfoo on the fly inside the **Execute Command** node, but that adds 10‑15 s to every execution.
## Basic “Run & Alert” workflow

Below is the minimal pattern most teams start with:

| #   | Node                          | Purpose                                                           |
| --- | ----------------------------- | ----------------------------------------------------------------- |
| 1   | **Trigger** (Cron or Webhook) | Decide _when_ to evaluate (nightly, on Git push webhook, …).      |
| 2   | **Execute Command**           | Runs Promptfoo and emits raw stdout / stderr.                     |
| 3   | **Code / Set** node           | Parses the resulting JSON, extracts pass/fail counts & share‑URL. |
| 4   | **IF** node                   | Branches on “failures > 0”.                                       |
| 5   | **Slack / Email / GitHub**    | Sends alert or PR comment when the gate fails.                    |

### Execute Command node configuration

```sh
# Send the eval's progress output to stderr so stdout contains only the JSON results
promptfoo eval \
  -c /data/promptfooconfig.yaml \
  --prompts "/data/prompts/**/*.json" \
  --output /tmp/pf-results.json \
  --share --fail-on-error 1>&2

cat /tmp/pf-results.json
```

Set the working directory to `/data` (mounted via a Docker volume) and enable “Execute Once” so the node runs once per trigger. The node writes a machine‑readable results file **and** prints it to stdout, so the next node can simply `JSON.parse($json["stdout"])`.

:::info
The **Execute Command** node that we rely on is only available in **self‑hosted** n8n. n8n Cloud does **not** expose it yet.
:::

### Sample “Parse & alert” snippet (Code node, TypeScript)

```ts
// Input: raw JSON string from previous node
const output = JSON.parse(items[0].json.stdout as string);

const { successes, failures } = output.results.stats;
items[0].json.passRate = successes / (successes + failures);
items[0].json.failures = failures;
items[0].json.shareUrl = output.shareableUrl;

return items;
```

An **IF** node can then route execution:

- **failures = 0** → take the _green_ path (maybe just archive the results).
- **failures > 0** → post to Slack or comment on the pull request.
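If you would rather gate on a pass‑rate threshold than a raw failure count, the same Code node can compute the verdict before the IF node. A minimal sketch, reusing the `successes`/`failures` stats shape from the snippet above (the `evaluateGate` helper and the 0.95 threshold are illustrative choices, not a Promptfoo API):

```typescript
// Hypothetical pass-rate gate: emits a `gatePassed` flag the IF node can branch on.
interface PromptfooStats {
  successes: number;
  failures: number;
}

function evaluateGate(stats: PromptfooStats, threshold = 0.95) {
  const total = stats.successes + stats.failures;
  const passRate = total > 0 ? stats.successes / total : 0;
  return {
    passRate,
    failures: stats.failures,
    gatePassed: passRate >= threshold,
  };
}

// Example: 19 of 20 tests passed -> pass rate 0.95, which meets the default threshold.
console.log(evaluateGate({ successes: 19, failures: 1 }));
// { passRate: 0.95, failures: 1, gatePassed: true }
```

A threshold gate is often less noisy than “failures > 0” for large suites with a few known‑flaky tests.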
## Evaluating n8n AI Agent prompts and outputs If your goal is to test the **prompt inside an n8n AI Agent / OpenAI node** (not just run Promptfoo from a workflow), treat the n8n node like any other app contract: 1. Put the agent prompt in a file, 2. Map incoming n8n fields to `tests.vars`, and 3. Assert on the exact JSON or tool-call shape that downstream n8n nodes expect. This works well when you want to regression-test an agent before wiring it into a larger workflow. ### Validate JSON that downstream n8n nodes consume If your agent is supposed to emit structured data for a **Set**, **Code**, **Switch**, or **HTTP Request** node, validate the payload directly. ```yaml title="promptfooconfig.yaml" prompts: - file://./prompts/n8n-support-router.txt providers: - openai:gpt-5-mini tests: - vars: customer_message: 'Customer wants to cancel order #4815 and asks for a refund' assert: - type: contains-json value: type: object required: [route, priority, reply] properties: route: type: string enum: [billing, support, sales] priority: type: string enum: [low, medium, high] reply: type: string ``` Use `contains-json` when the model may wrap JSON in prose or a markdown code block. If your node must return **only** JSON, use [`is-json`](/docs/guides/evaluate-json) instead. ### Validate tool calls for agent workflows If your n8n setup uses an OpenAI-compatible agent that should call tools before continuing, validate that Promptfoo sees a real tool call and that it matches your schema. ```yaml title="promptfooconfig.yaml" prompts: - file://./prompts/n8n-calendar-agent.txt providers: - id: openai:gpt-5-mini config: tools: file://./tools/calendar-tools.yaml tests: - vars: user_request: "Move tomorrow's standup to 3pm and notify the team" assert: - type: finish-reason value: tool_calls - type: is-valid-openai-tools-call ``` That pattern is especially useful when your n8n workflow branches on whether the LLM produced a tool invocation versus a final answer. 
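On the n8n side, the `contains-json` pattern above has a mirror image: the downstream **Code** node usually has to pull the JSON out of a reply that may wrap it in prose or a markdown fence. A minimal sketch, assuming the `route`/`priority`/`reply` schema from the config above (the `extractJson` helper and its outermost‑brace heuristic are illustrative, not part of Promptfoo or n8n):

```typescript
// Hypothetical helper for a downstream Code node: pull a JSON object out of a
// reply that may surround it with prose. The outermost-brace heuristic is an
// assumption that works when the reply contains exactly one JSON object.
function extractJson(text: string): Record<string, unknown> | null {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return null; // braces found, but not valid JSON
  }
}

const reply =
  'Sure! Here is the routing decision: {"route": "billing", "priority": "high", "reply": "Refund initiated."}';
console.log(extractJson(reply)?.route); // prints: billing
console.log(extractJson("no structured data here")); // prints: null
```

If `extractJson` returns `null`, route the item down an error branch instead of letting `JSON.parse` throw mid‑workflow.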
### Useful building blocks

- [`/docs/configuration/tools`](/docs/configuration/tools) for defining tool schemas
- [`/docs/guides/evaluate-json`](/docs/guides/evaluate-json) for JSON and schema assertions
- [`examples/openai-tools-call`](https://github.com/promptfoo/promptfoo/tree/main/examples/openai-tools-call) for a concrete OpenAI tool-calling config
- [`examples/eval-tool-use`](https://github.com/promptfoo/promptfoo/tree/main/examples/eval-tool-use) for finish-reason and tool-use checks across providers

## Advanced patterns

### Run different configs in parallel

Make the first **Execute Command** node loop over an array of model IDs or config files and push each run as a separate item. Downstream nodes will automatically fan‑out and handle each result independently.

### Version‑controlled prompts

Mount your prompts directory and config file into the container at `/data`. When you commit new prompts to Git, your CI/CD system can call the **n8n REST API** or a **Webhook trigger** to re‑evaluate immediately.

### Auto‑fail the whole workflow

If you run n8n headless, you can call this workflow from CI pipelines (GitHub Actions, GitLab, …) with the [CLI `n8n execute` command](https://docs.n8n.io/hosting/cli-commands/) and check its exit code; returning `exit 1` from the Execute Command node will propagate the failure.

## Security & best practices

- **Keep API keys secret** – store them in the n8n credential store or inject as environment variables from Docker secrets, not hard‑coded in workflows.
- **Resource usage** – Promptfoo supports caching via `PROMPTFOO_CACHE_PATH`; mount that directory to persist across runs.
- **Timeouts** – wrap `promptfoo eval` with `timeout --signal=SIGKILL 15m …` (Linux) if you need hard execution limits.
- **Logging** – route the `stderr` field of Execute Command to a dedicated log channel so you don’t miss stack traces.
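The “run different configs in parallel” pattern above can be sketched as a Code node placed before the Execute Command node: emit one item per config path, and n8n runs the command once per item. A minimal sketch (the config paths are placeholders):

```typescript
// Hypothetical fan-out: one n8n item per Promptfoo config file, consumed by a
// downstream Execute Command node that references {{ $json.configPath }}.
const configs = [
  "/data/configs/model-a.yaml",
  "/data/configs/model-b.yaml",
  "/data/configs/model-c.yaml",
];

// n8n Code nodes return an array of { json: ... } items; downstream nodes
// execute once per item, which gives you the parallel fan-out for free.
const fanOut = (paths: string[]) => paths.map((configPath) => ({ json: { configPath } }));

console.log(fanOut(configs).length); // prints: 3
```

Each branch then parses its own results file, so one slow or failing model doesn’t block the others.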
## Troubleshooting | Symptom | Likely cause / fix | | ------------------------------------------- | ----------------------------------------------------------------------------------------------------- | | **`Execute Command node not available`** | You’re on n8n Cloud; switch to self‑hosted. | | **`promptfoo: command not found`** | Promptfoo not installed inside the container. Rebuild your Docker image or add an install step. | | **Run fails with `ENOENT` on config paths** | Make sure the prompts/config volume is mounted at the same path you reference in the command. | | **Large evals time‑out** | Increase the node’s “Timeout (s)” setting _or_ chunk your test cases and iterate inside the workflow. | ## Next steps 1. Combine Promptfoo with the **n8n AI Transform** node to chain evaluations into multi‑step RAG pipelines. 2. Use **n8n Insights** (self‑hosted EE) to monitor historical pass‑rates and surface regressions. 3. Check out the other [CI integrations](/docs/integrations/ci-cd) ([GitHub Actions](/docs/integrations/github-action), [CircleCI](/docs/integrations/circle-ci), etc) for inspiration. Happy automating!