# AGENTS.md

Guidance for AI agents working on this TypeScript codebase.

## Project Overview

Promptfoo is an open-source framework for evaluating and testing LLM applications.

## Project Structure

| Directory           | Purpose                         | Local Docs                   |
| ------------------- | ------------------------------- | ---------------------------- |
| `.agents/`          | Codex marketplace metadata      | `.agents/AGENTS.md`          |
| `.github/`          | GitHub Actions and workflows    | `.github/AGENTS.md`          |
| `code-scan-action/` | Code scan GitHub Action wrapper | `code-scan-action/AGENTS.md` |
| `docs/agents/`      | Reusable coding-agent docs      | `docs/agents/AGENTS.md`      |
| `plugins/`          | Agent plugin bundles            | `plugins/AGENTS.md`          |
| `src/`              | Core library                    | -                            |
| `src/app/`          | Web UI (React 19/Vite/MUI v7)   | `src/app/AGENTS.md`          |
| `src/assertions/`   | Assertion handlers              | `src/assertions/AGENTS.md`   |
| `src/codeScan/`     | Code scan scanner               | `src/codeScan/AGENTS.md`     |
| `src/commands/`     | CLI commands                    | `src/commands/AGENTS.md`     |
| `src/matchers/`     | Assertion matcher helpers       | `src/matchers/AGENTS.md`     |
| `src/providers/`    | LLM providers                   | `src/providers/AGENTS.md`    |
| `src/redteam/`      | Security testing                | `src/redteam/AGENTS.md`      |
| `src/server/`       | Backend server                  | `src/server/AGENTS.md`       |
| `test/`             | Tests (Vitest)                  | `test/AGENTS.md`             |
| `site/`             | Docs site (Docusaurus)          | `site/AGENTS.md`             |
| `examples/`         | Example configs                 | `examples/AGENTS.md`         |
| `drizzle/`          | DB migrations                   | `drizzle/AGENTS.md`          |

**Read the relevant AGENTS.md when working in that directory.**

## Build Commands

```bash
# Core commands
npm run build              # Build the project
npm run build:clean        # Clean the dist directory
npm run build:watch        # Watch and rebuild TypeScript files
npm test                   # Run all tests
npm run tsc                # Run TypeScript compiler

# Linting & Formatting
npm run lint               # Run Biome linter (alias for lint:src)
npm run lint:src           # Lint src directory
npm run lint:tests         # Lint test directory
npm run lint:site          # Lint site directory
npm run format             # Format all files (Biome + Prettier)
npm run format:check       # Check formatting without changes
npm run l                  # Lint only changed files
npm run f                  # Format only changed files

# Testing
npm run test:watch         # Run tests in watch mode
npm run test:integration   # Run integration tests
npm run test:redteam:integration  # Run red team integration tests
npm run test:app -- src/pages/path/to/test.test.tsx --run  # Run a specific frontend test file from repo root
npx vitest path/to/test    # Run a specific backend test file

# Development
npm run dev                # Start both server and app
npm run dev:app            # Start only frontend (localhost:3000)
npm run dev:server         # Start only server/API (localhost:15500)
npm run local -- eval      # Test with local build

# Database
npm run db:generate        # Generate Drizzle migrations
npm run db:migrate         # Run database migrations
npm run db:studio          # Open Drizzle studio

# Other
npm run jsonSchema:generate  # Generate JSON schema for config
npm run citation:generate    # Generate citation file
```

## Testing in Development

When testing changes, use the local build:

```bash
npm run local -- eval -c path/to/config.yaml
```

**Important:** Always put `--` between `npm run local` and the promptfoo arguments; without it, npm consumes the flags itself:

```bash
npm run local -- eval --max-concurrency 1  # Correct
npm run local eval --max-concurrency 1     # Wrong - flags go to npm
```
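
When a script assembles this command, the separator rule can be made explicit. The following is a minimal sketch, assuming a hypothetical helper named `buildLocalArgs` (it is not part of the codebase) that returns the argument list to hand to `npm`:

```typescript
// Hypothetical helper: builds the argv for `npm run local` so that every
// promptfoo argument lands after the `--` separator.
function buildLocalArgs(promptfooArgs: readonly string[]): string[] {
  // Everything before `--` is interpreted by npm; everything after it is
  // forwarded to the local promptfoo build.
  return ['run', 'local', '--', ...promptfooArgs];
}

const args = buildLocalArgs(['eval', '--max-concurrency', '1']);
console.log(['npm', ...args].join(' '));
// prints "npm run local -- eval --max-concurrency 1"
```

Keeping the separator in one place means callers cannot forget it when they add new flags.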

**Don't run `npm run local -- view`** unless explicitly asked. Assume the user already has `npm run dev` running. The `view` command serves static production builds without hot reload.

When starting `npm run dev`, keep it attached in a live terminal session; backgrounding with `&`/`nohup` can exit silently in agent shells. The expected local URLs are `http://localhost:3000/` for the Web UI and `http://localhost:15500` for the server/API. Do not assume Vite's default `5173`; confirm the actual ports from startup output or with `lsof -nP -iTCP:3000 -iTCP:15500 -sTCP:LISTEN`.

### Using Environment Variables

API keys for local runs live in a gitignored `.env` file at the repository root. If the file exists, load it with `--env-file`, or set variables inline:

```bash
# Use --env-file flag to load environment variables
npm run local -- eval -c config.yaml --env-file .env

# Or set specific variables inline
OPENAI_API_KEY=sk-... npm run local -- eval -c config.yaml

# Disable remote generation for testing
PROMPTFOO_DISABLE_REMOTE_GENERATION=true npm run local -- eval -c config.yaml
```

**Never commit the `.env` file or expose API keys in code or commit messages.**

## Running Evaluations

**Always run from the repository root**, not from subdirectories.

**Always use `--no-cache` during development** to ensure fresh results:

```bash
npm run local -- eval -c examples/my-example/promptfooconfig.yaml --no-cache
```

**Export and inspect results** to verify pass/fail/errors:

```bash
npm run local -- eval -c path/to/config.yaml -o output.json --no-cache
```

Add `--env-file .env` or another explicit env file only when the eval needs local
secrets and the file exists.

Review the output file for `success`, `score`, and `error` fields. With the default
pass-rate threshold, exit code 0 means the eval met the threshold; still inspect the
JSON for per-test failures, errors, and scores, especially when the threshold has been
lowered. This is the standard command for verifying a PR end-to-end.
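
A first pass over the exported file can be scripted. The sketch below deliberately assumes nothing about how `output.json` is nested, since the layout is not described here: it walks any parsed JSON value and tallies every object that carries a boolean `success` field. The names `summarize` and `EvalSummary` are hypothetical, and nested objects with their own `success` field are counted too, so treat the totals as a rough guide rather than an exact test count.

```typescript
// Hypothetical summary shape; not a type from the codebase.
interface EvalSummary {
  passed: number;
  failed: number;
  errors: string[];
}

// Recursively visits a parsed JSON value and tallies every object that has a
// boolean `success` field. A non-empty string `error` on such an object is
// collected for review.
function summarize(
  value: unknown,
  summary: EvalSummary = { passed: 0, failed: 0, errors: [] },
): EvalSummary {
  if (Array.isArray(value)) {
    for (const item of value) {
      summarize(item, summary);
    }
  } else if (typeof value === 'object' && value !== null) {
    const record = value as Record<string, unknown>;
    const success = record['success'];
    const error = record['error'];
    if (typeof success === 'boolean') {
      if (success) {
        summary.passed += 1;
      } else {
        summary.failed += 1;
      }
      if (typeof error === 'string' && error.length > 0) {
        summary.errors.push(error);
      }
    }
    for (const child of Object.values(record)) {
      summarize(child, summary);
    }
  }
  return summary;
}

// Made-up sample data, only to show the call; real output has more fields.
const sample = {
  results: [
    { success: true, score: 1 },
    { success: false, score: 0, error: 'timeout' },
  ],
};
console.log(summarize(sample));
```

In Node, the input could come from `JSON.parse(fs.readFileSync('output.json', 'utf8'))`. A non-zero `failed` count or a non-empty `errors` list is a cue to read the corresponding entries in full.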

Keep local secrets in the repo's gitignored `.env` (or another path the user points at
with `--env-file`); never echo them into logs or commit messages.

## End-to-End Work Expectations

When asked only to review or audit a PR, keep the work read-only: inspect the branch, diff, PR comments, and CI as needed; run non-mutating tests or QA when useful; and report findings without committing, pushing, or changing files unless the user explicitly asks for fixes.

When asked to fix, improve, or land a PR, own the full loop: check out the branch, inspect the diff and PR comments, merge or rebase on current `origin/main` when requested, run focused tests, run the relevant real workflow, commit, push, and watch CI until it is green or the remaining failure is clearly unrelated.

**Standing commit/push authorization on feature branches.** When the user has asked you to fix, improve, or land work on a non-`main` branch, you have durable authorization to `git commit` and `git push` to that branch's tracking remote without per-step confirmation. Do not pause to ask "want me to commit?" — committing and pushing is part of the requested work. The safety constraints in _Git Workflow (CRITICAL)_ below (no commits to `main`, no `--force` without approval, no `--no-verify`, etc.) still apply.

For behavior changes, do not stop at unit tests. Run the actual CLI or example with the local build. For eval and redteam work, prefer:

```bash
npm run local -- eval -c path/to/promptfooconfig.yaml --no-cache -o output.json
```

Add `--env-file .env` only when the eval needs local credentials and the file exists.

Inspect exported JSON for `success`, `score`, `error`, provider outputs, traces, and redteam findings. If you claim a redteam ran, report the plugins, strategies, interesting failures, and the evidence reviewed.

## Debugging & Troubleshooting

**Before running tests or review checks, align Node with the repo version first:**

```bash
nvm use
```

If you're using npm rather than pnpm or yarn, match the repo's npm major version before treating install behavior as authoritative:

```bash
npm install -g npm@11
```

If Node-based tools fail with `ERR_MODULE_NOT_FOUND` or similar missing-package errors in a fresh worktree, run `npm ci` before treating the environment as blocked.

If database-backed tests fail with a `better-sqlite3` ABI or `NODE_MODULE_VERSION` mismatch after switching Node versions, rebuild the native module for the active Node version before treating the test run as blocked:

```bash
npm rebuild better-sqlite3
```

This is an environment repair step, not a product bug. Agents should try `nvm use` first and `npm rebuild better-sqlite3` second before concluding that review-time tests are blocked by the local setup.

**Verbose logging:**

```bash
npm run local -- eval -c config.yaml --verbose
# Or set environment variable
LOG_LEVEL=debug npm run local -- eval -c config.yaml
```

**Disable cache** (results may be cached during development):

```bash
npm run local -- eval -c config.yaml --no-cache
```

**View results in web UI:** First check whether the Web UI is already running on port 3000. If it is not, ask the user before starting it. `npm run dev` serves the Web UI at `http://localhost:3000/`.

**Cache:** Located at `~/.promptfoo/cache` by default, unless overridden with
`PROMPTFOO_CACHE_PATH` or `PROMPTFOO_CONFIG_DIR`. **NEVER delete or clear the cache
without explicit permission.** Use `--no-cache` flag instead.

**Database:** Located at `~/.promptfoo/promptfoo.db` (SQLite). You may read from it but **NEVER delete it**.

## Git Workflow (CRITICAL)

- **NEVER** commit/push directly to main
- **NEVER** use `--force` without explicit approval
- **NEVER** comment on GitHub issues - only create PRs to address them
- **ALWAYS create new commits** - never amend, squash, or rebase unless explicitly asked
- All changes go through pull requests

**Standard workflow:**

```bash
git checkout main && git pull origin main   # Always start fresh
git checkout -b feature/your-branch-name    # New branch for changes
# Make changes...
git add <specific-files>                    # Never blindly add everything
npm run l && npm run f                      # Lint and format before commit/push
git commit -m "type(scope): description"    # Conventional commit format
git fetch origin main && git merge origin/main  # Sync with main
git push -u origin feature/your-branch-name # Push branch
```

**Conventional commit types:** `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`

See `docs/agents/git-workflow.md` for full workflow.
See `docs/agents/pr-conventions.md` for PR title format and scope selection (especially THE REDTEAM RULE).
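
The `type(scope): description` shape can be checked mechanically before committing. This is a minimal sketch with a hypothetical function name, assuming the scope is optional and the description is any non-empty text; the authoritative rules, including scope selection, are in `docs/agents/pr-conventions.md`.

```typescript
// Commit types listed in this document.
const COMMIT_TYPES = ['feat', 'fix', 'chore', 'docs', 'test', 'refactor', 'ci', 'perf'] as const;

// Assumption: the scope is optional and must not contain spaces or parentheses.
const COMMIT_PATTERN = new RegExp(`^(${COMMIT_TYPES.join('|')})(\\([^()\\s]+\\))?: \\S.*$`);

// Hypothetical helper: returns true when the subject line matches the pattern.
function isConventionalSubject(subject: string): boolean {
  return COMMIT_PATTERN.test(subject);
}

console.log(isConventionalSubject('fix(providers): handle empty response')); // prints true
console.log(isConventionalSubject('update stuff')); // prints false
```

The check covers only the subject line's shape. It does not decide which scope is appropriate for a given change.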

## Pull Request Creation

- **Default to full (non-draft) PRs.** Omit `--draft` from `gh pr create` unless the
  user explicitly asks for a draft, or the PR is for an unpublished security advisory
  (see "Security-Sensitive PRs" below). `docs/agents/pr-conventions.md` lists the full
  set of draft exceptions.
- **Never attribute commits or PR bodies to Claude / Claude Code.** No
  `Co-Authored-By: Claude…` trailers, no "Generated with Claude Code" footers. Use
  your configured git identity only.
- **Update the existing PR instead of opening a new one** when iterating on a branch
  that already has an open PR. Push to the same branch. Only run `gh pr create` if the
  user explicitly asks for a new PR or the existing PR is closed.
- **Don't let `npm audit fix` drift ride along with an unrelated change.** If
  `package-lock.json` changes outside the scope of the PR, revert the drift and ship
  it separately so reviewers can reason about each change independently.

## Security-Sensitive PRs

- **Before opening any public PR for a CVE/GHSA:** confirm the advisory has been
  published and the coordinated-disclosure embargo has lifted. See `SECURITY.md` for
  the disclosure policy. If the advisory is still private, use the GHSA private
  collaboration flow (or a temporary private fork) until the release that contains the
  fix is cut.
- Do **not** put the CVE/GHSA identifier, exploit description, or vulnerable-version
  range in a PR title, body, or branch name before disclosure.
- Every security fix should land with a regression test that exercises the original
  attack vector.

## Screenshots for Pull Requests

GitHub has no official API for uploading images to PR descriptions. When asked to add screenshots to a PR:

1. **Take the screenshot** using browser tools or other methods
2. **Upload to freeimage.host** (no account or personal API key required; the key in the command below is used as-is):

```bash
curl -s -X POST \
  -F "source=@/path/to/screenshot.png" \
  -F "type=file" \
  -F "action=upload" \
  "https://freeimage.host/api/1/upload?key=6d207e02198a847aa98d0a2a901485a5" \
  | jq -r '.image.url'
```

3. **Update the PR body** with the returned URL:

```bash
gh pr edit <PR_NUMBER> --body "$(cat <<'EOF'
## Summary
...
## Screenshot
![Screenshot](https://iili.io/XXXXXXX.png)
...
EOF
)"
```

**Do NOT:**

- Commit screenshots to the branch
- Upload to GitHub release assets
- Use GitHub's internal upload endpoints (require browser cookies, not PATs)

## Code Style Guidelines

- Use TypeScript with strict type checking
- Keep tracked root-owned TypeScript files under the root compiler project unless they belong to an explicitly separate project such as `src/app/` or `test/code-scan-action/`.
- Follow consistent import order (Biome handles sorting)
- Use consistent curly braces for all control statements
- Prefer `const` over `let`; avoid `var`
- Use object shorthand syntax whenever possible
- Use `async/await` for asynchronous code
- Use Vitest for all tests (both `test/` and `src/app/`)
- Use consistent error handling with proper type checks
- Avoid re-exporting from files; import directly from the source module
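
Several of these rules are easier to see together. The sketch below uses hypothetical names (`StepResult`, `toMessage`, `runStep`) that do not come from the codebase; it only illustrates `const` bindings, braces on every control statement, object shorthand, `async/await`, and error handling with a type check.

```typescript
// Hypothetical result shape, used only for illustration.
interface StepResult {
  ok: boolean;
  message: string;
}

// Error handling with a type check: a caught value is `unknown`, so narrow it
// before reading properties.
function toMessage(err: unknown): string {
  if (err instanceof Error) {
    return err.message;
  }
  return String(err);
}

// async/await instead of promise chains, `const` bindings, braces on every
// control statement, and object shorthand in the returned values.
async function runStep(step: () => Promise<string>): Promise<StepResult> {
  try {
    const message = await step();
    const ok = true;
    return { ok, message };
  } catch (err) {
    const message = toMessage(err);
    const ok = false;
    return { ok, message };
  }
}
```

A caller would `await runStep(...)` and branch on `ok`, so the failure path returns a value instead of throwing.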

**Before committing:** `npm run l && npm run f`

**Pre-commit hook:** A pre-commit hook is installed automatically on `npm install` and runs Biome and Prettier on staged files.

## Logging

Use the logger with object context (auto-sanitized):

```typescript
logger.debug('[Component] Message', { headers, body, config });
```

See `docs/agents/logging.md` for details on sanitization patterns.

## Testing

- **Vitest** is the test framework for all tests
- Frontend tests (`src/app/`): Vitest with explicit imports
- Backend tests (`test/`): Vitest with globals enabled (`describe`, `it`, `expect` available without imports)

See `test/AGENTS.md` for testing patterns.

## Project Conventions

- **ESM modules** (type: "module" in package.json)
- **Node.js ^20.20.0 || >=22.22.0** (`.npmrc` sets `engine-strict=true`). Before running `npm`, `vite`, or `vitest`, run `source ~/.nvm/nvm.sh && nvm use` so `node -v` matches `.nvmrc`. If you're using npm, upgrade to `npm@11` so the repo's release-age policy is applied consistently. If native modules still mismatch before tests or review checks, run `npm rebuild better-sqlite3`.
- **Alternative package managers** (pnpm, yarn) are supported
- **File structure:** core logic in `src/`, tests in `test/`
- **Examples** belong in `examples/`, each with a clear `README.md`
- **Drizzle ORM** for database operations
- **Workspaces** include `src/app` and `site` directories
- **Don't edit `CHANGELOG.md`** - it's auto-generated
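
The Node.js range listed above can be checked by hand before blaming the environment. This is a rough sketch with hypothetical function names: it hard-codes the range `^20.20.0 || >=22.22.0` instead of using a semver library, and it ignores pre-release tags.

```typescript
// Parses "v20.20.1" or "20.20.1" into numeric parts; returns null if malformed.
function parseVersion(version: string): [number, number, number] | null {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (match === null) {
    return null;
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Hand-rolled check for `^20.20.0 || >=22.22.0`.
// `^20.20.0` means >=20.20.0 and <21.0.0; the patch number cannot fall below 0,
// so only major and minor need comparing.
function satisfiesNodeRange(version: string): boolean {
  const parsed = parseVersion(version);
  if (parsed === null) {
    return false;
  }
  const [major, minor] = parsed;
  const inNode20Line = major === 20 && minor >= 20;
  const atLeast2222 = major > 22 || (major === 22 && minor >= 22);
  return inNode20Line || atLeast2222;
}

console.log(satisfiesNodeRange('v20.20.0')); // prints true
console.log(satisfiesNodeRange('v21.7.3')); // prints false
console.log(satisfiesNodeRange('v22.22.1')); // prints true
```

In a Node script the argument would be `process.version`. A `false` result points to running `nvm use` first.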

## Before Writing Code

- **Search for existing implementations** before creating new code
- **Check for existing utilities** in `src/util/` before adding helpers
- **Don't add dependencies** without checking if functionality exists in current deps
- **Reuse patterns** from similar files in the codebase
- **Test both success and error cases** for all functionality
- **Document provider configurations** following examples in existing code

## Adversarial and Redteam Bias

For security, model scanning, redteam, and coding-agent work, test like an attacker first. Look for false negatives, bypasses, hidden payloads, unsafe tool use, prompt injection, exfiltration, cache misuse, and evidence gaps. When a bypass is found, add a focused regression test before or alongside the fix.

For demo/example apps used to show red teaming, do not harden away all interesting findings unless explicitly asked. A slightly vulnerable sample app is useful when the goal is to demonstrate Promptfoo's ability to find real breaks.

## Review Guidelines

- Prioritize security regressions first, especially injection risks, unsafe handling of user-controlled or adversarial content, credential exposure, SSRF, path traversal, unsafe deserialization, and authorization mistakes.
- Then prioritize correctness issues that can break behavior, public APIs, data integrity, concurrency, or error handling.
- Treat missing or ineffective tests as a P1 issue when a change adds security-sensitive behavior, changes public behavior, or fixes a bug without meaningful coverage.
- Focus on the code changed by the pull request. Do not flag pre-existing issues outside the touched diff unless the pull request materially worsens them.
- Avoid repeating findings that were already raised in the current pull request unless the new diff reintroduces them or leaves the same risk in newly changed code.
- Verify findings on the current branch tip after syncing with the latest `main`.
- Treat existing PR comments and bot reviews as hints; confirm they still apply before reporting them. If CI is failing, inspect the failing job logs and separate unrelated base-branch failures from PR regressions.
- Ignore formatting, import ordering, naming, and other style-only issues already enforced by CI or repository tooling.
- If a pull request is primarily about redteam functionality, verify the title follows THE REDTEAM RULE in `docs/agents/pr-conventions.md` and uses `(redteam)` scope. Incidental `src/redteam/` touches in broad maintenance PRs do not require `(redteam)` scope.

## Documentation Testing

When testing doc changes, speed up builds by skipping OG image generation:

```bash
cd site
SKIP_OG_GENERATION=true npm run build
```

See `site/AGENTS.md` for documentation guidelines.

## Additional Documentation

Read these when relevant to your task:

| Document                               | When to Read                              |
| -------------------------------------- | ----------------------------------------- |
| `docs/agents/pr-conventions.md`        | Creating pull requests                    |
| `docs/agents/git-workflow.md`          | Git operations                            |
| `docs/agents/dependency-management.md` | Updating packages                         |
| `docs/agents/logging.md`               | Adding logging to code                    |
| `docs/agents/python.md`                | Python providers/scripts                  |
| `docs/agents/database-security.md`     | Writing database queries                  |
| `src/app/AGENTS.md`                    | Frontend React development                |
| `src/providers/AGENTS.md`              | Adding/modifying LLM providers            |
| `test/AGENTS.md`                       | Writing tests                             |
| `site/AGENTS.md`                       | Documentation site changes                |
| `.github/AGENTS.md`                    | GitHub Actions / release workflow changes |