# AGENTS.md

Guidance for AI agents working on this TypeScript codebase.

## Project Overview

Promptfoo is an open-source framework for evaluating and testing LLM applications.

## Project Structure

| Directory           | Purpose                         | Local Docs                   |
| ------------------- | ------------------------------- | ---------------------------- |
| `.agents/`          | Codex marketplace metadata      | `.agents/AGENTS.md`          |
| `.github/`          | GitHub Actions and workflows    | `.github/AGENTS.md`          |
| `code-scan-action/` | Code scan GitHub Action wrapper | `code-scan-action/AGENTS.md` |
| `docs/agents/`      | Reusable coding-agent docs      | `docs/agents/AGENTS.md`      |
| `plugins/`          | Agent plugin bundles            | `plugins/AGENTS.md`          |
| `src/`              | Core library                    | -                            |
| `src/app/`          | Web UI (React 19/Vite/MUI v7)   | `src/app/AGENTS.md`          |
| `src/assertions/`   | Assertion handlers              | `src/assertions/AGENTS.md`   |
| `src/codeScan/`     | Code scan scanner               | `src/codeScan/AGENTS.md`     |
| `src/commands/`     | CLI commands                    | `src/commands/AGENTS.md`     |
| `src/matchers/`     | Assertion matcher helpers       | `src/matchers/AGENTS.md`     |
| `src/providers/`    | LLM providers                   | `src/providers/AGENTS.md`    |
| `src/redteam/`      | Security testing                | `src/redteam/AGENTS.md`      |
| `src/server/`       | Backend server                  | `src/server/AGENTS.md`       |
| `test/`             | Tests (Vitest)                  | `test/AGENTS.md`             |
| `site/`             | Docs site (Docusaurus)          | `site/AGENTS.md`             |
| `examples/`         | Example configs                 | `examples/AGENTS.md`         |
| `drizzle/`          | DB migrations                   | `drizzle/AGENTS.md`          |

**Read the relevant AGENTS.md when working in that directory.**

## Build Commands

```bash
# Core commands
npm run build          # Build the project
npm run build:clean    # Clean the dist directory
npm run build:watch    # Watch and rebuild TypeScript files
npm test               # Run all tests
npm run tsc            # Run TypeScript compiler

# Linting & Formatting
npm run lint           # Run Biome linter (alias for lint:src)
npm run lint:src       # Lint src directory
npm run lint:tests     # Lint test directory
npm run lint:site      # Lint site directory
npm run format         # Format all files (Biome + Prettier)
npm run format:check   # Check formatting without changes
npm run l              # Lint only changed files
npm run f              # Format only changed files

# Testing
npm run test:watch                # Run tests in watch mode
npm run test:integration          # Run integration tests
npm run test:redteam:integration  # Run red team integration tests
npm run test:app -- src/pages/path/to/test.test.tsx --run  # Run a specific frontend test file from repo root
npx vitest path/to/test           # Run a specific backend test file

# Development
npm run dev            # Start both server and app
npm run dev:app        # Start only frontend (localhost:3000)
npm run dev:server     # Start only server/API (localhost:15500)
npm run local -- eval  # Test with local build

# Database
npm run db:generate    # Generate Drizzle migrations
npm run db:migrate     # Run database migrations
npm run db:studio      # Open Drizzle studio

# Other
npm run jsonSchema:generate  # Generate JSON schema for config
npm run citation:generate    # Generate citation file
```

## Testing in Development

When testing changes, use the local build:

```bash
npm run local -- eval -c path/to/config.yaml
```

**Important:** Always use `--` before flags with `npm run local`:

```bash
npm run local -- eval --max-concurrency 1  # Correct
npm run local eval --max-concurrency 1     # Wrong - flags go to npm
```

**Don't run `npm run local -- view`** unless explicitly asked. Assume the user already has `npm run dev` running. The `view` command serves static production builds without hot reload.

When starting `npm run dev`, keep it attached in a live terminal session; backgrounding with `&`/`nohup` can exit silently in agent shells. The expected local URLs are `http://localhost:3000/` for the Web UI and `http://localhost:15500` for the server/API. Do not assume Vite's default `5173`; confirm the actual ports from startup output or with `lsof -nP -iTCP:3000 -iTCP:15500 -sTCP:LISTEN`.

### Using Environment Variables

The repository includes a `.env` file for API keys. To use it:

```bash
# Use --env-file flag to load environment variables
npm run local -- eval -c config.yaml --env-file .env

# Or set specific variables inline
OPENAI_API_KEY=sk-... npm run local -- eval -c config.yaml

# Disable remote generation for testing
PROMPTFOO_DISABLE_REMOTE_GENERATION=true npm run local -- eval -c config.yaml
```

**Never commit the `.env` file or expose API keys in code or commit messages.**

## Running Evaluations

**Always run from the repository root**, not from subdirectories.

**Always use `--no-cache` during development** to ensure fresh results:

```bash
npm run local -- eval -c examples/my-example/promptfooconfig.yaml --no-cache
```

**Export and inspect results** to verify pass/fail/errors:

```bash
npm run local -- eval -c path/to/config.yaml -o output.json --no-cache
```

Add `--env-file .env` or another explicit env file only when the eval needs local secrets and the file exists.

Review the output file for `success`, `score`, and `error` fields. With the default pass-rate threshold, exit code 0 means the eval met the threshold; still inspect the JSON for per-test failures, errors, and scores, especially when the threshold has been lowered.

This is the standard command for verifying a PR end-to-end. Keep local secrets in the repo's gitignored `.env` (or another path the user points at with `--env-file`); never echo them into logs or commit messages.

## End-to-End Work Expectations

When asked only to review or audit a PR, keep the work read-only: inspect the branch, diff, PR comments, and CI as needed; run non-mutating tests or QA when useful; and report findings without committing, pushing, or changing files unless the user explicitly asks for fixes.

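
When reviewing an exported `output.json` as described under Running Evaluations, the per-test check can be sketched in TypeScript roughly as below. This is a minimal sketch, not Promptfoo's own tooling: the `ResultRow` type and the `summarize` helper are hypothetical, only the `success`, `score`, and `error` field names come from this guide, and locating the rows inside the exported file is left to the caller because the export layout is not described here.

```typescript
// Hypothetical row shape: only the three fields this guide says to review.
interface ResultRow {
  success: boolean;
  score: number;
  error?: string | null;
}

interface Summary {
  passed: number;
  failed: number;
  errored: number;
  minScore: number | null;
}

// Count passes, failures, and errors so a zero exit code is not the only evidence.
function summarize(rows: ResultRow[]): Summary {
  const summary: Summary = { passed: 0, failed: 0, errored: 0, minScore: null };
  for (const row of rows) {
    if (row.error) {
      summary.errored += 1;
    } else if (row.success) {
      summary.passed += 1;
    } else {
      summary.failed += 1;
    }
    if (summary.minScore === null || row.score < summary.minScore) {
      summary.minScore = row.score;
    }
  }
  return summary;
}

// Illustrative rows only, not real eval output.
const sampleRows: ResultRow[] = [
  { success: true, score: 1 },
  { success: false, score: 0.25 },
  { success: false, score: 0, error: 'provider timeout' },
];
const sampleSummary = summarize(sampleRows);
console.log(sampleSummary);
```

Counting errored rows separately from failed ones keeps provider or setup problems from being reported as assertion failures.
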
When asked to fix, improve, or land a PR, own the full loop: check out the branch, inspect the diff and PR comments, merge or rebase on current `origin/main` when requested, run focused tests, run the relevant real workflow, commit, push, and watch CI until it is green or the remaining failure is clearly unrelated.

**Standing commit/push authorization on feature branches.** When the user has asked you to fix, improve, or land work on a non-`main` branch, you have durable authorization to `git commit` and `git push` to that branch's tracking remote without per-step confirmation. Do not pause to ask "want me to commit?" because committing and pushing is part of the requested work. The safety constraints in _Git Workflow (CRITICAL)_ below (no commits to `main`, no `--force` without approval, no `--no-verify`, etc.) still apply.

For behavior changes, do not stop at unit tests. Run the actual CLI or example with the local build. For eval and redteam work, prefer:

```bash
npm run local -- eval -c path/to/promptfooconfig.yaml --no-cache -o output.json
```

Add `--env-file .env` only when the eval needs local credentials and the file exists. Inspect exported JSON for `success`, `score`, `error`, provider outputs, traces, and redteam findings. If you claim a redteam ran, report the plugins, strategies, interesting failures, and the evidence reviewed.

## Debugging & Troubleshooting

**Before running tests or review checks, align Node with the repo version first:**

```bash
nvm use
```

If you're using npm rather than pnpm/yarn, match the repo's npm major before treating install behavior as authoritative:

```bash
npm install -g npm@11
```

If Node-based tools fail with `ERR_MODULE_NOT_FOUND` or similar missing-package errors in a fresh worktree, run `npm ci` before treating the environment as blocked.

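
The environment repairs named in this section can be read as a small decision table. The sketch below is only an illustration under stated assumptions: `suggestRepair` is a hypothetical helper, not part of the repository, and it assumes the failing tool's error text contains the markers this guide mentions (`ERR_MODULE_NOT_FOUND`, `NODE_MODULE_VERSION`). It also assumes `nvm use` has already been run.

```typescript
// Hypothetical helper: maps an error message to the repair command this guide names.
type Repair = 'npm ci' | 'npm rebuild better-sqlite3' | null;

function suggestRepair(errorText: string): Repair {
  // Missing packages in a fresh worktree: reinstall from the lockfile.
  if (errorText.includes('ERR_MODULE_NOT_FOUND')) {
    return 'npm ci';
  }
  // Native module compiled for a different Node ABI: rebuild it for the active Node.
  if (errorText.includes('NODE_MODULE_VERSION')) {
    return 'npm rebuild better-sqlite3';
  }
  // No known environment marker: treat the failure as a real one and investigate.
  return null;
}

console.log(suggestRepair('Error [ERR_MODULE_NOT_FOUND]: Cannot find package'));
```

Returning `null` for unrecognized errors is deliberate: only the two documented markers are treated as environment repairs, so everything else still gets investigated as a possible product bug.
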
If database-backed tests fail with a `better-sqlite3` ABI or `NODE_MODULE_VERSION` mismatch after switching Node versions, rebuild the native module for the active Node version before treating the test run as blocked:

```bash
npm rebuild better-sqlite3
```

This is an environment repair step, not a product bug. Agents should try `nvm use` first and `npm rebuild better-sqlite3` second before concluding that review-time tests are blocked by the local setup.

**Verbose logging:**

```bash
npm run local -- eval -c config.yaml --verbose
# Or set environment variable
LOG_LEVEL=debug npm run local -- eval -c config.yaml
```

**Disable cache** (results may be cached during development):

```bash
npm run local -- eval -c config.yaml --no-cache
```

**View results in web UI:** First check if the Web UI is running on port 3000, then ask user before starting. Use `npm run dev` for localhost:3000.

**Cache:** Located at `~/.promptfoo/cache` by default, unless overridden with `PROMPTFOO_CACHE_PATH` or `PROMPTFOO_CONFIG_DIR`. **NEVER delete or clear the cache without explicit permission.** Use `--no-cache` flag instead.

**Database:** Located at `~/.promptfoo/promptfoo.db` (SQLite). You may read from it but **NEVER delete it**.

## Git Workflow (CRITICAL)

- **NEVER** commit/push directly to main
- **NEVER** use `--force` without explicit approval
- **NEVER** comment on GitHub issues - only create PRs to address them
- **ALWAYS create new commits** - never amend, squash, or rebase unless explicitly asked
- All changes go through pull requests

**Standard workflow:**

```bash
git checkout main && git pull origin main       # Always start fresh
git checkout -b feature/your-branch-name        # New branch for changes
# Make changes...
git add                                         # Never blindly add everything
npm run l && npm run f                          # Lint and format before commit/push
git commit -m "type(scope): description"        # Conventional commit format
git fetch origin main && git merge origin/main  # Sync with main
git push -u origin feature/your-branch-name     # Push branch
```

**Conventional commit types:** `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`

See `docs/agents/git-workflow.md` for full workflow.

See `docs/agents/pr-conventions.md` for PR title format and scope selection (especially THE REDTEAM RULE).

## Pull Request Creation

- **Default to full (non-draft) PRs.** Omit `--draft` from `gh pr create` unless the user explicitly asks for a draft, or the PR is for an unpublished security advisory (see "Security-Sensitive PRs" below). `docs/agents/pr-conventions.md` lists the full set of draft exceptions.
- **Never attribute commits or PR bodies to Claude / Claude Code.** No `Co-Authored-By: Claude…` trailers, no "Generated with Claude Code" footers. Use your configured git identity only.
- **Update the existing PR instead of opening a new one** when iterating on a branch that already has an open PR. Push to the same branch. Only run `gh pr create` if the user explicitly asks for a new PR or the existing PR is closed.
- **Don't let `npm audit fix` drift ride along with an unrelated change.** If `package-lock.json` changes outside the scope of the PR, revert the drift and ship it separately so reviewers can reason about each change independently.

## Security-Sensitive PRs

- **Before opening any public PR for a CVE/GHSA:** confirm the advisory has been published and the coordinated-disclosure embargo has lifted. See `SECURITY.md` for the disclosure policy. If the advisory is still private, use the GHSA private collaboration flow (or a temporary private fork) until the release that contains the fix is cut.
- Do **not** put the CVE/GHSA identifier, exploit description, or vulnerable-version range in a PR title, body, or branch name before disclosure.
- Every security fix should land with a regression test that exercises the original attack vector.

## Screenshots for Pull Requests

GitHub has no official API for uploading images to PR descriptions. When asked to add screenshots to a PR:

1. **Take the screenshot** using browser tools or other methods

2. **Upload to freeimage.host** (no API key required):

   ```bash
   curl -s -X POST \
     -F "source=@/path/to/screenshot.png" \
     -F "type=file" \
     -F "action=upload" \
     "https://freeimage.host/api/1/upload?key=6d207e02198a847aa98d0a2a901485a5" \
     | jq -r '.image.url'
   ```

3. **Update the PR body** with the returned URL:

   ```bash
   gh pr edit --body "$(cat <<'EOF'
   ## Summary
   ...
   ## Screenshot
   ![Screenshot](https://iili.io/XXXXXXX.png)
   ...
   EOF
   )"
   ```

**Do NOT:**

- Commit screenshots to the branch
- Upload to GitHub release assets
- Use GitHub's internal upload endpoints (require browser cookies, not PATs)

## Code Style Guidelines

- Use TypeScript with strict type checking
- Keep tracked root-owned TypeScript files under the root compiler project unless they belong to an explicitly separate project such as `src/app/` or `test/code-scan-action/`.
- Follow consistent import order (Biome handles sorting)
- Use consistent curly braces for all control statements
- Prefer `const` over `let`; avoid `var`
- Use object shorthand syntax whenever possible
- Use `async/await` for asynchronous code
- Use Vitest for all tests (both `test/` and `src/app/`)
- Use consistent error handling with proper type checks
- Avoid re-exporting from files; import directly from the source module

**Before committing:** `npm run l && npm run f`

**Pre-commit hook:** A pre-commit hook is installed automatically on `npm install` and runs Biome and Prettier on staged files.

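
As an illustration of several of the style guidelines above (`const`, object shorthand, `async/await`, braces on every control statement, and type-checked error handling), here is a small sketch. The `loadConfig` function, its `reader` parameter, and the `LoadResult` type are hypothetical names chosen for the example and are not part of the Promptfoo codebase.

```typescript
// Hypothetical example; names are illustrative, not from the Promptfoo codebase.
interface LoadResult {
  name: string;
  ok: boolean;
  message?: string;
}

async function loadConfig(
  name: string,
  reader: (name: string) => Promise<string>,
): Promise<LoadResult> {
  try {
    // async/await rather than chained promise callbacks
    const raw = await reader(name);
    if (raw.trim().length === 0) {
      // Braces are used even though the branch holds a single statement
      return { name, ok: false, message: 'empty config' };
    }
    const ok = true;
    // Object shorthand: { name, ok } instead of { name: name, ok: ok }
    return { name, ok };
  } catch (err) {
    // Narrow the unknown error before reading its message
    const message = err instanceof Error ? err.message : String(err);
    return { name, ok: false, message };
  }
}
```

A caller might use it as `const result = await loadConfig('demo', async () => 'prompts: []');`, where the inline reader stands in for whatever actually loads the file.
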
## Logging

Use the logger with object context (auto-sanitized):

```typescript
logger.debug('[Component] Message', { headers, body, config });
```

See `docs/agents/logging.md` for details on sanitization patterns.

## Testing

- **Vitest** is the test framework for all tests
- Frontend tests (`src/app/`): Vitest with explicit imports
- Backend tests (`test/`): Vitest with globals enabled (`describe`, `it`, `expect` available without imports)

See `test/AGENTS.md` for testing patterns.

## Project Conventions

- **ESM modules** (type: "module" in package.json)
- **Node.js ^20.20.0 || >=22.22.0** - Before `npm`/`vite`/`vitest`, run `source ~/.nvm/nvm.sh && nvm use` so `node -v` matches `.nvmrc`. If you're using npm, upgrade to `npm@11` so the repo's release-age policy is applied consistently. If native modules still mismatch before tests or review checks, run `npm rebuild better-sqlite3`. `.npmrc` sets `engine-strict=true`
- **Alternative package managers** (pnpm, yarn) are supported
- **File structure:** core logic in `src/`, tests in `test/`
- **Examples** belong in `examples/` with clear README.md
- **Drizzle ORM** for database operations
- **Workspaces** include `src/app` and `site` directories
- **Don't edit `CHANGELOG.md`** - it's auto-generated

## Before Writing Code

- **Search for existing implementations** before creating new code
- **Check for existing utilities** in `src/util/` before adding helpers
- **Don't add dependencies** without checking if functionality exists in current deps
- **Reuse patterns** from similar files in the codebase
- **Test both success and error cases** for all functionality
- **Document provider configurations** following examples in existing code

## Adversarial and Redteam Bias

For security, model scanning, redteam, and coding-agent work, test like an attacker first. Look for false negatives, bypasses, hidden payloads, unsafe tool use, prompt injection, exfiltration, cache misuse, and evidence gaps. When a bypass is found, add a focused regression test before or alongside the fix.

For demo/example apps used to show red teaming, do not harden away all interesting findings unless explicitly asked. A slightly vulnerable sample app is useful when the goal is to demonstrate Promptfoo's ability to find real breaks.

## Review Guidelines

- Prioritize security regressions first, especially injection risks, unsafe handling of user-controlled or adversarial content, credential exposure, SSRF, path traversal, unsafe deserialization, and authorization mistakes.
- Then prioritize correctness issues that can break behavior, public APIs, data integrity, concurrency, or error handling.
- Treat missing or ineffective tests as a P1 issue when a change adds security-sensitive behavior, changes public behavior, or fixes a bug without meaningful coverage.
- Focus on the code changed by the pull request. Do not flag pre-existing issues outside the touched diff unless the pull request materially worsens them.
- Avoid repeating findings that were already raised in the current pull request unless the new diff reintroduces them or leaves the same risk in newly changed code.
- Verify findings on the current branch tip after syncing with the latest `main`.
- Treat existing PR comments and bot reviews as hints; confirm they still apply before reporting them. If CI is failing, inspect the failing job logs and separate unrelated base-branch failures from PR regressions.
- Ignore formatting, import ordering, naming, and other style-only issues already enforced by CI or repository tooling.
- If a pull request is primarily about redteam functionality, verify the title follows THE REDTEAM RULE in `docs/agents/pr-conventions.md` and uses `(redteam)` scope. Incidental `src/redteam/` touches in broad maintenance PRs do not require `(redteam)` scope.

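
The title check in the last bullet can be sketched as follows. This is an illustration under stated assumptions, not the project's actual check: `parseTitle` and `usesRedteamScope` are hypothetical helpers, the regular expression assumes the `type(scope): description` shape and the commit types listed under Git Workflow, and the full REDTEAM RULE lives in `docs/agents/pr-conventions.md`, which this sketch does not reproduce.

```typescript
// Commit types listed under Git Workflow in this guide.
const TYPES: readonly string[] = ['feat', 'fix', 'chore', 'docs', 'test', 'refactor', 'ci', 'perf'];

interface ParsedTitle {
  type: string;
  scope: string | null;
  description: string;
}

// Hypothetical parser for titles shaped like "type(scope): description".
function parseTitle(title: string): ParsedTitle | null {
  const match = /^([a-z]+)(?:\(([^)]+)\))?: (.+)$/.exec(title);
  if (!match) {
    return null;
  }
  const type = match[1] ?? '';
  const scope = match[2] ?? null;
  const description = match[3] ?? '';
  if (!TYPES.includes(type)) {
    return null;
  }
  return { type, scope, description };
}

// True only when the title parses and its scope is exactly "redteam".
function usesRedteamScope(title: string): boolean {
  const parsed = parseTitle(title);
  return parsed !== null && parsed.scope === 'redteam';
}

// Illustrative titles, not taken from real PRs.
console.log(usesRedteamScope('fix(redteam): handle empty plugin list'));
console.log(usesRedteamScope('fix(providers): retry on rate limit'));
```

Whether a given PR is "primarily about redteam functionality" remains a judgment call for the reviewer; the sketch only checks the mechanical part of the title.
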
## Documentation Testing

When testing doc changes, speed up builds by skipping OG image generation:

```bash
cd site
SKIP_OG_GENERATION=true npm run build
```

See `site/AGENTS.md` for documentation guidelines.

## Additional Documentation

Read these when relevant to your task:

| Document                               | When to Read                              |
| -------------------------------------- | ----------------------------------------- |
| `docs/agents/pr-conventions.md`        | Creating pull requests                    |
| `docs/agents/git-workflow.md`          | Git operations                            |
| `docs/agents/dependency-management.md` | Updating packages                         |
| `docs/agents/logging.md`               | Adding logging to code                    |
| `docs/agents/python.md`                | Python providers/scripts                  |
| `docs/agents/database-security.md`     | Writing database queries                  |
| `src/app/AGENTS.md`                    | Frontend React development                |
| `src/providers/AGENTS.md`              | Adding/modifying LLM providers            |
| `test/AGENTS.md`                       | Writing tests                             |
| `site/AGENTS.md`                       | Documentation site changes                |
| `.github/AGENTS.md`                    | GitHub Actions / release workflow changes |