---
title: Promptfoo MCP Server
description: Deploy promptfoo as a Model Context Protocol server so external AI agents can access its evaluation and red teaming capabilities
sidebar_label: MCP Server
sidebar_position: 21
---

# Promptfoo MCP Server

Expose promptfoo's eval tools to AI agents via the Model Context Protocol (MCP).

:::info Prerequisites

- Node.js installed on your system
- A promptfoo project with some evaluations (for testing the connection)
- Cursor IDE, Claude Desktop, or another MCP-compatible AI tool

:::

## Quick Start

### 1. Start the Server

```bash
# For Cursor, Claude Desktop (STDIO transport)
npx promptfoo@latest mcp --transport stdio

# For web tools (HTTP transport)
npx promptfoo@latest mcp --transport http --port 3100
```

### 2. Configure Your AI Tool

**Cursor**: Create `.cursor/mcp.json` in your project root

```json title=".cursor/mcp.json"
{
  "mcpServers": {
    "promptfoo": {
      "command": "npx",
      "args": ["promptfoo@latest", "mcp", "--transport", "stdio"],
      "description": "Promptfoo MCP server for LLM evaluation and testing"
    }
  }
}
```

:::warning Development vs Production Configuration

**For regular usage:** Always use `npx promptfoo@latest` as shown above.

**For promptfoo contributors:** The repository's `.cursor/mcp.json` runs from source code for development. It requires the repo's dev dependencies and won't work elsewhere.

:::

**Claude Desktop**: Add the server to your config file

Config file locations:

- **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux:** `~/.config/Claude/claude_desktop_config.json`

```json title="claude_desktop_config.json"
{
  "mcpServers": {
    "promptfoo": {
      "command": "npx",
      "args": ["promptfoo@latest", "mcp", "--transport", "stdio"],
      "description": "Promptfoo MCP server for LLM evaluation and testing"
    }
  }
}
```

**Restart your AI tool** after adding the configuration.

### 3. Test the Connection

After restarting your AI tool, you should see the promptfoo tools available. Try asking:

> "List my recent evaluations using the promptfoo tools"

## Available Tools

### Core Evaluation Tools

- **`list_evaluations`** - Browse your evaluation runs with optional dataset filtering
- **`get_evaluation_details`** - Get comprehensive results, metrics, and test cases for a specific evaluation
- **`run_evaluation`** - Execute evaluations with custom parameters, test case filtering, and concurrency control
- **`share_evaluation`** - Generate publicly shareable URLs for evaluation results

### Generation Tools

- **`generate_dataset`** - Generate test datasets using AI for comprehensive evaluation coverage
- **`generate_test_cases`** - Generate test cases with assertions for existing prompts
- **`compare_providers`** - Compare multiple AI providers side-by-side for performance and quality

### Redteam Security Tools

- **`redteam_run`** - Execute comprehensive security testing against AI applications with dynamic attack probes
- **`redteam_generate`** - Generate adversarial test cases for redteam security testing with configurable plugins and strategies

### Configuration & Testing

- **`validate_promptfoo_config`** - Validate configuration files using the same logic as the CLI
- **`test_provider`** - Test AI provider connectivity, credentials, and response quality
- **`run_assertion`** - Test individual assertion rules against outputs for debugging

## Example Workflows

### 1. Basic Evaluation Workflow

Ask your AI assistant:

> "Help me run an evaluation. First, validate my config, then list recent evaluations, and finally run a new evaluation with just the first 5 test cases."

The AI will use these tools in sequence:

1. `validate_promptfoo_config` - Check your configuration
2. `list_evaluations` - Show recent runs
3. `run_evaluation` - Execute with test case filtering, such as `{"start": 0, "end": 5}` for the first five zero-based test indices

### 2. Provider Comparison

> "Compare the performance of GPT-4, Claude 3, and Gemini Pro on my customer support prompt."

The AI will:

1. `test_provider` - Verify each provider works
2. `compare_providers` - Run side-by-side comparison
3. Analyze results and provide recommendations

### 3. Security Testing

> "Run a security audit on my chatbot prompt to check for jailbreak vulnerabilities."

The AI will:

1. `redteam_generate` - Create adversarial test cases
2. `redteam_run` - Execute security tests
3. `get_evaluation_details` - Analyze vulnerabilities found

### 4. Dataset Generation

> "Generate 20 diverse test cases for my email classification prompt, including edge cases."

The AI will:

1. `generate_dataset` - Create test data with AI
2. `generate_test_cases` - Add appropriate assertions
3. `run_evaluation` - Test the generated cases

## Transport Types

Choose the transport that matches your use case:

- **STDIO (`--transport stdio`)**: For desktop AI tools (Cursor, Claude Desktop) that communicate via stdin/stdout
- **HTTP (`--transport http`)**: For web applications, APIs, and remote integrations that need HTTP endpoints

## Best Practices

### 1. Start Small

Begin with simple tools like `list_evaluations` and `validate_promptfoo_config` before moving to more complex operations.

### 2. Use Filtering

When working with large datasets:

- Filter evaluations by dataset ID
- Use test case indices to run partial evaluations
- Apply prompt/provider filters for focused testing

### 3. Iterative Testing

1. Validate configuration first
2. Test providers individually before comparisons
3. Run small evaluation subsets before full runs
4. Review results with `get_evaluation_details`

### 4. Security First

When using redteam tools:

- Start with basic plugins before advanced attacks
- Review generated test cases before running them
- Always analyze results thoroughly

## Troubleshooting

### Server Issues

**Server won't start:**

```bash
# Verify promptfoo installation
npx promptfoo@latest --version

# Check if you have a valid promptfoo project
npx promptfoo@latest validate

# Test the MCP server manually
npx promptfoo@latest mcp --transport stdio
```

**Port conflicts (HTTP mode):**

```bash
# Use a different port
npx promptfoo@latest mcp --transport http --port 8080

# Check what's using port 3100
lsof -i :3100                  # macOS/Linux
netstat -ano | findstr :3100   # Windows
```

### AI Tool Connection Issues

**AI tool can't connect:**

1. **Verify config syntax:** Ensure your JSON configuration exactly matches the examples above
2. **Check file paths:** Confirm config files are in the correct locations
3. **Restart completely:** Close your AI tool entirely and reopen it
4. **Test HTTP endpoint:** For HTTP transport, verify with `curl http://localhost:3100/health`

**Tools not appearing:**

1. Look for MCP or "tools" indicators in your AI tool's interface
2. Try asking explicitly: "What promptfoo tools do you have access to?"
3. Check your AI tool's logs for MCP connection errors

### Tool-Specific Errors

**"Eval not found":**

- Use `list_evaluations` first to see available evaluation IDs
- Ensure you're in a directory with promptfoo evaluation data

**"Config error":**

- Run `validate_promptfoo_config` to check your configuration
- Verify `promptfooconfig.yaml` exists and is valid

**"Provider error":**

- Use `test_provider` to diagnose connectivity and authentication issues
- Check your API keys and provider configurations

## Advanced Usage

### Custom HTTP Integrations

For HTTP transport, you can integrate with any system that supports HTTP:

```javascript
// Example: Call MCP server from Node.js
// MCP messages are JSON-RPC 2.0, so each request carries `jsonrpc` and an `id`.
const response = await fetch('http://localhost:3100/mcp', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    jsonrpc: '2.0',
    id: 1, // any unique value; the server echoes it back in the response
    method: 'tools/call',
    params: {
      name: 'list_evaluations',
      arguments: { datasetId: 'my-dataset' },
    },
  }),
});
```

### Environment Variables

The MCP server respects all promptfoo environment variables:

```bash
# Set provider API keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

# Configure promptfoo behavior
export PROMPTFOO_CONFIG_DIR=/path/to/configs
export PROMPTFOO_OUTPUT_DIR=/path/to/outputs

# Start server with environment
npx promptfoo@latest mcp --transport stdio
```

## Resources

- [MCP Protocol Documentation](https://modelcontextprotocol.io)
- [Promptfoo Documentation](https://promptfoo.dev)
- [Example Configurations](https://github.com/promptfoo/promptfoo/tree/main/examples)
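
As a supplement to the Custom HTTP Integrations example, here is a minimal sketch of how repeated tool calls could be wrapped from Node.js. The helper names `buildToolCall` and `callTool` are hypothetical and not part of promptfoo. The sketch assumes the HTTP transport is listening on the `/mcp` path used earlier and that Node.js 18+ provides a global `fetch`. Because the response format (plain JSON or an event stream) may vary by server version, `callTool` returns the raw response text rather than assuming a shape.

```javascript
// Hypothetical helpers for illustration only; not part of promptfoo.

// Counter used to give each JSON-RPC request a unique id.
let nextRequestId = 1;

// Build a JSON-RPC 2.0 request body for an MCP `tools/call` invocation.
function buildToolCall(name, args = {}) {
  return {
    jsonrpc: '2.0',
    id: nextRequestId++,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

// POST a tool call to the server and return the raw response text.
// `baseUrl` is an assumption, e.g. 'http://localhost:3100'.
async function callTool(baseUrl, name, args = {}) {
  const response = await fetch(`${baseUrl}/mcp`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildToolCall(name, args)),
  });
  if (!response.ok) {
    throw new Error(`MCP request failed with HTTP ${response.status}`);
  }
  return response.text();
}

// Inspect the request body without contacting a server.
const request = buildToolCall('list_evaluations', { datasetId: 'my-dataset' });
console.log(JSON.stringify(request, null, 2));
```

With the server running in HTTP mode, `callTool('http://localhost:3100', 'list_evaluations')` would send the request. Keeping the envelope construction in its own function makes it easy to check the payload separately from the network call.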