# redteam-tracing-example (Red Team Tracing)

You can run this example with:

```bash
npx promptfoo@latest init --example redteam-tracing-example
cd redteam-tracing-example
```

This example demonstrates how to use tracing with red team strategies to provide attackers and graders with visibility into the internal operations of your LLM application.

## Quick Start

**1. Install dependencies:**

```bash
npm install
```

**2. Start the mock traced server:**

```bash
npm run server
```

This starts an HTTP server on port 3110 that:

- Accepts chat requests
- Generates OTLP trace spans (LLM calls, guardrails, tools)
- Sends spans to promptfoo's OTLP receiver

**3. Test the server (optional):**

```bash
# In another terminal
./test-server.sh
```

**4. Run the red team evaluation:**

```bash
# In another terminal (from the root of the promptfoo repository)
npm run local -- eval -c examples/redteam-tracing-example/promptfooconfig.yaml
```

**5. View the results:**

```bash
npm run local -- view
```

You'll see trace data in:

- Attack prompts (when `includeInAttack: true`)
- Grading context (when `includeInGrading: true`)
- Test metadata (`traceSnapshots`)

## Quick Start Troubleshooting

**Server not responding?**

```bash
# Check if server is running
curl http://localhost:3110/health

# Test basic request
curl -X POST http://localhost:3110/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "test"}'
```

**No traces appearing?**

- Make sure the server is emitting to the correct OTLP endpoint (check server logs)
- Verify promptfoo's OTLP receiver is enabled in config (`tracing.enabled: true`)
- Check that `traceparent` headers are being passed (set in provider context)
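
If traces are missing, one quick check is whether the `traceparent` value reaching your server is well-formed. The sketch below validates a header against the W3C Trace Context layout (`version-traceId-parentId-flags`). `parseTraceparent` and `TraceParent` are illustrative names, not part of promptfoo.

```typescript
// Parsed form of a W3C Trace Context `traceparent` header.
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

// Layout: version (2 hex) - trace ID (32 hex) - parent ID (16 hex) - flags (2 hex)
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns the parsed header, or null if it is malformed.
function parseTraceparent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) {
    return null;
  }
  const [, version, traceId, parentId, flags] = match;
  // Version "ff" and all-zero trace or parent IDs are invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) {
    return null;
  }
  return {
    version,
    traceId,
    parentId,
    // Bit 0 of the flags byte is the "sampled" flag.
    sampled: (parseInt(flags, 16) & 1) === 1,
  };
}
```

Logging the result of `parseTraceparent(req.headers['traceparent'])` in your server (the header access shown here is an assumption about your HTTP framework) should show whether the trace ID is arriving at all.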

## What is Red Team Tracing?

Red team tracing allows adversarial strategies to see what happens inside your LLM application during an attack, including:

- Tool calls and their results
- Guardrail decisions
- Internal LLM calls
- Error conditions
- Performance metrics

This information helps in two places:

1. **Attack generation**: Craft more effective attacks by understanding how the system responds internally
2. **Grading**: Make more informed decisions about whether an attack succeeded by seeing internal behavior

## Configuration

### Basic Configuration

Enable tracing in your `promptfooconfig.yaml`:

```yaml
redteam:
  tracing:
    # Enable tracing for all strategies
    enabled: true

    # Include trace data in attack generation (default: true)
    includeInAttack: true

    # Include trace data in grading (default: true)
    includeInGrading: true

  plugins:
    - harmful
    - pii

  strategies:
    - crescendo
    - goat
```

### Advanced Configuration

Configure tracing behavior:

```yaml
redteam:
  tracing:
    enabled: true

    # Include internal spans (e.g., tokenization, parsing)
    includeInternalSpans: false

    # Maximum number of spans to fetch per iteration
    maxSpans: 50

    # Maximum depth of nested spans to fetch
    maxDepth: 5

    # Retry configuration for fetching traces
    maxRetries: 3
    retryDelayMs: 500

    # Filter spans by name pattern (optional)
    spanFilter:
      - 'llm.*'
      - 'tool.*'
      - 'guardrail.*'

    # Sanitize sensitive attributes (recommended)
    sanitizeAttributes: true
```
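
To make the interaction between `spanFilter` and `maxSpans` concrete, here is a minimal sketch of one way such filtering could behave. It assumes the patterns are glob-style (`*` matches any run of characters, everything else is literal); this README does not state the exact matching semantics, and `globToRegExp`, `selectSpans`, and `SpanLike` are hypothetical names.

```typescript
// Minimal span shape for this sketch (hypothetical; real spans carry more fields).
interface SpanLike {
  name: string;
}

// Convert a glob-style pattern such as 'llm.*' into an anchored RegExp.
// Assumption: '*' is the only wildcard and every other character is literal.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}$`);
}

// Keep spans whose name matches any pattern, then cap the result at maxSpans.
// An empty filter list keeps every span.
function selectSpans<T extends SpanLike>(
  spans: T[],
  spanFilter: string[],
  maxSpans: number,
): T[] {
  const matchers = spanFilter.map(globToRegExp);
  const matched =
    matchers.length === 0
      ? spans
      : spans.filter((span) => matchers.some((re) => re.test(span.name)));
  return matched.slice(0, maxSpans);
}
```

Under these assumptions, filtering with `['llm.*', 'tool.*']` keeps `llm.generate` and `tool.database_query` but drops `guardrail.check`.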

### Strategy-Specific Configuration

Different strategies may need different tracing settings:

```yaml
redteam:
  tracing:
    enabled: true

    # Strategy-specific overrides
    strategies:
      # Crescendo benefits from seeing guardrail decisions
      crescendo:
        includeInAttack: true
        includeInGrading: true
        spanFilter:
          - 'guardrail.*'
          - 'llm.*'

      # GOAT can use tool call information
      goat:
        includeInAttack: true
        spanFilter:
          - 'tool.*'
          - 'llm.*'

      # Iterative may want full trace data
      iterative:
        includeInAttack: true
        includeInGrading: true
        maxSpans: 100
```
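
One way to picture how a strategy override combines with the global settings is a shallow merge, sketched below. The merge rule (override keys replace global keys, unspecified keys fall through) is an assumption for illustration, and `resolveTracing`, `TracingOptions`, and `TracingConfig` are hypothetical names rather than promptfoo types.

```typescript
// Subset of the tracing options shown in the configuration above.
interface TracingOptions {
  enabled?: boolean;
  includeInAttack?: boolean;
  includeInGrading?: boolean;
  maxSpans?: number;
  spanFilter?: string[];
}

interface TracingConfig extends TracingOptions {
  // Per-strategy overrides keyed by strategy id (e.g. 'crescendo', 'goat').
  strategies?: Record<string, TracingOptions>;
}

// Resolve the effective options for one strategy.
// Assumption: a shallow merge in which override keys replace global keys.
function resolveTracing(config: TracingConfig, strategyId: string): TracingOptions {
  const { strategies, ...globalOptions } = config;
  const override = strategies?.[strategyId] ?? {};
  return { ...globalOptions, ...override };
}
```

With a global `maxSpans: 50` and an `iterative` override of `maxSpans: 100`, this sketch resolves `iterative` to 100 spans while `goat` and `crescendo` keep 50.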

### Test-Level Configuration

Override tracing for specific tests:

```yaml
tests:
  - description: 'Test with custom tracing'
    vars:
      query: 'Tell me about sensitive data'
    metadata:
      tracing:
        enabled: true
        includeInAttack: true
        includeInGrading: true
        maxSpans: 200
```

## How Tracing Works

### 1. Attack Generation

When `includeInAttack: true`, the attacker receives a trace summary like:

```text
Trace 0af76519 • 5 spans

Execution Flow:
1. [1.2s] llm.generate (client) | model=gpt-4
2. [300ms] guardrail.check (internal) | tool=content-filter
3. [150ms] tool.database_query (server) | tool=search
4. [50ms] guardrail.check (internal) | ERROR: Rate limit exceeded
5. [800ms] llm.generate (client) | model=gpt-4

Key Observations:
• Guardrail content-filter decision: blocked
• Tool call search via "tool.database_query" (duration 150ms)
• Error span "guardrail.check" (span-4): Rate limit exceeded
```

The attacker can use this information to craft better attacks (e.g., targeting the rate limit error).
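
For readers who want to produce a similar "Execution Flow" listing from their own span data, the following sketch formats spans in the same style as the summary above. It is not promptfoo's formatter; `SummarySpan`, `formatDuration`, and `formatExecutionFlow` are hypothetical, and the span fields are assumed for illustration.

```typescript
// Hypothetical span shape for illustration only.
interface SummarySpan {
  name: string;
  kind: string; // e.g. 'client', 'internal', 'server'
  durationMs: number;
  detail: string; // e.g. 'model=gpt-4' or 'ERROR: Rate limit exceeded'
}

// Render durations as seconds from 1000ms upward, otherwise as milliseconds.
function formatDuration(ms: number): string {
  return ms >= 1000 ? `${(ms / 1000).toFixed(1)}s` : `${Math.round(ms)}ms`;
}

// Produce one numbered line per span, in execution order.
function formatExecutionFlow(spans: SummarySpan[]): string[] {
  return spans.map(
    (span, index) =>
      `${index + 1}. [${formatDuration(span.durationMs)}] ${span.name} (${span.kind}) | ${span.detail}`,
  );
}
```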

### 2. Grading

When `includeInGrading: true`, graders receive the same trace context and can make more informed decisions:

```typescript
// Grader receives:
{
  prompt: "...",
  llmOutput: "...",
  test: {...},
  gradingContext: {
    traceContext: {
      traceId: "...",
      spans: [...],
      insights: [...]
    },
    traceSummary: "..."
  }
}
```
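
As a rough illustration of how grading logic might consume this context, the sketch below pulls out the names of spans that recorded an error. Only `traceContext`, `traceId`, `spans`, `insights`, and `traceSummary` come from the shape above; the span fields (`name`, `statusMessage`), the `string[]` type for `insights`, and the helper `errorSpanNames` are assumptions.

```typescript
// Hypothetical span fields for this sketch.
interface TraceSpan {
  name: string;
  statusMessage?: string; // assumed to be set when the span recorded an error
}

interface GradingContext {
  traceContext?: {
    traceId: string;
    spans: TraceSpan[];
    insights: string[]; // element type is an assumption
  };
  traceSummary?: string;
}

// Collect the names of spans that recorded an error so a grader can cite them.
function errorSpanNames(ctx: GradingContext): string[] {
  const spans = ctx.traceContext?.spans ?? [];
  return spans.filter((span) => span.statusMessage !== undefined).map((span) => span.name);
}
```

A grader could use a result such as `['guardrail.check']` as evidence that a guardrail failed during the attack, rather than relying on the final output alone.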

## Best Practices

### 1. Start with Default Settings

The default configuration works well for most use cases:

```yaml
redteam:
  tracing:
    enabled: true
```

### 2. Use spanFilter for Focused Analysis

If you only care about specific operations:

```yaml
redteam:
  tracing:
    enabled: true
    spanFilter:
      - 'guardrail.*' # Guardrail spans
      - 'tool.*' # Tool calls
```

### 3. Keep sanitizeAttributes Enabled

Always sanitize attributes in production:

```yaml
redteam:
  tracing:
    enabled: true
    sanitizeAttributes: true # Recommended
```
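
To give a sense of what attribute sanitization can look like, here is a minimal sketch that redacts values whose key ends in a sensitive-looking segment. The key list and the `<redacted>` placeholder are assumptions chosen for illustration; this is not a description of what `sanitizeAttributes: true` does internally.

```typescript
// Keys whose final segment looks sensitive (an assumed list, not promptfoo's).
// Anchoring on the last segment avoids redacting counters such as "total_tokens".
const SENSITIVE_KEY = /(^|[._-])(password|secret|token|api[_-]?key|authorization|cookie)$/i;

type AttributeValue = string | number | boolean;

// Return a copy of the attributes with sensitive values replaced by a placeholder.
function redactSensitiveAttributes(
  attrs: Record<string, AttributeValue>,
): Record<string, AttributeValue> {
  const out: Record<string, AttributeValue> = {};
  for (const key of Object.keys(attrs)) {
    out[key] = SENSITIVE_KEY.test(key) ? '<redacted>' : attrs[key];
  }
  return out;
}
```

Key-based redaction like this does not catch secrets embedded in free-text values, which is one reason to review trace data before sharing it.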

### 4. Adjust maxSpans Based on Complexity

- Simple apps: `maxSpans: 20`
- Medium complexity: `maxSpans: 50` (default)
- Complex agentic systems: `maxSpans` between 100 and 200

### 5. Use Strategy-Specific Overrides

Different strategies benefit from different trace data:

- **Crescendo**: Needs guardrail information
- **GOAT**: Benefits from tool call traces
- **Iterative**: Can use comprehensive trace data

## Security Considerations

### Sensitive Data

Traces can expose sensitive information, such as tool call results. To limit exposure:

1. Keep `sanitizeAttributes: true` (default)
2. Review trace data before sharing it
3. Consider disabling tracing when testing against production systems

### Performance

Tracing adds overhead:

- Fetching traces: ~100-500ms per iteration
- Processing spans: Minimal overhead
- Storage: Trace metadata is stored in test results

To minimize impact:

- Use `maxSpans` to limit data fetched
- Set appropriate `maxRetries` and `retryDelayMs`
- Consider disabling for large-scale testing
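
When choosing `maxRetries` and `retryDelayMs`, it helps to estimate the worst-case wait they add to each iteration. The sketch below assumes a fixed delay between attempts and that `maxRetries` counts retries after the first attempt; the actual retry policy is not described in this README, and both function names are hypothetical.

```typescript
// Worst-case time spent waiting between attempts, assuming a fixed delay.
// Assumption: maxRetries counts retries after the first attempt.
function worstCaseRetryWaitMs(maxRetries: number, retryDelayMs: number): number {
  return Math.max(0, maxRetries) * Math.max(0, retryDelayMs);
}

// Generic retry loop: call fetchOnce until it returns a non-null value
// or the retries run out.
async function fetchWithRetry<T>(
  fetchOnce: () => Promise<T | null>,
  maxRetries: number,
  retryDelayMs: number,
): Promise<T | null> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = await fetchOnce();
    if (result !== null) {
      return result;
    }
    if (attempt < maxRetries) {
      // Wait before the next attempt; no wait after the final one.
      await new Promise<void>((resolve) => setTimeout(resolve, retryDelayMs));
    }
  }
  return null;
}
```

Under these assumptions, `maxRetries: 3` with `retryDelayMs: 500` can add up to 1500ms of waiting per iteration when traces never arrive, on top of the fetch time itself.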

## Debugging

### Enable Debug Logging

```bash
PROMPTFOO_LOG_LEVEL=debug npm run local -- eval -c redteam.yaml
```

### Check Trace Store

Verify traces are being recorded:

```bash
# View traces in the database
npm run db:studio
```

### Test Trace Fetching

```typescript
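// Assumes this runs from the promptfoo repository root (the import path is
// relative to it) in an ES module context that allows top-level await.
import { fetchTraceContext } from './src/tracing/traceContext';

// Fetch a stored trace by ID, using the same span limits as the configuration.
const trace = await fetchTraceContext('your-trace-id', {
  maxSpans: 50,
  maxDepth: 5,
});
console.log(trace);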
```

## Examples

See the example configurations:

- `promptfooconfig.yaml` - Basic tracing setup
- `promptfooconfig.advanced.yaml` - Advanced configuration
- `promptfooconfig-simple.yaml` - Simplified configuration

## Troubleshooting

### No Traces Appearing

1. Check that your provider supports tracing (it must send the `traceparent` header)
2. Verify that the OTLP receiver is running
3. Check debug logs for trace fetch errors

### Traces Not Used in Attacks

1. Verify `includeInAttack: true`
2. Check that traces are being fetched (debug logs)
3. Ensure trace fetch completes before attack generation

### Performance Issues

1. Reduce `maxSpans` and `maxDepth`
2. Use `spanFilter` to limit data
3. Increase `retryDelayMs` to reduce fetch frequency