# Red Team Testing
**What this is:** Adversarial/security testing framework to identify LLM application vulnerabilities through automated attacks.
## Architecture
```
src/redteam/
├── plugins/ # Vulnerability-specific test generators
│ ├── pii.ts # PII leakage detection
│ ├── harmful.ts # Harmful content generation
│ ├── sql-injection.ts # SQL injection attempts
│ └── ...
├── strategies/ # Attack transformation techniques
│ ├── jailbreak.ts # Guardrail bypass attempts
│ ├── prompt-injection.ts
│ ├── base64.ts # Obfuscation strategies
│ └── ...
└── graders.ts # Response evaluation logic
```
## Key Concepts
**Plugins** generate test cases for specific vulnerability types (what to test).
**Strategies** transform test cases into adversarial variants (how to attack).
**Graders** evaluate if attacks succeeded (did it work?).
## Plugin vs Strategy
```yaml
plugins:
- pii # Generate PII leakage tests
- harmful # Generate harmful content tests
strategies:
- jailbreak # Apply jailbreak techniques to ALL tests
- base64 # Obfuscate with base64 encoding
```
One plugin can be tested with multiple strategies for comprehensive coverage.
## Generation and Strategy QA
When changing redteam generation, trace the full path before editing: `src/redteam/commands/generate.ts`, `src/redteam/index.ts`, `src/redteam/plugins/index.ts`, plugin `generateTests`, strategies, and iterative providers such as meta, hydra, and crescendo.
Evaluate generated cases for diversity, realism, coverage, and failure modes. For agent redteams, include coding-agent risks, connectors, sandboxing, traces, raw provider events, changed files, canaries, and sidecar evidence where available.
## Public Documentation
Redteam behavior is user-facing. When changing plugins, strategies, generated config,
grading, or reports, update the matching pages under `site/docs/red-team/`.
## Logging
See `docs/logging.md` - especially important here since test content may contain harmful/sensitive data.
## Adding New Plugins
1. Implement `RedteamPluginObject` interface
2. Generate targeted test cases for vulnerability
3. Include assertions defining failure conditions
4. Add tests in `test/redteam/`
See `src/redteam/plugins/pii.ts` for reference pattern.
## Plugin/Grader Standards
**CRITICAL:** All graders must use standardized tags per `.claude/skills/redteam-plugin-development/skill.md`
Quick reference:
- User prompt: `{{prompt}}` (NOT ``, ``, or ``)
- Purpose: `{{purpose}}`
- Entities: `` with `` children
See `src/redteam/plugins/harmful/graders.ts` for reference implementation.
## Risk Scoring
Results include severity levels:
- `critical` - PII leaks, SQL injection
- `high` - Jailbreaks, prompt injection, harmful content
- `medium` - Bias, hallucination
- `low` - Overreliance