---
sidebar_label: GOAT
title: GOAT Jailbreaking Strategy
description: Apply GOAT (Generative Offensive Agent Tester) for sophisticated multi-turn jailbreaking using adversarial agent dialogues
---

# GOAT Technique for Jailbreaking LLMs

The GOAT (Generative Offensive Agent Tester) strategy is an advanced automated red teaming technique that uses an "attacker" LLM to dynamically generate **multi-turn** conversations aimed at bypassing a target model's safety measures.

It was [introduced by Meta researchers](https://arxiv.org/abs/2410.01606) in 2024 and achieves high success rates against modern LLMs by simulating how real users interact with AI systems, reaching an Attack Success Rate (ASR@10) of 97% against Llama 3.1 and 88% against GPT-4-Turbo on the JailbreakBench dataset.

## Implementation

Use it by selecting it in the Strategies UI or by editing your config:

```yaml title="promptfooconfig.yaml"
strategies:
  - id: goat
    config:
      maxTurns: 5 # Maximum conversation turns (default)
      stateful: false # Sends the entire conversation history with each turn (default)
```

:::info
If your system maintains a conversation history and only expects the latest message to be sent, set `stateful: true`. [Make sure to configure cookies or sessions in your provider as well.](/docs/providers/http/#server-side-session-management)
:::

## How It Works

![GOAT LLM attack](/img/docs/goat.svg)

GOAT uses an attacker LLM that engages in multi-turn conversations with a target model. The attacker pursues multiple adversarial techniques: output manipulation, safe response distractors, and fictional scenarios. Unlike simpler approaches, GOAT adapts its strategy based on the target model's responses, similar to how human red teamers operate.

Each conversation turn follows a structured three-step reasoning process:

1. **Observation**: Analyzes the target model's previous response and identifies triggered safety mechanisms
2. **Strategic Planning**: Reflects on conversation progress and develops the next approach
3. **Attack Generation**: Selects and combines appropriate techniques to generate the next prompt

![GOAT attack flow](/img/docs/goat-attack-flow.svg)

This process loops until the attack either achieves its goal or reaches the maximum number of turns.

GOAT's effectiveness stems from its ability to simulate realistic user behavior while maintaining technical sophistication. Instead of relying on brute-force approaches or static prompts, its dynamic conversation and reasoning make it particularly effective at identifying vulnerabilities in LLM applications.

## Related Concepts

- [Multi-turn Jailbreaks](multi-turn.md)
- [Tree-based Jailbreaks](tree.md)
- [Iterative Jailbreaks](iterative.md)
- [Red Team Strategies](/docs/red-team/strategies/) - Full strategy catalog
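
## Example: Stateful Configuration

As a sketch of the stateful setup described in the note above, the following config pairs `stateful: true` with an HTTP provider that maintains a server-side session. The endpoint URL, header name, and body field are placeholders for your own application; see the provider documentation linked above for the session-management options your target actually supports.

```yaml title="promptfooconfig.yaml"
strategies:
  - id: goat
    config:
      maxTurns: 5
      stateful: true # send only the latest message each turn

targets:
  - id: https://example.com/api/chat # placeholder endpoint
    config:
      method: POST
      headers:
        'x-session-id': '{{sessionId}}' # reuse the session ID on follow-up turns
      body:
        message: '{{prompt}}'
```

With this setup the target application, not promptfoo, is responsible for remembering earlier turns, so the attacker's follow-up prompts are sent against the server's stored conversation state.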