---
sidebar_label: Best-of-N
title: Best-of-N Jailbreaking Strategy
description: Execute black-box jailbreaking by testing multiple prompt variations simultaneously to maximize bypass success probability
---

# Best-of-N (BoN) Jailbreaking Strategy

Best-of-N (BoN) is a simple but effective black-box jailbreaking algorithm that works by repeatedly sampling variations of a prompt with modality-specific augmentations until a harmful response is elicited.

Introduced by [Hughes et al. (2024)](https://arxiv.org/abs/2412.03556), it achieves high attack success rates across text, vision, and audio modalities.

:::tip
This technique achieves high attack success rates (89% on GPT-4o and 78% on Claude 3.5 Sonnet), but it generally needs a very large number of samples to get there: those figures were measured with 10,000 samples.
:::

Add it to your `promptfooconfig.yaml`:

```yaml title="promptfooconfig.yaml"
strategies:
  - id: best-of-n
    config:
      useBasicRefusal: false
      maxConcurrency: 3 # Maximum concurrent API calls (default)
      nSteps: 10000 # Maximum number of attempts (optional)
      maxCandidatesPerStep: 1 # Maximum candidates per batch (optional)
```

## How It Works

![BoN Overview](/img/docs/best-of-n-cycle.svg)

BoN Jailbreaking works through a simple three-step process:

1. **Generate Variations**: Creates multiple versions of the input prompt using modality-specific augmentations:
   - Text: Random capitalization, character scrambling, character noising
   - Vision: Font variations, background colors, text positioning
   - Audio: Speed, pitch, volume, background noise modifications

2. **Concurrent Testing**: Tests multiple variations simultaneously against the target model

3. **Success Detection**: Monitors responses until a harmful output is detected or the maximum number of attempts is reached

The strategy's effectiveness comes from exploiting the stochastic nature of LLM outputs and their sensitivity to small input variations.
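The text augmentations listed above can be sketched in a few lines of Python. This is a minimal illustration, not promptfoo's implementation: the function names and the probabilities (`0.6`, `0.6`, `0.06`) are assumptions chosen for the example, and only the text modality is covered.

```python
import random

def random_capitalize(text, rng, p=0.6):
    """Upper-case each character with probability p, otherwise lower-case it."""
    return "".join(c.upper() if rng.random() < p else c.lower() for c in text)

def scramble_words(text, rng, p=0.6):
    """Shuffle the interior letters of words longer than three characters."""
    out = []
    for word in text.split(" "):
        if len(word) > 3 and rng.random() < p:
            middle = list(word[1:-1])
            rng.shuffle(middle)
            word = word[0] + "".join(middle) + word[-1]
        out.append(word)
    return " ".join(out)

def add_char_noise(text, rng, p=0.06):
    """Shift printable ASCII characters by +/-1 code point with probability p."""
    out = []
    for c in text:
        code = ord(c)
        # Only touch codes 33..125 so the result stays printable ASCII.
        if 32 < code < 126 and rng.random() < p:
            code += rng.choice((-1, 1))
        out.append(chr(code))
    return "".join(out)

def augment(text, seed):
    """Apply scrambling, capitalization, then noising with a seeded RNG."""
    rng = random.Random(seed)
    return add_char_noise(random_capitalize(scramble_words(text, rng), rng), rng)

def generate_variations(prompt, n, base_seed=0):
    """Produce n reproducible variations of the same prompt."""
    return [augment(prompt, base_seed + i) for i in range(n)]
```

Each call to `generate_variations` returns strings of the same length as the original prompt, so a reader can still recognize the request while the surface form differs on every sample.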

## Configuration Parameters

### useBasicRefusal

**Type:** `boolean`  
**Default:** `false`

When enabled, uses a simple refusal check instead of LLM-as-a-judge assertions. This is much faster and cheaper than using an LLM judge.

We recommend enabling this setting whenever the default response to your original prompts is a "Sorry, I can't do that"-style refusal.
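To give a sense of why a refusal check is so much cheaper than an LLM judge, here is a hypothetical prefix-based check. The phrase list and the `looks_like_refusal` name are assumptions made for illustration; promptfoo's actual check may match different patterns.

```python
# Hypothetical refusal openers; a real check would likely use a longer list.
REFUSAL_PREFIXES = (
    "i'm sorry",
    "i am sorry",
    "sorry,",
    "i can't",
    "i cannot",
    "i'm unable to",
    "i am unable to",
)

def looks_like_refusal(response: str) -> bool:
    """Return True if the response opens with a common refusal phrase."""
    # Normalize whitespace, case, and curly apostrophes before matching.
    normalized = response.strip().lower().replace("\u2019", "'")
    return normalized.startswith(REFUSAL_PREFIXES)
```

A check like this is a plain string comparison with no model call, which is why it only makes sense when an unsuccessful attempt reliably produces a refusal. Any response that is not a refusal counts as a success, including off-topic or garbled ones.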

### maxConcurrency

**Type:** `number`  
**Default:** `3`

Maximum number of prompt variations to test simultaneously. Higher values increase throughput. We recommend setting this as high as your rate limits allow.
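One way to picture the concurrency cap is a bounded worker pool. The sketch below is illustrative only: `run_batch` and `send` are hypothetical names rather than promptfoo APIs, and `send` stands in for whatever call reaches your target model.

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(candidates, send, max_concurrency=3):
    """Send every candidate, with at most max_concurrency calls in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(send, candidates))
```

With a cap of 3, a fourth request only starts once one of the first three has finished, which is what keeps the run inside a provider's rate limit.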

### nSteps

**Type:** `number`  
**Default:** `undefined`

Maximum number of total attempts before giving up. Each step generates `maxCandidatesPerStep` variations. Higher values increase the success rate but also the cost. The original paper reported its highest success rates at 10,000 samples.

### maxCandidatesPerStep

**Type:** `number`  
**Default:** `1`

Number of prompt variations to generate in each batch. Lower values provide more fine-grained control, while higher values are more efficient but may waste API calls if a successful variation is found early in the batch.

It is usually best to set this to `1` and increase `nSteps` until you get a successful jailbreak.
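The trade-off between `nSteps` and `maxCandidatesPerStep` can be sketched as a simple loop. This is a simplified model under stated assumptions, not promptfoo's code: it assumes each step dispatches a whole batch before any result is inspected, and `best_of_n`, `make_variation`, and `is_jailbroken` are hypothetical names.

```python
def best_of_n(prompt, make_variation, is_jailbroken, n_steps, max_candidates_per_step=1):
    """Return (winning_candidate_or_None, attempts_spent)."""
    attempts = 0
    for step in range(n_steps):
        # Build the whole batch for this step up front.
        batch = [
            make_variation(prompt, step * max_candidates_per_step + i)
            for i in range(max_candidates_per_step)
        ]
        # The full batch counts as spent, even if an early candidate succeeds.
        attempts += len(batch)
        for candidate in batch:
            if is_jailbroken(candidate):
                return candidate, attempts
    return None, attempts
```

In this model, a success on the fifth variation costs 5 attempts with a batch size of 1 but 8 attempts with a batch size of 4, because the second batch is paid for in full. That is the waste the larger batch size risks.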

:::tip
For initial testing, we recommend starting with `useBasicRefusal: true` and relatively low values for `nSteps` and `maxCandidatesPerStep`. This allows you to quickly validate the strategy's effectiveness for your use case before scaling up to more comprehensive testing.
:::

## Performance

The paper reports the following attack success rates across models and modalities:

- Text: 89% on GPT-4o, 78% on Claude 3.5 Sonnet (10,000 samples)
- Vision: 56% on GPT-4 Vision
- Audio: 72% on GPT-4 Audio

The attack success rate follows a power-law scaling with the number of samples, meaning it reliably improves as more variations are tested. This illustrates why [ASR comparisons must account for attempt budget](/blog/asr-not-portable-metric): a method with a 1% per-attempt success rate reaches about 98% after 392 independent tries.
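The 392-try figure can be checked with a short calculation. The sketch assumes every attempt is independent and has the same success probability `p`, which is an idealization; the function names are made up for this example.

```python
import math

def success_after_n(p, n):
    """Probability of at least one success in n independent attempts."""
    return 1.0 - (1.0 - p) ** n

def attempts_needed(p, target):
    """Smallest n for which success_after_n(p, n) reaches the target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

# A 1% per-attempt rate compounds to roughly 0.98 after 392 attempts.
print(round(success_after_n(0.01, 392), 2))
```

Because the compounded rate depends so strongly on `n`, two reported success rates are only comparable when they were measured with the same attempt budget.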

## Key Features

- **Simple Implementation**: No need for gradients or model internals
- **Multi-modal Support**: Works across text, vision, and audio inputs
- **Highly Parallelizable**: Can test multiple variations concurrently
- **Predictable Scaling**: Success rate follows power-law behavior

## Related Concepts

- [GOAT Strategy](goat.md)
- [Iterative Jailbreaks](iterative.md)
- [Multi-turn Jailbreaks](multi-turn.md)
- [Best of N configuration example](https://github.com/promptfoo/promptfoo/tree/main/examples/redteam-bestOfN-strategy)
- [Red Team Strategies](/docs/red-team/strategies/) - Full strategy catalog