---
displayed_sidebar: promptfoo
sidebar_label: HuggingFace Datasets
title: Loading Test Cases from HuggingFace Datasets
description: Load HuggingFace datasets directly for LLM evaluation with automatic splits, filtering, and format conversion capabilities
keywords:
  [
    huggingface datasets,
    test cases,
    dataset integration,
    promptfoo datasets,
    ml evaluation,
    dataset import,
    existing datasets,
  ]
pagination_prev: configuration/datasets
pagination_next: configuration/scenarios
---

# HuggingFace Datasets

Promptfoo can import test cases directly from [HuggingFace datasets](https://huggingface.co/docs/datasets) using the `huggingface://datasets/` prefix.

## Basic usage

To load an entire dataset:

```yaml
tests: huggingface://datasets/fka/awesome-chatgpt-prompts
```

Run the evaluation:

```bash
npx promptfoo eval
```

Each dataset row becomes a test case with all dataset fields available as variables.

## Dataset splits

Load specific portions of datasets using query parameters:

```yaml
# Load from training split
tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train

# Load from validation split with custom configuration
tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=validation&config=custom
```

## Use dataset fields in prompts

Dataset fields automatically become prompt variables. Reference a field by name with `{{field}}` template syntax, for example `{{question}}`:

```yaml title="promptfooconfig.yaml"
prompts:
  - "Question: {{question}}\nAnswer:"

tests: huggingface://datasets/rajpurkar/squad?split=validation
```
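
To make the substitution concrete, here is a minimal Python sketch of how one dataset row might fill a prompt template. The `render` helper is a simplified, hypothetical stand-in for promptfoo's template engine, and the row is made up for illustration rather than taken from the dataset.

```python
import re


def render(template: str, row: dict) -> str:
    """Replace each {{name}} placeholder with the matching field from a row."""

    def substitute(match: re.Match) -> str:
        # A field missing from the row raises KeyError in this sketch.
        return str(row[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Made-up row for illustration; real rows come from the dataset.
row = {"id": "example-1", "question": "What is the capital of France?"}

rendered = render("Question: {{question}}\nAnswer:", row)
print(rendered)
```

Fields that the prompt does not reference, such as `id` here, are simply ignored by this sketch.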

## Query parameters

| Parameter | Description                                   | Default   |
| --------- | --------------------------------------------- | --------- |
| `split`   | Dataset split to load (train/test/validation) | `test`    |
| `config`  | Dataset configuration name                    | `default` |
| `subset`  | Dataset subset (for multi-subset datasets)    | None      |
| `limit`   | Maximum number of test cases to load          | No limit  |

The loader accepts any parameter supported by the [HuggingFace Datasets API](https://huggingface.co/docs/datasets-server/api_reference#get-apirows). Additional parameters beyond these common ones are passed directly to the API.

To limit the number of test cases:

```yaml
tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train&limit=50
```

To load a specific subset (common with MMLU datasets):

```yaml
tests: huggingface://datasets/cais/mmlu?split=test&subset=college_physics&limit=10
```
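
As a rough illustration of how such a URI breaks down, the Python sketch below separates the dataset path from the query parameters and applies the documented defaults. The `parse_dataset_uri` helper is hypothetical and only mirrors the table above; promptfoo's actual parsing may differ.

```python
from urllib.parse import parse_qsl

PREFIX = "huggingface://datasets/"
# Defaults taken from the parameter table; treated here as assumptions.
DEFAULTS = {"split": "test", "config": "default"}


def parse_dataset_uri(uri: str) -> dict:
    """Split a huggingface://datasets/ URI into a dataset path and parameters."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"expected a URI starting with {PREFIX!r}")
    path, _, query = uri[len(PREFIX):].partition("?")
    # Explicit query parameters override the defaults.
    params = {**DEFAULTS, **dict(parse_qsl(query))}
    if "limit" in params:
        params["limit"] = int(params["limit"])
    return {"dataset": path, **params}


parsed = parse_dataset_uri(
    "huggingface://datasets/fka/awesome-chatgpt-prompts?split=train&limit=50"
)
print(parsed)
```

Any parameter not listed in the table stays in the result as a plain string, which matches the idea of passing unknown parameters through unchanged.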

## Authentication

For private datasets or increased rate limits, authenticate using your HuggingFace token. Set one of these environment variables:

```bash
# Any of these environment variables will work:
export HF_TOKEN=your_token_here
export HF_API_TOKEN=your_token_here
export HUGGING_FACE_HUB_TOKEN=your_token_here
```

:::info
Authentication is required for private and gated datasets. For public datasets, authentication is optional but provides higher rate limits.
:::
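
The Python sketch below shows one way a loader could pick up a token from these variables and turn it into a request header. The variable names come from the list above, but the lookup order and the helper names `resolve_hf_token` and `auth_headers` are assumptions for illustration, not promptfoo's documented behavior.

```python
import os

# Names from the list above; checking them in this order is an assumption.
TOKEN_ENV_VARS = ("HF_TOKEN", "HF_API_TOKEN", "HUGGING_FACE_HUB_TOKEN")


def resolve_hf_token(env=None):
    """Return the first non-empty token found, or None if none is set."""
    env = os.environ if env is None else env
    for name in TOKEN_ENV_VARS:
        value = env.get(name)
        if value:
            return value
    return None


def auth_headers(env=None):
    """Build request headers; an empty dict means an unauthenticated request."""
    token = resolve_hf_token(env)
    return {"Authorization": f"Bearer {token}"} if token else {}


print(auth_headers({"HF_TOKEN": "your_token_here"}))
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without changing your shell environment.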

## Implementation details

- Each dataset row becomes a test case
- All dataset fields are available as prompt variables
- Large datasets are automatically paginated (100 rows per request)
- Variable expansion is disabled, so array-valued fields are passed through unchanged instead of being expanded into multiple test cases
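
The pagination and row-to-test-case mapping described in this list can be sketched in Python as follows. This is an illustration under stated assumptions, not promptfoo's implementation: `fetch_page` is a hypothetical callable standing in for an HTTP request to the rows API, and the in-memory `fake_rows` list replaces a real dataset.

```python
from typing import Callable, Optional

PAGE_SIZE = 100  # rows per request, as stated in the list above


def load_test_cases(
    fetch_page: Callable[[int, int], list], limit: Optional[int] = None
) -> list:
    """Collect rows page by page and wrap each one as a test case.

    fetch_page(offset, length) is a hypothetical callable returning row dicts.
    """
    tests = []
    offset = 0
    while limit is None or len(tests) < limit:
        # Never request more rows than the limit still allows.
        length = PAGE_SIZE if limit is None else min(PAGE_SIZE, limit - len(tests))
        rows = fetch_page(offset, length)
        if not rows:
            break  # the dataset is exhausted
        # Every field of the row becomes a prompt variable.
        tests.extend({"vars": dict(row)} for row in rows)
        offset += len(rows)
    return tests


# Stand-in for a remote dataset: 250 in-memory rows.
fake_rows = [{"act": f"role {i}", "prompt": f"prompt {i}"} for i in range(250)]


def fake_fetch(offset: int, length: int) -> list:
    return fake_rows[offset:offset + length]


print(len(load_test_cases(fake_fetch)))
print(len(load_test_cases(fake_fetch, limit=120)))
```

Injecting `fetch_page` keeps the paging logic testable without network access; a real loader would replace `fake_fetch` with a function that calls the API.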

## Example configurations

### Basic chatbot evaluation

```yaml title="promptfooconfig.yaml"
description: Testing with HuggingFace dataset

prompts:
  - 'Act as {{act}}. {{prompt}}'

providers:
  - openai:gpt-5-mini

tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train
```

### Question answering with limits

```yaml title="promptfooconfig.yaml"
description: SQuAD evaluation with authentication

prompts:
  - "Question: {{question}}\nContext: {{context}}\nAnswer:"

providers:
  - openai:gpt-5-mini

tests: huggingface://datasets/rajpurkar/squad?split=validation&limit=100

env:
  HF_TOKEN: your_token_here
```

## Example projects

| Example                                                                                                           | Use Case          | Key Features         |
| ----------------------------------------------------------------------------------------------------------------- | ----------------- | -------------------- |
| [Basic Setup](https://github.com/promptfoo/promptfoo/tree/main/examples/huggingface/dataset)                      | Simple evaluation | Default parameters   |
| [MMLU-Pro Comparison](https://github.com/promptfoo/promptfoo/tree/main/examples/compare-gpt-model-tiers-mmlu-pro) | Query parameters  | Split, config, limit |
| [Red Team Safety](https://github.com/promptfoo/promptfoo/tree/main/examples/redteam-beavertails)                  | Safety testing    | BeaverTails dataset  |

## Troubleshooting

### Authentication errors

Ensure your HuggingFace token is set correctly: `export HF_TOKEN=your_token`

### Dataset not found

Verify the dataset path format: `owner/repo` (e.g., `rajpurkar/squad`)

### Empty results

Check that the specified split exists for the dataset. Try `split=train` if `split=test` returns no results.

### Performance issues

Add the `limit` parameter to reduce the number of rows loaded: `&limit=100`

## See also

- [Test Case Configuration](/docs/configuration/test-cases) - Complete guide to configuring test cases
- [HuggingFace Provider](/docs/providers/huggingface) - Using HuggingFace models for inference
- [CSV Test Cases](/docs/configuration/test-cases#csv-format) - Loading test cases from CSV files
- [Red Team Configuration](/docs/red-team/configuration) - Using datasets in red team evaluations