# provider-transformers-local (Fully Local LLM Evaluation)

This example demonstrates a completely local LLM evaluation setup using Transformers.js - no API keys or external services required.

## Prerequisites

Install the optional Transformers.js dependency:

```bash
npm install @huggingface/transformers
```

## Usage

```bash
npx promptfoo@latest init --example provider-transformers-local
cd provider-transformers-local
npx promptfoo@latest eval
```

## What This Example Shows

- **Local text generation** with `onnx-community/Qwen3-0.6B-ONNX` (latest Qwen3 model with thinking capabilities)
- **Local embeddings** with `Xenova/all-MiniLM-L6-v2` for similarity assertions
- Fully offline evaluation after the initial model download
- No API keys needed

## Models Used

| Model                            | Task            | Size   | Purpose               |
| -------------------------------- | --------------- | ------ | --------------------- |
| `onnx-community/Qwen3-0.6B-ONNX` | Text Generation | ~600MB | Generate responses    |
| `Xenova/all-MiniLM-L6-v2`        | Embeddings      | ~23MB  | Similarity assertions |

## First Run

The first evaluation downloads both models (they are cached for subsequent runs):

```text
Downloading Qwen3-0.6B-ONNX... ~600MB
Downloading all-MiniLM-L6-v2... ~23MB
```

Subsequent runs use the cached models and are much faster.

## Configuration Highlights

```yaml
providers:
  - id: transformers:text-generation:onnx-community/Qwen3-0.6B-ONNX
    config:
      maxNewTokens: 100
      temperature: 0.6
      topP: 0.95
      doSample: true

defaultTest:
  options:
    provider:
      embedding:
        id: transformers:feature-extraction:Xenova/all-MiniLM-L6-v2
```

## Notes

- Runs entirely on CPU by default
- For faster inference, use `device: webgpu` if your system supports it (see the sketch below)
- Use `dtype: q4` for a smaller memory footprint with quantized models
- Run with `-j 1` on systems with limited RAM
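As a rough sketch of how the tuning options from the Notes might fit together, the config below extends the provider from Configuration Highlights with `device` and `dtype` settings and adds a test that exercises the local embedding model via a `similar` assertion. The exact nesting of `device` and `dtype`, and the example test values, are assumptions; check your promptfoo version's provider docs before relying on them.

```yaml
# Hypothetical extension of this example's config.
providers:
  - id: transformers:text-generation:onnx-community/Qwen3-0.6B-ONNX
    config:
      maxNewTokens: 100
      temperature: 0.6
      topP: 0.95
      doSample: true
      device: webgpu # assumption: GPU-accelerated inference where supported
      dtype: q4 # assumption: 4-bit quantized weights for a smaller footprint

# Example test using the local embedding model for similarity scoring.
tests:
  - vars:
      topic: local inference # hypothetical variable for illustration
    assert:
      - type: similar # scored with the feature-extraction provider above
        value: A short explanation of running models locally
        threshold: 0.7
```

On memory-constrained machines, combine this with serial execution: `npx promptfoo@latest eval -j 1`.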