# provider-transformers-local (Fully Local LLM Evaluation)

This example demonstrates a completely local LLM evaluation setup using Transformers.js - no API keys or external services required.

## Prerequisites

Install the optional Transformers.js dependency:

```bash
npm install @huggingface/transformers
```

## Usage

```bash
npx promptfoo@latest init --example provider-transformers-local
cd provider-transformers-local
npx promptfoo@latest eval
```

## What This Example Shows

- **Local text generation** with `onnx-community/Qwen3-0.6B-ONNX` (latest Qwen3 model with thinking capabilities)
- **Local embeddings** with `Xenova/all-MiniLM-L6-v2` for similarity assertions
- Fully offline evaluation after the initial model download
- No API keys needed

## Models Used

| Model                            | Task            | Size   | Purpose               |
| -------------------------------- | --------------- | ------ | --------------------- |
| `onnx-community/Qwen3-0.6B-ONNX` | Text Generation | ~600MB | Generate responses    |
| `Xenova/all-MiniLM-L6-v2`        | Embeddings      | ~23MB  | Similarity assertions |

## First Run

The first evaluation downloads both models (they are cached for subsequent runs):

```text
Downloading Qwen3-0.6B-ONNX... ~600MB
Downloading all-MiniLM-L6-v2... ~23MB
```

Subsequent runs use the cached models and are much faster.

## Configuration Highlights

```yaml
providers:
  - id: transformers:text-generation:onnx-community/Qwen3-0.6B-ONNX
    config:
      maxNewTokens: 100
      temperature: 0.6
      topP: 0.95
      doSample: true

defaultTest:
  options:
    provider:
      embedding:
        id: transformers:feature-extraction:Xenova/all-MiniLM-L6-v2
```

## Notes

- Runs entirely on CPU by default
- For faster inference, use `device: webgpu` if your system supports it (see the sketch below)
- Use `dtype: q4` for a smaller memory footprint with quantized models
- Run with `-j 1` on systems with limited RAM
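As a rough sketch of how the tuning options from the Notes might fit together, the config below extends the provider from Configuration Highlights with `device` and `dtype` settings and adds a test that exercises the local embedding model via a `similar` assertion. The exact nesting of `device` and `dtype`, and the example test values, are assumptions; check your promptfoo version's provider docs before relying on them.

```yaml
# Hypothetical extension of this example's config.
providers:
  - id: transformers:text-generation:onnx-community/Qwen3-0.6B-ONNX
    config:
      maxNewTokens: 100
      temperature: 0.6
      topP: 0.95
      doSample: true
      device: webgpu # assumption: GPU-accelerated inference where supported
      dtype: q4 # assumption: 4-bit quantized weights for a smaller footprint

# Example test using the local embedding model for similarity scoring.
tests:
  - vars:
      topic: local inference # hypothetical variable for illustration
    assert:
      - type: similar # scored with the feature-extraction provider above
        value: A short explanation of running models locally
        threshold: 0.7
```

On memory-constrained machines, combine this with serial execution: `npx promptfoo@latest eval -j 1`.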