--- sidebar_position: 55 description: Calculate semantic similarity scores between actual and expected outputs using advanced embedding models and multiple distance metrics --- # Similarity (embeddings) The `similar` assertion checks if an embedding of the LLM's output is semantically similar to the expected value, using a configurable similarity or distance metric with a threshold. By default, embeddings are computed via OpenAI's `text-embedding-3-large` model. Example: ```yaml assert: - type: similar value: 'The expected output' threshold: 0.8 ``` If you provide an array of values, the test will pass if it is similar to at least one of them: ```yaml assert: - type: similar value: - The expected output - Expected output - file://my_expected_output.txt threshold: 0.8 ``` ## Similarity Metrics You can specify which metric to use by including it in the assertion type. The default is `similar` (cosine similarity). ### Cosine Similarity (default) Measures the cosine of the angle between two vectors. Range: -1 to 1 (higher is more similar), though text embeddings typically produce values between 0 and 1. ```yaml assert: # Default - uses cosine similarity - type: similar value: 'The expected output' threshold: 0.8 # Explicit cosine - type: similar:cosine value: 'The expected output' threshold: 0.8 ``` **When to use:** Best for semantic similarity where you care about the direction of the embedding vector, not its magnitude. This is the industry standard for embeddings. ### Dot Product Computes the dot product of two vectors. Range: unbounded, but typically 0 to 1 for normalized embeddings (higher is more similar). ```yaml assert: - type: similar:dot value: 'The expected output' threshold: 0.8 ``` **When to use:** Useful when you want to match the metric used in your production vector database (many use dot product for performance). For normalized embeddings, dot product is nearly equivalent to cosine similarity. ### Euclidean Distance Computes the straight-line distance between two vectors. Range: 0 to ∞ (lower is more similar). ```yaml assert: - type: similar:euclidean value: 'The expected output' threshold: 0.5 # Maximum distance threshold ``` **When to use:** When you care about both the angle and magnitude differences between vectors. Note that the threshold represents the _maximum_ distance (not minimum similarity), so lower values are stricter. **Important:** For euclidean distance, the threshold semantics are inverted - it represents the _maximum_ acceptable distance rather than minimum similarity. ## Overriding the provider By default `similar` will use OpenAI. To specify the model that creates the embeddings, do one of the following: 1. Use `test.options` or `defaultTest.options` to override the provider across the entire test suite. For example: ```yaml defaultTest: options: provider: embedding: id: azureopenai:embedding:text-embedding-ada-002 config: apiHost: xxx.openai.azure.com tests: assert: - type: similar value: Hello world ``` 2. Set `assertion.provider` on a per-assertion basis. For example: ```yaml tests: assert: - type: similar value: Hello world provider: huggingface:sentence-similarity:sentence-transformers/all-MiniLM-L6-v2 ```