# integration-strands-agents (Strands Agents SDK example)

This example demonstrates how to evaluate [Strands Agents SDK](https://github.com/strands-agents/sdk-python) with [promptfoo](https://promptfoo.dev).

[Strands Agents](https://strandsagents.com/) is an open-source AI agent framework developed by [AWS](https://github.com/strands-agents) that provides a model-driven approach to building AI agents.

You can run this example with:

```bash
npx promptfoo@latest init --example integration-strands-agents
cd integration-strands-agents
```

## Overview

This example showcases:

- Creating a [Strands agent](https://strandsagents.com/latest/user-guide/concepts/agents/) with custom tools
- Using the [`@tool` decorator](https://strandsagents.com/latest/user-guide/concepts/tools/python-tools/) to define agent capabilities
- Evaluating agent responses with various [promptfoo assertions](https://promptfoo.dev/docs/configuration/expected-outputs/)
- Testing tool usage with mock weather and temperature conversion tools

## Prerequisites

- Python 3.9+
- [OpenAI API key](https://platform.openai.com/api-keys) (default) or other supported provider

## Setup

### 1. Install Python dependencies

```bash
pip install -r requirements.txt
```

This installs:

- [`strands-agents[openai]`](https://pypi.org/project/strands-agents/) - The Strands Agents SDK with OpenAI support
- [`pydantic`](https://docs.pydantic.dev/) - Data validation library required by Strands

### 2. Set environment variables

```bash
export OPENAI_API_KEY=your-api-key-here
```

### Alternative: use Anthropic or Bedrock

[Strands supports multiple model providers](https://strandsagents.com/latest/user-guide/concepts/model-providers/). To use [Anthropic](https://www.anthropic.com/):

```bash
pip install 'strands-agents[anthropic]'
export ANTHROPIC_API_KEY=your-key
```

Then modify `agent.py` to use [`AnthropicModel`](https://strandsagents.com/latest/user-guide/concepts/model-providers/anthropic/) instead of [`OpenAIModel`](https://strandsagents.com/latest/user-guide/concepts/model-providers/openai/).

To use [Amazon Bedrock](https://strandsagents.com/latest/user-guide/concepts/model-providers/amazon-bedrock/):

```bash
pip install 'strands-agents[bedrock]'
```

## Running the example

```bash
# Run evaluation
npx promptfoo eval

# View results in the web UI
npx promptfoo view
```

## How it works

### Agent structure

The agent is defined in `agent.py` using the [Strands Agent class](https://strandsagents.com/latest/user-guide/concepts/agents/) with two tools:

- `get_weather`: Returns mock weather data for cities (New York, London, Tokyo, Paris, Seattle, San Francisco)
- `convert_temperature`: Converts temperatures between Fahrenheit and Celsius

Tools are defined using the [`@tool` decorator](https://strandsagents.com/latest/user-guide/concepts/tools/python-tools/) which automatically exposes them to the LLM based on their docstrings.

### Provider integration

`agent_provider.py` exposes a `call_api` function that [promptfoo's Python provider](https://promptfoo.dev/docs/providers/python/) calls to interact with the Strands agent.

### Test cases and assertion types

The [promptfoo config](https://promptfoo.dev/docs/configuration/guide/) includes 5 test cases that demonstrate different [assertion types](https://promptfoo.dev/docs/configuration/expected-outputs/):

| Test                                | Description                | Assertion types used                    |
| ----------------------------------- | -------------------------- | --------------------------------------- |
| Weather query for New York          | Basic tool usage           | `contains-any`, `llm-rubric`, `latency` |
| Weather query for London            | Verify temperature format  | `contains-any`, `javascript`, `latency` |
| Weather query for Tokyo             | Case-insensitive matching  | `icontains`, `javascript`, `latency`    |
| Weather with temperature conversion | Multi-tool chaining        | `llm-rubric`, `javascript`, `latency`   |
| Weather for unknown city            | Graceful fallback handling | `icontains`, `not-contains`, `latency`  |

#### Assertion types explained

- **[`latency`](https://promptfoo.dev/docs/configuration/expected-outputs/#latency)** - Ensures responses complete within 30 seconds (applied to all tests via `defaultTest`)
- **[`contains-any`](https://promptfoo.dev/docs/configuration/expected-outputs/#contains)** - Verifies the agent returns expected city names and weather data from the mock tool
- **[`icontains`](https://promptfoo.dev/docs/configuration/expected-outputs/#contains)** - Case-insensitive matching to verify city names appear regardless of formatting
- **[`not-contains`](https://promptfoo.dev/docs/configuration/expected-outputs/#not-contains)** - Ensures the agent handles unknown cities gracefully without error messages
- **[`javascript`](https://promptfoo.dev/docs/configuration/expected-outputs/#javascript)** - Validates temperature format (°F/°C symbols) and response length requirements
- **[`llm-rubric`](https://promptfoo.dev/docs/configuration/expected-outputs/model-graded/)** - Semantically evaluates whether the agent correctly chains weather lookup with temperature conversion