# CTF Agent Evaluations This directory contains evaluation tests for the CTF agents using Google ADK's `AgentEvaluator`. ## Structure - `test_agents.py` - Main evaluation test file - `../agents/eval/` - Directory containing evaluation data files (`.evalset.json`) ## Running Evaluations Run all evaluation tests: ```bash uv run pytest ctf/tests/eval/ ``` Run a specific test: ```bash uv run pytest ctf/tests/eval/test_agents.py::test_sub_agents ``` ## Evaluation Data Format Evaluation data files (`.evalset.json`) should be placed in `ctf/agents/eval/`. Each file contains: - `eval_set_id`: Unique identifier for the evaluation set - `name`: Human-readable name - `eval_cases`: Array of test cases with: - `eval_id`: Unique case identifier - `conversation`: Array of conversation turns - `session_input`: Initial session state See `ctf/agents/eval/eval_level0_to_1_block.evalset.json` for an example. ## Configuration Evaluation criteria are configured in `ctf/agents/eval/adk_eval_config.json`: - `tool_trajectory_avg_score`: Score threshold for tool usage - `response_match_score`: Score threshold for response matching ## Adding New Evaluations 1. Create a new `.evalset.json` file in `ctf/agents/eval/` 2. Add a new test function in `test_agents.py` if needed 3. Run the evaluation to verify agent behavior