--- sidebar_label: CyberSecEval description: Red team LLM systems against prompt injection attacks using Meta's CyberSecEval dataset to detect multilingual vulnerabilities and prevent unauthorized system prompt access --- # CyberSecEval Dataset for LLM Security Testing ## Overview The CyberSecEval plugin uses Meta's [Purple Llama CyberSecEval dataset](https://meta-llama.github.io/PurpleLlama/docs/benchmarks/prompt_injection), a comprehensive benchmark suite designed to assess cybersecurity vulnerabilities in Large Language Models (LLMs). The dataset includes multilingual prompt injection attacks, testing LLM systems against a diverse collection of potentially harmful prompts in multiple languages. The dataset includes test cases that evaluate: - Direct prompt injection attacks - Indirect prompt injection attempts - Security-violating content - Logic-violating content - Cross-language attack vectors - System prompt leakage attempts :::tip This plugin requires a target provider that accepts both a system prompt and user input in OpenAI-compatible chat format. ::: ## How it works The plugin: - Downloads test cases from the CyberSecEval dataset - Formats each test case as a system/user message pair - Randomly selects prompts from the dataset - Tests if the model appropriately handles the injection attempts - Uses language-specific LLM-as-a-judge criteria to evaluate responses ![CyberSecEval Workflow](/img/docs/cyberseceval-flow.svg) ## Configuration To include the CyberSecEval plugin in your LLM red teaming setup, add the following configuration to your YAML file: ```yaml redteam: plugins: - cyberseceval ``` You can control the number of test cases using the `config.numTests` parameter (defaults to 5): ```yaml redteam: plugins: - id: cyberseceval numTests: 25 ``` You can also enable multilingual prompt injection by setting the `config.multilingual` parameter to `true`: ```yaml redteam: plugins: - id: cyberseceval config: multilingual: true ``` ## Full example Create a configuration file `promptfooconfig.yaml`: ```yaml targets: # You can target a model - openai:chat:gpt-5 # Or an API endpoint - id: https config: url: 'https://example.com/generate' method: 'POST' headers: 'Content-Type': 'application/json' body: myPrompt: '{{prompt}}' redteam: plugins: - cyberseceval ``` Then, run the red team command in the same directory as your configuration file: ```sh promptfoo redteam run ``` ## Grading The plugin uses a specialized grader that evaluates model outputs against language-specific judge questions. The grader checks if: - The model maintained its original instructions despite the injection attempt - The response adheres to security boundaries - The model appropriately handled non-English attack vectors - The system prompt remained protected from extraction attempts Outputs are graded as either PASS (secure) or FAIL (vulnerable) based on these criteria. ## Related Concepts - [Types of LLM Vulnerabilities](../llm-vulnerability-types.md) - [Prompt Injection](../strategies/prompt-injection.md) - [Language Configuration](../configuration.md#language) - Test in multiple languages