# provider-model-armor (Google Cloud Model Armor)

This directory contains examples for testing Google Cloud Model Armor with Promptfoo.

You can run this example with:

```bash
npx promptfoo@latest init --example provider-model-armor
cd provider-model-armor
```

Model Armor is a managed service that screens LLM prompts and responses for:

- **Responsible AI (RAI)**: Hate speech, harassment, sexually explicit, dangerous content
- **CSAM**: Child safety content detection (always enabled)
- **Prompt Injection & Jailbreak**: Detects manipulation attempts
- **Malicious URLs**: Phishing and threat detection
- **Sensitive Data Protection (SDP)**: Credit cards, SSNs, API keys, etc.

## Prerequisites

1. **Enable Model Armor API**:

   ```bash
   gcloud services enable modelarmor.googleapis.com --project=YOUR_PROJECT_ID
   ```

2. **Grant IAM Permissions** (for Vertex AI integration):

   ```bash
   PROJECT_NUMBER=$(gcloud projects describe YOUR_PROJECT_ID --format="value(projectNumber)")
   gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
     --member="serviceAccount:service-${PROJECT_NUMBER}@gcp-sa-aiplatform.iam.gserviceaccount.com" \
     --role="roles/modelarmor.user"
   ```

3. **Set the regional API endpoint** (for direct API testing):

   ```bash
   gcloud config set api_endpoint_overrides/modelarmor \
     "https://modelarmor.us-central1.rep.googleapis.com/"
   ```

4. **Create a Model Armor template**:

   ```bash
   gcloud model-armor templates create basic-safety \
     --location=us-central1 \
     --rai-settings-filters='[{"filterType":"HATE_SPEECH","confidenceLevel":"MEDIUM_AND_ABOVE"},{"filterType":"HARASSMENT","confidenceLevel":"MEDIUM_AND_ABOVE"},{"filterType":"DANGEROUS","confidenceLevel":"MEDIUM_AND_ABOVE"},{"filterType":"SEXUALLY_EXPLICIT","confidenceLevel":"MEDIUM_AND_ABOVE"}]' \
     --pi-and-jailbreak-filter-settings-enforcement=enabled \
     --pi-and-jailbreak-filter-settings-confidence-level=medium-and-above \
     --malicious-uri-filter-settings-enforcement=enabled \
     --basic-config-filter-enforcement=enabled
   ```

5. **Set environment variables** (for direct API testing):

   ```bash
   export GOOGLE_PROJECT_ID=your-project-id
   export MODEL_ARMOR_LOCATION=us-central1
   export MODEL_ARMOR_TEMPLATE=basic-safety
   export GCLOUD_ACCESS_TOKEN=$(gcloud auth print-access-token)
   ```

   Note: Access tokens expire after 1 hour. For CI/CD, use service account keys or Workload Identity Federation.

## Examples

### 1. Direct Model Armor API Testing

Test Model Armor's sanitization API directly using the HTTP provider:

```bash
promptfoo eval -c promptfooconfig.yaml
```

This example:

- Calls the `sanitizeUserPrompt` API directly
- Maps filter results to Promptfoo's guardrails format
- Tests both benign and adversarial prompts

### 2. Vertex AI with Model Armor Integration

Test Gemini models with Model Armor templates:

```bash
promptfoo eval -c promptfooconfig.vertex.yaml
```

This example:

- Uses Vertex AI's native Model Armor integration
- Compares models with and without Model Armor enabled
- Uses the `guardrails` and `not-guardrails` assertion types

## Configuration Files

- `promptfooconfig.yaml` - Direct Model Armor API testing (recommended for detailed filter results)
- `promptfooconfig.vertex.yaml` - Vertex AI integration with Model Armor (recommended for production-like testing)
- `transforms/sanitize-response.js` - Response transformer for the sanitization API
- `datasets/model-armor-test.csv` - Test dataset with prompts for each filter type

### Using the Dataset

The included CSV dataset contains test prompts for each Model Armor filter type. Load it in your config:

```yaml
tests: file://datasets/model-armor-test.csv
```

Each row includes a prompt and expected behavior (benign vs. adversarial).

## Understanding Results

When Model Armor blocks content, you'll see:

- `guardrails.flagged: true` - Content was flagged
- `guardrails.flaggedInput: true` - The input prompt was blocked
- `guardrails.flaggedOutput: true` - The generated response was blocked
- `guardrails.reason` - Detailed explanation of which filters matched

For debugging, inspect the raw Model Armor response in `metadata.modelArmor`, which contains the full `sanitizationResult` including individual filter states and confidence levels.

Use `not-guardrails` to verify dangerous prompts get caught - the test passes when content is blocked, fails when it slips through.

## Cleanup

After testing, you can delete the Model Armor template if no longer needed:

```bash
gcloud model-armor templates delete basic-safety --location=us-central1
```

## Learn More

- [Model Armor Overview](https://cloud.google.com/security-command-center/docs/model-armor-overview)
- [Promptfoo Guardrails Documentation](https://www.promptfoo.dev/docs/configuration/expected-outputs/guardrails/)
- [Testing Guardrails Guide](https://www.promptfoo.dev/docs/guides/testing-guardrails/)