# provider-amazon-sagemaker (Amazon SageMaker AI Provider)

This example demonstrates how to evaluate models deployed on Amazon SageMaker AI endpoints using promptfoo.

You can run this example with:

```bash
npx promptfoo@latest init --example provider-amazon-sagemaker
cd provider-amazon-sagemaker
```

## Purpose

This example shows how to:

- Connect to and evaluate models deployed on Amazon SageMaker AI endpoints
- Configure various model types (OpenAI, Anthropic, Llama, Mistral) running on SageMaker AI
- Compare performance between different SageMaker AI-hosted models
- Use transform functions to format prompts for specific model requirements
- Work with embedding models on SageMaker

## Prerequisites

1. AWS account with SageMaker AI access
2. Deployed SageMaker AI endpoints with your models
3. AWS credentials configured locally
4. Required npm packages:

   ```bash
   npm install -g @aws-sdk/client-sagemaker-runtime
   ```

## Environment Variables

This example requires the following environment variables:

- `AWS_ACCESS_KEY_ID` - Your AWS access key
- `AWS_SECRET_ACCESS_KEY` - Your AWS secret key
- `AWS_REGION` - Optional, can also be specified in the configuration

You can set these in a `.env` file or directly in your environment.
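
Before running an evaluation, it can help to confirm that these variables are actually visible to Node. The following is a minimal sketch and not part of this example: `missingEnvVars` is a hypothetical helper, and it reads `process.env` only, so values in a `.env` file are seen only if something has already loaded them. The AWS SDK can also resolve credentials from other sources (for example a shared credentials file or an IAM role), so a warning here is a hint rather than proof of a misconfiguration.

```javascript
// Hypothetical preflight helper; not part of promptfoo or this example.
const REQUIRED = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"];
const OPTIONAL = ["AWS_REGION"];

// Return the names in `names` that are unset or blank in `env`.
function missingEnvVars(names, env = process.env) {
  return names.filter((name) => !env[name] || env[name].trim() === "");
}

const missingRequired = missingEnvVars(REQUIRED);
const missingOptional = missingEnvVars(OPTIONAL);

if (missingRequired.length > 0) {
  console.warn(`Missing required variables: ${missingRequired.join(", ")}`);
}
if (missingOptional.length > 0) {
  console.warn(`Not set (specify in the configuration instead): ${missingOptional.join(", ")}`);
}
```

Saved under any file name and run with `node`, it prints nothing when all three variables are set.
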
## Example Configurations

This example includes multiple configuration files demonstrating different SageMaker integration patterns:

- **promptfooconfig.openai.yaml**: OpenAI-compatible models on SageMaker
- **promptfooconfig.jumpstart.yaml**: AWS JumpStart foundation models
- **promptfooconfig.llama.yaml**: Llama 3.2 models on SageMaker JumpStart
- **promptfooconfig.mistral.yaml**: Mistral 7B v3 models on SageMaker (Hugging Face)
- **promptfooconfig.llama-vs-mistral.yaml**: Comparison between Llama and Mistral models
- **promptfooconfig.embedding.yaml**: Embedding models on SageMaker
- **promptfooconfig.multimodel.yaml**: Multiple model types on SageMaker
- **promptfooconfig.transform.yaml**: Transform functions for SageMaker endpoints

## Running the Examples

1. Replace the endpoint names in the configuration files with your actual SageMaker endpoints
2. Run the evaluation using promptfoo:

```bash
# Run a specific configuration
promptfoo eval -c promptfooconfig.jumpstart.yaml
```

## Testing Your Setup

This directory includes a test script to validate your SageMaker AI endpoint configuration before running a full evaluation:

```bash
# Basic test for an OpenAI-compatible endpoint
node test-sagemaker-provider.js --endpoint=my-endpoint --model-type=openai

# Test with an embedding endpoint
node test-sagemaker-provider.js --endpoint=my-embedding-endpoint --embedding=true

# Test with transforms
node test-sagemaker-provider.js --endpoint=my-endpoint --model-type=llama --transform=true

# Test with a custom transform file
node test-sagemaker-provider.js --endpoint=my-endpoint --transform=true --transform-file=transform.js
```

## Transform Functions

The SageMaker provider supports transforming prompts before they're sent to the endpoint, which is particularly useful for formatting prompts according to specific model requirements.
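
Because a transform is a function from a prompt to what gets sent to the endpoint, it can be tried out in plain Node before it goes into a configuration. The sketch below is illustrative only: `wrapInstruction`, `toJsonPayload`, and `isValidTransformOutput` are hypothetical names rather than promptfoo APIs, and the `[INST]` template and `max_new_tokens` parameter are assumptions about what a particular model expects.

```javascript
// Hypothetical helpers for trying out transforms locally; not promptfoo APIs.

// String-returning transform: wrap the prompt in an instruction template.
function wrapInstruction(prompt) {
  return `[INST] ${prompt} [/INST]`;
}

// Object-returning transform: build a JSON payload with generation parameters.
function toJsonPayload(prompt, maxTokens = 256) {
  return { inputs: prompt, parameters: { max_new_tokens: maxTokens } };
}

// Accept a non-empty string or a non-null object; reject everything else.
function isValidTransformOutput(value) {
  if (typeof value === "string") return value.length > 0;
  return typeof value === "object" && value !== null;
}

console.log(wrapInstruction("Summarize this text."));
console.log(JSON.stringify(toJsonPayload("Summarize this text.")));
console.log(isValidTransformOutput(undefined)); // false
```

A transform that accidentally returns `undefined` (for example, one with a missing `return`) fails the last check, which is an easy mistake to catch this way.
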
### Inline Transform

```yaml
providers:
  - id: sagemaker:llama:your-endpoint
    config:
      region: us-west-2
      modelType: llama
      # Apply an inline transform
      transform: |
        return `[INST] ${prompt} [/INST]`;
```

### File-Based Transform

This example includes a sample transform file (`transform.js`) that shows how to create reusable transformations:

```yaml
providers:
  - id: sagemaker:jumpstart:your-endpoint
    config:
      region: us-west-2
      modelType: jumpstart
      # Reference an external transform file
      transform: file://transform.js
```

The transform function receives the prompt and a context object containing the provider configuration:

```javascript
module.exports = function (prompt, context) {
  // Access config values
  const maxTokens = context.config?.maxTokens || 256;

  // Return transformed input
  return {
    inputs: prompt,
    parameters: {
      max_new_tokens: maxTokens,
      temperature: 0.7,
    },
  };
};
```

## JumpStart Models

JumpStart models require a specific input/output format. The provider handles this automatically when `modelType: jumpstart` is specified:

```yaml
providers:
  - id: sagemaker:jumpstart:your-jumpstart-endpoint
    config:
      region: us-west-2
      modelType: jumpstart
      maxTokens: 256
      responseFormat:
        path: 'json.generated_text'
```

## Rate Limiting with Delays

To avoid overloading a SageMaker endpoint, you can add a delay between API calls:

```yaml
providers:
  - id: sagemaker:your-endpoint
    config:
      region: us-west-2
      delay: 500 # Add a 500ms delay between API calls
```

## Expected Results

After running the evaluation, you should expect to see:

1. A comparison of responses from your SageMaker endpoints across different prompts
2. Performance metrics for each endpoint and prompt combination
3. Any errors or issues with specific endpoints or configurations

## Troubleshooting

### "Batch inference failed" Errors

If you encounter "Batch inference failed" errors:

1. Add a `delay` parameter (at least 500ms recommended)
2. Verify you're using the correct `modelType` for your endpoint:
   - For Llama models: Use `modelType: jumpstart`
   - For Mistral models: Use `modelType: huggingface`
3. Ensure you've set both `contentType` and `acceptType` to "application/json"
4. Check that your endpoint is active and functioning in the SageMaker console

### Response Format Issues

If you're getting unusual responses or missing output:

1. Make sure you're using the correct JavaScript expression for your model type:
   - For Llama models (JumpStart): Use `responseFormat.path: "json.generated_text"`
   - For Mistral models (Hugging Face): Use `responseFormat.path: "json[0].generated_text"`

### Transform Issues

If transforms aren't working correctly:

1. Check that your transform function returns a valid string or object
2. For file-based transforms, verify the file path is correct and the file is accessible
3. Use the test script with `--transform=true` to debug transform behavior

### Rate Limiting

If you're still experiencing errors even with the correct configuration:

1. Increase the delay between requests (try 1000ms or higher)
2. Run fewer tests in parallel
3. Monitor your endpoint metrics in the SageMaker console
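
The two `responseFormat.path` expressions listed under Response Format Issues differ only in whether the response body is treated as an object or as an array. A rough sketch with hypothetical payloads shows why the wrong expression produces missing output. The payload shapes are assumptions inferred from those paths (real endpoints may return extra fields or a different structure), and `evaluatePath` is an illustrative stand-in, not how the provider evaluates paths internally.

```javascript
// Hypothetical response bodies, shaped to match the paths in this README.
const jumpstartBody = { generated_text: "Hello from Llama" };
const huggingFaceBody = [{ generated_text: "Hello from Mistral" }];

// Illustrative stand-in: evaluate a JavaScript expression with `json` in scope.
// Returns undefined when the expression does not match the payload shape.
function evaluatePath(expression, json) {
  try {
    return new Function("json", `return ${expression};`)(json);
  } catch (err) {
    return undefined;
  }
}

console.log(evaluatePath("json.generated_text", jumpstartBody)); // Hello from Llama
console.log(evaluatePath("json[0].generated_text", huggingFaceBody)); // Hello from Mistral

// Mismatched expression and payload: nothing is extracted.
console.log(evaluatePath("json.generated_text", huggingFaceBody)); // undefined
console.log(evaluatePath("json[0].generated_text", jumpstartBody)); // undefined
```

If an endpoint's output comes back empty, logging the raw response and comparing its shape against these two cases is usually the quickest way to pick the right expression.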