--- title: 'Building a Security Scanner for LLM Apps' description: 'We built a GitHub Action that scans pull requests for LLM-specific vulnerabilities. Learn why traditional security tools miss these issues and how we trace data flows to find prompt injection risks.' image: /img/blog/building-a-security-scanner-for-llm-apps/call-graph-io-flows.png featured: true keywords: [ LLM security, code scanning, prompt injection, AI security, GitHub Action, security scanner, agentic security, lethal trifecta, code review, ] date: 2025-12-16 authors: [dane] tags: [company-update, code-scanning, ai-security, prompt-injection] --- import styles from './building-a-security-scanner/styles/styles.module.css'; # Building a Security Scanner for LLM Apps We're adding something new to Promptfoo's suite of AI security products: code scanning for LLM-related vulnerabilities. In this post, I will: - Briefly introduce the new product - Explain why we think engineering teams need a scanner focused exclusively on interactions with LLMs and agents - Demonstrate the scanner in action on a few real-world CVEs ([click here to skip the background and jump straight to real examples](/blog/building-a-security-scanner-for-llm-apps#testing-on-real-cves)) While we see this as eventually evolving into at least a few distinct tools based on the same underlying scanning engine, the first incarnation is a [GitHub Action](https://www.promptfoo.dev/code-scanning/github-action/) that automatically reviews every new pull request for security issues related to LLMs and agents. To do this, it uses its own security-focused AI agents to examine the PR, tracing into the larger repo as needed to understand how new code fits in. If it finds any problems, it will comment on the specific lines of code that are concerning, explain the issue, and suggest a fix. It will also supply a prompt an engineer can send straight to an AI coding agent. Code Scan Action results on PR If it doesn't find anything, you'll get an emotionally satisfying `👍 All Clear`, plus a quick summary of what the scanner looked at. All clear comment ## Focus matters We've been using this scanner in our own repos for several weeks now, and it's already flagged some issues that might have slipped through otherwise. We use other automatic code review tools, and we also require that every PR is reviewed by a 100% human engineer. But in a number of the cases that the Promptfoo scanner has found an issue, it was the only reviewer, human or bot, which flagged that particular issue. I think one reason for this is, in a single word: _focus_. Because the scanner has a single job to do, and is designed to find a small set of specific problematic patterns, it's more effective at finding those patterns than either a human or LLM that's doing a more general review. Another reason is that an effective strategy for finding the most common and severe vulnerabilities in LLM apps definitely would _not_ be an effective strategy for a general review tool, even a general security review tool. ## What makes LLM apps different LLM vulnerabilities related to prompt injection

Let's get into why it's valuable to have a security scanner focused just on LLMs. I'd argue that the worst LLM vulnerabilities which are relevant to a _code scanner_ fit into basically three categories: 1. Sensitive information disclosure 2. Jailbreak risk 3. Prompt injection There are a lot of more specific vulnerabilities (check out the [OWASP top 10 for LLM Applications](https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/) for examples), but most of these either: - Fall under one of the three categories above. - Data poisoning, embedding weaknesses, improper output handling, and excessive agency can all be viewed as vectors for prompt injection. - System prompt leakage falls under both sensitive information disclosure and jailbreak risk. - Are out of scope for code scanning. - Supply chain vulnerabilities, model poisoning, and misinformation are problems in the model layer, not the code, though I should note that Promptfoo does also offer [model scanning](https://www.promptfoo.dev/model-security/). - Unbounded consumption is difficult to judge from the code alone, as it depends on implicit assumptions for what kind of token usage is acceptable. Also, model providers have their own `maxTokens` limits and rate limits. Even sensitive information disclosure, which is a legitimate LLM-specific vulnerability class in its own right, is _most_ concerning when it coincides with prompt injection or jailbreak risk. Disclosing sensitive information to an LLM provider, while not ideal, is typically not directly exploitable, and is often an intentional tradeoff that developers make for the sake of building a useful app. So now I'd say that we can whittle the list down to two underlying areas of concern: 1. Jailbreak risk 2. Prompt injection Jailbreak risk is definitely a major concern for LLM apps, but tends to have an interesting quality: it's bimodal in terms of how easy it is to detect. It's either fairly obvious, as in cases where a developer tries to use the system prompt for authorization or access control instead of deterministic checks. Or it's quite difficult, as in cases where many different attack styles need to be tried, or complex conversation state needs to be built up before a jailbreak succeeds. The obvious cases are definitely relevant to code scanning, and Promptfoo's code scanner certainly does look for them as it scans prompts. But exactly because they're obvious, they have less influence on the overall design of the scanner. As long as we are scanning prompts and identifying these kinds of issues, they will be flagged. Obvious jailbreak vectors are important to catch, but they aren't the hard part. For the non-obvious cases, code scanning just isn't the right modality. You need [red teaming](/docs/red-team/) to simulate a wide variety of attacks (often involving many steps and complex state). That leaves prompt injection. The big kahuna. Nearly everything that can go terribly wrong in an LLM app from a security perspective is upstream of it, downstream of it, or somehow connected. It can also be very difficult to detect, and very easy for developers, including experienced, security-conscious developers, to accidentally introduce into an LLM-based system. ## The lethal trifecta (and deadly duo) The lethal trifecta Sadly, developers building apps on top of LLMs are constantly faced with an uncomfortable truth: there is a deep tension between making LLM apps secure and making them compelling and useful as products. To understand why, I'm just going to quote directly from Simon Willison's famous post, [The Lethal Trifecta](https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/). > The lethal trifecta of capabilities is: > > - Access to your private data—one of the most common purposes of tools in the first place! > - Exposure to untrusted content—any mechanism by which text (or images) controlled by a malicious attacker could become available to your LLM > - The ability to externally communicate in a way that could be used to steal your data He then says: > If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker. Now think about the AI products you use regularly. Or go through a list of the last hundred AI startups to raise seed rounds. How many examples can you find that _don't_ include all three of these, in some combination? If an LLM app: - Has access to private data (including user prompts themselves, ala ChatGPT or Claude Code) - Can load random web pages (again, both ChatGPT and Claude Code can do this) It already has the lethal trifecta, because loading web pages covers both "exposure to untrusted content" _and_ "the ability to externally communicate" (arbitrary data can be passed through URL paths or query parameters). We can also expand on "the ability to externally communicate". As Simon pointed out in [another more recent post](https://simonwillison.net/2025/Nov/2/new-prompt-injection-papers/): > The one problem with the lethal trifecta is that it only covers the risk of data exfiltration: there are plenty of other, even nastier risks that arise from prompt injection attacks against LLM-powered agents with access to tools which the lethal trifecta doesn't cover. The point being, that instead of just being capable of "communicating" and sending private data somewhere, agents with the right tools and permissions can make destructive SQL queries, compromise systems, empty [crypto wallets](https://red.anthropic.com/2025/smart-contracts/), and a lot more. The label I'd use is **privileged actions**. And what's more, you don't even need all three of these to have a problem. Just two can be enough to have an issue: a deadly duo, if you will. Exposure to untrusted content + privileged actions is enough to create a vulnerability even without access to private data. And access to private data + external communication or privileged actions _can_ be enough to create a vulnerability if there's any gap in access control. ## Laundering traditional injection vulnerabilities Vulnerability laundering

If you have experience with traditional web app security, you're probably quite familiar with injection vulnerabilities. There are a lot of variations: - [SQL injection](https://xkcd.com/327/) (aka Database Injection, since it doesn't only apply to SQL databases) - Command injection (aka Shell Injection, one of the most dangerous) - Script injection (aka XSS—cross-site scripting) - Path injection (manipulating file paths) - And many others All these follow basically the same pattern. 1. The app receives untrusted input. 2. The app takes some privileged action (there's that term again) which includes the _unsanitized_ input from step 1. In SQL injection, step 2 is an SQL query. In command injection, it's a shell command. In XSS, it's rendering HTML in a browser. Injection vulnerabilities are nasty, and they used to be extremely common. But fortunately, modern database libraries now have built-in protection against SQL injection, and web libraries have built-in protection against XSS. On top of that, awareness about injection vulnerabilities is generally high in the developer community, and addressing them is a fairly simple matter: you just need to sanitize the untrusted input before passing it to a privileged action. User input sent into SQL queries has any SQL keywords escaped. User input sent to a shell command is wrapped in single quotes. `