AI Is Making Offensive Security Faster. That's Not the Same as Better
AI pentesting tools are genuinely capable and improving fast. But organizations treating AI in offensive security as a tool evaluation problem — rather than a program design problem — are setting themselves up for incremental gains when transformational ones are possible.
AI Is Making Offensive Security Faster. That's Not the Same as Better.
Most organizations evaluating AI pentesting tools are asking the right question at the wrong level. Can this tool find vulnerabilities faster than a human team? In many cases, yes. But speed of discovery was rarely the binding constraint in an offensive security program. What happens after you find the vulnerability tends to get far less attention — and that's where most of the value is actually lost.
This isn't a critique of AI pentesting tools, which are genuinely capable and improving quickly. It's a more practical observation: organizations that approach AI in offensive security as a tool evaluation problem, rather than a program design problem, are setting themselves up for incremental gains when transformational ones are possible.
There are three places where that gap tends to show up.
The Program Didn't Change, Just the Scanner
When organizations evaluate AI pentesting tools, the benchmark is usually fidelity to the existing process. Does the AI find what our manual testers find? Does it produce a comparable report? Can it operate within our current scope and schedule?
These are reasonable evaluation criteria, but they're also a trap. They assume the existing process is the right one, and that the goal is to replicate it more efficiently. In practice, many penetration testing programs were designed around the constraints of manual effort. Testing happens quarterly or annually because continuous testing wasn't feasible. Scope is narrowed because broader coverage wasn't affordable. Findings are delivered in batches because that's how the workflow was structured.
AI changes those constraints significantly. Continuous, broad-scope testing becomes practical. The economics of coverage shift. But if the program design doesn't change with it, you've bought a faster engine for a process that was never designed to use the speed. The remediation pipeline, the escalation path, the relationship with engineering teams who receive findings, the governance around what gets fixed and when — all of that has to be rethought alongside the tooling, or the result is a more efficient bottleneck rather than a better program.
The organizations getting the most out of AI in offensive security aren't the ones that found the best tool. They're the ones that used the tool evaluation as a forcing function to ask harder questions about the program itself.
For AI Systems, Finding Bugs Isn't Enough
A separate and more urgent issue is how organizations are approaching security for the AI systems they're building and deploying internally. Here, the instinct to reach for a pentest as the primary security activity is understandable but insufficient.
Take prompt injection. If you've built an application that passes user-controlled input into a large language model, and that model has access to tools, internal data, or downstream systems, prompt injection is a structural property of the system — not a discrete vulnerability to find and fix. Testing will almost certainly surface it. But knowing you have it tells you much less than you need to know.
What matters is the context around it. What can the model access? What actions can it take? If an adversary successfully manipulates the model's behavior, what does the blast radius look like? Is it limited to the current session, or does it extend to persistent storage? Does it affect only the user, or can it reach other users' data, internal APIs, or privileged systems? Those answers determine whether prompt injection is a manageable risk or an existential one — and a pentest alone won't give them to you.
This is why threat modeling and secure design review deserve more attention than they're getting in most AI security programs. Not as a replacement for offensive testing, but as the activity that makes offensive testing meaningful. A design review maps trust boundaries, surfaces assumptions about what the system should and shouldn't be able to do, and produces the context needed to interpret findings correctly. Done early enough, it can also prevent the category of vulnerability that no amount of testing after the fact will fully resolve.
The security field spent years building the habit of "test, find, fix." AI systems don't break that habit, but they do require an earlier and more deliberate layer of analysis before it.
The Governance Gap
There's an irony worth naming in how organizations are approaching AI security tooling. Companies that are rightly concerned about adversarial use of AI are deploying AI security tools with very little governance structure around them.
AI pentesting platforms, AI SOC tools, AI-assisted patch management systems — these are consequential systems making semi-autonomous decisions about what constitutes a vulnerability, what warrants an alert, and in some cases what action to take. The trust being placed in these systems is real. The mechanisms to verify they're operating as intended are often thin or absent.
The governance questions here aren't complicated, but they do require deliberate answers. What is this tool authorized to do, and what is explicitly out of scope? What does correct operation look like, and how is it verified over time? Who is accountable when the tool produces a false negative that a team acts on? How do organizations detect when model performance has drifted or when the tool's behavior has changed in a way that affects security outcomes?
For regulated industries in particular, these questions aren't optional. The expectation from auditors and regulators is increasingly that organizations can demonstrate not just that they're using security tools, but that they understand how those tools work, what their limitations are, and how they're being overseen. Governance built after the fact tends to be more expensive and less effective than governance built from the start.
Putting It Together
None of this suggests slowing down on AI in offensive security. The opposite, actually. The gap between what's achievable with a well-designed AI-augmented offensive program and what most organizations are currently doing is significant — and the organizations moving thoughtfully are building real advantages in coverage, speed, and program maturity.
What it does suggest is that the work of adopting AI in offensive security isn't primarily a buying decision. It's a design decision. Getting the most out of these tools means being willing to revisit the program they're being dropped into, investing in the upstream analysis that makes their outputs actionable, and building the governance that lets you trust what they're telling you.
The tools are ready. The question is whether the programs around them are.
Ready to strengthen your security posture?
Let's discuss how Sidekick Security can help protect your organization.
Schedule a Consultation