How to defend an exploding AI attack surface when the attackers haven't shown up (yet)
The dilemma in AI agent security in 2026 is that organizational attack surface is expanding at a comical pace but, at least as of January, the attackers have not yet shown up, at least not at the level of serious attackers achieving serious objectives (a la ransomware, or nation-state compromise of a network) via AI-native risk vectors like prompt injection. That means we lack the empirical signals to tune and train defenses, and to help leaders prioritize where to pay the price in friction and resources to reduce security risk.
Again, even if we see some petty, script-kiddie-level attacks jailbreaking models, or some examples of attackers executing traditional kill chains with help from AI, real ransomware criminals and nation-states do not yet commonly use prompt injection and model poisoning (attacks on the models themselves) to achieve real goals.
We also lack catastrophic, unambiguous horror stories in which AI coding tools have quietly injected zero-click-exploitable vulnerabilities into production codebases at scale (yet).
The result is a tension between a discourse in which attacks are identified via red teaming and the reality that we have little knowledge of which attacks will be practical for, and actually exploited by, real attackers. Meanwhile, organizational leaders often feel we should not let speculative-sounding security concerns slow down AI adoption.
This presents a kind of AI security risk overhang: an exponentially growing attack surface that is not being attacked yet, leaving security folks in a tough spot within our organizations and with respect to how we prioritize burning down this growing mass of poorly understood risk.
Imagine having to set up strategic defenses for a magic, porous fortress whose surface area keeps expanding, when you have never met or seen the enemy, and the king keeps wondering why the defenses are so costly and whether the enemy is even a very big deal.
Dialable controls are the thing to build as fast as possible now
We should build dialable controls that are lightweight and low-friction now, but that can easily be dialed to be more severe and higher-friction as risks materialize, and as a function of how they materialize.
Here are some examples of what I mean:
Dialable deterministic controls
Imagine a highly configurable sandbox for a coding agent. At low settings, users can skip permissions more freely because the sandbox makes that choice safer by default. At high settings, you get hard boundaries around sensitive operations, explicit restrictions on destructive actions, and non-negotiable ingress and egress constraints. The point is not that one mode is universally correct. The point is that the sandbox supports both without reimplementation, and lets leaders tighten it as real attacks on AI agents materialize.
Dialable alignment controls
It is fashionable to dismiss prompt injection classifiers, input validation, and fine-tuning as trivially bypassable. But given market realities, many applications will ship with only this layer for a while. In the messy middle of real usage, where most abuse is not performed by expert red teams running maximally adaptive attacks, these controls are meaningfully better than nothing.
More importantly, they are inherently dialable. We can tighten thresholds, tune sensitivity, escalate guardrail strictness, and adjust policies as threat intelligence improves, rather than treating “guardrails” as a single, brittle bet.
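To make the "dialable threshold" point concrete, here is a toy sketch (the function name, score source, and threshold values are all hypothetical; real classifiers and policies are more involved) of how one tunable number changes the strictness of a guardrail:

```python
def injection_gate(risk_score: float, block_threshold: float) -> str:
    """Map a prompt-injection classifier score in [0, 1] to an action.

    block_threshold is the dial: lowering it tightens the control
    (more blocks, more false positives); raising it loosens it.
    """
    if risk_score >= block_threshold:
        return "block"
    if risk_score >= block_threshold / 2:
        # Keep telemetry on borderline prompts instead of blocking.
        return "log_and_allow"
    return "allow"


# The same borderline score gets different treatment at different dials:
permissive = injection_gate(0.6, block_threshold=0.9)  # logged, not blocked
strict = injection_gate(0.6, block_threshold=0.5)      # blocked
```

The borderline band also matters: prompts that are logged rather than blocked today become the training data for deciding where the dial should sit tomorrow.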
Dialable detection and response
Detection and response imposes cost on AI automation and AI UX while giving you the telemetry to detect patterns, measure the frequency of risky behavior, and make escalation decisions based on data rather than vibes. It is dialable: you can turn it up by increasing investigation staffing, loosening alert thresholds (accepting lower precision to buy higher recall), and evolving monitoring rules as you learn where AI-native attacks appear in real kill chains.
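The precision/recall dial can be sketched in a few lines (scores and the threshold values are illustrative, not from any real detection pipeline):

```python
def alerts(anomaly_scores: list[float], alert_threshold: float) -> list[float]:
    """Return the events that would page an analyst.

    Dialing detection up means lowering alert_threshold: you accept
    lower precision (more noise to triage) to buy higher recall.
    """
    return [s for s in anomaly_scores if s >= alert_threshold]


scores = [0.2, 0.55, 0.7, 0.95]
quiet = alerts(scores, alert_threshold=0.9)  # fewer, higher-confidence alerts
noisy = alerts(scores, alert_threshold=0.5)  # more alerts, more triage cost
```

The threshold is cheap to move; the expensive part is the staffing to absorb the extra triage volume when you move it down, which is exactly why this dial should be set by observed attacker behavior rather than set once.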
Medium term, we need a risk heatmap grounded in observed attacker behavior across the industry
This is why threat intelligence and threat sharing matter as this area of security practice matures. The absence of attackers on AI risk surfaces is a gift, but we should build connective tissue across the industry that lets us watch attackers ramp up AI-native attacks, so we can dial our controls correctly and as quickly as possible.


