Why AI's current weaknesses help offense first, and what it takes for autonomous defense to win
Attackers can afford to be wrong a lot. Defenders can't. That gap is most of the story when it comes to AI in security.
Hallucinations, indeterminism, and overconfident errors are annoyances for attackers. They retry, edit, and cherry-pick. For defenders, the same traits show up as false alarms, missed incidents, unsafe actions, and broken trust. A model that is "mostly useful" is already valuable to an attacker. The same model is often too unreliable to sit inside a defender's control loop. Security is a worst-case discipline; today's AI is optimized for the average case.
That does not mean AI is a bad fit for defense. It means models favor offense first, and defenders only win when those models are embedded in systems that are grounded, bounded, and reversible.
Variance helps offense
Current AI systems are probabilistic. Ask the same question ten times and one answer may be dramatically better than the rest. For attackers, that is not a bug. It is, in effect, free search.
Indeterminism becomes a mutation engine: ten variants of a phishing lure, ten reframings of a pretext, ten ways to rewrite a loader to slip past brittle controls. Hallucinations are cheap to throw away when you only need one output to land. Best-of-N behavior is an operational advantage because the attacker only needs one output that is good enough for one target, for a few seconds of human judgment, against one weak control. The $25M Arup deepfake fraud in early 2024 is the clean version of this: the attackers did not need a perfect synthetic CFO, only one video call that held together long enough to authorize a wire transfer.
Attackers can put a human in the loop very cheaply: ask for twenty options, throw away nineteen, lightly edit the best one, move on. "The model is often wrong" is not much comfort when being often wrong and occasionally useful is already an effective combination.
This is not a future scenario. In November 2025, Anthropic reported the first documented AI-orchestrated espionage campaign, attributed with high confidence to a Chinese state-sponsored group. The operators jailbroke Claude Code and pointed it at roughly thirty targets across tech, finance, chemicals, and government. The model did 80 to 90 percent of the campaign autonomously, making thousands of requests, often multiple per second. It also, in Anthropic's own words, "occasionally hallucinated credentials or claimed to have extracted secret information that was in fact publicly available." That is the asymmetry in one paragraph: the same hallucinations that would poison a defender's incident queue were a tolerable cost for an attacker running at machine speed.
Defenders pay for variance
Defensive economics are the opposite. A hallucinated detail in an incident summary wastes response time and misdirects investigation. An inconsistent recommendation makes the system harder to test, trust, and audit. A wrong autonomous action can isolate the wrong host, break a production workflow, or create its own outage. Every false positive consumes analyst time; every false negative risks a miss; every inconsistent response erodes trust. Once trust erodes, teams stop relying on the tool even when it is right.
"Defenders need deterministic systems" is directionally true but incomplete. Humans aren't deterministic either. Analysts get tired, miss signals, and disagree on the same alert. Determinism was never really the bar. The real bar is behavior that is bounded, testable, auditable, and safe, whether the decision maker is a model or a person. The closer any decision maker gets to the control plane, the higher that bar becomes.
There is also a tolerance gap worth naming. We forgive human mistakes more readily than automated ones. After a serious pedestrian injury involving one of its vehicles in 2023, Cruise lost its California driverless permit and then voluntarily suspended operations nationwide, in a country where human drivers are involved in tens of thousands of fatalities each year without triggering anything comparable. Security is no different. An analyst who quarantines the wrong host during an incident gets a postmortem. An autonomous agent that does the same tends to get pulled out of production. That is not entirely irrational; automation scales mistakes the way it scales everything else. But it means the quality bar for autonomous defense is higher than the bar we hold ourselves to, and any realistic design has to account for that.
Defenders also inherit a problem attackers do not: once AI is inside the defense stack, it becomes part of the attack surface. Prompt injection, retrieval poisoning, and tool misuse turn a defensive agent into a privileged component an adversary can try to manipulate. The challenge is twofold: absorb the model's uncertainty, and secure the AI system itself.
Models vs. engineered systems
None of this is an argument for keeping AI on the sidelines. At modern enterprise scale, purely manual defense isn't realistic. The volume of alerts, identity changes, endpoint events, and attack paths is simply too large. Defenders need autonomy.
The mistake is not in using AI. It is in treating model output as if it were already a trustworthy defensive system. Attackers get value at the model layer. Defenders get value at the systems layer.
Attackers can use a generic model opportunistically. Defenders have to integrate models into environments with real telemetry, assets, policies, and consequences. That is harder, but it is also where the advantage lives. Defenders control the sensors, the enforcement points, and the context across identity, endpoint, network, cloud, and application. They can compare model suggestions against ground truth, gate them with policy, simulate, stage, roll back, and log every action with evidence. Attackers do not have that closed-loop environment.
The path to autonomous defense is not waiting for perfect models. It is building bounded, verifiable autonomy around imperfect ones: stochastic planning paired with policy-bounded execution. Let the model explore hypotheses, candidate detections, and response plans; variance widens the search space. Execution is different. It needs grounding in local telemetry, fact verification, hard policy gates, blast-radius limits, rollback, and audit trails. The problem is not autonomy. The problem is unchecked autonomy.
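To make the shape of that concrete, here is a minimal Python sketch of what policy-bounded execution could look like. Every name in it (ResponseAction, PolicyGate, execute_bounded, the verify and apply callbacks) is hypothetical; it illustrates the pattern, not any real product or API.

```python
import time
from dataclasses import dataclass, field, asdict


@dataclass
class ResponseAction:
    """A model-proposed action, e.g. isolating a host. (Hypothetical sketch.)"""
    kind: str            # e.g. "isolate_host", "disable_account"
    target: str          # the asset or identity the action touches
    evidence: list[str]  # telemetry references the model cites for its claim
    rollback: str        # how to undo the action, e.g. "rejoin_network"


@dataclass
class PolicyGate:
    allowed_kinds: set[str]          # the bounded action set the agent may ever use
    max_actions_per_hour: int = 5    # crude blast-radius limit
    require_evidence: bool = True
    _recent: list[float] = field(default_factory=list)

    def permits(self, action: ResponseAction, verified: bool) -> bool:
        now = time.time()
        self._recent = [t for t in self._recent if now - t < 3600]
        if action.kind not in self.allowed_kinds:
            return False  # action type is outside the agent's bounds
        if self.require_evidence and not verified:
            return False  # the model's claim is not grounded in local telemetry
        if len(self._recent) >= self.max_actions_per_hour:
            return False  # blast radius exceeded; escalate to a human instead
        self._recent.append(now)
        return True


def execute_bounded(action, gate, verify, apply, audit_log):
    """Verify the model's claim, check policy, act, and leave an auditable trail."""
    verified = verify(action)   # e.g. does the cited indicator actually appear in telemetry?
    allowed = gate.permits(action, verified)
    audit_log.append({
        "ts": time.time(),
        "action": asdict(action),
        "verified": verified,
        "executed": allowed,
    })
    if allowed:
        apply(action)           # staged execution; rollback info is already on record
    return allowed
```

The point of the sketch is where the decisions live: the model only ever proposes, while the action set, evidence check, blast-radius limit, and audit trail are enforced by ordinary deterministic code that can be tested, reviewed, and rolled back.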
Google's Project Zero showed what this looks like in practice with Big Sleep, where an LLM agent found a real exploitable vulnerability in SQLite. The model did the creative work of reasoning about code paths, but it operated inside a scaffolded system with fuzzers, sanitizers, and human review gating the claims. The interesting part is not that a model found a bug. It is that the surrounding system made the find trustworthy. That is the template defenders need.
In the short term, AI helps attackers faster because offense tolerates noise. In the long term, defenders own the telemetry, policy context, and enforcement points needed to turn noisy intelligence into reliable action. The contest is not who has the smarter model. It is who builds the better system around it. If we get that right, AI may help attackers first, but it does not have to help them most.
Disclaimer: These are my personal thoughts and do not reflect the views of my current employer or any previous employers.