The Purple Team Flywheel: Using AI to Hunt AI

Centaur Security Labs — Jay Hawkins

The views expressed in this publication are those of the author and do not reflect the official policy or position of NORAD, USNORTHCOM, USCYBERCOM, the Department of the Army, the Department of War, or the United States Government.

The Empirical Finding

Anthropic just published a report mapping 832 accounts banned for malicious cyber activity over the past year against MITRE ATT&CK. Three findings stand out.

First, attackers are using AI most heavily in post-compromise operations — lateral movement, account discovery, privilege escalation — not in initial access. AI-assisted account discovery rose 8.9% while AI-assisted phishing fell 8.6%. The techniques that used to require deep expertise are being automated away.

Second, the traditional signals for assessing threat actor sophistication no longer work. The least-skilled actors in the dataset used roughly 16 distinct techniques on average; the most skilled used about 20. Technique count tells you almost nothing about risk. What actually differentiates high-risk actors is where in the attack lifecycle they apply AI, and — most importantly — the architecture they build around the model. Higher-risk actors design systems that chain attack stages together with minimal human input. The authors call this "scaffolding."

Third, MITRE ATT&CK doesn't have a technique ID for any of this. Agentic orchestration — an AI that executes commands, makes tactical decisions, and chains multiple attack stages with minimal human oversight — is not represented in the framework that most defenders are using to measure risk.

I've been building toward both sides of this problem. ARCHER — loaded with its penetration testing skill packs — is the offensive scaffolding in this flywheel. Sagittarius is the detection layer. What follows is an explanation of how those two pieces connect, and why I think the flywheel between them is the right architecture for staying ahead of AI-enabled threats.

The Scaffolding, Defined

When Anthropic says "scaffolding," they mean the deterministic code layer that orchestrates the model — the code that parses tool output, routes tasks, chains stages together, enforces what the model is allowed to do next. The model itself is probabilistic; the scaffolding is what makes it operationally predictable.

This distinction matters because it means the threat isn't primarily about the model's raw capability. A sophisticated AI-enabled attacker running the same base model as an unsophisticated attacker will produce dramatically different outcomes — not because of model quality, but because of architecture quality. The scaffolding determines whether the model's output chains into a coherent attack, or whether each step is an isolated, easily-interrupted action.

ARCHER is exactly this architecture, built for the defender side: a deterministic scaffold (task routing, halt detection, output parsing, audit logging) around a probabilistic model (command generation, output interpretation, attack chain reasoning). The three-layer split between the model, code, and human layers is described in the Centaur Framework paper. What the Anthropic data adds is empirical validation that this architecture is the primary differentiator in the threat landscape, not just a design preference.
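To make the three-layer split concrete, here is a minimal sketch of a single scaffold step. It is not ARCHER's code: the `Scaffold` class, the `ALLOWED_TOOLS` and `CHECKPOINT_TOOLS` sets, and the decision strings are all hypothetical names chosen for illustration, and nothing is actually executed.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical allowlist: the scaffold, not the model, decides what may run.
ALLOWED_TOOLS = {"whoami", "hostname", "net"}
# Hypothetical set of tools that always require a human decision.
CHECKPOINT_TOOLS = {"net"}


@dataclass
class Scaffold:
    propose: Callable[[list], str]   # model layer: probabilistic, proposes a command
    authorize: Callable[[str], bool]  # human layer: approves or denies at checkpoints
    audit_log: list = field(default_factory=list)

    def verify(self, command: str) -> bool:
        """Code layer: deterministic check that the first token is allowlisted."""
        parts = command.split()
        return bool(parts) and parts[0] in ALLOWED_TOOLS

    def step(self, history: list) -> str:
        """Run one propose -> verify -> authorize cycle and record the outcome."""
        command = self.propose(history)
        if not self.verify(command):
            decision = "rejected"    # the code layer blocked it
        elif command.split()[0] in CHECKPOINT_TOOLS and not self.authorize(command):
            decision = "denied"      # the human layer blocked it
        else:
            decision = "approved"    # a real scaffold would execute here
        self.audit_log.append((command, decision))
        return decision
```

The point of the sketch is the ordering: the model only proposes, the deterministic check runs before any human is asked, and every outcome lands in the audit log whether or not the command was allowed.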

There's a corollary worth stating explicitly: the most dangerous attacks in Anthropic's dataset operated with minimal human intervention. The scaffold removed the human from operational decisions. This is not primarily a capability story — it's an authorization story. The attack becomes dangerous when the human authorization layer is removed from the loop. That has a direct implication for how defenders should be thinking about control architecture, which I'll come back to at the end.

ARCHER as the Poison

ARCHER runs post-compromise attack chains against lab targets: Active Directory enumeration, lateral movement, privilege escalation, credential harvesting. It does this with the same three-layer architecture Anthropic describes in high-risk threat actors: a deterministic scaffold orchestrating a probabilistic model, chaining stages together, making tactical decisions, with human authorization at defined checkpoints rather than continuously.

The reason this is useful for defenders is telemetry. Every ARCHER session produces an immutable log: timestamps on every command, elapsed time per stage, the exact command sequences the model emitted, tool selection order, output summaries. That telemetry is a ground-truth record of what an AI-driven post-compromise attack actually looks like at the command-execution layer.
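One way to approximate an "immutable" session log is a hash chain, where each record commits to the one before it. The sketch below assumes that approach; the `SessionLog` class and its field names are hypothetical, not ARCHER's actual log format. Strictly speaking this gives tamper evidence rather than true immutability.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder hash for the first record in a session


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class SessionLog:
    """Append-only log; each record stores the hash of the previous record."""

    def __init__(self):
        self.records = []

    def append(self, stage, command, output_summary, timestamp=None):
        body = {
            "ts": time.time() if timestamp is None else timestamp,
            "stage": stage,
            "command": command,
            "output_summary": output_summary,
            "prev_hash": self.records[-1]["hash"] if self.records else GENESIS,
        }
        self.records.append({**body, "hash": _digest(body)})

    def verify(self):
        """Return True only if no record was altered, removed, or reordered."""
        prev_hash = GENESIS
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if rec["prev_hash"] != prev_hash or rec["hash"] != _digest(body):
                return False
            prev_hash = rec["hash"]
        return True
```

Because every hash covers the previous hash, editing any earlier command invalidates every record after it, which is what lets the log serve as ground truth for later analysis.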

As far as I know, no defensive vendor currently has this data. You can't generate it from incident response, because by the time you're doing forensics the behavioral signals are degraded. You can't generate it from red team reports, because those describe what was done, not the millisecond-level timing of how the model made decisions. You can only generate it by running a real agentic attacker in a controlled lab and capturing everything.

That's what ARCHER exists to do.

Sagittarius as the Medicine

The question Sagittarius is designed to answer is: what does an AI-driven attack look like from the defender's side, and how do you detect the scaffolding rather than just the individual tools?

The detection methodology comes directly from work I've already done on anomaly detection for AI eval systems — specifically the concept of the behavioral envelope. The behavioral envelope for an agentic system is the distribution of (command count, elapsed time, tool sequence, command n-grams) across sessions. For ARCHER running a lateral movement objective, this envelope is tight: roughly the same number of commands, similar timing, predictable tool ordering. Deviations from that envelope are detectable even when individual commands look benign.
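A minimal sketch of the envelope idea, restricted to the two numeric features (command count and elapsed time). The per-feature mean and standard deviation, the z-score, and the feature names `command_count` and `elapsed_s` are my simplifying assumptions for illustration; tool sequences and n-grams need a different representation.

```python
from statistics import mean, stdev


def build_envelope(sessions):
    """Summarize baseline sessions as (mean, stdev) per numeric feature.

    sessions: list of dicts sharing the same keys; needs at least two sessions.
    """
    envelope = {}
    for key in sessions[0]:
        values = [s[key] for s in sessions]
        envelope[key] = (mean(values), stdev(values))
    return envelope


def envelope_deviation(envelope, session):
    """Largest absolute z-score of the session across all envelope features."""
    scores = []
    for key, (mu, sigma) in envelope.items():
        sigma = sigma or 1e-9  # avoid division by zero on constant features
        scores.append(abs(session[key] - mu) / sigma)
    return max(scores)


baseline = [
    {"command_count": 40, "elapsed_s": 300},
    {"command_count": 42, "elapsed_s": 310},
    {"command_count": 38, "elapsed_s": 290},
]
envelope = build_envelope(baseline)
inside = envelope_deviation(envelope, {"command_count": 41, "elapsed_s": 305})
outside = envelope_deviation(envelope, {"command_count": 80, "elapsed_s": 300})
```

The tighter the baseline, the smaller the standard deviations, and the more sharply an off-envelope session stands out. That is the sense in which a "tight" envelope is useful for detection.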

Applied to network defense, this translates into three specific signals:

Command sequence n-gram novelty. An AI agent chains commands differently than a human. The recon-then-execute patterns are more uniform, more consistent across sessions. A human lateral-moving through a network backtracks, reconsiders, uses tools in variable order depending on what they find. An AI agent following a scaffolded plan exhibits characteristic n-gram patterns — tool pairs and triples that appear together with unusual regularity. Sagittarius can build a vocabulary of these patterns from ARCHER's session logs and score live command telemetry against it.
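The vocabulary-and-score step could look roughly like the sketch below. It assumes each session has already been reduced to an ordered list of tool names; the function names and the `min_count` cutoff are hypothetical choices, not a description of how Sagittarius is implemented.

```python
from collections import Counter


def ngrams(tools, n=2):
    """All contiguous n-grams of a tool sequence, as tuples."""
    return [tuple(tools[i:i + n]) for i in range(len(tools) - n + 1)]


def build_vocabulary(sessions, n=2, min_count=2):
    """Keep n-grams that recur across the logged agent sessions."""
    counts = Counter(g for tools in sessions for g in ngrams(tools, n))
    return {g for g, c in counts.items() if c >= min_count}


def vocabulary_overlap(vocab, tools, n=2):
    """Fraction of a live session's n-grams that match the agent vocabulary."""
    grams = ngrams(tools, n)
    if not grams:
        return 0.0
    return sum(g in vocab for g in grams) / len(grams)


# Placeholder tool names standing in for whatever the session logs contain.
agent_sessions = [
    ["enum_users", "enum_groups", "query_hosts"],
    ["enum_users", "enum_groups", "query_hosts"],
]
vocab = build_vocabulary(agent_sessions)
scripted = vocabulary_overlap(vocab, ["enum_users", "enum_groups", "query_hosts"])
wandering = vocabulary_overlap(
    vocab, ["query_hosts", "enum_users", "query_hosts", "enum_groups"]
)
```

A high overlap means the live session is walking the same tool pairs the agent walks. In practice the score would need calibrating against benign administrative activity, which can also be highly repetitive.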

Timing regularity. AI agents executing post-compromise operations have a distinctive timing signature — not the specific latency (which varies by model, hardware, and API configuration), but the regularity of the pattern. A human penetration tester pauses to read output, makes decisions, occasionally goes for coffee. An AI agent's pauses are consistent: model inference takes roughly the same time for similar-complexity outputs, so the burst-pause-burst rhythm is more uniform than human behavior. Detecting this isn't about finding a specific pause length; it's about detecting lower-than-expected variance in inter-command timing across a session.
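One simple way to express "lower-than-expected variance" is the coefficient of variation (standard deviation divided by mean) of the gaps between commands. The choice of that statistic is my assumption for this sketch; any threshold on it would have to be learned from real baselines.

```python
from statistics import mean, pstdev


def inter_command_gaps(timestamps):
    """Seconds between consecutive commands in one session."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def timing_cv(timestamps):
    """Coefficient of variation of inter-command gaps.

    Returns None when there are too few commands to say anything useful.
    """
    gaps = inter_command_gaps(timestamps)
    if len(gaps) < 2:
        return None
    mu = mean(gaps)
    if mu == 0:
        return None
    return pstdev(gaps) / mu


# Synthetic timestamps, in seconds, for illustration only.
agent_like = timing_cv([0.0, 2.0, 4.0, 6.0, 8.0])
human_like = timing_cv([0.0, 1.0, 30.0, 32.0, 200.0])
```

Normalizing by the mean is what makes the measure independent of the specific latency: a slow model and a fast model with equally regular rhythms score the same.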

Behavioral envelope violations. AI agents operating under scaffolded attack plans tend to execute in a more constrained command space than humans. The scaffolding defines what the model is allowed to do; humans exploring a network improvise more. A session that uses exactly the right tools in exactly the right order, completes exactly the expected number of recon steps, and moves to the next phase without the dead ends and retries that characterize human exploration is anomalous in a way that signature-based detection won't surface.
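The "dead ends and retries" idea can be turned into a rough number. The sketch below assumes each logged command carries a success flag, and it counts a command as friction if it failed or repeats an earlier command. Both the `friction_rate` name and that definition are illustrative assumptions, not an established metric.

```python
def friction_rate(events):
    """Fraction of commands that failed or repeated an earlier command.

    events: list of (command, succeeded) tuples in execution order.
    """
    if not events:
        return 0.0
    seen = set()
    friction = 0
    for command, succeeded in events:
        if not succeeded or command in seen:
            friction += 1
        seen.add(command)
    return friction / len(events)


# Placeholder command names for illustration.
clean_run = friction_rate([("a", True), ("b", True), ("c", True)])
messy_run = friction_rate([("a", True), ("b", False), ("b", True), ("a", True)])
```

Under this definition, a session with a friction rate near zero across many commands is the "too clean" case described above. It is the absence of human messiness that is being flagged, not the presence of any particular tool.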

These are anomaly detection signals, not signature signals — they fire when behavior deviates from the baseline of human activity, without requiring prior knowledge of the specific attack tooling being used. That's the critical distinction: the scaffolding has a behavioral fingerprint regardless of what model or tools it's running.

The Flywheel

The value of running both systems against each other is that they force each other to evolve.

ARCHER runs a lateral movement campaign in the lab. Sagittarius analyzes the session logs and detects the attack by flagging the timing regularity and n-gram patterns. That detection feeds back into ARCHER development: add sleep jitter to the scaffold to break timing signatures, and instruct the model to vary command ordering on recon steps. Sagittarius then updates to detect the new behavioral envelope. The cycle repeats.
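One turn of that loop can be simulated without touching a network. The sketch below generates synthetic inter-command gaps with and without jitter and checks them against a timing-regularity rule. The `THRESHOLD` value, the gap model, and the function names are all hypothetical.

```python
import random
from statistics import mean, pstdev

THRESHOLD = 0.1  # hypothetical cutoff: a CV below this counts as "too regular"


def gap_cv(gaps):
    """Coefficient of variation of inter-command gaps."""
    return pstdev(gaps) / mean(gaps)


def simulate_gaps(n, base_s, jitter_s, rng):
    """Synthetic gaps: a fixed base delay plus uniform jitter in [0, jitter_s]."""
    return [base_s + rng.uniform(0, jitter_s) for _ in range(n)]


rng = random.Random(0)

# Round 1: no jitter, so the timing rule fires.
round_1 = simulate_gaps(50, base_s=2.0, jitter_s=0.0, rng=rng)
detected_round_1 = gap_cv(round_1) < THRESHOLD

# Round 2: jitter added in response, so the same rule no longer fires.
round_2 = simulate_gaps(50, base_s=2.0, jitter_s=6.0, rng=rng)
detected_round_2 = gap_cv(round_2) < THRESHOLD
```

Once the second round slips under the timing rule, the detection side has to lean on a signal that jitter does not change, such as command ordering, and the evasion side then targets that one. Each round leaves behind a labeled pair of sessions: one detected, one not.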

This is what makes the purple team approach different from either red or blue in isolation. Red team exercises tell you what an attacker can do. Blue team tooling tells you what it can detect. Running them against each other in a closed loop tells you where the detection gaps are before an attacker finds them — and then closes those gaps through iteration rather than incident response.

The flywheel has a direction: each ARCHER evasion produces a new Sagittarius detection signal. Each Sagittarius detection produces a new ARCHER evasion target. The system doesn't reach a stable equilibrium. It's designed to stay in motion, because the threat landscape doesn't reach a stable equilibrium either.

What the Industry Needs to Build

Anthropic ends their report by noting they're in discussions with MITRE about how ATT&CK might evolve to include agentic orchestration behaviors. That's the right direction, but there's a gap between "agentic orchestration as a concept" and "agentic orchestration as a detectable signal."

The ATT&CK framework works because every technique in it is associated with observables — things you can actually look for in logs. If you add "AI-driven orchestration" as a concept without specifying what that looks like in endpoint telemetry, SIEM rules, or network captures, defenders can't operationalize it.

The three signals described above — n-gram regularity, timing variance, behavioral envelope deviation — are candidates for observable-level specification. They're not the complete answer; this is early-stage work. But they're grounded in the same behavioral data that ARCHER generates in practice, not in speculation about what AI attackers might someday do.

The measurement instrument problem cuts both ways here. Just as Anthropic found that technique count is the wrong thing to measure for AI threat actors, the new framework needs to measure what actually distinguishes AI-scaffolded attacks: orchestration quality, autonomy level, and the behavioral signatures that emerge from deterministic scaffolding around probabilistic models.

On Authorization

I want to close on the authorization angle, because I think it's the most important finding in the Anthropic data and the one most likely to get lost in the conversation about detection methods.

The highest-risk actors in Anthropic's dataset didn't just use better models. They removed humans from the operational loop. The scaffold ran the attack; the human set the objective and reviewed results. That's a capability claim — but it's also an authorization claim. The attack becomes more dangerous as human authorization checkpoints are removed.

This has a direct implication for the defensive side: the correct response to AI-enabled attacks is not just better detection tooling. It is architectures that require human authorization at defined points in high-consequence decisions, regardless of whether an AI could make those decisions autonomously. Not because humans are better at tactical decisions than AI — in many cases they're not — but because authorization without a human in the loop is not authorization. It's just execution.

ARCHER is designed with this boundary explicit: the model generates, the code verifies, the human authorizes. That boundary is not a limitation of current AI capability. It's load-bearing architecture. Anthropic's data gives it empirical grounding — the threat actors who did the most damage are the ones who removed it.

ARCHER and Sagittarius are under active development. The detection methodology described here represents planned implementation, not current production capability. The behavioral signatures described are derived from ARCHER's session telemetry; Sagittarius's detection layer is in early design. Comments and challenges welcome.