Pending Release

The following papers are in preparation. Titles and abstracts are listed here as they are drafted; papers publish individually as they clear review.

Subscribe via RSS to be notified when new papers publish.


In Preparation

Compute as Cover
Renting a high-end AI chip costs around $1.40 an hour, cheap enough to run a powerful, uncensored model privately and use it to write original malware on demand. Cloud providers are structurally unable to detect this activity. This paper examines where defenders can find a foothold, and what policy can realistically do about a threat that operates inside encrypted, ephemeral sessions.

Operational Failure Modes in LLM-Based Security Agents
A grounded taxonomy of 16 operational failure classes observed across five weeks of high-cadence eval-driven development: shell variable loss, PTY crashes, wrong-host confusion, training data contamination, premature objective achievement, and eleven others. The taxonomy draws on 130+ diagnosed issues, and four cross-cutting meta-patterns account for the majority of failures.

The Direction Gap
AI capability is treated as the variable that determines AI performance. It isn't. Human direction skill is a larger variable, and almost nobody analyzes it. This paper catalogs what that skill actually consists of: context externalization, trust calibration, specification precision, verification independence, and failure pattern encoding.

Range Lock-In
AI security agents trained on cyberrange targets learn to solve the box, not the vulnerability class. This paper names the failure mode, explains why it emerges from the hint-writing process, and documents the two-layer design principle that addresses it at the hint level.

What Aggregate Pass Rate Hides
A single pass rate tells you whether your LLM agent is failing. Three signals tell you why: Objective Achieved (OA), False Positive (FP), and Halt Discipline (HD). Case studies from 218 real eval runs demonstrate that infrastructure drift and harness miscalibration routinely dominate aggregate accuracy signals.
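
To make the decomposition concrete, here is a minimal sketch of how two batches with the same aggregate pass rate can differ on the three signals. The abstract names the signals but does not define them, so the field names, the `passed` rule (objective achieved, no false positive, clean halt), and the sample data below are all assumptions for illustration, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    # Hypothetical per-run flags; the paper's formal definitions may differ.
    objective_achieved: bool  # OA
    false_positive: bool      # FP
    halted_cleanly: bool      # HD

    @property
    def passed(self) -> bool:
        # Assumed pass rule: all three signals must be favorable.
        return (
            self.objective_achieved
            and not self.false_positive
            and self.halted_cleanly
        )


def decompose(runs: list) -> dict:
    """Report the aggregate pass rate next to each per-signal rate."""
    n = len(runs)
    if n == 0:
        raise ValueError("no runs to summarize")
    return {
        "pass_rate": sum(r.passed for r in runs) / n,
        "oa_rate": sum(r.objective_achieved for r in runs) / n,
        "fp_rate": sum(r.false_positive for r in runs) / n,
        "hd_rate": sum(r.halted_cleanly for r in runs) / n,
    }


# Two made-up batches of four runs each.
# Batch A fails because the objective is missed half the time.
batch_a = [EvalRun(True, False, True), EvalRun(False, False, True)] * 2
# Batch B always reaches the objective but reports false positives.
batch_b = [EvalRun(True, False, True), EvalRun(True, True, True)] * 2

summary_a = decompose(batch_a)
summary_b = decompose(batch_b)
print(summary_a)
print(summary_b)
```

Both batches report the same pass rate, yet one points at missed objectives and the other at false positives, which is the kind of distinction a single number cannot carry.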

Beyond Pass Rate: Benchmark Paper
The academic companion to What Aggregate Pass Rate Hides. A longitudinal benchmark and formal three-axis decomposition for LLM security agents, with a 218-run dataset, formalized definitions, and a Tier 2 LLM-as-judge quality complement.

The Human Parallel
The failure modes of AI security tools are not new problems. They are new instances of problems that cognitive psychology has been measuring in humans for decades. Five frameworks (confabulation, Plato's Cave, Kahneman's System 1/2, Gettier's epistemology, and Polanyi's tacit knowledge) are each grounded in ARCHER empirical data.

The Inverted Apprenticeship
Traditional software development coupled the velocity of building to the velocity of understanding. AI-assisted development breaks that coupling. This paper argues the resulting gap is not a deficit but a deferral, and that with deliberate post-hoc dissection, understanding can catch up to and exceed what manual construction would have produced.

Silent Competence
The most dangerous failure mode in a role-constrained AI system is not the agent that acts outside its lane. It is the agent that correctly identifies the solution, lacks authorization to implement it, and says nothing, routing around the blocked path and presenting alternatives as if the correct answer were not available.

Investigative Provenance
NIS2 and DORA do not regulate AI tools by name; they regulate the outputs those tools produce. This paper argues that investigative provenance is a compliance requirement, not a design preference, and that most current AI security tools cannot satisfy it.

Training Data Integrity
Fine-tuning an AI security agent on its own operational sessions sounds efficient. In practice, it is a mechanism for encoding every failure mode, false positive, and bad habit into the model's weights permanently. This paper documents seventeen distinct bug classes observed during ARCHER's training pipeline development.

The Learning Loop
The Centaur Framework specifies how three layers divide work in a single session. This paper addresses how the system gets better between sessions, formalizing the knowledge each layer generates, how it flows across layers, and the conditions under which those flows compound into a genuine improvement spiral.

Sufficiency vs. Optimality
Most AI security tools are designed to find any correct solution. This is the sufficiency model. It is not sufficient for production use, where multiple correct solutions differ significantly in stealth, operational safety, transferability, and alignment with engagement constraints.

The Measurement Instrument Problem
In eval-driven AI development, the evaluation harness is simultaneously the measurement instrument and a candidate for refactoring. You cannot change it while it is measuring without risking corruption of the longitudinal data it produces.
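
One common way to guard longitudinal data against harness changes is to stamp every result with a fingerprint of the harness configuration and compare only runs that share it. The sketch below illustrates that general idea; it is not taken from the paper, and the config keys, result fields, and function names are hypothetical.

```python
import hashlib
import json


def harness_fingerprint(config: dict) -> str:
    """Derive a short, stable identifier from a harness configuration."""
    # Sorting keys makes the fingerprint independent of dict ordering.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def group_by_harness(results: list) -> dict:
    """Bucket results so trends are only read within one harness version."""
    groups = {}
    for result in results:
        groups.setdefault(result["harness"], []).append(result)
    return groups


# Hypothetical harness settings before and after a refactor.
config_v1 = {"timeout_s": 300, "scoring": "strict"}
config_v2 = {"timeout_s": 600, "scoring": "strict"}

results = [
    {"run": 1, "passed": True, "harness": harness_fingerprint(config_v1)},
    {"run": 2, "passed": False, "harness": harness_fingerprint(config_v1)},
    {"run": 3, "passed": True, "harness": harness_fingerprint(config_v2)},
]

groups = group_by_harness(results)
for fingerprint, bucket in groups.items():
    print(fingerprint, [r["run"] for r in bucket])
```

Under this scheme, a change to the harness starts a new series rather than silently shifting an existing one.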

Building AI You Can Trust
A practitioner's methodology for evaluating AI security tools: the three-layer architecture, the OA/FP/HD decomposition, and the multi-tier quality pipeline that keeps bad sessions out of the training corpus.

When the Target Fights Back: Adversarial Robustness
AI security agents operate in adversarial environments by design. This paper identifies exploitation vectors specific to AI security agent architectures and finds that the same three-layer design that addresses model reliability also provides the primary defenses against adversarial input.

Human Oversight & Causal Reasoning
Effective human oversight of AI security agents requires more than reviewing outputs: it requires understanding why the agent succeeded or failed. This paper examines the causal reasoning skills that make oversight meaningful.

Context as Infrastructure
Context management in AI agents is treated as a prompt-writing concern. It is an engineering concern. How context is constructed, preserved, and discarded across turns determines what the agent can know and therefore what it can do.

Smarter Than One: Model Tiering, Domain Specialization, and the Future of Multi-Model Security Agents
A single model that can reason about Active Directory lateral movement is expensive overkill for running nmap -sn. This paper documents the execute-and-report / interpret-and-reason split, why domain fine-tuned variants outperform generalists at the same parameter count, and what architectures become tractable once the constraint shifts from "which model?" to "how do I compose the right model for each moment?"
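
The execute-and-report / interpret-and-reason split can be pictured as a small routing step in front of the models. The following is a minimal sketch under stated assumptions: the tier labels, the `needs_interpretation` flag, and the `route` function are hypothetical names chosen for illustration, and a real system would presumably decide the flag from richer signals than a hand-set boolean.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical tier labels; the paper does not name specific models.
    EXECUTE_AND_REPORT = "small"
    INTERPRET_AND_REASON = "large"


@dataclass(frozen=True)
class Task:
    description: str
    needs_interpretation: bool  # assumed to be set by an upstream planner


def route(task: Task) -> Tier:
    """Send mechanical tasks to the cheap tier, reasoning to the large one."""
    if task.needs_interpretation:
        return Tier.INTERPRET_AND_REASON
    return Tier.EXECUTE_AND_REPORT


sweep = Task("run nmap -sn against the scoped subnet", needs_interpretation=False)
plan = Task(
    "plan Active Directory lateral movement from scan results",
    needs_interpretation=True,
)

for task in (sweep, plan):
    print(f"{route(task).name}: {task.description}")
```

The point of the sketch is only the shape of the decision: the host sweep never reaches the expensive model, while the lateral-movement planning does.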


Papers are derived from operational experience building ARCHER.