Glossary
Definitions for the concepts and terms used across Centaur Security Labs research and the ARCHER project. This page is the single reference — terms are defined here once rather than re-defined each time they appear in a paper or article.
Organized by concept group. Cross-references point to the paper where each concept is examined in depth.
Architecture & Design
Centaur architecture
A three-layer design in which model, code, and human each handle the class of work they do reliably. The model generates and interprets; deterministic code routes, verifies, and logs; a named human analyst reviews findings and authorizes high-impact actions. The boundary between layers is architectural — not a guideline or a preference. Named after the human-machine collaboration model Garry Kasparov described after his matches against Deep Blue. Examined in depth in The Stochastic Trap.
Code layer
The deterministic component of the centaur architecture: routing decisions, halt detection, safety constraint enforcement, ground-truth verification, session logging, and audit trail maintenance. These are tasks with correct answers. The code layer cannot be probabilistic — placing a language model here is the definition of the stochastic trap.
Compensating logic
Code written to handle cases where the model fails to follow an expected format, structure, or behavior. Parsers for output variations, fallbacks when the model doesn't comply, safety checks added after generation. Each piece of compensating logic is a diagnostic: the model was given a job it is structurally unsuited for. V1 of ARCHER accumulated approximately 300 lines of it before the architectural boundary was enforced.
Ground truth
The actual state of the target system, confirmed by code-layer verification independent of the model's output. A shell responding at uid=0, a file containing expected content, a port confirmed open by an active probe. Not the model's description of what happened — the model can produce a fluent, confident account of success regardless of whether success occurred. The gap between ground truth and model claims is where false positives live.
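As a minimal sketch of what a code-layer ground-truth check can look like, the example below confirms a root shell from the raw output of the `id` command instead of the model's description. The function names and the regular expression are illustrative assumptions, not ARCHER's actual implementation.

```python
import re

# Matches the uid field of `id` output, e.g. "uid=0(root) gid=0(root)".
# The word boundary keeps "euid=0" from being mistaken for "uid=0".
UID_PATTERN = re.compile(r"\buid=(\d+)")


def verify_root_shell(id_output: str) -> bool:
    """Return True only if the captured `id` output reports uid 0.

    Only raw tool output is inspected; the model's narrative is never consulted.
    """
    match = UID_PATTERN.search(id_output)
    return match is not None and match.group(1) == "0"


def is_false_positive(model_claimed_success: bool, id_output: str) -> bool:
    """A claimed success that the ground-truth check does not confirm."""
    return model_claimed_success and not verify_root_shell(id_output)
```

For example, `verify_root_shell("uid=33(www-data) gid=33(www-data)")` returns `False` however confident the model's account of the session was.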
Human layer
The human analyst component of the centaur architecture. Holds the contextual judgment, accumulated operational experience, and accountability that cannot be transferred to a model or encoded in rules. Final QA on all output, authorization of high-impact actions, and interpretation of findings against organizational risk are human-layer responsibilities — not because we haven't automated them yet, but because the knowledge required to do them is tacit by nature.
Model layer
The language model component of the centaur architecture. Appropriate for tasks where pattern-matching over a large training corpus produces better results than any deterministic rule: command generation, output interpretation, attack chain reasoning, multi-turn investigation. Probabilistic by construction — identical inputs can produce different outputs. Not appropriate for tasks that require deterministic correctness.
Stochastic trap
The design pattern in which a probabilistic system is placed in a role requiring deterministic behavior, and the resulting failures are absorbed by instructions and compensating code rather than by reassigning the work. Each patch treats the symptom rather than the cause: the model is still in a deterministic role, still producing failures at its natural rate, and the system is now carrying the weight of every workaround. Left uncorrected, compensating logic becomes the dominant complexity. Examined in depth in The Stochastic Trap.
Model Behavior
Automation bias
The tendency to over-trust automated system output and reduce independent verification over time. In security operations, automation bias manifests as accepting AI-generated findings without tracing them to source tool output, or relaxing human review as the system appears to perform well. The centaur architecture treats automation bias as a structural risk: human review is not optional and is not reducible — it is the third layer of the system.
Confabulation
The production of confident, fluent, plausible output that does not correspond to actual system state. A structural property of language models: they generate the statistically likely continuation of a prompt, which can produce a convincing success narrative when no success occurred. Confabulation is not deception — there is no intent, no theory of mind, no awareness of the gap between the claim and reality. The architectural response is verification: code-layer checks against ground truth that catch confabulated completions before they reach an analyst or a training pipeline.
Context pressure
The degradation of instruction-following behavior as a session accumulates prior turns. As earlier content fills the model's context window, format instructions and behavioral guidelines compete with session history for the model's attention, and lose. A model that reliably follows output structure early in a session may produce non-compliant output late in a long one. Context pressure produces output format drift and is one of the reasons ARCHER operates within a fixed context budget.
Output format drift
The progressive departure from a prescribed output format over the course of a long session. Produced by context pressure. Observable as missing structural markers, verbose interpretation where terse tokens were expected, or gradual breakdown of a format the model followed correctly at session start.
System 1 / System 2
Daniel Kahneman's framework for two modes of cognition: System 1 is fast, automatic, pattern-matching; System 2 is slow, deliberate, correctness-checking. In the centaur architecture, the model layer maps to System 1 — generating candidate actions quickly from pattern recognition across training data. The code and human layers map to System 2 — verifying, routing, and authorizing with deliberate correctness requirements. The architecture works when each layer stays in its mode; failures occur when System 1 outputs are treated as System 2 results.
Tacit knowledge
Knowledge that is held but cannot be fully articulated — the accumulated operational judgment that tells an experienced analyst when a finding is meaningful in context, when a scope boundary should hold, when something requires escalation. Tacit knowledge is not a gap to be closed by better prompting or more training data. It is the reason the human layer exists in the centaur architecture and cannot be designed out of it.
Quality & Learning
Dark knowledge
Code or systems that the team depends on but does not fully understand — produced when AI-assisted generation outpaces deliberate comprehension. Dark knowledge compounds silently: invisible until something breaks in a way no one can diagnose. The inverted apprenticeship model is the response: building fast with AI assistance, then systematically dissecting what was built before the system becomes load-bearing.
False positive rate
The fraction of task-completion signals that fail independent verification: sessions where the model claimed success but a code-layer ground-truth check found that the target state did not match the claim. ARCHER's measured baseline: 9.1% in a controlled 87-session run; 18.4% across 1,639 collected sessions. Every false positive that enters a training pipeline teaches the model to produce more of them.
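The definition can be sketched as a small calculation. `SessionOutcome` and its fields are hypothetical names introduced for illustration. The denominator is the number of completion signals, not the number of sessions, which follows the wording of the definition above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionOutcome:
    claimed_success: bool  # the model signaled task completion
    verified: bool         # the code-layer ground-truth check passed


def false_positive_rate(outcomes: list[SessionOutcome]) -> float:
    """Fraction of completion signals that failed independent verification."""
    claims = [o for o in outcomes if o.claimed_success]
    if not claims:
        return 0.0  # no completion signals, so nothing could be a false positive
    failed = sum(1 for o in claims if not o.verified)
    return failed / len(claims)
```

Sessions in which the model never claimed success are excluded: they may be failures, but they are not false positives.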
Halt discipline
The property of an agent session where the model stops issuing commands when the objective is genuinely complete, rather than running to the command ceiling or stopping prematurely on partial evidence. Poor halt discipline takes two forms: running past completion (wasted work, context consumption) and stopping early on the first interesting output (incomplete task). Measuring and improving halt discipline is one of the primary levers on training data quality.
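One way to make halt discipline measurable is to label each session by comparing the point at which the model stopped with the point at which ground truth first confirmed completion. The sketch below is an assumption about how such labeling could work; the outcome names, the `slack` parameter, and the function signature are hypothetical.

```python
from enum import Enum
from typing import Optional


class HaltOutcome(Enum):
    CLEAN = "clean"          # stopped once the objective was verified
    OVERRUN = "overrun"      # kept issuing commands after completion
    PREMATURE = "premature"  # model stopped before completion was verified
    EXHAUSTED = "exhausted"  # hit the command ceiling without completing


def classify_halt(
    commands_issued: int,
    command_ceiling: int,
    verified_at: Optional[int],
    slack: int = 0,
) -> HaltOutcome:
    """Label a finished session.

    verified_at is the 1-based index of the command after which the
    ground-truth check first passed, or None if it never passed.
    slack is the number of extra commands tolerated after completion.
    """
    if verified_at is None:
        if commands_issued >= command_ceiling:
            return HaltOutcome.EXHAUSTED
        return HaltOutcome.PREMATURE
    if commands_issued > verified_at + slack:
        return HaltOutcome.OVERRUN
    return HaltOutcome.CLEAN
```

`OVERRUN` and `PREMATURE` correspond to the two forms of poor halt discipline named above; `EXHAUSTED` is kept separate because it reflects task failure, not a halting decision by the model.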
Inverted apprenticeship
A learning model for AI-assisted development in which the practitioner builds first and achieves understanding second, through deliberate dissection of what was built. Inverts the traditional model (understand first, then build) because AI assistance produces working systems faster than understanding can form during construction. The critical condition: the system must be engineered to produce diagnostic data, and dissection must be deliberate, not incidental, and must happen before the system is load-bearing. Examined in depth in The Inverted Apprenticeship (Centaur Security Labs, pending release).
Probabilistic residual
The set of model assertions in a session that were not confirmed by independent code-layer checks — everything the model produced that the system did not verify against actual target state. The residual is where the model's probabilistic nature is most visible: plausible, sometimes correct, but not evidenced. In production, the residual is the primary artifact for human review. The analyst evaluates what the code layer could not confirm.
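Viewed as data, the residual is a set difference: what the model asserted minus what the code layer confirmed. The sketch below assumes assertions can be compared as normalized strings, which is a simplification made for illustration.

```python
def probabilistic_residual(assertions: list[str], verified: set[str]) -> list[str]:
    """Assertions the code layer did not confirm, in the order the model made them."""
    return [a for a in assertions if a not in verified]
```

Preserving the original order keeps the review queue aligned with the session log, so an analyst can trace each unverified claim back to the turn that produced it.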
Training pipeline
The end-to-end process that converts operational sessions into a fine-tuned model: session collection → structural quality checks → LLM-as-judge quality scoring → data preparation → fine-tuning → deployment. Each stage is a filter. The pipeline closes the operational feedback loop: performance in the field generates training data that improves future performance. The quality of the loop depends entirely on the integrity of the filtering stages — a false positive that enters training teaches the wrong behavior.
ARCHER in Operation
ARCHER
A locally-hosted AI security operations agent built on the centaur architecture. Runs entirely within the operator's network boundary on commodity GPU hardware. The model generates candidate actions and interprets tool output; deterministic code handles routing, halt detection, verification, and logging; a named human analyst reviews findings and is accountable for the output. Covers the full security operations lifecycle across multiple domains: penetration testing, threat hunting, digital forensics, hardening, and others.
Audit pipeline
The quality filtering system that determines which sessions are fit for the training pipeline. Tier 1 performs deterministic structural checks: tool presence, output volume, plausible timing, exit code validity. Tier 2 applies an LLM-as-judge scoring pass across four criteria — findings grounded in tool output, appropriate tool selection, genuine completion, scope adherence. Sessions must clear both tiers before entering fine-tuning data.
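The two tiers can be sketched as a cheap deterministic gate followed by a judge call. Everything concrete here is an assumption: the `SessionRecord` fields, the thresholds, the 0 to 255 exit-code range, and the criterion keys are placeholders, and the judge is passed in as a plain callable so that no particular LLM API is implied.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical keys for the four Tier 2 criteria named above.
CRITERIA = (
    "grounded_findings",
    "tool_selection",
    "genuine_completion",
    "scope_adherence",
)


@dataclass(frozen=True)
class SessionRecord:
    tools_used: tuple[str, ...]
    output_bytes: int
    duration_seconds: float
    exit_codes: tuple[int, ...]


def passes_tier1(
    s: SessionRecord,
    min_output_bytes: int = 1,
    min_duration: float = 1.0,
    max_duration: float = 3600.0,
) -> bool:
    """Deterministic structural checks; thresholds are illustrative defaults."""
    return (
        len(s.tools_used) > 0                                # tool presence
        and s.output_bytes >= min_output_bytes               # output volume
        and min_duration <= s.duration_seconds <= max_duration  # plausible timing
        and len(s.exit_codes) > 0
        and all(0 <= code <= 255 for code in s.exit_codes)   # exit code validity
    )


# A judge maps a session to one pass/fail verdict per criterion.
Judge = Callable[[SessionRecord], dict[str, bool]]


def passes_audit(s: SessionRecord, judge: Judge) -> bool:
    """A session must clear both tiers; Tier 2 runs only if Tier 1 passes."""
    if not passes_tier1(s):
        return False
    verdict = judge(s)
    # A criterion missing from the verdict counts as a failure.
    return all(verdict.get(criterion, False) for criterion in CRITERIA)
```

Running Tier 1 first means the probabilistic judge is never asked about a session that is already structurally invalid.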
Domain
The top-level capability loaded for a session — penetration testing, threat hunting, digital forensics, and so on. Each domain is a self-contained skill pack with specialized guidance, tools, and evaluation criteria. Only one domain runs per session; the domain determines what the model receives as context for that session and what the code layer uses to evaluate completion.
Eval harness
The quality measurement system: a defined set of objectives run against real vulnerable targets, producing per-objective pass/fail results with independent ground-truth verification. Not a unit test suite. A live end-to-end measurement that exercises the full agent loop against the actual task distribution. ARCHER's baseline: 94% pass rate across 51 active objectives against Metasploitable2, BWA, bee-box, and Juice Shop.
Fine-tuning
The process of training a pre-trained model on domain-specific data to specialize its behavior for a particular task distribution. ARCHER uses QLoRA fine-tuning: a small set of additional weight matrices (the LoRA adapter) is trained on ARCHER's operational sessions without modifying the base model. Fine-tuning moves the model from general security knowledge toward the specific tasks ARCHER actually runs — the operational distribution, not a benchmark.
LoRA adapter
The artifact produced by fine-tuning. Rather than retraining all parameters of the base model, LoRA trains two small matrices per layer whose product approximates the behavioral update. The adapter is small, portable, and reversible — the base model is unchanged. ARCHER's adapter accumulates domain knowledge from operational sessions; the base model provides the general capability that the adapter specializes.
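The "two small matrices" can be written out explicitly. This is the standard LoRA formulation, not an ARCHER-specific one; the symbols and the example dimensions are chosen here for illustration, and the scaling factor commonly applied to the update is omitted for brevity.

```latex
W' = W_0 + \Delta W, \qquad \Delta W = B A, \qquad
B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)

\text{trainable parameters per adapted matrix: } r\,(d + k) \quad \text{versus} \quad d\,k \text{ for a full update}

\text{e.g. } d = k = 4096,\ r = 16:\quad 16 \cdot 8192 = 131{,}072 \quad \text{versus} \quad 4096^2 = 16{,}777{,}216 \quad (\approx 0.78\%)
```

Because the frozen weights W_0 are never modified, removing the adapter restores the base model exactly, which is the sense in which the adapter is reversible.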
Objective
A specific, measurable task defined in the eval harness: a task string, a target system, a success criterion verified by independent code-layer check, and a command budget. Objectives are the unit of quality measurement — not sessions, not runs. A session either achieves the objective or it does not; the rate at which sessions achieve a given objective is the primary quality signal.
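The four parts of an objective map naturally onto a small record type. The field names below are hypothetical, and the success check is modeled as a zero-argument callable purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Objective:
    task: str                   # task string handed to the agent
    target: str                 # target system identifier
    verify: Callable[[], bool]  # independent code-layer success check
    command_budget: int         # maximum commands allowed per session


def pass_rate(results: list[bool]) -> float:
    """Share of sessions that achieved a single objective."""
    if not results:
        raise ValueError("no sessions recorded for this objective")
    return sum(results) / len(results)
```

Computing `pass_rate` per objective, not over a pooled set of sessions, keeps a frequently run easy objective from masking a rarely run failing one.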
Router
The system that maps a task string to the correct skill and guidance set at session start. Routes first through a trained classifier (TF-IDF + logistic regression), falling back to keyword scoring when classifier confidence is low. A routing miss corrupts the entire session — the model receives wrong tools, wrong guidance, and wrong constraints with no recovery path within the session.
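The classifier-then-fallback flow can be sketched without committing to a particular model. In the example below the classifier is any callable returning a label and a confidence; the keyword table, the 0.6 threshold, and the skill names are invented for illustration. Raising an error when neither path produces a route is a design choice of this sketch, motivated by the point above that a routing miss cannot be recovered within the session.

```python
import re
from typing import Callable, Optional

# Hypothetical keyword table; the real skills and keywords are not given here.
KEYWORDS: dict[str, tuple[str, ...]] = {
    "pentest": ("exploit", "shell", "privilege"),
    "threat_hunting": ("hunt", "beacon", "ioc"),
    "forensics": ("image", "timeline", "artifact"),
}

# A classifier returns (skill, confidence in [0, 1]) for a task string.
Classifier = Callable[[str], tuple[str, float]]


def keyword_route(task: str) -> Optional[str]:
    """Score each skill by keyword hits; None if nothing matches."""
    words = re.findall(r"[a-z0-9]+", task.lower())
    scores = {
        skill: sum(words.count(keyword) for keyword in keywords)
        for skill, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def route(task: str, classifier: Classifier, threshold: float = 0.6) -> str:
    """Trust the classifier when it is confident, otherwise fall back to keywords."""
    skill, confidence = classifier(task)
    if confidence >= threshold:
        return skill
    fallback = keyword_route(task)
    if fallback is None:
        # Refuse to start the session instead of guessing a skill.
        raise LookupError(f"no route for task: {task!r}")
    return fallback
```

A TF-IDF plus logistic regression model, as described above, would be wrapped to fit the `Classifier` signature by returning its top label together with that label's predicted probability.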
Session
A single ARCHER agent run from task input to terminal exit. Begins when the agent receives a task and ends when the model signals completion, the code layer halts it at the command ceiling, or an error or timeout fires. Every session produces a complete audit log of every command issued, every output returned, and every finding extracted. Qualifying sessions — those that pass quality verification — become training candidates.
Failure Modes
Compound failure
An objective that is failing because of two or more independent structural defects simultaneously. Compound failures take disproportionately more fix iterations than single-defect failures: closing one defect leaves the others active, and the objective continues to fail — making it appear the fix didn't work. The ARCHER failure mode inventory identified T53 (ligolo pivot) as the canonical example: four independent failure classes were active simultaneously, each requiring a separate diagnosis and fix.
Failure class
A named category of recurring root cause — a structural defect that manifests under different issue numbers but shares the same underlying cause. Eighteen months of eval-driven ARCHER development produced 15 failure classes covering 158 issue instances. The key insight: the same root cause was diagnosed and fixed independently multiple times because each instance looked different on the surface. Naming the class makes the pattern visible and enables class-level remediation rather than per-instance patching. Full taxonomy in ARCHER Failure Mode Inventory (Centaur Security Labs, pending release).
One-bug-one-fix trap
The dominant development anti-pattern in AI agent systems: fixing a failure in the specific objective that surfaced it while leaving adjacent objectives with the identical structural defect. The fix closes one issue; the root cause remains active across the rest of the codebase. The trap is hard to avoid because failures surface one at a time, and the natural response is to fix what's in front of you. The countermeasure is class-level diagnosis: before closing any fix, grep for the same pattern in adjacent code and file bugs for every instance found, not just the one that triggered the investigation.
Range lock-in
A training failure mode in which an agent learns to solve a specific target or application rather than the underlying vulnerability class. Produced by hints and training data that are too app-specific: the model learns "on DVWA, send this curl command" rather than "for stored XSS, inject into user-controlled input fields." The result is an agent that passes eval against the training target but fails against any variant. The fix is the two-layer rule: every app-specific hint must have a generic companion that teaches the transferable pattern using placeholders. Examined in depth in Range Lock-In (Centaur Security Labs, pending release).
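The two-layer rule lends itself to a mechanical check over the hint set. The `Hint` structure below is an assumption made for illustration: a hint with no `app` is treated as the generic companion for its vulnerability class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hint:
    vuln_class: str            # e.g. "stored_xss"
    text: str
    app: Optional[str] = None  # None marks a generic, transferable hint


def missing_generic_companions(hints: list[Hint]) -> list[Hint]:
    """App-specific hints whose vulnerability class has no generic hint."""
    generic_classes = {h.vuln_class for h in hints if h.app is None}
    return [
        h for h in hints
        if h.app is not None and h.vuln_class not in generic_classes
    ]
```

Running a check like this before training data is prepared turns the rule from a convention into something the code layer enforces.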
Recurrence cost
The development overhead produced when a structural defect is diagnosed and fixed multiple times independently rather than once at the class level. In ARCHER's development record — 130+ issues diagnosed across five weeks of intensive iteration — an estimated 40–60 issues (25–38% of the total) could have been prevented by earlier class-level recognition. Per-class examples: the premature objective-achieved pattern recurred 23 times; the missing verification step pattern recurred 25 times before a systematic audit was proposed. Recurrence cost is invisible in per-issue tracking — it only becomes visible when issues are reviewed as a body.
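The quoted 25–38% range can be reproduced with a short calculation. The denominator is an assumption: the range matches the 158 issue instances cited under Failure class, so that figure is used here.

```python
# Assumption: the total is the 158 issue instances cited under "Failure class".
TOTAL_ISSUES = 158


def share(preventable: int, total: int = TOTAL_ISSUES) -> int:
    """Preventable issues as a whole-number percentage of the total."""
    return round(100 * preventable / total)


low, high = share(40), share(60)  # 25 and 38
```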
Terms are defined here once. For full technical treatment, follow the cross-references to the research papers and build journal.