The Work

Naming

Centaur Agent is the public product name. ARCHER is the internal development codename used throughout this site's build journal and the private repository. Centaur Eval is the evaluation framework (internal codename: AgentEval). All references to ARCHER and AgentEval in development logs refer to these products.

Centaur Agent is a locally hosted AI agent for security operations. It runs on a laptop GPU, operates entirely within your network/security boundary, and is built on a strict division of labor: AI handles command generation, output interpretation, and multi-turn reasoning; deterministic code handles routing, halt detection, safety constraints, and audit trail; a named human analyst reviews findings and is accountable for the output.

That architecture is what makes ARCHER deployable in environments where the output carries legal or evidentiary weight.

Current Status

The penetration testing domain is active and validated. Nine skill packs cover the full penetration testing lifecycle, including reconnaissance, vulnerability assessment, web exploitation, network exploitation, post-exploitation, pivoting, privilege escalation, and Active Directory attacks.

The domain is validated against real targets at a 94% pass rate across 67 active objectives (2026-05-09 baseline run, 29-objective set × 3 runs = 87 sessions). Failures are documented failures, not uncaught false positives: the system is designed to catch its own errors.

What Makes It Different

Local-first, no cloud dependency. ARCHER runs on an RTX 4060 Mobile with 8GB VRAM. The model runs on the same machine as the agent. No data leaves the network.

Auditable by design. Every session produces a complete, timestamped log: each command issued, the raw output returned, and findings linked to specific tool output. A finding is traceable to confirmed target state, not to the model's probability distribution over plausible outputs. That chain of evidence is what regulated environments governed by NIS2, DORA, and similar frameworks actually require.
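One way to picture that chain of evidence is a session log in which every finding carries a pointer to the command record that supports it. The sketch below is illustrative only: the class names, field names, and the `untraceable_findings` check are assumptions made for this example, not ARCHER's actual log schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CommandRecord:
    """One command the agent issued, with the raw output it returned."""
    timestamp: datetime
    command: str
    raw_output: str


@dataclass(frozen=True)
class Finding:
    """A claim about the target, tied to the command output that supports it."""
    summary: str
    evidence_index: int  # position of the supporting CommandRecord in the log


@dataclass
class SessionLog:
    """Hypothetical session log: commands in order, plus findings that cite them."""
    commands: list[CommandRecord] = field(default_factory=list)
    findings: list[Finding] = field(default_factory=list)

    def record(self, command: str, raw_output: str) -> int:
        """Append a timestamped command record and return its index."""
        self.commands.append(
            CommandRecord(datetime.now(timezone.utc), command, raw_output)
        )
        return len(self.commands) - 1

    def untraceable_findings(self) -> list[Finding]:
        """Return findings whose evidence index does not point at a logged command."""
        return [
            f for f in self.findings
            if not 0 <= f.evidence_index < len(self.commands)
        ]
```

In a layout like this, a reviewer can reject any session where `untraceable_findings()` is non-empty, because such a finding has no tool output behind it.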

Fine-tuned on its own operational data. ARCHER collects training data from its own evaluation runs and fine-tunes on that data. The model improves against the specific task distribution it actually sees — not a generic security benchmark.

Planned Domains

System Hardening, Threat Hunting, Digital Forensics, Malware Analysis, Threat Intelligence, and CTF/training environments. Each domain will be built to its governing industry standards and validated against real targets before release.

Measuring Progress

The Benchmark Dashboard tracks ARCHER's performance across all active objectives over time. It is generated automatically after each evaluation run, giving a raw, unfiltered view of ARCHER's wins and failures as it is built.

Progress is monitored across several datasets:

Evaluation results — Every eval run against Metasploitable 2 produces a timestamped CSV recording pass/fail, skill routed, commands issued, and halt outcome for each objective. These accumulate into a trend archive used to detect regressions and track improvement over time.
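As a rough illustration of how per-run CSVs can feed regression detection, the sketch below computes a pass rate per routed skill and compares two runs. The column names (`objective`, `skill`, `result`, `commands_issued`, `halt_outcome`) and the sample rows are invented for this example; the real file layout is not specified here.

```python
import csv
import io
from collections import defaultdict

# Made-up sample runs, using assumed column names.
PREVIOUS_RUN = """objective,skill,result,commands_issued,halt_outcome
obj-01,recon,pass,4,none
obj-02,recon,pass,6,none
obj-03,web,pass,9,none
"""

CURRENT_RUN = """objective,skill,result,commands_issued,halt_outcome
obj-01,recon,pass,4,none
obj-02,recon,fail,12,halted
obj-03,web,pass,8,none
"""


def pass_rate_by_skill(csv_text: str) -> dict[str, float]:
    """Return the fraction of passing rows for each routed skill."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        skill = row["skill"]
        totals[skill] += 1
        if row["result"].strip().lower() == "pass":
            passes[skill] += 1
    return {skill: passes[skill] / totals[skill] for skill in totals}


def regressions(previous: dict[str, float], current: dict[str, float],
                tolerance: float = 0.0) -> list[str]:
    """List skills whose pass rate dropped by more than `tolerance` between runs."""
    return sorted(
        skill for skill, rate in current.items()
        if skill in previous and previous[skill] - rate > tolerance
    )


if __name__ == "__main__":
    before = pass_rate_by_skill(PREVIOUS_RUN)
    after = pass_rate_by_skill(CURRENT_RUN)
    print(regressions(before, after))
```

A nonzero `tolerance` may be useful when each objective runs several times and small fluctuations between runs are expected.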

Router labels — Each eval run logs how the skill router classified the incoming task, what confidence it had, and whether a classifier or keyword scorer made the call. This dataset feeds the TF-IDF + logistic regression classifier that handles routing at inference time.
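The shape of a router label can be sketched as follows. This is a guess at the general pattern, not the actual router: the rule that the keyword scorer takes over when classifier confidence falls below a threshold, the threshold value, the skill names, and the keyword table are all assumptions, and the TF-IDF + logistic regression classifier is represented by a plain callable rather than a trained model.

```python
from typing import Callable

# Hypothetical keyword table; real skill names and keywords are not given in the text.
KEYWORDS = {
    "reconnaissance": ("scan", "enumerate", "discover"),
    "web_exploitation": ("sql", "xss", "http"),
}

# A classifier is anything that maps a task string to (skill, confidence).
Classifier = Callable[[str], tuple[str, float]]


def keyword_score(task: str) -> tuple[str, float]:
    """Pick the skill with the most keyword hits; confidence is its share of all hits."""
    words = task.lower().split()
    hits = {skill: sum(w in kws for w in words) for skill, kws in KEYWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        return "unrouted", 0.0
    best = max(hits, key=hits.get)
    return best, hits[best] / total


def route(task: str, classifier: Classifier, threshold: float = 0.6) -> dict:
    """Route a task and return the label record that would be logged for it."""
    skill, confidence = classifier(task)
    source = "classifier"
    if confidence < threshold:  # assumed fallback rule
        skill, confidence = keyword_score(task)
        source = "keyword_scorer"
    return {"task": task, "skill": skill, "confidence": confidence, "source": source}
```

Each returned record holds the three things the router log is described as capturing: the chosen skill, the confidence, and which component made the call.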

Fine-tune data — Sessions ending in a confirmed objective pass or a disciplined halt produce structured training examples, one per skill domain. These are the raw material for the QLoRA fine-tuning pipeline that moves the base model toward ARCHER's actual task distribution.

Tier 2 audit scores — After each collection run, a lightweight LLM-as-judge pass scores every session 0–3 on output quality. Sessions scoring below 2 are excluded from fine-tuning. This dataset tracks score distribution per skill and surfaces calibration drift in the judge itself.
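The gate described above reduces to a filter plus a per-skill histogram. In the minimal sketch below, the 0–3 scale and the cutoff of 2 come from the text; the session dictionaries and their `skill` and `score` keys are assumed for illustration.

```python
from collections import Counter, defaultdict

MIN_SCORE = 2        # sessions scoring below 2 are excluded from fine-tuning
VALID_SCORES = range(0, 4)  # judge scores run from 0 to 3


def split_by_gate(sessions: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split sessions into (kept for fine-tuning, excluded) by judge score."""
    for s in sessions:
        if s["score"] not in VALID_SCORES:
            raise ValueError(f"score out of range: {s['score']!r}")
    kept = [s for s in sessions if s["score"] >= MIN_SCORE]
    excluded = [s for s in sessions if s["score"] < MIN_SCORE]
    return kept, excluded


def score_distribution(sessions: list[dict]) -> dict[str, dict[int, int]]:
    """Count how many sessions received each score, grouped by skill."""
    dist: dict[str, Counter] = defaultdict(Counter)
    for s in sessions:
        dist[s["skill"]][s["score"]] += 1
    return {skill: dict(counts) for skill, counts in dist.items()}
```

Comparing `score_distribution` output across collection runs is one simple way to notice the judge drifting, for example if one skill's scores shift upward without a matching change in pass rate.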

Operating Principles

The decisions behind how ARCHER is built are documented in the Guiding Principles: why failures are published rather than filtered, why roles cannot verify their own work, and why training data gates exist at the thresholds they do. Each principle was extracted from a real failure mode. That is what makes them constraints rather than aspirations.

Following the Build

The Build Journal documents the full development story: design decisions, hard lessons, and what it actually looks like to build a production AI security tool from the ground up.