System Overview¶

ARCHER is a local-first AI security operations agent built on a structured agent loop. It routes tasks to modular, domain-specific play packs, injects tactical guidance, runs a local LLM, the LLM analyzes the problem to determine next steps, executes the resulting commands, and repeats until the overarching objective is achieved or the halt discipline fires (halt discipline exists to ensure the LLM doesn't go beyond the initial request).

The Agent Loop¶

Task (natural language)
    ↓
Skill Router - routes to the correct domain and category
    ↓
hints_fn - injects tool guidance, target context, tactical SOPs
    ↓
System Prompt - domain addendum + task + accumulated findings
    ↓
LLM (qwen3:14b via Ollama)
    ↓
Command extraction + validation
    ↓
Command execution (local or via archer-kali container)
    ↓
Output pre-processing → structured findings → FindingsAccumulator
    ↓
halt_fn - has the objective been achieved?
    ↓
Loop continues or session ends
    ↓
(multi-phase) EngagementController sequences next phase with injected findings

Every command executed, every finding recorded, and every session outcome is written to an append-only JSONL log that serves as the immutable session record, this record acts as ground truth for command/output history and is the foundation for any diagnostic work that might be needed.

Core Components¶

Component	Description
`ARCHER.py`	Thin entry / CLI (~1.2k lines post-#980). Parses args, loads one domain, dispatches to the loop.
`archer_platform/agent.py`	Agent loop, `PlayRegistry`, prompt assembly, halt/OA gates (~4.5k lines, extracted #980).
`archer_router.py`	Skill routing: classifier → keyword scorer → `bonus_fn` (#973 P3a).
`archer_playbook.py`	Domain-scoped learned-command replay + variable substitution (#973 P3b).
`archer_executor.py`	Execution layer. `execute_command`, `execute_with_sudo`, command validation, `EXEC_TARGET` routing, `_strip_nmap_fingerprints`. Extracted from ARCHER.py — #972.
`archer_findings.py`	Engagement state layer. `EngagementState` dataclass, `FindingsAccumulator` (regex extraction pipeline, typed accumulation, context injection), `save_engagement_state` / `load_engagement_state` for persistence. — #960 #961.
`archer_engagement.py`	Multi-phase coordinator. `EngagementController` sequences PTES phases, injects prior findings into each task via `FindingsAccumulator.inject()`, writes `summary.md` on completion. — #959.
`plays/Domain-Subdomain.py`	Play pack modules. One file per domain, loaded on demand.
`testenv/eval_harness.py`	Quality harness. 75 active objectives against real targets.
`scripts/train_classifier.py`	Trains the TF-IDF+LR skill router classifier.
`scripts/finetune.py`	QLoRA fine-tuning pipeline (RunPod A100).
`docker/`	Container build artifacts for the `archer-kali` deployment target.

Key Design Principles¶

Local-first. The model runs on the host machine via Ollama. No data leaves the controlled environment during inference.

Deterministic where possible. Routing, parsing, validation, halt detection, and known-good command replay are all handled by deterministic code. The LLM is responsible for command generation, output interpretation, and next-step reasoning - and nothing else.

Single-domain isolation. Only one skill domain can be loaded per session. This is enforced mechanically - a RuntimeError fires on any attempt to load a second domain. Domains are designed to be loaded exclusively; their behavioral guidance would conflict if merged.

Behavioral equivalence. Every default-off flag must produce byte-identical output to the pre-change version when the flag is off. Changes to behavior are explicit, not incidental.

For detailed architecture documentation, see the sub-pages in this section: System Map (charted overview of every subsystem) · Play Packs · Skill Domains · Model Routing · V1 to V2 · Eval Harness · Deployment.