
System Overview

ARCHER is a local-first AI security operations agent built on a structured agent loop. On each iteration it routes the task to a modular, domain-specific skill pack, injects tactical guidance, and runs a local LLM that analyzes the problem and determines the next steps. It then executes the resulting commands and repeats until the overarching objective is achieved or the halt discipline fires. The halt discipline exists to ensure the LLM does not go beyond the initial request.

The Agent Loop

Task (natural language)
Skill Router - routes to the correct domain and category
hints_fn - injects tool guidance, target context, tactical SOPs
System Prompt - domain addendum + task + accumulated findings
LLM (qwen3:14b via Ollama)
Command extraction + validation
Command execution (local or via archer-kali container)
Output pre-processing → structured findings
halt_fn - has the objective been achieved?
Loop continues or session ends
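
The stages above can be sketched as a small driver function. This is a minimal illustration, not ARCHER's actual code: `Session`, `run_loop`, `route`, `validate`, `execute`, and `max_steps` are hypothetical names, and every stage is passed in as a stub callable so the control flow can be read on its own.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """Hypothetical container for the state carried between iterations."""
    task: str
    findings: List[str] = field(default_factory=list)
    steps: int = 0


def run_loop(task, route, hints_fn, llm, validate, execute, halt_fn, max_steps=10):
    """Illustrative loop skeleton; each callable stands in for one stage."""
    session = Session(task=task)
    domain = route(task)  # skill router: pick the domain once per session
    while session.steps < max_steps:
        hints = hints_fn(domain, session)  # tool guidance, target context, SOPs
        prompt = f"[{domain}]\n{hints}\nTask: {task}\nFindings: {session.findings}"
        command = llm(prompt)  # the model proposes the next command
        session.steps += 1
        if not validate(command):  # deterministic gate before anything runs
            continue
        session.findings.append(execute(command).strip())
        if halt_fn(session):  # has the objective been achieved?
            break
    return session


# Stub stages, for illustration only.
session = run_loop(
    task="check whether ssh is open on 10.0.0.5",
    route=lambda task: "network",
    hints_fn=lambda domain, s: "prefer nmap for port checks",
    llm=lambda prompt: "nmap -p 22 10.0.0.5",
    validate=lambda cmd: cmd.startswith("nmap"),
    execute=lambda cmd: "22/tcp open ssh\n",
    halt_fn=lambda s: any("open" in f for f in s.findings),
)
```

With these stubs the loop halts after a single iteration, because the first finding already satisfies `halt_fn`. The `max_steps` cap is an assumption added so the sketch always terminates.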

Every command executed, every finding recorded, and every session outcome is written to an append-only JSONL log that serves as the immutable session record. This record is the ground truth for command and output history, and it is the foundation for any diagnostic work that might be needed.
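
As a rough illustration of the logging pattern, the sketch below appends one JSON object per line and reads the file back. The field names (`ts`, `kind`, `cmd`, `text`) and the helpers `append_event` and `read_events` are assumptions made for this example; the actual record schema is not described here.

```python
import json
import os
import tempfile
import time


def append_event(path, kind, payload):
    """Append a single JSON record as one line; existing lines are never rewritten."""
    record = {"ts": time.time(), "kind": kind, **payload}
    with open(path, "a", encoding="utf-8") as fh:  # "a" = append-only writes
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def read_events(path):
    """Load every record back, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical session: one command followed by one finding.
log_path = os.path.join(tempfile.mkdtemp(), "session.jsonl")
append_event(log_path, "command", {"cmd": "nmap -sV 10.0.0.5"})
append_event(log_path, "finding", {"text": "22/tcp open ssh"})
events = read_events(log_path)
```

Opening the file in append mode only guarantees that this code never overwrites earlier records. Protecting the file from other writers would have to be handled separately, for example through file permissions.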

Core Components

Component                    Description
ARCHER.py                    Main agent (~3,500 lines). Agent loop, skill registry, playbook, CLI.
skills/Domain-Subdomain.py   Skill pack modules. One file per domain, loaded on demand.
testenv/eval_harness.py      Quality harness. 67 active objectives against real targets.
scripts/train_classifier.py  Trains the TF-IDF+LR skill router classifier.
scripts/finetune.py          QLoRA fine-tuning pipeline (RunPod A100).
docker/                      Container build artifacts for the archer-kali deployment target.

Key Design Principles

Local-first. The model runs on the host machine via Ollama. No data leaves the controlled environment during inference.

Deterministic where possible. Routing, parsing, validation, halt detection, and known-good command replay are all handled by deterministic code. The LLM is responsible for command generation, output interpretation, and next-step reasoning - and nothing else.
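
To make the division of labor concrete, here is one way a deterministic extraction and validation step could look. Everything in it is an assumption for illustration: the `CMD:` line convention, the `ALLOWED_BINARIES` allowlist, and the function names are hypothetical and are not taken from ARCHER.

```python
import re
import shlex

# Hypothetical allowlist; real validation rules are not specified in this overview.
ALLOWED_BINARIES = {"nmap", "curl", "dig"}

# Assumed convention: the model emits its command on a line starting with "CMD:".
CMD_RE = re.compile(r"^CMD:\s*(.+)$", re.MULTILINE)


def extract_command(llm_text):
    """Return the first CMD: line from the model reply, or None if there is none."""
    match = CMD_RE.search(llm_text)
    return match.group(1).strip() if match else None


def is_allowed(command):
    """Accept only commands that tokenize cleanly and start with an allowed binary."""
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. an unbalanced quote
        return False
    return bool(argv) and argv[0] in ALLOWED_BINARIES


reply = "I will fingerprint the services first.\nCMD: nmap -sV 10.0.0.5\n"
command = extract_command(reply)
```

The model only produces text. Whether that text becomes an executed command is decided by plain code, which always gives the same answer for the same input.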

Single-domain isolation. Only one skill domain can be loaded per session. This is enforced mechanically - a RuntimeError fires on any attempt to load a second domain. Domains are designed to be loaded exclusively; their behavioral guidance would conflict if merged.
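
A minimal sketch of how such a guard might be enforced is shown below. The `SkillRegistry` class and its `load` method are hypothetical names, and treating a repeat load of the same domain as a no-op is an assumption; only the `RuntimeError` on a second domain comes from the description above.

```python
class SkillRegistry:
    """Hypothetical registry that allows exactly one domain per session."""

    def __init__(self):
        self._loaded = None

    def load(self, domain):
        # Assumption: reloading the same domain is harmless and treated as a no-op.
        if self._loaded is not None and self._loaded != domain:
            raise RuntimeError(
                f"domain {self._loaded!r} is already loaded; cannot load {domain!r}"
            )
        self._loaded = domain
        return domain


registry = SkillRegistry()
registry.load("network")

try:
    registry.load("web")
    second_load_error = None
except RuntimeError as exc:
    second_load_error = str(exc)
```

Raising an exception instead of silently switching domains makes a misrouted session fail loudly at the point of the mistake.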

Behavioral equivalence. Every default-off flag must produce byte-identical output to the pre-change version when the flag is off. Changes to behavior are explicit, not incidental.
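
One way to check this property is a regression test that compares the flag-off output with a frozen copy of the old behavior, byte for byte. The functions and the `dedupe` flag below are invented for illustration and do not correspond to real ARCHER flags.

```python
def render_report_baseline(findings):
    """Frozen copy of the pre-change behavior, kept only for comparison."""
    return "\n".join(findings).encode("utf-8")


def render_report(findings, dedupe=False):
    """New version; the hypothetical `dedupe` flag is off by default."""
    items = list(dict.fromkeys(findings)) if dedupe else list(findings)
    return "\n".join(items).encode("utf-8")


sample = ["22/tcp open", "80/tcp open", "22/tcp open"]

# Flag off: output must match the baseline byte for byte.
flag_off_matches = render_report(sample) == render_report_baseline(sample)

# Flag on: behavior changes, but only because it was requested explicitly.
flag_on_output = render_report(sample, dedupe=True)
```

Comparing encoded bytes instead of parsed structures also catches incidental differences such as whitespace or ordering changes.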

For detailed architecture documentation, see the sub-pages in this section: Skill Domains · V1 to V2 · Eval Harness · Deployment.