Play Pack Architecture¶
Scope: runtime architecture for the play system. Answers "how does a play pack work?" and "what does the code expect from a pack?" For data flow and cross-file dependencies see ARCHITECTURE.md. For adding a new domain end-to-end see CONTRIBUTING.md.
Agent Loop (run_agent_mode)¶
run_agent_mode(session_factory, ...) is the unified session loop. session_factory is a
callable that takes a system-prompt string and returns a ChatSession-compatible object
(LocalChatSession for Ollama, cloud sessions from archer_providers.py).
Each turn: model emits [THOUGHT] / bash / [FINDINGS] / [OBJECTIVE_ACHIEVED], ARCHER extracts
the bash block, validates it, runs it, pre-processes output through skill analyzers, and feeds
structured output back. Loop ends on strict [OBJECTIVE_ACHIEVED] token (parsed by
_has_objective_achieved()) or when halt-discipline fires.
Play System (PlayRegistry)¶
PlayRegistry.load_domain(name) enforces single-domain-per-session (raises RuntimeError on a
second call). It imports plays/{name}.py, merges its SKILL_CATEGORIES, TARGET_SIGNATURES,
TARGET_NOUN_OVERLAPS, TARGET_NOUNS_RECOGNIZED into module-level globals, and captures
SYSTEM_PROMPT_ADDENDUM. Core skills (in ARCHER.py) win on conflict.
Pack isolation invariant: Pack files must not import from ARCHER. Everything they need is
passed in by dispatchers at call time. If a pack needs something from ARCHER, read it from the
config dict passed to halt_fn/hints_fn/bonus_fn.
Dispatcher Pattern¶
Three functions per skill category, wired in _register_pack_handlers() at the bottom of each
pack:
halt_fn(command_count: int, findings_text: str, config: dict) -> bool
hints_fn(task: str, config: dict, available_tools: dict, target_signatures: dict) -> list[str]
bonus_fn(task_lower: str, has_context, config: dict) -> int
halt_fn¶
The universal pre-checks in should_halt_objective() run before halt_fn:
| Condition | Result |
|---|---|
command_count < 1 |
always False |
| error indicators present | always False |
command_count < min_commands |
always False |
command_count >= max_commands |
always True (force halt) |
halt_fn only needs to encode "is the work substantively done?" — not count-based thresholds.
hints_fn¶
Receives a shallow copy of the SKILL_CATEGORIES entry with one injected key:
config["exec_target"]—strcontainer name when--kaliis active,Noneotherwise.
Use this to detect kali/container mode without importing ARCHER:
def _sudo_available(config: dict) -> bool:
return config.get("exec_target") is not None # container runs as root
get_play_hints also injects a tool-enforcement directive when the task explicitly names a tool
via "with <tool>" or "using <tool>" phrasing. Fires only for tools in tools_available with
len > 2. Prevents the model substituting an equivalent tool (e.g. gobuster when the task says
"with ffuf").
bonus_fn¶
Returns an integer score delta applied on top of the keyword scorer result. Use to break ties. Typical values: 5–15 for a strong signal, 10–15 for an explicit tool name match.
SKILL_CATEGORIES Entry Shape¶
All fields required. halt_fn/hints_fn/bonus_fn are added by _register_pack_handlers()
— do not put them in the dict directly.
"skill_name": {
"description": str, # human-readable; used in logs and routing telemetry
"keywords": list[str], # increase routing score
"exclude_keywords": list[str],# decrease routing score (competitors, anti-patterns)
"min_commands": int, # halt_fn cannot fire below this count
"max_commands": int, # force-halt at this count
"tools_available": list[str], # told to the model; tool-enforcement directive reads this
"halt_requires": str, # "structured_data" | "comprehensive" | "objective_proof"
"completion_indicators": list[str], # strings that appear in findings on completion
"command_timeout": int, # seconds; normal commands
"sudo_command_timeout": int, # seconds; privileged commands
}
Skill Router (detect_skill_category)¶
Three-tier routing chain:
-
Classifier (
~/.archer_classifier/router_classifier.pkl) — TF-IDF + LR pipeline loaded at startup. If present and softmax confidence ≥ 0.5 for a skill in the current domain, routes immediately. Bypass with--no-classifier. -
Keyword scorer — keyword/exclude scoring + per-category
bonus_fn. Fallback when classifier is absent, confidence < 0.5, or predicted skill not in loaded domain. -
LLM gate — if keyword score gap ≤ 2 and
_ROUTING_MODELis set, a single non-streaming Ollama call resolves ambiguity. Skips if model is not pre-warmed (avoids cold-start penalty).
All decisions appended to ~/.archer_routing_log.jsonl with classifier_used, llm_used,
score_gap fields for training telemetry. Eval harness writes eval_label entries
(label_confidence=high) after every run; --ambiguous mode generates additional entries from
underspecified task phrasings.
Docker Exec Target (EXEC_TARGET)¶
Module-level global EXEC_TARGET = None. Set at startup from --kali (hardcodes archer-kali)
or --exec-target <container>. When set:
execute_commanduses['docker', 'exec', '-i', EXEC_TARGET, 'bash', '-c', cmd]instead of local bash.execute_with_sudobypasses all password machinery and callsexecute_commanddirectly (container is root). Stripssudoprefix before dispatch._ensure_container_running(container)is called at startup — inspects state, starts if stopped, aborts on failure.--prep-sudois suppressed with a warning (irrelevant in container-root context).- Container requirements:
--network host,--cap-add NET_ADMIN NET_RAW NET_BROADCAST(required for nmap raw socket access — nmap's file capabilities includecap_net_adminwhich must be in the bounding set).
archer-kali Container Topology¶
Production deployment: ARCHER runs inside archer-kali; Ollama runs on host bare metal.
- ARCHER code bind-mounted from host repo (
-v /path/to/ARCHER:/opt/archer) — edits on host are live immediately, no rebuild needed. - Container reaches Ollama via
--network host→localhost:11434. Requires Ollama to listen on0.0.0.0(not127.0.0.1). - Rebuild image (
./docker/run.sh) only when: tool list changes, Python deps change, or base image needs updating. - The
--kaliflag is for the host-routes-through-container topology. When running ARCHER directly inside the container, use no flag — commands execute locally against the container's Kali tools.
Playbook (~/.archer_playbook.db)¶
Two tables: playbook (task_pattern + domain → winning command, weight, required/optional
variable metadata) and session_metrics (one row per session). UNIQUE(task_pattern, domain) —
same task can exist per domain. Strict mode: no domain loaded → no playbook ops. Migration is
wipe-and-recreate with PRAGMA checks for idempotence.
IPv4 addresses in both task_pattern and winning_command are abstracted to {ip_address} at
save time via _abstract_command(), enabling cross-target reuse without target-specific commands
replaying against other hosts.
Variable Substitution¶
_extract_required_vars(winning_cmd) finds {VAR} placeholders in saved commands.
extract_all_variables(task_query) derives values from the user's current query. Substitution is
skipped entirely (not errored) when a winning command has no placeholders — see
test_substitution_skip_when_no_vars.py.
Established Patterns¶
These patterns are load-bearing. Do not change them without explicit agreement — changes affect every play pack and every eval run.
| Pattern | What it does |
|---|---|
halt_fn / hints_fn / bonus_fn dispatcher |
Per-skill behavioral hooks; contracts above |
hints_fn exec_target injection |
Lets packs detect --kali mode without importing ARCHER |
Module-level registries (TARGET_SIGNATURES, TARGET_NOUN_OVERLAPS, TARGET_NOUNS_RECOGNIZED) |
Pack target fingerprinting; merged into globals by PlayRegistry |
PlayRegistry.load_domain() single-domain enforcement |
Raises RuntimeError on second call; invariant is load-bearing |
SYSTEM_PROMPT_ADDENDUM (loaded from pack, injected at session start) |
Per-domain prompt extension; 150–300 char budget |
--do / --domain CLI with shortname mapping |
pentest → penetration; --sd for subdomain |
[CLARIFY] token + strict [OBJECTIVE_ACHIEVED] parser |
_has_objective_achieved() parses exact token; no fuzzy match |
Per-skill timeouts (command_timeout / sudo_command_timeout) |
Set in each category dict; governs execute_command |
Domain-scoped playbook with current_domain() |
Reads PLAYS.loaded_domain; no domain → no playbook ops |