Play Pack Architecture

Scope: runtime architecture for the play system. Answers "how does a play pack work?" and "what does the code expect from a pack?" For data flow and cross-file dependencies see ARCHITECTURE.md. For adding a new domain end-to-end see CONTRIBUTING.md.

Agent Loop (`run_agent_mode`)

run_agent_mode(session_factory, ...) is the unified session loop. session_factory is a callable that takes a system-prompt string and returns a ChatSession-compatible object (LocalChatSession for Ollama, cloud sessions from archer_providers.py).

Each turn, the model emits [THOUGHT] / bash / [FINDINGS] / [OBJECTIVE_ACHIEVED]; ARCHER extracts the bash block, validates it, runs it, pre-processes the output through skill analyzers, and feeds the structured result back to the model. The loop ends on a strict [OBJECTIVE_ACHIEVED] token (parsed by _has_objective_achieved()) or when halt-discipline fires.
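The two parsing steps in a turn can be sketched as follows. This is a minimal illustration, not ARCHER's actual code: it assumes the model wraps its command in a fenced bash block and that "strict" means the token must stand alone on a line. The names extract_bash_block and has_objective_achieved are hypothetical stand-ins.

```python
import re
from typing import Optional

FENCE = "`" * 3  # built indirectly so the example can itself live inside a fence
OBJECTIVE_TOKEN = "[OBJECTIVE_ACHIEVED]"
_BASH_BLOCK = re.compile(
    re.escape(FENCE) + r"bash\s*\n(.*?)" + re.escape(FENCE), re.DOTALL
)


def extract_bash_block(reply: str) -> Optional[str]:
    """Return the body of the first fenced bash block, or None if absent."""
    match = _BASH_BLOCK.search(reply)
    return match.group(1).strip() if match else None


def has_objective_achieved(reply: str) -> bool:
    """Strict check (assumed rule): the exact token must be alone on a line."""
    return any(line.strip() == OBJECTIVE_TOKEN for line in reply.splitlines())
```

Under these assumptions, a reply that only mentions the token mid-sentence does not end the loop, which matches the "no fuzzy match" intent.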

Play System (`PlayRegistry`)

PlayRegistry.load_domain(name) enforces single-domain-per-session (it raises RuntimeError on a second call). It imports plays/{name}.py, merges the pack's SKILL_CATEGORIES, TARGET_SIGNATURES, TARGET_NOUN_OVERLAPS, and TARGET_NOUNS_RECOGNIZED into module-level globals, and captures SYSTEM_PROMPT_ADDENDUM. Core skills (in ARCHER.py) win on conflict.
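A rough sketch of the loading rules described above, under stated assumptions: the class name PlayRegistrySketch, the importer parameter, and the use of instance attributes instead of module-level globals are illustrative choices, and only SKILL_CATEGORIES is merged to keep the example short.

```python
import importlib
from types import ModuleType
from typing import Callable, Optional


class PlayRegistrySketch:
    """Illustrative stand-in for PlayRegistry; not the real implementation."""

    def __init__(self, core_skills: dict,
                 importer: Callable[[str], ModuleType] = importlib.import_module):
        self.skill_categories = dict(core_skills)
        self.loaded_domain: Optional[str] = None
        self.prompt_addendum = ""
        self._importer = importer  # injectable so the sketch can be tested

    def load_domain(self, name: str) -> None:
        # Single-domain-per-session: a second call is a hard error.
        if self.loaded_domain is not None:
            raise RuntimeError(
                f"domain {self.loaded_domain!r} already loaded; cannot load {name!r}"
            )
        pack = self._importer(f"plays.{name}")
        for skill, entry in getattr(pack, "SKILL_CATEGORIES", {}).items():
            # setdefault keeps an existing core entry, so core wins on conflict.
            self.skill_categories.setdefault(skill, entry)
        self.prompt_addendum = getattr(pack, "SYSTEM_PROMPT_ADDENDUM", "")
        self.loaded_domain = name
```

Injecting the importer is only a testing convenience here; the documented behavior is a plain import of plays/{name}.py.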

Pack isolation invariant: Pack files must not import from ARCHER. Everything they need is passed in by dispatchers at call time. If a pack needs something from ARCHER, read it from the config dict passed to halt_fn/hints_fn/bonus_fn.

Dispatcher Pattern

Three functions per skill category, wired in _register_pack_handlers() at the bottom of each pack:

halt_fn(command_count: int, findings_text: str, config: dict) -> bool
hints_fn(task: str, config: dict, available_tools: dict, target_signatures: dict) -> list[str]
bonus_fn(task_lower: str, has_context, config: dict) -> int

`halt_fn`

The universal pre-checks in should_halt_objective() run before halt_fn:

| Condition | Result |
| --- | --- |
| `command_count < 1` | always `False` |
| error indicators present | always `False` |
| `command_count < min_commands` | always `False` |
| `command_count >= max_commands` | always `True` (force halt) |

halt_fn only needs to encode "is the work substantively done?" — not count-based thresholds.
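The pre-check ordering can be sketched as below. This is an assumption-laden illustration: it treats the table's row order as the evaluation order, and ERROR_INDICATORS is a hypothetical list, since the real error markers are not given here.

```python
from typing import Callable

# Hypothetical markers; the actual error-indicator list is not documented here.
ERROR_INDICATORS = ("command not found", "permission denied")


def should_halt_sketch(command_count: int, findings_text: str, config: dict,
                       halt_fn: Callable[[int, str, dict], bool]) -> bool:
    """Run the universal pre-checks, then defer to the pack's halt_fn."""
    if command_count < 1:
        return False
    lowered = findings_text.lower()
    if any(marker in lowered for marker in ERROR_INDICATORS):
        return False
    if command_count < config["min_commands"]:
        return False
    if command_count >= config["max_commands"]:
        return True  # force halt regardless of halt_fn
    # Only now does the pack decide whether the work is substantively done.
    return halt_fn(command_count, findings_text, config)
```

Because the count thresholds are handled before halt_fn runs, a pack's halt_fn can ignore command_count entirely and inspect only the findings.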

`hints_fn`

The config argument is a shallow copy of the skill's SKILL_CATEGORIES entry with one injected key:

config["exec_target"] — str container name when --kali is active, None otherwise.

Use this to detect kali/container mode without importing ARCHER:

def _sudo_available(config: dict) -> bool:
    return config.get("exec_target") is not None  # container runs as root

get_play_hints also injects a tool-enforcement directive when the task explicitly names a tool via "with <tool>" or "using <tool>" phrasing. It fires only for tools listed in tools_available whose names are longer than two characters. This prevents the model from substituting an equivalent tool (e.g. gobuster when the task says "with ffuf").
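One way the tool-enforcement check could work is sketched here. The function name tool_enforcement_directive, the regular expression, and the directive wording are all assumptions made for illustration; only the "with/using <tool>" trigger and the length filter come from the description above.

```python
import re
from typing import Optional


def tool_enforcement_directive(task: str, tools_available: list[str]) -> Optional[str]:
    """Return a directive if the task names a known tool via with/using."""
    task_lower = task.lower()
    for tool in tools_available:
        if len(tool) <= 2:
            continue  # very short names are too likely to match by accident
        pattern = rf"\b(?:with|using)\s+{re.escape(tool.lower())}\b"
        if re.search(pattern, task_lower):
            return f"Use {tool} for this task; do not substitute an equivalent tool."
    return None
```

For example, "Fuzz directories with ffuf" yields a directive naming ffuf, while a task that never names a tool yields None and leaves tool choice to the model.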

`bonus_fn`

Returns an integer score delta applied on top of the keyword scorer result. Use to break ties. Typical values: 5–15 for a strong signal, 10–15 for an explicit tool name match.
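A hypothetical bonus_fn that rewards an explicit tool-name mention might look like this. The function names, the fixed value 12, and passing None for has_context are illustrative assumptions; the signature follows the dispatcher contract above.

```python
def tool_name_bonus(task_lower: str, has_context, config: dict) -> int:
    """Hypothetical bonus_fn: add points when the task names one of the skill's tools."""
    words = set(task_lower.split())
    for tool in config.get("tools_available", []):
        if tool.lower() in words:
            return 12  # inside the 10 to 15 range suggested for a tool-name match
    return 0


def final_score(keyword_score: int, task: str, config: dict) -> int:
    """The delta is applied on top of the keyword scorer result."""
    # has_context is unused in this sketch, so None is passed as a placeholder.
    return keyword_score + tool_name_bonus(task.lower(), None, config)
```

With two skills tied on keyword score, the one whose tools_available contains the named tool pulls ahead by the bonus.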

`SKILL_CATEGORIES` Entry Shape

All fields required. halt_fn/hints_fn/bonus_fn are added by _register_pack_handlers() — do not put them in the dict directly.

"skill_name": {
    "description": str,           # human-readable; used in logs and routing telemetry
    "keywords": list[str],        # increase routing score
    "exclude_keywords": list[str],# decrease routing score (competitors, anti-patterns)
    "min_commands": int,          # halt_fn cannot fire below this count
    "max_commands": int,          # force-halt at this count
    "tools_available": list[str], # told to the model; tool-enforcement directive reads this
    "halt_requires": str,         # "structured_data" | "comprehensive" | "objective_proof"
    "completion_indicators": list[str],  # strings that appear in findings on completion
    "command_timeout": int,       # seconds; normal commands
    "sudo_command_timeout": int,  # seconds; privileged commands
}
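Since every field is required, a pack author may want a quick self-check before registration. The validator below is a sketch, not part of ARCHER: validate_entry and the constant names are hypothetical, and the field types are taken from the shape above.

```python
REQUIRED_FIELDS = {
    "description": str, "keywords": list, "exclude_keywords": list,
    "min_commands": int, "max_commands": int, "tools_available": list,
    "halt_requires": str, "completion_indicators": list,
    "command_timeout": int, "sudo_command_timeout": int,
}
HALT_REQUIRES_VALUES = {"structured_data", "comprehensive", "objective_proof"}
HANDLER_KEYS = {"halt_fn", "hints_fn", "bonus_fn"}


def validate_entry(name: str, entry: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the entry looks valid."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"{name}: missing required field {field!r}")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"{name}: {field!r} should be {expected_type.__name__}")
    halt_requires = entry.get("halt_requires")
    if isinstance(halt_requires, str) and halt_requires not in HALT_REQUIRES_VALUES:
        problems.append(f"{name}: unknown halt_requires {halt_requires!r}")
    # Handlers are attached at registration time, so they must not be in the dict.
    for key in sorted(entry.keys() & HANDLER_KEYS):
        problems.append(f"{name}: {key!r} is attached at registration; remove it")
    return problems
```

Running this over each entry in a pack's SKILL_CATEGORIES at import time would surface a missing field before the router ever sees the skill.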

Skill Router (`detect_skill_category`)

Three-tier routing chain:

Classifier (~/.archer_classifier/router_classifier.pkl): TF-IDF + LR pipeline loaded at startup. If it is present and its softmax confidence is ≥ 0.5 for a skill in the current domain, it routes immediately. Bypass with --no-classifier.
Keyword scorer: keyword/exclude scoring plus the per-category bonus_fn. This is the fallback when the classifier is absent, its confidence is < 0.5, or the predicted skill is not in the loaded domain.
LLM gate: if the keyword score gap is ≤ 2 and _ROUTING_MODEL is set, a single non-streaming Ollama call resolves the ambiguity. The gate is skipped if the model is not pre-warmed (avoids a cold-start penalty).
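The fall-through between the three tiers can be sketched as follows. This is an approximation under assumptions: route_sketch is a hypothetical name, the classifier and LLM gate are passed in as plain callables, keyword_scores is assumed non-empty with bonus_fn deltas already applied, and the gate is assumed to choose between the top two candidates.

```python
from typing import Callable, Optional

CONFIDENCE_THRESHOLD = 0.5
LLM_GAP_THRESHOLD = 2


def route_sketch(
    task: str,
    keyword_scores: dict[str, int],
    classifier: Optional[Callable[[str], tuple[str, float]]] = None,
    llm_gate: Optional[Callable[[str, list[str]], str]] = None,
) -> tuple[str, str]:
    """Return (skill, tier_used). Keys of keyword_scores are the loaded domain's skills."""
    # Tier 1: classifier, only if confident and the skill belongs to this domain.
    if classifier is not None:
        skill, confidence = classifier(task)
        if confidence >= CONFIDENCE_THRESHOLD and skill in keyword_scores:
            return skill, "classifier"
    # Tier 2: keyword scorer.
    ranked = sorted(keyword_scores, key=keyword_scores.get, reverse=True)
    best = ranked[0]
    # Tier 3: LLM gate, only when the top two scores are close.
    if llm_gate is not None and len(ranked) > 1:
        runner_up = ranked[1]
        if keyword_scores[best] - keyword_scores[runner_up] <= LLM_GAP_THRESHOLD:
            return llm_gate(task, [best, runner_up]), "llm"
    return best, "keyword"
```

Returning the tier alongside the skill mirrors the classifier_used and llm_used fields that the routing log records.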

All decisions are appended to ~/.archer_routing_log.jsonl with classifier_used, llm_used, and score_gap fields for training telemetry. The eval harness writes eval_label entries (label_confidence=high) after every run; --ambiguous mode generates additional entries from underspecified task phrasings.

Docker Exec Target (`EXEC_TARGET`)

Module-level global EXEC_TARGET = None. Set at startup from --kali (hardcodes archer-kali) or --exec-target <container>. When set:

execute_command uses ['docker', 'exec', '-i', EXEC_TARGET, 'bash', '-c', cmd] instead of local bash.
execute_with_sudo bypasses all password machinery and calls execute_command directly (container is root). Strips sudo prefix before dispatch.
_ensure_container_running(container) is called at startup — inspects state, starts if stopped, aborts on failure.
--prep-sudo is suppressed with a warning (irrelevant in container-root context).
Container requirements: --network host and --cap-add NET_ADMIN --cap-add NET_RAW --cap-add NET_BROADCAST (docker run takes one capability per --cap-add flag). The capabilities are required for nmap raw socket access: nmap's file capabilities include cap_net_admin, which must be in the bounding set.
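The command dispatch described above can be illustrated with two small helpers. This is a sketch: build_argv and strip_sudo are hypothetical names, the local form ['bash', '-c', cmd] is an assumption, and the sudo stripping only handles a plain leading "sudo " without options.

```python
from typing import Optional


def build_argv(cmd: str, exec_target: Optional[str]) -> list[str]:
    """Build the argv for a command, routing through docker exec when a target is set."""
    if exec_target is None:
        return ["bash", "-c", cmd]  # assumed local form
    return ["docker", "exec", "-i", exec_target, "bash", "-c", cmd]


def strip_sudo(cmd: str) -> str:
    """Drop a leading 'sudo ' since the container already runs as root."""
    stripped = cmd.lstrip()
    if stripped.startswith("sudo "):
        return stripped[len("sudo "):].lstrip()
    return cmd
```

The resulting list can be handed to subprocess.run without shell=True, which keeps the container name and the command string from being re-parsed by a host shell.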

archer-kali Container Topology

Production deployment: ARCHER runs inside archer-kali; Ollama runs on host bare metal.

ARCHER code bind-mounted from host repo (-v /path/to/ARCHER:/opt/archer) — edits on host are live immediately, no rebuild needed.
Container reaches Ollama via --network host → localhost:11434. Requires Ollama to listen on 0.0.0.0 (not 127.0.0.1).
Rebuild image (./docker/run.sh) only when: tool list changes, Python deps change, or base image needs updating.
The --kali flag is for the host-routes-through-container topology. When running ARCHER directly inside the container, use no flag — commands execute locally against the container's Kali tools.

Playbook (`~/.archer_playbook.db`)

Two tables: playbook (task_pattern + domain → winning command, weight, required/optional variable metadata) and session_metrics (one row per session). UNIQUE(task_pattern, domain) — same task can exist per domain. Strict mode: no domain loaded → no playbook ops. Migration is wipe-and-recreate with PRAGMA checks for idempotence.
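The uniqueness constraint and strict mode can be demonstrated with an in-memory SQLite database. The column set below is a guess limited to what the description names; open_playbook and save_play are hypothetical helpers, and the real schema also carries variable metadata and a session_metrics table not shown here.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS playbook (
    id INTEGER PRIMARY KEY,
    task_pattern TEXT NOT NULL,
    domain TEXT NOT NULL,
    winning_command TEXT NOT NULL,
    weight REAL NOT NULL DEFAULT 1.0,
    UNIQUE (task_pattern, domain)
);
"""


def open_playbook(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the illustrative schema if needed."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_play(conn: sqlite3.Connection, task_pattern: str,
              domain: Optional[str], winning_command: str) -> bool:
    """Insert a play; strict mode means no domain, no playbook operation."""
    if domain is None:
        return False
    conn.execute(
        "INSERT INTO playbook (task_pattern, domain, winning_command) VALUES (?, ?, ?)",
        (task_pattern, domain, winning_command),
    )
    return True
```

In this sketch the same task_pattern can be stored once per domain, and a second insert for the same pair raises sqlite3.IntegrityError; how ARCHER itself resolves such a conflict is not specified here.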

IPv4 addresses in both task_pattern and winning_command are abstracted to {ip_address} at save time via _abstract_command(), enabling cross-target reuse without target-specific commands replaying against other hosts.
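A minimal sketch of the abstraction step, assuming a regex-based approach: abstract_command is a stand-in for _abstract_command(), and the pattern below is one reasonable way to match dotted-quad addresses with octets from 0 to 255, not necessarily the one ARCHER uses.

```python
import re

_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
_IPV4 = re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b")


def abstract_command(text: str) -> str:
    """Replace every IPv4 address with the {ip_address} placeholder."""
    return _IPV4.sub("{ip_address}", text)
```

Applying the same function to both the task pattern and the winning command keeps the two in step, so a play saved against one host can be matched and replayed against another.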

Variable Substitution

_extract_required_vars(winning_cmd) finds {VAR} placeholders in saved commands. extract_all_variables(task_query) derives values from the user's current query. Substitution is skipped entirely (not errored) when a winning command has no placeholders — see test_substitution_skip_when_no_vars.py.
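The placeholder handling can be sketched as below. The helper names mirror the description but are hypothetical, the placeholder grammar (identifier characters inside braces) is an assumption, and returning None when a value is missing is an illustrative choice; only the skip-when-no-placeholders behavior is taken from the text.

```python
import re
from typing import Optional

# Assumed grammar: {name} where name is an identifier, so awk-style '{print $1}' is ignored.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def extract_required_vars(winning_cmd: str) -> list[str]:
    """Return placeholder names in first-seen order, without duplicates."""
    names: list[str] = []
    for name in _PLACEHOLDER.findall(winning_cmd):
        if name not in names:
            names.append(name)
    return names


def substitute(winning_cmd: str, values: dict[str, str]) -> Optional[str]:
    """Fill placeholders from values; skip entirely when there are none."""
    required = extract_required_vars(winning_cmd)
    if not required:
        return winning_cmd  # nothing to substitute: skipped, not an error
    if any(name not in values for name in required):
        return None  # assumed behavior when the query lacks a needed value
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], winning_cmd)
```

In use, values would come from something like extract_all_variables(task_query), so a saved "nmap -sV {ip_address}" is replayed against whatever address the current query names.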

Established Patterns

These patterns are load-bearing. Do not change them without explicit agreement — changes affect every play pack and every eval run.

| Pattern | What it does |
| --- | --- |
| `halt_fn` / `hints_fn` / `bonus_fn` dispatcher | Per-skill behavioral hooks; contracts above |
| `hints_fn` `exec_target` injection | Lets packs detect `--kali` mode without importing ARCHER |
| Module-level registries (`TARGET_SIGNATURES`, `TARGET_NOUN_OVERLAPS`, `TARGET_NOUNS_RECOGNIZED`) | Pack target fingerprinting; merged into globals by `PlayRegistry` |
| `PlayRegistry.load_domain()` single-domain enforcement | Raises `RuntimeError` on second call; invariant is load-bearing |
| `SYSTEM_PROMPT_ADDENDUM` (loaded from pack, injected at session start) | Per-domain prompt extension; 150–300 char budget |
| `--do` / `--domain` CLI with shortname mapping | `pentest` → `penetration`; `--sd` for subdomain |
| `[CLARIFY]` token + strict `[OBJECTIVE_ACHIEVED]` parser | `_has_objective_achieved()` parses exact token; no fuzzy match |
| Per-skill timeouts (`command_timeout` / `sudo_command_timeout`) | Set in each category dict; governs `execute_command` |
| Domain-scoped playbook with `current_domain()` | Reads `PLAYS.loaded_domain`; no domain → no playbook ops |
