
Play Pack Architecture

Scope: runtime architecture for the play system. Answers "how does a play pack work?" and "what does the code expect from a pack?" For data flow and cross-file dependencies see ARCHITECTURE.md. For adding a new domain end-to-end see CONTRIBUTING.md.


Agent Loop (run_agent_mode)

run_agent_mode(session_factory, ...) is the unified session loop. session_factory is a callable that takes a system-prompt string and returns a ChatSession-compatible object (LocalChatSession for Ollama, cloud sessions from archer_providers.py).

Each turn, the model emits [THOUGHT] / bash / [FINDINGS] / [OBJECTIVE_ACHIEVED]. ARCHER extracts the bash block, validates it, runs it, pre-processes the output through skill analyzers, and feeds the structured output back to the model. The loop ends on a strict [OBJECTIVE_ACHIEVED] token (parsed by _has_objective_achieved()) or when halt-discipline fires.
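As an illustration of what "strict" means for the end token, here is a minimal sketch. It assumes the token must appear alone on its own line; the real _has_objective_achieved() may apply different rules, and the function name below is hypothetical.

```python
OBJECTIVE_TOKEN = "[OBJECTIVE_ACHIEVED]"


def has_objective_achieved_sketch(model_output: str) -> bool:
    """Return True only when the exact token appears alone on a line.

    Assumption: "strict" means an exact, case-sensitive match with no
    fuzzy variants such as "objective achieved" or "[OBJECTIVE ACHIEVED]".
    """
    return any(line.strip() == OBJECTIVE_TOKEN for line in model_output.splitlines())
```

With this rule, a model that merely mentions the objective in prose does not end the loop; only the literal token does.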


Play System (PlayRegistry)

PlayRegistry.load_domain(name) enforces single-domain-per-session (raises RuntimeError on a second call). It imports plays/{name}.py, merges its SKILL_CATEGORIES, TARGET_SIGNATURES, TARGET_NOUN_OVERLAPS, TARGET_NOUNS_RECOGNIZED into module-level globals, and captures SYSTEM_PROMPT_ADDENDUM. Core skills (in ARCHER.py) win on conflict.

Pack isolation invariant: Pack files must not import from ARCHER. Everything they need is passed in by dispatchers at call time. If a pack needs something from ARCHER, read it from the config dict passed to halt_fn/hints_fn/bonus_fn.
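The single-domain rule and the "core wins on conflict" merge can be sketched as follows. This is an illustrative stand-in, not the real PlayRegistry: the class name, the helper name, and the error message are assumptions, and the actual import of plays/{name}.py is left out.

```python
from typing import Optional


class PlayRegistrySketch:
    """Illustrative stand-in for PlayRegistry (hypothetical, simplified)."""

    def __init__(self) -> None:
        self.loaded_domain: Optional[str] = None

    def load_domain(self, name: str) -> None:
        # Enforce single-domain-per-session: a second call is an error.
        if self.loaded_domain is not None:
            raise RuntimeError(
                f"domain {self.loaded_domain!r} already loaded; cannot load {name!r}"
            )
        # The real loader would import plays/{name}.py and merge its registries here.
        self.loaded_domain = name


def merge_skill_categories(core: dict, pack: dict) -> dict:
    """Merge pack skills into core skills; core entries win on key conflicts."""
    # Core is unpacked last, so its values overwrite pack values for shared keys.
    return {**pack, **core}
```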


Dispatcher Pattern

Three functions per skill category, wired in _register_pack_handlers() at the bottom of each pack:

halt_fn(command_count: int, findings_text: str, config: dict) -> bool
hints_fn(task: str, config: dict, available_tools: dict, target_signatures: dict) -> list[str]
bonus_fn(task_lower: str, has_context, config: dict) -> int

halt_fn

The universal pre-checks in should_halt_objective() run before halt_fn:

Condition | Result
command_count < 1 | always False
error indicators present | always False
command_count < min_commands | always False
command_count >= max_commands | always True (force halt)

halt_fn only needs to encode "is the work substantively done?" — not count-based thresholds.
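The division of labour between the universal pre-checks and a pack's halt_fn might look like the sketch below. It assumes the pre-checks are evaluated in the order listed, that error indicators are detected by substring match, and that a halt_fn can key off completion_indicators; ERROR_INDICATORS and both function names are hypothetical.

```python
from typing import Callable

# Hypothetical error markers; the real list lives in ARCHER and may differ.
ERROR_INDICATORS = ("command not found", "permission denied")


def should_halt_sketch(
    command_count: int,
    findings_text: str,
    config: dict,
    halt_fn: Callable[[int, str, dict], bool],
) -> bool:
    """Apply the count/error pre-checks, then defer to the pack's halt_fn."""
    if command_count < 1:
        return False
    lowered = findings_text.lower()
    if any(marker in lowered for marker in ERROR_INDICATORS):
        return False
    if command_count < config["min_commands"]:
        return False
    if command_count >= config["max_commands"]:
        return True  # force halt
    return halt_fn(command_count, findings_text, config)


def example_halt_fn(command_count: int, findings_text: str, config: dict) -> bool:
    """Pack-side check: is the work substantively done? No count thresholds here."""
    lowered = findings_text.lower()
    return any(ind.lower() in lowered for ind in config.get("completion_indicators", []))
```

Because the count thresholds are handled before halt_fn runs, the pack function stays a pure content check.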

hints_fn

The config argument is a shallow copy of the skill's SKILL_CATEGORIES entry with one injected key:

  • config["exec_target"]: str container name when --kali is active, None otherwise.

Use this to detect kali/container mode without importing ARCHER:

def _sudo_available(config: dict) -> bool:
    return config.get("exec_target") is not None  # container runs as root

get_play_hints also injects a tool-enforcement directive when the task explicitly names a tool via "with <tool>" or "using <tool>" phrasing. It fires only for tools listed in tools_available whose names are longer than two characters (len > 2). This prevents the model from substituting an equivalent tool (e.g. gobuster when the task says "with ffuf").
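A rough sketch of how such a directive could be derived from the task text is shown below. The regular expression, the function name, and the wording of the directive are all assumptions for illustration; only the "with/using <tool>" trigger and the len > 2 filter come from the description above.

```python
import re
from typing import Optional

# Matches "with <word>" or "using <word>"; the pattern itself is an assumption.
_TOOL_PHRASE = re.compile(r"\b(?:with|using)\s+([A-Za-z0-9_.+-]+)", re.IGNORECASE)


def tool_enforcement_directive(task: str, tools_available: list[str]) -> Optional[str]:
    """Return a directive string if the task explicitly names an available tool."""
    # Ignore very short tool names to avoid accidental matches.
    allowed = {tool.lower() for tool in tools_available if len(tool) > 2}
    for match in _TOOL_PHRASE.finditer(task):
        candidate = match.group(1).lower().rstrip(".")
        if candidate in allowed:
            # Hypothetical directive wording.
            return f"Use {candidate} for this task; do not substitute another tool."
    return None
```

For a task such as "fuzz directories with ffuf" and a tool list containing ffuf and gobuster, the sketch returns a directive naming ffuf; for "handle with care" it returns None.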

bonus_fn

Returns an integer score delta applied on top of the keyword scorer result. Use to break ties. Typical values: 5–15 for a strong signal, 10–15 for an explicit tool name match.
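A minimal bonus_fn following this contract might look like the sketch below. It assumes an explicit tool-name match is worth +10 (within the range given above) and treats has_context as opaque; the function name is hypothetical.

```python
import re


def example_bonus_fn(task_lower: str, has_context, config: dict) -> int:
    """Return a score delta: +10 when the task names one of the skill's tools."""
    # Tokenize so that "nmap" matches as a whole word, not as a substring.
    words = set(re.findall(r"[a-z0-9_+-]+", task_lower))
    tools = {tool.lower() for tool in config.get("tools_available", [])}
    # has_context is accepted to match the contract but unused in this sketch.
    return 10 if words & tools else 0
```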


SKILL_CATEGORIES Entry Shape

All fields required. halt_fn/hints_fn/bonus_fn are added by _register_pack_handlers() — do not put them in the dict directly.

"skill_name": {
    "description": str,           # human-readable; used in logs and routing telemetry
    "keywords": list[str],        # increase routing score
    "exclude_keywords": list[str],# decrease routing score (competitors, anti-patterns)
    "min_commands": int,          # halt_fn cannot fire below this count
    "max_commands": int,          # force-halt at this count
    "tools_available": list[str], # told to the model; tool-enforcement directive reads this
    "halt_requires": str,         # "structured_data" | "comprehensive" | "objective_proof"
    "completion_indicators": list[str],  # strings that appear in findings on completion
    "command_timeout": int,       # seconds; normal commands
    "sudo_command_timeout": int,  # seconds; privileged commands
}
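Since every field is required, a pack author may want a quick shape check before wiring handlers. The sketch below is one way to do that; the validator, the example entry, and all of its values are hypothetical and not part of ARCHER.

```python
# Required fields and their expected Python types, taken from the shape above.
REQUIRED_FIELDS = {
    "description": str,
    "keywords": list,
    "exclude_keywords": list,
    "min_commands": int,
    "max_commands": int,
    "tools_available": list,
    "halt_requires": str,
    "completion_indicators": list,
    "command_timeout": int,
    "sudo_command_timeout": int,
}
HOOK_KEYS = ("halt_fn", "hints_fn", "bonus_fn")


def validate_entry(name: str, entry: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the entry is OK."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"{name}: missing {field}")
        elif not isinstance(entry[field], expected):
            problems.append(f"{name}: {field} should be {expected.__name__}")
    for hook in HOOK_KEYS:
        if hook in entry:
            problems.append(f"{name}: {hook} must be added by _register_pack_handlers()")
    return problems


# Hypothetical entry with made-up values, for illustration only.
EXAMPLE_ENTRY = {
    "description": "TCP port scan of a single host",
    "keywords": ["port", "scan", "nmap"],
    "exclude_keywords": ["subdomain"],
    "min_commands": 1,
    "max_commands": 6,
    "tools_available": ["nmap"],
    "halt_requires": "structured_data",
    "completion_indicators": ["open ports"],
    "command_timeout": 120,
    "sudo_command_timeout": 300,
}
```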

Skill Router (detect_skill_category)

Three-tier routing chain:

  1. Classifier (~/.archer_classifier/router_classifier.pkl) — TF-IDF + logistic regression (LR) pipeline loaded at startup. If present and softmax confidence ≥ 0.5 for a skill in the current domain, routes immediately. Bypass with --no-classifier.

  2. Keyword scorer — keyword/exclude scoring + per-category bonus_fn. Fallback when classifier is absent, confidence < 0.5, or predicted skill not in loaded domain.

  3. LLM gate — if keyword score gap ≤ 2 and _ROUTING_MODEL is set, a single non-streaming Ollama call resolves ambiguity. Skips if model is not pre-warmed (avoids cold-start penalty).

All decisions appended to ~/.archer_routing_log.jsonl with classifier_used, llm_used, score_gap fields for training telemetry. Eval harness writes eval_label entries (label_confidence=high) after every run; --ambiguous mode generates additional entries from underspecified task phrasings.
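The three-tier chain can be summarised in code. The sketch below takes the classifier, keyword scorer, and LLM gate as injected callables so it stays self-contained; it assumes the score gap is measured between the top two keyword scores and that the LLM picks between those two candidates. None of these names are ARCHER's own.

```python
from typing import Callable, Optional


def route_sketch(
    task: str,
    domain_skills: set,
    classify: Optional[Callable[[str], tuple]],
    keyword_scores: Callable[[str], dict],
    llm_pick: Optional[Callable[[str, list], str]] = None,
) -> tuple:
    """Return (skill, tier) where tier is 'classifier', 'llm', or 'keyword'."""
    # Tier 1: classifier, only if confident and the skill belongs to the loaded domain.
    if classify is not None:
        skill, confidence = classify(task)
        if confidence >= 0.5 and skill in domain_skills:
            return skill, "classifier"

    # Tier 2: keyword scorer (assumed to return a non-empty {skill: score} dict).
    scores = keyword_scores(task)
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    gap = scores[best] - scores[ranked[1]] if len(ranked) > 1 else float("inf")

    # Tier 3: LLM gate, only when the top two scores are close.
    if gap <= 2 and llm_pick is not None:
        choice = llm_pick(task, ranked[:2])
        if choice in domain_skills:
            return choice, "llm"
    return best, "keyword"
```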


Docker Exec Target (EXEC_TARGET)

Module-level global EXEC_TARGET = None. Set at startup from --kali (hardcodes archer-kali) or --exec-target <container>. When set:

  • execute_command uses ['docker', 'exec', '-i', EXEC_TARGET, 'bash', '-c', cmd] instead of local bash.
  • execute_with_sudo bypasses all password machinery and calls execute_command directly (container is root). Strips sudo prefix before dispatch.
  • _ensure_container_running(container) is called at startup — inspects state, starts if stopped, aborts on failure.
  • --prep-sudo is suppressed with a warning (irrelevant in container-root context).
  • Container requirements: --network host, plus --cap-add NET_ADMIN --cap-add NET_RAW --cap-add NET_BROADCAST (docker takes one capability per --cap-add flag). These are required for nmap raw socket access: nmap's file capabilities include cap_net_admin, which must be in the bounding set.
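The command dispatch described in the list above can be sketched as two small helpers. The docker argv is taken from the text; the local argv and the sudo-stripping regex are assumptions, and the regex only handles a bare leading "sudo" without options.

```python
import re
from typing import Optional


def build_argv(cmd: str, exec_target: Optional[str]) -> list[str]:
    """Build the argv for running cmd locally or inside the exec target container."""
    if exec_target is None:
        # Assumed local form; the text only says "local bash".
        return ["bash", "-c", cmd]
    return ["docker", "exec", "-i", exec_target, "bash", "-c", cmd]


def strip_sudo(cmd: str) -> str:
    """Drop a leading 'sudo ' prefix, since the container already runs as root."""
    return re.sub(r"^\s*sudo\s+", "", cmd)
```

For example, build_argv(strip_sudo("sudo nmap -sS 10.0.0.5"), "archer-kali") yields a docker exec invocation that runs the nmap command without the sudo prefix.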

archer-kali Container Topology

Production deployment: ARCHER runs inside archer-kali; Ollama runs on host bare metal.

  • ARCHER code bind-mounted from host repo (-v /path/to/ARCHER:/opt/archer) — edits on host are live immediately, no rebuild needed.
  • Container reaches Ollama via --network host at localhost:11434. Requires Ollama to listen on 0.0.0.0 (not 127.0.0.1).
  • Rebuild image (./docker/run.sh) only when: tool list changes, Python deps change, or base image needs updating.
  • The --kali flag is for the host-routes-through-container topology. When running ARCHER directly inside the container, use no flag — commands execute locally against the container's Kali tools.

Playbook (~/.archer_playbook.db)

Two tables: playbook (task_pattern + domain → winning command, weight, required/optional variable metadata) and session_metrics (one row per session). UNIQUE(task_pattern, domain) — same task can exist per domain. Strict mode: no domain loaded → no playbook ops. Migration is wipe-and-recreate with PRAGMA checks for idempotence.
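To make the uniqueness rule concrete, here is a small sqlite3 sketch of the playbook table. Only task_pattern, domain, the winning command, weight, and the UNIQUE constraint come from the description above; the exact column names, types, and the helper function are assumptions, and the variable-metadata columns and session_metrics table are omitted.

```python
import sqlite3

# Assumed schema; the real table has additional columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS playbook (
    id INTEGER PRIMARY KEY,
    task_pattern TEXT NOT NULL,
    domain TEXT NOT NULL,
    winning_command TEXT NOT NULL,
    weight REAL NOT NULL DEFAULT 1.0,
    UNIQUE(task_pattern, domain)
);
"""


def open_playbook(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a playbook database; safe to call repeatedly."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

With this constraint, the same task_pattern can be stored once per domain, while a second insert for the same (task_pattern, domain) pair raises sqlite3.IntegrityError.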

IPv4 addresses in both task_pattern and winning_command are abstracted to {ip_address} at save time via _abstract_command(), enabling cross-target reuse without target-specific commands replaying against other hosts.
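The abstraction step can be illustrated with a regular expression that replaces dotted-quad addresses. This is a sketch under the assumption that a plain regex substitution is used; the real _abstract_command() may differ, and the function name below is hypothetical.

```python
import re

# One octet: 0-255. Four octets separated by dots, bounded by word boundaries.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
_IPV4 = re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b")


def abstract_command_sketch(text: str) -> str:
    """Replace every IPv4 address in text with the {ip_address} placeholder."""
    return _IPV4.sub("{ip_address}", text)
```

For example, "nmap -sV 10.0.0.5" becomes "nmap -sV {ip_address}", so the saved command no longer carries the original target.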


Variable Substitution

_extract_required_vars(winning_cmd) finds {VAR} placeholders in saved commands. extract_all_variables(task_query) derives values from the user's current query. Substitution is skipped entirely (not errored) when a winning command has no placeholders — see test_substitution_skip_when_no_vars.py.
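The skip-when-no-placeholders behaviour can be sketched as below. The placeholder pattern, the function names, and the choice to return None when a required value is missing are assumptions; shell constructs such as ${HOME} are not handled by this simplified pattern.

```python
import re
from typing import Optional

# {VAR}-style placeholders; brace expansions like {a,b} do not match.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def extract_required_vars_sketch(winning_cmd: str) -> list[str]:
    """Return the sorted, de-duplicated placeholder names in a saved command."""
    return sorted(set(_PLACEHOLDER.findall(winning_cmd)))


def substitute_sketch(winning_cmd: str, values: dict) -> Optional[str]:
    """Fill placeholders from values; skip substitution if there are none."""
    required = extract_required_vars_sketch(winning_cmd)
    if not required:
        return winning_cmd  # nothing to substitute: skipped, not an error
    if any(name not in values for name in required):
        return None  # assumed behaviour when a required value is missing
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], winning_cmd)
```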


Established Patterns

These patterns are load-bearing. Do not change them without explicit agreement — changes affect every play pack and every eval run.

Pattern | What it does
halt_fn / hints_fn / bonus_fn dispatcher | Per-skill behavioral hooks; contracts above
hints_fn exec_target injection | Lets packs detect --kali mode without importing ARCHER
Module-level registries (TARGET_SIGNATURES, TARGET_NOUN_OVERLAPS, TARGET_NOUNS_RECOGNIZED) | Pack target fingerprinting; merged into globals by PlayRegistry
PlayRegistry.load_domain() single-domain enforcement | Raises RuntimeError on second call; invariant is load-bearing
SYSTEM_PROMPT_ADDENDUM (loaded from pack, injected at session start) | Per-domain prompt extension; 150–300 char budget
--do / --domain CLI with shortname mapping | pentest → penetration; --sd for subdomain
[CLARIFY] token + strict [OBJECTIVE_ACHIEVED] parser | _has_objective_achieved() parses exact token; no fuzzy match
Per-skill timeouts (command_timeout / sudo_command_timeout) | Set in each category dict; governs execute_command
Domain-scoped playbook with current_domain() | Reads PLAYS.loaded_domain; no domain → no playbook ops