Deployment Topology

Production Architecture

+-------------------------------------+
|           Host Machine              |
|                                     |
|  +--------------+  +-------------+  |
|  |    Ollama    |  |  ARCHER.py  |  |  <- bind-mounted from host repo
|  |  qwen3:14b   |  |  (agent)    |  |
|  |  port 11434  |  +------+------+  |
|  +------+-------+         |         |
|         |                 v         |
|         |          +-------------+  |
|         |          | archer-kali |  |
|         |          |  (Docker)   |  |
|         +--------->| Kali tools  |  |
|   localhost:11434  | --net host  |  |
|                    +-------------+  |
+-------------------------------------+

Ollama runs directly on the host (bare metal), with direct access to the GPU driver. It is not containerized.

archer-kali is a Docker container running Kali Linux headless with the full penetration testing toolset. ARCHER executes commands inside the container via docker exec, giving it access to all Kali tools without requiring them on the host.

ARCHER source code is bind-mounted from the host repository into the container at /opt/archer. Edits on the host are live in the container immediately - no rebuild required for code changes.

Why This Topology

GPU access. Containerizing Ollama adds complexity and potential overhead for GPU passthrough. Ollama on the host has direct driver access and an already working configuration. Because the container runs with --network host, it reaches Ollama at localhost:11434.

Tool isolation. Running the penetration testing toolset inside Kali (not the host) keeps the host clean and ensures ARCHER always has access to a consistent, known-good Kali environment regardless of host configuration.

Fast development iteration. Bind-mounting the source means a code change on the host is testable in the container immediately, without a rebuild cycle that would otherwise take 20-40 minutes.

Container Requirements

The container must be started with:

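# --name gives the container the fixed name that the docker exec commands below rely on
docker run \
  --name archer-kali \
  --network host \
  --cap-add NET_ADMIN \
  --cap-add NET_RAW \
  --cap-add NET_BROADCAST \
  -v /path/to/ARCHER:/opt/archer \
  archer-kali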

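NET_ADMIN, NET_RAW, and NET_BROADCAST give nmap raw socket access and low-level control of the network interfaces. Without them, nmap cannot send raw packets and falls back to slower, less reliable scan modes (for example, TCP connect scans instead of SYN scans).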
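One way to confirm that the container actually received these capabilities is to decode the effective capability mask that Linux exposes in /proc/self/status. The following is a minimal sketch, not part of ARCHER: the helper names `missing_caps` and `read_cap_eff` are hypothetical, and the bit positions are the standard values from linux/capability.h.

```python
# Capability bit positions as defined in linux/capability.h.
CAP_BITS = {"NET_BROADCAST": 11, "NET_ADMIN": 12, "NET_RAW": 13}


def missing_caps(cap_eff_hex: str) -> list[str]:
    """Return the required capabilities that are absent from a CapEff hex mask."""
    mask = int(cap_eff_hex, 16)
    return sorted(name for name, bit in CAP_BITS.items() if not mask & (1 << bit))


def read_cap_eff(status_path: str = "/proc/self/status") -> str:
    """Read the effective capability mask of the current process (Linux only)."""
    with open(status_path) as fh:
        for line in fh:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff not found in " + status_path)


if __name__ == "__main__":
    # Sample mask with NET_RAW set but NET_ADMIN and NET_BROADCAST cleared.
    print(missing_caps("00000000a80425fb"))
```

Inside the running container, `missing_caps(read_cap_eff())` should return an empty list if all three --cap-add flags took effect.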

Running ARCHER

Inside the container (production path):

docker exec -it archer-kali python3 /opt/archer/ARCHER.py \
  -a -y -local qwen3:14b --do pentest "your task"

From the host, routing through the container (--kali flag):

python3 ARCHER.py -a -y -local qwen3:14b --kali --do pentest "your task"

The --kali flag tells ARCHER to route all command execution through docker exec archer-kali bash -c <cmd>. In this mode, ARCHER runs on the host but all commands execute inside the container.
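The routing described above can be pictured as a thin wrapper around docker exec. This is only an illustrative sketch under stated assumptions: `build_kali_argv` and `run_in_kali` are hypothetical names and are not taken from ARCHER's source; only the container name and the `bash -c <cmd>` form come from the text.

```python
import subprocess

CONTAINER = "archer-kali"  # container name used throughout this document


def build_kali_argv(cmd: str, container: str = CONTAINER) -> list[str]:
    """Return the argv that runs `cmd` inside the container via bash -c.

    Passing a list (not a shell string) to subprocess means the host shell
    never re-parses `cmd`; only bash inside the container interprets it.
    """
    return ["docker", "exec", container, "bash", "-c", cmd]


def run_in_kali(cmd: str, timeout: float = 300.0) -> subprocess.CompletedProcess:
    """Hypothetical helper: execute `cmd` in the container and capture its output."""
    return subprocess.run(
        build_kali_argv(cmd),
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    print(build_kali_argv("nmap -sV 10.0.0.5"))
```

Keeping the argv construction separate from execution makes the routing easy to unit-test without a running container.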

Ollama Requirements

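Configure Ollama to listen on 0.0.0.0 rather than 127.0.0.1 by setting OLLAMA_HOST=0.0.0.0 in the Ollama service configuration. On a Linux host, --network host shares the host's loopback interface with the container, so the default 127.0.0.1 bind is normally reachable as well; the 0.0.0.0 bind matters wherever the container does not share the host's network namespace (for example, Docker Desktop). Binding to 0.0.0.0 exposes the Ollama API on every interface, so firewall port 11434 accordingly.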

Rebuild Triggers

Rebuild the Docker image (./docker/run.sh) when:

- The tool list in docker/Dockerfile changes
- Python dependencies in requirements.txt change
- The base Kali image needs updating

Do not rebuild for source code changes - the bind mount handles those immediately.
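The file-based rebuild triggers can be expressed as a small check over a list of changed paths. This is a sketch only: `needs_rebuild` is a hypothetical helper, and it cannot detect the third trigger (an outdated base Kali image), which is not a change in the repository.

```python
from pathlib import PurePosixPath

# Files that are baked into the image at build time, per the rebuild triggers.
REBUILD_PATHS = {"docker/Dockerfile", "requirements.txt"}


def needs_rebuild(changed_paths: list[str]) -> bool:
    """Return True if any changed path requires rebuilding the Docker image."""
    # PurePosixPath collapses leading "./" so "./requirements.txt" also matches.
    normalized = {str(PurePosixPath(p)) for p in changed_paths}
    return bool(normalized & REBUILD_PATHS)


if __name__ == "__main__":
    print(needs_rebuild(["ARCHER.py", "docker/Dockerfile"]))
```

The list of changed paths could come from something like `git diff --name-only`; that wiring is left out here.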

RunPod Topology

ARCHER's local topology requires an NVIDIA GPU with enough VRAM to run qwen3:14b (roughly 9 GB for the Q4 weights, plus context overhead). RunPod removes the need for dedicated hardware: Ollama runs on a GPU pod while ARCHER and the Kali toolset run locally (or on a CPU pod).

+-------------------------------------+     +----------------------------------+
|       Local Machine / CPU Pod       |     |        RunPod GPU Pod            |
|                                     |     |                                  |
|  +--------------+  +-------------+  |     |  +------------------------+      |
|  |  ARCHER.py   |  | archer-kali |  |     |  |        Ollama          |      |
|  |   (agent)    |  |  (Docker)   |  |     |  |      qwen3:14b         |      |
|  |              |  |  Kali tools |  |     |  |  port 11434 exposed    |      |
|  +------+-------+  +------+------+  |     |  +------------------------+      |
|         +----------+                |     |                                  |
|                    |                |     +----------------+-----------------+
|         OLLAMA_HOST=                |                      |
|   https://<pod>-11434.proxy.runpod.net <-------------------+  HTTP proxy
+-------------------------------------+

What runs where:

Component	Location	Notes
Ollama + model	RunPod GPU pod	RTX 4090 / A40 recommended; 3090 minimum for Q4 qwen3:14b
ARCHER.py (agent)	Local or CPU pod	CPU-only; no GPU needed
archer-kali (Docker)	Local or CPU pod	Kali tools; same `--kali` flag as local topology
Eval targets	VPS in same region, or local VMs	See eval topology options below

Connecting ARCHER to a RunPod Ollama Instance

The ollama Python library reads OLLAMA_HOST from the environment. No ARCHER code changes are needed — set the variable before running ARCHER:

export OLLAMA_HOST=https://<your-pod-id>-11434.proxy.runpod.net

Verify the connection before running evals:

curl -s "$OLLAMA_HOST/api/tags" | python3 -m json.tool | grep name

Expected output: a list of pulled models including qwen3:14b.
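The same check can be scripted so that an eval run fails fast when the model is missing. The sketch below assumes the /api/tags response is a JSON object with a `models` list whose entries carry a `name` field; the function names are hypothetical and not part of ARCHER.

```python
import json
import urllib.request


def parse_model_names(payload: str) -> list[str]:
    """Extract model names from an /api/tags JSON response body."""
    data = json.loads(payload)
    return [m["name"] for m in data.get("models", [])]


def list_remote_models(base_url: str, timeout: float = 10.0) -> list[str]:
    """Fetch /api/tags from the Ollama endpoint and return the model names."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_names(resp.read().decode("utf-8"))


def has_model(names: list[str], wanted: str = "qwen3:14b") -> bool:
    """Return True if the wanted model tag is among the pulled models."""
    return wanted in names
```

A caller would typically pass `os.environ["OLLAMA_HOST"]` as `base_url` and abort when `has_model(...)` is false.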

RunPod Pod Setup

Create a GPU pod. Recommended specs:

Use case	GPU	VRAM	Approx. cost
Inference only	RTX 3090	24 GB	~$0.34/hr
Inference + headroom	RTX 4090	24 GB	~$0.74/hr
High throughput	A40	48 GB	~$0.76/hr

Prices are approximate RunPod community cloud rates. Secure cloud rates are higher.

Expose port 11434. In the pod configuration, add 11434 to the "Expose HTTP Ports" list. RunPod creates a proxy URL in the form https://<pod-id>-11434.proxy.runpod.net.

Install Ollama and pull the model (one-time, or use a template pod):

# Inside the pod terminal
curl -fsSL https://ollama.com/install.sh | sh

# Ollama must listen on 0.0.0.0 for the proxy to reach it
OLLAMA_HOST=0.0.0.0:11434 ollama serve &
sleep 5

# Pull the model (~9 GB — takes 3-10 min depending on pod network)
ollama pull qwen3:14b

Test from local:

export OLLAMA_HOST=https://<pod-id>-11434.proxy.runpod.net
curl -s "$OLLAMA_HOST/api/tags"
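The fixed `sleep 5` in the pod setup commands is a guess at how long Ollama takes to start. A polling loop is a more robust alternative; the sketch below is one possible shape, with hypothetical helper names, and it uses /api/tags as the readiness probe on the assumption that a 200 response means the server is up.

```python
import time
import urllib.request
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused or timed out: server not up yet
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def ollama_probe(base_url: str) -> Callable[[], bool]:
    """Build a probe that succeeds when /api/tags answers with HTTP 200."""
    url = base_url.rstrip("/") + "/api/tags"

    def probe() -> bool:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200

    return probe
```

For example, `wait_until_ready(ollama_probe("http://localhost:11434"))` could run inside the pod before `ollama pull`, or against the proxy URL from the local machine.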

Eval Target Options for Cloud Deployment

Running evals remotely requires accessible targets. Two viable options:

Option A — VPS in the same region (recommended):

Deploy Metasploitable2 on a VPS (e.g. DigitalOcean, Linode, Vultr) in the same datacenter region as the RunPod GPU pod. ARCHER connects to the VPS IP directly.

Considerations:

- ~$6-12/month for a 1-2 vCPU VPS
- The target IP is public: update TARGET in eval_harness.py or pass --target <ip>
- Firewall the VPS to accept inbound connections only from ARCHER's egress IP

Option B — RunPod network-local VMs (advanced):

RunPod supports multi-pod private networks. Spin up a second pod running Metasploitable2 (Docker image available) on the same private network. ARCHER on the GPU pod reaches the target pod via private IP.

Considerations:

- More complex to configure; requires a RunPod network volume or pod-to-pod networking
- Keeps all traffic off the public internet
- Useful for eval runs that require multiple targets simultaneously

Note on eval objectives with localhost targets: Several objectives (PT-WEBEX-02/Juice Shop at localhost:3000, PT-WEBEX-03/DVWA) assume the target runs on the same machine as ARCHER. Either co-locate these targets with ARCHER's execution environment, or update the task string to use the VPS/pod IP. Tracked in #467 for Coder assessment.
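Updating a task string to point at a remote target could be as simple as a host substitution. The sketch below is purely illustrative: `retarget_task` is a hypothetical helper, the task text is made up, and 203.0.113.10 is a documentation-range address standing in for a real VPS or pod IP.

```python
import re

# Matches "localhost" or "127.0.0.1" as whole tokens; any ":port" suffix is kept.
_LOCAL_HOST = re.compile(r"\b(?:localhost|127\.0\.0\.1)\b")


def retarget_task(task: str, target_host: str) -> str:
    """Replace loopback host references in a task string with `target_host`."""
    # A callable replacement avoids backslash-escape handling in target_host.
    return _LOCAL_HOST.sub(lambda _match: target_host, task)


if __name__ == "__main__":
    print(retarget_task("Assess Juice Shop at localhost:3000", "203.0.113.10"))
```

This only rewrites the prompt text; objectives whose success checks also assume a local target would need separate changes.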

Cost Estimate

Rough estimate for running eval_harness.py --runs 3 against the full objective set (~50 objectives, RTX 3090 at $0.34/hr):

Run type	Sessions	Est. GPU time	Approx. cost
Smoke set (5 objectives)	15	~15 min	~$0.09
Full sweep (50 objectives × 3)	150	~2.5 hr	~$0.85
Data collection run (overnight)	300+	~6 hr	~$2.00

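GPU time is dominated by inference per session (system prompt + 4-8 model turns). Command execution time (nmap scans, msfconsole) is CPU-bound and adds no inference load, but a RunPod pod is billed for as long as it is running, so long-running commands still extend the billable time unless the pod is stopped between runs.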

These are estimates. Actual cost depends on model inference speed for the selected GPU and the number of commands per session. Monitor with RunPod's billing dashboard.
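The table's figures follow from simple arithmetic: GPU hours multiplied by the hourly rate. The sketch below reproduces that calculation; the figure of roughly one GPU-minute per session is an inference from the table (15 sessions in about 15 minutes, 150 in about 2.5 hours), not a measured value, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(
    sessions: int,
    minutes_per_session: float,
    rate_per_hour: float,
) -> tuple[float, float]:
    """Return (gpu_hours, cost_in_dollars) for a batch of eval sessions."""
    gpu_hours = sessions * minutes_per_session / 60.0
    return gpu_hours, gpu_hours * rate_per_hour


if __name__ == "__main__":
    # Full sweep: 50 objectives x 3 runs on an RTX 3090 at $0.34/hr.
    hours, cost = estimate_cost(sessions=150, minutes_per_session=1.0, rate_per_hour=0.34)
    print(f"{hours:.2f} GPU hours, about ${cost:.2f}")
```

Substituting a measured per-session time and the rate of the chosen GPU gives a quick estimate for other configurations.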