Deployment Topology
Production Architecture
┌─────────────────────────────────────┐
│  Host Machine                       │
│                                     │
│  ┌──────────────┐  ┌─────────────┐  │
│  │   Ollama     │  │  ARCHER.py  │  │  ← bind-mounted from host repo
│  │  qwen3:14b   │  │  (agent)    │  │
│  │  port 11434  │  └──────┬──────┘  │
│  └──────┬───────┘         │         │
│         │          ┌──────▼──────┐  │
│         │          │ archer-kali │  │
│         │          │  (Docker)   │  │
│         └──────────┤ Kali tools  │  │
│    localhost:11434 │ --net host  │  │
│                    └─────────────┘  │
└─────────────────────────────────────┘
Ollama runs on the host bare metal with direct GPU driver access. It is not containerized.
archer-kali is a Docker container running Kali Linux headless with the full penetration testing toolset. ARCHER executes commands inside the container via docker exec, giving it access to all Kali tools without requiring them on the host.
ARCHER source code is bind-mounted from the host repository into the container at /opt/archer. Edits on the host are live in the container immediately - no rebuild required for code changes.
Why This Topology
GPU access. Containerizing Ollama adds complexity and potential overhead for GPU passthrough. Host Ollama has direct driver access and a working configuration. The container reaches it via --network host → localhost:11434.
Tool isolation. Running the penetration testing toolset inside Kali (not the host) keeps the host clean and ensures ARCHER always has access to a consistent, known-good Kali environment regardless of host configuration.
Fast development iteration. Bind-mounting the source means a code change on the host is testable in the container immediately, without a rebuild cycle that would otherwise take 20-40 minutes.
Container Requirements
The container must be started with:
docker run \
  --network host \
  --cap-add NET_ADMIN \
  --cap-add NET_RAW \
  --cap-add NET_BROADCAST \
  -v /path/to/ARCHER:/opt/archer \
  archer-kali
NET_ADMIN, NET_RAW, and NET_BROADCAST are required for nmap's raw socket access. Without them, nmap falls back to slower, less reliable scan modes.
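One way to confirm that a running container meets these requirements is to check its `docker inspect` output. The sketch below assumes that output has already been parsed from JSON into a dict. The helper name `missing_requirements` is hypothetical and not part of ARCHER.

```python
# Capabilities the container must be started with, per the docker run command above.
REQUIRED_CAPS = {"NET_ADMIN", "NET_RAW", "NET_BROADCAST"}


def missing_requirements(container: dict, mount_target: str = "/opt/archer") -> list:
    """Return human-readable problems for one parsed `docker inspect` entry."""
    problems = []
    host_config = container.get("HostConfig", {})

    if host_config.get("NetworkMode") != "host":
        problems.append("container is not using --network host")

    # CapAdd can be null, and names may or may not carry a CAP_ prefix
    # depending on how they were passed, so normalize before comparing.
    granted = set()
    for cap in host_config.get("CapAdd") or []:
        granted.add(cap[4:] if cap.startswith("CAP_") else cap)
    for cap in sorted(REQUIRED_CAPS - granted):
        problems.append(f"missing --cap-add {cap}")

    destinations = {m.get("Destination") for m in container.get("Mounts", [])}
    if mount_target not in destinations:
        problems.append(f"nothing mounted at {mount_target}")

    return problems
```

`docker inspect archer-kali` prints a JSON array, so the first element of the parsed result is the dict to pass in. An empty return list means all three requirements are met.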
Running ARCHER
Inside the container (production path):
docker exec -it archer-kali python3 /opt/archer/ARCHER.py \
  -a -y -local qwen3:14b --do pentest "your task"
From the host, routing through the container (--kali flag):
The --kali flag tells ARCHER to route all command execution through docker exec archer-kali bash -c <cmd>. In this mode, ARCHER runs on the host but all commands execute inside the container.
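The routing described above can be sketched as a thin wrapper around `docker exec`. This is an illustration of the pattern, not ARCHER's actual implementation. The function names `kali_argv` and `run_in_kali` and the 300-second default timeout are assumptions.

```python
import subprocess


def kali_argv(cmd: str, container: str = "archer-kali") -> list:
    """Build the argv that runs `cmd` through bash inside the container."""
    # Passing a list (no shell=True) means the host shell never re-parses cmd;
    # only the bash process inside the container interprets it.
    return ["docker", "exec", container, "bash", "-c", cmd]


def run_in_kali(cmd: str, timeout: float = 300.0) -> subprocess.CompletedProcess:
    """Run `cmd` in the container and capture stdout/stderr as text."""
    return subprocess.run(
        kali_argv(cmd), capture_output=True, text=True, timeout=timeout
    )
```

Keeping the argv construction separate from execution makes the wrapper easy to unit test without a running container.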
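Ollama Requirements
Set OLLAMA_HOST=0.0.0.0 in the Ollama service configuration so that Ollama listens on all interfaces rather than only 127.0.0.1. On a Linux host, a container started with --network host shares the host's network namespace, so localhost:11434 reaches Ollama even with the default binding; the 0.0.0.0 setting matters when the container is not on the host network, for example under bridge networking.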
Rebuild Triggers
Rebuild the Docker image (./docker/run.sh) when:
- The tool list in docker/Dockerfile changes
- Python dependencies in requirements.txt change
- The base Kali image needs updating
Do not rebuild for source code changes - the bind mount handles those immediately.
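The file-based triggers above can be checked mechanically. The sketch below is a hypothetical helper, not part of the repository. It assumes paths are given relative to the repo root and that requirements.txt lives at the root. A base image update cannot be detected from changed paths and still needs a manual decision.

```python
from pathlib import PurePosixPath

# Files whose changes alter the image contents, per the list above.
REBUILD_TRIGGERS = {"docker/Dockerfile", "requirements.txt"}


def needs_rebuild(changed_paths) -> bool:
    """Return True if any changed path (relative to the repo root) is a trigger."""
    # PurePosixPath drops a leading "./" so "./docker/Dockerfile" still matches.
    normalized = {PurePosixPath(p).as_posix() for p in changed_paths}
    return bool(normalized & REBUILD_TRIGGERS)
```

The output of `git diff --name-only`, split into lines, is a natural input for `changed_paths`.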
RunPod Topology
ARCHER's local topology requires an NVIDIA GPU with sufficient VRAM to run qwen3:14b (~8 GB). RunPod makes this accessible without dedicated hardware by running Ollama on a GPU pod while ARCHER and the Kali toolset run locally (or on a CPU pod).
┌─────────────────────────────────────┐   ┌──────────────────────────────────┐
│  Local Machine / CPU Pod            │   │  RunPod GPU Pod                  │
│                                     │   │                                  │
│  ┌──────────────┐  ┌─────────────┐  │   │   ┌──────────────────────────┐   │
│  │  ARCHER.py   │  │ archer-kali │  │   │   │  Ollama                  │   │
│  │  (agent)     │  │  (Docker)   │  │   │   │  qwen3:14b               │   │
│  │              │  │ Kali tools  │  │   │   │  port 11434 exposed      │   │
│  └──────┬───────┘  └──────┬──────┘  │   │   └──────────────────────────┘   │
│         └────────────┬────┘         │   │                                  │
│                      │              │   └──────────────────┬───────────────┘
│                OLLAMA_HOST=         │                      │
│ https://<pod>-11434.proxy.runpod.net ◄────────────  HTTP proxy
└─────────────────────────────────────┘
What runs where:
| Component | Location | Notes |
|---|---|---|
| Ollama + model | RunPod GPU pod | RTX 4090 / A40 recommended; 3090 minimum for Q4 qwen3:14b |
| ARCHER.py (agent) | Local or CPU pod | CPU-only; no GPU needed |
| archer-kali (Docker) | Local or CPU pod | Kali tools; same --kali flag as local topology |
| Eval targets | VPS in same region, or local VMs | See eval topology options below |
Connecting ARCHER to a RunPod Ollama Instance
The ollama Python library reads OLLAMA_HOST from the environment, so no ARCHER code changes are needed. Set the variable before running ARCHER:
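A minimal Python sketch of this step, assuming a launcher script that starts ARCHER as a subprocess. The helper names are hypothetical, and the URL follows the RunPod proxy form given under RunPod Pod Setup.

```python
import os


def runpod_proxy_url(pod_id: str, port: int = 11434) -> str:
    """Build the RunPod HTTP proxy URL for an exposed pod port."""
    return f"https://{pod_id}-{port}.proxy.runpod.net"


def archer_env(pod_id: str) -> dict:
    """Copy the current environment and point OLLAMA_HOST at the pod."""
    env = dict(os.environ)
    env["OLLAMA_HOST"] = runpod_proxy_url(pod_id)
    return env
```

Passing `env=archer_env("<pod-id>")` to `subprocess.run` when launching ARCHER scopes the setting to that one process and leaves the parent shell's environment unchanged.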
Verify the connection before running evals:
Expected output: a list of pulled models including qwen3:14b.
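The same check can be sketched in Python against Ollama's GET /api/tags endpoint, which lists the models available on the server. The function names below are illustrative, and the 10-second timeout is an arbitrary assumption.

```python
import json
import urllib.request


def model_names(tags: dict) -> list:
    """Extract model names from a parsed /api/tags response."""
    return [m["name"] for m in tags.get("models", [])]


def has_model(tags: dict, wanted: str = "qwen3:14b") -> bool:
    """Return True if the wanted model appears in the /api/tags response."""
    return wanted in model_names(tags)


def fetch_tags(base_url: str, timeout: float = 10.0) -> dict:
    """Fetch and parse /api/tags from an Ollama server (performs a network call)."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `has_model(fetch_tags(os.environ["OLLAMA_HOST"]))` before an eval run fails fast if the pod is unreachable or the model has not been pulled.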
RunPod Pod Setup
- Create a GPU pod. Recommended specs:
| Use case | GPU | VRAM | Approx. cost |
|---|---|---|---|
| Inference only | RTX 3090 | 24 GB | ~$0.34/hr |
| Inference + headroom | RTX 4090 | 24 GB | ~$0.74/hr |
| High throughput | A40 | 48 GB | ~$0.76/hr |
Prices are approximate RunPod community cloud rates. Secure cloud rates are higher.
- Expose port 11434. In the pod configuration, add 11434 to the "Expose HTTP Ports" list. RunPod creates a proxy URL in the form https://<pod-id>-11434.proxy.runpod.net.
- Install Ollama and pull the model (one-time, or use a template pod):
# Inside the pod terminal
curl -fsSL https://ollama.com/install.sh | sh
# Ollama must listen on 0.0.0.0 for the proxy to reach it
OLLAMA_HOST=0.0.0.0:11434 ollama serve &
sleep 5
# Pull the model (~9 GB — takes 3-10 min depending on pod network)
ollama pull qwen3:14b
- Test from local:
Eval Target Options for Cloud Deployment
Running evals remotely requires accessible targets. Two viable options:
Option A — VPS in the same region (recommended):
Deploy Metasploitable2 on a VPS (e.g. DigitalOcean, Linode, Vultr) in the same datacenter region as the RunPod GPU pod. ARCHER connects to the VPS IP directly.
Considerations:
- ~$6-12/month for a 1-2 vCPU VPS
- Target IP is a public IP — update TARGET in eval_harness.py or pass --target <ip>
- Firewall the VPS to accept inbound connections only from ARCHER's egress IP
Option B — RunPod network-local VMs (advanced):
RunPod supports multi-pod private networks. Spin up a second pod running Metasploitable2 (Docker image available) on the same private network. ARCHER on the GPU pod reaches the target pod via private IP.
Considerations:
- More complex to configure; requires a RunPod network volume or pod-to-pod networking
- Keeps all traffic off the public internet
- Useful for eval runs that require multiple targets simultaneously
Note on eval objectives with localhost targets: Several objectives (PT-WEBEX-02/Juice Shop at
localhost:3000, PT-WEBEX-03/DVWA) assume the target runs on the same machine as ARCHER. These
require the target to be co-located with ARCHER's execution environment, or the task string
updated to use the VPS/pod IP. Tracked in #467 for Coder assessment.
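For the second approach, rewriting the task string, a small helper could substitute the remote address for localhost references. This is a hypothetical sketch. It is not how eval_harness.py handles targets, and the address in the test is a documentation-range placeholder.

```python
import re

# Matches "localhost" or "127.0.0.1" as whole tokens, so an address such as
# 127.0.0.10 is left untouched.
_LOCAL = re.compile(r"\b(?:localhost|127\.0\.0\.1)\b")


def retarget(task: str, host: str) -> str:
    """Replace localhost references in a task string with a remote host."""
    # A function replacement avoids any backslash interpretation of `host`.
    return _LOCAL.sub(lambda _match: host, task)
```

Ports and paths after the host are preserved, so a reference to localhost:3000 becomes the remote host on port 3000.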
Cost Estimate
Rough estimate for running eval_harness.py --runs 3 against the full objective set (~50
objectives, RTX 3090 at $0.34/hr):
| Run type | Sessions | Est. GPU time | Approx. cost |
|---|---|---|---|
| Smoke set (5 objectives) | 15 | ~15 min | ~$0.09 |
| Full sweep (50 objectives × 3) | 150 | ~2.5 hr | ~$0.85 |
| Data collection run (overnight) | 300+ | ~6 hr | ~$2.00 |
GPU time is dominated by inference per session (system prompt + 4-8 model turns). Command execution time (nmap scans, msfconsole) is CPU-bound and does not consume GPU billing.
These are estimates. Actual cost depends on model inference speed for the selected GPU and the number of commands per session. Monitor with RunPod's billing dashboard.
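The arithmetic behind the table can be sketched as sessions × minutes per session × hourly rate. The default of one GPU minute per session is inferred from the first two rows (15 sessions in about 15 minutes, 150 in about 2.5 hours). The overnight row implies roughly 1.2 minutes per session. Both figures are assumptions to replace with measured values.

```python
def estimate_cost(
    sessions: int, rate_per_hour: float, minutes_per_session: float = 1.0
) -> float:
    """Estimate GPU cost in dollars for a number of eval sessions."""
    gpu_hours = sessions * minutes_per_session / 60.0
    return gpu_hours * rate_per_hour
```

At the RTX 3090 rate of $0.34/hr, `estimate_cost(150, 0.34)` reproduces the full-sweep figure of about $0.85.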