Investigative Provenance

Investigative provenance is the property that makes a security finding actionable in a professional context: every claim can be traced back to specific evidence, every step in the investigation is documented, and the entire record is reproducible by an independent analyst.

Most AI security tools produce output without provenance. The finding exists; the evidence behind it does not. ARCHER is built so provenance is the default output, not an optional export.

What Provenance Looks Like in ARCHER

Source Linkage

Every finding in an ARCHER session links to the specific command that produced it and the specific output that contained the evidence. For example, the session log records:

[command]  nmap -sV 192.168.56.103 -p 21
[output]   21/tcp open ftp vsftpd 2.3.4
[finding]  CRITICAL: vsftpd 2.3.4 detected - CVE-2011-2523 backdoor available
[evidence] output line: "21/tcp open ftp vsftpd 2.3.4"

The finding does not exist without the evidence. The evidence does not exist without the command. The command does not exist without the timestamp and the session context.
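One way to enforce "no finding without evidence" is to make the evidence check part of constructing a finding. In the minimal sketch below, a finding cannot be created unless its cited evidence line appears verbatim in the output of the command it references. `CommandRecord`, `Finding`, and their fields are hypothetical names, not ARCHER's actual schema. The sample values reuse the abbreviated log excerpt above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CommandRecord:
    """One executed command with its raw output and session context (hypothetical schema)."""
    session_id: str
    timestamp: str
    command: str
    output: str


@dataclass(frozen=True)
class Finding:
    """A finding that must cite a line from its source command's output."""
    severity: str
    summary: str
    evidence_line: str
    source: CommandRecord

    def __post_init__(self) -> None:
        # Refuse to construct a finding whose cited evidence does not
        # appear verbatim in the output of the command it claims to cite.
        if self.evidence_line not in self.source.output.splitlines():
            raise ValueError("evidence line not found in source command output")


cmd = CommandRecord(
    session_id="example-session",  # hypothetical identifier
    timestamp=datetime.now(timezone.utc).isoformat(),
    command="nmap -sV 192.168.56.103 -p 21",
    output="21/tcp open ftp vsftpd 2.3.4",
)
finding = Finding(
    severity="CRITICAL",
    summary="vsftpd 2.3.4 detected - CVE-2011-2523 backdoor available",
    evidence_line="21/tcp open ftp vsftpd 2.3.4",
    source=cmd,
)
```

Because the record is frozen and the check runs at construction time, there is no code path that produces a `Finding` detached from its command and output.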

Chronological Record

The session log is written in the order events occurred. An analyst reviewing the log can follow the exact investigative path: what was checked first, what it revealed, what was checked next, and why. This chronological record is the basis for both independent reproduction and regulatory reporting.
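A reviewer can recover the investigative path directly from a chronological log. The sketch below reads JSONL events in file order, rejects any event whose timestamp goes backwards, and returns the sequence of commands. The `ts`/`type`/`text` fields and the timestamps are illustrative assumptions, not ARCHER's documented format.

```python
import json
from datetime import datetime


def investigative_path(jsonl_lines):
    """Return the commands in log order, rejecting any out-of-order event."""
    path = []
    last_ts = None
    for line in jsonl_lines:
        event = json.loads(line)
        ts = datetime.fromisoformat(event["ts"])
        if last_ts is not None and ts < last_ts:
            raise ValueError(f"event out of chronological order: {event['ts']}")
        last_ts = ts
        if event.get("type") == "command":
            path.append(event["text"])
    return path


# Illustrative events using a hypothetical schema (ts, type, text).
log_lines = [
    '{"ts": "2024-01-01T10:00:00+00:00", "type": "command", "text": "nmap -sV 192.168.56.103 -p 21"}',
    '{"ts": "2024-01-01T10:00:04+00:00", "type": "output", "text": "21/tcp open ftp vsftpd 2.3.4"}',
    '{"ts": "2024-01-01T10:02:10+00:00", "type": "command", "text": "enum4linux -a 192.168.56.103"}',
]
path = investigative_path(log_lines)
```

Here `path` lists the nmap command followed by the enum4linux command, which is the order an independent analyst would replay them in.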

Dead End Documentation

When ARCHER investigates a path and finds nothing, that absence is recorded explicitly:

[command]  enum4linux -a 192.168.56.103
[output]   [E] Can't find workgroup/domain
[note]     SMB enumeration attempted; workgroup not discoverable via enum4linux

A finding report that omits dead ends like this is incomplete: it presents the investigation as if it covered only the areas where something was found, which misrepresents the investigation's actual scope.
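A simple way to make dead ends impossible to drop silently is to require an explicit note whenever a step produces no findings. `record_step` below is a hypothetical helper, not part of ARCHER; it mirrors the `[command]`/`[output]`/`[note]` entries in the excerpt above.

```python
def record_step(command, output, findings, note=None):
    """Build log events for one investigative step (hypothetical helper).

    A step with no findings must carry a note, so the dead end is
    recorded instead of disappearing from the log.
    """
    events = [
        {"type": "command", "text": command},
        {"type": "output", "text": output},
    ]
    for finding in findings:
        events.append({"type": "finding", "text": finding})
    if not findings:
        if not note:
            raise ValueError("a step with no findings must be documented with a note")
        events.append({"type": "note", "text": note})
    return events


steps = record_step(
    "enum4linux -a 192.168.56.103",
    "[E] Can't find workgroup/domain",
    findings=[],
    note="SMB enumeration attempted; workgroup not discoverable via enum4linux",
)
```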

Confidence Transparency

Where ARCHER makes an inference rather than reporting a deterministic fact, the confidence level and the supporting signals are stated explicitly. The distinction between "hash match to known-bad signature" (deterministic) and "behavioral pattern consistent with lateral movement" (inference) is preserved in the output.
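The deterministic/inference distinction can be carried in the data itself rather than in wording alone. In this sketch, an inference is rejected unless it states a confidence and at least one supporting signal. The `Assessment` class, the 0 to 1 confidence scale, and the example signals are illustrative assumptions, not ARCHER's output format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assessment:
    """A claim tagged as either a deterministic fact or an inference (hypothetical schema)."""
    claim: str
    kind: str  # "deterministic" or "inference"
    confidence: Optional[float] = None
    signals: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.kind not in ("deterministic", "inference"):
            raise ValueError(f"unknown kind: {self.kind}")
        if self.kind == "inference":
            # Inferences must say how confident they are and why.
            if self.confidence is None or not 0.0 <= self.confidence <= 1.0:
                raise ValueError("an inference needs a confidence in [0, 1]")
            if not self.signals:
                raise ValueError("an inference needs at least one supporting signal")

    def render(self) -> str:
        if self.kind == "deterministic":
            return f"[deterministic] {self.claim}"
        return (f"[inference, confidence {self.confidence:.2f}] {self.claim} "
                f"(signals: {'; '.join(self.signals)})")


hash_hit = Assessment("hash match to known-bad signature", "deterministic")
lateral = Assessment(
    "behavioral pattern consistent with lateral movement",
    "inference",
    confidence=0.6,  # illustrative value
    signals=[  # illustrative signals
        "authentications from one host to many peers in a short window",
        "remote service creation on a peer host",
    ],
)
```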

The Immutable Log

ARCHER writes session events to an append-only JSONL file. The file is opened in append mode; each event is written as it occurs. There is no mechanism in the agent to read back the log and modify earlier entries.
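A minimal sketch of an append-only JSONL writer, assuming one JSON object per line with hypothetical `ts` and `type` fields. `SessionLog` is an illustrative name, not ARCHER's code. The property lives in the interface: the class offers `append` and nothing else, and append mode means every write lands at the end of the file.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


class SessionLog:
    """Append-only JSONL writer: it can add events but has no read or edit API."""

    def __init__(self, path):
        self._path = path

    def append(self, event_type, **fields):
        event = {"ts": datetime.now(timezone.utc).isoformat(), "type": event_type, **fields}
        # Mode "a" positions every write at the current end of the file,
        # so this writer can never overwrite an earlier line.
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to persist the event before returning
        return event


# Usage: one JSON object per line, written as each event occurs.
path = os.path.join(tempfile.mkdtemp(), "session.jsonl")
log = SessionLog(path)
log.append("command", text="nmap -sV 192.168.56.103 -p 21")
log.append("output", text="21/tcp open ftp vsftpd 2.3.4")
```

Calling `fsync` on every event trades some throughput for durability: each event is on disk before the agent moves on.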

This property - immutability by architecture, not by policy - is what makes the log suitable as a forensic record. A log that could be modified after the fact is not an audit trail; it's a record of what someone decided to write down.

Why Provenance Is Architectural, Not Optional

Provenance cannot be added after the fact. A system that generates findings first and then attempts to reconstruct the evidence trail produces an ex post facto reconstruction, not a genuine investigation record. ARCHER generates the evidence record as the investigation proceeds - the log is written turn by turn, not summarized at the end.

This means the provenance property is a consequence of the agent loop design, not a reporting feature. It cannot be disabled. Every session produces a complete investigative record because that's how the agent runs.
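The loop-level property can be illustrated with a toy agent loop in which logging is part of each turn rather than a separate reporting step. `run_session`, the planner `next_command`, and the canned tool runner are hypothetical stand-ins for ARCHER's planner and tool execution; the point is only the ordering: each command is logged before it runs, and its output is logged before the next turn begins.

```python
import json
import os
import tempfile


def run_session(next_command, execute, log_path, max_turns=50):
    """Run an agent loop that writes each turn to the log as it happens.

    next_command(history) -> str or None, and execute(command) -> str,
    are hypothetical stand-ins for the agent's planner and tool runner.
    """
    history = []
    with open(log_path, "a", encoding="utf-8") as log:
        for turn in range(1, max_turns + 1):
            command = next_command(history)
            if command is None:
                break
            # Record the command before running it, so even a crashed
            # tool call leaves a trace in the log.
            log.write(json.dumps({"turn": turn, "type": "command", "text": command}) + "\n")
            log.flush()
            output = execute(command)
            log.write(json.dumps({"turn": turn, "type": "output", "text": output}) + "\n")
            log.flush()
            history.append((command, output))
    # No end-of-session summarization step and no switch that skips
    # logging: the lines written above are the record.
    return history


# Usage with a canned, illustrative tool runner.
canned = {
    "nmap -sV 192.168.56.103 -p 21": "21/tcp open ftp vsftpd 2.3.4",
    "enum4linux -a 192.168.56.103": "[E] Can't find workgroup/domain",
}
plan = list(canned)


def next_command(history):
    # Toy planner: walk the canned commands in order, then stop.
    return plan[len(history)] if len(history) < len(plan) else None


log_path = os.path.join(tempfile.mkdtemp(), "session.jsonl")
history = run_session(next_command, lambda c: canned[c], log_path)
```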