Build Journal¶

This series documents what it actually looks like to build AI security tools — the design decisions, the hard lessons, and the methodology that makes the difference between a demo and a deployment.

Subscribe via RSS →

Centaur Agent¶

Naming note: Articles in this section use the internal development codename ARCHER. The product will be publicly released as Centaur Agent. The evaluation framework referred to as AgentEval will be released as Centaur Eval. Codenames are preserved in these articles as written — they are historical records of the build, not marketing copy.

A local-first AI security operations agent built on a laptop GPU, fine-tuned on its own operational data, and designed from the first commit to meet the evidentiary standards that regulated environments actually require.

Why AI Security Tools Fail¶

The demo works. The model is capable. So why does the tool fall apart before it reaches production? What I found when I started pulling on that thread.

Architecting for Constraints¶

8GB of VRAM sounds like a limitation. It turned out to be one of the best design decisions in the project — here's why constraints clarify thinking.

The Day I Found 17 Bugs in My Training Data¶

In one session, seventeen bugs surfaced in the training pipeline — each one a lesson in how silently bad data compounds before you ever see it in model behavior.

The Path to Production¶

Getting it to work is one challenge. Getting it to work in a regulated environment where the output carries legal weight is a different one entirely — and a more interesting one.

Not Vibe Coding¶

3,500 lines of AI-assisted code in two weeks, with full architectural control intact. Here's exactly how that process worked and what I learned about the right way to use AI as a coding partner.

Teaching an AI to Think Like a Pentester¶

What does it actually take to go from a generalist model to a specialist trained on its own operational data? The journey from "impressive" to "deployable."

Firm Principles¶

Seven things I only learned by building something real with AI — the lessons that don't show up until the system is running against actual targets.

The Practitioner's Experiment¶

A senior security practitioner with no engineering background built a production AI agent in two weeks. What that process revealed about AI-assisted development done deliberately.

When the Agent Knows but Won't Act¶

For three sessions, ARCHER knew the exact fix for a failing objective and never mentioned it. What that pattern looks like, why it happens, and how to design around it.

Committed but Not Verified¶

Three objectives were broken for weeks while their fixes sat in closed commits. Here's the failure chain, and the four process rules that came out of it.

0.7 Was Wrong¶

The classifier shipped with a 0.7 confidence threshold. The routing log said it was filtering out correct answers. Here's what the data showed — and what it taught me about treating ML parameters as hypotheses.

Building the Range¶

A headless security range on a laptop: Metasploitable2, DVWA, bWAPP, a Windows 11 Defender evasion target, all containerized and networked across isolated segments. What it took to build it and why the architecture decisions matter more than the hardware.

The Morning I Thought My Eval Broke¶

T2 pass rate: 0%. Four sessions ran, all failed. Twenty minutes of debugging later, the system was fine — the sample size was four. What daily metrics hide in low-volume AI evaluation environments, and the three objectives that were actually failing underneath the noise.

The Duplicate That Drifted¶

A list that should exist once existed twice. The copy knew it was a copy — there was a comment in the code. By the time I found it, the two had already diverged. Why solo projects accumulate this class of structural debt silently, and why it's more dangerous in eval pipelines than in production code.

Why I Added a Speed Limit to My Own Codebase¶

Six unverified hint fixes, two unverified shared-utility fixes, eight total — and work stops until the Auditor runs evals. That sounds like bureaucracy. It's actually about preserving the ability to attribute regressions when something breaks.

I Was Training My Agent to Solve DVWA¶

ARCHER's SQL injection pass rate was solid. The hints were correct. The training data looked clean. Every single session was teaching the model to log in at /dvwa/login.php and probe the id parameter. On a different application, every one of those lessons was wrong.

The Agent That Knew and Didn't Act¶

The session log showed correct enumeration, correct identification of the privilege escalation path, and the exact command that would have worked. Then the session ended. The objective failed. The agent knew. It just didn't act.

Pass Rate Was Lying to Me¶

For months, aggregate pass rate was the number. It went up; things were better. Then I found a session where the agent completed the objective and kept running past success until it hit the budget limit — recorded as a fail. Three failure modes were hiding in the same aggregate.

The Pipeline That Reviews Itself¶

ARCHER is built by three AI instances running simultaneously in defined roles: one writes code, one runs evals and holds the behavioral gate, one manages research and documentation. The development process is itself an agent system — with the same concerns about role separation, behavioral verification, and training data integrity that the agent has.

Coordination Debt¶

Three AI agents, same codebase, all doing their jobs correctly — and one created a conflict none of them could see. What a filing collision between parallel instances revealed about the coordination infrastructure multi-agent systems need and almost never get.

Ninety-Eight Runs to Fix One Objective¶

PT-PIVOT-04 ran for 34 evaluation batches at 42% pass rate. Not failing — 42%. The investigation eliminated three wrong hypotheses before finding two root causes that had nothing to do with each other. One blank keystroke determined pass or fail. The path to those two lines is the part worth documenting.

Cognitive Debt in Eval-Driven Systems¶

When the system that checks the agent shares its substrate with the agent it checks, when hints pass eval but no one can explain why, when training sessions score well on pass/fail but teach the wrong behavior — the debt is architectural, not disciplinary. Four specific failure modes. Four structural counter-measures.

Sagittarius¶

A distributed threat hunting platform: sensing at the edge, computing at the core, no cloud dependency. Build journal in progress.

Before the First Line of Code¶

The session that produced no application code at all — and was the most productive session in the project.

Articles are published as they complete review. Subscribe via RSS to be notified when new articles go live.