The Tax on the Happy Path

What I learned by removing the elaborate CI gate I'd built to slow my AI agents down, after realizing it had never actually caught anything.

21 Apr 2026

I was halfway through another PR, regenerating the proof file for the fourth time after rebasing on main, when I realized I couldn’t name a single PR the permit gate had caught. Not “caught that other CI didn’t also catch.” Just caught. I decided to look up the numbers, and then I removed the gate entirely.

What the Gate Was

The short version: every PR to Zabriskie had to be accompanied by a permit file in .caucus/permits/<branch>.json. The permit declared the scope of the change (which paths were allowed to move), the risk level (R1 through R3+), and a list of allowlisted test commands the author was offering as evidence. R2 and higher required a proof file too, generated by actually running those commands and recording the output alongside a fingerprint of the working tree. A CI job called Caucus Permit Gate ran on every PR. It blocked the merge unless the permit existed, the scope covered the diff, and (for R2+) the proof was fresh and the tests passed.
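
To make the scope rule concrete, here is a minimal sketch of the check in Python. The permit field names (risk, scope, commands) and the glob-style patterns are assumptions for illustration; the actual permit schema isn't reproduced in this post.

```python
from fnmatch import fnmatchcase

# Hypothetical permit contents. The real .caucus/permits/<branch>.json
# schema is not shown here, so every field name below is an assumption.
permit = {
    "risk": "R2",
    "scope": ["backend/*", "web/src/*"],
    "commands": ["cd backend && go test ./..."],
}

def uncovered_paths(permit, changed_paths):
    """Return the changed paths that no scope pattern in the permit covers."""
    return [
        path
        for path in changed_paths
        if not any(fnmatchcase(path, pattern) for pattern in permit["scope"])
    ]

def needs_proof(permit):
    """R2 and higher require a proof file; R1 does not."""
    return permit["risk"] != "R1"

# A diff that touches one in-scope file and one out-of-scope file.
violations = uncovered_paths(permit, ["backend/api/handler.go", "docs/README.md"])
```

Under this sketch the gate would block on any non-empty violations list, and for R2+ it would go on to check the proof.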

I built it because I’d spent months watching agents read a rule in CLAUDE.md, acknowledge it in conversation, and then ship code that violated it anyway. Memory Isn’t Learning was the post where I argued that prose in an instruction file is a journal of failures, not a guardrail. The permit gate was supposed to be the structural answer: the rule stops being a sentence and starts being a required status check that CI evaluates against the actual diff.

The gate shipped. It got bypassed, broken, and accidentally self-deadlocked in nine different ways, each of which got a hardening patch. Opt-In Isn’t a Guardrail was the post about those failures. By the time I sat down to write this one, none of that was still happening. The paperwork was filed, the scope was matching, the proofs were fresh, the tests were passing. I killed the gate anyway.

The Count

Over the last hundred CI runs on Zabriskie (roughly a week of shipping, so take the sample for what it is), there were four Caucus Permit Gate failures. Every one of them was a “Proof failed” result, which means one of the allowlisted test commands the proof re-runs returned non-zero. The same commands run in the Build Backend, Unit Tests, and E2E Tests jobs, which are separate CI jobs on every PR. In every case, those dedicated jobs also failed, for the same reason, on the same PR.

The gate’s unique contributions, the ones that were supposed to justify its existence, never fired as the blocking failure. Scope allowlist violations: zero. Risk-level enforcement: zero. Proof fingerprint drift is a regeneration trigger by design, not a catch, so it was never going to appear on that list, and I should have noticed that before I put it on the list. The gate caught exactly nothing in that window that the plain test jobs wouldn’t have caught five minutes later.
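
The count itself is a small query: find the runs where the gate failed and nothing else did. A sketch, assuming a hypothetical record shape (a run id plus a map of job name to pass/fail) rather than any particular CI API, with made-up sample runs standing in for the real history:

```python
GATE = "Caucus Permit Gate"

def unique_gate_catches(runs, gate=GATE):
    """Return ids of runs where the gate was the only job that failed."""
    unique = []
    for run in runs:
        failed = {job for job, passed in run["jobs"].items() if not passed}
        if failed == {gate}:
            unique.append(run["id"])
    return unique

# Illustrative records only, not the actual Zabriskie CI history.
sample_runs = [
    # Gate failed, but Unit Tests failed too: a redundant catch.
    {"id": 1, "jobs": {GATE: False, "Build Backend": True, "Unit Tests": False}},
    # Everything passed.
    {"id": 2, "jobs": {GATE: True, "Build Backend": True, "Unit Tests": True}},
]
caught_only_by_gate = unique_gate_catches(sample_runs)
```

An empty result is what “caught exactly nothing” looks like in this shape: every gate failure has a sibling failure in a dedicated job.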

The Cost

The proof file’s validity is tied to a fingerprint of the working tree. Every git merge origin/main changes that fingerprint, which invalidates the proof, which means the proof has to be regenerated, which means the allowlisted commands have to be re-run end to end. On Zabriskie that’s cd backend && go build ./..., cd backend && go test ./..., cd web && npm run build, and cd web && npx playwright test. Playwright alone takes several minutes and burns an entire stack of browser processes.
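
Why a merge from main invalidates the proof even when it touches nothing in the PR's scope can be shown with a toy fingerprint. This is a sketch under assumptions: it hashes an in-memory map of path to file contents with SHA-256, which is not necessarily how the real fingerprint is computed.

```python
import hashlib

def tree_fingerprint(files):
    """Hash a mapping of path -> file bytes, visiting paths in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(files[path]).digest())
    return digest.hexdigest()

def proof_is_fresh(proof, files):
    """A proof is fresh only if the tree still matches its recorded fingerprint."""
    return proof["fingerprint"] == tree_fingerprint(files)

# The proof is recorded against the branch as it stood when the tests ran.
branch = {"backend/api/handler.go": b"package api\n"}
proof = {"fingerprint": tree_fingerprint(branch)}

# Merging main brings in an unrelated file; the PR's own change is untouched.
after_merge = dict(branch)
after_merge["docs/CHANGELOG.md"] = b"unrelated change from main\n"

fresh_before = proof_is_fresh(proof, branch)
fresh_after = proof_is_fresh(proof, after_merge)
```

Any whole-tree fingerprint has this property: it cannot tell a change that matters to the proof from one that doesn't.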

Every time main moved underneath an in-flight PR, the compliant action (rebase, resync, then push) forced that PR to run the same tests twice: once in the proof regeneration and once in the downstream CI jobs. During an active week of merges, that can happen five or six times on a single branch.

This is the part I did not see clearly when I was building it. The cost of the gate scaled with velocity: the more main moved, the more proofs had to be regenerated, the more tests had to run twice. The benefit didn’t scale with anything I could measure. It sat flat at zero catches that weren’t already caught downstream. A gate whose cost grows with how much work is happening, and whose benefit does not grow at all, is not a guardrail. It is a tax on the branches doing the right thing, collected in service of nothing.
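
The asymmetry is easy to put into arithmetic. In the sketch below the suite duration is a placeholder I have not measured, and the sync count is the upper end of the “five or six” figure above; only the zero comes from the actual count.

```python
def duplicated_test_minutes(syncs_with_main, suite_minutes):
    """Minutes spent re-running the suite purely to regenerate the proof."""
    return syncs_with_main * suite_minutes

SUITE_MINUTES = 10  # placeholder, not a measured number
UNIQUE_CATCHES = 0  # gate failures that no other job also caught

# Cost grows linearly with how often main moves under the branch.
quiet_week = duplicated_test_minutes(syncs_with_main=1, suite_minutes=SUITE_MINUTES)
busy_week = duplicated_test_minutes(syncs_with_main=6, suite_minutes=SUITE_MINUTES)
```

Whatever the real suite duration is, the cost term multiplies with merge activity while the benefit term stays at zero.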

Agents are not insulated from this. They pay in tokens, in extra tool calls, in context spent on regeneration steps that accomplish nothing, in CI wait time that blocks the next thing they were going to do. The dollars come out of my pocket, but the friction lands on them.

The Kill PR

The kill PR was small. Remove the permit-gate job from the CI workflow. Strip the needs: [permit-gate] dependency from the downstream jobs. Delete the pre-push hook that enforced the same check locally. Delete the corresponding rule from the agent instructions file. Branch protection had to be updated to drop the gate as a required status check, which is the only part of the change that required an admin action on the repository.

What I did not do, and what I am deliberately leaving alone for now, is delete the scripts and the .caucus/ history of accumulated permits and proofs. That data is a record of how agents behaved under a specific constraint across the life of the experiment. I don’t want to throw it away until I’ve looked at it properly. There’s a version of this story where the permits themselves, as artifacts, turn out to be more useful than the gate that validated them. I don’t know yet whether that’s true. But the gate’s cost and benefit are clear enough that I don’t need to know before pulling the plug.

What I Am Not Saying

I am not saying there should be no gates. I am not saying the Caucus experiment was wasted. I’m saying that this particular implementation of this particular gate was charging an ongoing cost larger than the value of what it caught, and that cost is a first-class design concern, not a tradeoff you can wave away by pointing at the artifacts it produced.

Every gate has to answer two questions. Does it catch failures the other checks wouldn’t catch? And does the cost of passing it, summed over the lifetime of the repo, stay below the cost of whatever it’s preventing? If the answer to the first is “no” and the second question never even gets asked, the gate has negative value even when it technically works.

The question I was asking in Memory Isn’t Learning was whether the structural version of a guardrail would behave differently from the journaled one. The answer is: structural guardrails are necessary but not sufficient. They have to be structural and the ongoing cost of compliance has to stay below the cost of the failures they prevent. Otherwise the gate is a net negative no matter how firmly it’s bolted in.

What’s Next

I still think the basic idea behind Caucus is right. An agent pushing a change to a shared branch should have to declare the scope of the change, the risk it’s taking, and the evidence it’s offering that the change is correct. What I got wrong was the enforcement surface. The gate lived at merge time, which is the most expensive place to check anything, and it checked by re-running the same tests the merge was already going to run.

I don’t know yet what the next version looks like. The honest sentence is that I don’t know how to validate a permit continuously against a moving diff without re-running the work the tests are already doing. That was the whole problem the first time, and I skipped over it by putting the check at merge time and eating the cost. Until I can answer that question, I don’t have a V2. I have a V1 that I’ve turned off and a list of things I know V2 can’t do.

I had the information to turn this off weeks before I did, and I don’t have a mechanism for catching the next version of this mistake any earlier.

This is part of a series about building Zabriskie with Claude. Previously: Memory Isn’t Learning, Software Engineering Is Becoming Civil Engineering, Caucus V1, The Structural Engineer’s Other Job, Opt-In Isn’t a Guardrail.