2026-05-08
The Fog of War chess engine behind Bichess had a result worth preserving, but the result was not the release.
The engine maintains belief particles because each player sees only the squares their own pieces can legally reach. By v0.7.22, the belief side was good enough to expose a different failure: some moves looked fine on average while still leaving a low-probability immediate king capture. The decision layer had been averaging that away as material uncertainty. v0.7.22 reclassified it as terminal risk: if more than 5% of a move's supporting particles said the move left the king capturable next turn, the strategy filtered that move out whenever safer alternatives existed.
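The filter rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's code: the names `king_risk` and `filter_king_risk` are hypothetical, and so is the input shape of one list of booleans per move, each flag recording whether a supporting particle sees the king capturable next turn. Only the 5% threshold and the fallback when no safer alternative exists come from the description above.

```python
from typing import Dict, List, Sequence

# From the post: a move is filtered when MORE than 5% of its
# supporting particles say the king can be captured next turn.
KING_RISK_THRESHOLD = 0.05


def king_risk(flags: Sequence[bool]) -> float:
    """Fraction of supporting particles that flag an immediate king capture."""
    if not flags:
        return 0.0  # assumption: a move with no supporting particles carries no measured risk
    return sum(flags) / len(flags)


def filter_king_risk(
    flags_by_move: Dict[str, Sequence[bool]],
    threshold: float = KING_RISK_THRESHOLD,
) -> List[str]:
    """Drop moves whose king risk exceeds the threshold, if any safer move remains."""
    safe = [move for move, flags in flags_by_move.items()
            if king_risk(flags) <= threshold]
    # If every candidate is risky there is no safer alternative,
    # so the full move list is returned unchanged.
    return safe if safe else list(flags_by_move)
```

The point of the design is that the risk is checked as a hard gate before any averaged evaluation, so a rare but terminal outcome cannot be diluted by the many particles in which the move is fine.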
The targeted receipt was sharp. On the same seed, v0.7.21 lost game 20 in 150 plies. v0.7.22 won it in 117 plies. The hard-fact check on the saved artifact found zero violations, and generic CSP reseeds stayed at zero. That was enough to call the behavior real locally.
It was not enough to make the benchmark reusable.
The trap was treating a source pin as the whole release. A fast research tree does not stay shaped like the day a result landed. Imports move, helper APIs change, generated traces get overwritten, and a future tournament loader has to answer a more concrete question than “what did the branch look like?” It needs the exact playable engine package that produced the claim.
So the checkpoint ended as two artifacts, not one. The repo now has immutable engine snapshots for v0.7.22-king-risk and the existing v2-baseline, plus zip archives for both and SHA-256 checksums in the checkpoint notes. The versioned loader can load the pinned v0.7.22 package directly, while the live worker path can keep iterating on the checked-in Python source.
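A checksum is only useful if something checks it before the archive is loaded. The sketch below shows one way to do that with the standard library. The function names are hypothetical, the archive path and expected digest are supplied by the caller, and nothing here reflects how the actual versioned loader is written.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected_hex: str) -> bool:
    """Return True only if the archive's digest matches the recorded checksum."""
    recorded = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of(path), recorded)
```

A tournament loader could call `verify_archive` with the digest copied from the checkpoint notes and refuse to run the benchmark if it returns `False`.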
That split matters because it separates research velocity from evidence preservation. The live tree is allowed to move. The artifact that backs a benchmark claim is not.
Benchmark results are claims about artifacts, not branches. If the artifact cannot be replayed after the source tree moves on, the claim is only a memory of a run that once happened.