2026-05-06

Built /admin/calibrate as a manual-labeling tool to seed a substrate-relationship corpus on caissaresearch.com. The downstream goal is a layer that captures how detected features interact (“the pin is severe because king safety is already compromised”); v1 was the first attempt at the labeling surface that grounds it.

v1 shipped with one card per position: a chessground board, a Q1 mini-summary, the top imbalance, the top tactical motif, and three buttons (matters / partial / wrong) on whether the lead feels right. Three free-text fields below: the actual lead if wrong, the key relationship between features, what’s missing.
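The v1 card maps to a flat record: one verdict on the top-1 lead, plus three free-text fields. A sketch of that shape in TypeScript, with the caveat that every field name here is an assumption, not the actual schema:

```typescript
// Hypothetical shape of a v1 label: a single per-position verdict,
// with all corrections only expressible as free text.
type Verdict = "matters" | "partial" | "wrong";

interface V1Label {
  fen: string;              // position being labeled
  verdict: Verdict;         // does the top-1 lead feel right?
  actualLead?: string;      // free text: the real lead, if verdict is "wrong"
  keyRelationship?: string; // free text: how the features interact
  missing?: string;         // free text: what the summary missed
}

// A "wrong" verdict with no stated correction carries almost no signal:
// it says the lead is off without saying what should have led instead.
function isActionable(label: V1Label): boolean {
  return label.verdict !== "wrong" || Boolean(label.actualLead?.trim());
}
```

The optional fields are the tell: everything beyond the three buttons depends on the labeler volunteering prose.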

Half an hour into the first calibration session, v1 was the wrong shape.

Three problems. First, the page rendered only the top-1 imbalance and top-1 motif. Real positions had two or three imbalances and two or three motifs firing simultaneously. The user was being asked to evaluate “is the lead right?” without seeing the rest of the lineup. If the lead felt wrong, there was no way to indicate which of the other firing features should have led, except in free text. Second, the single matters/partial/wrong verdict per position collapsed information. A position where two imbalances both matter and a third is irrelevant doesn’t fit any of those buttons. Third, free-form correction text was hard to produce on the spot. Looking at a board for sixty seconds and writing out the relationship between two features is a different cognitive task from clicking a button labeled “matters.”

v2 changes the page contract. Render the full list of imbalances and motifs that fired, not just the top-1. Per-feature triage with three inline buttons (matters / meh / wrong) next to each one. One free-form notes field per card at the bottom, written only when there’s something specific to say.
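The v2 contract replaces the single verdict with a judgment per firing feature. A sketch under the same caveat as above, that these names are invented for illustration:

```typescript
// Hypothetical v2 label: one triage per feature that fired,
// one optional free-form note per card.
type Triage = "matters" | "meh" | "wrong";

interface FeatureJudgment {
  feature: string;              // e.g. "bishop pair", "pin on f6"
  kind: "imbalance" | "motif";
  triage: Triage;
}

interface V2Label {
  fen: string;
  judgments: FeatureJudgment[]; // one entry per feature that fired
  notes?: string;               // written only when there's something specific to say
}

// The mixed case v1 could not express: two imbalances matter, one is noise.
function mixedCase(): V2Label {
  return {
    fen: "8/8/8/8/8/8/8/8 w - - 0 1", // placeholder position
    judgments: [
      { feature: "space advantage", kind: "imbalance", triage: "matters" },
      { feature: "bad bishop", kind: "imbalance", triage: "matters" },
      { feature: "open h-file", kind: "imbalance", triage: "meh" },
    ],
  };
}
```

Free text moves from the primary channel to an escape hatch; the buttons now carry the structured part of the judgment.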

The reason this corpus exists is the substrate-relationship layer, and that layer lives in the interaction between features. v1 hid the pairs by surfacing only one feature per category. The corpus v1 would have produced was a corpus of single-feature judgments, and the relationship layer was missing from it before any data landed. A corpus is the set of judgments the labeler can express; anything the UI doesn’t render is categorically absent from the dataset, no matter how well-defined it is in the schema.
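The expressiveness gap is easy to count. Ignoring free text, v1's buttons admit 3 distinct labelings per position no matter how many features fired; independent per-feature triage admits 3^k for k firing features. A quick check of that arithmetic:

```typescript
// Button-only labelings per position.
// v1: one 3-way verdict on the top-1 lead, regardless of what else fired.
function v1Outcomes(_featuresFired: number): number {
  return 3;
}

// v2: an independent 3-way triage for each of k firing features.
function v2Outcomes(featuresFired: number): number {
  return 3 ** featuresFired;
}
```

For a typical card with four or five features firing, that is 3 expressible judgments versus 81 or 243, before either version writes a word of prose.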

Designing the corpus and designing the labeling UI are the same step. Shipping the UI first locks in choices about the corpus before any data arrives.