2026-05-06

This is from a benchmark arena I’m building for word-association reasoning, modeled on Codenames. Two AI roles: a cluegiver picks a one-word clue meant to point a partner at hidden target cards while avoiding bystanders and traps; a guesser sees only the clue and ranks unrevealed board words. The current best engine pairs a GloVe-augmented cluegiver with a WordNet-only guesser that scores board words by shared-hypernym depth.
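The shared-hypernym-depth scoring can be sketched as follows. This is a toy stand-in, not the engine's code: the hypernym table here is hypothetical, whereas the real guesser derives ancestors and depths from WordNet synset closures.

```python
# Toy hypernym table: word -> {ancestor: depth}. Hypothetical values;
# the real engine reads these relations out of WordNet.
HYPERNYMS = {
    "lion":  {"animal": 5, "carnivore": 9, "feline": 11},
    "tiger": {"animal": 5, "carnivore": 9, "feline": 11},
    "truck": {"artifact": 4, "vehicle": 8},
}

def shared_hypernym_depth(a: str, b: str) -> int:
    """Depth of the deepest hypernym ancestor shared by a and b;
    exactly 0 when the two words share no ancestor at all."""
    shared = HYPERNYMS.get(a, {}).keys() & HYPERNYMS.get(b, {}).keys()
    return max((HYPERNYMS[a][anc] for anc in shared), default=0)
```

So `shared_hypernym_depth("lion", "tiger")` lands on the deep shared "feline" ancestor, while `shared_hypernym_depth("lion", "truck")` is exactly 0 because the two subtrees never meet.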

The fix should have been straightforward. v0 of a belief-state guesser layer had failed because its multiplicative bumps were too diffuse to flip the argmax. v1 concentrated the same opp-mass on the top-N most-similar cards (where N is the opp's clue count) and bumped each by 5x to 10x. A 20-game smoke run looked promising, with clearly different game outcomes. Over the full 360-game run, v1 won only 96/360 = 26.7% against the engine-v1 baseline's 264/360 = 73.3%: -46.7pp pooled, with all six seeds individually significant on the negative side.
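The top-N concentration step looks roughly like this. A minimal sketch with hypothetical names and similarity values; in the real layer the similarities are GloVe cosines between board cards and the opp's clue, and the bump factor sits somewhere in the 5x-10x range.

```python
def concentrate_opp_mass(similarity: dict[str, float], n: int,
                         bump: float = 5.0) -> dict[str, float]:
    """v1-style sketch: put all the opp-mass on the n cards most
    similar to the opp's clue (n = the opp's clue count) and bump
    each multiplicatively; every other card keeps a neutral 1.0."""
    top = set(sorted(similarity, key=similarity.get, reverse=True)[:n])
    return {w: (bump if w in top else 1.0) for w in similarity}
```

The contrast with v0 is that the bump mass is no longer spread across the whole board, only across the n cards the posterior points at hardest.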

The earlier embedding-guesser variant had lost 27pp by replacing the WordNet ranker with continuous cosine. v1 was structurally the same move dressed as a posterior overlay. The WordNet ranker scores a board word by the depth of its deepest hypernym ancestor shared with the clue; words with no shared ancestor score exactly zero, and the guesser skips zero-scored words whenever any positive-scored word exists. That zero is the engine's entire trap-aversion property.
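The zero-as-reject-floor behavior can be sketched as follows (the scores here are hypothetical shared-hypernym depths; function and variable names are mine, not the engine's):

```python
def rank_board(scores: dict[str, float]) -> list[str]:
    """Rank board words by shared-hypernym-depth score.
    Discrete reject floor: zero-scored words are dropped entirely
    whenever at least one positive-scored word exists."""
    positive = {w: s for w, s in scores.items() if s > 0}
    pool = positive if positive else scores
    return sorted(pool, key=pool.get, reverse=True)

# A trap word scoring 0 is never guessed while any positive word remains:
rank_board({"lion": 11, "truck": 0, "bomb": 0})  # -> ["lion"]
```

The floor is what a continuous similarity can't reproduce: cosine never gives you an exact zero to gate on, so every word stays in the candidate pool.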

v1 kept the WordNet base scores, then multiplied them by a posterior trained on prior opp clues. Cards highly similar (by GloVe cosine) to any past opp clue had their p_opp boosted, which suppressed their final score. The intent: flip the argmax away from cards the opp had previously targeted. What it did: any card with even vague distributional similarity to a prior opp clue got suppressed, including cards that were genuine targets for the current clue. Past opp clues are a noisy signal for the current clue's target structure, and the noise drowned the trap-aversion zero.
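The failure shape can be illustrated with a minimal sketch. The suppression form below (final = base * (1 - min(1, bump * cos))) is an assumption for illustration, not the exact v1 formula, and the cosine values are hypothetical:

```python
def v1_score(base: float, cos_to_past_opp_clue: float,
             bump: float = 5.0) -> float:
    """Hypothetical v1-style overlay: boost p_opp by the bump factor,
    then multiply the suppression into the WordNet base score."""
    p_opp = min(1.0, bump * max(0.0, cos_to_past_opp_clue))
    return base * (1.0 - p_opp)

# A genuine target for the CURRENT clue that merely resembles a PAST
# opp clue gets crushed below a weakly related positive-scored card:
v1_score(base=11, cos_to_past_opp_clue=0.4)   # -> 0.0 (fully suppressed)
v1_score(base=3,  cos_to_past_opp_clue=0.05)  # -> 2.25
```

The argmax flips from the depth-11 target to the depth-3 card on distributional noise alone, which is exactly the "noise drowned the zero" behavior described above.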

L4 in v0/v1 form is dead. Future belief-state work would need to live on the cluegiver side (which has perfect info) or use a different architecture entirely, not a posterior overlay on the discrete graph. When a system component carries its load through a discrete reject floor, a continuous overlay stacked on top of it inverts the safety property. The thing that looked like a refinement was the same falsification shape as the previous attempt, with extra steps.