Agent skip threshold quietly broke prediction refinement

2026-03-25

Prediction convergence charts had gone flat. No band adjustments since the first prediction was made. Previously, predictions refined daily, updating baseline prices and tightening magnitude bands as the event approached. Now static.

The prediction pipeline writes immutable directional claims at three fixed horizons (1w/1m/1q) for tickers with active signals. A prior sprint had replaced the single-shot predict-event LLM call with a structured agent: research rubric scored -5 to +5, skip when the total lands between -2 and +2. Production confirmed the agent was running. 41 LLM calls on March 25 alone, $0.57/day, researching two upcoming-event tickers every two hours. 40 of 41 returned skip.

The skip path was designed for new predictions, where balanced evidence means no claim. It silently broke the refinement loop, which the same code path now controlled: when the agent skips, the existing prediction keeps its original baseline price and band. One ticker sat at its March 21 baseline price even though the stock had moved. There was no mechanism to update baseline or band without committing a full new prediction, and the rubric almost never let one through.

The smoking gun was the asymmetry between “is this the right initial claim” (rubric-appropriate) and “should we tighten the band given another day passed” (always yes, no rubric needed). Three forward paths sketched: split the two decisions and skip the rubric on band-only updates; restore single-shot for refinements only; or lower the skip threshold to -1/+1.

A gate calibrated for one decision will misbehave on every other decision that gets routed through it. When refactoring the path that produces X, audit every other thing that path was also producing. The gate inherits the new behavior whether you want it to or not.