2026-02-21

The brief synthesis pipeline had a hard 30-signal cap. With 150-300 quality-passed signals on a busy day, that cap was dropping 80-90% of real material. The harder problem was who got dropped: the relevance ranker filled slots with high-volume tickers, so low-volume watchlist names would routinely get zero coverage on the days they actually had news. The brief looked thorough but was structurally blind.

The pipeline pulls from 50+ financial news sources and produces a daily markdown brief per watchlist. Synthesis was a single Sonnet call that took the top-N signals and wrote prose. The reflex fix is to raise the cap, but that trades narrative quality for coverage: more signals per call dilute the LLM's attention and the prose flattens. The real fix is upstream: the cap was the wrong knob.
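
For contrast, a minimal sketch of the old shape. The names (`render_signals`, `call_sonnet`, the `relevance` field) are hypothetical stand-ins, not the pipeline's real interfaces; the point is that everything funnels through one ranked, capped prompt.

```python
SIGNAL_CAP = 30

def synthesize_brief_old(signals: list[dict], render_signals, call_sonnet) -> str:
    # Rank everything, keep the top N, hand one big prompt to Sonnet.
    ranked = sorted(signals, key=lambda s: s["relevance"], reverse=True)
    kept = ranked[:SIGNAL_CAP]  # on a 150-300 signal day, this drops 80-90% of the material
    return call_sonnet(render_signals(kept))  # one pass writes the whole brief
```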

Replaced the single Sonnet pass with a map-reduce shape. The map phase runs Haiku per entity: every ticker with at least one signal gets summarized into an intermediate record (headline, narrative, key data, sentiment, implication, cited sources). The reduce phase hands those records to Sonnet, which assembles them into the existing brief schema, unchanged. Routing is volume-aware per entity: 1 signal passes through with no LLM call, 2-4 batch into one Haiku call, 5+ get an individual call.
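
A minimal sketch of that shape under stated assumptions: `call_haiku` and `call_sonnet` are hypothetical stand-ins, the intermediate record fields are abbreviated, and the batch-vs-individual split is read as a lighter vs deeper Haiku prompt for the same per-entity group.

```python
from collections import defaultdict

def map_phase(signals: list[dict], call_haiku) -> list[dict]:
    # Partition quality-passed signals by ticker so every entity owns its slice.
    by_ticker: dict[str, list[dict]] = defaultdict(list)
    for s in signals:
        by_ticker[s["ticker"]].append(s)

    summaries = []
    for ticker, group in by_ticker.items():
        if len(group) == 1:
            # 1 signal: pass straight through into the intermediate record, no LLM call.
            only = group[0]
            summaries.append({
                "ticker": ticker,
                "headline": only["headline"],
                "narrative": only["text"],
                "sources": [only["source"]],
            })
        elif len(group) <= 4:
            # 2-4 signals: batch them into one lighter Haiku call (assumed prompt depth).
            summaries.append(call_haiku(ticker, group, depth="batch"))
        else:
            # 5+ signals: the entity warrants an individual, fuller Haiku call.
            summaries.append(call_haiku(ticker, group, depth="individual"))
    return summaries

def reduce_phase(summaries: list[dict], call_sonnet) -> str:
    # Sonnet assembles the per-entity records into the existing brief schema, unchanged.
    return call_sonnet(summaries)
```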

Cost stayed in the same ballpark: ~$0.04-0.05/brief vs ~$0.03 before, still well inside budget. LLM call count now scales with entity count (~15-20 calls) instead of signal count (hundreds). Every watchlist ticker gets coverage, including the single-signal ones that used to be invisible.
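
Rough arithmetic on the numbers above, not measured data: treating the Sonnet reduce as costing about what the old single pass did (an assumption) and splitting the rest of the budget across the Haiku map calls shows why the total barely moves.

```python
# Back-of-envelope check on the figures quoted above.
brief_cost_ceiling = 0.05   # observed upper end per brief
reduce_cost = 0.03          # assume the Sonnet reduce costs roughly what the old single pass did
map_calls = 15              # low end of the ~15-20 per-entity calls

per_map_call = (brief_cost_ceiling - reduce_cost) / map_calls
print(f"~${per_map_call:.4f} per Haiku map call")  # ≈ $0.0013, comfortably Haiku-sized
```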

The bottleneck was never “tokens per call.” It was “calls per coverage unit.” Once the cost function is written that way, partitioning by entity is the only partition that scales with what readers care about: every name they’re tracking, covered to the depth its news warrants.