2026-04-30
The user-facing status page sat on “Running stockfish backfill… 2/13 stages” for hours. The browser tab looked frozen. Refreshing didn’t move it, and nothing was erroring anywhere in the logs. The first hypothesis was a stale page. The second was a stuck process. Both were wrong.
The product is a chess analytics tool that ingests a player’s past games and runs Stockfish over each position to mine weakness patterns. Free tier is shallow; the paid tier wants deep coverage of the full corpus. SSH into the Railway container told a different story than the page did:
stockfish phase: 6350/137250 (cache=6043, errors=0) 4.00/s ETA 9.1h
The pipeline wasn’t stuck. The Railway box has one vCPU. At ~4 positions/sec, the ~131,000 positions still queued out of 137,250 are a 9.1-hour grind, and it had been making honest progress the whole time. The “frozen tab” was an actively running multi-hour backfill the UI had no shape to represent.
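To make the gap between the log and the page concrete, here is a minimal sketch that turns that log line into the fields a status page could render. The regex and the `parse_progress` name are illustrative assumptions, not the tool's actual code; only the log line itself comes from the container.

```python
import re

# The line observed over SSH, copied verbatim.
LINE = "stockfish phase: 6350/137250 (cache=6043, errors=0) 4.00/s ETA 9.1h"

# Assumed pattern, written to match the one line above.
PATTERN = re.compile(
    r"stockfish phase: (?P<done>\d+)/(?P<total>\d+) "
    r"\(cache=(?P<cache>\d+), errors=(?P<errors>\d+)\) "
    r"(?P<rate>[\d.]+)/s"
)

def parse_progress(line):
    """Return done/total, percent complete, and ETA in hours for one log line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a stockfish progress line")
    done, total = int(m["done"]), int(m["total"])
    rate = float(m["rate"])
    remaining = total - done
    # ETA covers only the positions still queued, not the whole corpus.
    eta_hours = remaining / rate / 3600 if rate > 0 else float("inf")
    return {
        "done": done,
        "total": total,
        "percent": 100 * done / total,
        "eta_hours": eta_hours,
    }

progress = parse_progress(LINE)
```

Rounded to one decimal, `progress["eta_hours"]` reproduces the 9.1h in the log, which is the number the “2/13 stages” label was hiding.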
A CPU upgrade isn’t the fix. Doubling vCPU at best halves wall time. Even 16x only gets the pathological case down to about 35 minutes, on a box you pay ~$200/mo for and that sits idle the rest of the day. The work is embarrassingly parallel; each position eval is independent. The right abstraction is fan-out across cheap, ephemeral workers, not a fatter persistent box.
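The scaling arithmetic is short enough to write down. This sketch assumes perfectly linear scaling at the observed ~4 positions/sec per vCPU, which is optimistic; the function name and constants are illustrative.

```python
POSITIONS = 137_250      # full corpus size from the log line
RATE_PER_VCPU = 4.0      # positions/sec observed on the one-vCPU box

def wall_minutes(positions, workers, rate_per_worker=RATE_PER_VCPU):
    """Wall time in minutes if throughput scales linearly with worker count."""
    return positions / (workers * rate_per_worker) / 60

# Each doubling halves the estimate, but the box is billed around the clock.
for vcpus in (1, 2, 4, 8, 16):
    print(f"{vcpus:>2} vCPU: {wall_minutes(POSITIONS, vcpus):6.1f} min")
```

Under these assumptions the 16-vCPU row comes out just under 36 minutes for the full corpus, in line with the rough figure above.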
The plan is Function.map(positions) on Modal across 20 shards: ~22 minutes wall time, ~$2.60 per cold-start backfill, triggered automatically from ingest as a fire-and-forget job. The user’s first page view shows whatever coverage exists; subsequent views get richer as the cloud workers catch up. Now it’s a feature (“parallel deep-analysis, full corpus in ~20 minutes”), not an infra note.
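The shape of that fan-out can be sketched locally. This is not Modal code: a standard-library thread pool stands in for the remote map so the example runs anywhere, and `make_shards`, `eval_shard`, and `fan_out` are hypothetical names. The placeholder worker does no engine work at all.

```python
from concurrent.futures import ThreadPoolExecutor

def make_shards(positions, n_shards):
    """Split positions into n_shards contiguous chunks differing in size by at most one."""
    base, extra = divmod(len(positions), n_shards)
    shards, start = [], 0
    for i in range(n_shards):
        size = base + (1 if i < extra else 0)
        shards.append(positions[start:start + size])
        start += size
    return shards

def eval_shard(shard):
    # Hypothetical placeholder: a real worker would run Stockfish on each
    # position and return its evaluation instead of None.
    return [(position, None) for position in shard]

def fan_out(positions, n_shards=20):
    """Map eval_shard over the shards in parallel and flatten results in input order."""
    shards = make_shards(positions, n_shards)
    # Local stand-in for the remote map; pool.map yields results in shard order.
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        results = pool.map(eval_shard, shards)
        return [item for shard_result in results for item in shard_result]
```

Because each position eval is independent, the only coordination is the split and the ordered merge. Mapping over shards rather than single positions keeps per-call overhead small relative to the engine work.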
The trap was treating “heavy user” as the edge case. The product strategy was always to attract users with deep histories, because more games means better analysis means stronger pitch, so the pathological case is the entire target market. When a user-facing pipeline does CPU-bound work proportional to corpus size, it’s an architectural smell. Either the work belongs out-of-band, or it has to be fanout-able. A single-box engine doesn’t survive contact with the user the product was designed to attract.