2026-05-04

The goal is not to build more queue machinery. The goal is maximum Stockfish throughput per dollar while keeping analysis quality high.

The quality contract stays fixed for the broad path: Stockfish 18, depth 18, MultiPV 3. Pathological positions can get special handling, but only with explicit provenance. The default should not silently become weaker just because one tail position is expensive.
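One way to keep the contract from drifting silently is to make it an immutable value and force every deviation to carry a reason. The sketch below is only illustrative: `AnalysisContract`, `BROAD_PATH`, and `special_case` are hypothetical names, not part of the existing code, and the depth override in the usage note is a made-up example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AnalysisContract:
    """Engine settings for one analysis request (hypothetical type)."""
    engine: str = "Stockfish 18"
    depth: int = 18
    multipv: int = 3
    # None means "unmodified broad-path default"; any override must set this.
    provenance: Optional[str] = None


# The fixed broad-path contract from the note above.
BROAD_PATH = AnalysisContract()


def special_case(base: AnalysisContract, reason: str, **overrides) -> AnalysisContract:
    """Return a modified contract, refusing overrides that carry no reason."""
    if not reason.strip():
        raise ValueError("special handling requires an explicit provenance reason")
    return replace(base, provenance=reason, **overrides)
```

Usage might look like `special_case(BROAD_PATH, "tail position exceeded time budget", depth=14)`. Because the dataclass is frozen, the default itself cannot be mutated in place, so a weaker setting only ever exists as a separate value with its provenance attached.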

The next bottlenecks, in order:

Bottleneck           Signal                                           Experiment
Stockfish compute    Worker wall dominates everything else            raise containers while tracking cost/position
Modal scheduling     200 or 300 containers barely improve throughput  stop raising caps and tune batch shape
Modal Volume shards  many small files make submit/list/drain slow     compare 64, 128, 256 position shards
Drainer throughput   compute finishes fast, drain wall dominates      add partitioned drainers or rate-shaped drains
Railway DB writes    endpoint write-back p95/max grows                meter writes before scaling web servers
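The shard-size comparison is mostly about file count: smaller shards mean more files to submit, list, and drain. A minimal sketch of that arithmetic, assuming one file per shard; the helper names and the 10,000-position job size are hypothetical, chosen only to make the numbers concrete.

```python
import math
from typing import List, Sequence


def shard_count(total_positions: int, shard_size: int) -> int:
    """Number of shard files needed if each file holds up to shard_size positions."""
    return math.ceil(total_positions / shard_size)


def split_into_shards(positions: Sequence, shard_size: int) -> List[Sequence]:
    """Cut a flat list of positions into consecutive shards; the last may be short."""
    return [positions[i:i + shard_size] for i in range(0, len(positions), shard_size)]


# Hypothetical job size, for illustration only.
TOTAL = 10_000
for size in (64, 128, 256):
    print(size, shard_count(TOTAL, size))
# prints:
# 64 157
# 128 79
# 256 40
```

Going from 64 to 256 positions per shard cuts the file count by roughly 4x. Whether that actually speeds up submit/list/drain is what the experiment is meant to measure.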

The efficient ladder is:

  1. Run a medium buffered job: 25-50 Chess.com games, cpu=0.25, max_containers=100, batch_size=128, with spend capped around $0.25.
  2. Run the same shape at max_containers=200, then 300.
  3. If 300 still gives material gains, test whether the Modal account and platform allow a higher cap. If gains flatten, stop there.
  4. Sweep batch size at the best container cap: 64, 128, 256.
  5. Stress the drainer only if drain wall is more than 10-15% of compute wall.
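The ladder above can be written down as data so each run differs from the previous one in exactly one knob. This is a sketch under assumptions: `RunShape` and the helper functions are hypothetical, the field names simply mirror the knobs listed in step 1, and nothing here calls Modal.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class RunShape:
    """One experiment configuration (hypothetical container for the knobs above)."""
    cpu: float
    max_containers: int
    batch_size: int
    budget_usd: float


# Step 1: the medium buffered baseline.
BASE = RunShape(cpu=0.25, max_containers=100, batch_size=128, budget_usd=0.25)


def container_ladder(base: RunShape, caps: Sequence[int] = (100, 200, 300)) -> List[RunShape]:
    """Steps 1-2: same shape, only the container cap changes."""
    return [replace(base, max_containers=cap) for cap in caps]


def batch_sweep(base: RunShape, best_cap: int,
                sizes: Sequence[int] = (64, 128, 256)) -> List[RunShape]:
    """Step 4: hold the best cap fixed and vary only the batch size."""
    return [replace(base, max_containers=best_cap, batch_size=size) for size in sizes]
```

Keeping every other field fixed per step is the point: if throughput changes between two runs, there is only one knob that could have caused it.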

The decision rules matter more than the knobs:

  • If 200 beats 100 materially, try 300.
  • If 300 beats 200 materially and cost per position stays stable, explore beyond 300.
  • If the throughput gain from a step is under 15-20%, treat it as not material and stop raising container count.
  • If drain is small, do not optimize the drainer yet.
  • If drain dominates, parallelize write-back deliberately instead of letting compute workers stampede the web server.
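These rules are simple enough to encode, which keeps the stop/continue call from being made by feel after each run. The sketch below is one possible reading, not the project's actual logic. Assumptions: "materially" means a relative gain of at least 15%, "cost stays stable" means cost per position rises by no more than 10% (that tolerance is invented here), "small" drain means at most 15% of compute wall, and "dominates" means drain wall exceeds compute wall.

```python
from typing import Optional


def relative_gain(prev_throughput: float, new_throughput: float) -> float:
    """Fractional throughput improvement of the new run over the previous one."""
    return (new_throughput - prev_throughput) / prev_throughput


def should_raise_cap(prev_throughput: float, new_throughput: float,
                     min_gain: float = 0.15,
                     prev_cost_per_pos: Optional[float] = None,
                     new_cost_per_pos: Optional[float] = None,
                     cost_tolerance: float = 0.10) -> bool:
    """True if the last step justifies trying a higher container cap."""
    if relative_gain(prev_throughput, new_throughput) < min_gain:
        return False  # gains have flattened
    if prev_cost_per_pos is not None and new_cost_per_pos is not None:
        # Assumed meaning of "stable": cost/position within cost_tolerance.
        if new_cost_per_pos > prev_cost_per_pos * (1 + cost_tolerance):
            return False
    return True


def drain_action(drain_wall_s: float, compute_wall_s: float,
                 small_ratio: float = 0.15) -> str:
    """Map the drain/compute wall ratio to one of the three drainer decisions."""
    ratio = drain_wall_s / compute_wall_s
    if ratio <= small_ratio:
        return "leave drainer alone"
    if ratio > 1.0:  # assumed meaning of "drain dominates"
        return "parallelize write-back"
    return "stress-test drainer"
```

For example, `should_raise_cap(100, 130)` returns `True`, while `should_raise_cap(100, 110)` returns `False` because a 10% gain is below the threshold.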

The architecture target is bursty compute with shaped materialization. Modal can run many workers, but the server and DB should accept results only at a rate they can actually sustain.
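A rate-shaped drain can be as small as a paced loop in front of the write-back call. The sketch below is a minimal illustration, not the existing drainer: `shaped_drain` and its parameters are hypothetical, `write_batch` stands in for whatever actually posts results to the server, and the `sleep` and `clock` arguments exist only so the pacing can be tested without waiting.

```python
import time
from typing import Callable, Sequence


def shaped_drain(results: Sequence,
                 write_batch: Callable[[Sequence], None],
                 rows_per_sec: float,
                 batch_size: int = 50,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> int:
    """Write results in batches, never faster than rows_per_sec on average."""
    if rows_per_sec <= 0 or batch_size <= 0:
        raise ValueError("rows_per_sec and batch_size must be positive")
    written = 0
    next_allowed = clock()  # earliest time the next batch may be sent
    for start in range(0, len(results), batch_size):
        batch = results[start:start + batch_size]
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)
        write_batch(batch)
        written += len(batch)
        # Each batch uses up len(batch) / rows_per_sec seconds of write budget.
        next_allowed = max(next_allowed, clock()) + len(batch) / rows_per_sec
    return written
```

With `rows_per_sec=100` and `batch_size=100`, the first batch goes out immediately and each later batch waits about one second, regardless of how quickly compute finished. Running several such drainers over disjoint partitions would be one way to parallelize write-back while keeping the total rate bounded.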