2026-05-06
Wired the WebSocket flow of the Fog of War chess server I’m building to a Postgres event log. The choice that matters is ordering. Every GameEvent is written to the log before any in-memory mutation. The broadcast happens only after the write returns.
Mechanically, appendEvent(roomId, seq, event) runs through room.pendingWrites, a per-room promise chain. The chain serializes writes so seq assignment is atomic with the insert. If the write throws, the room’s projection is not advanced, no fanout happens, and the WS handler responds {type: 'error', reason: 'persistence_failure'} to the originating client. Clients never see a state the database doesn’t have.
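A minimal sketch of that ordering, not the server's actual code. The `EventStore` interface, the `Room` fields other than `pendingWrites`, and the `handleEvent`, `origin`, and `fanout` names are assumptions made for illustration; only `appendEvent(roomId, seq, event)`, `pendingWrites`, and the error payload come from the notes above.

```typescript
// Hypothetical shapes; the real GameEvent and store are not shown in this log.
type GameEvent = { type: string; payload?: unknown };

interface EventStore {
  // Assumed to reject when the insert fails.
  appendEvent(roomId: string, seq: number, event: GameEvent): Promise<void>;
}

interface Room {
  id: string;
  nextSeq: number;               // next sequence number to assign
  applied: GameEvent[];          // stand-in for the in-memory projection
  pendingWrites: Promise<void>;  // tail of the per-room write chain
}

type Send = (msg: unknown) => void;

function handleEvent(
  store: EventStore,
  room: Room,
  event: GameEvent,
  origin: Send,   // replies to the client that sent the event
  fanout: Send,   // broadcasts to everyone in the room
): Promise<void> {
  const step = room.pendingWrites.then(async () => {
    // Read seq inside the chain, so two concurrent events never share one.
    const seq = room.nextSeq;
    try {
      await store.appendEvent(room.id, seq, event);
    } catch {
      // Write failed: projection untouched, nothing broadcast.
      origin({ type: "error", reason: "persistence_failure" });
      return;
    }
    // Only now is it safe to mutate memory and tell anyone else.
    room.nextSeq = seq + 1;
    room.applied.push(event);
    fanout({ type: "event", seq, event });
  });
  // Swallow rejections on the stored tail so one bad step cannot wedge the room.
  room.pendingWrites = step.catch(() => {});
  return step;
}
```

Because `seq` only advances after a successful write, a failed event leaves no gap in the log, and the next event reuses the same number.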
Apply-then-persist is tempting because it feels lower-latency. The cost shows up on restart: the server hydrates from the log, so any event that fanned out but didn’t persist is gone. Half the connected clients saw the move. The server has no record of it. The next reconnect rewrites their reality.
Game-end records use ON CONFLICT (room_id) DO NOTHING. The terminal-projection check fires from inside the event-apply path, so a clock expiry and a king-capture event can race and both call recordGameEnd for the same room. Idempotent-on-conflict means whichever lands first wins and the second is a no-op. No transactional gymnastics needed, because the games row is a derived aggregate, not the source of truth.
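Roughly what that insert could look like. The `outcome` and `ended_at` columns and the `Queryable` interface are assumptions; the interface is a hand-written stand-in shaped like a `query(text, values)` client call, so the sketch carries no dependency. The clause itself needs a unique constraint or index on `room_id` to work.

```typescript
// Minimal stand-in for a Postgres client; not a real library type.
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rowCount: number | null }>;
}

// Hypothetical columns; only the games table, room_id, and the
// ON CONFLICT clause come from the notes above.
const RECORD_GAME_END_SQL = `
  INSERT INTO games (room_id, outcome, ended_at)
  VALUES ($1, $2, now())
  ON CONFLICT (room_id) DO NOTHING
`;

// Resolves to true if this call wrote the row, false if another caller
// already recorded the end of this game.
async function recordGameEnd(
  db: Queryable,
  roomId: string,
  outcome: string,
): Promise<boolean> {
  const result = await db.query(RECORD_GAME_END_SQL, [roomId, outcome]);
  return result.rowCount === 1;
}
```

Both racing callers can fire this without coordinating. The loser gets `false` back and carries on.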
/health returns 503 if any persistence_failure landed in the last 60 seconds, 200 otherwise. The error list is in-memory and rolling. Railway uses the endpoint for replacement-container readiness, so a flapping database keeps the old container alive instead of cutting over to a fresh one with the same problem.
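The rolling window behind that check can be sketched as a small class. `FailureWindow` and its method names are hypothetical, and treating a failure exactly 60 seconds old as expired is an assumption; the notes above only say "the last 60 seconds".

```typescript
const WINDOW_MS = 60 * 1000;

class FailureWindow {
  // Timestamps (ms) of recent persistence failures, kept in memory only.
  private timestamps: number[] = [];

  // Call whenever a write fails.
  record(nowMs: number): void {
    this.timestamps.push(nowMs);
  }

  // Drops entries that have aged out, then returns the HTTP status
  // the health endpoint would send.
  status(nowMs: number): 200 | 503 {
    this.timestamps = this.timestamps.filter((t) => nowMs - t < WINDOW_MS);
    return this.timestamps.length > 0 ? 503 : 200;
  }
}
```

Passing the clock in as `nowMs` keeps the window easy to test; a real handler would presumably pass `Date.now()`.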
The reframe that took a minute: a single-server many-client topology is still a distributed system. The cluster is the server plus every connected client. The durability boundary has to live in front of every fanout, not after. Apply-then-persist works when the only consumer is the next read on the same process. It fails as soon as anyone else has already seen the state, because anyone else is a node you can’t roll back.