The default /log/ feed is reverse-chronological. This page groups entries by the shape of the lesson, so a problem youβre seeing today can find the closest prior receipt.
Some entries appear in more than one bucket when the lesson cuts across.
Tests and validation: when the green check meant nothing
Tautological tests, masked fixtures, statistical noise, validation that scored the wrong thing.
- Five real positions surfaced three substrate bugs the unit tests cleared β detector tests checking only the detectorβs own contract are tautological against that contract.
- Savepoint test fixture masked two production bugs β a 2.5x faster fixture turned green CI into evidence of nothing.
- When validation fails, ask which step failed β a 0/2/38 score answers a downstream question; first ask which stage failed.
- Diagnosed the wrong LLM failure mode from one trace β single-trace conclusion was 14 errors; aggregate showed 318.
- Did not promote exit_wide to default β +9pp leaderboard advantage was below the noise floor at nβ100.
- Held-out vocabulary halved the leverβs ROI β disjoint-by-construction held-out set cut +76.7pp to +33.3pp.
- Plausible-shaped errors in an LLM-drafted methodology doc β make primary-source verification a separate phase from drafting.
- PEAD validated as the one signal with real edge β surviving signal across 387 events; behavioral inefficiency.
- Paper trader was treating βrollβ as βcloseβ β the harness was the lie; -4.7% drag came from misbooked actions, not the strategy.
Silent failure modes
Things that broke without surfacing. The bug is invisible until you specifically look for it.
- Bluesky posts silently dropped for three weeks β optional dep + 0.03s failure duration; took three sessions to spot.
- A silent deploy-skip turned a 24-minute fix into a 10-hour outage β alert on the gate, not just on the build.
- Anthropic cache_control silently no-ops below model minimums β empirical floor sits well above the published 2,048-token threshold.
- Datetime separator mismatch hid every same-day signal β one character was load-bearing in a lexicographic compare.
- Engine fell back to alphabetical when belief collapsed β silent fallback path; surfacing it lifted win rate 89% β 98.5%.
- An authoritative seed must disable, not just enable β removing entries from a canonical list was invisible to the seed.
- Vite baked the bundle before the env vars existed β VITE_* inline at build time; runtime env was too late.
- LLM dedup was hallucinating signal IDs into joins β canonical_id values not present in any batch silently false-deduped 1,714 signals.
- $120 in Gemini credits, $5.11 in my cost log β schema missing the dimension that dominated the bill.
- Agent skip threshold quietly broke prediction refinement β gate inherited by the refinement loop silently froze every existing prediction.
Layer and boundary contracts
Where the trust, durability, or visibility line actually lives β and where it was assumed to live.
- Hidden info at the protocol, not the renderer β UI-layer enforcement is a comment, not a constraint.
- Persist before fanout, even when the cluster is one server β durability boundary belongs in front of the fanout.
- The labeling UI was part of the corpus definition β fields hidden from the labeler are absent from the dataset forever.
- ChatGPT highlight-copy returned only the visible message β virtualized DOM β rendered text; go to the API for full state.
- Entity resolver became the single gatekeeper β a resolver that falls back to raw input pollutes everything downstream.
- A public chess eval cache is not a replacement for provenance β same-shape rows, different contracts; read-through with source metadata.
- Next standalone passed health but failed hydration β HTTP 200 from /health while the browser app crashed; healthcheck was the wrong contract.
- Video embed ready is not video playing β reveal on observable playback, not iframe readiness.
- Scheduled video channels need one duration clock β the guide and player were telling time from different clocks.
- Stockfish MultiPV failed on one legal line, not the whole position β validate the result you actually use, not every byproduct.
Operational rollout and infra
Deploy ordering, fanout, image size, ladders. Keeping the rollout boring.
- Railway deploy deadlocked behind cold DB init β ensure_schema before health server; cold-start surfaced the invariant.
- Stockfish backfill wedged behind a single vCPU β fan-out across ephemeral workers, not a fatter persistent box.
- Buffered write-back turned Modal callbacks into drainable shards β durable shards first, controlled drainer back to the system of record.
- The Stockfish backfill finished because the rollout stayed boring β every scale step had a ledger, cap, metric, and stop condition.
- The next Stockfish target is throughput per dollar, not more knobs β short ladder: medium run, ceiling test, batch sweep, drainer stress.
- The web image was paying for an ML stack it did not use β 2.85 GB β 189.5 MB; image push 163s β 8s.
LLM systems and evaluation
Model behavior, tier choice, oracle limits, prompt and eval architecture.
- 11 LLM falsifications, then one tier swap flipped the verdict β cheap-tier results lie; ladder probe before declaring a null.
- The cleaner dedup architecture cost 4x β the messy stages were doing real grouping work that didnβt survive the rewrite.
- Predictions became immutable claims at fixed horizons β drop the update verbs; let the LLM gate its own output.
- The oracle had no concept of the regime under test β Stockfish-with-truth marked every FOW-aware move it wouldnβt pick as wrong.
- Belief-state overlay inverted the discrete floor β continuous suppression replaced a discrete reject floor; -46.7pp.
- Symmetric upgrade regressed; asymmetric stack was the right shape β continuous cosine has no βunrelatedβ floor; the discrete floor was the load-bearing piece.
- 140x more training tokens regressed the engine 32pp β both βmore dataβ levers null in one evening.
Heuristic ranking and scoring
Anchoring units to something external; choosing the right partition.
- Feature salience anchored to engine-eval delta, not hand-set values β counterfactual removal is the cheapest external anchor.
- Brief synthesis: map-reduce by entity, not by signal cap β change the partitioning unit, not the cap.
Distribution and product
Channel structure, GTM, semantic loops, source curation.
- Three channels, one audience β channel count is not signal count when the channels share an audience.
- Window TV grew a Mandarin immersion ladder β graded listening rows with learner level as a routing signal.
- aide started turning session logs into reusable project memory β reviewed semantic artifact loop: logs β proposed knowledge β runbooks.