2026-04-21
The exit_wide variant was leading the leaderboard by +9.13pp over the all_plays baseline. Welch’s t-test against baseline: raw p=0.126, Bonferroni-adjusted p=0.379 across 3 variants tested. Decided not to make exit_wide the post-freeze default. The exit-rule research track is dead, not deferred.
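A minimal sketch of the Bonferroni step, for reference. The function name `bonferroni` is mine for illustration, not something from the trading system. With the rounded raw p of 0.126 it gives about 0.378; the 0.379 above presumably comes from the unrounded p-value.

```python
def bonferroni(p_raw: float, m: int) -> float:
    """Bonferroni-adjusted p-value: multiply by the number of comparisons, cap at 1."""
    return min(1.0, p_raw * m)

# Raw Welch p-value for exit_wide vs. baseline, corrected for 3 variants tested.
p_adj = bonferroni(0.126, 3)
print(round(p_adj, 3))
```

The correction is deliberately conservative: each variant that got a shot at the leaderboard raises the bar for all of them.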
This is from an options trading system I built that runs multiple exit-rule variants in parallel against the same entry signals, so I can compare take-profit/stop-loss (TP/SL) configurations on real (paper) trades. Options returns are high-kurtosis, with per-trade SD of roughly 35-48pp. Detecting a 9pp difference at α=0.05 and power=0.80 requires n ≈ 310 per arm, which matches the standard two-sided normal-approximation formula at an SD of about 40pp. With n ≈ 100 per arm (exit_wide: 91, baseline: 116), the experiment is roughly 30% of the way there. The leaderboard ranking feels like evidence; the math says it isn’t yet.
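The sample-size figure can be roughly reproduced with the textbook normal-approximation formula for comparing two means. This is a sketch under assumptions: equal SDs and equal arm sizes, a two-sided test, and a hypothetical helper name `n_per_arm`. It ignores the fat tails, so if anything it understates the required n.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm n to detect a mean difference `delta` (two-sided, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Sweep the per-trade SD range quoted above (35-48pp) for a 9pp difference.
for sd in (35, 40, 48):
    print(f"SD={sd}pp -> n per arm = {n_per_arm(delta=9, sd=sd)}")
```

Required n scales with the square of the SD, so the gap between the low and high ends of the 35-48pp range is large. At an SD of 40pp the formula lands at about 310 per arm.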
Decomposing exit_wide further killed the story. The TP=150 rule fired only once across 91 trades. The SL=-50 rule fired 12 times, all catastrophic, averaging -42%. The entire positive average came from the outcome_check bucket: 78 trades that survived 30 days without hitting any rule, averaging +30.7%. exit_wide’s “edge” is not “wider TP captures more upside.” It’s “looser SL leaves more trades alive long enough to appreciate.” That’s a don’t-cut-too-early effect, not an exit-rule win, and it isn’t what the variant’s name advertises. The same decomposition killed exit_trailing outright: 50 rule-triggered trades averaged -36.8%, decisive at the bucket level even at small n.
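The decomposition itself is just a group-by on exit reason. A minimal sketch, assuming each trade is recorded as an `(exit_reason, return_pct)` pair; the `decompose` helper, the bucket labels, and the toy trades below are invented for illustration and are not the system's actual schema or data.

```python
from collections import defaultdict

def decompose(trades):
    """Per-bucket count, mean return, and contribution to the overall mean return."""
    buckets = defaultdict(list)
    for reason, ret in trades:
        buckets[reason].append(ret)
    total = sum(len(rets) for rets in buckets.values())
    return {
        reason: {
            "n": len(rets),
            "mean": sum(rets) / len(rets),
            # Contributions across buckets sum to the variant's headline average.
            "contribution": sum(rets) / total,
        }
        for reason, rets in buckets.items()
    }

# Toy data only: four made-up trades, not results from the system.
toy_trades = [("outcome_check", 30.0), ("outcome_check", 10.0), ("sl", -50.0), ("tp", 150.0)]
for reason, stats in decompose(toy_trades).items():
    print(reason, stats)
```

Applied to the reported exit_wide buckets, the same arithmetic puts outcome_check at about 78 × 30.7 / 91 ≈ +26.3pp of the headline average and the SL bucket at about 12 × (-42) / 91 ≈ -5.5pp. The contribution column is the one to read: it shows which bucket the headline number actually comes from.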
Two things had to be true to ship exit_wide as the default. First, the headline number had to clear the noise floor. It didn’t. Second, the headline number had to come from the rule itself rather than from an unrelated bucket dragged in under the variant’s name. Neither held. On distributions this fat-tailed, an apparent edge at n=100 isn’t a decision; it’s a single draw from a wide sampling distribution. Defaulting on it is gambling, not deciding, and “the rule averaged X” is a load-bearing claim only if the rule is what actually fired.