mcostalba wrote: After 5811 games Mod - Orig: 1037 - 902 - 3872 +8 ELO (+- 3.6) LOS 97%
What is (+- 3.6)? I find the 95% probability to be (+- 5.3)...
Quote: What is (+- 3.6)? I find the 95% probability to be (+- 5.3)...
66% draw ratio is fairly high, which could reduce the relative error bars compared to a formula based purely on #games played. In any case, a binomial comparison with 1037:902 is 99.9% LOS, no?
EDIT: What I mean is, the chance of something as lopsided as 1037:902 between two equal programs is only 1 in 998.
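For anyone who wants to check these figures, here is a minimal Python sketch. It is not what bayeselo does; it assumes the usual logistic Elo formula, a per-game variance computed from the win/draw/loss counts (so a high draw ratio does shrink the error bar), and a normal approximation to the binomial on decisive games for LOS. The function name `elo_summary` is made up for illustration.

```python
import math

def elo_summary(wins, losses, draws):
    """Return (elo_diff, margin_95, los) for a W/L/D match result.

    Assumptions (not taken from bayeselo): logistic Elo model,
    delta-method error bar, normal approximation for LOS.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n  # mean score per game

    # Elo difference implied by the mean score.
    elo = -400.0 * math.log10(1.0 / score - 1.0)

    # Per-game variance of the score; draws (0.5) sit close to the
    # mean, so many draws give a smaller variance than a win/loss coin flip.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se_score = math.sqrt(var / n)

    # Convert the score error to Elo with the slope of the Elo curve.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    margin_95 = 1.96 * slope * se_score

    # LOS: chance that the winner really is stronger, ignoring draws.
    z = (wins - losses) / math.sqrt(wins + losses)
    los = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return elo, margin_95, los

elo, margin, los = elo_summary(1037, 902, 3872)
print(f"{elo:+.1f} Elo, 95% margin +-{margin:.1f}, LOS {los:.2%}")
```

With 1037 - 902 - 3872 this gives about +8 Elo, a 95% margin a little over 5 Elo, and a LOS of about 99.9%, which is in the same range as the hand figures quoted in this thread. The exact binomial tail would differ slightly from the normal approximation used here.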
BB+ wrote: 66% draw ratio is fairly high, which could reduce the relative error bars compared to a formula based purely on #games played. In any case, a binomial comparison with 1037:902 is 99.9% LOS, no? EDIT: What I mean is, the chance of something as lopsided as 1037:902 between two equal programs is only 1 in 998.
LOS of 97% comes from bayeselo
fruity wrote: SF 2.1 or whatever it will be called would be about 8 Elo weaker without your genius thread! So let me thank you.
, it's me who is thankful!
Peter C wrote: As far as I can tell, smooth scaling makes it weaker (sometimes significantly), but has a habit of suggesting interesting analysis moves. I put it in there mostly just because I could and it can be handy for analysis, but it for sure has a negative Elo value (which is why it's off by default). The parameters for it could be tuned a bit, maybe we can get something useful from it.
Oh, indeed, smooth scaling improves playing style. At some point I was using the s version alongside the default version with tolerable levels of redundancy. I think I can do so once we get stable builds (nobody has suggested Stockfish Learning code yet).
Jeremy Bernstein wrote: Uly, would you be interested in a version with lower granularity, as well?
Yes! One problem with Stockfish is that eventually, after interactive analysis, several moves are tied in score, and one doesn't know where to continue. Going by number of positions isn't very efficient, and here one needs a different engine that helps Stockfish by guiding it to the move that should be analyzed next (the one with the highest score).
.15/14 0:00 +0.16-- 1.Nf3 Nf6 2.e3 d5 3.d4 e6 4.Bd3 c5 5.O-O c4 6.Be2 Nc6 7.Nc3 Bd6 (509.807) 615
15/09 0:01 +0.32++ 1.e4 Nf6 2.e5 Nd5 3.Nf3 Nc6 4.d4 e6 5.Bd3 (928.123) 674
15/09 0:01 +0.40++ 1.e4 Nf6 2.e5 Nd5 3.Nf3 Nc6 4.d4 e6 5.Bd3 (1.152.385) 695
15/09 0:02 +0.56++ 1.e4 Nf6 2.e5 Nd5 3.Nf3 Nc6 4.d4 e6 5.Bd3 (1.424.940) 712
15/16 0:03 +0.44 1.e4 e5 2.Nf3 Bd6 3.Nc3 Nf6 4.Be2 Nc6 5.O-O O-O 6.d3 b6 7.d4 exd4 8.Nxd4 Bb7 (2.247.548) 730
16/18 0:04 +0.24-- 1.e4 e5 2.Nf3 Nf6 3.Nxe5 d6 4.Nf3 Nxe4 5.Bd3 Nf6 6.O-O Be7 7.Nc3 O-O 8.Re1 Nc6 9.Ng5 d5 (3.151.485) 733
16/26 0:05 +0.32 1.e4 e5 2.Nf3 Nf6 3.Nxe5 d6 4.Nf3 Nxe4 5.Qe2 Qe7 6.Nc3 Nxc3 7.dxc3 Qxe2+ 8.Bxe2 Nc6 9.Be3 Be7 10.O-O-O O-O 11.Kb1 Be6 12.Ng5 a6 13.Nxe6 fxe6 (4.052.349) 738
.17/26 0:05 +0.32 1.e4 e5 2.Nf3 Nf6 3.Nxe5 d6 4.Nf3 Nxe4 5.Qe2 Qe7 6.Nc3 Nxc3 7.dxc3 Qxe2+ 8.Bxe2 Nc6 9.Be3 Be7 10.O-O-O O-O 11.Kb1 Be6 12.Ng5 a6 13.Nxe6 fxe6 (4.252.062) 751
.18/26 0:06 +0.32 1.e4 e5 2.Nf3 Nf6 3.Nxe5 d6 4.Nf3 Nxe4 5.Qe2 Qe7 6.Nc3 Nxc3 7.dxc3 Qxe2+ 8.Bxe2 Nc6 9.Be3 Be7 10.O-O-O O-O 11.Kb1 Be6 12.Ng5 a6 13.Nxe6 fxe6 (4.601.681) 751
.19/25 0:07 +0.40++ 1.e4 e5 2.Nf3 Nf6 3.Nxe5 d6 4.Nf3 Nxe4 5.Qe2 Qe7 6.Nc3 Nxc3 7.dxc3 Qxe2+ 8.Bxe2 Nc6 9.Be3 Be7 10.O-O-O O-O 11.Kb1 Be6 12.Ng5 a6 13.h4 (5.619.232) 755
19/19 0:08 +0.48++ 1.e4 e5 2.Nf3 Nf6 3.d4 exd4 4.e5 Qe7 5.Be2 Ng4 6.Qxd4 d6 7.exd6 cxd6 8.Nc3 Nc6 9.Qa4 Qe6 10.O-O (6.811.131) 759
19/26 0:11 +0.32 1.e4 e5 2.Nf3 Nf6 3.d4 exd4 4.e5 Qe7 5.Be2 Ng4 6.Qxd4 h5 7.Nc3 Nc6 8.Qf4 Ncxe5 9.O-O c6 10.Re1 d6 11.h3 Nxf3+ 12.Qxf3 Ne5 13.Qf4 Be6 (8.935.601) 757
Jeremy Bernstein wrote: Uly, would you be interested in a version with lower granularity, as well?
Uly wrote: Yes! One problem with Stockfish is that eventually after interactive analysis, several moves are tied in score, and one doesn't know where to continue. Going by number of positions isn't very efficient, and here one needs a different engine that helps Stockfish by guiding him into what move would be analyzed next (the one with the highest score). Without granularity, the problem would be 1/4 less likely (or basically disappear, since with other engines moves that tie after interaction are very rare, unless it's a transposition).
Try this build (64-bit). Should have a granularity of 2 now, but I can't say if it will perform any better. Also, if you notice a reversion to a granularity of 8 in endgame positions, please let me know.
Uly wrote: Thanks! Out of curiosity, what caused the granularity? It's expected that the more precise evaluation of positions would bring better results, so blurring the lines and making two moves that could really be 7 centipawns apart have the same score seems like a weird design choice.
It's simply part of the code (there is a variable called GrainSize which is used in a few places, and sometimes it's hard-coded). Presumably, Marco and the boys found that a granularity of 8 performed better -- maybe he could comment. Anyway, give it a run. I'm curious if the increased subtlety brings any improvement.
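To make the effect concrete, here is a small Python sketch of what a grain size does to scores. It assumes the evaluation is rounded to the nearest multiple of a power-of-two grain; how Stockfish actually applies GrainSize in each place may differ. The function name `apply_grain` and the sample scores are made up for illustration.

```python
def apply_grain(value, grain_size=8):
    """Round an integer score to a multiple of grain_size.

    Assumes grain_size is a power of two. Adding half a grain and
    masking off the low bits rounds to the nearest multiple
    (ties go upward). This is an illustrative guess, not a copy of
    the Stockfish code.
    """
    return (value + grain_size // 2) & ~(grain_size - 1)

# Two hypothetical moves whose raw scores are 7 units apart.
raw_a, raw_b = 20, 27

# With a grain of 8 both collapse to the same score.
print(apply_grain(raw_a, 8), apply_grain(raw_b, 8))

# With a grain of 2 they stay distinguishable.
print(apply_grain(raw_a, 2), apply_grain(raw_b, 2))
```

Under this assumption, any two raw scores that fall in the same bucket of 8 come out tied, which is the kind of tie described above. A grain of 2 leaves far fewer raw scores sharing a bucket.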