value of LMR and null-move

hyatt · Post by **hyatt** » Wed Jul 14, 2010 1:58 am

Sentinel wrote:
hyatt wrote: Now I am lost. Did you disable NM, FUTILITY and LMR in your test or just LMR?

My tests have been only without LMR (for stockfish)...
Both LMR and null move (but not futility pruning).
So it's a combined effect and that's the reason the difference is even higher on larger depths.
The bottom line is that at 1'+1'' they should bring around 200 Elo combined in most of today's top programs (SF, Ippo, Rybka, even Crafty 23.3).

I will test only LMR once 4'+4'' test is finished (in 3 days time).

OK, now explain how that belongs in the discussion being held? We are trying to measure the effect of LMR only. And you provide bigger numbers than anyone else. I don't see how that furthers the discussion at all when we are talking apples, oranges and lemons. To compare, we have to compare something that makes some sort of sense. I was clear as to what I removed from Stockfish. I believe Ed was also just as clear about what he removed from Stockfish for his experiments. Throwing in null-move, with a different engine to boot, doesn't really help in the debate about what LMR is providing. We could have a discussion about turning both off, or about turning null-move off by itself, which is fine. But mixing things up is only confusing. I had thought your original numbers were surprising in light of the numbers Ed and I are seeing, and at first thought "perhaps LMR is significantly different in ip* and friends." Wrong assumption.

Sentinel · Post by **Sentinel** » Wed Jul 14, 2010 4:45 pm

hyatt wrote:We are trying to measure the effect of LMR only. And you provide bigger numbers than anyone else. I don't see how that furthers the discussion at all when we are talking apples, oranges and lemons. To compare, we have to compare something that makes some sort of sense. I was clear as to what I removed from Stockfish. I believe Ed was also just as clear about what he removed from Stockfish for his experiments. Throwing in null-move, with a different engine to boot, doesn't really help in the debate about what LMR is providing. We could have a discussion about turning both off, or about turning null-move off by itself, which is fine. But mixing things up is only confusing. I had thought your original numbers were surprising in light of the numbers Ed and I are seeing, and at first thought "perhaps LMR is significantly different in ip* and friends." Wrong assumption.

Well the title of the thread (your thread) is "value of LMR and null-move". Even your first post measured both (plus separate components).
I started my tests back then, not later in the thread when you suddenly decided to measure just LMR. And each time I stated clearly that the results are default vs. no LMR, no null move. I'm sorry you feel disappointed, but you should have read what I was writing more carefully...
And I will do just LMR test, but when I finish with the current one. For more I don't have enough computing power.

hyatt · Post by **hyatt** » Wed Jul 14, 2010 5:31 pm

Sentinel wrote:
hyatt wrote:We are trying to measure the effect of LMR only. And you provide bigger numbers than anyone else. I don't see how that furthers the discussion at all when we are talking apples, oranges and lemons. To compare, we have to compare something that makes some sort of sense. I was clear as to what I removed from Stockfish. I believe Ed was also just as clear about what he removed from Stockfish for his experiments. Throwing in null-move, with a different engine to boot, doesn't really help in the debate about what LMR is providing. We could have a discussion about turning both off, or about turning null-move off by itself, which is fine. But mixing things up is only confusing. I had thought your original numbers were surprising in light of the numbers Ed and I are seeing, and at first thought "perhaps LMR is significantly different in ip* and friends." Wrong assumption.
Well the title of the thread (your thread) is "value of LMR and null-move". Even your first post measured both (plus separate components).
I started my tests back then, not later in the thread when you suddenly decided to measure just LMR. And each time I stated clearly that the results are default vs. no LMR, no null move. I'm sorry you feel disappointed, but you should have read what I was writing more carefully...
And I will do just LMR test, but when I finish with the current one. For more I don't have enough computing power.

Yes, I had (in Crafty) measured each. Then we started to concentrate on LMR only, as per Ed's comments. Your tests are fine, but in every post I wrote I _clearly_ indicated what was being tested. The discussion started in another thread that was poorly titled for what was being discussed so the current discussion moved here (I think I started the thread).

For both LMR and NM, your numbers look pretty close to mine with Crafty. Why stockfish only drops off by 50 or so when LMR is removed is not known, yet...

Rebel · Post by **Rebel** » Wed Jul 14, 2010 8:52 pm

hyatt wrote: For both LMR and NM, your numbers look pretty close to mine with Crafty. Why stockfish only drops off by 50 or so when LMR is removed is not known, yet...

SF 1.8 vs SF 1.8 (no LMR), 15 min blitz, ended in +37 =56 -7, way above 50 Elo.
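
For reference, a match result like this can be turned into an Elo difference. This is a minimal sketch, assuming the usual 400-point logistic Elo model; the function name `elo_diff` is only illustrative and does not come from any tool used in this thread.

```python
import math

def elo_diff(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a match score under the logistic model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    if not 0.0 < score < 1.0:
        raise ValueError("a 0% or 100% score has no finite Elo difference")
    return 400.0 * math.log10(score / (1.0 - score))

# Result quoted above: +37 =56 -7 over 100 games (a 65% score)
print(round(elo_diff(37, 56, 7)))
```

For +37 =56 -7 this works out to roughly 108 Elo, although with only 100 games the uncertainty on that figure is large.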

You will argue of course but I think for this kind of search related test you need a decent TC so that the power of LMR can have its real influence as its strength comes from deeper searches.

Why not play 3000 games instead of 30,000 and multiply the TC by 10? For the subject at hand we are not interested in an exact Elo to 2 decimals; an error margin of +5/-5 Elo is perfectly acceptable.
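
The trade-off between game count and error margin can be estimated roughly. This is a sketch under stated assumptions: two near-equal engines, a fixed draw rate, independent games, and a normal approximation for a 95% interval. The 35% draw rate is only a ballpark, and the function name `elo_margin_95` is hypothetical.

```python
import math

def elo_margin_95(games: int, draw_rate: float = 0.35) -> float:
    """Approximate 95% Elo error margin for a match between near-equal engines."""
    # Per-game score variance when wins and losses are equally likely
    variance = (1.0 - draw_rate) / 4.0
    # Standard error of the mean score over the whole match
    std_err = math.sqrt(variance / games)
    # Slope of the logistic Elo curve at a 50% score (Elo per unit of score)
    elo_per_score = 400.0 / (math.log(10) * 0.25)
    return 1.96 * std_err * elo_per_score

for n in (3000, 30000):
    print(n, round(elo_margin_95(n), 1))
```

Under these assumptions the margin comes out at roughly ±10 Elo for 3000 games and about ±3 for 30,000. It shrinks with the square root of the game count, so playing ten times fewer games widens it by a factor of about 3.2.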

Ed

hyatt · Post by **hyatt** » Wed Jul 14, 2010 9:57 pm

Rebel wrote:
hyatt wrote: For both LMR and NM, your numbers look pretty close to mine with Crafty. Why stockfish only drops off by 50 or so when LMR is removed is not known, yet...
SF 1.8 vs SF 1.8 (no LMR), 15 min blitz, ended in +37 =56 -7, way above 50 Elo.

You will argue of course but I think for this kind of search related test you need a decent TC so that the power of LMR can have its real influence as its strength comes from deeper searches.

Why not play 3000 games instead of 30,000 and multiply the TC by 10? For the subject at hand we are not interested in an exact Elo to 2 decimals; an error margin of +5/-5 Elo is perfectly acceptable.

Ed

I can probably get away with a small number of games since the two programs are not going to be within 20 of each other. However, I am running 1+1 so this will turn into 10m+10s which will be pretty long. Will start it right now...

However, last time I tried this, I tried 10s+0.1s, 1m+1s and 5m+5s and did not see any significant difference in the spread between the two versions. Let's see what happens to stockfish, again with no book or anything...

hyatt · Post by **hyatt** » Wed Jul 14, 2010 10:48 pm

OK, I am now playing 10m+10s games (a good bit slower than 15+0 blitz, but it avoids any time scrambles). It is slow going, but I am simply playing stockfish-normal against stockfish-noLMR + 4 other opponents, and then stockfish-noLMR against the other 4 opponents as well. Looks like this so far:

Rank Name                  Elo    +    - games score oppo. draws
   1 Stockfish 1.8 64bit   2793   67   67    65   75%  2602   35% 
   2 Stockfish 1.8a 64bit  2728   66   66    62   60%  2626   37% 
   3 Toga2                 2649  114  114    20   35%  2763   30% 
   4 Glaurung 2.2          2590  132  132    14   25%  2765   36% 
   5 Fruit 2.1             2436  149  149    19   13%  2758    5% 
   6 Glaurung 1.1 SMP      2404  146  146    22    9%  2760    9%

Since this is about 2 games per hour per node, and I am currently running on 1/2 the cluster less 6 nodes (someone else is using those) that leaves a good 50 nodes or so to run. So about 100 games per hour or so. Should hopefully have this done by tomorrow, although I will post an update every now and then. Right now +65 separates the two.

hyatt · Post by **hyatt** » Thu Jul 15, 2010 12:56 am

results pretty stable, so far:

Rank Name                  Elo    +    - games score oppo. draws
   1 Stockfish 1.8 64bit   2801   34   34   272   78%  2589   33% 
   2 Stockfish 1.8a 64bit  2742   32   32   268   66%  2612   34% 
   3 Glaurung 2.2          2579   58   58    86   22%  2771   36% 
   4 Toga2                 2549   65   65    78   19%  2772   26% 
   5 Fruit 2.1             2472   69   69    87   13%  2772   18% 
   6 Glaurung 1.1 SMP      2457   66   66   103   12%  2772   17%

1.8a has no LMR, 1.8 is normal. +59 so far.

As I said, I don't find any difference at short or long time controls for this particular algorithm. Except perhaps down in the +/-10 range.

About 4 or 5 hours into the test, played just over 500 games so far...

Should be done tomorrow around noonish or so...

Just remembered: the games column is a bit misleading. Each Stockfish plays everybody else, so if you add up the two Stockfish totals, some games are counted twice, since a 1.8 vs 1.8a game shows up in the totals for both 1.8 and 1.8a. So it is not playing quite as many games per hour as the above totals would suggest, not that it matters.
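
The double counting can be undone with a little arithmetic. This is a minimal sketch, assuming (as described earlier) that the four other opponents play only the two Stockfish versions and never each other. The game counts are copied from the table above; the variable names are illustrative.

```python
# Games column from the table above
games = {
    "Stockfish 1.8 64bit": 272,
    "Stockfish 1.8a 64bit": 268,
    "Glaurung 2.2": 86,
    "Toga2": 78,
    "Fruit 2.1": 87,
    "Glaurung 1.1 SMP": 103,
}
stockfish = ("Stockfish 1.8 64bit", "Stockfish 1.8a 64bit")

# Every game appears once in each participant's row, so the sum counts it twice
distinct_games = sum(games.values()) // 2

# Games against the other engines appear once in a Stockfish row and once in
# the opponent's row; what is left in the Stockfish rows is head-to-head,
# counted twice
stockfish_rows = sum(games[name] for name in stockfish)
other_rows = sum(n for name, n in games.items() if name not in stockfish)
head_to_head = (stockfish_rows - other_rows) // 2

print(distinct_games, head_to_head)
```

For that table this gives 447 distinct games, 93 of them between the two Stockfish versions.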

hyatt · Post by **hyatt** » Thu Jul 15, 2010 2:47 am

Next update:

Rank Name                  Elo    +    - games score oppo. draws
   1 Stockfish 1.8 64bit   2807   25   25   481   78%  2589   34% 
   2 Stockfish 1.8a 64bit  2742   24   24   470   66%  2613   37% 
   3 Toga2                 2566   47   47   142   21%  2775   29% 
   4 Glaurung 2.2          2560   45   45   152   19%  2775   32% 
   5 Fruit 2.1             2475   52   52   151   13%  2775   20% 
   6 Glaurung 1.1 SMP      2450   51   51   178   11%  2775   17%

now almost 1,000 games down, +65 for LMR over noLMR...

hyatt · Post by **hyatt** » Thu Jul 15, 2010 4:12 am

ok, now over 1,200 games into this. The error bars are down to +/-22, which means we are closing in on "the truth". Looks like the difference is now +62:

Rank Name                  Elo    +    - games score oppo. draws
   1 Stockfish 1.8 64bit   2812   23   23   625   79%  2587   34% 
   2 Stockfish 1.8a 64bit  2750   22   22   609   67%  2611   35% 
   3 Toga2                 2571   41   41   187   21%  2781   29% 
   4 Glaurung 2.2          2556   41   41   199   18%  2781   31% 
   5 Fruit 2.1             2479   46   46   199   13%  2782   19% 
   6 Glaurung 1.1 SMP      2433   47   47   231   10%  2782   15%

hyatt · Post by **hyatt** » Thu Jul 15, 2010 5:14 am

+61 now, it is simply not going to change much more. Realistically even a +10 jump now would be a big one. last update tonight:

Rank Name                  Elo    +    - games score oppo. draws
   1 Stockfish 1.8 64bit   2814   21   21   727   79%  2587   34% 
   2 Stockfish 1.8a 64bit  2753   20   20   709   68%  2610   36% 
   3 Toga2                 2567   38   38   223   20%  2783   30% 
   4 Glaurung 2.2          2562   38   38   228   19%  2784   31% 
   5 Fruit 2.1             2482   43   43   231   13%  2784   20% 
   6 Glaurung 1.1 SMP      2423   44   44   270    9%  2784   14%
