
Re: On "clone testing"

Posted: Tue Dec 28, 2010 3:49 am
by Sentinel
BB+ wrote:If my understanding is correct, it has to do with Stockfish polling only once every 30000 nodes, while IH does every 4K. This makes SF much more stable in short searches (in the case here, there is a nice point at which to break between 120000 and 150000 nodes, and usually it hits exactly that).
Yup, that is exactly the case: it comes down to how well an engine respects the TC.
That is why only fixed-depth tests are meaningful here.
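To make the mechanism concrete, here is a minimal sketch of node-count time polling. It assumes nothing about the real Stockfish or IvanHoe sources beyond the intervals quoted above; the variable and function names are my own:

Code:
#include <chrono>

constexpr long POLL_INTERVAL = 30000;            // Stockfish figure quoted above; an IvanHoe-style engine would use 4096
long nodes = 0;
bool stop_search = false;
std::chrono::steady_clock::time_point deadline;  // set when the search starts

void check_time() {
    if (std::chrono::steady_clock::now() >= deadline)
        stop_search = true;                      // abort at the next opportunity
}

void on_node_visited() {                         // called once per node searched
    if (++nodes % POLL_INTERVAL == 0)            // the clock is ONLY read here
        check_time();
}

Because the clock is only consulted every POLL_INTERVAL nodes, a fixed-movetime search can only stop at (roughly) a multiple of that interval. That is why the break BB+ observes lands so reliably between 120000 and 150000 nodes: those are the 4th and 5th multiples of 30000.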

Re: On "clone testing"

Posted: Tue Dec 28, 2010 3:53 am
by Sentinel
BB+ wrote:If my understanding is correct, the "go movetime 100" stability has to do with Stockfish polling only once every 30000 nodes, while IH does every 4K. This makes SF much more stable in short searches (in the case here, there is a nice point at which to break between 120000 and 150000 nodes, and usually it hits exactly that). Perhaps this is the sort of thing that should be "immediately obvious", as it were.
Another interesting thing: strength-wise, on ultra-short TCs (in the range of a couple of hundred polling intervals), faster polling gives better results despite the extra time overhead. Why? Because it produces more balanced move times during the game, and smooth time management is always better than erratic time management ;).
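A rough worked example of why the interval matters at these TCs (the 1 Mnps speed is just an assumed round number): at 1,000,000 nodes per second, a 30000-node interval means the clock is read only about every 30 ms, so a 100 ms move budget can be overshot by up to ~30%. A 4K interval tightens that to roughly 4 ms, or ~4% of the budget. Over a whole game, the coarser interval turns into exactly the erratic move times described above.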

Re: On "clone testing"

Posted: Tue Dec 28, 2010 4:19 am
by Sentinel
BB+ wrote:If my understanding is correct, the "go movetime 100" stability has to do with Stockfish polling only once every 30000 nodes, while IH does every 4K. This makes SF much more stable in short searches (in the case here, there is a nice point at which to break between 120000 and 150000 nodes, and usually it hits exactly that). Perhaps this is the sort of thing that should be "immediately obvious", as it were.
Just to conclude from this "immediately obvious" :).
Running Don's tool at anything below 10s per move basically tells you more about the correlation of polling intervals between engines than about anything else, which makes all the tool results currently being published on CCC completely and utterly ridiculous.

Re: On "clone testing"

Posted: Tue Dec 28, 2010 4:31 am
by BB+
So you really think Naum 4.2 fitted the Rybka 2.2 polling mechanism? :lol:

Re: On "clone testing"

Posted: Tue Dec 28, 2010 4:43 am
by Sentinel
BB+ wrote:So you really think Naum 4.2 fitted the Rybka 2.2 polling mechanism? :lol:
If the base self-test scores are close (at 100ms, for example), then since their strength is already close, I would most probably say yes.

Re: On "clone testing"

Posted: Tue Dec 28, 2010 12:50 pm
by kingliveson
Dailey may be trying to have it both ways: on one hand, re-releasing his "clone detector" as "similar", starting yet another "clone detector" thread (edited out now), and saying it can be used to exonerate programs; on the other hand, acting surprised that a poll shows people think it's the ultimate clone-detecting tool. lol. It wouldn't surprise me if you got the same percentage voting yes in an "is the earth flat" poll.

Anyway, it's a nice tool to have. When you think about it, though, what is the difference between this utility and the Strategic Test Suite (STS) used to measure engines' strength? This has already been said and debated, but again: it makes sense that, given a position, there is usually a "best move", and stronger engines will find that move. So there's definitely a correlation between strength and similarity.
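For what it's worth, the core of such a tool is tiny. A hedged sketch of the idea (this is not Dailey's actual implementation; bestMoveAtMovetime() is a hypothetical UCI wrapper):

Code:
#include <string>
#include <vector>

// Hypothetical wrapper: launch an engine, send "go movetime <ms>"
// from the given FEN, and return the bestmove it prints.
std::string bestMoveAtMovetime(const std::string& engine,
                               const std::string& fen, int ms);

double similarity(const std::string& engineA,
                  const std::string& engineB,
                  const std::vector<std::string>& positions, int ms) {
    int matches = 0;
    for (const auto& fen : positions)
        if (bestMoveAtMovetime(engineA, fen, ms) ==
            bestMoveAtMovetime(engineB, fen, ms))
            ++matches;
    return 100.0 * matches / positions.size();   // percent of identical moves
}

And that is exactly why the number correlates with strength: as engines get stronger they converge on the objectively best move more often, so they match each other more often too, clone or not.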

There are a host of reasons why this utility fails as a clone detector. For one, what is the cutoff percentage: 60, 75, or slightly greater? If someone were to take a hex editor and modify an engine's name, for reasons I cannot understand, you could use this tool to check. The truth is, it's a black-box method which can produce many false positives. Examination of the binary, and of the source if available, is probably the only way to truly make a determination. The excitement expressed by some of our friends who believe the alpha and omega of clone-detecting tools has arrived may come from a lack of technical background on the matter.

In any case, I ran a small test using Dailey's "similar" tool and created a radar map that could potentially cause headaches if stared at too long:

[Image: radar map of pairwise engine similarity results]


P.S. This tool confirms IvanHoe's imperfect SMP implementation. While other multi-core engines use between 90-100% of CPU resources, IvanHoe hovers around 50%. Can you say unnecessary "if TITANIC_MODE" for loop?!
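We can't see IvanHoe's internals from the tool output alone, but a helper-thread loop of roughly the following shape would produce exactly that symptom. This is a guessed illustration of the pattern, not IvanHoe's actual code, and both helper functions are hypothetical:

Code:
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> quit{false};

bool try_join_split_point();   // hypothetical: true if a split point has work
void help_at_split_point();    // hypothetical: search a share of that work

void helper_thread_loop() {
    while (!quit.load()) {
        if (try_join_split_point())
            help_at_split_point();               // useful work: CPU is busy
        else
            std::this_thread::sleep_for(         // no work: core goes idle
                std::chrono::milliseconds(1));
    }
}

If the helpers poll for split-point work and sleep whenever they find none, they sit idle while the master searches alone, and overall utilization settles well below the core count.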

Re: On "clone testing"

Posted: Tue Dec 28, 2010 1:16 pm
by Jeremy Bernstein
kingliveson wrote:In anycase, I ran a small test using Dailey's "similar" tool and created a radar map that could potentially cause headaches if stared at too long:
Um, yeah. This sort of data is probably better visualized as a simple crosstable... The number of similar colors makes it kind of hard to figure out whether Robbolito is more similar to Fruit 2.1 or to Houdini 1.01 (although I'm pretty sure it's Houdini), and some peaks are simply hidden behind others. Agreed that this isn't a cure-all, failsafe clone detector. The bumps between certain presumed close relatives are difficult to ignore, though. Without more controls, it's hard to know what's statistically significant, even for the casual eyeballer.
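For comparison, the crosstable presentation is trivial to generate once the pairwise percentages exist. A minimal sketch (names, widths, and data layout are placeholders, not the tool's actual output format):

Code:
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

void print_crosstable(const std::vector<std::string>& names,
                      const std::vector<std::vector<double>>& sim) {
    std::printf("%-12s", "");                    // empty corner cell
    for (const auto& n : names)
        std::printf("%10.8s", n.c_str());        // column headers
    std::printf("\n");
    for (std::size_t i = 0; i < names.size(); ++i) {
        std::printf("%-12.12s", names[i].c_str());
        for (double s : sim[i])
            std::printf("%10.1f", s);            // similarity in percent
        std::printf("\n");
    }
}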

Re: On "clone testing"

Posted: Tue Dec 28, 2010 3:04 pm
by kranium
wow...i don't really understand it
but it's beautiful!

have you ever considered a career in graphic design?

Re: On "clone testing"

Posted: Tue Dec 28, 2010 4:53 pm
by kingliveson
Get your magnifiers... hope the table helps translate the plot.
[Image: crosstable of similarity percentages between the tested engines]

Re: On "clone testing"

Posted: Tue Dec 28, 2010 4:57 pm
by Sentinel
kingliveson wrote:Get your magnifiers...hope the table help translates the plot.
You put 100 for all the self-tests, which you would not get even at a 1h-per-move TC. Did you actually run this, or did you just fill in the table?