On "clone testing"

Sentinel
Posts: 122
Joined: Thu Jun 10, 2010 12:49 am
Real Name: Milos Stanisavljevic

Re: On "clone testing"

Post by Sentinel » Tue Dec 28, 2010 3:49 am

BB+ wrote:If my understanding is correct, it has to do with Stockfish polling only once every 30000 nodes, while IH does every 4K. This makes SF much more stable in short searches (in the case here, there is a nice point at which to break between 120000 and 150000 nodes, and usually it hits exactly that).
Yup, that is exactly the case: it comes down to how well an engine respects the TC.
Therefore, only fixed-depth tests are meaningful.
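
To make the polling point concrete, here is a minimal sketch of how the poll interval quantizes the node count at which a "go movetime 100" search stops. It is a toy model, not code from Stockfish or IvanHoe: the 1.35M nodes/second speed, the +/-10% run-to-run jitter, and the function names are all assumptions chosen for illustration. Only the 30000 and 4K poll intervals come from the quote above.

```python
import random

def nodes_at_stop(poll_interval, movetime_ms, nps, jitter, rng):
    """Node count at which the search stops when the clock is
    checked only once every poll_interval nodes."""
    # Hypothetical run-to-run speed variation (OS noise, cache effects).
    speed = nps * (1.0 + rng.uniform(-jitter, jitter))
    nodes_in_budget = speed * movetime_ms / 1000.0
    # The timeout is noticed at the first poll after the budget runs out.
    polls = int(nodes_in_budget // poll_interval) + 1
    return polls * poll_interval

def distinct_stops(poll_interval, trials=1000, seed=1):
    """Sorted list of the different stopping points seen over many runs."""
    rng = random.Random(seed)
    return sorted({
        nodes_at_stop(poll_interval, 100, 1_350_000, 0.10, rng)
        for _ in range(trials)
    })

coarse = distinct_stops(30000)  # poll every 30000 nodes
fine = distinct_stops(4096)     # poll every 4K nodes
print(len(coarse), len(fine))
```

Under these assumed numbers the time budget always runs out between 120000 and 150000 nodes, so the coarse poller stops at 150000 every time, while the fine poller stops at several different node counts and can therefore end up reporting different moves from run to run.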

Sentinel
Posts: 122
Joined: Thu Jun 10, 2010 12:49 am
Real Name: Milos Stanisavljevic

Re: On "clone testing"

Post by Sentinel » Tue Dec 28, 2010 3:53 am

BB+ wrote:If my understanding is correct, the "go movetime 100" stability has to do with Stockfish polling only once every 30000 nodes, while IH does every 4K. This makes SF much more stable in short searches (in the case here, there is a nice point at which to break between 120000 and 150000 nodes, and usually it hits exactly that). Perhaps this is the sort of thing that should be "immediately obvious", as it were.
Another interesting thing: strength-wise, at ultra-short TCs (in the range of a couple of hundred polling intervals), faster polling gives better results despite the extra time overhead. Why? Because it gives more balanced move times during the game, and a smooth TM is always better than an erratic one ;).
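
A rough bit of arithmetic shows why the polling interval matters so much at these TCs. The sketch below computes the worst-case time a search can run past its deadline between two polls. The 1,000,000 nodes/second speed and the function names are assumptions for illustration, not measurements of any engine.

```python
def worst_case_overshoot_ms(poll_interval, nps):
    """Upper bound (ms) on how far a search can run past its
    deadline, i.e. the time needed to search one poll interval."""
    return 1000.0 * poll_interval / nps

def overshoot_share(poll_interval, nps, movetime_ms):
    """Worst-case overshoot as a fraction of the intended move time."""
    return worst_case_overshoot_ms(poll_interval, nps) / movetime_ms

ASSUMED_NPS = 1_000_000  # hypothetical engine speed

for interval in (30000, 4096):
    ms = worst_case_overshoot_ms(interval, ASSUMED_NPS)
    share = overshoot_share(interval, ASSUMED_NPS, 100)
    print(f"poll every {interval} nodes: up to {ms:.1f} ms late "
          f"({share:.0%} of a 100 ms move)")
```

At the assumed speed, polling every 30000 nodes can overshoot a 100 ms move by up to 30 ms, while polling every 4K nodes stays within about 4 ms, which is one way to read "more balanced move times".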

Sentinel
Posts: 122
Joined: Thu Jun 10, 2010 12:49 am
Real Name: Milos Stanisavljevic

Re: On "clone testing"

Post by Sentinel » Tue Dec 28, 2010 4:19 am

BB+ wrote:If my understanding is correct, the "go movetime 100" stability has to do with Stockfish polling only once every 30000 nodes, while IH does every 4K. This makes SF much more stable in short searches (in the case here, there is a nice point at which to break between 120000 and 150000 nodes, and usually it hits exactly that). Perhaps this is the sort of thing that should be "immediately obvious", as it were.
Just to conclude from this "immediately obvious" :).
Running Don's tool at anything below 10s per move basically gives more information on the polling-interval correlation between engines than on anything else, which makes all the tool results they are currently publishing on CCC completely and utterly ridiculous.

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: On "clone testing"

Post by BB+ » Tue Dec 28, 2010 4:31 am

So you really think Naum 4.2 fitted the Rybka 2.2 polling mechanism? :lol:

Sentinel
Posts: 122
Joined: Thu Jun 10, 2010 12:49 am
Real Name: Milos Stanisavljevic

Re: On "clone testing"

Post by Sentinel » Tue Dec 28, 2010 4:43 am

BB+ wrote:So you really think Naum 4.2 fitted the Rybka 2.2 polling mechanism? :lol:
If the base score for self-testing is close (at 100ms, for example), then since their strength is already close, I would most probably say yes.

kingliveson
Posts: 1388
Joined: Thu Jun 10, 2010 1:22 am
Real Name: Franklin Titus
Location: 28°32'1"N 81°22'33"W

Re: On "clone testing"

Post by kingliveson » Tue Dec 28, 2010 12:50 pm

Dailey may be trying to have it both ways: on one hand, re-releasing his "clone detector" as "similar," starting yet another "clone detector" thread (edited out now), and saying it can be used to exonerate programs; and on the other hand, being surprised by a poll showing that people think it's an ultimate clone-detecting tool. lol. It wouldn't surprise me if you got the same percentage voting yes in a poll on whether the earth is flat.

Anyways, it's a nice tool to have. When you think about it though, what is the difference between this utility and the Strategic Test Suite (STS) used to determine engines' strength? It has already been said and debated, but again, it would make sense that for a given position there is usually a "best move", and stronger engines will see that move. So there's definitely a correlation between strength and similarity.
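
The strength/similarity link can be sketched in a few lines. This assumes the tool's score is simply the percentage of test positions on which two engines choose the same move, which is my reading of it and not a description of Dailey's actual implementation. The move lists and function names are made up for illustration.

```python
def similarity(moves_a, moves_b):
    """Percentage of positions on which two engines picked the same move."""
    if len(moves_a) != len(moves_b):
        raise ValueError("both engines must be run on the same positions")
    if not moves_a:
        raise ValueError("need at least one position")
    same = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * same / len(moves_a)

def agreement_floor(p_best_a, p_best_b):
    """Lower bound on agreement (in percent) if two unrelated engines
    each find the best move independently with these probabilities."""
    return 100.0 * p_best_a * p_best_b

# Hypothetical move choices on four positions (not real engine output).
strong_1 = ["e2e4", "g1f3", "d2d4", "c2c4"]  # finds every best move
strong_2 = ["e2e4", "g1f3", "d2d4", "b1c3"]  # misses one
weak     = ["e2e4", "a2a3", "h2h4", "b2b3"]  # misses three

print(similarity(strong_1, strong_2))
print(similarity(strong_1, weak))
```

Two engines that each find the best move 90% of the time would, under the independence assumption, already agree on roughly 81% of positions without sharing a line of code, which is why a raw percentage needs a strength-matched baseline before it says anything about cloning.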

There are a host of flaws that make this utility fail as a clone detector. For one, what is the cutoff percentage: 60, 75, or slightly greater? If someone were to take a hex editor and modify an engine's name, for reasons I cannot understand, you could use this tool to check. The truth is, it's a black-box method which can result in many false positives. Examination of the binary and the source (if available) is probably the only way to truly make a determination. The excitement expressed by some of our friends, who believe the alpha and omega of clone-detecting tools has arrived, may be due to a lack of technical background on the matter.

In any case, I ran a small test using Dailey's "similar" tool and created a radar map that could potentially cause headaches if stared at too long:

Image


P.S. This tool confirms IvanHoe's imperfect SMP implementation. While other multi-core engines use between 90-100% of CPU resources, IvanHoe hovers around 50%. Can you say unnecessary "if TITANIC_MODE for loop?!"
Attachments
similarity.7z
similarity.data
(733.17 KiB) Downloaded 317 times
PAWN : Knight >> Bishop >> Rook >>Queen

Jeremy Bernstein
Site Admin
Posts: 1226
Joined: Wed Jun 09, 2010 7:49 am
Real Name: Jeremy Bernstein
Location: Berlin, Germany

Re: On "clone testing"

Post by Jeremy Bernstein » Tue Dec 28, 2010 1:16 pm

kingliveson wrote:In any case, I ran a small test using Dailey's "similar" tool and created a radar map that could potentially cause headaches if stared at too long:
Um, yeah. This sort of data is probably better visualized as a simple crosstable... The number of similar colors makes it hard to tell whether Robbolito is similar to Fruit 2.1 or Houdini 1.01 (although I'm pretty sure it's Houdini), and some peaks are simply hidden behind others. Agreed that this isn't a cure-all, failsafe clone detector. Still, the bumps between certain presumed close relatives are difficult to ignore. Without more controls, however, it's hard to know what's statistically significant, even for the casual eyeballer.
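
For what it's worth, a crosstable of this kind is only a few lines of code. The sketch below assumes each engine's output is a list of chosen moves over the same positions and that similarity is the percentage of matching moves. The engine names "A" and "B", the move data, and the `crosstable` function are placeholders, not output from the tool.

```python
def crosstable(engine_moves):
    """Render pairwise move-agreement percentages as a plain-text table.

    engine_moves maps an engine name to its list of chosen moves,
    one per test position, in the same order for every engine.
    """
    names = list(engine_moves)
    width = max(7, max(len(n) for n in names) + 2)
    # Header row: blank corner cell, then one column per engine.
    lines = [" " * width + "".join(n.rjust(width) for n in names)]
    for a in names:
        row = a.ljust(width)
        for b in names:
            ma, mb = engine_moves[a], engine_moves[b]
            pct = 100.0 * sum(x == y for x, y in zip(ma, mb)) / len(ma)
            row += f"{pct:.1f}".rjust(width)
        lines.append(row)
    return "\n".join(lines)

# Placeholder data: two engines, two positions.
table = crosstable({
    "A": ["e2e4", "d2d4"],
    "B": ["e2e4", "c2c4"],
})
print(table)
```

Note that the diagonal here is trivially 100 because each list is compared with itself; a genuine self-test would need two separate runs of the same engine.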

kranium
Posts: 55
Joined: Mon Aug 02, 2010 10:49 pm
Real Name: Norman Schmidt

Re: On "clone testing"

Post by kranium » Tue Dec 28, 2010 3:04 pm

wow...i don't really understand it
but it's beautiful!

have you ever considered a career in graphic design?

kingliveson
Posts: 1388
Joined: Thu Jun 10, 2010 1:22 am
Real Name: Franklin Titus
Location: 28°32'1"N 81°22'33"W

Re: On "clone testing"

Post by kingliveson » Tue Dec 28, 2010 4:53 pm

Get your magnifiers...hope the table helps translate the plot.
Image

Sentinel
Posts: 122
Joined: Thu Jun 10, 2010 12:49 am
Real Name: Milos Stanisavljevic

Re: On "clone testing"

Post by Sentinel » Tue Dec 28, 2010 4:57 pm

kingliveson wrote:Get your magnifiers...hope the table helps translate the plot.
You put 100 for all the self-tests, which you would not get even with a 1h-per-move TC. Did you actually run this, or did you just fill in the table?
