On "clone testing"

kingliveson · Post by **kingliveson** » Tue Dec 28, 2010 5:01 pm

Sentinel wrote:
kingliveson wrote:Get your magnifiers...hope the table helps translate the plot.
Since you put 100 for all the self-tests, which you would not get even with a 1h-per-move TC: did you actually run this, or did you just fill in the table?

That would be a misinterpretation: the 100 is a scale, not a self-test result.

Sentinel · Post by **Sentinel** » Tue Dec 28, 2010 5:14 pm

kingliveson wrote:
Sentinel wrote:
kingliveson wrote:Get your magnifiers...hope the table helps translate the plot.
Since you put 100 for all the self-tests, which you would not get even with a 1h-per-move TC: did you actually run this, or did you just fill in the table?
That would be a misinterpretation: the 100 is a scale, not a self-test result.

The problem is that rescaling makes the results completely invalid.
The first necessary assumption for the results to have any meaning (besides being a polling-mechanism correlation) is to find a way to have the same base (self-test) score for all engines.
Just to explain, because a lot of people seem not to understand the point.

Let's suppose the self-test score for Rybka 3 is 800, which means Rybka 3 misses 200 positions due to the "polling mechanism effect".
For Robbolito the self-test score is 700, which means Robbolito misses 300 positions due to the "polling mechanism effect".
Now you run Robbo against Rybka 3 and get, let's say, 600. So they don't agree in 400 positions. How many of these 400 are just due to the "polling mechanism effect"? How many of the 600 were luckily chosen just due to the "polling mechanism effect"?
We can never know.
You have a "polling mechanism" uncertainty of 20-30% and a difference in actual results of less than 10%. And it's not only random error; it's also a highly skewed (systematic) error in the mean value, so any error-margin calculation is useless.
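
The arithmetic of this example can be written out as a small Python sketch. The total of 1000 positions is inferred from the post (800 agreeing means 200 missed), and the function name `missed_counts` is made up for illustration; nothing here comes from the "similar" tool itself.

```python
def missed_counts(self_a, self_b, cross, total=1000):
    """Return positions missed in each self-test and positions where the two engines disagree.

    total=1000 is an assumption inferred from the example (800 hits -> 200 misses).
    """
    return total - self_a, total - self_b, total - cross


rybka_missed, robbo_missed, disagreements = missed_counts(800, 700, 600)

print(f"Rybka 3 self-test misses:    {rybka_missed}")
print(f"Robbolito self-test misses:  {robbo_missed}")
print(f"Rybka 3 vs Robbo disagree:   {disagreements}")

# The self-test misses are of the same order as the cross-test disagreements,
# so these counts alone cannot say how much of the disagreement is noise.
```

The point of the sketch is only that the self-test misses and the cross-test disagreements are of comparable size, which is why the post argues the two cannot be separated.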

kingliveson · Post by **kingliveson** » Tue Dec 28, 2010 5:32 pm

Sentinel wrote:
kingliveson wrote:
Sentinel wrote:
kingliveson wrote:Get your magnifiers...hope the table helps translate the plot.
Since you put 100 for all the self-tests, which you would not get even with a 1h-per-move TC: did you actually run this, or did you just fill in the table?
That would be a misinterpretation: the 100 is a scale, not a self-test result.
The problem is that rescaling makes the results completely invalid.
The first necessary assumption for the results to have any meaning (besides being a polling-mechanism correlation) is to find a way to have the same base (self-test) score for all engines.
Just to explain, because a lot of people seem not to understand the point.

Let's suppose the self-test score for Rybka 3 is 800, which means Rybka 3 misses 200 positions due to the "polling mechanism effect".
For Robbolito the self-test score is 700, which means Robbolito misses 300 positions due to the "polling mechanism effect".
Now you run Robbo against Rybka 3 and get, let's say, 600. So they don't agree in 400 positions. How many of these 400 are just due to the "polling mechanism effect"? How many of the 600 were luckily chosen just due to the "polling mechanism effect"?
We can never know.
You have a "polling mechanism" uncertainty of 20-30% and a difference in actual results of less than 10%. And it's not only random error; it's also a highly skewed (systematic) error in the mean value, so any error-margin calculation is useless.

You are talking oranges and I, apples. No one said the results were gospel truth, but for the plot to match the output of the "similar" tool, it needs to be scaled to 100%.

Sentinel · Post by **Sentinel** » Tue Dec 28, 2010 5:49 pm

kingliveson wrote:You are talking oranges and I, apples. No one said the results were gospel truth, but for the plot to match the output of the "similar" tool, it needs to be scaled to 100%.

I understand. It's nicer for a graph; however, why don't you give a table with the non-scaled results?

kingliveson · Post by **kingliveson** » Tue Dec 28, 2010 6:13 pm

Sentinel wrote:
kingliveson wrote:You are talking oranges and I, apples. No one said the results were gospel truth, but for the plot to match the output of the "similar" tool, it needs to be scaled to 100%.
I understand. It's nicer for a graph; however, why don't you give a table with the non-scaled results?

It doesn't have anything to do with producing a nice graph. The plot is an actual representation of the output from the tool, which was parsed and pasted into the table.

Sentinel · Post by **Sentinel** » Tue Dec 28, 2010 6:22 pm

kingliveson wrote:It doesn't have anything to do with producing a nice graph. The plot is an actual representation of the output from the tool, which was parsed and pasted into the table.

I don't get you at all. You have 100 in the table and in the plot for the self-test. Could you answer two simple questions?
Where did this 100 come from?
Do you have a real number or not, and if you do, could you please post it?

kingliveson · Post by **kingliveson** » Tue Dec 28, 2010 6:27 pm

Sentinel wrote:
kingliveson wrote:It doesn't have anything to do with producing a nice graph. The plot is an actual representation of the output from the tool, which was parsed and pasted into the table.
I don't get you at all. You have 100 in the table and in the plot for the self-test. Could you answer two simple questions?
Where did this 100 come from?
Do you have a real number or not, and if you do, could you please post it?

X:\chess\similar>similar -r 25
------ Fruit 2.1 (time: 100 ms) ------
 66.85  Fruit Beta X1 (time: 100 ms)
 66.10  Fruit 2.3 (time: 100 ms)
 63.95  Strelka 2.0 B (time: 100 ms)
 62.10  Umko 1.1 x64 (time: 100 ms)
 61.80  Rybka 1.0 Beta 32-bit (time: 100 ms)
 60.95  Rybka 2.3.2a mp  (time: 100 ms)

How would you represent this data in a table and graph? Once you've answered this, you will see where the 100 comes from.

Edit: data was attached to this post.
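
For readers following the "parsed and pasted into the table" step, here is a minimal Python sketch of how one block of this output could be turned into a table row. The regular expressions are guessed from the seven lines shown above and may not cover everything the "similar" tool prints; `parse_report` is a hypothetical helper, not part of the tool. The reference engine's own cell is stored as `None` because the output contains no measured self-test value; whether to display that cell as 100 or leave it empty is exactly the choice being argued about in this thread.

```python
import re

# Sample copied from the post above; the real report may contain more blocks.
SAMPLE = """\
------ Fruit 2.1 (time: 100 ms) ------
 66.85  Fruit Beta X1 (time: 100 ms)
 66.10  Fruit 2.3 (time: 100 ms)
 63.95  Strelka 2.0 B (time: 100 ms)
 62.10  Umko 1.1 x64 (time: 100 ms)
 61.80  Rybka 1.0 Beta 32-bit (time: 100 ms)
 60.95  Rybka 2.3.2a mp  (time: 100 ms)
"""

# Assumed line shapes, inferred only from the sample above.
HEADER = re.compile(r"^-+\s*(.+?)\s*\(time:.*?\)\s*-+$")
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+(.+?)\s*\(time:.*?\)\s*$")


def parse_report(text):
    """Return (reference_engine, {other_engine: percent_matching_moves})."""
    reference = None
    scores = {}
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            reference = header.group(1)
            continue
        row = ROW.match(line)
        if row:
            scores[row.group(2)] = float(row.group(1))
    return reference, scores


reference, scores = parse_report(SAMPLE)

# One table row: the self-test cell is left unmeasured rather than filled in.
table_row = {reference: None, **scores}
for engine, score in table_row.items():
    print(f"{engine:24s} {'n/a' if score is None else format(score, '.2f')}")
```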

Sentinel · Post by **Sentinel** » Tue Dec 28, 2010 6:36 pm

kingliveson wrote:
X:\chess\similar>similar -r 25
------ Fruit 2.1 (time: 100 ms) ------
 66.85  Fruit Beta X1 (time: 100 ms)
 66.10  Fruit 2.3 (time: 100 ms)
 63.95  Strelka 2.0 B (time: 100 ms)
 62.10  Umko 1.1 x64 (time: 100 ms)
 61.80  Rybka 1.0 Beta 32-bit (time: 100 ms)
 60.95  Rybka 2.3.2a mp  (time: 100 ms)
How would you represent this data in a table and graph? Once you've answered this, you will see where the 100 comes from.

Lol, you can't even admit that you just filled in 100 where the real number is something between 70 and 90. Try, for example, running Fruit 2.1 (time: 100 ms) vs. Fruit 2.1 (time: 100 ms) and you might realize that your plot is nothing but a computer-generated random design, as Norman humorously pointed out.

Edit: In case you don't know how to do it, just add another engine to your test called Fruit 2.1_identical_copy and see how many identical moves you get with Fruit 2.1.
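
The "identical copy" check can be illustrated with a toy simulation in Python. This is a sketch under loud assumptions: it does not run any chess engine or the "similar" tool, and the noise rate, number of candidate moves, and number of positions are invented values chosen only to show the effect. Each simulated run plays an engine's preferred move unless a random deviation occurs, standing in for whatever makes a fixed-time search pick a different move from one run to the next.

```python
import random


def play(preferred, noise, n_moves, rng):
    """One simulated run: the preferred move, or a different move with probability `noise`."""
    moves = []
    for best in preferred:
        if rng.random() < noise:
            # Pick uniformly among the other n_moves - 1 moves.
            alt = rng.randrange(n_moves - 1)
            moves.append(alt if alt < best else alt + 1)
        else:
            moves.append(best)
    return moves


def agreement(run_a, run_b):
    """Percentage of positions where two runs chose the same move."""
    same = sum(a == b for a, b in zip(run_a, run_b))
    return 100.0 * same / len(run_a)


rng = random.Random(2010)   # fixed seed so the sketch is repeatable
POSITIONS = 2000            # hypothetical test-set size
N_MOVES = 5                 # hypothetical number of candidate moves per position
NOISE = 0.15                # hypothetical per-run deviation rate

preferred = [rng.randrange(N_MOVES) for _ in range(POSITIONS)]

original = play(preferred, NOISE, N_MOVES, rng)
identical_copy = play(preferred, NOISE, N_MOVES, rng)

print(f"engine vs. identical copy: {agreement(original, identical_copy):.2f}%")
```

Under these made-up parameters the same "engine" agrees with its own copy well below 100%, which is the effect the self-test is meant to measure. The real figure for any actual engine can only come from running the test.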

kingliveson · Post by **kingliveson** » Tue Dec 28, 2010 6:49 pm

Sentinel wrote:
kingliveson wrote:
X:\chess\similar>similar -r 25
------ Fruit 2.1 (time: 100 ms) ------
 66.85  Fruit Beta X1 (time: 100 ms)
 66.10  Fruit 2.3 (time: 100 ms)
 63.95  Strelka 2.0 B (time: 100 ms)
 62.10  Umko 1.1 x64 (time: 100 ms)
 61.80  Rybka 1.0 Beta 32-bit (time: 100 ms)
 60.95  Rybka 2.3.2a mp  (time: 100 ms)
How would you represent this data in a table and graph? Once you've answered this, you will see where the 100 comes from.
Lol, you can't even admit that you just filled in 100 where the real number is something between 70 and 90. Try, for example, running Fruit 2.1 (time: 100 ms) vs. Fruit 2.1 (time: 100 ms) and you might realize that your plot is nothing but a computer-generated random design, as Norman humorously pointed out.

Edit: In case you don't know how to do it, just add another engine to your test called Fruit 2.1_identical_copy and see how many identical moves you get with Fruit 2.1.

It is obvious, then, that you have either not read my post or misunderstood it. You are trying to prove to me that the data is not accurate because an engine against itself will not score 100%; that is another subject. The validity of the output for determining similarity is, again, another subject. How else can I make that clear?! The data plotted is an actual representation of the output produced by the "similarity tool."

Sentinel · Post by **Sentinel** » Tue Dec 28, 2010 6:54 pm

kingliveson wrote:The data plotted is an actual representation of the output produced by the "similarity tool."

No, it's not. You have 24 data points in your plot that you've just invented.

OpenChess
