kingliveson wrote: Get your magnifiers... hope the table helps translate the plot.
Sentinel wrote: Since you put 100 for all the self-tests, which you would not get even with a 1h per move TC. Did you actually run this, or did you just fill in the table?
It would be a misinterpretation, as it is a scale rather than a self-test result.
On "clone testing"
- kingliveson
- Posts: 1388
- Joined: Thu Jun 10, 2010 1:22 am
- Real Name: Franklin Titus
- Location: 28°32'1"N 81°22'33"W
Re: On "clone testing"
PAWN : Knight >> Bishop >> Rook >>Queen
Re: On "clone testing"
kingliveson wrote: It would be a mis-interpretation as it is a scale rather than self-test result.
Problem is that rescaling makes results completely invalid.
The first necessary condition for the results to have any meaning (other than as a measure of polling-mechanism correlation) is to find a way to get the same base (self-test) score for all engines.
Just to explain, because a lot of people seem not to understand the point.
Let's suppose the self-test score for Rybka 3 is 800, which means Rybka 3 misses 200 positions due to the "polling mechanism effect".
For Robbolito the self-test score is 700, which means Robbolito misses 300 positions due to the "polling mechanism effect".
Now you run Robbo against Rybka 3 and get, let's say, 600. So they don't agree in 400 positions. How many of these 400 are just due to the "polling mechanism effect"? How many of the 600 were luckily chosen just due to the "polling mechanism effect"?
We can never know.
You have an uncertainty from the "polling mechanism" of 20-30% and a difference in actual results of less than 10%. And it's not only a random error; it's also a highly skewed (systematic) error in the mean value, so any error-margin calculation is useless.
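To put numbers on the example above, here is a minimal Python sketch. It assumes a test set of 1000 positions (implied by "800 ... misses 200"), and the function names are made up for illustration. Dividing the cross score by a self-test baseline is only one conceivable way to rescale, not necessarily what any tool or plot in this thread does.

```python
# Numbers are the hypothetical ones from the example above.
TOTAL = 1000  # assumption: size of the position set


def missed(score, total=TOTAL):
    """Positions where the two runs picked different moves."""
    return total - score


def rescaled(cross_score, baseline):
    """Cross score as a percentage of a chosen self-test baseline."""
    return 100.0 * cross_score / baseline


rybka_self, robbo_self, cross = 800, 700, 600

print(missed(rybka_self))  # 200
print(missed(robbo_self))  # 300
print(missed(cross))       # 400

# The rescaled value depends on which baseline you pick.
print(round(rescaled(cross, rybka_self), 1))  # 75.0
print(round(rescaled(cross, robbo_self), 1))  # 85.7
```

The same raw 600 reads as 75.0 or 85.7 depending on the baseline, which is a spread of the same order as the differences being compared.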
- kingliveson
Re: On "clone testing"
Sentinel wrote: Problem is that rescaling makes results completely invalid. [...]
You are talking oranges and I, apples. No one said the results were gospel truth, but in order for the plot to match the output of the "similar" tool, it needs to be scaled to 100%.
Re: On "clone testing"
kingliveson wrote: You are talking orange and I, apple. No one said the results were gospel truth, but in order for the plot to match the output by the "similar" tool, it needs to be scaled to 100%.
I understand. It's nicer for a graph; however, why don't you give a table with non-scaled results?
- kingliveson
Re: On "clone testing"
Sentinel wrote: I understand. It's nicer for a graph, however, why don't you give a table with non-scaled results?
It doesn't have anything to do with producing a nice graph. The plot is an actual representation of the output from the tool, which was parsed and pasted into the table.
Re: On "clone testing"
kingliveson wrote: It doesn't have anything to do with producing a nice graph. The plot is an actual representation of the output from the tool which was parsed and pasted onto the table.
I don't get you at all. You have 100 in the table and in the plot for self-test. Could you answer two simple questions?
Where did this 100 come from?
Do you have a real number or not, and in case you do, could you please post it?
- kingliveson
Re: On "clone testing"
Sentinel wrote: I don't get you at all. You have 100 in the table and in the plot for self-test. Could you answer two simple questions?
Where did this 100 came from?
Do you have a real number or not, and in case you do, could you please post it?
X:\chess\similar>similar -r 25
------ Fruit 2.1 (time: 100 ms) ------
66.85 Fruit Beta X1 (time: 100 ms)
66.10 Fruit 2.3 (time: 100 ms)
63.95 Strelka 2.0 B (time: 100 ms)
62.10 Umko 1.1 x64 (time: 100 ms)
61.80 Rybka 1.0 Beta 32-bit (time: 100 ms)
60.95 Rybka 2.3.2a mp (time: 100 ms)
Edit: data was attached to this post.
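For anyone who wants to tabulate output like the paste above, here is a rough Python sketch. The line format is inferred only from that paste (one "score engine (time: N ms)" entry per line under a dashed header); that layout, the regular expressions, and the `parse_block` name are all assumptions, not documented behaviour of the tool. The reference engine's own cell is stored as `None` here so that a value the tool did not print stays distinguishable from a measured one.

```python
import re

# Assumed formats, inferred from the paste above (not from tool documentation).
HEADER = re.compile(r"^-+\s+(.+?)\s+\(time:\s*(\d+)\s*ms\)\s+-+$")
ENTRY = re.compile(r"^(\d+(?:\.\d+)?)\s+(.+?)\s+\(time:\s*(\d+)\s*ms\)$")


def parse_block(text):
    """Return (reference engine, {engine: score}) for one output block."""
    reference = None
    row = {}
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            reference = header.group(1)
            row[reference] = None  # no self-test value appears in the block
            continue
        entry = ENTRY.match(line)
        if entry:
            row[entry.group(2)] = float(entry.group(1))
    return reference, row


sample = """------ Fruit 2.1 (time: 100 ms) ------
66.85 Fruit Beta X1 (time: 100 ms)
66.10 Fruit 2.3 (time: 100 ms)
63.95 Strelka 2.0 B (time: 100 ms)"""

reference, row = parse_block(sample)
print(reference)         # Fruit 2.1
print(row["Fruit 2.3"])  # 66.1
print(row["Fruit 2.1"])  # None
```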
Re: On "clone testing"
kingliveson wrote: How would you represent the following data in a table and graph? Once you've answered this, you will see where the 100 comes from. [...]
Lol, you can't even admit that you just filled in 100 where the real number is something between 70 and 90. Try, for example, running Fruit 2.1 (time: 100 ms) vs. Fruit 2.1 (time: 100 ms) and you might realize that your plot is nothing but a computer-generated random design, as Norman humorously pointed out.
Edit: In case you don't know how to do it, just add another engine to your test called Fruit 2.1_identical_copy and see how many identical moves you get with Fruit 2.1.
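The self-test described here comes down to counting identical moves between two runs over the same positions. A minimal Python sketch of that count follows; the `agreement` function and the move lists are invented for illustration and are not output from any engine or from the similarity tool.

```python
def agreement(moves_a, moves_b):
    """Percentage of positions where both runs chose the same move."""
    if len(moves_a) != len(moves_b):
        raise ValueError("both runs must cover the same positions")
    if not moves_a:
        raise ValueError("no positions to compare")
    same = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * same / len(moves_a)


# Made-up moves for two runs of the same engine on five positions.
run_1 = ["e2e4", "g1f3", "d2d4", "c2c4", "f1c4"]
run_2 = ["e2e4", "g1f3", "d2d4", "b1c3", "f1c4"]

print(agreement(run_1, run_2))  # 80.0
```

If an engine's move choice at a fixed time limit is not fully deterministic, this number can come out below 100 even for an identical copy, which is the value being asked for.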
- kingliveson
Re: On "clone testing"
Sentinel wrote: Lol, you can't even admit that you just filled 100 where the real number is something between 70 and 90. [...]
It is obvious then that you have not read my post, or you just misunderstood it. You are trying to prove to me that the data is not accurate because an engine against itself will not score 100% -- that is another subject. The validity of the output for determining similarity is, again, another subject. How else can I make that clear?! The data plotted is an actual representation of the output produced by the "similarity tool."
Re: On "clone testing"
kingliveson wrote: The data plotted is actual representation of the output produced by the "similarity tool."
No, it's not. You have 24 data points in your plot that you've just invented.