FIDE Rules on ICGA - Rybka controversy

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: FIDE Rules on ICGA - Rybka controversy

Post by BB+ » Mon May 25, 2015 5:49 am

Chris Whittington wrote:I would say your 0.8 for the rook-pawn code is an outrageous mistake.
It was 0.7 for "open" files and 0.6 for "semi-open" files.
Chris Whittington wrote:why no FILTERING?
There are three grounds for filtering, namely reasons of efficiency, external factors, or common use. The first two have much in common for computer programs, and an example could be that backward/weak pawns are implemented in a different way due to an external factor like board representation, possibly for efficiency. I tried not to filter on these grounds when unnecessary, as doing so would tend to diminish the Rybka/Fruit differences. But I guess you are more concerned about common use. A group-wise comparison already has filtration by commonality built in. If every program has a feature, and implements it in a similar manner, then all boats will be raised by the rising tide.

On the other hand, if there are (say) 10 features and two choices for the way of realising each, then with 10 engines you would expect any one engine to have, for any given feature, 4 or 5 matches amongst the others; but it would still be strange for a pair of engines to make 9 or 10 of the same choices.

For instance:
00000 11111 Engine A
00000 11110 Engine B
10101 01010 Engine C
01100 11011 Engine D
11001 01100 Engine E
01001 00110 Engine F
Looking at the vertical columns (for features), one could "filter" every single feature on the basis that some engine in CDEF chose the same option, but this would miss the great A-B similarity.
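The toy table above can be checked mechanically. The following is only a sketch in Python (the helper names `matches` and `filterable` are made up for illustration and do not come from EVAL_COMP): it counts agreements for every engine pair, tests whether each column could be "filtered" because some engine in CDEF made the same choice, and computes the chance that one given pair agrees on at least 9 of 10 features if each choice were an independent coin flip.

```python
from itertools import combinations
from math import comb

# Feature choices copied from the table above (one character per feature).
engines = {
    "A": "0000011111",
    "B": "0000011110",
    "C": "1010101010",
    "D": "0110011011",
    "E": "1100101100",
    "F": "0100100110",
}

def matches(x, y):
    """Number of features on which two engines made the same choice."""
    return sum(a == b for a, b in zip(x, y))

# Pairwise view: count agreements for every engine pair.
pair_scores = {
    (p, q): matches(engines[p], engines[q])
    for p, q in combinations(engines, 2)
}
best_pair = max(pair_scores, key=pair_scores.get)

def filterable(column, suspects=("A", "B"), pool=("C", "D", "E", "F")):
    """True if each suspect's choice for this feature also occurs in the pool."""
    pool_choices = {engines[name][column] for name in pool}
    return all(engines[name][column] in pool_choices for name in suspects)

# Column view: could every single feature be "filtered" as common?
all_filterable = all(filterable(c) for c in range(10))

# Chance that one given pair agrees on 9 or 10 of 10 coin-flip features.
p_nine_plus = (comb(10, 9) + comb(10, 10)) / 2 ** 10

print(best_pair, pair_scores[best_pair])
print(all_filterable)
print(round(p_nine_plus, 4))
```

The column-wise view says every feature is "common", while the pairwise view still singles out A-B; that is the point of comparing engine pairs rather than filtering feature by feature.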

Post by BB+ » Mon May 25, 2015 5:52 am

Chris Whittington wrote:COMP EVAL was written as part of a mission? Isn't that otherwise known as bias?
To the extent that it was part of a "mission", the mission failed, as the two people who cared most about it in the beginning found it to be unimpressive (or maybe marginally impressive, if I wanted to be nice to myself). It was done at the request (more or less) of Panel members.
Chris Whittington wrote:On what basis are we expected to accept your choices?
I would expect that "independent confirmation" would be more useful than "peer review" in the given instance.

Post by BB+ » Mon May 25, 2015 5:55 am

Rebel wrote:Then there are 2 ways to write an EVAL_COMP, the way Mark did, measuring the similarity between 8 engines, OR a SINGLE measuring between the 2 itself BASED on the accusation above. You get total different numbers.
The second way that you describe, a single measuring, was indeed done in Zach's document and then my RYBKA_FRUIT. This led to the claim of "Fruitification" of how Rybka was presented. Various panel members also suggested that some of the common elements in the Fruit/Rybka evals could be seen in other engines, and EVAL_COMP tried to put the Rybka/Fruit evaluation overlap into context.

Post by BB+ » Mon May 25, 2015 5:58 am

Rebel wrote:I don't know even where to begin what's wrong with EVAL_COMP. Here is minor yet real funny one.

Evaluation bishop pair. Watkins similarity is 0.3
Fruit - classic evaluation
Rybka - no code at all, it's pre calculated in the MIT (material imbalance table)

Meaning, IN THE LIGHT OF THE ACCUSATION Rybka's eval is (ahem) virtual indentical to Fruit --> 0.0 similarity.
Both of them have the feature (the minimal requirement for a positive score), though they differ a lot in the details. Rybka uses a table, but that in and of itself is irrelevant (Fruit has a "hash table" for material evals, but does not pre-compute everything). If Rybka had a table that replicated a function of Fruit (as in other places), I don't see why that should have a dramatic impact at the applicable abstraction level. I also have no idea why "IN THE LIGHT OF THE ACCUSATION" should change the scoring.

Post by BB+ » Mon May 25, 2015 6:05 am

Chris Whittington wrote:The bishop pair is a well known chess heuristic. It says "2 bishops is good" and better than "1 bishop" or "0 bishop". Therefore chess programmer, he count bishops and say if count>1 then bonus. Therefore ... FILTER this one OUT
Note that filtering out anything where the Rybka/Fruit overlap is less than its average will tend to increase the final observed overlap. Here 0.3 (bishop pair) is much less than is typical for the pair. One can also note (see the EVAL_COMP description) that there are many ways of realising this well-known chess heuristic, and indeed only RESP/ExChess/Fruit match completely. This is another sign that "bishop pair" in and of itself is too broad, and one needs to go to a finer level before arbitrarily "filtering" out (supposed) commonalities.
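The arithmetic behind the first sentence is easy to see with made-up numbers. In this Python sketch only the 0.3 for the bishop pair comes from the thread; the other feature names and scores are hypothetical placeholders, not values from EVAL_COMP.

```python
from statistics import mean

# HYPOTHETICAL per-feature overlap scores for one engine pair. Only the
# 0.3 for "bishop pair" is taken from the thread; the rest are invented.
overlap = {
    "bishop pair": 0.3,
    "feature X": 0.7,
    "feature Y": 0.8,
    "feature Z": 0.6,
}

# Average overlap with every feature included.
before = mean(overlap.values())

# Average overlap after "filtering" out the below-average feature.
after = mean(v for k, v in overlap.items() if k != "bishop pair")

print(round(before, 3), round(after, 3))
```

Dropping a below-average entry can only pull the mean of the remaining entries upward, whatever the actual scores are.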

Post by BB+ » Mon May 25, 2015 6:08 am

Rebel wrote:Wasn't the topic the evaluation of the bishop pair?
What's the similarity between Fruit and Rybka?
Where is that 0.3 coming from?
Both of them have the feature to some extent. As in the "Methodology" section (1.2) of EVAL_COMP, this means that there should be a nonzero score. As also explained there: Typically, for every matching sub-condition of a feature another tenth of a point would be added, with a subtraction if they had competing sub-conditions. In the case of bishop pair, the scores given were 0.0 (feature does not exist), 0.3 (small similarity), 0.5 (average similarity), 0.7 (stronger similarity), 1.0 (essentially the same).
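As a rough illustration of that rule, here is a sketch in Python. It is not the actual EVAL_COMP procedure: the starting value of 0.3, the size of the subtraction, the floor of 0.1 and the function name `similarity` are all assumptions chosen only to make the shape of the rule concrete.

```python
def similarity(a_has, b_has, matching=0, competing=0, base=0.3, step=0.1):
    """Toy feature-similarity score in [0.0, 1.0].

    ASSUMPTIONS (not from EVAL_COMP): `base` is the score when both engines
    merely have the feature, each competing sub-condition subtracts one
    `step`, and the result never drops below one `step`.
    """
    # 0.0 is reserved for the case where one engine lacks the feature.
    if not (a_has and b_has):
        return 0.0
    # Add a tenth per matching sub-condition, subtract for competing ones.
    raw = base + step * matching - step * competing
    # Keep the score positive (both have the feature) and capped at 1.0.
    return round(min(1.0, max(step, raw)), 1)

print(similarity(True, False))                        # one engine lacks it
print(similarity(True, True))                         # feature present, little else shared
print(similarity(True, True, matching=4))             # several shared sub-conditions
print(similarity(True, True, matching=2, competing=1))
```

The only properties taken from the text are that a missing feature gives 0.0, that having the feature gives a nonzero score, and that matching sub-conditions add tenths while competing ones subtract.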

Post by BB+ » Mon May 25, 2015 6:09 am

Chris Whittington wrote:with a list of 8 programs to pair up, all the researcher need to do is find a pair which are so hacked around with the usual find-ELO desperations that the differences and stupidities zoom off into the stratosphere - and call that 0.0; then all the other pairs have to have raised numbers, even though they are still different, they are not insanely different. Hence some programs are more "not equal" than others.

"more not equal" maps to using values like 0.3 or 0.5, these are all just magical guesses for "not equal" but, eg, Faile and Crappy pair were even more "not equal" so they got the 0.0.
As explained in the methodology, "0.0" only occurs when one of the engines in the comparison essentially lacks the feature. An example would be Faile with the bishop pair.

Post by BB+ » Mon May 25, 2015 6:13 am

Rebel wrote:Yep, with 8 engines to compare numbers have to correlate in order to make sense, in a direct compare the only thing that counts is Rybka <> Fruit at an exact understanding what the code actually evaluates, whole different game.
In the direct compare, one would have no idea what to filter, and the suspect engine (Rybka) could easily be "Fruitified". The use of other engines forms a useful comparison basis to determine (and perhaps quantify) how abnormal the Fruit/Rybka overlap is. In particular, (as an example) if every engine did isolated pawns the same way, it would be naturally filtered, and so any Rybka/Fruit matching therein would be attenuated. Similarly, if R/F have similarities in their features that do not appear in the other engines in the pool, this is a sign that such aspects should not be "filtered", as there are specific choices being made.

Post by BB+ » Mon May 25, 2015 6:19 am

Rebel wrote:And it is something I have repeatedly asked and never got a good answer:
1. What is too much? And where in rule #2 is that described, defined?
See Levy's response in the ChessBase "interview". Similarly, the FIDE Anti-Cheating Commission chose not to define "cheating" per se, and in their commentary (on the ACC proposal) the FIDE Ethics Commission agreed that leaving the term undefined was warranted.
Rebel wrote:2. If it's not defined, then who decides what is too much?
The ICGA, either the Board or the Program Rights Committee (Article IV, Section 7).
Rebel wrote:3. How much is a programmer allowed to take from an open source?
It depends upon how much they declare on their submission form. Then the relevant ICGA organs can make a decision based upon sufficient information.

Post by BB+ » Mon May 25, 2015 6:22 am

Rebel wrote:1. What's too much for you is acceptable for another programmer. Can you imagine that? Or is there only the Hyatt norm?
2. No definition of "too much" in rule #2.
3. Very very confusing for a (especially new) programmer.
One would think that the best course for a (new) programmer would be to inquire with the ICGA about the situation (assuming said programmer wants to compete in an ICGA event), being up front about the origins of the engine. It is only "very very confusing" if you try to make it so, and refrain from dispelling said confusion via suitable inquiries.
