FIDE Rules on ICGA - Rybka controversy

Rebel
Posts: 515
Joined: Wed Jun 09, 2010 7:45 pm
Real Name: Ed Schroder

Re: FIDE Rules on ICGA - Rybka controversy

Post by Rebel » Mon May 25, 2015 8:01 pm

BB+ wrote:
Rebel wrote:they (Zach and/or Mark) came across the observation that Rybka uses the quite extraordinary pawn-value of 3200 in EVAL.
The value was 3399 in early Rybkas, and then 3717 in some of the R2 versions. Kaufman changed it to 1000.
Ed Schröder wrote:It's this kind of impressive original findings (3200) why Vasik Rajlich could dominate the computer chess world for the last 5 years, the man simply is a genius, an IM (International Master) and graduated MIT student.
Such pomposity. :roll: Kaufman is (now) a GM, also graduated from MIT, and chose 1000, rather than a tuned value.
Rebel wrote:And [hyatt] missed that he was an eyewitness of a brandnew idea in computer chess. 2 separate scores, one in EVAL (3200) and one in SEARCH (100)
Err, this idea is so "brandnew" that similar re-scalings were already done back in the 70s. Furthermore, I don't see any particular reason to lose the extra granularity from eval when squashing it back to search, other than to keep the score fitting in (say) a 16-bit field. Possibly MTD(f) is an explanation, but there is no evidence of this with Rybka 1.0, and as Zach said, it looked like a bodging together of two separate modules of code (and indeed, the lazy eval was merged incorrectly).
BB+ wrote:
Rebel wrote:Using the 100 system (say) we have 3 bonuses of (say) 0.04 | 0.07 | 0.12 = total 0.23
Using the 1000 system allows us to fine-tune these (same) values much better (say) 0.048 | 0.075 | 0.129 = total 0.252 = 252 / 1000 ≈ 0.25
I'm fairly certain the given millipawn values would round to "0.05 | 0.07 (or 0.08) | 0.13". Crafty used millipawns for a while (I can remember them back in 1994/5 as an undergrad), and the CPW page on centipawns describes Bob Hyatt's views on this. More recently, Kaufman used 1000 as the base in R3, though anything finer than 1/200 is rarely used in the code, and some Houdinis also use 1/200. Stockfish uses "hexapawns" (1/256), though there is some granularity thrown away.
I think you are still missing the beauty of the idea, alas. And using 2 separate scores (one in eval, one in search) is certainly new. As for his move to 1000 we can only speculate (good enough as 3200/3399, fewer headaches awarding bonuses, penalties) but the system remains the same: better rounded (more precise) eval scores and then the transfer to 100. Crafty certainly did not transfer the eval score of 1000 to 100 for search. Last, 3399 IMO is a parameter for the base value of 3200 increasing the score by +/- 6% but YMMV.
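
As a rough illustration of the arithmetic being argued over here, the sketch below sums bonuses in a fine-grained unit and converts to centipawns once, versus rounding each bonus to centipawns first. It is not taken from Rybka, Fruit or Crafty; the function names and the round-half-away-from-zero rule are assumptions made for the example.

```c
#include <assert.h>

/* Convert a score expressed in `pawn_unit` units per pawn (e.g. 1000 or
 * 3200) to centipawns, rounding half away from zero.
 * Hypothetical helper, for illustration only. */
int rescale_to_centipawns(int score, int pawn_unit)
{
    assert(pawn_unit > 0);
    long scaled = (long)score * 100;
    long half = pawn_unit / 2;
    if (scaled >= 0)
        return (int)((scaled + half) / pawn_unit);
    return (int)-((-scaled + half) / pawn_unit);
}

/* Sum in the fine unit, convert once at the eval/search boundary. */
int sum_then_rescale(const int *bonuses, int n, int pawn_unit)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += bonuses[i];
    return rescale_to_centipawns(total, pawn_unit);
}

/* Round every bonus to centipawns first, then sum. */
int rescale_then_sum(const int *bonuses, int n, int pawn_unit)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += rescale_to_centipawns(bonuses[i], pawn_unit);
    return total;
}
```

With the three millipawn bonuses from the post (48, 75, 129), sum_then_rescale gives 25 centipawns, while rescale_then_sum gives 5 + 8 + 13 = 26 under this rounding rule; the hand-picked centipawn values 4, 7, 12 total 23. A score of 3200 on a 3200-per-pawn scale maps to 100.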

Rebel
Posts: 515
Joined: Wed Jun 09, 2010 7:45 pm
Real Name: Ed Schroder

Re: FIDE Rules on ICGA - Rybka controversy

Post by Rebel » Mon May 25, 2015 8:46 pm

BB+ wrote:
Rebel wrote:This obviously is going the wrong way. If you can't even admit a (small) mistake of your work then Mr. Schröder is done with you.

Code:

             FRUIT                                 RYBKA
             if (doubled) {                        if (doubled) endgame -= 158;
                opening[me] -= DoubledOpening;
                endgame[me] -= DoubledEndgame;
             }
There is NO midgame code in Rybka. In your document it is rewarded as a 100% similarity with Fruit.
As I stated, the interpolated opening/endgame value is almost always nonzero, it is only zero at the beginning of the game until a piece is taken off the board. It does not seem useful to disregard a feature simply because one of the endpoints of the interpolation happens to be 0, which might even be caused by a compiler optimisation. Supposing a linear interpolation (as in later Rybkas, though 1.0 was non-linear), one gets the following effective (rounded) values for Rybka's doubled pawn malus, depending on the phase (=24 is the game start, as with Fruit).

Code:

0 7 13 20 26 33 40 46 53 59 66 72 79 86 92 99 105 112 119 125 132 138 145 151 158
I am not sure why you want the solitary "0" term to indicate there is some significant Rybka/Fruit difference. Should a zero in the middle of some long array of weights invalidate a comparison? Why is it different when it is an endpoint? Moreover, Marsland (ICCA Journal 8/2, 1985) already mentions that while doubled pawns are usually weak, there are often compensating advantages such as a half open file or control of a key square.
You totally lost me. A middle game doubled pawn is NOT evaluated in Rybka. To assume (like Bob) that there is, because it fits your theory, is evidence of what I said all along: tunnel vision. If courts start reasoning as you (both) do then we need to triple the number of prisons. It stops all further discussion.

I will write something in general later.
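
For reference, the 25 values quoted above can be reproduced with plain linear interpolation between 0 and 158 over 24 steps, rounding halves up. This is only a sketch of that arithmetic: the function name and the convention that step 0 is the game start are assumptions, and BB+ notes that Rybka 1.0 itself was non-linear.

```c
#include <assert.h>

enum { PHASE_STEPS = 24, DOUBLED_ENDGAME = 158 };

/* Effective doubled-pawn malus after `step` of 24 phase steps have elapsed
 * (0 = game start, 24 = full endgame weight; this indexing is assumed).
 * Integer form of round-half-up(158 * step / 24). */
int doubled_pawn_malus(int step)
{
    assert(step >= 0 && step <= PHASE_STEPS);
    return (2 * DOUBLED_ENDGAME * step + PHASE_STEPS) / (2 * PHASE_STEPS);
}
```

Calling it for step = 0..24 yields the sequence 0 7 13 20 ... 145 151 158 listed in the quoted post; only the first entry is zero.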

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: FIDE Rules on ICGA - Rybka controversy

Post by hyatt » Mon May 25, 2015 9:01 pm

Rebel wrote:
BB+ wrote:
Rebel wrote:This obviously is going the wrong way. If you can't even admit a (small) mistake of your work then Mr. Schröder is done with you.

Code:

             FRUIT                                 RYBKA
             if (doubled) {                        if (doubled) endgame -= 158;
                opening[me] -= DoubledOpening;
                endgame[me] -= DoubledEndgame;
             }
There is NO midgame code in Rybka. In your document it is rewarded as a 100% similarity with Fruit.
As I stated, the interpolated opening/endgame value is almost always nonzero, it is only zero at the beginning of the game until a piece is taken off the board. It does not seem useful to disregard a feature simply because one of the endpoints of the interpolation happens to be 0, which might even be caused by a compiler optimisation. Supposing a linear interpolation (as in later Rybkas, though 1.0 was non-linear), one gets the following effective (rounded) values for Rybka's doubled pawn malus, depending on the phase (=24 is the game start, as with Fruit).

Code:

0 7 13 20 26 33 40 46 53 59 66 72 79 86 92 99 105 112 119 125 132 138 145 151 158
I am not sure why you want the solitary "0" term to indicate there is some significant Rybka/Fruit difference. Should a zero in the middle of some long array of weights invalidate a comparison? Why is it different when it is an endpoint? Moreover, Marsland (ICCA Journal 8/2, 1985) already mentions that while doubled pawns are usually weak, there are often compensating advantages such as a half open file or control of a key square.
You totally lost me. A middle game doubled pawn is NOT evaluated in Rybka. To assume (like Bob) that there is, because it fits your theory, is evidence of what I said all along: tunnel vision. If courts start reasoning as you (both) do then we need to triple the number of prisons. It stops all further discussion.

I will write something in general later.

Ed, read what he is trying to explain. With ALL material present, the doubled pawn penalty is 0 (although it would be impossible to have a doubled pawn with all material present obviously), because interpolation would use 100% of the MG penalty and 0% of the EG penalty. With material <= endgame_threshold, the full endgame penalty is applied. But what if 1/4 of the material is missing? You start with 39 units; when you get to 30 this is certainly closing in on the endgame threshold. A pretty common number might be 20 units of material, some pieces, some pawns. So at 10 units gone, the doubled pawn penalty is now 1/2 the endgame penalty. By the time you get to 20 units remaining, interpolation now makes this 100% of the endgame penalty. I.e., there IS a doubled pawn penalty in the middle game; it is just a fraction of the endgame penalty.

What if he had set the MG doubled pawn penalty at 1, the endgame at 20? Is that significantly different from 0 and 20? Of course not. Just because you don't want to see a middle game doubled pawn penalty doesn't mean there actually isn't one, however.

Here's a test for you. Set up a position that is equal, with all pieces present but set up a doubled pawn for one side. Check the score. Then remove queens. Check again. Did the score go up? Why? In Crafty you could see the actual pawn evaluation, both MG and EG, and then the composite (interpolated) score. Removing the queen just moves the composite score toward the EG score, nothing else.
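
The description above can be written out as a small sketch. The numbers 39 (starting material) and 20 (endgame threshold) are the ones used in this post; the function name, the clamping and the rounding are assumptions, not code from Crafty, Fruit or Rybka.

```c
#include <assert.h>

enum { START_MATERIAL = 39, ENDGAME_THRESHOLD = 20 };

/* Blend a middlegame and an endgame penalty by remaining material.
 * Full material -> mg_penalty, at or below the threshold -> eg_penalty,
 * linear in between (rounded to nearest). Penalties are non-negative. */
int blended_penalty(int mg_penalty, int eg_penalty, int material)
{
    assert(mg_penalty >= 0 && eg_penalty >= 0);
    if (material > START_MATERIAL) material = START_MATERIAL;
    if (material < ENDGAME_THRESHOLD) material = ENDGAME_THRESHOLD;

    int span = START_MATERIAL - ENDGAME_THRESHOLD;   /* 19 */
    int gone = START_MATERIAL - material;            /* 0..19 */
    return (mg_penalty * (span - gone) + eg_penalty * gone + span / 2) / span;
}
```

With 10 units gone (material 29), an MG/EG pair of 0/20 gives 11 and a pair of 1/20 also gives 11, which is the point being made: a middlegame endpoint of 0 rather than 1 barely changes the effective penalty anywhere along the scale.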

Chris Whittington
Posts: 437
Joined: Wed Jun 09, 2010 6:25 pm

Re: FIDE Rules on ICGA - Rybka controversy

Post by Chris Whittington » Tue May 26, 2015 12:12 pm

BB+ wrote:
Chris Whittington wrote:COMP EVAL was written as part of a mission? Isn't that otherwise known as bias?
To the extent that it was part of a "mission", the mission failed, as the two people who cared most about it in the beginning found it to be unimpressive (or maybe marginally impressive, if I wanted to be nice to myself). It was done at the request (more or less) of Panel members.
Chris Whittington wrote:On what basis are we expected to accept your choices?
I would expect that "independent confirmation" would be more useful than "peer review" in the given instance.
There is no "independent confirmation". Just a bunch of people still arguing. And the various variations of ponderhit tests do NOT confirm. The wheel of COMP EVAL is still in spin.

Chris Whittington
Posts: 437
Joined: Wed Jun 09, 2010 6:25 pm

Re: FIDE Rules on ICGA - Rybka controversy

Post by Chris Whittington » Tue May 26, 2015 12:20 pm

BB+ wrote:
Rebel wrote:I don't even know where to begin with what's wrong with EVAL_COMP. Here is a minor yet really funny one.

Evaluation bishop pair. Watkins similarity is 0.3
Fruit - classic evaluation
Rybka - no code at all, it's precalculated in the MIT (material imbalance table)

Meaning, IN THE LIGHT OF THE ACCUSATION Rybka's eval is (ahem) virtually identical to Fruit --> 0.0 similarity.
Both of them have the feature (the minimal requirement for a positive score), though they differ a lot in the details. Rybka uses a table, but that in and of itself is irrelevant (Fruit has a "hash table" for material evals, but does not pre-compute everything). If Rybka had a table that replicated a function of Fruit (as in other places), I don't see why that should have a dramatic impact at the applicable abstraction level. I also have no idea why "IN THE LIGHT OF THE ACCUSATION" should change the scoring.
Whether table or not, at the relevant abstraction level, this particular feature (bishop pair) is too common to pin onto "copied" and should be filtered, imo. A massively tinkered bishop pair function with added this n that, sure, you can use in the copied list - but B>1 then add bonus is too trivially known. I note Hyatt tried a desperation defence of same "bug" in that 2B's, same colour are not trapped, but I doubt anybody seriously codes that in, basis waste of time.

And, if bishop-pair is encoded in the MIT, then are we sure that it is not actually dynamically linked to the other material present? E.g. the balance of N's could be relevant, as could BB vs RN - all things possible in MIT, but probably too lengthy for in-line code. My guess would be there's more dynamic linking in the MIT than we know about (else why have an MIT?) - which would make Rybka BB code VERY different indeed. Did you check?

Chris Whittington
Posts: 437
Joined: Wed Jun 09, 2010 6:25 pm

Re: FIDE Rules on ICGA - Rybka controversy

Post by Chris Whittington » Tue May 26, 2015 12:43 pm

BB+ wrote:
Rebel wrote:And it is something I have repeatedly asked and never got a good answer:
1. What is too much? And where in rule #2 is that described, defined?
See Levy's response in the ChessBase "interview". Similarly, the FIDE Anti-Cheating Commission chose not to define "cheating" per se, and in their commentary (on the ACC proposal) the FIDE Ethics Commission agreed that leaving the term undefined was warranted.
Rebel wrote:2. If it's not defined, then who decides what is too much?
The ICGA, either the Board or the Program Rights Committee (Article IV, Section 7).
Rebel wrote:3. How much is a programmer allowed to take from an open source?
It depends upon how much they declare on their submission form. Then the relevant ICGA organs can make a decision based upon sufficient information.
Mark Watkins!! This is a bit cheeky, isn't it? Ed is asking a non-literal, almost rhetorical question. Your answer is so literal as to not answer it at all. Likewise the next post.

Chris Whittington
Posts: 437
Joined: Wed Jun 09, 2010 6:25 pm

Re: FIDE Rules on ICGA - Rybka controversy

Post by Chris Whittington » Tue May 26, 2015 12:46 pm

Chris Whittington wrote:
BB+ wrote:
Rebel wrote:I don't even know where to begin with what's wrong with EVAL_COMP. Here is a minor yet really funny one.

Evaluation bishop pair. Watkins similarity is 0.3
Fruit - classic evaluation
Rybka - no code at all, it's precalculated in the MIT (material imbalance table)

Meaning, IN THE LIGHT OF THE ACCUSATION Rybka's eval is (ahem) virtually identical to Fruit --> 0.0 similarity.
Both of them have the feature (the minimal requirement for a positive score), though they differ a lot in the details. Rybka uses a table, but that in and of itself is irrelevant (Fruit has a "hash table" for material evals, but does not pre-compute everything). If Rybka had a table that replicated a function of Fruit (as in other places), I don't see why that should have a dramatic impact at the applicable abstraction level. I also have no idea why "IN THE LIGHT OF THE ACCUSATION" should change the scoring.
Whether table or not, at the relevant abstraction level, this particular feature (bishop pair) is too common to pin onto "copied" and should be filtered, imo. A massively tinkered bishop pair function with added this n that, sure, you can use in the copied list - but B>1 then add bonus is too trivially known. I note Hyatt tried a desperation defence of same "bug" in that 2B's, same colour are not trapped, but I doubt anybody seriously codes that in, basis waste of time.

And, if bishop-pair is encoded in the MIT, then are we sure that it is not actually dynamically linked to the other material present? E.g. the balance of N's could be relevant, as could BB vs RN - all things possible in MIT, but probably too lengthy for in-line code. My guess would be there's more dynamic linking in the MIT than we know about (else why have an MIT?) - which would make Rybka BB code VERY different indeed. Did you check?
I meant also to add - if "bishop pair" is in the MIT, then it surely becomes a second or third order calculation, because of interactions with pieces, pawns etc etc. I think at this point you need to stop calling it a "bishop pair" calculation to be compared to a first order B>1 then bonus. It is not really b-pair when second and third order. Which, according to your methodology, should score 0.0, no?
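
To make the first-order versus table distinction concrete, here is a hedged sketch. Nothing in it is taken from Rybka or Fruit: the table layout, the function names and all numeric values are invented placeholders. It only shows how a precomputed material table can make the bishop-pair term depend on other material, which an in-line B>1 test does not.

```c
#include <assert.h>

enum { MAX_MINOR = 3 };  /* counts 0..2 per minor piece type, illustration only */

/* First-order rule: a fixed bonus whenever a side has two or more bishops.
 * The value 50 is a placeholder. */
int bishop_pair_inline(int bishops)
{
    return bishops >= 2 ? 50 : 0;
}

/* Table indexed by [own bishops][enemy knights]; filled once at start-up. */
static int material_table[MAX_MINOR][MAX_MINOR];

/* Fill the table. Here the pair bonus grows when the opponent has knights,
 * a made-up interaction standing in for whatever a real table encodes. */
void init_material_table(void)
{
    for (int b = 0; b < MAX_MINOR; b++)
        for (int n = 0; n < MAX_MINOR; n++)
            material_table[b][n] = (b >= 2) ? 40 + 10 * n : 0;
}

/* Second-order lookup: same "feature", but the value depends on context. */
int bishop_pair_from_table(int bishops, int enemy_knights)
{
    assert(bishops >= 0 && bishops < MAX_MINOR);
    assert(enemy_knights >= 0 && enemy_knights < MAX_MINOR);
    return material_table[bishops][enemy_knights];
}
```

Whether Rybka's table actually contains interactions of this kind is exactly the open question raised in the post; the sketch does not answer it.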

Chris Whittington
Posts: 437
Joined: Wed Jun 09, 2010 6:25 pm

Re: FIDE Rules on ICGA - Rybka controversy

Post by Chris Whittington » Tue May 26, 2015 1:19 pm

BB+ wrote:
Chris Whittington wrote:Meanwhile every chess player will give a rook a BONUS for being on the open file, ZERO BONUS for being behind its own pawns, and a POSITIVE BONUS for being behind the enemy pawns
I'm not sure what you mean here by "behind", is it only on the same file? At least in rook endgames, being behind your (passed) pawn is quite useful. If by "behind the enemy pawns" you mean a general invasion criterion, then it seems that an easy implementation would have been to save some "penetration" demarcator in pawneval (either rank or square based, and can also include the openness of files), and then apply this to the rook.
Chris Whittington wrote:OK, your example. You know perfectly well that chess board pattern recognition as implemented by programmers is general purpose. Sure you can disrupt the tendency for the algorithm to put its rook on a specific file behind the enemy pawn chain by inserting a friendly pawn on a very advanced position on that same file. But what you are really trying to disrupt with this quite fanciful positioning (remember, we are discussing heavily blocked pawn frontages, with an open file penetration) is the argument that Rybka has a creative and original algorithm/pattern recognition, which is strongly superior. I'm shocked, Mark, that you stoop to this.
Some of the below (in #1) may have come from a discussion with others, I have put in (largely) my own words.

The feature tests for pawns in front and on the same file.

1) This has positive correlation to a general "penetration" criterion as a rook on the 7th/8th always gets the bonus, but it seems more likely that the "openness" of the file is its primary purpose. With penetration, the location of pawns off the file would also be apt for consideration.

Consider, wRe1 wPf2 wPh6 bPe6 bPf7 bPh7 is scored the same as wRe5 wPf2 wPh2 bPe6 bPf7 bPh3 though the sense of penetration is quite different. This example can be expanded, by whether the root pawn of a chain is attacked or attackable by the rook. The gazing only at pawns on the same file in front of the rook in these cases is only very weakly related to penetration. For most cases, determining whether the rook was behind many or all enemy pawns on any files, or just an enemy pawn on the same file, would better approximate penetration.

The claim that this feature is of consequent use on blocked boards is unclear too, for then the question would more often be whether the "invaded" side has any (pawn) weaknesses, and in addition whether said side can counterplay by the "open file" when the invader moves off it. Given the feature is immune to whether the file is actually "open", it doesn't form a sufficient locator for the rook. Also, the rook could more easily be in fact trapped in the situation.

2) I would be more willing to consider that the feature was some attempt at "pattern recognition" (on closed boards) if it had been done by a dopey academic. As it is, Rajlich is typically goal-oriented, and does not care about something like "implementing AI", but rather increasing ELO.

3) In its rook evaluation, Fruit does ABCDE where
  • A is a calculation of mobility,
  • B is a calculation of semi-open files,
  • C is a calculation of open files,
  • D checks whether to apply king-danger (material table flag), and if so calculates king safety with respect to semi/open files,
  • E is a criterion concerning the 7th rank.
Comparatively, Rybka's rook evaluation is similarly ABCDE with the difference that the bitmasks for "semi/open files" are directional. This gives more weight to the expectation that the BC's are similar in content.
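
The difference between a whole-file test and a directional one can be sketched with bitboards. This is an illustration under stated assumptions, not code from either engine: squares are numbered a1 = 0 to h8 = 63, only a white rook is handled, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* All eight squares of a file (0 = a-file ... 7 = h-file). */
uint64_t file_mask(int file)
{
    assert(file >= 0 && file < 8);
    return UINT64_C(0x0101010101010101) << file;
}

/* Squares on the same file strictly in front of a white piece on `square`. */
uint64_t front_mask_white(int square)
{
    assert(square >= 0 && square < 64);
    uint64_t above = (square >= 56) ? 0 : (~UINT64_C(0) << (square + 8));
    return file_mask(square % 8) & above;
}

/* Whole-file test: no pawn of the given set anywhere on the rook's file. */
int file_free_of(uint64_t pawns, int rook_square)
{
    return (pawns & file_mask(rook_square % 8)) == 0;
}

/* Directional test: no pawn of the given set in front of the rook. */
int front_free_of(uint64_t pawns, int rook_square)
{
    return (pawns & front_mask_white(rook_square)) == 0;
}
```

For a white rook on e7 with a black pawn on e6, the whole-file test still sees a pawn on the file while the directional test reports the squares ahead as free; with the rook on e1 or e5, both tests see the pawn.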

I guess one could similarly call mobility something like "future movement capacity", and note that this is a "different" notion in closed positions, and then argue that some implementation detail (like safe square mobility, or forward mobility) means that should not be classified under "mobility" at all.
Chris Whittington wrote:(remember, we are discussing heavily blocked pawn frontages, with an open file penetration)
I am still not sure of your specific example. As noted above, the weakness of enemy pawns/squares is also of importance with blocked positions in addition to penetration, and there is no test done (in Rybka) to see if in fact the pawn frontage is heavily blocked before applying the feature, which as I say makes it at best a weak correlative.
A number of points.

I'm surprised you need to know the meaning of rook behind pawn. It is common chess player (>patzer) parlance, and forms part of the Tarrasch Rule, for example.
In general, yes it means same file as pawn; is sided depending on pawn colour: bP a4 has "behind" squares a5,a6,a7,a8; wp a4 would use a1,a2,a3.
Perhaps many of "our" differences are due to your being principally a programmer, whereas I principally think chess?
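
The "behind" squares just described can be written as a small bitboard helper. This is a sketch only: the a1 = 0 to h8 = 63 square numbering and the function name are assumptions, not something taken from Fruit or Rybka.

```c
#include <assert.h>
#include <stdint.h>

/* Bitboard of the squares "behind" a pawn on the same file: lower ranks for
 * a white pawn, higher ranks for a black pawn.
 * Squares are numbered a1 = 0, b1 = 1, ..., h8 = 63 (assumed layout). */
uint64_t behind_pawn_mask(int square, int pawn_is_white)
{
    assert(square >= 0 && square < 64);
    int file = square % 8;
    int rank = square / 8;
    uint64_t mask = 0;
    for (int r = 0; r < 8; r++) {
        int is_behind = pawn_is_white ? (r < rank) : (r > rank);
        if (is_behind)
            mask |= UINT64_C(1) << (r * 8 + file);
    }
    return mask;
}
```

For a pawn on a4 (square 24) this gives a1, a2, a3 when the pawn is white and a5, a6, a7, a8 when it is black, matching the example in the post.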

Your second drift is mostly Hyatt-lite. Querying whether the Rybka coding is of any use, or whether it has anomalies. Fortunately, you don't go so far as to say it's a bug ;-)
It is not relevant how effective the code is, how well written, whether or not Crafty gets more or less ELO. The only point is whether or not Fruit == Rybka.

I can see from your programmer perspective that this is "just a little mask change", or from Hyatt's perspective "it's just a rook lift", or same view from Zach "doesn't happen very often" (he, like Hyatt, is thinking only of R along 3rd rank to join in against the King).
From my chess perspective, I see a massive difference. Fruit code (this particular bit we discuss) knows that behind own pawns is not useful, but knows ONLY that open file is useful (I'll ignore half open for ease of less words). Once on the open file, Fruit just wants to stay there.
Rybka knows all this as well, except it doesn't test for open file at all, but it also knows the regions to the sides of the open file, to the rear of the enemy pawns, are good too. This is extra chess knowledge. Add in the effect of what you call "half-open" and there's more knowledge, chess.

Can you visualise Rook-square tables, one for each possible pawn formation? I hope so!
Then visualise a typical chess game. Start with 8+8 pawns, then gradually to 7+7, 6+6, 5+5 .... open file count increases from 0 to 1 to 2 etc ... (yes, I know there are imbalances, but ignore for ease of words again). You visualise a heat map for the R-square tables? You'll get dark squares (stuck behind own pawns), red squares and pink squares. Very different Fruit/Rybka in terms of chess, no? Chessically, they don't compare. Chessically and programmer logically it is not even open file and half open file that is being measured. There's no comparison; 0.0 should be your little number. Unless one is total beancounter programmer, when the only focus is the mask (which is a bug hahahahahahahaha!!!).
You are comparing CHESS programs, but omit the chess. imo.

Thirdly, I find your listing ABCDE not logical from a chess perspective.

A is a calculation of mobility. This is basically first order and about the Rook
B is a calculation of semi-open files. Second order, it involves pawns. You can just about get away with listing it under rook (first order) because pawn structure is relatively static.
C is a calculation of open files. Second order, as B above.
D checks whether to apply king-danger (material table flag), and if so calculates king safety with respect to semi/open files. This is majorly other piece type dependent, third order, doesn't belong here.
E is a criterion concerning the 7th rank. First or second order because usually King dependent, but you can just about get away with it, because of relative static nature of the King.

imo, D does not belong in this list. And, how you list things is crucially important for shoehorning (or not) into the Fruit and other programs template. I raise objection therefore to the HOW of the listing decisions. Here and elsewhere, this is just one example.

For Rybka, B and C it does not do. They don't compare as chess playing components.
A is majorly common code, and, as shown many times, by the time the data representation is abstracted away, there's no "code" left, only an equation.
E is majorly common code too.

I don't accept any correspondence Rybka-Fruit as you describe it for the rook and pawns. Either the method falls into the simple and bleeding obvious category, or it is actually very different in a way you three researchers didn't seem to note.

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: FIDE Rules on ICGA - Rybka controversy

Post by hyatt » Tue May 26, 2015 5:35 pm

Regardless of how much you prance around claiming that the "rook behind enemy pawns" code is a good idea, it fails miserably. I tested it in TWO different programs, Crafty and Fruit. Both do the USUAL rook on open/half-open file scoring. I made them both use the "rybka idea" and both saw a SIGNIFICANT Elo drop. Because leaving the file is NOT a good idea in more cases than when it is. If this is an example of YOUR chess skill, you ought to stop talking about it because it is bad. It MIGHT work if it paid attention to the pawn structure and got behind a WEAK pawn, but not just behind ANY pawn. Chalk this one up to "bad idea, or bad translation." Whichever one it is, it is "bad". And it isn't going to get any better.

Chris Whittington
Posts: 437
Joined: Wed Jun 09, 2010 6:25 pm

Re: FIDE Rules on ICGA - Rybka controversy

Post by Chris Whittington » Tue May 26, 2015 8:04 pm

hyatt wrote:Regardless of how much you prance around claiming that the "rook behind enemy pawns" code is a good idea, it fails miserably. I tested it in TWO different programs, Crafty and Fruit. Both do the USUAL rook on open/half-open file scoring. I made them both use the "rybka idea" and both saw a SIGNIFICANT Elo drop. Because leaving the file is NOT a good idea in more cases than when it is. If this is an example of YOUR chess skill, you ought to stop talking about it because it is bad. It MIGHT work if it paid attention to the pawn structure and got behind a WEAK pawn, but not just behind ANY pawn. Chalk this one up to "bad idea, or bad translation." Whichever one it is, it is "bad". And it isn't going to get any better.
I already told you, it is quite irrelevant whether the code is good, bad or merely indifferent. Stop trying to derail the topic. What matters in this context is whether Rybka=Fruit, not whether the code works when Hyatt tries it out in his test environment. Nothing manages to work in the Hyatt test environment, hence -600 ELO. See Rybka Forum for details.

Don't bother to reply Hyatt. As a useful idiot you've been very helpful to us in suiciding your own case over the last four years, but your usefulness for us has now passed. Bye.
