Testing methodology

Post by **BB+** » Tue Mar 27, 2012 1:03 pm

Suppose Engine A is tuned by games (self-play or otherwise) from a standard opening book.
Suppose Engine B is similar, except that a book of 100K early-game Chess 960 positions is used instead.
Should Engine B be better at Chess 960? Should Engine A be (slightly) better at normal chess?

Post by **hyatt** » Tue Mar 27, 2012 6:24 pm

BB+ wrote: Suppose Engine A is tuned by games (self-play or otherwise) from a standard opening book.
Suppose Engine B is similar, except that a book of 100K early-game Chess 960 positions is used instead.
Should Engine B be better at Chess 960? Should Engine A be (slightly) better at normal chess?

I think the engines have to be somewhat different from the get-go. The evaluation has to change depending on the starting position. Just swapping the knights' and bishops' starting squares is a big change. Suddenly g3 or b3 is not such a hot move to play, since the bishop that defends those weak squares is on the other side of the board...

Clearly, as you test, you tune to how you are testing. Probably the best approach is to tune by playing just the openings you plan on using in real games. But that leaves holes that testing over all openings will help you find and fill. I'm more interested in playing better overall than in playing better in a specific opening. I have not really thought much about Chess 960 and how it differs from normal chess... I simply have not played it personally, and don't have a real feel for it other than what I suspect based on years of playing.

Post by **Don** » Fri Dec 07, 2012 6:08 am

BB+ wrote: Suppose Engine A is tuned by games (self-play or otherwise) from a standard opening book.
Suppose Engine B is similar, except that a book of 100K early-game Chess 960 positions is used instead.
Should Engine B be better at Chess 960? Should Engine A be (slightly) better at normal chess?

I have a related theory on this. I have wondered what would happen if a program were tuned with random positions instead of standard opening moves. Suppose we generated 10 random moves for each side and then made some effort to discard positions that were obviously losing. My theory is that it might produce a more "robust" and interesting player. It would be forced to handle "non-standard" types of positions such as ridiculous pawn structures, hopelessly buried pieces, exposed kings and so on. Modern openings rarely lead to those features, but it seems like it would be useful for a program to have some experience dealing with them.
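A minimal sketch of the random-opening idea, assuming a hypothetical engine interface: `legal_moves`, `make_move` and `evaluate` are placeholders for whatever the engine actually provides, and the 150-centipawn cutoff for "obviously losing" is an arbitrary choice for illustration.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

P = TypeVar("P")  # position type (hypothetical; supplied by the engine)
M = TypeVar("M")  # move type (hypothetical; supplied by the engine)

def random_opening(
    start: P,
    legal_moves: Callable[[P], Sequence[M]],
    make_move: Callable[[P, M], P],
    evaluate: Callable[[P], int],
    plies: int = 20,          # 10 random moves for each side
    max_abs_eval: int = 150,  # assumed cutoff: reject more lopsided positions
    max_tries: int = 1000,
    rng: Optional[random.Random] = None,
) -> Optional[P]:
    """Play `plies` uniformly random legal moves from `start` and return the
    resulting position if its evaluation is roughly balanced; otherwise retry.
    Returns None if no acceptable position is found within `max_tries`."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        pos = start
        for _ in range(plies):
            moves = legal_moves(pos)
            if not moves:  # game ended early (mate/stalemate): discard attempt
                break
            pos = make_move(pos, rng.choice(list(moves)))
        else:  # only reached if all plies were played without a break
            if abs(evaluate(pos)) <= max_abs_eval:
                return pos
    return None
```

As a toy usage example, with a "game" where a position is an integer and each move adds ±1, `random_opening(0, lambda p: [-1, 1], lambda p, m: p + m, lambda p: 10 * p, max_abs_eval=40)` returns an even integer between -4 and 4. The rejection step is only as good as the evaluation used to filter, so a shallow search score would probably be a safer filter than a static evaluation.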

A similar idea is to generate totally random (but legal) positions. I think every program has all sorts of assumptions built into it, such as "rook to the 7th" (even though we attach conditions to it, there is still a level of presumption that it's good), and others that are based primarily on the opening setup.

Probably this would lead to a slightly weaker program, because you usually want to specialize in the thing you want to excel at. But it might make the program more robust and interesting too. Maybe it would make it stronger if you mixed the strategies, sort of like the sprinter who also does weight training in addition to actual running.
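Mixing the strategies could be sketched as a test-opening sampler that draws a fixed share of positions from a standard book and the rest from a random-position generator. The function name and the 80/20 default split are arbitrary assumptions for illustration, not something proposed in the thread.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")  # opening representation (e.g. a FEN string); hypothetical

def mixed_openings(book: List[T], random_source: Callable[[random.Random], T],
                   n: int, book_fraction: float = 0.8, seed: int = 0) -> List[T]:
    """Draw n test openings: round(n * book_fraction) from the standard book,
    the remainder from a random-position generator, in shuffled order."""
    if not 0.0 <= book_fraction <= 1.0:
        raise ValueError("book_fraction must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed so test runs are reproducible
    n_book = round(n * book_fraction)
    picks = [rng.choice(book) for _ in range(n_book)]
    picks += [random_source(rng) for _ in range(n - n_book)]
    rng.shuffle(picks)
    return picks
```

Keeping the split as an explicit parameter makes it easy to test whether some share of "weight training" positions helps or hurts strength in normal openings.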

Don

Post by **lucasart** » Mon Dec 17, 2012 1:26 pm

Don wrote: I have a related theory on this. I have wondered what would happen if a program were tuned with random positions instead of standard opening moves. Suppose we generated 10 random moves for each side and then made some effort to discard positions that were obviously losing. My theory is that it might produce a more "robust" and interesting player. It would be forced to handle "non-standard" types of positions such as ridiculous pawn structures, hopelessly buried pieces, exposed kings and so on. Modern openings rarely lead to those features, but it seems like it would be useful for a program to have some experience dealing with them.

Good point. Generating random but legal positions is too extreme, though. You need to control certain things, if only to ensure that the positions are playable and balanced. The Chess 960 positions are cleverly defined, because they ensure that the rooks are on either side of the king (so engine knowledge about rooks trapped by the king still makes sense) and that the bishops are on opposite colors (otherwise the whole bishop pair bonus would probably be worthless; in fact N+N or N+B could be even better than a bishop pair on same-colored squares).
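The two constraints above (bishops on opposite colors, king between the rooks) are exactly what reduce the 5,040 distinct arrangements of the eight back-rank pieces to 960. A brute-force sketch that enumerates White's back rank (Black mirrors it):

```python
from itertools import permutations

def chess960_back_ranks():
    """Return every legal Chess960 back rank as a string, files a..h."""
    arrangements = []
    # set() removes duplicates caused by the paired R, N and B pieces
    for perm in sorted(set(permutations("RNBQKBNR"))):
        bishops = [i for i, p in enumerate(perm) if p == "B"]
        rooks = [i for i, p in enumerate(perm) if p == "R"]
        king = perm.index("K")
        # on one rank, square color alternates with file, so opposite
        # colors means the bishops' file indices differ in parity
        if bishops[0] % 2 == bishops[1] % 2:
            continue
        # king strictly between the rooks, so castling exists on both wings
        if not rooks[0] < king < rooks[1]:
            continue
        arrangements.append("".join(perm))
    return arrangements
```

`len(chess960_back_ranks())` is 960, and the standard setup `"RNBQKBNR"` is one of them.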
I'll probably implement Chess960 in my engine and use the 960 positions in my testing methodology: play 960*2 games (self-play) to validate new patches. I agree that there's too much specific code in a lot of engines, especially code that aims to compensate for the lack of an opening book (which is a strange idea, because you would expect to play with a book in any reasonable rating list or competition).
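One reading of "960*2 games" is two games per start position with colors reversed, which is common practice in self-play validation; that interpretation, the `base`/`patch` labels, and the logistic Elo conversion below are assumptions for illustration, not lucasart's actual setup.

```python
import math
from typing import Iterable, Iterator, List, Tuple

def paired_schedule(positions: Iterable[str], base: str = "base",
                    patch: str = "patch") -> Iterator[Tuple[str, str, str]]:
    """Yield (start_position, white, black): each position is played twice,
    once with each engine as White, so neither side gets the better opening."""
    for pos in positions:
        yield pos, patch, base
        yield pos, base, patch

def patch_score(results: List[float]) -> float:
    """Average score of the patched engine (1 = win, 0.5 = draw, 0 = loss)."""
    return sum(results) / len(results)

def elo_difference(score: float) -> float:
    """Elo difference implied by a score strictly between 0 and 1,
    under the standard logistic rating model."""
    return 400.0 * math.log10(score / (1.0 - score))
```

Fed the 960 Chess960 start positions, `paired_schedule` yields 1,920 games; a 75% score for the patch would correspond to roughly +191 Elo under this model.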

OpenChess
