Creating a new (and independent) rating list

Matthias Gemuh
Posts: 295
Joined: Wed Jun 09, 2010 2:48 pm

Re: Creating a new (and independent) rating list

Post by Matthias Gemuh » Fri Jun 25, 2010 10:22 am

aiorla wrote: I will dedicate some time of my i5-750 to the list if it is created!
About the time control, I think that a repeating control or an increment sounds better than 10+0, 15+0...
Yes, 10+0, 15+0... lead to meaningless results and poor PGN files if the games are long enough for time trouble to kick in.

Matthias.
Aided by engines, GMs can be very strong.
http://www.hylogic.de

Adam Hair
Posts: 104
Joined: Fri Jun 11, 2010 4:29 am
Real Name: Adam Hair

Post by Adam Hair » Fri Jun 25, 2010 5:20 pm

Rebel wrote:
BB+ wrote: CCRL has the following:
CEGT has
From the stipulations I understand that the intent of both CCRL and CEGT is to measure raw engine strength. While that choice has its merits, it does an injustice to the programmer's other efforts to add extra Elo points to his brainchild. Opening books, book learning and position learning are essential parts of a chess program; they can fix holes, adapt, and even avoid previously made mistakes.

IMO, programs should be tested as a whole, as the programmer intended, and not be handicapped.

This has always been the policy of the SSDF.

Ed
I can understand that you, as a programmer, want all the features you have built into your chess program to be used in testing. However, that presents a problem for a rating list that is trying to test many engines. Comparing two engines by way of their head-to-head match does not really give an accurate idea of their relative strengths, so a comparison of their results against other engines is also needed. If some of those other engines have the ability to learn, then the accuracy of that comparison suffers. Let's say that I play Yace against a gauntlet of engines, including ProDeo with book learning and position learning turned on. Then, some time later, I play Trace against the same gauntlet. The comparison between Yace and Trace suffers to some degree because the ProDeo that Trace played against is not the same ProDeo that Yace played against. ProDeo has evolved during the time between the two gauntlets (this assumes ProDeo has played more games during that interval).

A different method of testing is needed to show how an engine such as ProDeo improves over time.
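Adam's point about common opponents can be made concrete with the standard logistic Elo model, in which an engine's performance rating is the rating at which its expected score against the gauntlet equals its actual score. This is a minimal sketch under that model only (rating lists use their own rating tools, so it is not their exact method), and the gauntlet ratings and results below are hypothetical.

```python
from typing import Sequence


def expected_score(r_player: float, r_opponent: float) -> float:
    """Expected score of one game under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_player) / 400.0))


def performance_rating(opponents: Sequence[float], scores: Sequence[float],
                       lo: float = -1000.0, hi: float = 5000.0,
                       iterations: int = 100) -> float:
    """Rating at which the summed expected score equals the actual score.

    opponents: the opponent's rating for each game
    scores:    the result of each game (1, 0.5 or 0)
    Expected score rises monotonically with rating, so bisection finds it.
    """
    total, games = sum(scores), len(scores)
    if not 0 < total < games:
        raise ValueError("performance is unbounded for a 0% or 100% score")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, r) for r in opponents) < total:
            lo = mid  # expected score too low: the performance is higher
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical gauntlet ratings and results, for illustration only.
gauntlet = [2650.0, 2700.0, 2750.0]
yace = performance_rating(gauntlet, [1, 0.5, 0.5])
trace = performance_rating(gauntlet, [1, 0.5, 0])
print(f"Yace {yace:.0f}, Trace {trace:.0f}")
```

Both performances are computed against the same listed opponent ratings. If a learning opponent is effectively stronger when Trace meets it than its listed rating says, Trace's performance is understated relative to Yace's, which is the distortion Adam describes.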

Adam Hair
Posts: 104
Joined: Fri Jun 11, 2010 4:29 am
Real Name: Adam Hair

Post by Adam Hair » Fri Jun 25, 2010 5:22 pm

Matthias Gemuh wrote:
aiorla wrote: I will dedicate some time of my i5-750 to the list if it is created!
About the time control, I think that a repeating control or an increment sounds better than 10+0, 15+0...
Yes, 10+0, 15+0... lead to meaningless results and poor PGN files if the games are long enough for time trouble to kick in.

Matthias.
I think an incremental time control leads to better games, but it is easier to synchronize different computers using a repeating time control.
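Matthias's time-trouble point can be illustrated with a toy clock model. This sketch assumes a deliberately simple, hypothetical allocation rule (spend the remaining time divided by the moves left in the current period, or by a fixed 30 when the control gives no move count); real engines use their own, more elaborate time management. Controls are in seconds: 10+0 as a 600 s sudden-death game, 5+2 as an increment control, and 40/4 as 240 s per 40 moves, repeating.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimeControl:
    base: float              # seconds on the clock at the start of the game
    increment: float = 0.0   # seconds added after every move (the "2" in 5+2)
    period_moves: int = 0    # moves per repeating period (40 in "40/4"); 0 = none


def think_times(tc: TimeControl, n_moves: int, divisor: int = 30) -> List[float]:
    """Seconds spent on each of n_moves under a hypothetical allocation rule.

    Rule (an assumption, not any particular engine's logic): spend
    remaining / moves_to_go, where moves_to_go is the number of moves left in
    the current period for a repeating control, or a fixed divisor otherwise.
    """
    remaining = tc.base
    times = []
    for move in range(n_moves):
        if tc.period_moves:
            moves_to_go = tc.period_moves - move % tc.period_moves
        else:
            moves_to_go = divisor
        spend = remaining / moves_to_go
        times.append(spend)
        remaining += tc.increment - spend
        # A repeating control hands out a fresh allotment after each period.
        if tc.period_moves and (move + 1) % tc.period_moves == 0:
            remaining += tc.base
    return times


sudden_death = TimeControl(base=600)                  # 10+0
with_increment = TimeControl(base=300, increment=2)   # 5+2
repeating = TimeControl(base=240, period_moves=40)    # 40/4

for name, tc in [("10+0", sudden_death), ("5+2", with_increment), ("40/4", repeating)]:
    t = think_times(tc, 100)
    print(f"{name}: move 1 = {t[0]:.2f}s, move 100 = {t[99]:.2f}s")
```

Under this rule the sudden-death think time keeps shrinking as the game gets longer (under a second per move by move 100), while the increment control levels off just above its increment and the repeating control resets every 40 moves.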

aiorla
Posts: 10
Joined: Wed Jun 23, 2010 10:13 pm
Real Name: Aitor Ortiz de Latierro Olivella

Post by aiorla » Fri Jun 25, 2010 5:57 pm

Adam Hair wrote:
I think an incremental time control leads to better games, but it is easier to synchronize different computers using a repeating time control.
Yes, a repeating time control seems the easiest and best way to do it.
And now what? :mrgreen:

kingliveson
Posts: 1388
Joined: Thu Jun 10, 2010 1:22 am
Real Name: Franklin Titus
Location: 28°32'1"N 81°22'33"W

Post by kingliveson » Fri Jun 25, 2010 7:03 pm

aiorla wrote:
Yes, repeating time control seems the better way to do it easy and fine.
And now what? :mrgreen:
I guess we all agree on 40/4, 40/20 and 40/40 repeating time controls. What is the consensus on EGTBs (endgame tablebases)?
PAWN : Knight >> Bishop >> Rook >>Queen

Andrew
Posts: 10
Joined: Mon Jun 14, 2010 3:19 pm

Post by Andrew » Fri Jun 25, 2010 7:20 pm

BB+ wrote: I'm not sure I like the methodology of any of the current groups, but then I have a very strong preference for science. For instance, some of the rating groups let the operator choose/create the book. I have no idea how much mischief this could entail, though I see anecdotes around that Engine X does (relatively) better than Engine Y with Book Z. If we want to make it a scientific venture, more discussion is needed. For instance, should a uniform platform be adopted, or is the "benchmark and adjust" procedure sufficient? What aspects of the engine are you trying to measure (for instance, is time management important)? What interference is allowed from the GUI (for instance, is N straight moves with both at 0.00 a draw, even if no repetition has been made)? Is the focus on top engines (top 10 or 20), or on a wide variety (200+) of amateur engines? If you want the latter, then you will likely have to sacrifice "science" to some degree, as to cover such a broad spectrum you will need many different testers involved.
I think it is silly to create yet another rating list unless you spend the time to eliminate confounding variables and focus on truth. To me this means:
  • No opening book, but rather X starting positions.
  • Identical hardware for all tests, but different computers for each engine.
  • Strongest settings provided by the programmer, or another expert (in the case of IPPO*)
  • Identical Hash size
  • 3-4-5-6-man tablebases
  • Ponder On
  • Classical Time Controls
  • As many moves as possible until the engines agree to a draw, or the game ends by the 50-move rule or threefold repetition.
  • Starting sample of at least 30 engines
  • Identical number of games as White and as Black versus every other engine
I'm not sure of the full details of the SSDF's methodology, but it seems the most promising to me.
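The "X starting positions" and "identical number of games as White and Black" points together describe a balanced double round robin. A minimal sketch, assuming each pair of engines plays every starting position twice with colors reversed; the engine names and position labels are placeholders, not a proposed test set.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def schedule(engines: List[str], positions: List[str]) -> List[Tuple[str, str, str]]:
    """(white, black, start_position) for every game of a double round robin.

    Every pair meets on every start position twice with colors reversed,
    so each engine has the same number of White and Black games against
    every other engine.
    """
    games = []
    for a, b in combinations(engines, 2):
        for pos in positions:
            games.append((a, b, pos))
            games.append((b, a, pos))
    return games


engines = ["EngineA", "EngineB", "EngineC"]   # hypothetical engine names
positions = ["pos01", "pos02"]                # stand-ins for X starting FENs
games = schedule(engines, positions)
white_counts = Counter(white for white, _, _ in games)
print(len(games), dict(white_counts))
```

Playing each position with both colors also cancels out any bias from a starting position that favors one side.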

kingliveson
Posts: 1388
Joined: Thu Jun 10, 2010 1:22 am
Real Name: Franklin Titus
Location: 28°32'1"N 81°22'33"W

Post by kingliveson » Fri Jun 25, 2010 7:44 pm

Andrew wrote: I think it is silly to create yet another rating list unless you spend the time to eliminate confounding variables and focus on truth. To me this means:
  • Identical hardware for all tests, but different computers for each engine.
Can you elaborate? Are you suggesting that a volunteer in Japan has to use the same hardware as a volunteer in Burkina Faso? Or are you saying that if a volunteer runs a 40/4 test on one machine, s/he must also run the 40/20 and 40/40 tests on that same machine?
  • Strongest settings provided by the programmer, or another expert (in the case of IPPO*)
There could be an issue based on current release scheduling.
  • Ponder On
For me, Ponder On is only useful for collecting high-quality games; I don't see it affecting the end result. I'm sure most people would prefer Ponder On, but the hardware is just not available at this stage.
PAWN : Knight >> Bishop >> Rook >>Queen

aiorla
Posts: 10
Joined: Wed Jun 23, 2010 10:13 pm
Real Name: Aitor Ortiz de Latierro Olivella

Post by aiorla » Fri Jun 25, 2010 9:56 pm

kingliveson wrote:
  • Identical hardware for all tests, but different computers for each engine.
Can you elaborate? Are you suggesting that a volunteer in Japan has to use the same hardware as a volunteer in Burkina Faso? Or are you saying that if a volunteer runs a 40/4 test on one machine, s/he must also run the 40/20 and 40/40 tests on that same machine?
  • Strongest settings provided by the programmer, or another expert (in the case of IPPO*)
There could be an issue based on current release scheduling.
  • Ponder On
For me, Ponder On is only useful for collecting high-quality games; I don't see it affecting the end result. I'm sure most people would prefer Ponder On, but the hardware is just not available at this stage.
I would also add that 3-4-5-6-man tablebases are really big and not affordable for every possible tester, and to make it fairer, testers would need the Robbobases too!
Another problem could be the SSE4.2 implementation of some engines and large pages; these topics need to be discussed too.

LetoAtreides82
Posts: 32
Joined: Thu Jun 10, 2010 12:46 am

Post by LetoAtreides82 » Sat Jun 26, 2010 1:20 am

Andrew wrote: • Ponder On
Ponder on doesn't seem to produce drastically different results from ponder off. Compare the rating differences in ponder-off lists like CEGT or CCRL with those in ponder-on lists like IPON (http://www.inwoba.de/index.html).

IWB
Posts: 195
Joined: Thu Jun 10, 2010 4:10 pm

Post by IWB » Sat Jun 26, 2010 9:09 am

Hi
LetoAtreides82 wrote:
Ponder on doesn't seem to produce drastically different results from ponder off. Compare the rating differences in ponder-off lists like CEGT or CCRL with those in ponder-on lists like IPON (http://www.inwoba.de/index.html).
I want to disagree here. There are engines which handle their time control differently with ponder on, as they expect to have more time left. These engines differ between CEGT and IPON. Of course the difference is not hundreds of Elo, but it might be visible in the ranking. My usual example is Shredder 12 and Naum 4: while all ponder-off lists have Naum 4 ahead of Shredder 12, in ponder ON it is the other way round (Naum 4.2, of course, is strong enough to overtake it there). Nevertheless, there are differences.
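The kind of comparison being discussed here (does an engine pair swap places between a ponder-off and a ponder-on list?) can be checked mechanically. A minimal sketch, with hypothetical engine names and ratings standing in for real list data:

```python
from itertools import combinations
from typing import Dict, List, Tuple


def order_flips(list_a: Dict[str, float], list_b: Dict[str, float]) -> List[Tuple[str, str]]:
    """Engine pairs ranked in opposite order by two rating lists.

    Only engines present in both lists are compared; pairs tied in
    either list are not counted as flips.
    """
    common = sorted(set(list_a) & set(list_b))
    flips = []
    for x, y in combinations(common, 2):
        gap_a = list_a[x] - list_a[y]
        gap_b = list_b[x] - list_b[y]
        if gap_a * gap_b < 0:  # opposite signs: the order is reversed
            flips.append((x, y))
    return flips


# Hypothetical ratings, not taken from any real list.
ponder_off = {"EngineX": 2900, "EngineY": 2890, "EngineZ": 2700}
ponder_on = {"EngineX": 2880, "EngineY": 2895, "EngineZ": 2705}
print(order_flips(ponder_off, ponder_on))
```

A reversal only means something if the rating gaps are larger than the lists' published error margins; smaller swaps may just be noise.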

Besides that, there is another logical argument for ponder ON. There is not a single tournament in the world where engines (or humans :-) ) play with ponder off. Some people make the case for 3-4-5-piece TBs, books, learning... because the programmer invested time in them to improve his engine's play. The same goes for ponder ON, and it is something that treats all engines equally, as they all support it.

Bye
Ingo
Ponder ON rating list: http://www.inwoba.de
