Creating a new (and independent) rating list

General discussion about computer chess...
User avatar
Rebel
Posts: 515
Joined: Wed Jun 09, 2010 7:45 pm
Real Name: Ed Schroder

Re: Creating a new (and independent) rating list

Post by Rebel » Sat Jun 26, 2010 9:26 am

IWB wrote:I fully agree here and that is one of the reasons why I decided to run the IPON years ago and finally went public last year.
Very good, keep up the good work.

If I understand your pages correctly, your reason for not including Ippo and friends is the lack of an author name. Can you explain the reasoning behind that? Just curiosity on my part.

Another question: would you be willing to accept help from other people here who want to test? I realize that's a big thing, but if IPON were run by a group of people it would give it more status.

Ed

User avatar
IWB
Posts: 195
Joined: Thu Jun 10, 2010 4:10 pm

Re: Creating a new (and independent) rating list

Post by IWB » Sat Jun 26, 2010 9:42 am

Hello Ed
Rebel wrote: If I understand your pages well your reason for not including Ippo and friends is the lack of an author name. Can you explain the why of that? Just curiosity about your way of reasoning.
Simply because an anonymous engine leaves no possibility for someone to claim his intellectual property in case he thinks it might be infringed. If there is a real name, everyone is free to take any action he thinks might be useful. So basically an anonymous engine is simply unfair. (And I do not see a reason why someone would want to be anonymous with a chess engine; there is no social taboo, and no one considers chess programming something bad.)
Rebel wrote: Another question, would you like to accept help from other people here who are willing to test? I realize that's a big thing but if IPON is ran by a group of people it would give it more status.
Ed
Thank you, but no. There are several reasons; I will just name a few.

1. I do some beta testing as well. If more people are involved it is harder to keep that separated.
2. More people means more problems. See the other lists: some of their members want to test the Litos, some don't ... now they have to find a consensus ... difficult.
3. Actually I disagree that more people give more status (and I don't mind that). More people means you throw results together which actually do not belong together - which is one of my criticisms of the other lists. You have to unify hardware, opening positions, books, or you throw together whatever someone happened to test!? Then you have to adapt the time control to the hardware, another thing with which I personally disagree.

And with the current status I am independent. If I want to quit tomorrow, I delete my site and I am gone :-)

Have a nice weekend and good luck to your team :-)
Ingo
Ponder ON rating list: http://www.inwoba.de

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: Creating a new (and independent) rating list

Post by BB+ » Sat Jun 26, 2010 9:54 am

I agree that "ponder off" is artificial. As I said earlier, if you are testing ponder off, you are likely trying to judge "analysis strength" rather than "playing strength", and might as well test via "go movetime". The main disadvantage of ponder on is that you probably need 2 computers (I guess you can test with 4 CPUs each on an 8-core system, but implementing that needs a bit of care).
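
To be concrete, here is a minimal sketch of what fixed-movetime testing looks like when driven through the python-chess library rather than a GUI. The engine paths are placeholders, and a real run would of course loop over a set of opening positions and many games:

import chess
import chess.engine

# Placeholder paths - substitute whatever two engines you actually want to compare.
engine_a = chess.engine.SimpleEngine.popen_uci("/path/to/engine_a")
engine_b = chess.engine.SimpleEngine.popen_uci("/path/to/engine_b")

board = chess.Board()
limit = chess.engine.Limit(time=1.0)  # 1 second per move, i.e. "go movetime 1000"

# One game, engine_a as White and engine_b as Black, no pondering involved.
while not board.is_game_over():
    engine = engine_a if board.turn == chess.WHITE else engine_b
    result = engine.play(board, limit)
    board.push(result.move)

print(board.result())
engine_a.quit()
engine_b.quit()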

I definitely think that a (large) fixed position set should be made, say 10K positions. I don't think this is too hard, at least as a group project, for we simply: make a first example of a test set; then allow people to (repeatedly) point out that Position #XYZ is garbage, and so we eliminate it, and insert a new one in its place. Having 10K positions allows one to play 20K games at (say) "go movetime 1000" (less than a week on a quad), and get a rating guess within 2 or 3 ELO. The purpose of these fast games would not be to judge the engines, but to get an idea if Setup A differs greatly from Setup B (example: I play 20K games A vs B and get a 50 ELO difference. You get a 75 ELO difference. So we know that A does relatively better by somewhere around 25 ELO on your setup. We can then extrapolate that to longer time controls. One problem with this "calibration" is that hash settings won't come into play too much, due to the speed of the games). I guess making sure that the relative opening frequencies match human play (which level? GM/master/club?) could also be a goal. Once the 10K positions are available, a tester could randomly choose (say) 1000 from this superset, with any bias possible likely to be less than from other aspects.
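
To put rough numbers behind the "2 or 3 ELO" claim, here is a back-of-the-envelope sketch (ignoring draws, which only shrink the error): with N games and a score fraction p, the one-sigma error on p is about sqrt(p(1-p)/N), and near p = 0.5 each 0.001 of score is worth roughly 0.7 Elo, so 20K games lands at about 2.5 Elo.

import math

def elo_from_score(p):
    # Standard logistic Elo model: p = 1 / (1 + 10 ** (-d / 400))
    return 400 * math.log10(p / (1 - p))

def elo_error(n_games, p=0.5):
    # One-sigma error on the score fraction, ignoring draws (a worst case,
    # since draws reduce the per-game variance), converted to Elo via the
    # local slope of elo_from_score at p.
    sigma_p = math.sqrt(p * (1 - p) / n_games)
    slope = 400 / math.log(10) * (1 / p + 1 / (1 - p))
    return slope * sigma_p

print(round(elo_error(20000), 1))   # ~2.5 Elo for 20K games
print(round(elo_from_score(0.57)))  # a 57% score is roughly +49 Elo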

User avatar
kingliveson
Posts: 1388
Joined: Thu Jun 10, 2010 1:22 am
Real Name: Franklin Titus
Location: 28°32'1"N 81°22'33"W

Re: Creating a new (and independent) rating list

Post by kingliveson » Sat Jun 26, 2010 7:13 pm

I have to agree with these statements:
Harvey wrote:The testers need to decide what they are. Are they independent? At the moment I would say no, as on the rating lists you see many beta versions of Rybka 4. The testers are being used as last-minute beta testers before release. For me a true list for public consumption should contain only engines available to the public. Ingo produces a list that has only available engines. However, his private testing contains Shredder betas, but he, rightly, chooses to keep those games off the published list.

I remember a few years ago asking for a Hiarcs setting to be tested and we were told no, we do not do that - although they already had several engines with various settings on their list. Perhaps the lists should only test default settings of released engines? They would get more games played and provide a better service to those reading the list.

I think some testers would test any gas released by Rajlich if asked to.

I have even seen 1 member of the testing groups publishing how the list would look if he added one of the engines his group will not test?!
PAWN : Knight >> Bishop >> Rook >>Queen

Gino
Posts: 15
Joined: Thu Jun 10, 2010 4:04 am

Re: Creating a new (and independent) rating list

Post by Gino » Sat Jun 26, 2010 8:00 pm

I also agree about the need for independent testers for an independent rating list.

User avatar
Rebel
Posts: 515
Joined: Wed Jun 09, 2010 7:45 pm
Real Name: Ed Schroder

Endusers vs Programmers

Post by Rebel » Sun Jun 27, 2010 12:01 am

Thanks Ingo for explanations. I hope your IPON list will be accepted by the CCC, where CCC stands for Computer Chess Community, not to be confused with the current derailed forum. ;)

Ed

Andrew
Posts: 10
Joined: Mon Jun 14, 2010 3:19 pm

Re: Creating a new (and independent) rating list

Post by Andrew » Mon Jun 28, 2010 12:39 pm

kingliveson wrote:Can you elaborate -- are you suggesting a volunteer in Japan has to use the same hardware as a volunteer in Burkina Faso? Or are you saying that if a volunteer runs a 40/4 test on one machine, s/he must also run 40/20 and 40/40 on that same hardware?
Yes, I'm suggesting every tester use the exact same hardware, much like the SSDF currently does.
kingliveson wrote: There could be an issue based on current release scheduling.
I'm not sure I understand what you mean.
kingliveson wrote: Ponder On for me is only useful for collecting high-quality games. I don't see it affecting the end result.
I've not tested it, but I'd be cautious in dismissing anything without empirical data.
kingliveson wrote: I am sure most people would prefer Ponder On, but the hardware is just not available at this stage.
Yes, unfortunately, the limit to my proposal is hardware cost.
BB+ wrote:I definitely think that a (large) fixed position set should be made, say 10K positions. I don't think this is too hard, at least as a group project, for we simply: make a first example of a test set; then allow people to (repeatedly) point out that Position #XYZ is garbage, and so we eliminate it, and insert a new one in its place. Having 10K positions allows one to play 20K games at (say) "go movetime 1000" (less than a week on a quad), and get a rating guess within 2 or 3 ELO. The purpose of these fast games would not be to judge the engines, but to get an idea if Setup A differs greatly from Setup B (example: I play 20K games A vs B and get a 50 ELO difference. You get a 75 ELO difference. So we know that A does relatively better by somewhere around 25 ELO on your setup. We can then extrapolate that to longer time controls. One problem with this "calibration" is that hash settings won't come into play too much, due to the speed of the games). I guess making sure that the relative opening frequencies match human play (which level? GM/master/club?) could also be a goal. Once the 10K positions are available, a tester could randomly choose (say) 1000 from this superset, with any bias possible likely to be less than from other aspects.
My understanding is that Hyatt already does something similar for Crafty. In some other threads he points out the positions he uses...

Sentinel
Posts: 122
Joined: Thu Jun 10, 2010 12:49 am
Real Name: Milos Stanisavljevic

Re: Creating a new (and independent) rating list

Post by Sentinel » Mon Jun 28, 2010 1:56 pm

BB+ wrote:I definitely think that a (large) fixed position set should be made, say 10K positions. I don't think this is too hard, at least as a group project, for we simply: make a first example of a test set; then allow people to (repeatedly) point out that Position #XYZ is garbage, and so we eliminate it, and insert a new one in its place. Having 10K positions allows one to play 20K games at (say) "go movetime 1000" (less than a week on a quad), and get a rating guess within 2 or 3 ELO. The purpose of these fast games would not be to judge the engines, but to get an idea if Setup A differs greatly from Setup B (example: I play 20K games A vs B and get a 50 ELO difference. You get a 75 ELO difference. So we know that A does relatively better by somewhere around 25 ELO on your setup. We can then extrapolate that to longer time controls. One problem with this "calibration" is that hash settings won't come into play too much, due to the speed of the games). I guess making sure that the relative opening frequencies match human play (which level? GM/master/club?) could also be a goal. Once the 10K positions are available, a tester could randomly choose (say) 1000 from this superset, with any bias possible likely to be less than from other aspects.
You don't have to go through all that trouble, since we have Bob, who has been generous enough to make his 40k starting positions available on his website (for quite some time now) for anyone to use.
And I must say I do trust his choice of starting positions.
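
If we go with a big fixed superset like Bob's, the "randomly choose 1000" step BB+ mentions is trivial to script. A minimal sketch, assuming one position per line in an EPD file (the file names here are just placeholders):

import random

with open("openings_40k.epd") as f:
    positions = [line.strip() for line in f if line.strip()]

# Each tester draws an independent 1000-position subset; fix the seed instead
# if the subset should be reproducible across runs.
subset = random.sample(positions, 1000)

with open("openings_subset.epd", "w") as f:
    f.write("\n".join(subset) + "\n")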

Sentinel
Posts: 122
Joined: Thu Jun 10, 2010 12:49 am
Real Name: Milos Stanisavljevic

Re: Creating a new (and independent) rating list

Post by Sentinel » Mon Jun 28, 2010 2:02 pm

Andrew wrote:Yes, I'm suggesting every tester use the exact same hardware, much like the SSDF currently does.
It's largely impractical, I would even say impossible. You would need not only the same CPU, but also the same memory (same size, same brand, same type, same latency), same motherboard, same GUI, same OS, even the same OS settings (same services running in the background, same startup processes), because all of these things impact performance.

Hood
Posts: 200
Joined: Thu Jun 10, 2010 2:36 pm
Real Name: Krzych C.

Re: Creating a new (and independent) rating list

Post by Hood » Mon Jun 28, 2010 2:14 pm

The environment is the same for each program, so the impact on the rating should not be so big.
Smolensk 2010. Murder or accident... Cui bono ?

There are no bug-free programs. There are programs with undiscovered bugs.
Alleluia.
