2 TB HD for Tablebases

Code, algorithms, languages, construction...
robbolito
Posts: 601
Joined: Thu Jun 10, 2010 3:48 am

2 TB HD for Tablebases

Post by robbolito » Wed Jun 30, 2010 8:58 pm

Considering that the complete set of 6-piece TBs comes to more than a terabyte, is it worthwhile buying a 1.5 or 2 TB HD to download the files and then use them for the engines during games? Would the TBs help an engine that is capable of using them, or do these new engines do fine even without TBs?
The largest 12-DVD collection of TBs is about 95 GB and contains only a portion of the 6-man endgames. That is a lot of space on the HD, yet on a 1 TB drive it occupies only a fraction of the disc. What is more distressing is seeing engines use only a fraction of the TBs during the endgame.
Hence the above questions: is it worth the money, and especially the time, to download all the TBs if they are not going to be very helpful?

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: 2 TB HD for Tablebases

Post by hyatt » Thu Jul 01, 2010 1:47 am

robbolito wrote:Considering that the complete set of 6-piece TBs comes to more than a terabyte, is it worthwhile buying a 1.5 or 2 TB HD to download the files and then use them for the engines during games? Would the TBs help an engine that is capable of using them, or do these new engines do fine even without TBs?
The largest 12-DVD collection of TBs is about 95 GB and contains only a portion of the 6-man endgames. That is a lot of space on the HD, yet on a 1 TB drive it occupies only a fraction of the disc. What is more distressing is seeing engines use only a fraction of the TBs during the endgame.
Hence the above questions: is it worth the money, and especially the time, to download all the TBs if they are not going to be very helpful?

I am no longer so enchanted with the egtb files. The first thing you encounter with the 6-piece files is that either you die due to poor disk performance (those big disks are _not_ fast), or you die because of the enormous memory requirements, since it is impossible to cache enough of them to avoid doing major I/O. And since we have the 50-move rule to contend with, most of them are really not very useful anyway: you will draw many "won" positions because the tables do not account for 50-move draws.

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: 2 TB HD for Tablebases

Post by BB+ » Thu Jul 01, 2010 4:46 am

Considering that the complete set of 6-piece TBs comes to more than a terabyte, is it worthwhile buying a 1.5 or 2 TB HD to download the files and then use them for the engines during games?
I am no longer so enchanted with the egtb files. The first thing you encounter with the 6-piece files is that either you die due to poor disk performance (those big disks are _not_ fast), or you die because of the enormous memory requirements, since it is impossible to cache enough of them to avoid doing major I/O. And since we have the 50-move rule to contend with, most of them are really not very useful anyway: you will draw many "won" positions because the tables do not account for 50-move draws.
I completely agree with this. Nalimov was made over a decade ago, when different parameters were being tossed around for access considerations. For instance, I don't think they ever thought anyone would want to index the 6-piece files with that scheme. It uses 1 index per 8K positions, while I think the RobboTotalBases use 1 per 64K positions, or even 1 per 1M positions (something called "hyper-indexing", though what it is actually doing is not immediately clear). The main problems I see with the RobboTotalBases are that the 6-piece files are not readily available (building them currently takes a lot of memory/time), and I don't think there is an explicit mechanism to use them in search (from what I gather, the whole concept of "RobboBases" seems to be that bitbases should be used in search rather than full distance bases -- and the 6-piece "TripleBases" do not seem to exist). Gaviota seems closer to Nalimov in spirit, but I think it doesn't do 6-piece yet, and I haven't looked at all the parameter choices. I think it was over a year ago that I saw SMK talking about 6-piece ShredderBases, but I don't know if they are available.
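To put rough numbers on the index-granularity comparison, here is a minimal C sketch; the ~9-billion-position table size and the 4-byte index entries are assumptions for illustration, not values taken from the Nalimov or Robbo formats.

/* Back-of-the-envelope: size of the block-index table for one tablebase,
   as a function of how many positions each index entry covers.  The
   position count and the 4-byte entries are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    const unsigned long long positions   = 9000000000ULL;  /* roughly a large 6-man table */
    const unsigned long long entry_bytes = 4;              /* one 4-byte entry per block */
    const unsigned long long per_entry[] = { 8192, 65536, 1048576 };

    for (int i = 0; i < 3; i++) {
        unsigned long long entries = (positions + per_entry[i] - 1) / per_entry[i];
        printf("1 entry per %7llu positions -> %9llu entries = %6.2f MB of index\n",
               per_entry[i], entries, (double)(entries * entry_bytes) / (1024.0 * 1024.0));
    }
    return 0;
}

With those assumed numbers, the index shrinks from about 4 MB (8K granularity) to about 0.5 MB (64K) to a few tens of KB (1M), which is the same order of savings discussed further down in this thread.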

As to the first question, I think that building them yourself is probably still a large hurdle (GCP cleaned up some rogue Nalimov code, if I recall), but presumably it would then not be too hard to add whatever parameters (like the 50-move rule) you want. The performance gains will likely not be large.

robbolito
Posts: 601
Joined: Thu Jun 10, 2010 3:48 am

Re: 2 TB HD for Tablebases

Post by robbolito » Thu Jul 01, 2010 3:59 pm

Dr. Hyatt and BB+,
thank you very much for your explanations about the use and storage of tablebases. You have been very kind in considering my questions.
I have concluded that spending the time to download more than 1 TB of TBs would not help the new engines much in the endgame, so I can shelve this project for now.

Matthias Gemuh
Posts: 295
Joined: Wed Jun 09, 2010 2:48 pm
Contact:

Re: 2 TB HD for Tablebases

Post by Matthias Gemuh » Thu Jul 01, 2010 10:22 pm

It is insane to store and use 6-men EGTB for playing games.
I use only 4-men EGTB (totalling 30 MB in size).

Matthias.
Aided by engines, GMs can be very strong.
http://www.hylogic.de

kingliveson
Posts: 1388
Joined: Thu Jun 10, 2010 1:22 am
Real Name: Franklin Titus
Location: 28°32'1"N 81°22'33"W

Re: 2 TB HD for Tablebases

Post by kingliveson » Thu Jul 01, 2010 10:27 pm

Matthias Gemuh wrote:It is insane to store and use 6-men EGTB for playing games.
I use only 4-men EGTB (totalling 30 MB in size).

Matthias.
3 or 4 years from now, it may not be as insane.
PAWN : Knight >> Bishop >> Rook >>Queen

Matthias Gemuh
Posts: 295
Joined: Wed Jun 09, 2010 2:48 pm
Contact:

Re: 2 TB HD for Tablebases

Post by Matthias Gemuh » Thu Jul 01, 2010 11:12 pm

kingliveson wrote:
Matthias Gemuh wrote:It is insane to store and use 6-men EGTB for playing games.
I use only 4-men EGTB (totalling 30 MB in size).

Matthias.
3 or 4 years from now, it may not be as insane.
My estimate is that the gain for moving from 5-men to 6-men will never exceed 5 Elo points (for strong engines).
So why bother ?

Matthias.
Aided by engines, GMs can be very strong.
http://www.hylogic.de

Peter C
Posts: 154
Joined: Thu Jun 10, 2010 3:12 am
Real Name: Peter C

Re: 2 TB HD for Tablebases

Post by Peter C » Fri Jul 02, 2010 12:27 am

Matthias Gemuh wrote:
kingliveson wrote:
Matthias Gemuh wrote:It is insane to store and use 6-men EGTB for playing games.
I use only 4-men EGTB (totalling 30 MB in size).

Matthias.
3 or 4 years from now, it may not be as insane.
My estimate is that the gain for moving from 5-men to 6-men will never exceed 5 Elo points (for strong engines).
So why bother ?

Matthias.
Now the gain is already far larger than that. You just have to be very rich. :P

6 pieces on an SSD in a machine with about 32 GB of DDR3 would gain quite a bit. The only reason EGTBs don't gain much on normal computers is that a) disk access is really slow and kills the search, and b) as Dr. Hyatt said, there is not enough RAM available to cache much of them.

I personally only use 5-piece tables because, like you, I don't believe the gains are more than 5 Elo on a regular machine. For some engines (e.g. Naum), 6-piece tables actually lose Elo. Also, I don't have a 1.5 TB hard drive handy.

Peter

LiquidNitrogen
Posts: 19
Joined: Sun Jun 13, 2010 1:20 am
Real Name: Ed Trice

Re: 2 TB HD for Tablebases

Post by LiquidNitrogen » Wed Jul 07, 2010 12:43 am

For instance, I don't think they ever thought anyone would want to index the 6-piece files with that scheme. It uses 1 index per 8K positions, while I think the RobboTotalBases use 1 per 64K positions, or even 1 per 1M positions (something called "hyper-indexing", though what it is actually doing is not immediately clear).
I have some idea what is going on, having generated endgames for:

chess (see http://3.bp.blogspot.com/_KQ8DMAfZCik/S ... Gothic.jpg)

checkers (see http://www.liquidnitrogenoverclocking.com/report.txt)

and Gothic Chess (see http://www.gothicchess.com/javascript_endings.html)

First of all, each position is translated to an index (just a number). That number is "how far into" the file to look for the result for the position. So, every position in a bitbase might be highly compressed, and you can actually store several positions in a byte. If, however, you are looking at a "distance to conversion" tablebase, every position most likely requires up to one byte. If it is a "distance to win" tablebase, it could be up to 2 bytes per position.
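As a concrete illustration of that mapping, here is a small C sketch; the 2-bit win/draw/loss packing, the encoding values, and the 8K block size are assumptions for the example, not the layout of any particular tablebase format.

#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 8192   /* bytes of uncompressed data per block (assumed) */

/* Bitbase: 2 bits per position, so 4 positions are packed into each byte. */
static int bitbase_lookup(const uint8_t *data, uint64_t pos_index) {
    uint64_t byte  = pos_index / 4;
    unsigned shift = (unsigned)(pos_index % 4) * 2;
    return (data[byte] >> shift) & 3;   /* e.g. 0 = loss, 1 = draw, 2 = win */
}

/* Distance-to-conversion table: 1 byte per position.  Which block holds the
   position, and where inside the decompressed block does its byte sit? */
static void dtc_locate(uint64_t pos_index, uint64_t *block, uint32_t *offset) {
    *block  = pos_index / BLOCK_SIZE;
    *offset = (uint32_t)(pos_index % BLOCK_SIZE);
}

int main(void) {
    uint8_t packed[1] = { 0x1B };       /* positions 0..3 packed into one byte */
    uint64_t blk; uint32_t off;
    dtc_locate(123456789ULL, &blk, &off);
    printf("wdl(pos 2) = %d, pos 123456789 -> block %llu, offset %u\n",
           bitbase_lookup(packed, 2), (unsigned long long)blk, off);
    return 0;
}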

The tablebases cluster indices together in what are called BLOCKS. Most blocks for smaller tablebases are 8K in size. This is what I believe you were calling an "index" earlier. Each BLOCK is indexed, starting from 0 and going up to however many blocks a tablebase needs.

The smaller your BLOCK size is, the better your RAM buffer usage will be, but the more memory you will need for the indexing itself. Typically, with 4-byte BLOCK indices, each BLOCK will need 4 bytes of RAM for the buffering bookkeeping in addition to the block data itself.

So, with one 4-byte index per 8192 (8K) entries, you can load 1 million BLOCKs using 4 MB of RAM for the indices plus 1 million x 8K = 8 GB of RAM for the blocks themselves. You can see that using one 4-byte index per 64K BLOCK is not an earth-shattering saving of RAM: you can load about 128,000 64K blocks, requiring the same 8 GB of RAM, and you're using only about 128,000 x 4 bytes = 0.5 MB of RAM for the buffering of the BLOCK indices.

So, you saved about 3.5 MB of RAM with 8 GB at your disposal.

What the 8K scheme "buys" you is less disk access per miss. Suppose none of your 128,000 cached 64K blocks contains the position that needs probing: the buffer marks the least recently used block as "back on disk", does a very expensive 64K disk read, marks the newly loaded block as the most recently seen, and the links "downgrade" every other block by one, so another 64K block is soon ready to be paged back to disk.

If an 8K block needs to be paged out to disk instead, it is much faster, and it happens less frequently, since you are able to keep many more of the most recently seen blocks in RAM while the truly inactive ones page out to disk.
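A minimal LRU block cache along those lines might look like the following C sketch; this is only an illustration of the eviction idea, not code from any tablebase probing library, and a real version would decompress the block it reads rather than using the stub here.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE   8192
#define CACHE_BLOCKS 1024            /* 1024 x 8K = 8 MB of block cache (assumed) */

typedef struct {
    int64_t  block_no;               /* which block of the file this slot holds */
    uint64_t last_used;              /* 0 = slot never filled; larger = more recent */
    uint8_t  data[BLOCK_SIZE];
} CacheSlot;

static CacheSlot cache[CACHE_BLOCKS];
static uint64_t  clock_stamp;

/* Stub: a real probe would seek to the block's compressed offset, read it,
   and decompress it into out. */
static void read_block_from_disk(int64_t block_no, uint8_t *out) {
    memset(out, (int)(block_no & 0xff), BLOCK_SIZE);
}

/* Return a pointer to the decompressed block, loading it on a miss and
   evicting the least recently used slot when the cache is full. */
static const uint8_t *get_block(int64_t block_no) {
    int victim = 0;
    for (int i = 0; i < CACHE_BLOCKS; i++) {
        if (cache[i].last_used != 0 && cache[i].block_no == block_no) {
            cache[i].last_used = ++clock_stamp;   /* hit: no disk I/O at all */
            return cache[i].data;
        }
        if (cache[i].last_used < cache[victim].last_used)
            victim = i;                           /* remember the oldest slot */
    }
    read_block_from_disk(block_no, cache[victim].data);  /* miss: victim goes "back on disk" */
    cache[victim].block_no  = block_no;
    cache[victim].last_used = ++clock_stamp;
    return cache[victim].data;
}

int main(void) {
    printf("first byte of block 42: %u\n", get_block(42)[0]);   /* 42, from the stub */
    return 0;
}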

In my experience with checkers endgames, only about 20% of the entire tablebase of trillions of positions needs to be held in RAM, and that delivers about 95% of the performance of having the entire thing loaded.

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: 2 TB HD for Tablebases

Post by hyatt » Wed Jul 07, 2010 4:25 am

Eugene and I worked on this part of things quite a bit. There are two issues: (1) I/O latency and bandwidth; (2) CPU utilization during decompression.

If you go to big blocks, you chew up I/O bandwidth, and then a lot of CPU cycles to decompress the entire block. In very large (6-man) tables, it is not uncommon to probe once for any particular block and then move on to another block. You invest a significant amount of time in reading a big block, then decompressing it, only to get one value for all that work. But bigger blocks lead to better compression and smaller overall file sizes.

If you shrink the blocks, you begin to minimize the I/O bandwidth issue although latency remains unchanged. You also reduce the CPU time required to decompress a block.

When Eugene and I tested to figure out the best blocksize, I was using decent SCSI drives (this was on some machine in my office, I don't recall exactly which one now) and had reasonable latency and bandwidth specs to work with. 8K seemed to fit best given the range of hardware at the time. Today, things are likely different: CPUs are way faster while latency is not a lot better, so reading in a larger chunk is acceptable since the CPU overhead to decompress is not so noticeable. But the current EGTB 8K blocksize was simply a compromise between I/O bandwidth, CPU utilization and compression efficiency. In reality, when we were testing, we found that the optimal blocksize was really different for each CPU/disk drive combination. But we could not see offering N different sets of tables with different blocksizes due to storage requirements.
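A toy cost model makes the trade-off concrete: the cost of a cold probe is roughly seek latency plus transfer time plus decompression time, while larger blocks compress better and shrink the files. The hardware numbers in this C sketch are assumptions for illustration, not the figures from the original tuning.

#include <stdio.h>

int main(void) {
    const double seek_ms         = 8.0;    /* assumed rotational seek + latency */
    const double disk_mb_per_s   = 80.0;   /* assumed sequential transfer rate */
    const double decomp_mb_per_s = 400.0;  /* assumed decompression throughput */
    const int    block_kb[]      = { 4, 8, 32, 64, 256 };

    for (int i = 0; i < 5; i++) {
        double mb       = block_kb[i] / 1024.0;
        double transfer = mb / disk_mb_per_s * 1000.0;    /* ms to read the block */
        double decomp   = mb / decomp_mb_per_s * 1000.0;  /* ms to decompress it */
        printf("%4d KB block: %.2f seek + %.3f read + %.3f decompress = %.2f ms per cold probe\n",
               block_kb[i], seek_ms, transfer, decomp, seek_ms + transfer + decomp);
    }
    return 0;
}

With numbers like these the seek dominates small blocks, which is why bigger blocks only start to hurt once the transfer and decompression terms (and wasted data per probe) become comparable to the seek itself.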

I am, one day soon, going to answer this question once and for all by running a long cluster match between two identical versions of Crafty, the only difference being that one version will use EGTBs and the other will not. Both will play from my normal set of starting positions against a common set of opponents. Then, at least for Crafty, I can say _exactly_ what the gain (or possible loss) in Elo is when using the tables versus not using them. I suspect it is a break-even deal at present. I ought to have Crafty gather data on the eval at the point where egtb probes actually start to happen, to see how many games are already decided versus how many are actually influenced by the tables. But the cluster approach is more accurate.
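For scoring such a match, the usual conversion from match score to an Elo difference with a rough 95% error bar would look something like this C sketch; the game count and score here are made-up placeholders, not Crafty results.

#include <math.h>
#include <stdio.h>

int main(void) {
    const double games = 30000.0;   /* assumed match length */
    const double score = 0.507;     /* assumed score of the EGTB-using version */

    /* Logistic Elo model: score = 1 / (1 + 10^(-elo/400)). */
    double elo = -400.0 * log10(1.0 / score - 1.0);

    /* Binomial standard error of the score (slightly pessimistic when there
       are many draws), propagated through the same curve for a rough 95% CI. */
    double se     = sqrt(score * (1.0 - score) / games);
    double elo_lo = -400.0 * log10(1.0 / (score - 1.96 * se) - 1.0);
    double elo_hi = -400.0 * log10(1.0 / (score + 1.96 * se) - 1.0);

    printf("score %.3f over %.0f games -> %+.1f Elo (95%% CI roughly %+.1f to %+.1f)\n",
           score, games, elo, elo_lo, elo_hi);
    return 0;
}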
