Alpha Zero


Alpha Zero

Postby BB+ » Wed Dec 06, 2017 8:29 pm

The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.


https://arxiv.org/abs/1712.01815
BB+
 
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: Alpha Zero

Postby BB+ » Wed Dec 06, 2017 9:00 pm

It's been a while since I've posted anything of content on computer chess, but the Alpha Zero work has prodded me to change that.

Although I am quite impressed by the result, and moreover that they achieved it via tabula rasa (as they call it), they have (after all) written a scientific paper on it, and thus I feel obliged to offer scientific criticism on the matter.

1. The strength of Stockfish 8

The paper says that SF8 ran on "64 threads and a hash size of 1GB", searching something like 70 million positions per second (on average?). No hardware is given, but I admit to doing the same in computational papers that I've written, at least when only an idea of the time taken is needed (not so true here, where the exactitude seems more important).

Anyway, I have to seriously criticize this small hash for Stockfish. They said that the "tournament time control" (?!) of 1 min/move was used (seemingly via UCI's "movetime", though that's again unclear). I haven't perused the Sardines-ness of Stockfish lately, but even at 8 bytes per hash entry this would be only 128 million positions. At the above speed, even with a low 10% saving rate (again I assume SF is not saving in q-search), the hash would be full in 1/3 of the time of one search. Moreover, given that SF8 uses (I think) a lazy approach to parallelism via hash-sharing, this seems (quite) inadequate to my mind. I guess one could do experiments to determine how much this RAM paucity hurts SF, and I'd guess it is less than the observed Elo differential to AlphaZero, but it still looks quite dubious to someone in the know about computer chess. In any event, this is small potatoes compared to the next item.
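The hash-fill estimate above can be checked with a quick back-of-envelope computation; the entry size and 10% save rate are the assumptions from this post, not measured Stockfish figures.

```python
# Back-of-envelope check of the hash-fill claim above. Entry size and
# save rate are assumptions from the post, not measured Stockfish values.
HASH_BYTES = 1 * 2**30          # 1 GiB hash, as reported in the paper
ENTRY_BYTES = 8                 # optimistic lower bound per hash entry
NPS = 70_000_000                # ~70M positions/second, as reported
SAVE_RATE = 0.10                # assume only ~10% of searched nodes are stored
MOVETIME = 60.0                 # 1 min/move time control

entries = HASH_BYTES // ENTRY_BYTES              # ~134 million slots
seconds_to_fill = entries / (NPS * SAVE_RATE)    # ~19 seconds
fraction_of_search = seconds_to_fill / MOVETIME  # ~1/3 of one search
print(entries, round(seconds_to_fill, 1), round(fraction_of_search, 2))
```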

2. Hardware comparisons

I think this has already been brought up by others elsewhere, but there is no real idea about how strong 4 TPUs are compared to the Stockfish setup. Perhaps this is a good reason that "software championships" are bogus anyway... :)

Google's TPUs are not the best documented except by hype-ish word-of-mouth, but I've seen numbers from 50-100 trillion operations per second, compared to some plausible upper bound of around 1 trillion for the 64-thread SF setup (and in reality, particularly with hash issues, I'd guess the effective number is about 100 billion at most). Again I'm guessing as much as anyone else, and TFLOPS might not be the most relevant measure, but this does tend to indicate (at least) a 100x hardware edge for AlphaZero, possibly something more like 1000x or more. The paper, in my mind rather cheekily, never really attempts to discuss this hardware comparison issue, which I'd think is of critical importance (of course, they are more concerned with the learning aspects, but even handwaving this would be better than ignoring it completely). There are diminishing returns from speed-doubling at some point, but even still I'd guess 100x to be at least 200 Elo here, much more than the observed 100 Elo differential.
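Spelling out the guesswork above (every number here is my estimate from the post, none comes from the paper), with a conservative ~30 Elo per effective speed doubling to account for diminishing returns:

```python
import math

# Rough bounding of the hardware gap. All figures are guesses from the
# post above; none comes from the AlphaZero paper itself.
tpu_ops = 100e12          # upper word-of-mouth figure for the 4-TPU setup
cpu_ops_upper = 1e12      # plausible upper bound for 64 SF threads
cpu_ops_effective = 1e11  # guessed effective rate once hash issues bite

edge_low = tpu_ops / cpu_ops_upper        # ~100x hardware edge
edge_high = tpu_ops / cpu_ops_effective   # ~1000x hardware edge

# Assume a conservative, diminishing-returns ~30 Elo per speed doubling.
elo_low = 30 * math.log2(edge_low)        # ~200 Elo even at the 100x edge
print(edge_low, edge_high, round(elo_low))
```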

If anything, the lack of cluster-sized parallelism in chess (beyond ZCT, Jonny, and a few others) has always annoyed me, as I've never seen the TCEC-esque huger-and-huger machine model to be that relevant. Indeed, in my own Losing Chess research I had to deal with 128 cores across 8 machines (typically), and came up with a (rather simplistic) approach based upon the goals of that project. I don't foresee any of the top engines becoming (usefully) cluster-capable anytime soon, which is what would seem to be needed to balance the hardware comparison. Maybe we'll just have to admit the single cpu/RAM model is dead. (Yea!).

My guess (from "single machine" being noted) is that the "4 TPUs" are on one chip/module (cf. the CPU/core distinction), but even that is not clear.

Perhaps I could state this criticism succinctly: if you are going to talk about the learning aspects of AlphaZero development, fine - but I don't think that "validating" this by a murky comparison to Stockfish should necessarily be the overall focus of this (one could argue the true domain of the work is at the software/hardware interface, and that you are really showing that TPUs and/or massive parallelism are the future, as opposed to the monolithic model - rather different than a presentation of this as a "mere" software advance). Of course, the 28-0 score will draw the headlines, but to me, it is (at least as described) rather scientifically obscure as to its import. (I've seen such "benchmarketing" in my professional work in computational mathematics, and this is by no means the worst...)

3. The lack of opening book/variety

This again has been noted by others, though I can't say that it's as big a deal as point 1 or (particularly) point 2. Unfortunately computer chess literature has IMO been historically too happy to accept 100-game tests (which here is fortunately significant) in unclear conditions. Again this seems to me to be something that someone in the know about computer chess would find anomalous, but I don't think it changes the end result too much. The fact that AlphaZero beat Stockfish from each human opening, which it itself had "found" and self-trained on, does raise some flags about bias - from a scientific point of view, they should use an independent collection of openings. They seem(?) to have done this in another 100-game sample (from what I can tell, there are 1300 games: 100 independent bookless games, and 100 each from 12 popular positions from which AlphaZero self-trained) - at the very least, this should be clarified as to what took place, and in fact the paper is IMO rather sloppy in describing its exact conditions. (I do like that the French Defense got murdered though!).

I'm not going to say that creating a separate FishTest endpoint from each of the 12 chosen opening positions would make such Stockfishes much better here, but that would seem to be the more trenchant comparison if it were feasible.

I've read that a "full paper" is in the works, and maybe it will address these issues to a workable degree - yet, egad, this is another one of my pet gripes: research and "advertising about research" being conflated...

Some more quibbly comments:

In the abstract, the phrase "achieved within 24 hours" is rather unanchored to the computational power used (in fact, the text says it took just 4 hours to outperform SF, which is much more than merely "superhuman" to my mind).

Although I'm no expert on the (expected) rate of convergence, I'd expect a logarithmic x-axis would be more useful in Figure 1.

Re: Alpha Zero

Postby BB+ » Fri Dec 08, 2017 10:49 am

Perhaps a useful comparison would be to a MCTS scheme that similarly accumulates ca. 50000 "evaluations" per second, but with each "evaluation" not being from a neural net calculation, but rather the result from (say) a "depth 10" search (or whatever the appropriate depth is, to equalize the hardware). At least to my understanding, the neural net policy also involves how the MCTS tree is expanded, but I'm not sure to what degree.
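To make the shape of that hybrid concrete, here is a minimal UCT sketch on a toy game (Nim, my own choice of example - this is not AlphaZero's actual algorithm, and in particular the policy-network guidance of tree expansion is omitted). The leaf `evaluate()` is a fixed-depth negamax standing in for either the neural-net call or the hypothetical "depth 10" search:

```python
import math

TAKE = (1, 2, 3)   # toy game: Nim, remove 1-3 stones, taking the last wins

def evaluate(stones, depth=10):
    """Stand-in for the 'depth 10 search' evaluation: negamax value for
    the side to move (+1 win, -1 loss). On this tiny game the depth
    cutoff never actually triggers, so the values are exact."""
    if stones == 0:
        return -1                    # previous player took the last stone
    if depth == 0:
        return 0                     # heuristic cutoff (unused here)
    return max(-evaluate(stones - t, depth - 1) for t in TAKE if t <= stones)

class Node:
    def __init__(self, stones, parent=None):
        self.stones, self.parent = stones, parent
        self.children = {}           # move -> Node
        self.visits, self.total = 0, 0.0

def ucb(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    # Exploitation term is negated: child.total is from the child's view.
    return (-child.total / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def search(stones, budget):
    root = Node(stones)
    for _ in range(budget):         # 'budget' = evaluations per move
        node = root
        # Selection: descend while non-terminal and fully expanded.
        while node.stones > 0 and len(node.children) == sum(t <= node.stones for t in TAKE):
            node = max(node.children.values(), key=lambda ch: ucb(ch, node))
        # Expansion: add one untried move, then evaluate the new leaf.
        if node.stones > 0:
            for t in TAKE:
                if t <= node.stones and t not in node.children:
                    node.children[t] = Node(node.stones - t, node)
                    node = node.children[t]
                    break
        value = evaluate(node.stones)
        # Backup, flipping the sign at every ply.
        while node is not None:
            node.visits += 1
            node.total += value
            value, node = -value, node.parent
    return max(root.children, key=lambda m: root.children[m].visits)

# The game-theoretic winning move from 10 stones is to take 2 (leave a
# multiple of 4), which the search should converge on.
print(search(10, 300))
```

The 50000-evaluations-per-second budget in the comparison above would simply set `budget` per move; the open question in the post is how much weaker each "evaluation" may be before the tree search can no longer compensate.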

Re: Alpha Zero

Postby H.G.Muller » Fri Dec 15, 2017 6:16 pm

I don't think that most of your criticism is very valid. Hash size is a pathetic excuse, as search time usually depends on hash size only as the inverse 12th root. So each doubling of the hash size might only give 5-8 Elo. Usually the search only starts to slow down when the overload of the table reaches a factor 10. And apparently AlphaZero randomizes enough to get good game diversity even against a deterministic Stockfish. There are many different games that AlphaZero won, and not a single game it lost. So I don't think there is any doubt that AlphaZero must have performed at the Stockfish level or better. Which was the only purpose; it is completely irrelevant whether it performed 90 Elo better or 100 Elo better.

And I also don't think there was a significant hardware advantage. The TPUs are just single chips, meaning they must have similar power usage and number of transistors, as state-of-the-art chips are typically as large as they can be made. The 64 cores used by Stockfish must also have been on more than a single chip. It is just that TPUs do many more multiplications per clock cycle, because they use virtually all their transistors in multiplier units. So the hardware is just different, but hardly more or less powerful. AlphaZero would of course run orders of magnitude slower when it would have had to use Stockfish' hardware, but that just as much holds the other way around; Stockfish would run orders of magnitude (if not infinitely) slower on a TPU. So that is not really an argument.
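The 5-8 Elo figure follows from the inverse-12th-root rule plus a conversion from speed to Elo; the ~70 Elo per speed doubling below is my assumed conversion, not stated in the post:

```python
import math

# Sanity check of the inverse-12th-root rule of thumb above. The ~70 Elo
# per effective speed doubling is an assumed conversion, not from the post.
speedup_per_hash_doubling = 2 ** (1 / 12)      # ~1.059, i.e. ~6% faster
elo_per_speed_doubling = 70                    # assumption
elo_per_hash_doubling = (elo_per_speed_doubling
                         * math.log2(speedup_per_hash_doubling))
print(round(speedup_per_hash_doubling, 3), round(elo_per_hash_doubling, 1))
```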
H.G.Muller
 
Posts: 178
Joined: Sun Jul 14, 2013 10:00 am

Re: Alpha Zero

Postby BB+ » Sun Dec 17, 2017 2:07 am

H.G.Muller wrote:... So each doubling of the hash size might only give 5-8 Elo ... I don't think there is any doubt that AlphaZero must have performed at the Stockfish level or better. Which was the only purpose; it is completely irrelevant whether it performed 90 Elo better or 100 Elo better.

One of my criticisms is just that 1GB shows cluelessness about computer chess. E.g., I find it almost impossible to think that 1GB was the maximum on said 64-thread setup, and the only pseudo-realistic reason I can give for such a low value is that they wanted the same numerology as with Elmo. Given that they took the effort to find a 64-thread (likely 32-core) machine, I'm sure Google could easily get such with 128GB (if not more) and use 64GB hash in SF8 if they cared. By your estimation, this is 30-45 Elo, about 1/3 to 1/2 of the observed differential. Add to that the choice of SF8 vs SF-devel, and it's not that clear whether AlphaZero outperformed SF. (From what I can tell, this AlphaZero project started just a few months ago, around the time of the AlphaGoZero publication, and given that I recognize at least one member of their team as being conversant with computer chess, I'd expect them to be aware of the devel version and its merits).

If they are going to claim that the primary result of their work is the blitzing of SF8 (which in fact they don't immediately highlight too much in the paper, though the media efforts are a different story), then they should be quite strict about the conditions, particularly for their adversary. This is proper from both a scientific and publicity standpoint. But as I think we both agree, the main scientific interest is simply that their techniques are even close to SF/Houdini/Komodo in the first place.

H.G.Muller wrote: And I also don't think there was a significant hardware advantage. The TPUs are just single chips, meaning they must have similar power usage and number of transistors, as state-of-the-art chips are typically as large as they can be made. The 64 cores [threads] used by Stockfish must also have been on more than a single chip. It is just that TPUs do many more multiplications per clock cycle, because they use virtually all their transistors in multiplier units. So the hardware is just different, but hardly more or less powerful. AlphaZero would of course run orders of magnitude slower when it would have had to use Stockfish' hardware, but that just as much holds the other way around; Stockfish would run orders of magnitude (if not infinitely) slower on a TPU. So that is not really an argument.


I would tend to agree that (within reasonable bounds) power consumption is the most relevant comparison metric (if one exists); in this regard, to quote a recent TPU paper from Google:
Jouppi et al. (ISCA, June 2017) wrote: https://drive.google.com/file/d/0Bx4haf ... xtcEk/view
(Abstract). Despite low utilization for some applications, the TPU is on average about 15X -30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU’s GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.
(Section 5). The TPU server has 17 to 34 times better total-performance/Watt than Haswell, which makes the TPU server 14 to 16 times the performance/Watt of the K80 server. The relative incremental-performance/Watt—which was our company’s justification for a custom ASIC—is 41 to 83 for the TPU, which lifts the TPU to 25 to 29 times the performance/Watt of the GPU.

From what I can tell, each TPU is 75W (at least in earlier data-sheets), which at 300W total I would guess is similar to that of the 64-thread setup, and (as indicated) distributes the power much more usefully in any event (again, I find these monolithic boxes rather klutzy power-consumption-wise compared to solutions that tap more parallelism, but alas, computer chess software has not gone much in the latter direction to date).
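Putting the equal-power comparison in numbers (the TPU wattage is from the early data-sheets; the CPU-box wattage and the 30x figure are my assumptions, the latter being the low end of the Jouppi et al. TOPS/Watt range quoted above):

```python
# Equal-power comparison sketch. TPU wattage is from early data sheets;
# the CPU-box wattage and the 30x TOPS/Watt ratio are assumptions (the
# latter is the low end of the quoted Jouppi et al. range).
tpu_watts = 4 * 75            # 4 TPUs at ~75 W each
cpu_watts = 300               # guessed draw of the 64-thread Stockfish box
tops_per_watt_ratio = 30      # low end of the quoted 30x-80x advantage

effective_ops_edge = (tpu_watts / cpu_watts) * tops_per_watt_ratio
print(effective_ops_edge)     # raw-operations edge at equal power draw
```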

Re: Alpha Zero

Postby H.G.Muller » Sun Dec 17, 2017 1:51 pm

Note that the 5-8 Elo per hash doubling only applies in the regime where the hash table is overloaded. Once the hash is large enough to hold the entire search tree, there is no advantage whatsoever in enlarging the hash size. Your own estimate was that the search would fill up the hash in 1/3 of the time. So there is only 1.5 doubling before the hash is large enough. This of course depends on some uncertainties in your estimate concerning the size that is actually needed (e.g. if QS nodes are really not stored). My experience with programs that do store QS nodes is that you only start to see adverse effects of shrinking the hash when the number of entries gets below 1/10 of the number of reported nodes. (But even this depends on how the number of nodes is counted, e.g. whether hash cutoffs count as nodes, and how much IID you do.)

So I guess the only way to make sure how much Stockfish would improve on enlarging the hash is to just measure it. Which should not be very difficult. But I don't really expect much effect of this, if any at all.

