Alpha Zero


Alpha Zero

Postby BB+ » Wed Dec 06, 2017 8:29 pm

The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.
Posts: 1483
Joined: Thu Jun 10, 2010 4:26 am

Re: Alpha Zero

Postby BB+ » Wed Dec 06, 2017 9:00 pm

It's been a while since I've posted anything of content on computer chess, but the Alpha Zero work has prodded me to change that.

Although I am quite impressed by the result, and moreover that they achieved it via tabula rasa (as they call it), they have (after all) written a scientific paper on it, and thus I feel obliged to offer scientific criticism on the matter.

1. The strength of Stockfish 8

The paper says that SF8 ran on "64 threads and a hash size of 1GB", searching something like 70 million positions per second (on average?). No hardware is given, but I admit to doing the same in computational papers that I've written, at least when only an idea of the time taken is needed (not so true here, where the exactitude seems more important).

Anyway, I have to seriously criticize this small hash for Stockfish. They said that the "tournament time control" (?!) of 1 min/move was used (seemingly with UCI's "movetime", though that's again unclear). I haven't perused the Sardines-ness of Stockfish lately, but even at 8 bytes per hash entry this would be only 128 million positions. At the above speed, even with a low 10% saving rate (again I assume SF is not saving in q-search), the hash would be full in 1/3 of the time of one search. Moreover given that SF8 uses (I think) a lazy approach to parallelism using hash-sharing, this seems (quite) inadequate to my mind. I guess one could do experiments to determine how much this RAM paucity hurts SF, and I'd guess it is less than the observed Elo differential to AlphaZero, but it still looks quite dubious from someone in the know about computer chess. In any event, this is small potatoes compared to the next item.
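To make that arithmetic explicit, a back-of-envelope sketch in Python (the entry size and store rate are my own guesses, not figures from the paper):

```python
# Back-of-envelope hash saturation estimate for the reported Stockfish
# setup. Entry size and store rate are guesses, not paper figures.

HASH_BYTES = 1 * 1024**3      # 1 GB hash, as stated in the paper
ENTRY_BYTES = 8               # deliberately low-ball bytes per entry
NODES_PER_SEC = 70_000_000    # ~70M positions/sec, as stated
STORE_RATE = 0.10             # assume only 10% of visited nodes get stored
MOVE_TIME = 60.0              # the 1 min/move control

entries = HASH_BYTES // ENTRY_BYTES
fill_seconds = entries / (NODES_PER_SEC * STORE_RATE)
print(f"{entries:,} entries; full after ~{fill_seconds:.0f}s "
      f"of each {MOVE_TIME:.0f}s search")
```

Even with these charitable numbers the table saturates in roughly a third of each move's thinking time, which is the point.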

2. Hardware comparisons

I think this has already been brought up by others elsewhere, but there is no real idea about how strong 4 TPUs are compared to the Stockfish setup. Perhaps this is a good reason that "software championships" are bogus anyway... :)

Google's TPUs are not the best documented except by hype-ish word-of-mouth, but I've seen numbers from 50-100 trillion operations per second, compared to some plausible upper bound of around 1 trillion for the 64-thread SF setup (and in reality, particularly with hash issues, I'd guess the effective number is about 100 billion at most). Again I'm guessing as much as anyone else, and TFLOPS might not be the most relevant measure, but this does tend to indicate (at least) a 100x hardware edge for AlphaZero, possibly more like 1000x. The paper, in my mind rather cheekily, never really attempts to discuss this hardware comparison issue, which I'd think is of critical importance (of course, they are more concerned with the learning aspects, but even handwaving this would be better than ignoring it completely). There are diminishing returns from speed-doubling at some point, but even still I'd guess 100x to be at least 200 Elo here, much more than the observed 100 Elo differential.
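A crude conversion of such a hardware ratio into Elo, assuming a fixed gain per speed doubling (the constant is a folklore figure, and certainly shrinks with diminishing returns at these time controls):

```python
import math

# Crude speed-to-Elo conversion: assume a fixed Elo gain per doubling
# of effective speed (the 50 Elo/doubling constant is folklore, not
# measured data, and overstates the gain at long time controls).

def elo_from_speedup(speedup, elo_per_doubling=50.0):
    return elo_per_doubling * math.log2(speedup)

for ratio in (10, 100, 1000):
    print(f"{ratio:5d}x hardware edge -> ~{elo_from_speedup(ratio):.0f} Elo")
```

Even at a conservative 30 Elo per doubling, 100x still comes out around 200 Elo.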

If anything, the lack of cluster-sized parallelism in chess (beyond ZCT, Jonny, and a few others) has always annoyed me, as I've never seen the TCEC-esque huger-and-huger machine model to be that relevant. Indeed, in my own Losing Chess research I had to deal with 128 cores across 8 machines (typically), and came up with a (rather simplistic) approach based upon the goals of that project. I don't foresee any of the top engines becoming (usefully) cluster-capable anytime soon, which is what would seem to be needed to balance the hardware comparison. Maybe we'll just have to admit the single cpu/RAM model is dead. (Yea!).

My guess (from "single machine" being noted) is that the "4 TPUs" are on one chip/module (cf. the CPU/core distinction), but even that is not clear.

Perhaps I could state this criticism succinctly: if you are going to talk about the learning aspects of AlphaZero development, fine - but I don't think that "validating" this by a murky comparison to Stockfish should necessarily be the overall focus of this (one could argue the true domain of the work is at the software/hardware interface, and that you are really showing that TPUs and/or massive parallelism are the future, as opposed to the monolithic model - rather different than a presentation of this as a "mere" software advance). Of course, the 28-0 score will draw the headlines, but to me, it is (at least as described) rather scientifically obscure as to its import. (I've seen such "benchmarketing" in my professional work in computational mathematics, and this is by no means the worst...)

3. The lack of opening book/variety

This again has been noted by others, though I can't say that it's as big of a deal as point 1 or (particularly) point 2. Unfortunately computer chess literature has IMO been historically too happy to accept 100 game tests (which here is fortunately significant) in unclear conditions. Again this seems to me to be something that someone in the know about computer chess would find anomalous, but I don't think it changes the end result too much. The fact that AlphaZero beat Stockfish from each human opening, which it itself had "found" and self-trained on, does raise some flags about bias - from a scientific point of view, they should use an independent collection of openings. They seem(?) to have done this in another 100 game sample (from what I can tell, there are 1300 games, 100 independent bookless games, and 100 each from 12 popular positions from which AlphaZero self-trained) - at the very least, this should be clarified as to what took place, and in fact the paper is IMO rather sloppy in describing its exact conditions. (I do like that the French Defense got murdered though!).
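For what it's worth, the logistic Elo model applied to the headline 100-game score (+28 =72 -0 for AlphaZero), ignoring any opening-selection bias:

```python
import math

# Logistic Elo estimate from the headline 100-game result
# (+28 =72 -0 for AlphaZero), ignoring opening-selection bias
# and draw-model subtleties.

def elo_diff(wins, draws, losses):
    score = (wins + 0.5 * draws) / (wins + draws + losses)
    return -400.0 * math.log10(1.0 / score - 1.0)

print(f"~{elo_diff(28, 72, 0):.0f} Elo")
```

So the headline match is indeed statistically meaningful on its own terms; the question is what conditions it was measured under.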

I'm not going to say that creating a separate FishTest endpoint from each of the 12 chosen opening positions would make such Stockfishes much better here, but that would seem to be the more trenchant comparison if it were feasible.

I've read that a "full paper" is in the works, and maybe it will address these issues to a workable degree - yet egad, this is another of my pet gripes: research and "advertising about research" being conflated...

Some more quibbly comments:

In the abstract, the phrase "achieved within 24 hours" is rather unanchored to the computational power used (in fact, the text says it took just 4 hours to outperform SF, which is much more than merely "superhuman" to my mind).

Although I'm no expert on the (expected) rate of convergence, I'd expect a logarithmic x-axis would be more useful in Figure 1.

Re: Alpha Zero

Postby BB+ » Fri Dec 08, 2017 10:49 am

Perhaps a useful comparison would be to a MCTS scheme that similarly accumulates ca. 50000 "evaluations" per second, but with each "evaluation" not being from a neural net calculation, but rather the result from (say) a "depth 10" search (or whatever the appropriate depth is, to equalize the hardware). At least to my understanding, the neural net policy also involves how the MCTS tree is expanded, but I'm not sure to what degree.
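A minimal sketch of what I mean: plain UCT over a toy subtraction game, with the leaf evaluation supplied by a pluggable probe - here a fixed-depth negamax standing in for either a neural-net call or a "depth 10" search. None of this is AlphaZero's actual algorithm (which also uses the net's policy head to guide expansion); it is just the interface I'm proposing to compare:

```python
import math
import random

# Plain UCT on a toy game (subtract 1 or 2 from a counter; whoever
# takes the last counter wins), with the leaf evaluation supplied by
# a pluggable probe - here a fixed-depth negamax standing in for
# either a neural-net call or a fixed-depth alpha-beta search.

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def moves(n):
    return [m for m in (1, 2) if n >= m]

def negamax(n, depth):
    if n == 0:
        return 0.0            # side to move has lost
    if depth == 0:
        return 0.5            # horizon reached: unresolved
    return max(1.0 - negamax(n - m, depth - 1) for m in moves(n))

def evaluate(n):              # the "evaluation" MCTS accumulates
    return negamax(n, depth=4)

def uct_select(node, c=1.4):
    # child values are from the child mover's perspective,
    # so the parent exploits 1 - Q
    return max(node.children.values(),
               key=lambda ch: 1.0 - ch.value / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def search(root_state, n_sims=1000):
    root = Node(root_state)
    for _ in range(n_sims):
        node = root
        while node.children and len(node.children) == len(moves(node.state)):
            node = uct_select(node)                 # selection
        untried = [m for m in moves(node.state) if m not in node.children]
        if untried:                                 # expansion
            m = random.choice(untried)
            node.children[m] = Node(node.state - m, node)
            node = node.children[m]
        v = evaluate(node.state)                    # evaluation
        while node:                                 # backup, flipping sides
            node.visits += 1
            node.value += v
            v = 1.0 - v
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)

print(search(10))   # best move from 10 (aim for a multiple of 3)
```

Swapping `evaluate` between a net call and a shallow search, at matched evaluations per second, would isolate how much of the strength is the evaluator versus the search scheme.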

Re: Alpha Zero

Postby H.G.Muller » Fri Dec 15, 2017 6:16 pm

I don't think that most of your criticism is very valid. Hash size is a pathetic excuse, as search time usually depends on hash size only as the inverse 12th root. So each doubling of the hash size might only give 5-8 Elo. Usually the search only starts to slow down when the overload of the table reaches a factor 10. And apparently AlphaZero randomizes enough to get good game diversity even against a deterministic Stockfish. There are many different games that AlphaZero won, and not a single game it lost. So I don't think there is any doubt that AlphaZero must have performed at the Stockfish level or better. That was the only purpose; whether it performed 90 Elo better or 100 Elo better is completely irrelevant.
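In numbers, the rule of thumb works out like this (both constants are folklore figures, not measurements):

```python
import math

# Rule of thumb in numbers: effective search time scales as the 12th
# root of hash size, and each doubling of effective time is worth
# some tens of Elo (both constants are folklore, not measured data).

def elo_per_hash_doubling(elo_per_time_doubling=70.0):
    time_factor = 2.0 ** (1.0 / 12.0)   # one hash doubling ~ 6% more effective time
    return elo_per_time_doubling * math.log2(time_factor)

for e in (60.0, 70.0, 96.0):
    print(f"{e:.0f} Elo/time-doubling -> "
          f"{elo_per_hash_doubling(e):.1f} Elo/hash-doubling")
```

So even granting Stockfish several hash doublings, the gain stays well under the observed differential.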

And I also don't think there was a significant hardware advantage. The TPUs are just single chips, meaning they must have similar power usage and number of transistors, as state-of-the-art chips are typically as large as they can be made. The 64 cores used by Stockfish must also have been on more than a single chip. It is just that TPUs do many more multiplications per clock cycle, because they use virtually all their transistors in multiplier units. So the hardware is just different, but hardly more or less powerful. AlphaZero would of course run orders of magnitude slower if it had to use Stockfish's hardware, but that just as much holds the other way around; Stockfish would run orders of magnitude (if not infinitely) slower on a TPU. So that is not really an argument.
Posts: 177
Joined: Sun Jul 14, 2013 10:00 am
