LOS --> Draws are irrelevant

Posted: Fri Jan 24, 2014 8:59 pm
by User923005
According to this and many other sources:
http://talkchess.com/forum/viewtopic.ph ... 91&t=51003
draws are irrelevant for calculation of likelihood of superiority.

So, let's take it to the extreme.

We run one quadrillion games between engine A and engine B.
We get 1000000000000000 - 11 = 999999999999989 draws
We get 10 wins for engine A
We get 1 loss for engine A.

LOS says A is definitely much better.

Those engines are dead even. Ignoring the draws makes no sense to me.

Re: LOS --> Draws are irrelevant

Posted: Fri Jan 24, 2014 9:38 pm
by BB+
Ignoring draws is correct for the common model of LOS (though LOS itself assumes a prior distribution).

In your example, the point is that the variance is quite low. For instance, switching to Elo units (maybe not the best choice), engine A is adjudged only 3.82*10^(-12) Elo better, but the standard deviation is smaller still, by a factor of about 2.7. That is in line with the fair-coin expectation: the chance of a wins:losses split of 10:1 or more lopsided is 12/2^11, or about 1 in 170 [thus about 2.78 deviations].

LOS does not answer how much an engine might be better, only the yes/no as to whether it might be, and for this draws do not matter.
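As a rough illustration of the 12/2^11 figure and of why draws drop out, here is a minimal Python sketch. The function names are made up for this example, and the erf-based formula is an assumption: it is the normal-approximation LOS commonly quoted in computer-chess circles, not necessarily the exact model meant above.

```python
from math import comb, erf, sqrt


def binomial_tail(wins: int, losses: int) -> float:
    """Chance of at least `wins` heads in wins+losses fair-coin tosses."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def los_normal(wins: int, losses: int) -> float:
    """Normal-approximation LOS (assumed formula); no draw count appears."""
    return 0.5 * (1.0 + erf((wins - losses) / sqrt(2.0 * (wins + losses))))


tail = binomial_tail(10, 1)   # (C(11,10) + C(11,11)) / 2^11 = 12/2048
print(f"tail = {tail:.6f}, about 1 in {1 / tail:.1f}")
print(f"approximate LOS for 10 wins, 1 loss = {los_normal(10, 1):.4f}")
```

Neither function takes the number of draws as an input, which is the sense in which draws are irrelevant under this model.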

To make your example more computable [w/o rows of zeros obscuring it], take 999989 draws, 10 losses, 1 win (the mirror image of your result). Then the score fraction (win%) is 0.4999955, so the difference from 1/2 [the bias] is 0.0000045 in magnitude, while the standard deviation of that score is 0.00000166, or about 2.7 times smaller than the bias.
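The arithmetic for the million-game example can be checked with a short Python sketch. It assumes the usual scoring of 1, 0.5 and 0 for a win, draw and loss, and takes the deviation to be the standard error of the mean score; `score_stats` is a hypothetical helper name.

```python
from math import sqrt


def score_stats(wins: int, draws: int, losses: int):
    """Return (mean score, bias from 1/2, standard error of the mean)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    return score, score - 0.5, sqrt(var / n)


score, bias, stderr = score_stats(wins=1, draws=999989, losses=10)
print(f"score  = {score:.7f}")
print(f"bias   = {bias:.7f}")
print(f"stderr = {stderr:.8f}")
print(f"|bias| / stderr = {abs(bias) / stderr:.2f}")
```

The draws contribute almost nothing to the variance here because each draw sits within 0.0000045 of the mean, so the spread is set almost entirely by the 11 decisive games.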

Re: LOS --> Draws are irrelevant

Posted: Fri Jan 24, 2014 9:49 pm
by User923005
I suppose that my argument is that we simply cannot ignore draws completely.
At some point, it is obvious that if the draw count dominates, it adds uncertainty to the measure.
Eventually, if the count of wins + losses is extremely small compared to the draw count, then the wins + losses are very likely noise compared to the general trend.
So I think that the draws have to contribute to the standard deviation, at least. Otherwise, the calculation produces a result that I would not trust.