Fail Soft best practices

I finally converted my engine from fail-hard to fail-soft after thinking about all the potential advantages of doing so, particularly with regard to TT efficiency. Because I don't currently have a framework to test my program against other engines over many games, I'm struggling to tell whether it's an improvement or not. In test positions it's sometimes faster, sometimes slower. In long mate hunts it does seem a little worse at finding the exact mate (often reporting longer or incorrect mates for more iterations than fail-hard before settling on the correct one). So I have some questions about my fail-soft implementation that I'm not clear about.
Null Move Pruning:
When nullValue >= beta, I return nullValue. I also added the condition that if nullValue is a mate score, I just return beta.
I've read that some people still return beta on a null-move fail high even in a fail-soft implementation.
What is the consensus here?
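For concreteness, here is roughly the shape of my null-move step (a sketch; okToTryNullMove(), R and MATE_BOUND stand in for my engine's actual conditions and constants):

Code:

if ( okToTryNullMove() ) {   // engine-specific conditions
    makeNullMove();
    int nullValue = -search( -beta, -beta + 1, depth - 1 - R );
    unmakeNullMove();
    if ( nullValue >= beta ) {
        // mate scores from a null search are unreliable, so clamp to beta
        if ( nullValue >= MATE_BOUND )
            return beta;
        return nullValue;    // fail soft: may exceed beta
    }
}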
Futility Pruning:
At the top of the node, bestScore is set to -INF. In the moves loop, when a move qualifies for futility pruning and staticEval + futilityMargin < alpha, I set bestScore to staticEval + futilityMargin if that exceeds the current bestScore. I leave bestMove alone, however; in this case bestMove is only updated when a move is actually searched and returns a value > bestScore.
Is this standard practice?
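In pseudo-code, my loop looks roughly like this (a sketch; isFutile() and the move handling stand in for my engine's actual code):

Code:

int bestScore = -INF;
Move bestMove = NO_MOVE;
for ( each move m ) {
    if ( isFutile( m ) && staticEval + futilityMargin < alpha ) {
        // skipped move: staticEval + futilityMargin bounds its score,
        // so it may raise bestScore, but it never becomes bestMove
        if ( staticEval + futilityMargin > bestScore )
            bestScore = staticEval + futilityMargin;
        continue;
    }
    makeMove( m );
    int score = -search( -beta, -alpha, depth - 1 );
    unmakeMove( m );
    if ( score > bestScore ) {
        bestScore = score;
        bestMove = m;                      // only searched moves set bestMove
        if ( score > alpha ) {
            alpha = score;
            if ( score >= beta ) break;    // fail high
        }
    }
}
return bestScore;                          // fail soft: may lie outside [alpha, beta]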
Mate Distance Pruning:
In my fail-hard program I did the following at the top of the AB search:
if ( alpha >= MATE - ply - 1 )
    return alpha;
if ( beta <= -MATE + ply )
    return beta;
This still gives me an improvement in fail-soft when hunting for mates. However, when I instead change the bounds using the form of MDP that I see everyone using:
alpha = max( -MATE + ply, alpha );
beta = min( MATE - ply - 1, beta );
if ( alpha >= beta )
    return alpha;
Now my PV is sometimes cut short. I see a mate score but don't see the last move that delivers mate in my PV.
And lastly: I have not implemented aspiration windows at the root yet (I use PVS, by the way). Is it possible I will see bigger gains once I do, since the fail-soft score should give me a better sense of how much to widen the window when a move fails at the root?
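Something like this generic root loop is what I have in mind (a sketch; the initial delta and the widening policy are illustrative, not from any particular engine):

Code:

int delta = 25;                             // initial half-window, illustrative
int alpha = prevScore - delta;
int beta  = prevScore + delta;
while ( true ) {
    int score = search( root, alpha, beta, depth );
    if ( score <= alpha )                   // fail low at the root
        alpha = max( score - delta, -INF ); // re-center on the fail-soft score
    else if ( score >= beta )               // fail high at the root
        beta = min( score + delta, INF );
    else
        break;                              // exact score inside the window
    delta += delta;                         // widen after each failure
}

With fail-soft the re-centering can use the returned score instead of the old bound, which is exactly the extra information a fail-hard root would not have.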
H.G. Muller
Re: Fail Soft best practices
First priority should be to get a good setup for testing against other engines; fail-hard vs fail-soft is totally unimportant compared to that...
Null-move pruning is usually only attempted when staticEval >= beta. In that case it is better to return beta when the null move scores above it. If you returned a score S > beta and it went into the TT, and you later hit the same position with a new beta satisfying S >= beta > staticEval, the TT would suggest a cutoff, while in a real search the null move would not even have been tried (because staticEval < beta). So the TT would not reproduce the result of a real search, which is bad for search stability.
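In code the idea is simply this (a sketch; nullAllowed and R are placeholders):

Code:

// fail hard on the null move even in a fail-soft engine: the stored
// score then never exceeds a beta for which the null move was tried
if ( staticEval >= beta && nullAllowed ) {
    makeNullMove();
    int nullValue = -search( -beta, -beta + 1, depth - 1 - R );
    unmakeNullMove();
    if ( nullValue >= beta )
        return beta;        // not nullValue
}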
You should not expect any miracles from the difference between fail soft and fail hard. Alpha-beta search is very 'lazy', and hardly ever returns a score much larger than what was needed (i.e. alpha or beta), even in fail soft.
Re: Fail Soft best practices
Thanks Harm. I agree, if you are hashing the null-move fail high it would create an inconsistency when beta changes and you do a TT cutoff based on a null move that would not have been attempted. I guess one question that comes to mind is: could you return the nullValue but not store the fail-high in the TT? I never saw much of a speedup from hashing null-move fail highs anyway. But for now I think I will hash them again and return beta regardless.
I found some other issues with my fail-soft, which I'm pretty sure are correct now, having thought about it some more. When failing low with futility pruning of non-capture moves in the main search, our assumption is that futilityMargin (a function of remaining depth) is the maximum positional gain for the move. Therefore, the value for this move which will be compared to bestValue (which is -INF at the top of the node) is staticEval + futilityMargin. So when the node fails low we in fact have a true upper bound to return and store in the hash table.
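At the end of the node that bound then gets stored as usual; a sketch (ttStore, the bound flags and alphaOrig, meaning the alpha at node entry, are my placeholder names):

Code:

// bestValue already absorbed staticEval + futilityMargin for every
// pruned move, so on a fail low it is a genuine upper bound
if ( bestValue <= alphaOrig )
    ttStore( hashKey, depth, bestValue, BOUND_UPPER, NO_MOVE );
else if ( bestValue >= beta )
    ttStore( hashKey, depth, bestValue, BOUND_LOWER, bestMove );
else
    ttStore( hashKey, depth, bestValue, BOUND_EXACT, bestMove );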
And here is what I forgot to do in the qSearch. When pruning captures, if standPat + valueOfCapturedPiece + futilityMargin < alpha and that beats the current bestValue, then we can improve bestValue. Also, when standPat + futilityMargin < alpha and SEE(move) <= 0, then at best the value for this move is standPat + futilityMargin, which we now compare to bestValue. I wasn't updating bestValue based on the upper bounds of these pruning techniques.
Another bug I had in the main search was that I updated bestMove whenever a move improved bestValue. I changed it to do so only when a move beats alpha.
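In code, the corrected update looks roughly like this (sketch):

Code:

if ( score > bestValue ) {
    bestValue = score;          // fail-soft bound always tracks the best score
    if ( score > alpha ) {
        alpha = score;
        bestMove = move;        // but bestMove requires beating alpha
        if ( score >= beta )
            break;              // fail high
    }
}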
I do need to create a proper testing framework. It's becoming unwieldy to decide whether a new feature is a gain or not. I don't know where to begin. I assume I have to find a GUI that will allow me to play my program against other engines. Do you have any recommendations for someone using a Mac?
H.G. Muller
Re: Fail Soft best practices
kickstone wrote:
Thanks Harm. I agree, if you are hashing the null-move fail high it would create an inconsistency when beta changes and you do a TT cutoff based on a null move that would not have been attempted. I guess one question that comes to mind is: could you return the nullValue but not store the fail-high in the TT?

You might get the inconsistency in the underlying node, then, in case the null-move score propagates to there. Moves refuted by the null move might get a lower upper bound (below alpha) than when you would actually search them with a higher alpha, and there may not be any better moves.
kickstone wrote:
I found some other issues with my fail-soft, which I'm pretty sure are correct now, having thought about it some more. When failing low with futility pruning of non-capture moves in the main search, our assumption is that futilityMargin (a function of remaining depth) is the maximum positional gain for the move. Therefore, the value for this move which will be compared to bestValue (which is -INF at the top of the node) is staticEval + futilityMargin. So when the node fails low we in fact have a true upper bound to return and store in the hash table.

Indeed, futility pruning is just putting an upper bound on the move's score based on an estimate.
kickstone wrote:
Also, when standPat + futilityMargin < alpha and SEE(move) <= 0, then at best the value for this move is standPat + futilityMargin, which we now compare to bestValue.

This would give you a less tight upper bound on the score than just using bestScore = standpat, and I think the latter would be justified. As in a real search you will require staticEval >= beta, without adding any expected positional gain from your best unsearched move.
kickstone wrote:
I do need to create a proper testing framework. It's becoming unwieldy to decide whether a new feature is a gain or not. I don't know where to begin. I assume I have to find a GUI that will allow me to play my program against other engines. Do you have any recommendations for someone using a Mac?

Well, I never had any Mac, but if I had one I would probably use XBoard (which is available as an OSX app).
Re: Fail Soft best practices
hgm wrote:
This would give you a less tight upper bound on the score than just using bestScore = standpat, and I think the latter would be justified. As in a real search you will require staticEval >= beta, without adding any expected positional gain from your best unsearched move.

I'm a little confused about this. This is what I'm doing in qSearch:
Code:
bestValue = -INF;
if ( !inCheck ) {
    standPat = check eval hash table OR call eval;
    if ( standPat > alpha ) {
        if ( standPat >= beta ) return standPat;
        alpha = standPat;
    }
    bestValue = standPat;
}

// moves loop:
if ( prunableCapture conditions ) {
    // capture prune 1:
    if ( standPat + futilityMargin + capturedPieceValue < alpha ) {
        if ( standPat + futilityMargin + capturedPieceValue > bestValue )
            bestValue = standPat + futilityMargin + capturedPieceValue;
        continue;
    }
    // capture prune 2:
    if ( standPat + futilityMargin < alpha AND SEE(move) <= 0 ) {
        if ( standPat + futilityMargin > bestValue )
            bestValue = standPat + futilityMargin;
        continue;
    }
}
Since I don't prune captures when in check, bestValue equals standPat by the time I examine the two ways of pruning captures. If I understand you, you think that in the case of capture prune 2 I should NOT be comparing standPat + futilityMargin to bestValue before I skip the move? If the move has at most a SEE of zero, then the upper bound on this move's value is standPat + futilityMargin. Do you not agree?
By the way, I do have one other way of pruning captures in qSearch. After 6 plies of qSearch I only allow recaptures, which I believe many people do. How do most people implement this? Six plies of regular qSearch and then the seventh ply must be a recapture? Or five plies of regular qSearch and then the sixth? Or does it really not matter, with one just slightly more aggressive than the other?
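For what it's worth, this is the shape of what I mean (a sketch; QS_RECAPTURE_PLY and prevMove are my placeholder names, and I read "recapture" as a capture on the square of the opponent's last capture):

Code:

// beyond some quiescence depth, only allow captures that retake on the
// square where the previous capture landed
if ( qsPly >= QS_RECAPTURE_PLY && move.to != prevMove.to )
    continue;   // not a recapture, skip it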