Stockfish Testing Queue

Finished - 1408 tests

18-11-03 31m mob_tworooks diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 7352 W: 1540 L: 1635 D: 4177
sprt @ 10+0.1 th 1 On the forum, @snicolet proposed that when we have two rooks, the mobility bonus for each should be based on the minimum mobility of the two. I worry that this is too drastic an effect, so let's try using the harmonic mean f(x, y) = 2xy/(x+y), which is pulled toward the lower value, but less strongly than the minimum. Include a special case for x = y = 0 to prevent division by zero.
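A minimal sketch of the harmonic-mean idea described above, assuming mobility counts are non-negative integers. The name `harmonic_mean` is hypothetical and is not taken from the tested branch; integer division truncates, which may differ from what the patch actually did.

```cpp
// Hypothetical helper (not from the tested branch): integer harmonic mean
// of two non-negative mobility counts, f(x, y) = 2xy / (x + y).
constexpr int harmonic_mean(int x, int y) {
    // For non-negative inputs, x + y == 0 only when x == y == 0;
    // return 0 in that case to avoid dividing by zero.
    return (x + y == 0) ? 0 : (2 * x * y) / (x + y);
}
```

For unequal inputs the result lies between the minimum and the arithmetic mean, for example mobilities 2 and 6 give 3, while the minimum is 2 and the arithmetic mean is 4.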
18-11-03 31m stochastic_queen3 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 16011 W: 3394 L: 3447 D: 9170
sprt @ 10+0.1 th 1 If double noise is better, try doubling it again.
18-11-03 31m stochastic_queen3 diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 48882 W: 10547 L: 10441 D: 27894
sprt @ 10+0.1 th 1 Try double effect/noise on the non-pureStaticEval, non-TT version.
18-11-03 31m stochastic_bishop diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 24822 W: 5397 L: 5407 D: 14018
sprt @ 10+0.1 th 1 Try this new approach (i.e., not affecting TT) with @snicolet's original idea, stochastic bishops.
18-11-03 31m stochastic_BQ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 16785 W: 3585 L: 3634 D: 9566
sprt @ 10+0.1 th 1 Double noise; both the bishop and queen cases.
18-11-03 31m stochastic_BQ diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 8317 W: 1765 L: 1856 D: 4696
sprt @ 10+0.1 th 1 Add noise in both the bishop and queen cases. Do not change pureStaticEval or TT.
18-11-03 31m stochastic_queen3 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 33867 W: 7255 L: 7222 D: 19390
sprt @ 10+0.1 th 1 Bugfix to most recent test.
18-11-03 31m stochastic_queen3 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 13987 W: 2977 L: 3040 D: 7970
sprt @ 10+0.1 th 1 The version that has played 38K games and counting also had this bug: & 8 rather than & 7, which completely changes the noise (either 0 or 8, rather than any number from 0 to 7). I am sorry for not catching this earlier and frankly surprised it is performing this well anyway. Try patching it and hope for a green.
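The effect of the `& 8` versus `& 7` mix-up can be checked with a short sketch. It assumes the noise is derived by masking some integer key; the helper `masked_values` and the 8-bit input range are illustrative assumptions, not code from the patch.

```cpp
#include <set>

// Hypothetical helper: collect every distinct value produced by
// applying `mask` to all 8-bit inputs.
std::set<int> masked_values(unsigned mask) {
    std::set<int> out;
    for (unsigned key = 0; key < 256; ++key)
        out.insert(static_cast<int>(key & mask));
    return out;
}
```

With a mask of 8 only bit 3 survives, so the set is {0, 8}; with a mask of 7 the low three bits survive, giving every value from 0 to 7.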
18-11-02 31m stochastic_queen3 diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 38959 W: 8329 L: 8271 D: 22359
sprt @ 10+0.1 th 1 Don't change pureStaticEval; thus the TT is not affected. Also, apply the added noise to the result of a TT hit as well. Try just the ss->ply < 15 * ONE_PLY version for now.
18-11-03 31m stochastic_queen3 diff
LLR: -1.69 (-2.94,2.94) [0.00,5.00]
Total: 6128 W: 1273 L: 1318 D: 3537
sprt @ 10+0.1 th 1 I seem to remember hearing that TT is more important at LTC. (Please correct me if I'm wrong.) Perhaps the source of poor scaling was the noisy eval being saved to TT; the search.cpp versions that don't do this seem to perform better. Remove the ss->ply restriction and try this by itself.
18-11-03 31m stochastic_queen3 diff
LLR: 0.20 (-2.94,2.94) [0.00,5.00]
Total: 996 W: 234 L: 220 D: 542
sprt @ 10+0.1 th 1 I seem to remember hearing that TT is more important at LTC. (Please correct me if I'm wrong.) Perhaps the source of poor scaling was the noisy eval being saved to TT; the search.cpp versions that don't do this seem to perform better. Remove the ss->ply restriction and try this by itself.
18-11-02 31m stochastic_queen2 diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 18599 W: 3971 L: 4012 D: 10616
sprt @ 10+0.1 th 1 Double the noise.
18-11-02 31m stochastic_queen3 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 11595 W: 2488 L: 2562 D: 6545
sprt @ 10+0.1 th 1 Add noise to pureStaticEval too. This differs from stochastic_queen2 by also changing the eval resulting from a TT hit.
18-11-02 31m stochastic_queen2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 27787 W: 5941 L: 5937 D: 15909
sprt @ 10+0.1 th 1 ss->ply < 15.
18-11-02 31m stochastic_queen2^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 13218 W: 2811 L: 2878 D: 7529
sprt @ 10+0.1 th 1 ss->ply < 10.
18-11-02 31m stochastic_queen2^^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10648 W: 2220 L: 2299 D: 6129
sprt @ 10+0.1 th 1 I think this is closer to what I intended. (Can anyone confirm?) Add the endgame noise when we are less than 5 plies into the search, not the last 5 plies of a search.
18-11-02 31m stochastic_queen2^^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 27470 W: 5813 L: 5811 D: 15846
sprt @ 10+0.1 th 1 I have never made original changes to search.cpp, so there are probably errors: please check and let me know. Since the stochastic endgame value has given several STC greens that failed to scale, it likely needs to be applied only at STC depths (about depth 16). Try adding it only if depth < 5.
18-11-02 31m stochastic_queen2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 22563 W: 4796 L: 4818 D: 12949
sprt @ 10+0.1 th 1 Depth < 15. (STC usually completes around depth 16, according to @vondele's data.)
18-11-02 31m stochastic_queen2^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 13158 W: 2793 L: 2860 D: 7505
sprt @ 10+0.1 th 1 Depth < 10.
18-11-02 31m stochastic_all diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 33481 W: 7099 L: 7069 D: 19313
sprt @ 10+0.1 th 1 This same noise component, with the condition that count<Pt>() == 1, passed STC when applied to both bishops (@snicolet, April) and queens (me, today). Can it be applied more generally? Apply to knights, bishops, rooks, and queens. Note: Us == WHITE is needed because otherwise the same noise is added to both sides' evaluations and cancels out.
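The cancellation mentioned in the note can be illustrated with a small sketch. It assumes the final score is formed as White's term minus Black's term; both function names are hypothetical and the real evaluation code is more involved.

```cpp
// Assumption: the evaluation combines per-side terms as white - black.

// Noise added to both sides' terms: it cancels in the difference.
constexpr int eval_noise_both_sides(int white, int black, int noise) {
    return (white + noise) - (black + noise);
}

// Noise added only for one side (the Us == WHITE guard): it survives.
constexpr int eval_noise_white_only(int white, int black, int noise) {
    return (white + noise) - black;
}
```

With white = 100, black = 90 and noise = 5, the first form returns 10, the same as with no noise at all, while the second returns 15.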
18-11-02 31m stochastic_all diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 20172 W: 4316 L: 4349 D: 11507
sprt @ 10+0.1 th 1 The previous test on this branch adds endgame noise once for each qualifying piece type, so it may add noise several times. This version restricts that: any piece type may trigger the added noise (if count<Pt>() == 1), but noise is added at most once to the score for a given position.
18-11-02 31m stochastic_queen diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 16521 W: 2633 L: 2691 D: 11197
sprt @ 60+0.6 th 1 LTC. Endgame only.
18-11-01 31m stochastic_queen^^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 38208 W: 8232 L: 8177 D: 21799
sprt @ 10+0.1 th 1 I wonder if the ideas from @snicolet's stochastic bishop patch from April could help evaluate materially imbalanced positions where only one side has a queen, which were discussed on the forum recently. Add a small degree of noise to both middlegame and endgame evaluation. For a discussion of why this strange-looking idea could work, see @snicolet's comments here: https://github.com/snicolet/Stockfish/compare/04a228f...b0a8a4e
18-11-01 31m stochastic_queen^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 27451 W: 5841 L: 5839 D: 15771
sprt @ 10+0.1 th 1 Middlegame only.
18-11-01 31m stochastic_queen diff
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 13252 W: 2912 L: 2718 D: 7622
sprt @ 10+0.1 th 1 Endgame only.
18-10-27 31m EscapeBishop^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 33995 W: 7400 L: 7365 D: 19230
sprt @ 10+0.1 th 1 Merge new master and return to this idea. Try not counting enemy-pawn-attacked or friendly-occupied squares as "escapes."
18-10-27 31m EscapeBishop diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 26096 W: 5592 L: 5596 D: 14908
sprt @ 10+0.1 th 1 Try considering our "home ranks" to be the back three ranks, rather than the back two.
18-10-27 31m EscapeBishop diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 21959 W: 4780 L: 4804 D: 12375
sprt @ 10+0.1 th 1 In my October 20 tests, it appeared that larger penalties were better. Try further increasing from S(20, 0) to S(30, 0).
18-10-21 31m combo_qv4_rpsqt_ver diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 61268 W: 13310 L: 13234 D: 34724
sprt @ 10+0.1 th 1 We have recently had two very promising [0, 4] attempts: Q_value4 by @SFisGOD (LTC 153K and counting, but currently with LLR < -2) and rookpsqt5 by @Kurtbusch (LTC 114K yellow). They might be enough for a green combo by themselves, but I would also like to include @DU-jdto's verification (73K yellow) from June 21, which has also performed well in my recent combo attempts. Hopefully all three are enough to pass.
18-10-20 31m EscapeBishop diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 64411 W: 13952 L: 13770 D: 36689
sprt @ 10+0.1 th 1 So far, penalty is better than bonus and bigger is better. Try increasing to S(15, 0).
18-10-20 31m EscapeBishop^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 17194 W: 2701 L: 2757 D: 11736
sprt @ 60+0.6 th 1 LTC for S(20, 0).
18-10-20 31m EscapeBishop diff
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 7653 W: 1723 L: 1555 D: 4375
sprt @ 10+0.1 th 1 Further increase to S(20, 0).
18-10-20 31m EscapeBishop diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 34541 W: 7487 L: 7450 D: 19604
sprt @ 10+0.1 th 1 S(10, 0) penalty.
18-10-20 31m EscapeBishop^^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10400 W: 2176 L: 2256 D: 5968
sprt @ 10+0.1 th 1 S(10, 0) bonus.
18-10-20 31m EscapeBishop^^^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 3782 W: 734 L: 846 D: 2202
sprt @ 10+0.1 th 1 Inspired by a post by Mindbreaker on the forum's suggestions thread. I would like to try the same for rooks and queens, but to avoid too many simultaneous tests will start with just bishops. If our bishop is advanced (rank 4 or greater) and could be attacked by a pawn, S(5, 0) bonus if it can retreat to our rank 1 or 2.
18-10-20 31m EscapeBishop^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 3836 W: 771 L: 883 D: 2182
sprt @ 10+0.1 th 1 S(5, 0) penalty if the bishop cannot escape.
18-10-19 31m WeakQueen_undefended diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 21650 W: 4597 L: 4623 D: 12430
sprt @ 10+0.1 th 1 WeakQueen currently considers blocked threats from enemy bishops and rooks. If our queen lacks any defenders, also consider blocked enemy queen attacks.
18-10-18 31m SliderRank3 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 16960 W: 3596 L: 3645 D: 9719
sprt @ 10+0.1 th 1 My first attempt based on Bryan's analysis of Michael Chaly's posted SF losses and @Kurtbusch's tests. This test differs in three ways: (1) I think it is important to include queens, not just rooks--SF's missed moves span both about equally. (2) It may be important that, as in those games, we have a queen but the opponent does not. Require this condition. (3) Don't require that the piece actually be on rank 3--give bonus for mobility on rank 3 regardless, e.g., if our queen attacks three rank 3 squares remotely, give small bonus anyway.
18-10-18 31m psqt_RQ diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 14083 W: 2989 L: 3098 D: 7996
sprt @ 10+0.1 th 1 I don't think I've modified PSQT before, so I apologize for any errors. Use the +4 to Rook rank 3 that produced @Kurtbusch's 114K STC yellow, but apply the same increase also to Queen.
18-10-18 31m SliderRank3 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10411 W: 2199 L: 2279 D: 5933
sprt @ 10+0.1 th 1 Broader scope: do not require that we have a queen, only that the opponent does not.
18-10-18 31m overload_extraQ5 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 31381 W: 6731 L: 6710 D: 17940
sprt @ 10+0.1 th 1 Use bool(), but double the effect.
18-10-18 31m overload_extraQ5^^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 29107 W: 6291 L: 6280 D: 16536
sprt @ 10+0.1 th 1 Queen overload still appears to be a slight gain, even excluding WeakQueen cases. Try a few quick variations on the better-performing (no WQ exclusion) tests. Use more_than_one() rather than bool().
18-10-18 31m overload_extraQ5^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 17273 W: 3688 L: 3735 D: 9850
sprt @ 10+0.1 th 1 Use popcount() rather than bool().
18-10-18 31m overload_extraQ4 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 39595 W: 8589 L: 8527 D: 22479
sprt @ 10+0.1 th 1 Maybe the problem with queen-specific overload is the obvious overlap with WeakQueen (queen-pinned pieces are likely also overload targets). Exclude this overlap.
18-10-18 31m overload_extraQ4 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 45765 W: 9831 L: 9740 D: 26194
sprt @ 10+0.1 th 1 In light of recent attempts to change queen values and penalties, most successful or nearly successful, retry a simple implementation of queen-specific Overload.
18-10-15 31m NvsR1 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 40538 W: 6540 L: 6505 D: 27493
sprt @ 60+0.6 th 1 LTC for @Vizvezdenec. Take 1.
18-10-14 31m combo_ver_wq_asp diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 26461 W: 5703 L: 5763 D: 14995
sprt @ 10+0.1 th 1 I am surprised at how poorly the tweak to delta performed--try the opposite change as a sanity check.
18-10-14 31m combo_ver_wq_asp diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 12389 W: 2564 L: 2679 D: 7146
sprt @ 10+0.1 th 1 Of my recent combos that have not passed LTC, the best STC so far is combo_ver_wq (100K yellow). Attempt to also revive this small tweak by @candirufish, which despite its small size produced a 71K LTC yellow on May 27. Perhaps this will be enough to convert this combo into a passing patch.
18-10-13 31m combo_cp_ver_wq diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 26157 W: 5662 L: 5723 D: 14772
sprt @ 10+0.1 th 1 It appears that the top and uh tweaks no longer perform well. Try combining cp, wq, and ver, since each individual pair had a positive score (if I recall correctly).
18-10-13 31m combo_uh_ver diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 30404 W: 6494 L: 6539 D: 17371
sprt @ 10+0.1 th 1 Revive sg's update_history (June 3, 73K LTC yellow). Combo with @DU-jdto's 73K yellow mentioned previously.