Stockfish Testing Queue

Finished - 56600 tests

15-01-11 lbr pruneqspv diff
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 43385 W: 7298 L: 7214 D: 28873
sprt @ 60+0.05 th 1 prune pv nodes in qsearch only (stefan's test does it in search only)
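The (-2.94,2.94) pair printed on each SPRT line is consistent with Wald's stopping bounds for error rates alpha = beta = 0.05. The log itself does not state those error rates, so treat them as an assumption; the function names below are illustrative only. A minimal sketch:

```python
import math

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's SPRT stopping bounds for the log-likelihood ratio (LLR).

    alpha and beta are the assumed type I / type II error rates;
    0.05 for both is an assumption that matches the printed (-2.94, 2.94).
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper

def sprt_verdict(llr, alpha=0.05, beta=0.05):
    """Classify an LLR value against the stopping bounds."""
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "pass"      # accept H1: the patch meets the upper Elo hypothesis
    if llr <= lower:
        return "fail"      # accept H0
    return "continue"      # keep playing games

lower, upper = sprt_bounds()
print(round(lower, 2), round(upper, 2))  # -2.94 2.94
```

The printed LLR values are rounded to two decimals, so an entry shown as exactly 2.94 or -2.94 cannot be classified reliably from the log alone.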
15-01-12 Fis TTPendingSave diff
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 7522 W: 1558 L: 1430 D: 4534
sprt @ 15+0.05 th 1 Mark which TT entries are going to be saved to later and use this information in the replacement policy. 2MB STC
15-01-12 Fis TTPendingSave diff
LLR: -2.96 (-2.94,2.94) [0.00,6.00]
Total: 3689 W: 558 L: 643 D: 2488
sprt @ 60+0.05 th 1 Mark which TT entries are going to be saved to later and use this information in the replacement policy. 8MB LTC
15-01-12 lbr qsttprune diff
LLR: -3.22 (-2.94,2.94) [-0.50,4.50]
Total: 26317 W: 5250 L: 5293 D: 15774
sprt @ 15+0.05 th 1 TT prune PV nodes in qsearch. Occasionally truncating the QS part of the PV, so test this one for an elo gain
15-01-12 n_p KingSafety diff
ELO: 2.38 +-3.1 (95%) LOS: 93.6%
Total: 20000 W: 4138 L: 4001 D: 11861
20000 @ 15+0.05 th 1 Check the values obtained from the SPSA-tuning.
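The ELO, error margin, and LOS figures on fixed-length entries appear to be reproducible from the Total line using the standard logistic Elo formula and a normal approximation. That the page computes them exactly this way is an assumption; `match_stats` is a hypothetical helper name. A minimal sketch:

```python
import math

def elo_from_score(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_stats(wins, losses, draws):
    """Return (elo, 95% half-width in Elo, likelihood of superiority)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games

    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + losses * score ** 2
           + draws * (0.5 - score) ** 2) / games
    stderr = math.sqrt(var / games)

    # Map the 95% score interval to Elo and take half its width.
    z95 = 1.959964
    hi = elo_from_score(score + z95 * stderr)
    lo = elo_from_score(score - z95 * stderr)
    half_width = (hi - lo) / 2.0

    # LOS: normal approximation that ignores draws.
    los = 0.5 * (1.0 + math.erf((wins - losses)
                                / math.sqrt(2.0 * (wins + losses))))
    return score_to_tuple(elo_from_score(score), half_width, los)

def score_to_tuple(elo, half_width, los):
    """Bundle the three statistics; kept separate for readability."""
    return elo, half_width, los

elo, margin, los = match_stats(4138, 4001, 11861)
print(f"ELO: {elo:.2f} +-{margin:.1f} (95%) LOS: {los * 100:.1f}%")
```

For the entry above (W: 4138 L: 4001 D: 11861) this gives about 2.38 Elo, a margin of about 3.1, and an LOS of about 93.6%, in line with the logged values.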
15-01-12 sg pruning diff
ELO: 1.51 +-2.6 (95%) LOS: 86.9%
Total: 20000 W: 3061 L: 2974 D: 13965
20000 @ 60+0.05 th 3 LTC: SMP-Measure (as proposed by Joona) allow move pruning on pv nodes. Futility pruning in step 7 is added (pointed out by Lucas). smp test proposed by Joerg
15-01-13 n_p KingSafety diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 156996 W: 32113 L: 31689 D: 93194
sprt @ 15+0.05 th 1
15-01-13 jos less_lmr^ diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 9004 W: 1712 L: 1794 D: 5498
sprt @ 15+0.15 th 1 Less lmr in endgames, take 2. Again with increased tc increment, to put a little more weight on the endgame.
15-01-13 jos less_lmr diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 19129 W: 3726 L: 3780 D: 11623
sprt @ 15+0.15 th 1 Take 3.
15-01-14 Fis TTutilization diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 27314 W: 5509 L: 5540 D: 16265
sprt @ 15+0.05 th 1 Manage handing out empty TT entries in a more uniform way exploiting the fact we already know they will be saved to later. Requires proper initialization. This time also refresh after replacement policy. 2MB STC
15-01-14 Roc FasterContactChecks diff
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 26602 W: 5459 L: 5278 D: 15865
sprt @ 15+0.05 th 1 More accurate formula for contact checks. If it fails, will need to adjust the weights. See Git notes.
15-01-14 sg razor_pv diff
ELO: -1.01 +-2.5 (95%) LOS: 21.2%
Total: 30000 W: 5902 L: 5989 D: 18109
30000 @ 15+0.05 th 1 Now that we allow futility pruning at PV nodes, I want to measure whether other pruning or reduction methods are useful at PV nodes too. Test: allow razoring at PV nodes.
15-01-14 sg probcut_pv diff
ELO: 0.98 +-2.5 (95%) LOS: 78.1%
Total: 30000 W: 6032 L: 5947 D: 18021
30000 @ 15+0.05 th 1 Measure effect of allowing probcut on pv nodes
15-01-15 Roc FasterContactChecks diff
LLR: -2.95 (-2.94,2.94) [0.00,6.00]
Total: 18206 W: 2976 L: 2992 D: 12238
sprt @ 60+0.05 th 1 Standard Test at 60+0.05. See git notes and comments.
15-01-15 Roc FasterContactChecks diff
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 27590 W: 5611 L: 5428 D: 16551
sprt @ 15+0.05 th 1 Take # 2. A bit more accurate, and faster. See git notes
15-01-15 BRA FasterContactChecks diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 8691 W: 1538 L: 1620 D: 5533
sprt @ 15+0.05 th 3 Take # 2. A bit more accurate, and faster. See git notes 3 THREADS
15-01-15 lbr step7root diff
ELO: 0.93 +-2.5 (95%) LOS: 76.9%
Total: 30000 W: 5960 L: 5880 D: 18160
30000 @ 15+0.05 th 1 Prune RootNode in Step 7 (child node futility pruning)
15-01-15 lbr nullpv diff
ELO: -0.47 +-2.5 (95%) LOS: 35.3%
Total: 30000 W: 5866 L: 5907 D: 18227
30000 @ 15+0.05 th 1 Null move pruning at PV nodes (including root)
15-01-15 lbr iid diff
LLR: -2.95 (-2.94,2.94) [-0.50,4.50]
Total: 37886 W: 7523 L: 7512 D: 22851
sprt @ 15+0.05 th 1 start IID one depth lower
15-01-15 sg probcut_pv2 diff
ELO: 0.91 +-2.5 (95%) LOS: 76.5%
Total: 30000 W: 6015 L: 5936 D: 18049
30000 @ 15+0.05 th 1 allow probcut at pv nodes (except root node)
15-01-15 sg probcut_pv diff
ELO: -1.51 +-2.3 (95%) LOS: 9.8%
Total: 29311 W: 4775 L: 4902 D: 19634
30000 @ 60+0.05 th 1 LTC: Measure effect of allowing probcut on pv nodes
15-01-15 Roc FasterContactChecks diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 2552 W: 466 L: 566 D: 1520
sprt @ 15+0.05 th 1 Last experiment before trying a tuning round.
15-01-15 lbr master diff
ELO: -4.61 +-2.3 (95%) LOS: 0.0%
Total: 40000 W: 9087 L: 9618 D: 21295
40000 @ 6+0.02 th 1 verify that pruning PV nodes is a regression at very short tc, as expected
15-01-16 lbr master diff
ELO: 2.08 +-2.1 (95%) LOS: 97.6%
Total: 40000 W: 7416 L: 7177 D: 25407
40000 @ 30+0.05 th 1 verify that pruning PV nodes has no impact in 30+0.05, as expected
15-01-16 lbr nullpv diff
ELO: 0.57 +-2.3 (95%) LOS: 68.9%
Total: 30000 W: 4988 L: 4939 D: 20073
30000 @ 60+0.05 th 1 Null move pruning at PV nodes (including root). Measure at LTC as PV nodes pruning performs best at LTC. Low prio
15-01-16 sg null_pv diff
ELO: -0.87 +-2.3 (95%) LOS: 22.9%
Total: 34620 W: 6825 L: 6912 D: 20883
30000 @ 15+0.05 th 1 Allow null move pruning on PV nodes, but always do a verification search there
15-01-16 lbr razor_pv diff
ELO: -0.82 +-2.3 (95%) LOS: 23.9%
Total: 30000 W: 4966 L: 5037 D: 19997
30000 @ 60+0.05 th 1 LTC respin for Stefan, because PV node pruning is TC sensitive, and fishtest has nothing to do. Now that we allow futility pruning at PV nodes, I want to measure whether other pruning or reduction methods are useful at PV nodes too. Test: allow razoring at PV nodes.
15-01-16 lbr step7root diff
ELO: -0.56 +-2.3 (95%) LOS: 31.6%
Total: 30000 W: 4975 L: 5023 D: 20002
30000 @ 60+0.05 th 1 LTC for Prune RootNode in Step 7 (child node futility pruning)
15-01-16 SC tempo_endgames diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 6422 W: 1219 L: 1308 D: 3895
sprt @ 15+0.05 th 1 Disable post null-move trick in the search when 3 or fewer pieces are on the board, since tempo is accounted for in specialized endgame evaluation. This could make some difference for detecting stalemates or zugzwang in KXK endgames.
15-01-17 lbr pv diff
LLR: -2.94 (-2.94,2.94) [-3.50,0.50]
Total: 23917 W: 4621 L: 4846 D: 14450
sprt @ 15+0.05 th 1 none of the PV patches has a measurable effect individually, so unite them.
15-01-17 lbr razortune diff
37362/40000 iterations
67780/85000 games played
85000 @ 15+0.05 th 1 retune razor margins w/o the hack
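Entries that report "iterations" and "games played" are SPSA tuning runs. As a rough illustration of the general technique only, here is generic SPSA on a toy quadratic loss. It is not fishtest's implementation, which presumably scores the perturbed parameter sets by playing games rather than evaluating a function. The gain constants `a` and `c`, the toy loss, and the function name are assumptions; the decay exponents 0.602 and 0.101 are commonly cited defaults.

```python
import random

def spsa_minimize(loss, theta, iterations=2000, a=0.1, c=0.1, seed=0):
    """Minimize `loss` with simultaneous perturbation stochastic approximation.

    Each iteration perturbs all parameters at once by +/- c_k and uses
    only two loss evaluations to estimate the whole gradient.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, iterations + 1):
        a_k = a / k ** 0.602   # step-size schedule
        c_k = c / k ** 0.101   # perturbation-size schedule
        # Random +/-1 direction for every parameter.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # Gradient estimate per parameter: diff / (2 * c_k * delta_i).
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta

# Toy stand-in for "two margins to tune"; the optimum is at (3, -2).
toy_loss = lambda p: (p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2
print(spsa_minimize(toy_loss, [0.0, 0.0]))
```

In an engine-tuning setting the two loss evaluations would be replaced by a noisy match result between the two perturbed parameter sets, which is why such runs need tens of thousands of games.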
15-01-17 SC tempo_endgames diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 45948 W: 9174 L: 9154 D: 27620
sprt @ 15+0.05 th 1 Disable post null-move trick in the search when 3 or fewer pieces are on the board, since tempo is accounted for in specialized endgame evaluation. This could make some difference for detecting stalemates or zugzwang in KXK endgames. Now with the correct inequality. It did not change the bench because no KXK positions are reached in bench.
15-01-17 sg spsa_backward_rank diff
36473/40000 iterations
71778/80000 games played
80000 @ 15+0.05 th 1 Tune new rank based penalty for backward pawns.
15-01-17 lbr razortune diff
LLR: -2.95 (-2.94,2.94) [-3.50,0.50]
Total: 17246 W: 3322 L: 3529 D: 10395
sprt @ 15+0.05 th 1 test razoring in its original form, with tuned margins
15-01-17 n_p TuneKingSafety diff
ELO: 0.57 +-2.1 (95%) LOS: 69.9%
Total: 40000 W: 8028 L: 7962 D: 24010
40000 @ 15+0.05 th 1 Tuning attempt on king safety. Using the result of the three SPSA-sessions and the ELO-gain to make a linear extrapolation. Medium amount change.
15-01-17 SC tempo_endgames diff
LLR: -4.03 (-2.94,2.94) [-1.50,4.50]
Total: 22623 W: 3665 L: 3751 D: 15207
sprt @ 60+0.05 th 1 Disable post null-move trick in the search when 3 or fewer pieces are on the board, since tempo is accounted for in specialized endgame evaluation. This could make some difference for detecting stalemates or zugzwang in KXK endgames. LTC after an almost-passed STC.
15-01-17 n_p MedHighKingSafety diff
ELO: 2.41 +-2.2 (95%) LOS: 98.5%
Total: 40000 W: 8345 L: 8068 D: 23587
40000 @ 15+0.05 th 1 Tuning attempt on king safety. Using the result of the three SPSA-sessions and the ELO-gain to make a linear extrapolation. MedHigh amount of change.
15-01-17 n_p HighKingSafety diff
ELO: 2.88 +-2.2 (95%) LOS: 99.5%
Total: 40000 W: 8312 L: 7980 D: 23708
40000 @ 15+0.05 th 1 Tuning attempt on king safety. Using the result of the three SPSA-sessions and the ELO-gain to make a linear extrapolation. High amount of change. Let's see if the positive results will hold.
15-01-18 sg backward_rank diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 8527 W: 1651 L: 1734 D: 5142
sprt @ 15+0.05 th 1 Test tuned values for new rank based backward pawn penalty.
15-01-18 lbr razortune diff
44230/40000 iterations
85000/85000 games played
85000 @ 15+0.05 th 1 tune razor margins *with* the "hack". indeed the "hack" is an elo gain by itself, as shown by previous unsuccessful tuning.
15-01-18 lbr pv diff
ELO: -1.20 +-2.5 (95%) LOS: 17.0%
Total: 30000 W: 5894 L: 5998 D: 18108
30000 @ 15+0.05 th 1 measure pruning at PV nodes. fixed the razoring bug at PV nodes.
15-01-18 lbr pv diff
ELO: -1.63 +-2.2 (95%) LOS: 7.7%
Total: 30000 W: 4848 L: 4989 D: 20163
30000 @ 60+0.05 th 1 measure pruning at PV nodes. fixed the razoring bug at PV nodes.
15-01-18 n_p ExtremeKingSafety diff
ELO: 1.50 +-2.2 (95%) LOS: 91.2%
Total: 40000 W: 8257 L: 8084 D: 23659
40000 @ 15+0.05 th 1 Tuning attempt on king safety. Using the result of the three SPSA-sessions and the ELO-gain to make a linear extrapolation. Extreme amount of change. Let's see if the positive results will hold and I get to test the CrazyKingSafety.
15-01-18 lbr razortune diff
LLR: -3.88 (-2.94,2.94) [-0.50,4.50]
Total: 22354 W: 4420 L: 4506 D: 13428
sprt @ 15+0.05 th 1 test tuned values
15-01-18 n_p HighKingSafety diff
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 11529 W: 2075 L: 1882 D: 7572
sprt @ 60+0.05 th 1 LTC: Tuning attempt on king safety. Using the result of the three SPSA-sessions and the ELO-gain to make a linear extrapolation. High amount of change. These values seem to be best at STC.
15-01-18 jki master diff
ELO: 51.71 +-1.9 (95%) LOS: 100.0%
Total: 40000 W: 9633 L: 3723 D: 26644
40000 @ 60+0.05 th 1 SF6_RC1 vs. SF5
15-01-18 jki master diff
ELO: 48.10 +-1.9 (95%) LOS: 100.0%
Total: 40000 W: 8971 L: 3468 D: 27561
40000 @ 60+0.05 th 3 SF6_RC1 vs. SF5, 3 threads
15-01-19 sg backward_rank2 diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 6505 W: 1230 L: 1319 D: 3956
sprt @ 15+0.05 th 1 After the failed SPSA tuning, try a simple linear rank-based penalty for backward pawns
15-01-19 lbr safety diff
LLR: -2.95 (-2.94,2.94) [-0.50,4.50]
Total: 28016 W: 5588 L: 5613 D: 16815
sprt @ 15+0.05 th 1 try a couple of "logic" tweaks on top of Niklas' patch
15-01-19 n_p SPSAKingSafety diff
45803/50000 iterations
91644/100000 games played
100000 @ 60+0.05 th 1 Another SPSA session on king safety with start values from HighKingSafety. This time LTC.