Stockfish Testing Queue

Finished - 29418 tests

14-11-05 fwi optimise_bestMove_chang diff
19615/20000 iterations
40000/40000 games played
40000 @ 15+0.05 th 1 Looking at the graphs, I don't think it has stabilised yet. Restarting at the best values of the previous spsa. spsa optimise bestmove
14-11-06 lbr ttmove diff
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 21498 W: 4327 L: 4246 D: 12925
sprt @ 15+0.05 th 1 prune ttmove. verify that it doesn't regress with strong hash pressure.
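The LLR lines in these entries can be approximately reproduced from the W/L/D totals. The sketch below is in Python and rests on assumptions the page does not state: that the bracketed pair is a pair of Elo hypotheses in BayesElo units, that the draw parameter is estimated from the sample itself, and that the (-2.94,2.94) interval corresponds to alpha = beta = 0.05. All function names are hypothetical.

```python
import math


def sprt_bounds(alpha=0.05, beta=0.05):
    """Lower and upper stopping bounds of a sequential probability ratio test."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def bayeselo_to_probs(elo, drawelo):
    """Win, loss and draw probabilities under the BayesElo model (assumed here)."""
    p_win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss


def sprt_llr(wins, losses, draws, elo0, elo1):
    """Log-likelihood ratio of hypothesis elo1 against hypothesis elo0."""
    n = wins + losses + draws
    p_win, p_loss = wins / n, losses / n
    # Estimate the draw parameter from the observed win and loss rates.
    drawelo = 200.0 * math.log10((1.0 - p_loss) / p_loss * (1.0 - p_win) / p_win)
    p0 = bayeselo_to_probs(elo0, drawelo)
    p1 = bayeselo_to_probs(elo1, drawelo)
    return (wins * math.log(p1[0] / p0[0])
            + losses * math.log(p1[1] / p0[1])
            + draws * math.log(p1[2] / p0[2]))


if __name__ == "__main__":
    print(sprt_bounds())
    print(sprt_llr(4327, 4246, 12925, -4.0, 0.0))
```

With the totals of the entry above, sprt_llr(4327, 4246, 12925, -4.0, 0.0) should come out close to the reported 2.95, and sprt_bounds() gives roughly (-2.94, 2.94).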
14-11-04 lbr RRQ diff
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 51053 W: 8670 L: 8353 D: 34030
sprt @ 60+0.05 th 1 RRQ: take 1
14-11-04 lbr ttmove diff
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 82149 W: 14044 L: 14023 D: 54082
sprt @ 60+0.05 th 1 prune ttmove
14-11-04 fwi optimise_bestMove_chang diff
19734/20000 iterations
40000/40000 games played
40000 @ 15+0.05 th 1 spsa optimise bestmove
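The SPSA entries report iterations and games played; the counts above are roughly consistent with two games per iteration. As an illustration of what a single iteration does, here is a generic SPSA update in Python. It is a textbook sketch, not the framework's code: the loss callback, the gains a_k and c_k, and all function names are assumptions. In engine tuning the loss difference would presumably come from games between the two perturbed parameter sets.

```python
import random


def random_signs(n, rng):
    """Draw a random +1/-1 perturbation direction for n parameters."""
    return [rng.choice((-1, 1)) for _ in range(n)]


def spsa_step(theta, delta, c_k, a_k, loss):
    """One SPSA iteration that minimises loss (hypothetical callback).

    theta: current parameter values
    delta: +1/-1 perturbation direction, one entry per parameter
    c_k:   perturbation size for this iteration
    a_k:   step size for this iteration
    """
    plus = [t + c_k * d for t, d in zip(theta, delta)]
    minus = [t - c_k * d for t, d in zip(theta, delta)]
    # Two evaluations give a gradient estimate for every parameter at once.
    diff = loss(plus) - loss(minus)
    return [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [1.0, -2.0]
    for k in range(1, 201):
        delta = random_signs(len(theta), rng)
        theta = spsa_step(theta, delta, 0.1, 0.1,
                          lambda v: sum(x * x for x in v))
    print(theta)
```

Each iteration needs only two evaluations regardless of the number of parameters, which is why the method suits noisy, expensive measurements such as game results.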
14-11-04 lbr RRQ diff
LLR: 3.06 (-2.94,2.94) [0.00,4.00]
Total: 38481 W: 6228 L: 5952 D: 26301
sprt @ 15+0.05 th 1 RRQ: take 1
14-11-04 gli no_pv_pruning diff
ELO: 0.12 +-2.9 (95%) LOS: 53.3%
Total: 20000 W: 3654 L: 3647 D: 12699
20000 @ 15+0.05 th 3 (3 threads + 8mb hash) Measure impact of not TT pruning in PV nodes
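The ELO and LOS figures of fixed-length tests like the one above can be recomputed from the W/L/D totals. The Python sketch below assumes the usual logistic Elo formula, a normal approximation for the 95% margin, and LOS computed as the normal CDF of (W - L) / sqrt(W + L). The page does not state which formulas it uses, and the function names are hypothetical.

```python
import math


def elo_from_score(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_summary(wins, losses, draws):
    """Return (elo, 95% margin, likelihood of superiority) from game totals."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    z = 1.959964  # two-sided 95% quantile of the normal distribution
    margin = (elo_from_score(score + z * stderr)
              - elo_from_score(score - z * stderr)) / 2.0
    # Normal CDF evaluated at (W - L) / sqrt(W + L).
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo_from_score(score), margin, los


if __name__ == "__main__":
    print(elo_summary(3654, 3647, 12699))
```

For the entry above, elo_summary(3654, 3647, 12699) gives roughly 0.12 Elo with a margin near 2.9 and a LOS near 53.3%, in line with the reported line.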
14-11-01 jki master diff
ELO: 31.00 +-1.9 (95%) LOS: 100.0%
Total: 40000 W: 8044 L: 4484 D: 27472
40000 @ 60+0.05 th 1 Regression Test (after Retire PawnsFileSpan)
14-11-04 lbr ttmove diff
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 49264 W: 10076 L: 10007 D: 29181
sprt @ 15+0.05 th 1 prune ttmove
14-11-04 fwi timemanagement_depthbas diff
LLR: -2.97 (-2.94,2.94) [0.00,6.00]
Total: 5262 W: 822 L: 900 D: 3540
sprt @ 60+0.05 th 1 using spsa tuned values. Simplification. Simply use ~ half of standard time allotment, when previous depth has been reached and bestMoveStability is high.
14-11-04 fwi timemanagement_depthbas diff
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 10950 W: 2281 L: 2143 D: 6526
sprt @ 15+0.05 th 1 using spsa tuned values. Simplification. Simply use ~ half of standard time allotment, when previous depth has been reached and bestMoveStability is high.
14-11-04 gli no_pv_pruning diff
ELO: -4.71 +-2.9 (95%) LOS: 0.1%
Total: 20000 W: 3528 L: 3799 D: 12673
20000 @ 15+0.05 th 3 (3 threads) Measure impact of not TT pruning in PV nodes
14-11-04 uri changeLMR diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 16646 W: 3270 L: 3331 D: 10045
sprt @ 15+0.05 th 1 Test increasing LMR. I am also thinking of testing changes like this with a fixed number of games and a very small hash, to see whether it is better to prune more or less with a very small hash (but first a normal test).
14-11-03 Fis NoTTBias diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 25840 W: 5201 L: 5236 D: 15403
sprt @ 15+0.05 th 1 Remove order based replacement bias in TT. See pull request #83 (closed for now).
14-11-02 mco min_split_depth diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 13140 W: 2323 L: 2394 D: 8423
sprt @ 15+0.05 th 15 Decrease min split depth with high number of threads (set at priority 1 to allocate the few capable machines, should not slow down other tests)
14-11-01 jki check_time diff
LLR: 2.96 (-2.94,2.94) [-6.00,0.00]
Total: 63644 W: 10623 L: 10828 D: 42193
sprt @ 60+0.05 th 1 Regression test: check_time
14-11-02 fwi timemanagement_depthbas diff
19733/20000 iterations
40000/40000 games played
40000 @ 15+0.05 th 1 Simplification. Simply use ~ half of standard time allotment, when previous depth has been reached and bestMoveStability is high.
14-11-03 lbr all_threats_tuned_B diff
LLR: 0.29 (-2.94,2.94) [0.00,6.00]
Total: 30503 W: 5356 L: 5195 D: 19952
sprt @ 60+0.05 th 1 respin, as suggested by marco, to verify it's not a lucky run due to overtesting. use final spsa tuned values instead of intermediate ones : LTC
14-11-03 Mys no_sec diff
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 18156 W: 3665 L: 3861 D: 10630
sprt @ 15+0.05 th 1 Remove second push low pri.
14-11-02 sni pawns3 diff
LLR: -1.87 (-2.94,2.94) [0.00,4.00]
Total: 102102 W: 20812 L: 20534 D: 60756
sprt @ 15+0.05 th 1 SPRT test with tuned values for doubled or isolated a and b pawns
14-11-02 gli accurate_pv diff
ELO: 1.42 +-3.1 (95%) LOS: 81.9%
Total: 20000 W: 4078 L: 3996 D: 11926
20000 @ 15+0.05 th 1 Measure ELO of accurate PV version 4 (allow TT-refined values to be used in PV q-search)
14-11-02 gli no_pv_pruning diff
ELO: 3.64 +-23.9 (95%) LOS: 61.7%
Total: 286 W: 52 L: 49 D: 185
20000 @ 15+0.05 th 3 (3 threads + 8mb hash) Measure impact of not TT pruning in PV nodes
14-11-02 lbr maxply diff
ELO: -5.35 +-6.6 (95%) LOS: 5.7%
Total: 10000 W: 4658 L: 4812 D: 530
10000 @ 9+0.03 th 3 Crash test for maxply, as suggested by Joona.
14-11-02 gli accurate_pv diff
ELO: -7.35 +-5.0 (95%) LOS: 0.2%
Total: 7327 W: 1403 L: 1558 D: 4366
20000 @ 15+0.05 th 1 Measure ELO of accurate PV version 3 (fix undefined behavior and use correct bench)
14-11-02 gli accurate_pv diff
ELO: -18.19 +-10.5 (95%) LOS: 0.0%
Total: 1606 W: 265 L: 349 D: 992
20000 @ 15+0.05 th 3 (3 threads) Measure ELO of accurate PV version 3 (fix undefined behavior and use correct bench)
14-11-02 Roc RookOnPawnsV2 diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 8076 W: 1635 L: 1720 D: 4721
sprt @ 15+0.05 th 1 Distinguishing three cases for RookOnPawn, based on SPSA values. In the last test I mixed up my parameter values (RookFromBehind is now greater than RookFromFront, as one would expect...) and retuned again, including rookopenfile
14-11-02 aji all_threats_tuned_B diff
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 13563 W: 2402 L: 2232 D: 8929
sprt @ 60+0.05 th 1 use final spsa tuned values instead of intermediate ones : LTC
14-11-02 lbr pawns diff
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 27489 W: 5522 L: 5735 D: 16232
sprt @ 15+0.05 th 1 test tuned values
14-11-01 fwi timemanagement_depthbas diff
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 45342 W: 9216 L: 8983 D: 27143
sprt @ 15+0.05 th 1 Simplification. Simply use ~ half of standard time allotment, when previous depth has been reached and bestMoveStability is high.
14-11-01 lbr pawns diff
28088/25000 iterations
58303/61000 games played
61000 @ 12+0.04 th 1 SPSA tuning of Doubled+Isolated+Backward, without file dependency.
14-11-02 aji all_threats_tuned_B diff
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 6074 W: 1284 L: 1160 D: 3630
sprt @ 15+0.05 th 1 Use final spsa tuned values instead of intermediate ones : STC
14-11-01 sni pawns2 diff
19172/20000 iterations
39161/40000 games played
40000 @ 15+0.05 th 1 Tuning doubled b and g pawns penalties around the values that passed STC
14-11-01 mco min_split_depth diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 1395 W: 218 L: 320 D: 857
sprt @ 15+0.05 th 15 Increase min split depth with high number of threads (set at priority 1 to allocate the few capable machines, should not slow down other tests)
14-10-31 sni pawns2 diff
LLR: -0.86 (-2.94,2.94) [0.00,4.00]
Total: 38749 W: 6650 L: 6569 D: 25530
sprt @ 60+0.05 th 1 LTC: Doubled pawns on b and g files may be somewhat less handicapping, see Larry Kaufman’s article on http://home.comcast.net/~danheisman/Articles/doubled_pawns.htm
14-11-01 gli no_pv_pruning diff
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 12119 W: 2126 L: 1991 D: 8002
sprt @ 60+0.05 th 1 LTC Simplification: No TT pruning in PV nodes
14-11-01 Mys opposed diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 11392 W: 2338 L: 2414 D: 6640
sprt @ 15+0.05 th 1 Maybe it's better to have opposed pawns on the opponent's half of the board
14-10-31 gli no_pv_pruning diff
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 121082 W: 24243 L: 24306 D: 72533
sprt @ 15+0.05 th 1 Simplification: No TT pruning in PV nodes
14-10-31 lbr no_pawn_span^ diff
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 60034 W: 10359 L: 10303 D: 39372
sprt @ 60+0.05 th 1 LTC: Retire PawnSpan: take 2 (stopped by bogus worker leszek?)
14-10-31 pro even_cluster_use diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 12686 W: 2491 L: 2605 D: 7590
sprt @ 15+0.05 th 1 Make all entries in a TT cluster equally likely to get replaced: strong hash pressure, this time. SPRT, to stop early if the patch has no clear benefit (LOS 23%)
14-11-01 Roc RookOnPawnsV2 diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 5670 W: 1122 L: 1214 D: 3334
sprt @ 15+0.05 th 1 Distinguishing three cases for RookOnPawn, based on SPSA values
14-10-31 sni little_combinaisons9 diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 20045 W: 3992 L: 4044 D: 12009
sprt @ 15+0.05 th 1 Little combinations on under-protected pieces. Take 3: same idea, but now targeting pieces too.
14-10-31 sni little_combinaisons9 diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 6630 W: 1329 L: 1418 D: 3883
sprt @ 15+0.05 th 1 Little combinations on under-protected pawns. Take 2, penalty of one tenth of a pawn.
14-10-31 fwi easyMove@lowCost2 diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 2631 W: 480 L: 580 D: 1571
sprt @ 15+0.05 th 1 Use SPSA optimised values.
14-10-29 fwi easyMove@lowCost2 diff
24576/10000 iterations
50000/50000 games played
50000 @ 15+0.05 th 1 Simplified. Check multipv at low depths and continue with multipv if the evaluation difference to the second best move is large enough. Tune what large enough means. With UCI options for spsa. Converted UCI Option values to double.
14-10-31 gli no_pv_pruning diff
ELO: -0.03 +-3.0 (95%) LOS: 49.1%
Total: 20000 W: 4002 L: 4004 D: 11994
20000 @ 15+0.05 th 1 Measure impact of not TT pruning in PV nodes
14-10-30 pro even_cluster_use diff
ELO: -1.11 +-3.0 (95%) LOS: 23.6%
Total: 20000 W: 3940 L: 4004 D: 12056
20000 @ 15+0.05 th 1 Make all entries in a TT cluster equally likely to get replaced
14-10-30 mco no_pawn_span^ diff
LLR: -0.63 (-2.94,2.94) [-3.00,1.00]
Total: 2493 W: 397 L: 434 D: 1662
sprt @ 60+0.05 th 1 LTC: Retire PawnSpan: take 2
14-10-30 sni pawns2 diff
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 37182 W: 7578 L: 7368 D: 22236
sprt @ 15+0.05 th 1 Doubled pawns on b and g files may be somewhat less handicapping, see Larry Kaufman’s article on http://home.comcast.net/~danheisman/Articles/doubled_pawns.htm
14-10-30 Fis TTtuneresult diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 2740 W: 504 L: 604 D: 1632
sprt @ 15+0.05 th 1 New tuned TT replace
14-10-29 kam spsa_passed_pawns diff
19509/30000 iterations
39454/60000 games played
60000 @ 15+0.05 th 1 Tune constants in formula for passed pawns bonus. c_end = 10000, r_end = 0.0001 for both parameters. Low priority.