Stockfish Testing Queue

Finished - 49059 tests

15-02-14 jki pawnmob diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 6843 W: 1387 L: 1476 D: 3980
sprt @ 15+0.05 th 1 Safe pawn push tweak try
15-02-14 sni any_safe_push3 diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 7697 W: 1532 L: 1618 D: 4547
sprt @ 15+0.05 th 1 Increase the bonus for safe pawn pushes in endgame
15-02-14 mco SpaceThreshold diff
5171/50000 iterations
10509/100000 games played
100000 @ 15+0.05 th 1 Tune space evaluation threshold. I think it has never been properly tuned before.
15-02-14 jki smp diff
ELO: 50.50 +-8.1 (95%) LOS: 100.0%
Total: 2591 W: 667 L: 293 D: 1631
5000 @ 15+0.05 th 16 smp improvement attempt (16 threads)
15-02-14 jki pmob diff
LLR: -0.36 (-2.94,2.94) [-3.00,3.00]
Total: 13242 W: 2602 L: 2615 D: 8025
sprt @ 15+0.05 th 1 Remove piece checks for safe pawn pushes. sprt [-3, 3]
15-02-14 mco SpaceThreshold diff
21048/50000 iterations
43140/100000 games played
100000 @ 15+0.05 th 1 Tune space evaluation threshold. I think it has never been properly tuned before. Take 2 (wider changes)
15-02-14 jki pmob diff
LLR: -3.85 (-2.94,2.94) [-3.00,1.00]
Total: 33801 W: 6682 L: 6954 D: 20165
sprt @ 15+0.05 th 1 Remove piece checks for safe pawn pushes. No regression test.
15-02-15 Roc Battery diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 7526 W: 1473 L: 1559 D: 4494
sprt @ 15+0.05 th 1 Another attempt at the Q->R battery idea.
15-02-15 jki space diff
LLR: -3.92 (-2.94,2.94) [-1.50,4.50]
Total: 18337 W: 3718 L: 3809 D: 10810
sprt @ 15+0.05 th 1 space try inspired by Lyudmil
15-02-15 Roc MoreDblAttacks diff
LLR: -3.20 (-2.94,2.94) [0.00,4.00]
Total: 38342 W: 7572 L: 7605 D: 23165
sprt @ 15+0.05 th 1 Back to original S(16, 0). Verifying additional binds on c6, f6 and b7, g7, since SPSA tuning showed singularities on those squares.
15-02-15 jki smp diff
ELO: -1.27 +-2.1 (95%) LOS: 12.3%
Total: 40000 W: 7829 L: 7975 D: 24196
40000 @ 15+0.05 th 1 smp improvement attempt (check regression: 1 thread)
15-02-15 jki smp diff
ELO: 0.28 +-2.9 (95%) LOS: 57.4%
Total: 20000 W: 3650 L: 3634 D: 12716
20000 @ 15+0.05 th 2 smp improvement attempt (check regression: 2 threads)
15-02-15 jki smp diff
ELO: 0.19 +-2.8 (95%) LOS: 55.3%
Total: 20000 W: 3440 L: 3429 D: 13131
20000 @ 15+0.05 th 4 smp improvement attempt (check regression: 4 threads)
15-02-15 jki smp diff
ELO: 6.19 +-3.9 (95%) LOS: 99.9%
Total: 10325 W: 1824 L: 1640 D: 6861
10000 @ 15+0.05 th 8 smp improvement attempt (check regression: 8 threads)
15-02-15 vin en_passant_bonus diff
ELO: -0.55 +-2.4 (95%) LOS: 32.9%
Total: 31000 W: 6085 L: 6134 D: 18781
30000 @ 15+0.05 th 1 Now that the pawn push activity has subsided, measure Elo of re-tuned values at STC.
15-02-15 mco SpaceThreshold diff
48367/50000 iterations
85524/100000 games played
100000 @ 15+0.05 th 1 Tune space evaluation threshold. I think it has never been properly tuned before. Take 3 (even wider changes and smaller range)
15-02-15 mco king8 diff
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 55077 W: 10999 L: 10940 D: 33138
sprt @ 15+0.05 th 1 Simplify attackUnits formula
15-02-15 vin en_passant_bonus diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 62686 W: 10413 L: 10287 D: 41986
sprt @ 60+0.05 th 1 LTC test of retuned values, after STC test inconclusive. As suggested by Joona: "I suggest to make a final conclusive test at LTC. Because this test has already passed once, I think you could use less strict bounds, like [0, 5] to reduce the risk of "unlucky run"."
15-02-16 Roc Battery diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 11479 W: 2223 L: 2298 D: 6958
sprt @ 15+0.05 th 1 This time, give a bonus only for squares which are in the opponent half of the board.
15-02-16 sni connected_pawns2 diff
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 52393 W: 10912 L: 10656 D: 30825
sprt @ 15+0.05 th 1 Try to create mobile phalanxes
15-02-16 sni connected_pawns2 diff
LLR: -0.31 (-2.94,2.94) [-1.50,4.50]
Total: 38196 W: 7859 L: 7763 D: 22574
sprt @ 15+0.05 th 1 Try to create mobile phalanxes. Take 2: with increased pawn mobility value
15-02-16 mco king8 diff
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 14751 W: 2530 L: 2400 D: 9821
sprt @ 60+0.05 th 1 LTC: Simplify attackUnits formula
15-02-16 mco SpaceThreshold diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 32155 W: 6384 L: 6426 D: 19345
sprt @ 15+0.05 th 1 SpaceThreshold tuning verification
15-02-16 Roc Battery diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 3062 W: 552 L: 650 D: 1860
sprt @ 15+0.05 th 1 Consider only vertical batteries.
15-02-16 Roc Battery diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 8183 W: 1610 L: 1695 D: 4878
sprt @ 15+0.05 th 1 Consider only horizontal batteries in the opponent half of the board.
15-02-16 vin wedges diff
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 13056 W: 2681 L: 2538 D: 7837
sprt @ 15+0.05 th 1 Try bonus for an advanced cramping pawn on d5/e5/c6/d6/e6/f6 that cuts opponent's lines.
15-02-16 jos space_threshold diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 77390 W: 15402 L: 15279 D: 46709
sprt @ 15+0.05 th 1 SPSA tuning try by Marco failed, now let's try CLOP value after 38k games.
15-02-17 Roc Battery diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 8215 W: 1602 L: 1686 D: 4927
sprt @ 15+0.05 th 1 Lower score, only horizontal battery.
15-02-17 Roc Battery diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 5684 W: 1049 L: 1140 D: 3495
sprt @ 15+0.05 th 1 Lower score, for file battery only.
15-02-17 vin wedges diff
LLR: -2.96 (-2.94,2.94) [0.00,6.00]
Total: 17010 W: 2787 L: 2809 D: 11414
sprt @ 60+0.05 th 1 Test passed at STC, so proceed to test at LTC to see if it scales as-is.
15-02-17 vin wedges diff
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 14827 W: 3028 L: 2880 D: 8919
sprt @ 15+0.05 th 1 Try variant of wedges idea, making levers and wedges exclusive, in case this is even better.
15-02-17 Roc PawnDefensePush diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 12161 W: 2385 L: 2458 D: 7318
sprt @ 15+0.05 th 1 There is an S(20,20) bonus if a pawn can attack a piece. What about an S(10,10) bonus if a pawn can defend a piece?
15-02-17 vin wedges diff
LLR: -2.96 (-2.94,2.94) [0.00,6.00]
Total: 41041 W: 6830 L: 6737 D: 27474
sprt @ 60+0.05 th 1 STC test of alternate version also passed, so try at LTC. Hopefully at least one of them will emerge as a better candidate.
15-02-17 jki smp5 diff
ELO: -6.01 +-5.0 (95%) LOS: 0.9%
Total: 6246 W: 1002 L: 1110 D: 4134
10000 @ 15+0.05 th 16 MAX_SLAVES_PER_SPLITPOINT = 5
15-02-17 jki smp3 diff
ELO: -15.72 +-6.0 (95%) LOS: 0.0%
Total: 4336 W: 635 L: 831 D: 2870
10000 @ 15+0.05 th 16 MAX_SLAVES_PER_SPLITPOINT = 3
15-02-18 sni connected_pawns2 diff
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 30398 W: 5315 L: 5063 D: 20020
sprt @ 60+0.05 th 1 LTC: Try to create mobile phalanxes
15-02-18 sg pawn_attack_threat5 diff
ELO: -0.31 +-2.5 (95%) LOS: 40.2%
Total: 30000 W: 5957 L: 5984 D: 18059
30000 @ 15+0.05 th 1 Measure Elo for tuned parameters at STC first. I expect little or no gain because the parameters were tuned at LTC and the latest tests show a strong TC dependency.
15-02-18 Roc PawnDefensePush diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 8658 W: 1706 L: 1789 D: 5163
sprt @ 15+0.05 th 1 Larger score S(15,15). Fixed base signature in the test submission.
15-02-18 vin wedges2 diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 15564 W: 3031 L: 3095 D: 9438
sprt @ 15+0.05 th 1 Rewrite to use a BB approach which allows finer control. Also try a rank-based bonus. If this STC is a regression from previous try, then we'll change the scores to match the previous try and use that as the base for tuning.
15-02-18 jki smpinf diff
ELO: -44.95 +-12.6 (95%) LOS: 0.0%
Total: 956 W: 99 L: 222 D: 635
10000 @ 15+0.05 th 16 MAX_SLAVES_PER_SPLITPOINT = 100
15-02-18 mco smp diff
ELO: -0.52 +-2.8 (95%) LOS: 36.0%
Total: 20000 W: 3489 L: 3519 D: 12992
20000 @ 15+0.05 th 4 Crash test for smp simplification patch
15-02-18 vin wedges_spsa diff
19762/20000 iterations
40000/40000 games played
40000 @ 15+0.05 th 1 Take the wedge scores from the middle run as these performed best, and use these as the start for SPSA run. Implementation is different but static eval (and so bench signature) is the same.
15-02-18 Roc PawnDefensePush diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 15086 W: 2991 L: 3056 D: 9039
sprt @ 15+0.05 th 1 S(5, 5). Fixed Git last commit
15-02-18 sni piece_support2 diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 14413 W: 2883 L: 2950 D: 8580
sprt @ 15+0.05 th 1 Implement Alain Savard's idea of a bonus for pawn pushes supporting one of our pieces (using a mask to restrict the area in the enemy camp)
15-02-19 lbr maxslaves diff
ELO: -180.32 +-46.6 (95%) LOS: 0.0%
Total: 130 W: 7 L: 69 D: 54
20000 @ 15+0.05 th 2 MaxSlavesPerSplitPoint = Threads / 2 (suggested by vincent)
15-02-19 lbr smp3 diff
LLR: -2.95 (-2.94,2.94) [0.00,6.00]
Total: 6769 W: 1109 L: 1179 D: 4481
sprt @ 15+0.05 th 4 MAX_SLAVES_PER_SPLITPOINT = 1+log2(Threads) for Threads=4. No change on 2 or 8 threads. Within error bar on 16. See if we get an Elo gain on 4.
15-02-19 sg pawn_attack_threat5 diff
LLR: -2.97 (-2.94,2.94) [0.00,6.00]
Total: 29837 W: 4936 L: 4897 D: 20004
sprt @ 60+0.05 th 1 As expected, the patch seems neutral at STC. Now test at LTC, where the parameters were tuned.
15-02-19 sni king_on_pieces diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 10386 W: 2037 L: 2115 D: 6234
sprt @ 15+0.05 th 1 Tweak KingOnOne and KingOnMany values
15-02-19 vin wedges diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 27744 W: 5455 L: 5486 D: 16803
sprt @ 15+0.05 th 1 STC test of SPSA tuned values. Since the values for the two ranks turned out to be virtually identical, go back to the simpler code format (which also yellowed at LTC and is therefore the most promising)
15-02-19 mco late_join diff
ELO: -0.76 +-2.8 (95%) LOS: 29.9%
Total: 20000 W: 3477 L: 3521 D: 13002
20000 @ 15+0.05 th 4 Use only 'level' as late join metric: quick test to get a rough idea if this could work.
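
The ELO, +- and LOS figures in the fixed-length entries above, and the (-2.94,2.94) limits shown on every SPRT entry, can be reproduced from the W/L/D counts. The following is a minimal sketch under stated assumptions, not the testing framework's actual code: the function names are hypothetical, the error bar and LOS use a normal approximation, and alpha = beta = 0.05 is an assumption that happens to reproduce the displayed limits.

```python
import math


def score(wins, losses, draws):
    """Average points per game, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def elo_from_score(s):
    """Logistic Elo difference corresponding to an expected score s (0 < s < 1)."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def elo_with_margin(wins, losses, draws, z=1.96):
    """Elo estimate and half-width of an approximate 95% interval."""
    n = wins + losses + draws
    s = score(wins, losses, draws)
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - s) ** 2 + losses * s ** 2 + draws * (0.5 - s) ** 2) / n
    half = z * math.sqrt(var / n)  # half-width in score units
    elo = elo_from_score(s)
    margin = (elo_from_score(s + half) - elo_from_score(s - half)) / 2.0
    return elo, margin


def los(wins, losses):
    """Likelihood of superiority: normal approximation on decisive games only."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


def sprt_bounds(alpha=0.05, beta=0.05):
    """Lower and upper LLR stopping limits (alpha and beta are assumed values)."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


if __name__ == "__main__":
    # 16-thread smp entry from the listing: W: 667 L: 293 D: 1631
    elo, margin = elo_with_margin(667, 293, 1631)
    print(f"ELO: {elo:.2f} +-{margin:.1f} LOS: {100 * los(667, 293):.1f}%")
    print("SPRT limits: %.2f, %.2f" % sprt_bounds())
```

Run on the 16-thread smp entry (W: 667 L: 293 D: 1631), this gives roughly 50.5 +-8.1 with an LOS near 100%, in line with the listing. The bracketed pair on SPRT lines, such as [-1.50,4.50], is the Elo hypothesis pair being tested and is not derived here.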