Stockfish Testing Queue

Finished - 40699 tests

15-02-15 jki smp diff
ELO: -1.27 +-2.1 (95%) LOS: 12.3%
Total: 40000 W: 7829 L: 7975 D: 24196
40000 @ 15+0.05 th 1 smp improvement attempt (check regression: 1 thread)
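The ELO, error bar and LOS figures in fixed-games entries like the one above can be approximated from the W/L/D totals alone. The following is a minimal sketch, assuming the standard logistic Elo model and a normal approximation for both the 95% interval and the likelihood of superiority; the function names are hypothetical and the testing framework's own computation may differ in detail.

```python
import math

def elo_from_score(score):
    # Logistic Elo model: map an expected score in (0, 1) to an Elo difference.
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_stats(wins, losses, draws):
    """Return (elo, 95% error in Elo, LOS) from win/loss/draw counts."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the score, then standard error of the mean.
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / games
    delta = 1.96 * math.sqrt(var / games)
    # Half-width of the 95% interval, converted to Elo.
    err = (elo_from_score(score + delta) - elo_from_score(score - delta)) / 2.0
    # Likelihood of superiority (normal approximation, draws ignored).
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo_from_score(score), err, los

elo, err, los = elo_stats(7829, 7975, 24196)
print(f"ELO: {elo:.2f} +-{err:.1f} (95%) LOS: {100 * los:.1f}%")
```

Run on the totals of the first entry (W: 7829 L: 7975 D: 24196), this sketch gives values that agree with the reported line to the displayed precision.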
15-02-15 jki smp diff
ELO: 0.28 +-2.9 (95%) LOS: 57.4%
Total: 20000 W: 3650 L: 3634 D: 12716
20000 @ 15+0.05 th 2 smp improvement attempt (check regression: 2 thread)
15-02-15 jki smp diff
ELO: 0.19 +-2.8 (95%) LOS: 55.3%
Total: 20000 W: 3440 L: 3429 D: 13131
20000 @ 15+0.05 th 4 smp improvement attempt (check regression: 4 thread)
15-02-15 jki smp diff
ELO: 6.19 +-3.9 (95%) LOS: 99.9%
Total: 10325 W: 1824 L: 1640 D: 6861
10000 @ 15+0.05 th 8 smp improvement attempt (check regression: 8 thread)
15-02-15 vin en_passant_bonus diff
ELO: -0.55 +-2.4 (95%) LOS: 32.9%
Total: 31000 W: 6085 L: 6134 D: 18781
30000 @ 15+0.05 th 1 Now that the pawn push activity has subsided, measure Elo of re-tuned values at STC.
15-02-15 mco SpaceThreshold diff
48367/50000 iterations
85524/100000 games played
100000 @ 15+0.05 th 1 Tune space evaluation threshold. I think it has never been properly tuned before. Take 3 (even wider changes and smaller range)
15-02-15 mco king8 diff
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 55077 W: 10999 L: 10940 D: 33138
sprt @ 15+0.05 th 1 Simplify attackUnits formula
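In the sprt entries, the pair (-2.94,2.94) printed after the LLR matches Wald's stopping thresholds for a sequential probability ratio test. The sketch below assumes error rates alpha = beta = 0.05, which the log does not state but which reproduce the printed bounds; the function names are hypothetical, and the reading of the bracketed pair as the Elo values of H0 and H1 is likewise an assumption.

```python
import math

def sprt_bounds(alpha=0.05, beta=0.05):
    # Wald's thresholds on the log-likelihood ratio (LLR).
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper

def sprt_status(llr, lower, upper):
    # The test stops as soon as the LLR leaves the open interval (lower, upper).
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"

lower, upper = sprt_bounds()
print(f"({lower:.2f},{upper:.2f})")
print(sprt_status(2.95, lower, upper))   # LLR of a passed entry
print(sprt_status(-0.31, lower, upper))  # LLR of an entry that had not yet stopped
```

Because the page rounds the LLR to two decimals, an entry can show an LLR equal to the printed bound even though the unrounded value has already crossed it.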
15-02-15 vin en_passant_bonus diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 62686 W: 10413 L: 10287 D: 41986
sprt @ 60+0.05 th 1 LTC test of retuned values, after STC test inconclusive. As suggested by Joona: "I suggest to make a final conclusive test at LTC. Because this test has already passed once, I think you could use less strict bounds, like [0, 5] to reduce the risk of "unlucky run"."
15-02-16 Roc Battery diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 11479 W: 2223 L: 2298 D: 6958
sprt @ 15+0.05 th 1 This time, give a bonus only for squares which are in the opponent half of the board.
15-02-16 sni connected_pawns2 diff
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 52393 W: 10912 L: 10656 D: 30825
sprt @ 15+0.05 th 1 Try to create mobile phalanxes
15-02-16 sni connected_pawns2 diff
LLR: -0.31 (-2.94,2.94) [-1.50,4.50]
Total: 38196 W: 7859 L: 7763 D: 22574
sprt @ 15+0.05 th 1 Try to create mobile phalanxes. Take 2: with increased pawn mobility value
15-02-16 mco king8 diff
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 14751 W: 2530 L: 2400 D: 9821
sprt @ 60+0.05 th 1 LTC: Simplify attackUnits formula
15-02-16 mco SpaceThreshold diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 32155 W: 6384 L: 6426 D: 19345
sprt @ 15+0.05 th 1 SpaceThreshold tuning verification
15-02-16 Roc Battery diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 3062 W: 552 L: 650 D: 1860
sprt @ 15+0.05 th 1 Consider only vertical batteries.
15-02-16 Roc Battery diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 8183 W: 1610 L: 1695 D: 4878
sprt @ 15+0.05 th 1 Consider only horz batteries in the opponent half of the board.
15-02-16 vin wedges diff
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 13056 W: 2681 L: 2538 D: 7837
sprt @ 15+0.05 th 1 Try bonus for an advanced cramping pawn on d5/e5/c6/d6/e6/f6 that cuts opponent's lines.
15-02-16 jos space_threshold diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 77390 W: 15402 L: 15279 D: 46709
sprt @ 15+0.05 th 1 SPSA tuning try by Marco failed, now let's try CLOP value after 38k games.
15-02-17 Roc Battery diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 8215 W: 1602 L: 1686 D: 4927
sprt @ 15+0.05 th 1 Lower score, only horz battery.
15-02-17 Roc Battery diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 5684 W: 1049 L: 1140 D: 3495
sprt @ 15+0.05 th 1 Lower score, for file battery only.
15-02-17 vin wedges diff
LLR: -2.96 (-2.94,2.94) [0.00,6.00]
Total: 17010 W: 2787 L: 2809 D: 11414
sprt @ 60+0.05 th 1 Test passed at STC, so proceed to test at LTC to see if it scales as-is.
15-02-17 vin wedges diff
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 14827 W: 3028 L: 2880 D: 8919
sprt @ 15+0.05 th 1 Try variant of wedges idea, making levers and wedges exclusive, in case this is even better.
15-02-17 Roc PawnDefensePush diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 12161 W: 2385 L: 2458 D: 7318
sprt @ 15+0.05 th 1 There is an S(20,20) bonus if a pawn can attack a piece. How about an S(10,10) bonus if a pawn can defend a piece?
15-02-17 vin wedges diff
LLR: -2.96 (-2.94,2.94) [0.00,6.00]
Total: 41041 W: 6830 L: 6737 D: 27474
sprt @ 60+0.05 th 1 STC test of alternate version also passed, so try at LTC. Hopefully at least one of them will emerge as a better candidate.
15-02-17 jki smp5 diff
ELO: -6.01 +-5.0 (95%) LOS: 0.9%
Total: 6246 W: 1002 L: 1110 D: 4134
10000 @ 15+0.05 th 16 MAX_SLAVES_PER_SPLITPOINT = 5
15-02-17 jki smp3 diff
ELO: -15.72 +-6.0 (95%) LOS: 0.0%
Total: 4336 W: 635 L: 831 D: 2870
10000 @ 15+0.05 th 16 MAX_SLAVES_PER_SPLITPOINT = 3
15-02-18 sni connected_pawns2 diff
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 30398 W: 5315 L: 5063 D: 20020
sprt @ 60+0.05 th 1 LTC: Try to create mobile phalanxes
15-02-18 sg pawn_attack_threat5 diff
ELO: -0.31 +-2.5 (95%) LOS: 40.2%
Total: 30000 W: 5957 L: 5984 D: 18059
30000 @ 15+0.05 th 1 Measure Elo for tuned parameters at STC first. I expect little or no gain because the parameters were tuned at LTC and the last tests show a strong TC dependency.
15-02-18 Roc PawnDefensePush diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 8658 W: 1706 L: 1789 D: 5163
sprt @ 15+0.05 th 1 Larger score S(15,15). Fixed base signature in the test submission.
15-02-18 vin wedges2 diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 15564 W: 3031 L: 3095 D: 9438
sprt @ 15+0.05 th 1 Rewrite to use a BB approach which allows finer control. Also try a rank-based bonus. If this STC is a regression from previous try, then we'll change the scores to match the previous try and use that as the base for tuning.
15-02-18 jki smpinf diff
ELO: -44.95 +-12.6 (95%) LOS: 0.0%
Total: 956 W: 99 L: 222 D: 635
10000 @ 15+0.05 th 16 MAX_SLAVES_PER_SPLITPOINT = 100
15-02-18 mco smp diff
ELO: -0.52 +-2.8 (95%) LOS: 36.0%
Total: 20000 W: 3489 L: 3519 D: 12992
20000 @ 15+0.05 th 4 Crash test for smp simplification patch
15-02-18 vin wedges_spsa diff
19762/20000 iterations
40000/40000 games played
40000 @ 15+0.05 th 1 Take the wedge scores from the middle run as these performed best, and use these as the start for SPSA run. Implementation is different but static eval (and so bench signature) is the same.
15-02-18 Roc PawnDefensePush diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 15086 W: 2991 L: 3056 D: 9039
sprt @ 15+0.05 th 1 S(5, 5). Fixed Git last commit
15-02-18 sni piece_support2 diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 14413 W: 2883 L: 2950 D: 8580
sprt @ 15+0.05 th 1 Implement Alain Savard's idea of a bonus for pawn pushes supporting one of our pieces (using a mask to restrict the area in the enemy camp)
15-02-19 lbr maxslaves diff
ELO: -180.32 +-46.6 (95%) LOS: 0.0%
Total: 130 W: 7 L: 69 D: 54
20000 @ 15+0.05 th 2 MaxSlavesPerSplitPoint= Threads / 2 (suggested by vincent)
15-02-19 lbr smp3 diff
LLR: -2.95 (-2.94,2.94) [0.00,6.00]
Total: 6769 W: 1109 L: 1179 D: 4481
sprt @ 15+0.05 th 4 MAX_SLAVES_PER_SPLITPOINT = 1+log2(Threads) for Threads=4. No change on 2 or 8 threads. Within error bar on 16. See if we get an elo gain on 4.
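The formula in the entry above can be tabulated per thread count. This is only an illustrative sketch: the function name is hypothetical, and rounding log2 down for thread counts that are not powers of two is an assumption the log does not confirm.

```python
import math

def max_slaves_per_splitpoint(threads):
    # 1 + log2(Threads); flooring for non-powers of two is an assumption.
    return 1 + int(math.log2(threads))

table = {t: max_slaves_per_splitpoint(t) for t in (2, 4, 8, 16)}
for threads, slaves in table.items():
    print(f"Threads={threads}: MAX_SLAVES_PER_SPLITPOINT={slaves}")
```

For Threads=4 the formula yields 3, and for Threads=16 it yields 5, the same value tried in the smp5 entry.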
15-02-19 sg pawn_attack_threat5 diff
LLR: -2.97 (-2.94,2.94) [0.00,6.00]
Total: 29837 W: 4936 L: 4897 D: 20004
sprt @ 60+0.05 th 1 As expected, the patch seems neutral at STC. Now test at LTC, where the parameters were tuned.
15-02-19 sni king_on_pieces diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 10386 W: 2037 L: 2115 D: 6234
sprt @ 15+0.05 th 1 Tweak KingOnOne and KingOnMany values
15-02-19 vin wedges diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 27744 W: 5455 L: 5486 D: 16803
sprt @ 15+0.05 th 1 STC test of SPSA tuned values. Since the values for the two ranks turned out to be virtually identical, go back to the simpler code format (which also yellowed at LTC and is therefore the most promising)
15-02-19 mco late_join diff
ELO: -0.76 +-2.8 (95%) LOS: 29.9%
Total: 20000 W: 3477 L: 3521 D: 13002
20000 @ 15+0.05 th 4 Use only 'level' as late join metric: quick test to get a rough idea if this could work.
15-02-19 jos passed_defdef diff
LLR: 2.94 (-2.94,2.94) [-1.50,4.50]
Total: 13771 W: 2758 L: 2615 D: 8398
sprt @ 15+0.05 th 1 Bonus for a passer which is supported by a pawn, which again is also defended by a pawn.
15-02-19 sg asp_window diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 13774 W: 2671 L: 2781 D: 8322
sprt @ 15+0.05 th 1 Increase aspiration window on research by a constant (=4). So this is more like a tuning run.
15-02-19 zar tune_double diff
LLR: -2.94 (-2.94,2.94) [0.00,4.00]
Total: 114045 W: 22830 L: 22571 D: 68644
sprt @ 15+0.05 th 1 Add penalty for doubled pawns in A and H files
15-02-19 SC ortho_threats diff
ELO: -23.35 +-4.5 (95%) LOS: 0.0%
Total: 10000 W: 1822 L: 2493 D: 5685
10000 @ 15+0.05 th 1 Orthogonality experiment, take 1. Replace threats evaluation by linear combination of king, passed_pawns and mobility. Coefficients obtained from bench. A quick check to see how much Elo is lost.
15-02-19 Roc CenterWedge diff
LLR: -4.10 (-2.94,2.94) [-1.50,4.50]
Total: 61556 W: 12085 L: 12065 D: 37406
sprt @ 15+0.05 th 1 Doubly supported wedge (or pawn in the center bind). A bonus or a liability? Try a S(5,5) bonus.
15-02-19 mco official diff
ELO: -1.78 +-4.0 (95%) LOS: 19.1%
Total: 9575 W: 1547 L: 1596 D: 6432
10000 @ 15+0.05 th 16 Regression test at 16 threads for the full smp series. It should be a non-functional change, but because it is SMP stuff it is better to be safe.
15-02-19 mco late_join diff
ELO: 1.36 +-3.9 (95%) LOS: 75.0%
Total: 10000 W: 1690 L: 1651 D: 6659
10000 @ 15+0.05 th 16 Use only 'level' as late join metric: quick test to get a rough idea if this could work. RESCHEDULE with 16 threads (with 4 threads it seems ok).
15-02-20 vin wedges diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 9741 W: 1865 L: 1945 D: 5931
sprt @ 15+0.05 th 1 Ok, final try for this approach. Suitably enlightened about the limitations of SPSA. Since slightly increasing the bonus was bad, try slightly reducing it.
15-02-20 jos passed_defdef diff
LLR: -2.95 (-2.94,2.94) [0.00,6.00]
Total: 5110 W: 809 L: 887 D: 3414
sprt @ 60+0.05 th 1 LTC: Bonus for a passer which is supported by a pawn, which again is also defended by a pawn.
15-02-20 sni mobility diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 18318 W: 3674 L: 3730 D: 10914
sprt @ 15+0.05 th 1 Bigger penalty for pieces with very bad mobility