Stockfish Testing Queue

Finished - 1741 tests

15-05-27 SC scale_regression_4 diff
ELO: -15.89 +-3.0 (95%) LOS: 0.0%
Total: 20000 W: 3488 L: 4402 D: 12110
20000 @ 10+0.05 th 1 Bugfix in scale_regression_4, which should now give meaningful results. (I messed up in the Python code that writes the regression coefficients.)
15-05-27 SC scale_regression_4 diff
ELO: -315.35 +-116.1 (95%) LOS: 0.0%
Total: 50 W: 2 L: 38 D: 10
20000 @ 10+0.05 th 1 Final linear attempt at a regression model for scale factor. No sprt but fixed number of games.
15-05-25 SC noNullPawnEndings diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 19238 W: 2830 L: 2889 D: 13519
sprt @ 15+0.05 th 1 Only skip null-move verification in pawn endings and not only in KPs vs KX. Vaguely motivated by http://tests.stockfishchess.org/tests/view/55632d110ebc5940ca5d6dd1.
15-05-23 SC scale_regression_3 diff
ELO: -15.92 +-3.0 (95%) LOS: 0.0%
Total: 20000 W: 3558 L: 4474 D: 11968
20000 @ 10+0.05 th 1 scale_regression project, take 3: a linear model from a better set of features. StdErr down to 9.5.
15-05-21 SC scale_regression_2 diff
ELO: -17.11 +-2.9 (95%) LOS: 0.0%
Total: 20000 W: 3195 L: 4179 D: 12626
20000 @ 15+0.05 th 1 Analogously, check whether the new weights after the fix in the regression code give some improvement.
15-05-21 SC score_evasions_8 diff
ELO: -1.09 +-3.1 (95%) LOS: 24.2%
Total: 20000 W: 4030 L: 4093 D: 11877
20000 @ 10+0.05 th 1 After a major improvement in my regression code, I think I have got much better weights than before. Short check that this is indeed the case.
15-05-21 SC score_captures_1 diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 5585 W: 1024 L: 1115 D: 3446
sprt @ 15+0.05 th 1 Last byproduct of regression code improvement.
15-05-12 SC score_evasions_7 diff
LLR: -0.85 (-2.94,2.94) [-3.00,1.00]
Total: 137944 W: 21553 L: 21807 D: 94584
sprt @ 60+0.05 th 1 Try different formulas for evasions scoring. Take 7: use regression formula based on current m.value(). LTC.
15-05-19 SC scale_regression diff
ELO: -58.91 +-3.2 (95%) LOS: 0.0%
Total: 20000 W: 2871 L: 6230 D: 10899
20000 @ 10+0.05 th 1 What would happen if we threw all the endgame knowledge for scaling factors to the dogs and replaced it with a simple regression formula?
15-05-19 SC scale_oppbis_pieces diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 31379 W: 5888 L: 5911 D: 19580
sprt @ 15+0.05 th 1 Also use the number of pawns for scaling opposite-bishop endings with other pieces.
15-05-19 SC scale_oppbis diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 15346 W: 2864 L: 2929 D: 9553
sprt @ 15+0.05 th 1 Scale endings with opposite bishops more smoothly; I messed something up in the previous patch.
15-05-18 SC no_tempo diff
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 4068 W: 680 L: 845 D: 2543
sprt @ 15+0.05 th 1 Is tempo worth anything?
15-05-16 SC tempo_can_castle diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 23798 W: 4489 L: 4532 D: 14777
sprt @ 15+0.05 th 1 Value tempo more if only one side can castle.
15-05-13 SC can_castle_eval diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 7282 W: 1355 L: 1442 D: 4485
sprt @ 15+0.05 th 1 Reward castling rights with a small bonus. Locally made the search from initial position much more stable. Bugfix.
15-05-13 SC can_castle_eval diff
LLR: 0.42 (-2.94,2.94) [-1.50,4.50]
Total: 3545 W: 662 L: 638 D: 2245
sprt @ 15+0.05 th 1 Reward castling rights with a small bonus. Locally made the search from initial position much more stable.
15-05-11 SC score_evasions_6 diff
ELO: -3.23 +-3.0 (95%) LOS: 1.9%
Total: 20000 W: 3906 L: 4092 D: 12002
20000 @ 10+0.05 th 1 Try different formulas for evasions scoring. Take 6: use regression formula instead of MVV/LVA for good captures.
15-05-11 SC score_evasions_7 diff
ELO: 2.02 +-3.1 (95%) LOS: 90.1%
Total: 20000 W: 4134 L: 4018 D: 11848
20000 @ 10+0.05 th 1 Try different formulas for evasions scoring. Take 7: use regression formula based on current m.value().
15-05-10 SC score_evasions diff
ELO: -10.90 +-3.1 (95%) LOS: 0.0%
Total: 20000 W: 3768 L: 4395 D: 11837
20000 @ 10+0.05 th 1 Try different formulas for evasions scoring. Take 5: decide whether a capture is bad based on magic formula rather than pos.see_sign(m). This is the most promising!
15-05-10 SC score_evasions diff
ELO: -42.56 +-3.1 (95%) LOS: 0.0%
Total: 20000 W: 2973 L: 5411 D: 11616
20000 @ 10+0.05 th 1 Try different formulas for evasions scoring. Take 4: never use pos.see(m) for captures.
15-05-08 SC MVV_evasions diff
ELO: -12.81 +-3.1 (95%) LOS: 0.0%
Total: 20000 W: 3684 L: 4421 D: 11895
20000 @ 10+0.05 th 1 Try different formulas for evasions scoring. Take 3: pos.see().
15-05-08 SC MVV_evasions diff
ELO: -16.38 +-3.1 (95%) LOS: 0.0%
Total: 20000 W: 3631 L: 4573 D: 11796
20000 @ 10+0.05 th 1 Try different formulas for evasions scoring. Take 2: history with LVA correction.
15-05-08 SC MVV_evasions diff
ELO: -11.24 +-3.1 (95%) LOS: 0.0%
Total: 20000 W: 3709 L: 4356 D: 11935
20000 @ 10+0.05 th 1 Try different formulas for evasions scoring. Take 1: same formula as for captures.
15-05-07 SC MVV_evasions diff
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 14748 W: 2809 L: 2996 D: 8943
sprt @ 15+0.05 th 1 1) When scoring evasions, the value of the captured piece should not play any role, so MVV does not apply (you can always just capture one piece). 2) LVA is (according to Lucas) already taken into account by the move generator and stable sorting. Maybe it is possible to brutally simplify evasions scoring.
15-05-06 SC bestMoveChanges diff
LLR: -5.46 (-2.94,2.94) [-1.50,4.50]
Total: 34689 W: 6526 L: 6629 D: 21534
sprt @ 15+0.05 th 1 Use log2 instead of H of move count to increment bestMoveChanges. Results from tuning time management parameters.
15-05-06 SC MVV_rank_tuned diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 29421 W: 5606 L: 5660 D: 18155
sprt @ 15+0.05 th 1 Higher values as indicated by SPSA (using much higher values, since SPSA did not converge and I have other evidence indicating that something around 250 should be better).
15-05-04 SC MVV_rank_tuning diff
10310/10000 iterations
19648/20000 games played
20000 @ 30+0.05 th 1 Tune the rank penalty in MVV/rank for scoring captures.
15-04-27 SC bestMoveChangesTuning diff
39617/20000 iterations
71716/80000 games played
80000 @ 60+0.2 th 1 Given that the manually tuned bestMoveChanges performed much better than the trivial one and the BMCtime is not looking really promising, I'll try tuning with SPSA. Using nodestime as specified in the guidelines, and using a longer tc in the hope of being more sensitive to time management. (Both as in the previous two patches.)
15-05-05 SC see_depth diff
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 3059 W: 506 L: 670 D: 1883
sprt @ 15+0.05 th 1 Retire see_sign, take 2. This time non-functionally, with inconsistent speed changes.
15-05-04 SC statistical_see_sign diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 9596 W: 1789 L: 1869 D: 5938
sprt @ 15+0.05 th 1 statistical_see was something like -250 ELO. Is there a chance for statistical_see_sign?
15-05-04 SC see_depth diff
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 639 W: 72 L: 236 D: 331
sprt @ 15+0.05 th 1 Try to retire see_sign by adding a depth argument to the signature of see, at which the swap algorithm is stopped. If see_sign(m) >= 0 then see(m, 2) would also be >= 0, but the converse is not true. Let's see how far this goes. One could then tune the swap depth for specific tuning purposes.
15-05-03 SC statistical_see diff
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 210 W: 3 L: 159 D: 48
sprt @ 15+0.05 th 1 Replace Position::see() and Position::see_sign() by a statistical formula in Position::see(). This is quite far-fetched, but one never knows. I was in particular surprised that nps went down considerably.
15-04-30 SC MVV_MAV diff
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 25770 W: 4184 L: 3964 D: 17622
sprt @ 60+0.05 th 1 On bench PieceValue - 200* relativeRank is a better approximation of pos.see than PieceValue alone (Pearson correlation from 0.55 to 0.59). See whether it is enough to beat MVV.
15-04-29 SC MVV_MAV diff
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 4632 W: 945 L: 827 D: 2860
sprt @ 15+0.05 th 1 On bench PieceValue - 200* relativeRank is a better approximation of pos.see than PieceValue alone (Pearson correlation from 0.55 to 0.59). See whether it is enough to beat MVV.
15-04-21 SC pieceValuesMP_simple2 diff
LLR: -1.91 (-2.94,2.94) [-3.00,1.00]
Total: 50948 W: 8027 L: 8204 D: 34717
sprt @ 60+0.05 th 1 As Joona pointed out: MVV/LVA also aims to define exactly the ordering in which captures are searched, which is left open by MVV only. Try a more compact implementation of MVV/LVA. LTC.
14-12-16 SC specKPPKPPeval diff
ELO: 0.89 +-3.1 (95%) LOS: 71.5%
Total: 20000 W: 4075 L: 4024 D: 11901
20000 @ 15+0.05 th 1 In nonsymmetric KPP KPP endgames only reward advanced pawns. Showed some effect in superfast endgames. (And thank you to Arjun for the patience)
14-12-17 SC specKPPKPPeval diff
LLR: 3.46 (-2.94,2.94) [-1.50,4.50]
Total: 35549 W: 7328 L: 7103 D: 21118
sprt @ 15+0.05 th 1 SPRT for a corrected version of KPP vs KPP eval patch. Previous version did not detect symmetric positions correctly.
14-12-18 SC specKPPKPPeval diff
LLR: -3.73 (-2.94,2.94) [0.00,6.00]
Total: 11768 W: 1949 L: 2022 D: 7797
sprt @ 60+0.05 th 1 SPRT for a corrected version of KPP vs KPP eval patch. Previous version did not detect symmetric positions correctly.
14-12-18 SC streamline_KPPKPP diff
23081/30000 iterations
41650/60000 games played
60000 @ 5+0.1 th 1 KPPKPP eval patch passed STC and failed LTC. Retry to tune a streamlined patch with an endgame-oriented tc.
14-12-19 SC streamline_KPPKPP diff
112202/50000 iterations
127275/300000 games played
300000 @ 5+0.25 th 1 Rescheduling of SPSA tuning of KPP vs KPP, this time with correct base branch. - 100000 games at priority -2 - tc 5+0.25 (endgame oriented) - starting values from previous SPSA run
14-12-21 SC streamline_KPPKPP diff
LLR: -2.94 (-2.94,2.94) [-1.50,4.50]
Total: 8923 W: 1762 L: 1844 D: 5317
sprt @ 15+0.05 th 1 Retry SPRT after moving from a formula to a table and after a long SPSA session. I also manually verified that for the (not really critical, but exemplary) position fen 8/2k1n3/1p1p4/6K1/2B1PP2/8/8/8 w - - the master also chooses f4f5, only at much higher depth than the patch.
14-12-22 SC king_distance_KPPKPP diff
LLR: -0.10 (-2.94,2.94) [-1.50,4.50]
Total: 183 W: 36 L: 39 D: 108
sprt @ 15+0.05 th 1 Now also taking into account minimal king distance in evaluating the endgame. Parameter tuning taken over from previous SPSA run, let us see how good it is. I rechecked my test signature and I've got exactly the same bench as before.
14-12-22 SC king_distance_KPPKPP diff
LLR: -1.11 (-2.94,2.94) [-1.50,4.50]
Total: 3259 W: 615 L: 646 D: 1998
sprt @ 15+0.05 th 1 Now also taking into account minimal king distance in evaluating the endgame. Parameter tuning taken over from previous SPSA run, let us see how good it is. I rechecked my test signature and I've got exactly the same bench as before.
14-12-23 SC king_distance_KPPKPP diff
LLR: -0.06 (-2.94,2.94) [-1.50,4.50]
Total: 26 W: 5 L: 7 D: 14
sprt @ 15+0.05 th 1 KPP KPP with king distance. Fix of uninitialized variables to avoid signature errors. No real tuning, only some bench positions tested.
14-12-24 SC king_distance_KPPKPP diff
LLR: -0.49 (-2.94,2.94) [-1.50,4.50]
Total: 117 W: 13 L: 30 D: 74
sprt @ 15+0.05 th 1 King distance in KPP KPP, further try to resolve signature error problems.
14-12-25 SC king_distance_KPPKPP diff
LLR: 0.11 (-2.94,2.94) [-1.50,4.50]
Total: 977 W: 208 L: 201 D: 568
sprt @ 15+0.05 th 1 King distance KPP KPP. Some minor ambiguities in code removed (double init of a variable). Signature error was seemingly caused by wrong assert spotted by Joerg Oster.
14-12-30 SC king_distance_KPPKPP diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 10397 W: 2028 L: 2106 D: 6263
sprt @ 15+0.05 th 1 Local tuning function for position benches changed to something more stable (verified that it behaves well when comparing with older SF versions). Further parameter tweak.
15-01-01 SC king_distance_KPPKPP diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 9971 W: 1997 L: 2077 D: 5897
sprt @ 15+0.05 th 1 Improved version of KPP KPP. Bench is now ca. 1% faster than base (previously 2% slower) and parameters have been further improved. Take 1 with tuned values.
15-01-01 SC king_distance_KPPKPP_2 diff
LLR: -3.52 (-2.94,2.94) [-1.50,4.50]
Total: 10128 W: 2077 L: 2177 D: 5874
sprt @ 15+0.05 th 1 Improved KPP KPP, take 2. More moderate scaling and lower bonus for tempo.
15-01-01 SC king_distance_KPPKPP_4 diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 12526 W: 2498 L: 2570 D: 7458
sprt @ 15+0.05 th 1 Improved KPP KPP, take 3 with very low tempo bonus.
15-01-07 SC manual_KPP diff
LLR: -3.12 (-2.94,2.94) [-1.50,4.50]
Total: 29312 W: 5843 L: 5875 D: 17594
sprt @ 15+0.05 th 1 It seems that bench is dependent on whether I am relying on the specialized endgame implementation. Try this one without resorting to the infrastructure.
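
The ELO, LOS and LLR-bound figures in the records above can be checked from the W/L/D totals. The page does not state which formulas it uses, so the following is only a sketch under assumptions: the logistic Elo model, a normal approximation for LOS (likelihood of superiority), and Wald's SPRT bounds with alpha = beta = 0.05. The function names are made up for illustration. With these assumptions the printed values of the quoted records are reproduced.

```python
import math


def elo_from_wld(wins, losses, draws):
    """Elo difference implied by a match score (logistic model, assumed)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    return -400.0 * math.log10(1.0 / score - 1.0)


def los(wins, losses):
    """Likelihood of superiority, normal approximation (draws drop out)."""
    z = (wins - losses) / math.sqrt(wins + losses)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's lower and upper stopping bounds for the log-likelihood ratio."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


if __name__ == "__main__":
    # 15-05-27 scale_regression_4: W 3488, L 4402, D 12110
    print(round(elo_from_wld(3488, 4402, 12110), 2))   # -15.89
    # 15-05-21 score_evasions_8: W 4030, L 4093
    print(round(100.0 * los(4030, 4093), 1))           # 24.2
    # Bounds shown as (-2.94,2.94) on every sprt record
    print(tuple(round(b, 2) for b in sprt_bounds()))   # (-2.94, 2.94)
```

The LLR value itself is not recomputed here, because it depends on the Elo model and the hypothesis interval (for example [-1.50,4.50]) used by the testing framework, which this page does not spell out.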