Stockfish Testing Queue

Finished - 977 tests

18-05-12 31m pin_pressure diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 14119 W: 2837 L: 2900 D: 8382
sprt @ 10+0.1 th 1 Tuned values. S(3, -1) is a small change to a one-time bonus, but changes of S(5, 0) have had a reasonably large effect in past tests of this idea, and the pre-tuning values were already close to passing (67K yellow). This could be a lengthy test.
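The (-2.94,2.94) pair on each LLR line matches the standard Wald stopping thresholds for a sequential probability ratio test with 5% error rates on each side. The 5% figures are an assumption here, since the page shows only the resulting bounds. A minimal sketch of that arithmetic:

```python
import math

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's SPRT stopping thresholds for the log-likelihood ratio.

    alpha and beta are assumed error rates; they are not stated on this page.
    """
    lower = math.log(beta / (1 - alpha))   # stop and reject at or below this LLR
    upper = math.log((1 - beta) / alpha)   # stop and accept at or above this LLR
    return lower, upper

lower, upper = sprt_bounds()
print(f"({lower:.2f}, {upper:.2f})")
```

Under these assumptions a test stops as soon as its LLR leaves the interval, which is why the finished entries here show values just past -2.94.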
18-05-12 31m hanging_nonlinear diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 13654 W: 2796 L: 2861 D: 7997
sprt @ 10+0.1 th 1 Bryan just suggested on the forum that the hanging penalty should be larger than simply scaling linearly with the number of hanging pieces. It seems like a simple quadratic would be a good first try. 2^x - 1 also seems like an interesting candidate, but not as rapid in growth as this quadratic over the typical range of values.
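The growth comparison in this entry can be checked with a small table. The exact quadratic used in the patch is not given on this page, so the sketch below assumes the plain form x * x, where x is the number of hanging pieces; both function names are illustrative only.

```python
def quadratic(x):
    # Assumed form of the quadratic; the patch may use different coefficients.
    return x * x

def exponential(x):
    # The alternative candidate named in the entry: 2^x - 1.
    return 2 ** x - 1

# Compare both candidates over a small range of hanging-piece counts.
for x in range(6):
    print(x, quadratic(x), exponential(x))
```

With this assumed form, the quadratic is at least as large as 2^x - 1 up to four hanging pieces, and the exponential overtakes it from five onward.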
18-05-12 31m tune_pin_pressure diff
23243/25000 iterations
48347/50000 games played
50000 @ 20+0.2 th 1 My attempts to find the last few tenths of an Elo point that this patch needs by altering its logic have failed thus far. Try to refine the bonus with SPSA, as it was not carefully tuned previously. If this tuning session is not making progress and needs to be restarted with a larger ck, I may not be available to do so--I don't mind if somebody else does (thanks!).
18-05-12 31m pin_pressure diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 13442 W: 2651 L: 2717 D: 8074
sprt @ 10+0.1 th 1 Give an extra S(5, 0) bonus on top of PinPressure (or S(20, 5) total) if one of the defenders is the enemy king. Having lots of tension against a pinned piece implies a potential attack, even more so if the enemy king is closely involved. It is worth noting that in the overload_pinned tests, the king-is-defender-only test was +0.62 Elo, compared to the unrestricted test's +0.99 Elo--there's reason to suspect that this case is the most important.
18-05-12 31m pin_pressure diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 9476 W: 1872 L: 1957 D: 5647
sprt @ 10+0.1 th 1 It appears that the attackedBy[Them][KING] case represents about half of the positions in which this bonus applies, based on dbg_mean_of() and the bench positions. Therefore, these values should maintain the average bonus from the 67K yellow test, but further test the logic in the last test I submitted. PinPressure = S(12, 5); PinPressureK = S(6, 0).
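The averaging argument in this entry can be written out as a short calculation. This is only a sketch: it assumes PinPressureK is added on top of PinPressure whenever the enemy king is a defender, and it takes the "about half" estimate literally as 0.5. The function name is hypothetical.

```python
def expected_bonus(base, king_extra, king_share):
    """Average (mg, eg) bonus when king_extra applies in a fraction of cases.

    base and king_extra are (mg, eg) pairs; king_share is the assumed
    fraction of positions in which the enemy king is one of the defenders.
    """
    return tuple(b + king_share * k for b, k in zip(base, king_extra))

# PinPressure = S(12, 5), PinPressureK = S(6, 0), king case in about half.
avg = expected_bonus((12, 5), (6, 0), 0.5)
print(avg)
```

Under those assumptions the average works out to S(15, 5), the value that another entry on this page calls the best so far.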
18-05-12 31m pin_pressure diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 45054 W: 9225 L: 9144 D: 26685
sprt @ 10+0.1 th 1 Restore the version that failed yellow in 67K games and narrow its scope: exclude pins that are pawn-defended but not pawn-attacked. If this fails and I have no other ideas, I'll try to tune the best version's bonus with SPSA. (Tuning by submitting multiple SPRTs isn't feasible when each is expected to take 60K or more games.)
18-05-12 31m pin_pressure diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 34365 W: 6914 L: 6884 D: 20567
sprt @ 10+0.1 th 1 Give an extra S(5, 0) bonus on top of PinPressure (or S(20, 5) total) if one of the defenders is the enemy queen.
18-05-11 31m pin_pressure diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 67290 W: 13659 L: 13476 D: 40155
sprt @ 10+0.1 th 1 Try the best version so far, but with pawns also included (i.e., pos.pieces(Them) instead of nonPawnEnemies).
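For readers who want to relate a W/L/D line to the Elo figures quoted in these notes, the usual logistic conversion can be sketched as follows. This is a plain point estimate with no error bars, and it may not match exactly how the testing framework reports Elo.

```python
import math

def elo_from_wld(wins, losses, draws):
    """Logistic Elo point estimate from a win/loss/draw record."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games      # average points per game
    return -400.0 * math.log10(1.0 / score - 1.0)

# Totals copied from the 67290-game entry above.
elo = elo_from_wld(13659, 13476, 40155)
print(f"{elo:+.2f} Elo")
```

For that record the estimate comes out a little under +1 Elo, which is positive but too small to pass an SPRT with [0.00,5.00] bounds.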
18-05-11 31m pin_pressure diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 44155 W: 8878 L: 8803 D: 26474
sprt @ 10+0.1 th 1 The current test appears to be right on the threshold of passing: at the time of writing, LLR = 1.41 after 48K games. Perhaps this can make the difference. A subtle change: move the code outside of if (defended | weak), which changes the bench. I think the main effect here is to include pawn-defended pawns.
18-05-11 31m pin_pressure diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 4245 W: 786 L: 895 D: 2564
sprt @ 10+0.1 th 1 A further change: include queen-pinned pieces. This can be done without adding an expensive call to slider_blockers, by rewriting the patch as a penalty for pressure against our own pins (as opposed to a bonus for pressure against the enemy) and initializing a field to hold the queen blockers, used both here and for WeakQueen. See code for details.
18-05-10 31m pin_pressure^^^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 41214 W: 8299 L: 8237 D: 24678
sprt @ 10+0.1 th 1 Rescheduled because I forgot to remove a condition. The overload_pinned tests appear to show promise, but maybe Elo can be gained by applying some Score other than Overload = S(10, 5). Based on @snicolet's test, use if (b) to avoid adding a popcount, and quickly search the parameter space by scheduling 4 tests, tweaking the mg/eg components by +/- 5. Here, add 5 to mg: S(15, 5).
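The four-test search described here amounts to probing the axis-aligned neighbours of the starting score. A minimal sketch, with (mg, eg) tuples standing in for S(mg, eg) and a hypothetical helper name:

```python
def neighbours(mg, eg, step=5):
    """The four candidates obtained by moving one component by +/- step."""
    return [
        (mg + step, eg),  # more middlegame weight
        (mg - step, eg),  # less middlegame weight
        (mg, eg + step),  # more endgame weight
        (mg, eg - step),  # less endgame weight
    ]

# Starting point: Overload = S(10, 5).
candidates = neighbours(10, 5)
print(candidates)
```

Those four candidates correspond to the S(15, 5), S(5, 5), S(10, 10) and S(10, 0) entries on this page.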
18-05-11 31m pin_pressure diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 29913 W: 5971 L: 5962 D: 17980
sprt @ 10+0.1 th 1 Increase even further: S(25, 5).
18-05-11 31m pin_pressure diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 26555 W: 5402 L: 5407 D: 15746
sprt @ 10+0.1 th 1 S(15, 5) seems to be the best value so far, but it may not be enough. Try to merge @snicolet's innovation into my approach: verify that the pinning piece is not attacked.
18-05-11 31m pin_pressure diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 6456 W: 1257 L: 1356 D: 3843
sprt @ 10+0.1 th 1 After increasing the mg bonus by 5 cp, resulting in S(15, 5), there appears to be substantial improvement: if the current result holds, it is +0.5 Elo over S(10, 5) (which was already +1 Elo over master). Let's see if increasing further results in a green: S(20, 5).
18-05-10 31m xray_pinned diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 12557 W: 2494 L: 2564 D: 7499
sprt @ 10+0.1 th 1 I'm grateful to Rocky for noticing that my last test was inadvertently a duplicate of Torfranz's earlier test. Here, try making all king blockers opaque to x-ray attacks, regardless of which side's king it is.
18-05-10 31m pin_pressure diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 23162 W: 4632 L: 4654 D: 13876
sprt @ 10+0.1 th 1 S(10, 0).
18-05-10 31m xray_pinned diff
LLR: 0.07 (-2.94,2.94) [0.00,5.00]
Total: 26 W: 7 L: 4 D: 15
sprt @ 10+0.1 th 1 I apologize if this has been attempted before or has a conceptual error; it seems too intuitively simple. A more drastic change than Rocky's patch, based on Bryan's idea: make all pinned pieces, including pinned queens, opaque for friendly x-ray attacks. (I don't understand why we would only do this for rooks, not queens.)
18-05-10 31m pin_pressure^ diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 15821 W: 3142 L: 3197 D: 9482
sprt @ 10+0.1 th 1 S(10, 10).
18-05-10 31m pin_pressure^^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 12714 W: 2487 L: 2557 D: 7670
sprt @ 10+0.1 th 1 S(5, 5).
18-05-10 31m overload_pinned diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 39150 W: 7978 L: 7924 D: 23248
sprt @ 10+0.1 th 1 I thought I was running out of ideas for tweaks to Overload, but Bryan just left a very insightful comment in the forum which gave me an idea. Conceptually, the positions he describes seem to fit well with overloading. Include in Overload all non-pawn, pinned-to-king pieces which are attacked multiple times by both sides. This is not conceptually identical to Bryan's post--Bryan discusses the case with exactly 2 attackers/defenders, whereas this covers at least 2--but code-wise I think identifying squares that are attacked/defended exactly twice requires more significant changes, so I would prefer to try this simpler modification first. For the same reason, this also does not evaluate whether the king can break the pin without breaking its defense.
18-05-10 31m overload_pinned diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 29707 W: 5961 L: 5952 D: 17794
sprt @ 10+0.1 th 1 Take 2: Further require that the enemy king be one of the defenders. The previous patch allows, for example, a c3-knight pinned by a b4-bishop to an e1-king to qualify (assuming the other conditions are met).
18-05-10 31m overload_promotions^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 19921 W: 4046 L: 4082 D: 11793
sprt @ 10+0.1 th 1 I think I found bugs in my previous tests--my apologies. Try this bugfix version of take 2.
18-05-10 31m overload_promotions diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 9125 W: 1820 L: 1906 D: 5399
sprt @ 10+0.1 th 1 I think I found bugs in my previous tests--my apologies. Try this bugfix version of take 1.
18-05-09 31m overload_promotions_sep diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 21358 W: 4264 L: 4294 D: 12800
sprt @ 10+0.1 th 1 Perhaps a bonus of 20 centipawns was too much. Try S(10, 10).
18-05-09 31m overload_promotions_sep diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 20951 W: 4264 L: 4295 D: 12392
sprt @ 10+0.1 th 1 The overload_promotions test appears to show some promise. Perhaps a larger score should be applied to these threats--but is it worth the cost of an extra popcount? Split the two cases and apply double bonus--S(20, 10)--to the promotion threats overload.
18-05-09 31m overload_promotions diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 36203 W: 7326 L: 7287 D: 21590
sprt @ 10+0.1 th 1 Take 2: Add some squares to Overload based on promotion threats: enemy pieces which are blocking the immediate promotion of our pawns, on 1st or 8th rank squares which are otherwise undefended.
18-05-09 31m overload_pawns2^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 24040 W: 4833 L: 4850 D: 14357
sprt @ 10+0.1 th 1 An idea: limit the cases where this is applied even further. Beyond just applying the bonus in endgames, apply it only in endgames where each side has exactly one non-pawn piece. S(0, 5).
18-05-09 31m overload_pawns2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10473 W: 2100 L: 2180 D: 6193
sprt @ 10+0.1 th 1 Same, but S(0, 10).
18-05-09 31m overload_promotions diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 8172 W: 1584 L: 1675 D: 4913
sprt @ 10+0.1 th 1 First attempt: include our promotion threats in Overload.
18-05-08 31m overload_complexity^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 25533 W: 5156 L: 5166 D: 15211
sprt @ 10+0.1 th 1 It seems strange that considering the defender-only overload was worse than the sum of both sides...perhaps the attacker's overloading is the important consideration. 8 * Overloading[Attacker].
18-05-08 31m overload_complexity diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10401 W: 2054 L: 2134 D: 6213
sprt @ 10+0.1 th 1 12 * Overloading[Attacker].
18-05-07 31m overload_complexity diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 14842 W: 2945 L: 3005 D: 8892
sprt @ 10+0.1 th 1 12 times the overloading of the defender: 12 * (eg > 0 ? Overloading[BLACK] : Overloading[WHITE]) - 138.
18-05-07 31m overload_complexity^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 9898 W: 1958 L: 2041 D: 5899
sprt @ 10+0.1 th 1 Running out of ideas...maybe I was too quick to discard this approach. 8 times the overloading of the defender (ignore that of the attacker).
18-05-07 31m overload_complexity diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 15271 W: 3053 L: 3111 D: 9107
sprt @ 10+0.1 th 1 A small tweak to the 39K yellow (+1.01 Elo), trying to get the small amount of Elo needed to make this green. 11 * (Overloading[WHITE] + Overloading[BLACK]) - 139.
18-05-06 31m combo_TOQ_shelter diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 18717 W: 3697 L: 3788 D: 11232
sprt @ 10+0.1 th 1 I continue to watch fishtest for parameter tweaks that pass STC but fail yellow after many games at LTC, to combo with my previous threat-on-queen tweak that did the same. Try to combo TOQ with @snicolet's shelter tweak.
18-05-06 31m overload_complexity diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 20417 W: 4213 L: 4246 D: 11958
sprt @ 10+0.1 th 1 Forgot to change the bench from my last test. My apologies for the mistakes; this should work now. Continuing to search for an optimum. 10 * (Overloading[WHITE] + Overloading[BLACK]) - 140.
18-05-06 31m overload_complexity diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 7456 W: 1448 L: 1542 D: 4466
sprt @ 10+0.1 th 1 The optimum, if there is one, is near 12, above 8 and below 16. Try 14 * (Overloading[WHITE] + Overloading[BLACK]) - 143.
18-05-06 31m overload_complexity diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 39475 W: 8121 L: 8065 D: 23289
sprt @ 10+0.1 th 1 Take 2: 12 * (Overloading[WHITE] + Overloading[BLACK]) and greater decrease to the constant to compensate.
18-05-06 31m overload_complexity diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 23935 W: 4812 L: 4830 D: 14293
sprt @ 10+0.1 th 1 As an alternative, use only the overloading of the defending side. (I'm not entirely sure I did this correctly and would therefore appreciate any correction.)
18-05-06 31m overload_complexity diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 25269 W: 5104 L: 5115 D: 15050
sprt @ 10+0.1 th 1 Take 3: Since 12 was better than 8, try an even larger effect: 16 * (Overloading[WHITE] + Overloading[BLACK]) - 143.
18-05-06 31m overload_complexity diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 14961 W: 2959 L: 3019 D: 8983
sprt @ 10+0.1 th 1 As a test, very large effect: 24 * (Overloading[WHITE] + Overloading[BLACK]) - 147.
18-05-06 31m overload_complexity diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 9278 W: 1852 L: 1938 D: 5488
sprt @ 10+0.1 th 1 Add overload to the complexity measure in initiative(). I anticipate that even if there is merit to this idea, it might take many tries to get it right. Use 8 * (Overloading[WHITE] + Overloading[BLACK]) and adjust the constant to maintain similar average results across bench positions. Please let me know if removing the const from threats() is an issue.
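One way to read "adjust the constant to maintain similar average results" is to subtract the new term's mean contribution from the existing constant. The sketch below uses made-up numbers throughout: the starting constant, the sample of overload totals, and the helper name are all hypothetical and are not taken from the patch or from bench output.

```python
from statistics import mean

def compensated_constant(old_constant, weight, overload_totals):
    """Shift the constant so the average of the whole expression is unchanged.

    overload_totals holds Overloading[WHITE] + Overloading[BLACK] per position.
    """
    return round(old_constant - weight * mean(overload_totals))

# Hypothetical sample of per-position overload totals (not real bench data).
sample_totals = [0, 1, 0, 2, 1, 0, 0, 0]
new_constant = compensated_constant(-100, 8, sample_totals)
print(new_constant)
```

With this sample the mean total is 0.5, so a weight of 8 adds 4 on average and the constant drops by the same amount.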
18-05-06 31m overload_pawns2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 7883 W: 1544 L: 1636 D: 4703
sprt @ 10+0.1 th 1 S(0, 5) was neutral, and S(0, 10) was terrible. I'm not sure the middle-of-the-road approach S(0, 7) will work, but let's try it.
18-05-05 31m overload_pawns2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 7436 W: 1486 L: 1580 D: 4370
sprt @ 10+0.1 th 1 Return to take 1 (the simplest and best-performing so far), but double the effect: S(0, 10).
18-05-05 31m overload_pawns2^^ diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 26777 W: 5491 L: 5495 D: 15791
sprt @ 10+0.1 th 1 I apologize for my absence in the last two weeks--I've been a bit overloaded. :) I tried something similar in the past using half the overload bonus--S(5, 2)--but it occurs to me now that while piece overload is probably important in middlegames because of complexity/initiative/attack (I intend to also try to add it to initiative soon), pawn overload might be more important in endgames. Here, try S(0, 5) to start; this may need to be increased.
18-05-05 31m overload_pawns2^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 22540 W: 4563 L: 4587 D: 13390
sprt @ 10+0.1 th 1 Take 2: Exclude pawn targets that are pawn-defended.
18-05-05 31m overload_pawns2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 9674 W: 1930 L: 2014 D: 5730
sprt @ 10+0.1 th 1 Take 3: Exclude pawn targets that are pawn-defended and not pawn-attacked.
18-04-22 31m ocb_strongPasser diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 16934 W: 3415 L: 3465 D: 10054
sprt @ 10+0.1 th 1 The previous two versions had a bug--sorry. Here is the corrected version: "Similar to the most recent attempt, but try a slower rate of rise."
18-04-22 31m ocb_strongPasser^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 21627 W: 4318 L: 4347 D: 12962
sprt @ 10+0.1 th 1 The previous two versions had a bug--sorry. Here is the corrected version: "Rather than scaling sf linearly with the number of indefensible passers, try an approach inspired by @Stefano80's saturation tests."
18-04-22 31m ocb_strongPasser diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 2793 W: 487 L: 602 D: 1704
sprt @ 10+0.1 th 1 Rather than scaling sf linearly with the number of indefensible passers, try an approach inspired by @Stefano80's saturation tests.