Stockfish Testing Queue

Finished - 977 tests

18-05-21 31m tweak_threatOnQueen3 diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 47960 W: 9679 L: 9662 D: 28619
sprt @ 10+0.1 th 1 Before concluding that this supposed small Elo gain is relatively insensitive to the degree of increase of this value, also try half the original effect.
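The (-2.94,2.94) pair printed beside each LLR matches Wald's SPRT stopping thresholds for error rates alpha = beta = 0.05. A minimal sketch, assuming those error rates (this page does not state them); `sprt_bounds` is a hypothetical helper name, not part of Fishtest:

```python
import math

def sprt_bounds(alpha=0.05, beta=0.05):
    """Return (lower, upper) log-likelihood-ratio stopping bounds for Wald's SPRT."""
    lower = math.log(beta / (1.0 - alpha))   # test stops as failed at or below this LLR
    upper = math.log((1.0 - beta) / alpha)   # test stops as passed at or above this LLR
    return lower, upper

lower, upper = sprt_bounds()
print(f"({lower:.2f},{upper:.2f})")  # (-2.94,2.94)
```

Read this way, every finished test above with an LLR near -2.94 stopped at the lower bound.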
18-05-21 31m tweak_threatOnQueen3 diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 74886 W: 15060 L: 14944 D: 44882
sprt @ 10+0.1 th 1 Same, but double effect instead. If I am unable to improve upon the 81K yellow result through this or other tests, I will try a speculative LTC.
18-05-21 31m tweak_threatOnQueen3^ diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 19151 W: 3772 L: 3862 D: 11517
sprt @ 10+0.1 th 1 Fishtest continues to surprise me. What was once a 33K green, and now a 6K red, consists of three nearly neutral changes and an 81K yellow, without any large regressions. Try to improve upon the single tweak that led to an 81K yellow by increasing the effect by 50%. I apologize for rescheduling a few times; I made errors in these descriptions or accidentally used [0, 5] bounds.
18-05-21 31m tweak_threatOnQueen3 diff
LLR: -2.94 (-2.94,2.94) [0.00,4.00]
Total: 81349 W: 16353 L: 16213 D: 48783
sprt @ 10+0.1 th 1 The last of the four tests. See tweak_threatOnQueen3^^^ for description.
18-05-21 31m tweak_threatOnQueen3^ diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 36173 W: 7262 L: 7289 D: 21622
sprt @ 10+0.1 th 1 See tweak_threatOnQueen3^^^.
18-05-21 31m tweak_threatOnQueen3^^ diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 31018 W: 6130 L: 6176 D: 18712
sprt @ 10+0.1 th 1 See tweak_threatOnQueen3^^^.
18-05-21 31m tweak_threatOnQueen3^^^ diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 29240 W: 5801 L: 5854 D: 17585
sprt @ 10+0.1 th 1 Since the framework is nearly empty, I would like to test my theory that some subset of this became a regression recently (the past 5-6 weeks) while the rest is still positive (if yellow). To do so, I will schedule four half-throughput tests, each testing a single one of the four tweaked values. I don't expect any to pass on their own, but in particular, I'm watching to see if one fails red very quickly.
18-05-21 31m connectivity_pins diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 7358 W: 1426 L: 1520 D: 4412
sprt @ 10+0.1 th 1 S(5, 5), to compare to the 2*Connectivity = S(6, 2) used in the best test so far (47K yellow, +1.15 Elo).
18-05-20 31m connectivity_pins diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 47614 W: 9575 L: 9484 D: 28555
sprt @ 10+0.1 th 1 Still double effect, but ensure that the pinned piece is not attacked twice or more by the opponent but only once by us.
18-05-20 31m connectivity_pins diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 11862 W: 2389 L: 2463 D: 7010
sprt @ 10+0.1 th 1 Narrowing the double-effect version appears to have substantially improved it, so try narrowing it further: include ~attackedBy[Them][PAWN] as a condition for the pinned piece. The resulting code is inspired by the definition of "weak", but applied to friendly rather than enemy pieces.
18-05-20 31m connectivity_pins diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10035 W: 2021 L: 2103 D: 5911
sprt @ 10+0.1 th 1 Triple effect, without the new pawn-attacked condition.
18-05-20 31m connectivity_pins diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 4739 W: 910 L: 1017 D: 2812
sprt @ 10+0.1 th 1 Triple effect, with the new pawn-attacked condition.
18-05-20 31m connectivity_pins diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 32227 W: 6450 L: 6430 D: 19347
sprt @ 10+0.1 th 1 Borrowing from @xoroshiro's tests of applying double Connectivity in certain circumstances, but applied to different ones: promote solid defense by applying Connectivity a second time for our defended king blockers. Although this is a small bonus (perhaps too small) for a one-time application, the bench appears to change by a non-trivial amount, so I'm interested in seeing what this means in practice.
18-05-20 31m connectivity_pins diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 17299 W: 3509 L: 3557 D: 10233
sprt @ 10+0.1 th 1 Currently, the result appears essentially neutral. Ensure that this isn't due to inadequate bonus by doubling the effect.
18-05-19 31m soleDefendedPin diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 25171 W: 5101 L: 5113 D: 14957
sprt @ 10+0.1 th 1 Since the recommendation from "Alt Doom" was very specific, here's an attempt to interpret and implement it precisely. S(10, 10). I exclude the queen- and rook-specific cases for now to first evaluate how this performs on its own and to minimize the number of new bonuses to optimize.
18-05-19 31m weakpin diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10686 W: 2142 L: 2221 D: 6323
sprt @ 10+0.1 th 1 Closer to the spirit of what the user "Alt Doom" posted on the FishCooking suggestions thread and not the current Overload, namely by removing ~attackedBy2[Us] as a condition. However, this is slightly different, excluding enemy pawns and using weak rather than ~attackedBy2[Them]. Give an S(10, 10) bonus for enemy king blockers that are weak.
18-05-19 31m tweak_threatOnQueen2 diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 9302 W: 1758 L: 1884 D: 5660
sprt @ 10+0.1 th 1 The original tests were mostly mg effects, so try that alone. -20% to ThreatByRook[QUEEN] and ThreatByMinor[QUEEN] middlegame values.
18-05-17 31m tweak_threatOnQueen2 diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 11013 W: 2165 L: 2285 D: 6563
sprt @ 10+0.1 th 1 Double effect: -20% of TOQ. Rescheduled with correct bounds (thanks Michael Chaly!).
18-05-17 31m tweak_threatOnQueen2^ diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 9622 W: 1868 L: 1993 D: 5761
sprt @ 10+0.1 th 1 5-6 weeks ago, increasing TOQ reliably led to long yellows or even a green. A repeat of that 33K green (+2.5 Elo) yesterday failed red in 6400 games (-7 Elo). I presume something has changed in SF to make this a regression (any ideas what?), but since increasing TOQ is now terrible, maybe decreasing TOQ is a gain. Here, -10%.
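The "-7 Elo" figure can be approximated from a W/L/D line with the standard logistic Elo formula. A minimal sketch, using the totals of the 6410-game tweak_threatOnQueen test listed on this page (presumably the repeat referred to here); `elo_from_wld` is a hypothetical helper name:

```python
import math

def elo_from_wld(wins, losses, draws):
    """Logistic Elo difference implied by a win/loss/draw record."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games   # average score per game, between 0 and 1
    return -400.0 * math.log10(1.0 / score - 1.0)

# Totals of the 6410-game tweak_threatOnQueen test: W: 1236 L: 1373 D: 3801
elo = elo_from_wld(1236, 1373, 3801)
print(f"{elo:+.1f}")  # -7.4
```

This is a point estimate only; with a few thousand games the uncertainty around it is several Elo wide.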
18-05-17 31m tweak_threatOnQueen2 diff
LLR: 0.93 (-2.94,2.94) [0.00,5.00]
Total: 3795 W: 795 L: 737 D: 2263
sprt @ 10+0.1 th 1 Double effect: -20% of TOQ.
18-05-17 31m overload_pinned2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 10750 W: 2184 L: 2263 D: 6303
sprt @ 10+0.1 th 1 S(10, 10) was a 52K yellow. Try S(15, 15), and hope it's enough.
18-05-17 31m overload_pinned2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 23688 W: 4766 L: 4785 D: 14137
sprt @ 10+0.1 th 1 I'm surprised but optimistic that eg > mg might be the answer I've been searching for. Try S(10, 15).
18-05-17 31m overload_pinned2^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 18441 W: 3604 L: 3648 D: 11189
sprt @ 10+0.1 th 1 Also try S(5, 10).
18-05-17 31m overload_pinned2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 52482 W: 10586 L: 10472 D: 31424
sprt @ 10+0.1 th 1 S(20, 0) was unexpectedly much worse than S(20, 10). Though the effect could be noise, perhaps the eg bonus is actually the key. Try S(10, 10).
18-05-17 31m overload_pinned2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 29394 W: 5947 L: 5939 D: 17508
sprt @ 10+0.1 th 1 The most recent test appears headed for fail-yellow in about 40K games (about +1 Elo). This is consistent with my previous efforts, but preferable because it is comparatively simple. Make sure the difficulty isn't due to inadequate bonus: double it to S(20, 10).
18-05-17 31m overload_pinned2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 12535 W: 2458 L: 2529 D: 7548
sprt @ 10+0.1 th 1 Because I am suspicious of applying such a large (10 cp) endgame bonus, also try S(20, 0).
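To see why the endgame half of an S(mg, eg) pair matters, here is a rough sketch of tapered evaluation. It assumes a linear blend over a phase running from 128 (full middlegame) down to 0 (pure endgame) and ignores any endgame scale factor; `tapered` is a hypothetical name, not the engine's own function:

```python
PHASE_MIDGAME = 128  # assumed phase scale: 128 = full middlegame, 0 = pure endgame

def tapered(mg, eg, phase):
    """Blend a (mg, eg) pair into one centipawn value for a given game phase."""
    return (mg * phase + eg * (PHASE_MIDGAME - phase)) // PHASE_MIDGAME

for mg, eg in [(20, 0), (20, 10), (10, 10)]:
    values = [tapered(mg, eg, phase) for phase in (128, 64, 0)]
    print(f"S({mg}, {eg}) -> {values}")
# S(20, 0) -> [20, 10, 0]
# S(20, 10) -> [20, 15, 10]
# S(10, 10) -> [10, 10, 10]
```

Under these assumptions, S(20, 0) fades to nothing as material comes off, while S(20, 10) and S(10, 10) keep a 10 cp bonus in the endgame.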
18-05-17 31m overload_pinned2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 39779 W: 8024 L: 7969 D: 23786
sprt @ 10+0.1 th 1 A post by the user "Alt Doom" on the suggestions thread made me rethink my approach to this idea. Rather than redefine Overload entirely to add multiply-attacked pieces, here's a much simpler version (+2 lines only): just take the subset of Overload targets that are pinned, and give extra bonus. (I can't believe I didn't try this approach before.) The amount of bonus may need to be tuned, but the framework is empty.
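The "subset of Overload targets that are pinned" idea amounts to one bitboard intersection and a popcount. A minimal sketch, not the actual patch: bitboards are plain Python integers, the two input sets are supplied by hand rather than computed from a position, and all names (`sq`, `bitboard`, `overload_pinned_bonus`) are hypothetical:

```python
def sq(name):
    """Map a square name such as 'f7' to a bit index (a1 = 0, h8 = 63)."""
    return (int(name[1]) - 1) * 8 + (ord(name[0]) - ord("a"))

def bitboard(*names):
    """Build an integer bitboard with one bit set per named square."""
    bb = 0
    for name in names:
        bb |= 1 << sq(name)
    return bb

def popcount(bb):
    """Number of set bits, i.e. number of squares in the bitboard."""
    return bin(bb).count("1")

def overload_pinned_bonus(overloaded, pinned, bonus):
    """Extra (mg, eg) bonus for each overloaded target that is also pinned."""
    count = popcount(overloaded & pinned)
    return (bonus[0] * count, bonus[1] * count)

overloaded = bitboard("c6", "f7")   # illustrative Overload targets
pinned = bitboard("f7", "e4")       # illustrative pinned pieces
print(overload_pinned_bonus(overloaded, pinned, (10, 10)))  # (10, 10)
```

Only f7 appears in both sets, so the bonus is applied once. The bonus pair is passed in explicitly because its size is exactly what the follow-up tests vary.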
18-05-16 31m RookOnPasser diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 21894 W: 4425 L: 4452 D: 13017
sprt @ 10+0.1 th 1 I've already submitted more tests of this idea than initially intended, but since the framework is nearly empty, I'll try again. Do not reduce the bonus if the passed pawn, escaping by a pawn push, exposes another pawn (not necessarily passed) to attack.
18-05-16 31m tweak_threatOnQueen diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 6410 W: 1236 L: 1373 D: 3801
sprt @ 10+0.1 th 1 Since the framework is nearly empty, here's a sanity check I've wanted to schedule for a while. Three attempts to combo this STC green/LTC yellow with more recent [0, 4] patches have failed. Was the STC green just lucky? Has something changed in the SF code to make this a regression? Rebase on new master and test. "No change" is 33K green; I suspect this could fail quickly. I do not intend to pursue LTC even if green, because it has failed before. This is for information only.
18-05-16 31m RookOnPasser2^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 23237 W: 4697 L: 4718 D: 13822
sprt @ 10+0.1 th 1 An inelegant solution, but closer to my original intentions in its effect. In the passed pawn evaluation, check whether the passed pawn can safely move upward. If so, check whether RookOnPawn was given to the opponent and cancel it out by giving ourselves the same bonus.
18-05-16 31m RookOnPasser2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 11747 W: 2359 L: 2433 D: 6955
sprt @ 10+0.1 th 1 Half effect (i.e., cancel half of the RookOnPawn bonus).
18-05-16 31m RookOnPasser diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10824 W: 2160 L: 2238 D: 6426
sprt @ 10+0.1 th 1 That test outperformed my expectations (32K yellow, +0.75 Elo). As a sanity check, try a simpler (and more drastic) change: eliminate this case entirely, rather than giving half bonus. Do not apply RookOnPawn at all for passed pawns attacked along a rank.
18-05-16 31m RookOnPasser diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 32679 W: 6540 L: 6518 D: 19621
sprt @ 10+0.1 th 1 Inspired by the discussion around TCEC 11 Superfinal: Game 59, started by Bryan. I agree with Rocky that RookOnPawn probably does not have much to do with this, but since the framework is mostly empty (besides tuning), I would like to try something anyway. Halve the RookOnPawn bonus for passed pawns attacked along a rank, since this does nothing to stop their progress.
18-05-14 31m KingOpenFile2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 28083 W: 5610 L: 5609 D: 16864
sprt @ 10+0.1 th 1 Take 6b: Apply a bonus if the opponent has exactly one open file against our king, but the opponent has no rook that will ever be able to use it. Intended to offset some of the king safety penalty in this case.
18-05-14 31m KingOpenFile2^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 11851 W: 2325 L: 2399 D: 7127
sprt @ 10+0.1 th 1 Retry the best two attempts with the logic from 1b (only apply if there is exactly one open file). Here, take 2b: also apply if the enemy has a queen.
18-05-14 31m KingOpenFile2^^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 30078 W: 6099 L: 6088 D: 17891
sprt @ 10+0.1 th 1 I suspect the previous tests might have been applied too broadly. Here's a trio of modifications to take 1 to narrow the scope of this patch. Take 1b: Only apply the penalty in the case of exactly one open file.
18-05-14 31m KingOpenFile2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10636 W: 2092 L: 2171 D: 6373
sprt @ 10+0.1 th 1 Take 1b (currently +0.67 Elo after 28K games and counting) appears to have been an improvement over the original take 1 (failed red in 20K). I may try to merge this new logic into other versions that performed slightly better than take 1, but I would also like to see if the results of 1b can be improved by simply doubling the bonus to S(10, 0)--5 cp was always a conservative guess.
18-05-14 31m KingOpenFile2^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10199 W: 1999 L: 2080 D: 6120
sprt @ 10+0.1 th 1 Take 1c: Exclude the king's own file, to prevent overlap with checks in the king safety evaluation.
18-05-14 31m KingOpenFile2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 2976 W: 543 L: 658 D: 1775
sprt @ 10+0.1 th 1 Take 1d: Only consider the very narrow case of one adjacent open file. I suspect this is too restrictive, but would like to test it as a sanity check. As an aside, I know some of these tests can be simplified; I will do so later if one passes.
18-05-14 31m KingOpenFile2 diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 23283 W: 4703 L: 4724 D: 13856
sprt @ 10+0.1 th 1 Apply a bonus if the opponent has an open file against our king, but the opponent has no rook that will ever be able to use it. Intended to offset some of the king safety penalty in this case.
18-05-14 31m KingOpenFile2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 16622 W: 3373 L: 3424 D: 9825
sprt @ 10+0.1 th 1 Take 4: Apply if the opponent has more rooks than we do.
18-05-14 31m KingOpenFile2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10890 W: 2120 L: 2198 D: 6572
sprt @ 10+0.1 th 1 Take 5: Apply only if the opponent has more queens and rooks than we do.
18-05-14 31m KingOpenFile2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 23830 W: 4841 L: 4859 D: 14130
sprt @ 10+0.1 th 1 Take 2: Also apply if the enemy has a queen.
18-05-14 31m KingOpenFile2 diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 19964 W: 4076 L: 4112 D: 11776
sprt @ 10+0.1 th 1 My gratitude to @Rocky640 for the continued patience and bugfixes. A modest middlegame penalty, S(5, 0), if there are open files against our king (i.e., its file and the adjacent ones) and the enemy has a rook. Inspired by Bryan's analysis of AlphaZero vs. Stockfish: Game 4.
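The condition described here can be sketched as follows. This is an illustration under stated assumptions, not the tested patch: an "open file" is taken to mean a file with no pawns of either colour, the penalty is applied once rather than per file, and files are numbered 0 (a) to 7 (h). All names are hypothetical:

```python
def open_files(our_pawn_files, their_pawn_files):
    """Files (0 = a ... 7 = h) that contain no pawns of either colour."""
    return set(range(8)) - set(our_pawn_files) - set(their_pawn_files)

def king_open_file_penalty(king_file, our_pawn_files, their_pawn_files,
                           enemy_has_rook, penalty=(5, 0)):
    """(mg, eg) penalty if an open file touches the king's file or a neighbour."""
    if not enemy_has_rook:
        return (0, 0)
    # The king's own file plus the adjacent files, clipped to the board edge.
    near_king = {f for f in (king_file - 1, king_file, king_file + 1) if 0 <= f <= 7}
    if near_king & open_files(our_pawn_files, their_pawn_files):
        return penalty
    return (0, 0)

# King on the g-file, our pawns on f and g, enemy pawns on a and b: the h-file is open.
print(king_open_file_penalty(6, [5, 6], [0, 1], enemy_has_rook=True))   # (5, 0)
print(king_open_file_penalty(6, [5, 6], [0, 1], enemy_has_rook=False))  # (0, 0)
```

The later "takes" in this series change only the guard condition (enemy queen, rook imbalance, exactly one open file), which in this sketch would mean replacing the `enemy_has_rook` check or the set intersection test.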
18-05-14 31m KingOpenFile2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 10966 W: 2168 L: 2246 D: 6552
sprt @ 10+0.1 th 1 Take 3: A narrower approach, applied only if the enemy has a rook and we do not.
18-05-13 31m kingRingOpenFile^ diff
LLR: -1.09 (-2.94,2.94) [0.00,5.00]
Total: 1817 W: 347 L: 386 D: 1084
sprt @ 10+0.1 th 1 Take 2: Also apply if the opponent has a queen.
18-05-13 31m kingRingOpenFile^^ diff
LLR: -0.21 (-2.94,2.94) [0.00,5.00]
Total: 1734 W: 341 L: 342 D: 1051
sprt @ 10+0.1 th 1 Based on Bryan's analysis of AlphaZero vs. Stockfish: Game 4. I've thought about how to structure this off and on for months, and can only conclude that there are many different versions that can be tested, so let's submit my first three. Here, apply a modest S(5, 0) penalty if open files exist in our king ring, and the enemy has a rook. This is not the same as king ring attacks--the rook need not be using the file currently, but can in the future--nor unsafe/safe checks, because this is also applied to future threats on open files adjacent to the king.
18-05-13 31m kingRingOpenFile diff
LLR: 0.09 (-2.94,2.94) [0.00,5.00]
Total: 1539 W: 319 L: 308 D: 912
sprt @ 10+0.1 th 1 Take 3: A narrower version: only apply if the opponent has a rook and we do not.
18-05-12 31m pin_pressure diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 18470 W: 2652 L: 2706 D: 13112
sprt @ 60+0.6 th 1 Speculative LTC for the 67K yellow. (I haven't submitted a speculative LTC before, so I apologize for any errors.) Low throughput (100).
18-05-12 31m hanging_nonlinear diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 13935 W: 2751 L: 2815 D: 8369
sprt @ 10+0.1 th 1 A somewhat slower rate of rise: (x^2 + 1) / 2 rather than simply x^2.
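For a sense of how much gentler the new curve is, the two formulas can be tabulated for small x. A minimal sketch; what x counts in the patch is not stated here, and real division is used, whereas an engine working in integers would truncate:

```python
def original(x):
    """Plain quadratic growth: x^2."""
    return x * x

def slower(x):
    """The gentler curve from the description: (x^2 + 1) / 2."""
    return (x * x + 1) / 2

for x in range(1, 5):
    print(x, original(x), slower(x))
# 1 1 1.0
# 2 4 2.5
# 3 9 5.0
# 4 16 8.5
```

Both curves agree at x = 1, and beyond that the new one grows at roughly half the rate.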