Stockfish Testing Queue

Finished - 977 tests

18-06-01 31m passerDoubleSupport diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 42965 W: 8730 L: 8659 D: 25576
sprt @ 10+0.1 th 1 I was disappointed to see the STC 14K green fail yellow at LTC, since a prior test on this branch had scaled very well. Try increasing the size of the effect even further, to k += 4. My hope is that a larger effect, if still a gain, can give an LTC green if it makes it past STC (i.e., that the current effect is too small at LTC).
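The (-2.94,2.94) pair printed on every LLR line matches Wald's approximate SPRT stopping thresholds for error rates alpha = beta = 0.05. That the framework uses exactly these error rates is an assumption here, and the function name is hypothetical. A minimal sketch of where the two numbers come from:

```python
import math

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's approximate stopping thresholds for the log-likelihood ratio.

    alpha and beta are the assumed type I and type II error rates.
    """
    lower = math.log(beta / (1.0 - alpha))   # at or below this: stop, patch rejected
    upper = math.log((1.0 - beta) / alpha)   # at or above this: stop, patch accepted
    return lower, upper

lower, upper = sprt_bounds()
print(f"({lower:.2f},{upper:.2f})")  # prints (-2.94,2.94)
```

A test stops once its LLR crosses either threshold, which is why every finished entry shows an LLR at or just beyond one of the two bounds.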
18-06-01 31m passerDoubleSupport2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 47773 W: 9762 L: 9668 D: 28343
sprt @ 10+0.1 th 1 Check the effect of the new logic (using the whole file) on this 96K yellow.
18-06-01 31m passerDoubleSupport diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 35182 W: 7113 L: 7079 D: 20990
sprt @ 10+0.1 th 1 Greater effect: k += 3.
18-06-01 31m passerDoubleSupport^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 8728 W: 1700 L: 1788 D: 5240
sprt @ 10+0.1 th 1 One of a few variants of the current LTC, with priority -1 until it finishes, to be raised to 0 if it fails and deleted if it passes. If I am unavailable to do so, hopefully someone else can (as it's not at all clear when this LTC will finish). Here, try less effect: k++ rather than k += 2.
18-06-01 31m passerDoubleSupport diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 33053 W: 6635 L: 6611 D: 19807
sprt @ 10+0.1 th 1 A different idea. Consider cases where we have more rook/queen defenders along the file than the opponent has rook/queen attackers: at least two friendly rooks/queens behind the passed pawn versus fewer than two enemy ones in front of it, or at least one versus none. (I ignore 3 vs 2, because I suspect this is very rare.)
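The defender/attacker condition in this entry can be written as a small predicate. This is only a sketch of the stated rule, not the patch itself; the function name and the plain integer counts are assumptions made for illustration.

```python
def has_file_majority(defenders, attackers):
    """Return True when the rule from the test description applies.

    defenders: friendly rooks/queens behind the passed pawn on its file.
    attackers: enemy rooks/queens in front of the pawn on its file.
    The 3-vs-2 case is deliberately not covered, as in the description.
    """
    two_vs_fewer = defenders >= 2 and attackers < 2
    one_vs_none = defenders >= 1 and attackers == 0
    return two_vs_fewer or one_vs_none

for d, a in [(2, 1), (1, 0), (1, 1), (3, 2)]:
    print(d, a, has_file_majority(d, a))
```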
18-06-01 31m passerDoubleSupport diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 48645 W: 7092 L: 7038 D: 34515
sprt @ 60+0.6 th 1 LTC: A wider application of the same bonus as the 60K STC, 61K LTC yellow. Rather than only consider defense of the passed pawn from behind, consider the entire file.
18-06-01 31m passerDoubleSupport diff
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 14052 W: 2949 L: 2755 D: 8348
sprt @ 10+0.1 th 1 A wider application of the same bonus as the 60K STC, 61K LTC yellow. Rather than only consider defense of the passed pawn from behind, consider the entire file.
18-06-01 31m passerDoubleSupport^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 25329 W: 5186 L: 5196 D: 14947
sprt @ 10+0.1 th 1 Another attempt to narrow the extra bonus. Require that the passer, supported twice from behind, is not attacked twice from its front.
18-06-01 31m passerDoubleSupport^^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 16963 W: 3394 L: 3444 D: 10125
sprt @ 10+0.1 th 1 It appears more likely than not that neither speculative LTC will pass. However, there's still useful information: passerDoubleSupport appears to scale much better than passerDoubleSupport2 (60K yellow -> 60K games and counting, versus 96K yellow -> 18K red). Let's try a few tweaks to the logic in the 60K yellow and hope for a green. Here, consider rooks only; previous versions included queens.
18-05-30 31m ee33a3f7deb5a63d96466af diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 61227 W: 9107 L: 9005 D: 43115
sprt @ 60+0.6 th 1 Speculative LTC for this 60K yellow STC from passerDoubleSupport. If this also scales poorly, perhaps there is no LTC Elo to gain here. Low throughput (166).
18-05-30 31m passerDoubleSupport2^ diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 18980 W: 2788 L: 2840 D: 13352
sprt @ 60+0.6 th 1 Speculative LTC for this 96K yellow. Low throughput (166).
18-05-30 31m passerDoubleSupport2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 96783 W: 19584 L: 19266 D: 57933
sprt @ 10+0.1 th 1 S(15, 15), since 10 was better than 5.
18-05-30 31m passerDoubleSupport2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 10798 W: 2128 L: 2207 D: 6463
sprt @ 10+0.1 th 1 S(20, 20).
18-05-29 31m passerDoubleSupport2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 41865 W: 8424 L: 8359 D: 25082
sprt @ 10+0.1 th 1 S(10, 10) instead.
18-05-29 31m passerDoubleSupport2^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 19510 W: 3900 L: 3938 D: 11672
sprt @ 10+0.1 th 1 Independence from rank and friendly blocked pieces appears to have been worse than simply changing k. What if S(5, 5) is given only if the pawn is free to advance (i.e., no blocking piece) but still independent of rank?
18-05-29 31m passerDoubleSupport2^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 36694 W: 7386 L: 7345 D: 21963
sprt @ 10+0.1 th 1 A speculative LTC for my 60K yellow is still an option, but I would like to try a few other things first. Here, apply a simple S(5, 5) bonus. This differs from my previous attempts in that the magnitude of extra bonus is independent of rank (unlike changing k) and is also applied if a friendly piece blocks the passer.
18-05-29 31m passerDoubleSupport2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 9928 W: 1968 L: 2051 D: 5909
sprt @ 10+0.1 th 1 Same, but S(10, 10).
18-05-29 31m passerDoubleSupport diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 60762 W: 12344 L: 12191 D: 36227
sprt @ 10+0.1 th 1 Increase k by 2. I suspect this might be too much, but I would like to test this on the empty framework.
18-05-29 31m passerDoubleSupport^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 14520 W: 2926 L: 2987 D: 8607
sprt @ 10+0.1 th 1 Increasing k by 2 currently has an LLR of about 0 after 32K games, better than increasing by 1 (red in 17K). It's not clear why. The intuitive next step is to try increasing k by 3.
18-05-29 31m passerDoubleSupport diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 13474 W: 2673 L: 2739 D: 8062
sprt @ 10+0.1 th 1 Also try increasing k by 4.
18-05-29 31m passerDoubleSupport^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 17443 W: 3509 L: 3557 D: 10377
sprt @ 10+0.1 th 1 Another idea based on Jonny 1/2 - 1/2 Stockfish. In Evaluation::passed(), increase k (and thus the passed pawn's bonus) if the passed pawn has support from behind from more than one friendly rook or queen. It appears that Elo is likely quite sensitive to k, so try increasing by only 1 first.
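The "more than one rook or queen behind the passer" condition can be sketched as follows. This is not the engine's code. The dictionary board, the square numbering (a1 = 0, h8 = 63), and both function names are assumptions chosen to keep the example self-contained. The sketch also assumes that a rear rook still counts when it supports through a front rook.

```python
def supporters_behind(board, pawn_sq, white=True):
    """Count friendly rooks/queens behind a pawn on its file.

    board is a hypothetical dict: square index -> piece letter
    (uppercase for White, lowercase for Black). The walk stops at the
    first piece that is not a friendly rook or queen.
    """
    step = -8 if white else 8          # one rank toward the pawn's own back rank
    heavy = "RQ" if white else "rq"
    count = 0
    sq = pawn_sq + step
    while 0 <= sq < 64:
        piece = board.get(sq)
        if piece is not None:
            if piece not in heavy:
                break                  # any other piece blocks the support
            count += 1
        sq += step
    return count

def adjusted_k(k, board, pawn_sq, white=True, extra=1):
    """Raise k by `extra` only when the pawn has more than one supporter."""
    return k + extra if supporters_behind(board, pawn_sq, white) > 1 else k

# Illustrative position only: white pawn a7, white rooks a1 and a2, black rook a8.
board = {48: "P", 0: "R", 8: "R", 56: "r"}
print(adjusted_k(3, board, 48))
```

The `extra` parameter corresponds to the different step sizes tried on this branch (k++, k += 2, k += 3, k += 4).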
18-05-28 31m promotionBlocker diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 5738 W: 1140 L: 1242 D: 3356
sprt @ 10+0.1 th 1 Sanity check: S(5, 5) bonus, not penalty.
18-05-28 31m promotionBlocker diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 12103 W: 2413 L: 2486 D: 7204
sprt @ 10+0.1 th 1 Same, but a larger S(10, 10).
18-05-28 31m promotionBlocker^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 13310 W: 2618 L: 2685 D: 8007
sprt @ 10+0.1 th 1 I had hoped to achieve this by modifying mobility, since that makes conceptual sense, but no test has so far lasted more than 8200 games. For small values of mobility, even changing mob by 1 can dramatically influence MobilityBonus, and this seems rather unavoidable. Perhaps I just need a constant penalty. Here, try a deliberately small S(5, 5) (partly as a sanity check).
18-05-28 31m promotionBlocker diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 3274 W: 606 L: 719 D: 1949
sprt @ 10+0.1 th 1 Restrict b to only include squares which give checks to the opponent, though after attackedBy and KingAttackers bitboards are modified. I suspect this will fail quickly, but want to be sure (and the framework is mostly empty).
18-05-28 31m promotionBlocker^ diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 2341 W: 410 L: 527 D: 1404
sprt @ 10+0.1 th 1 Simply decrement mob (if mob > 0). This is the smallest effect I've tested.
18-05-27 31m promotionBlocker diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 7206 W: 1376 L: 1471 D: 4359
sprt @ 10+0.1 th 1 Alternatively, halve the original mob value, rather than making it a constant.
18-05-27 31m promotionBlocker^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 8179 W: 1604 L: 1695 D: 4880
sprt @ 10+0.1 th 1 Same, but mob = 2.
18-05-27 31m promotionBlocker^^ diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 3749 W: 710 L: 821 D: 2218
sprt @ 10+0.1 th 1 mob = 0 was clearly too drastic a change; upon closer inspection, the average mob in this case is 5.3. Therefore, a simple way to test this idea with less drastic effect is to fix mob at a small, but greater than zero, value. Here, mob = 1.
18-05-27 31m promotionBlocker^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 7967 W: 1583 L: 1675 D: 4709
sprt @ 10+0.1 th 1 In a very recent TCEC game (Jonny 1/2 - 1/2 Stockfish), Stockfish gave a steady -0.65 eval. Notably, the a8-rook is stuck stopping the promotion of an a7-pawn supported by two rooks on the file, yet SF still appears to give full mobility bonus to its own rook. Here, wipe out the mobility bonus (i.e., mob = 0) for a piece on the back rank stopping an immediate promotion by a rook- or queen-supported pawn.
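The promotionBlocker entries try several ways of reducing mob for the blocking piece: setting it to 0, fixing it at 1 or 2, halving it, or decrementing it. A compact sketch of those variants side by side, where the function name and variant labels are hypothetical and integer halving is an assumption:

```python
def adjust_mob(mob, variant):
    """Apply one of the tested reductions to a mobility count."""
    if variant == "zero":
        return 0
    if variant == "fix1":
        return 1
    if variant == "fix2":
        return 2
    if variant == "halve":
        return mob // 2                      # integer halving is assumed
    if variant == "decrement":
        return mob - 1 if mob > 0 else mob   # never go below zero
    raise ValueError(f"unknown variant: {variant}")

# Effect of each variant on a mobility count of 5.
for name in ("zero", "fix1", "fix2", "halve", "decrement"):
    print(name, adjust_mob(5, name))
```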
18-05-27 31m promotionBlocker diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 5321 W: 988 L: 1092 D: 3241
sprt @ 10+0.1 th 1 Take 2: Only eliminate the mobility bonus if the passed pawn has more than one R or Q supporting it along the file.
18-05-26 31m korigatachi diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 13851 W: 2788 L: 2852 D: 8211
sprt @ 10+0.1 th 1 Half effect, S(10, 10), after which I may be out of ideas on this branch for now.
18-05-26 31m korigatachi diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 6891 W: 1328 L: 1425 D: 4138
sprt @ 10+0.1 th 1 Restore S(20, 20) and add a condition: more_than_one(weak). My idea is that if there is only one weak enemy piece, trivially there will only be weak pieces on one side of the board. This is not over-concentration.
18-05-26 31m korigatachi diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 11020 W: 2238 L: 2315 D: 6467
sprt @ 10+0.1 th 1 @snicolet recently submitted a really interesting idea--I hope nobody minds if I also submit a few tests. I noticed that eg malus >= mg malus in all tests so far of this idea (that I've seen), but recently S(20, 20) performed slightly better than S(0, 20). I'm curious, as a sanity check on a nearly empty framework, to see how S(20, 0) performs for comparison.
18-05-26 31m overload3 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 12282 W: 2403 L: 2475 D: 7404
sprt @ 10+0.1 th 1 Offset by reducing the main Overload to S(5, 5), so Overload + OverloadGoodTrade = the old Overload. If this fails red, I think I can conclude that the 28K yellow was simply lucky.
18-05-26 31m overload3 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 16814 W: 3380 L: 3431 D: 10003
sprt @ 10+0.1 th 1 I expected that sanity check to perform quite poorly, so I was surprised to see a 28K yellow. Let's explore this further. Extra S(5, 0) if the Overload bitboard contains a clearly good trade.
18-05-26 31m overload3 diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 28747 W: 5794 L: 5789 D: 17164
sprt @ 10+0.1 th 1 Restricting Overload to exclude these cases was so abysmal (red in 5702 games) that, as a sanity check, I wonder how using only these cases fares. Not expected to pass, but submitted to inform development (and due to empty framework).
18-05-25 31m overload3 diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 22622 W: 4474 L: 4499 D: 13649
sprt @ 10+0.1 th 1 For an empty framework, here's another idea. Don't give Overload bonus if the opponent can easily make an equal trade to simplify away the tension (i.e., if the targeted piece and the attacking piece are of the same type). I suspect search might already take care of this, but I'm still curious to see how this performs.
18-05-25 31m overload3 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 5702 W: 1101 L: 1203 D: 3398
sprt @ 10+0.1 th 1 More generally, exclude our clearly good trades: (a) any nonPawnEnemies attacked by our pawns, (b) rooks or queens attacked by minors, or (c) queens attacked by rooks.
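The three "clearly good trade" cases (a) to (c) can be expressed as a predicate over piece types. This is a sketch of the rule as described, with hypothetical names and single-letter piece codes. It is not the bitboard code of the patch, and it does not treat king targets specially.

```python
PAWN, KNIGHT, BISHOP, ROOK, QUEEN = "P", "N", "B", "R", "Q"
MINORS = {KNIGHT, BISHOP}

def is_clearly_good_trade(attacker, target):
    """Return True if our `attacker` hitting the enemy `target` is a clearly good trade."""
    if target == PAWN:
        return False                      # only non-pawn enemies are considered
    if attacker == PAWN:
        return True                       # (a) any non-pawn enemy attacked by our pawn
    if attacker in MINORS and target in {ROOK, QUEEN}:
        return True                       # (b) rook or queen attacked by a minor
    if attacker == ROOK and target == QUEEN:
        return True                       # (c) queen attacked by a rook
    return False

print(is_clearly_good_trade(KNIGHT, ROOK))    # True
print(is_clearly_good_trade(ROOK, ROOK))      # False
```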
18-05-25 31m overload3 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 13733 W: 2673 L: 2738 D: 8322
sprt @ 10+0.1 th 1 A lot of people have made a lot of overload tests. If I recall correctly, excluding enemy pawn attacks failed, and excluding both sides' pawn attacks failed, but I don't think anyone has tested excluding just our own. Since the enemies are nonPawnEnemies, anything we attack with a pawn is more or less hanging. (Can this be tested as [0, 4]? I'm not sure, so I've scheduled the more conservative bounds for now. Please clarify for me--thanks!)
18-05-24 31m useless_outpost diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 7293 W: 1412 L: 1507 D: 4374
sprt @ 10+0.1 th 1 This should be a very narrow take. Check whether the outpost knight can "see" even one square that is occupied by another piece or belongs to the enemy kingRing, and that is not defended by an enemy pawn. If not, cut the bonus by a quarter.
18-05-24 31m useless_outpost diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 8754 W: 1747 L: 1835 D: 5172
sprt @ 10+0.1 th 1 For knight outposts, check the squares attacked from the outpost that are not defended by enemy pawns and are occupied by non-KP pieces of either color. If there are none, cut the Outpost bonus by a quarter. Fixed a bug which prevented the second case from having any effect.
18-05-24 31m useless_outpost diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 8485 W: 1649 L: 1738 D: 5098
sprt @ 10+0.1 th 1 Halve the bonus only if none of the knight's attacked squares are (a) enemy pieces, including pawns or (b) enemy kingRing squares.
18-05-24 31m useless_outpost diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 19030 W: 3835 L: 3875 D: 11320
sprt @ 10+0.1 th 1 A narrower take: halve the bonus only if the outpost knight attacks no non-KP pieces at all. (A knight that attacks one such piece gets full bonus in this patch, but half in the previous one.)
18-05-24 31m useless_outpost^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 8846 W: 1773 L: 1861 D: 5212
sprt @ 10+0.1 th 1 Inspired by Bryan's posts and @robal's tests. However, I think that proximity to the king might be a bad measure of the activity of an outpost, because in many circumstances both sides focus their activities on the other side of the board--these outposts are still useful. Instead, consider the number of pieces attacked. Here, halve the Outpost bonus for knights that attack no more than one non-KP piece (of either color).
18-05-23 31m tweak_threatOnQueen3 diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 43360 W: 6355 L: 6380 D: 30625
sprt @ 60+0.6 th 1 Since the framework is nearly empty, except for another speculative LTC, here's one for my 81K yellow. Quarter throughput.
18-05-22 31m combo_TOQ3_asp diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 28561 W: 5776 L: 5831 D: 16954
sprt @ 10+0.1 th 1 I was surprised and disappointed by the fail-red combos. In particular, the two search patches appear to not be orthogonal, and their interaction leads to a pronounced regression. Separate them. In hindsight, perhaps combining one eval and one search patch increases the chances the two patches will be orthogonal and thus the chance of a green. Here, combo TOQ3 with asp_tune3.
18-05-22 31m combo_TOQ3_SE diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 10671 W: 2134 L: 2255 D: 6282
sprt @ 10+0.1 th 1 Along the same lines as combo_TOQ3_asp, combo TOQ3 with seExp.
18-05-22 31m combo_TOQ3_KD diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 31595 W: 6378 L: 6421 D: 18796
sprt @ 10+0.1 th 1 Originally, I intended to pursue a speculative LTC. However, there are now four completely separate, long yellow [0, 4] runs, so there seems to be ample opportunity for combo/union patches. These could be combined in any way, but since there are 2 eval and 2 search patches, and those naturally group together, try those pairings first. Here, combine my best tweak_threatOnQueen3 (81K yellow, +0.98 Elo) and @xoroshiro's best kingDanger (60K yellow, +0.77 Elo).
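Elo figures like the ones quoted for the yellow runs can be approximated from a W/L/D line with the standard logistic conversion. How the framework computes its own figures is not shown in this listing, so the sketch below is only the textbook formula, with a hypothetical function name. The example counts are those of the 96K passerDoubleSupport2 yellow in this queue.

```python
import math

def elo_from_wld(wins, losses, draws):
    """Logistic Elo difference implied by a win/loss/draw record."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games         # average points per game
    return 400.0 * math.log10(score / (1.0 - score))

elo = elo_from_wld(19584, 19266, 57933)
print(f"{elo:+.2f}")   # a little over +1 Elo
```

The conversion is undefined when the score is exactly 0 or 1, which cannot happen for records like the ones listed here.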
18-05-22 31m combo_SE_asp diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 10411 W: 2011 L: 2133 D: 6267
sprt @ 10+0.1 th 1 I hope it's OK to submit a combo, even if I didn't write either of the constituent patches, since the framework is mostly empty. (I apologize if it is not; please let me know.) Combine two recent promising yellow [0, 4] patches: seExp by @VoyagerOne (76K yellow, +0.94 Elo) and @bigpen0r's asp_tune3 (69K yellow, +0.88 Elo). Full credit to those two authors for this patch and their excellent work.