Stockfish Testing Queue

Pending - 0 tests 0.0 hrs

None

Active - 0 tests

Finished - 255 tests

18-05-28 31m promotionBlocker diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 12103 W: 2413 L: 2486 D: 7204
sprt @ 10+0.1 th 1 Same, but a larger S(10, 10).
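The (-2.94,2.94) interval shown on every LLR line matches Wald's SPRT stopping bounds when both error rates are 0.05. The sketch below assumes alpha = beta = 0.05, which this page does not state; the helper names are hypothetical and this is an illustration, not Fishtest's own code.

```python
import math


def sprt_bounds(alpha: float, beta: float) -> tuple:
    """Return Wald's (lower, upper) log-likelihood-ratio stopping bounds."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


def sprt_state(llr: float, lower: float, upper: float) -> str:
    """Classify a test by comparing its LLR against the stopping bounds."""
    if llr <= lower:
        return "rejected"
    if llr >= upper:
        return "accepted"
    return "running"


# Assumption: alpha = beta = 0.05 (not stated on the page).
lower, upper = sprt_bounds(0.05, 0.05)
print(round(lower, 2), round(upper, 2))  # -2.94 2.94
print(sprt_state(-2.97, lower, upper))   # rejected
```

Under that assumption, an LLR of -2.97 has crossed the lower bound, which is why the test stopped.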
18-05-28 31m promotionBlocker^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 13310 W: 2618 L: 2685 D: 8007
sprt @ 10+0.1 th 1 I had hoped to achieve this by modifying mobility, since that makes conceptual sense, but no test has so far lasted more than 8200 games. For small values of mobility, even changing mob by 1 can dramatically influence MobilityBonus, and this seems rather unavoidable. Perhaps I just need a constant penalty. Here, try a deliberately small S(5, 5) (partly as a sanity check).
18-05-28 31m promotionBlocker diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 3274 W: 606 L: 719 D: 1949
sprt @ 10+0.1 th 1 Restrict b to squares that give check to the opponent, but only after the attackedBy and KingAttackers bitboards have been modified. I suspect this will fail quickly, but want to be sure (and the framework is mostly empty).
18-05-28 31m promotionBlocker^ diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 2341 W: 410 L: 527 D: 1404
sprt @ 10+0.1 th 1 Simply decrement mob (if mob > 0). This is the smallest effect I've tested.
18-05-27 31m promotionBlocker diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 7206 W: 1376 L: 1471 D: 4359
sprt @ 10+0.1 th 1 Alternatively, halve the original mob value, rather than making it a constant.
18-05-27 31m promotionBlocker^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 8179 W: 1604 L: 1695 D: 4880
sprt @ 10+0.1 th 1 Same, but mob = 2.
18-05-27 31m promotionBlocker^^ diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 3749 W: 710 L: 821 D: 2218
sprt @ 10+0.1 th 1 mob = 0 was clearly too drastic a change; upon closer inspection, the average mob in this case is 5.3. Therefore, a simple way to test this idea with less drastic effect is to fix mob at a small, but greater than zero, value. Here, mob = 1.
18-05-27 31m promotionBlocker^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 7967 W: 1583 L: 1675 D: 4709
sprt @ 10+0.1 th 1 In a very recent TCEC game (Jonny 1/2 - 1/2 Stockfish), Stockfish gave a steady -0.65 eval. Notably, the a8-rook is stuck stopping the promotion of an a7-pawn supported by two rooks on the file, yet SF still appears to give full mobility bonus to its own rook. Here, wipe out the mobility bonus (i.e., mob = 0) for a piece on the back rank stopping an immediate promotion by a rook- or queen-supported pawn.
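The mobility variants tried across the promotionBlocker tests (mob = 0, 1, or 2, halving, and decrementing) can be compared side by side. This is a minimal sketch, assuming integer mobility counts and truncating division for the halved case; the function and variant names are hypothetical, not Stockfish identifiers.

```python
def adjusted_mobility(mob: int, variant: str) -> int:
    """Return the mobility count used for a blocking piece under one variant."""
    if variant == "zero":        # wipe out the mobility bonus entirely
        return 0
    if variant == "one":         # fix mob at a small constant
        return 1
    if variant == "two":         # same, with a slightly larger constant
        return 2
    if variant == "halve":       # assumption: integer (truncating) division
        return mob // 2
    if variant == "decrement":   # subtract one, but never go below zero
        return mob - 1 if mob > 0 else 0
    raise ValueError("unknown variant: " + variant)


variants = ["zero", "one", "two", "halve", "decrement"]
# mob = 5 is close to the reported average of 5.3 for these positions.
for name in variants:
    print(name, adjusted_mobility(5, name))
```

For a typical count of 5, only the decrement variant leaves most of the mobility intact.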
18-05-27 31m promotionBlocker diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 5321 W: 988 L: 1092 D: 3241
sprt @ 10+0.1 th 1 Take 2: Only eliminate the mobility bonus if the passed pawn has more than one R or Q supporting it along the file.
18-05-26 31m korigatachi diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 13851 W: 2788 L: 2852 D: 8211
sprt @ 10+0.1 th 1 Half effect, S(10, 10), after which I may be out of ideas on this branch for now.
18-05-26 31m korigatachi diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 6891 W: 1328 L: 1425 D: 4138
sprt @ 10+0.1 th 1 Restore S(20, 20) and add a condition: more_than_one(weak). My idea is that if there is only one weak enemy piece, trivially there will only be weak pieces on one side of the board. This is not over-concentration.
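The more_than_one(weak) condition asks whether a bitboard has at least two bits set. This is a minimal Python sketch of that check, treating a bitboard as a plain integer; the square indices in the example are arbitrary.

```python
def more_than_one(bb: int) -> bool:
    """True if at least two bits are set: clearing the lowest set bit leaves something."""
    return (bb & (bb - 1)) != 0


one_weak_piece = 1 << 12                  # a single weak piece
two_weak_pieces = (1 << 12) | (1 << 40)   # two weak pieces

print(more_than_one(one_weak_piece))   # False
print(more_than_one(two_weak_pieces))  # True
```

With a single weak piece the condition is false, so the malus is skipped.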
18-05-26 31m korigatachi diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 11020 W: 2238 L: 2315 D: 6467
sprt @ 10+0.1 th 1 @snicolet recently submitted a really interesting idea--I hope nobody minds if I also submit a few tests. I noticed that eg malus >= mg malus in all tests so far of this idea (that I've seen), but recently S(20, 20) performed slightly better than S(0, 20). I'm curious, as a sanity check on a nearly empty framework, to see how S(20, 0) performs for comparison.
18-05-26 31m overload3 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 12282 W: 2403 L: 2475 D: 7404
sprt @ 10+0.1 th 1 Offset by reducing the main Overload to S(5, 5), so Overload + OverloadGoodTrade = the old Overload. If this fails red, I think I can conclude that the 28K yellow was simply lucky.
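The offset works because S(mg, eg) pairs add component by component. This is a minimal sketch, assuming OverloadGoodTrade keeps the extra S(5, 0) from the earlier overload3 test; the Score class is a hypothetical Python stand-in, not Stockfish's Score type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """A (middlegame, endgame) pair of evaluation terms."""
    mg: int
    eg: int

    def __add__(self, other: "Score") -> "Score":
        return Score(self.mg + other.mg, self.eg + other.eg)


def S(mg: int, eg: int) -> Score:
    return Score(mg, eg)


overload = S(5, 5)             # the reduced main Overload from this test
overload_good_trade = S(5, 0)  # assumption: unchanged from the earlier test
combined = overload + overload_good_trade
print(combined)  # Score(mg=10, eg=5)
```

Under that assumption, a good-trade target receives S(10, 5) in total, while every other Overload target receives only S(5, 5).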
18-05-26 31m overload3 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 16814 W: 3380 L: 3431 D: 10003
sprt @ 10+0.1 th 1 I expected that sanity check to perform quite poorly, so I was surprised to see a 28K yellow. Let's explore this further. Extra S(5, 0) if the Overload bitboard contains a clearly good trade.
18-05-26 31m overload3 diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 28747 W: 5794 L: 5789 D: 17164
sprt @ 10+0.1 th 1 Restricting Overload to exclude these cases was so abysmal (red in 5702 games) that, as a sanity check, I wonder how using only these cases fares. Not expected to pass, but submitted to inform development (and due to empty framework).
18-05-25 31m overload3 diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 22622 W: 4474 L: 4499 D: 13649
sprt @ 10+0.1 th 1 For an empty framework, here's another idea. Don't give Overload bonus if the opponent can easily make an equal trade to simplify away the tension (i.e., if the targeted piece and the attacking piece are of the same type). I suspect search might already take care of this, but I'm still curious to see how this performs.
18-05-25 31m overload3 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 5702 W: 1101 L: 1203 D: 3398
sprt @ 10+0.1 th 1 More generally, exclude our clearly good trades: (a) any nonPawnEnemies attacked by our pawns, (b) rooks or queens attacked by minors, or (c) queens attacked by rooks.
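The three exclusion cases can be written as bitboard intersections. This is a minimal sketch that uses plain integers as bitboards; all parameter names and the example squares are hypothetical, and the real patch may combine the terms differently.

```python
def clearly_good_trades(non_pawn_enemies: int, enemy_rooks: int, enemy_queens: int,
                        attacked_by_pawn: int, attacked_by_minor: int,
                        attacked_by_rook: int) -> int:
    """Bitboard of enemy pieces that we can capture with a clearly cheaper piece."""
    by_pawn = non_pawn_enemies & attacked_by_pawn               # case (a)
    by_minor = (enemy_rooks | enemy_queens) & attacked_by_minor  # case (b)
    by_rook = enemy_queens & attacked_by_rook                    # case (c)
    return by_pawn | by_minor | by_rook


def bit(square: int) -> int:
    return 1 << square


# Example: enemy knight on 10, rooks on 20 and 21, queen on 30.
good = clearly_good_trades(
    non_pawn_enemies=bit(10) | bit(20) | bit(21) | bit(30),
    enemy_rooks=bit(20) | bit(21),
    enemy_queens=bit(30),
    attacked_by_pawn=bit(10),
    attacked_by_minor=bit(20),
    attacked_by_rook=bit(21) | bit(30),
)

overload = bit(10) | bit(21)          # hypothetical Overload candidates
overload_restricted = overload & ~good  # drop the clearly good trades
```

In the example, the rook on 21 is attacked only by a rook, an equal trade, so it is not excluded.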
18-05-25 31m overload3 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 13733 W: 2673 L: 2738 D: 8322
sprt @ 10+0.1 th 1 A lot of people have made a lot of overload tests. If I recall correctly, excluding enemy pawn attacks failed, and excluding both sides' pawn attacks failed, but I don't think anyone has tested excluding just our own. Since the enemies are nonPawnEnemies, anything we attack with a pawn is more or less hanging. (Can this be tested as [0, 4]? I'm not sure, so I've scheduled the more conservative bounds for now. Please clarify for me--thanks!)
18-05-24 31m useless_outpost diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 7293 W: 1412 L: 1507 D: 4374
sprt @ 10+0.1 th 1 This should be a very narrow take. Check whether the outpost knight can "see" even one square that is occupied by another piece or is an enemy kingRing square, and is not defended by an enemy pawn. If not, cut the bonus by a quarter.
18-05-24 31m useless_outpost diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 8754 W: 1747 L: 1835 D: 5172
sprt @ 10+0.1 th 1 For knight outposts, check the squares attacked from the outpost that are not defended by enemy pawns and are occupied by non-KP pieces of either color. If there are none, cut the Outpost bonus by a quarter. Fixed a bug which prevented the second case from having any effect.
18-05-24 31m useless_outpost diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 8485 W: 1649 L: 1738 D: 5098
sprt @ 10+0.1 th 1 Halve the bonus only if none of the knight's attacked squares are (a) enemy pieces, including pawns or (b) enemy kingRing squares.
18-05-24 31m useless_outpost diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 19030 W: 3835 L: 3875 D: 11320
sprt @ 10+0.1 th 1 A narrower take: halve the bonus only if the outpost knight attacks no non-KP pieces at all. (A knight that attacks one such piece gets full bonus in this patch, but half in the previous one.)
18-05-24 31m useless_outpost^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 8846 W: 1773 L: 1861 D: 5212
sprt @ 10+0.1 th 1 Inspired by Bryan's posts and @robal's tests. However, I think that proximity to the king might be a bad measure of the activity of an outpost, because in many circumstances both sides focus their activities on the other side of the board--these outposts are still useful. Instead, consider the number of pieces attacked. Here, halve the Outpost bonus for knights that attack no more than one non-KP piece (of either color).
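The piece-count condition can be sketched as follows. This is a minimal sketch that assumes integer bitboards and truncating halving; the function names, the max_attacked parameter, and the example bonus values are hypothetical and do not reflect Stockfish's actual Outpost constants.

```python
def popcount(bb: int) -> int:
    """Number of set bits in a bitboard."""
    return bin(bb).count("1")


def outpost_bonus(base_mg: int, base_eg: int, knight_attacks: int,
                  non_kp_pieces: int, max_attacked: int = 1) -> tuple:
    """Halve the bonus when the knight attacks at most max_attacked non-KP pieces."""
    attacked = knight_attacks & non_kp_pieces
    if popcount(attacked) <= max_attacked:
        return base_mg // 2, base_eg // 2
    return base_mg, base_eg


knight_attacks = (1 << 1) | (1 << 5) | (1 << 9)
one_target = 1 << 5                 # the knight attacks one non-KP piece
two_targets = (1 << 5) | (1 << 9)   # the knight attacks two non-KP pieces

print(outpost_bonus(20, 10, knight_attacks, one_target))   # (10, 5)
print(outpost_bonus(20, 10, knight_attacks, two_targets))  # (20, 10)
```

Setting max_attacked=0 gives the narrower follow-up, where a knight that attacks exactly one such piece keeps the full bonus.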
18-05-23 31m tweak_threatOnQueen3 diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 43360 W: 6355 L: 6380 D: 30625
sprt @ 60+0.6 th 1 Since the framework is nearly empty, except for another speculative LTC, here's one for my 81K yellow. Quarter throughput.
18-05-22 31m combo_TOQ3_asp diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 28561 W: 5776 L: 5831 D: 16954
sprt @ 10+0.1 th 1 I was surprised and disappointed by the fail-red combos. In particular, the two search patches appear not to be orthogonal, and their interaction leads to a pronounced regression. Separate them. In hindsight, combining one eval patch with one search patch perhaps increases the chance that the two are orthogonal, and thus the chance of a green. Here, combo TOQ3 with asp_tune3.
18-05-22 31m combo_TOQ3_SE diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 10671 W: 2134 L: 2255 D: 6282
sprt @ 10+0.1 th 1 Along the same lines as combo_TOQ3_asp, combo TOQ3 with seExp.
18-05-22 31m combo_TOQ3_KD diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 31595 W: 6378 L: 6421 D: 18796
sprt @ 10+0.1 th 1 Originally, I intended to pursue a speculative LTC. However, there are now four completely separate, long yellow [0, 4] runs, so there seems to be ample opportunity for combo/union patches. These could be combined in any way, but since there are 2 eval and 2 search patches, and those naturally group together, try those pairings first. Here, combine my best tweak_threatOnQueen3 (81K yellow, +0.98 Elo) and @xoroshiro's best kingDanger (60K yellow, +0.77 Elo).
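To put a yellow run's W/L/D totals in Elo terms, one option is the standard logistic conversion. This is a minimal sketch; the function name is hypothetical, and the Elo figures quoted in these descriptions may come from a different model, so the two need not agree.

```python
import math


def logistic_elo(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by the match score under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)


# Totals of the 81K yellow tweak_threatOnQueen3 run listed on this page.
elo = logistic_elo(16353, 16213, 48783)
print(round(elo, 2))
```

For that run the logistic estimate is roughly +0.6 Elo, a gain too small for a [0, 4] SPRT to confirm within 81K games.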
18-05-22 31m combo_SE_asp diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 10411 W: 2011 L: 2133 D: 6267
sprt @ 10+0.1 th 1 I hope it's OK to submit a combo, even if I didn't write either of the constituent patches, since the framework is mostly empty. (I apologize if it is not; please let me know.) Combine two recent promising yellow [0, 4] patches: seExp by @VoyagerOne (76K yellow, +0.94 Elo) and @bigpen0r's asp_tune3 (69K yellow, +0.88 Elo). Full credit to those two authors for this patch and their excellent work.
18-05-21 31m tweak_threatOnQueen3 diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 47960 W: 9679 L: 9662 D: 28619
sprt @ 10+0.1 th 1 Before concluding that this supposed small Elo gain is relatively insensitive to the degree of increase of this value, also try half the original effect.
18-05-21 31m tweak_threatOnQueen3 diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 74886 W: 15060 L: 14944 D: 44882
sprt @ 10+0.1 th 1 Same, but double effect instead. If I am unable to improve upon the 81K yellow result through this or other tests, I will try a speculative LTC.
18-05-21 31m tweak_threatOnQueen3^ diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 19151 W: 3772 L: 3862 D: 11517
sprt @ 10+0.1 th 1 Fishtest continues to surprise me. What was once a 33K green, and is now a 6K red, consists of three nearly neutral changes and an 81K yellow, without any large regressions. Try to improve upon the single tweak that led to an 81K yellow by increasing the effect by 50%. I apologize for rescheduling a few times; I made errors in these descriptions or accidentally used [0, 5].
18-05-21 31m tweak_threatOnQueen3 diff
LLR: -2.94 (-2.94,2.94) [0.00,4.00]
Total: 81349 W: 16353 L: 16213 D: 48783
sprt @ 10+0.1 th 1 The last of the four tests. See tweak_threatOnQueen3^^^ for description.
18-05-21 31m tweak_threatOnQueen3^ diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 36173 W: 7262 L: 7289 D: 21622
sprt @ 10+0.1 th 1 See tweak_threatOnQueen3^^^.
18-05-21 31m tweak_threatOnQueen3^^ diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 31018 W: 6130 L: 6176 D: 18712
sprt @ 10+0.1 th 1 See tweak_threatOnQueen3^^^.
18-05-21 31m tweak_threatOnQueen3^^^ diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 29240 W: 5801 L: 5854 D: 17585
sprt @ 10+0.1 th 1 Since the framework is nearly empty, I would like to test my theory that some subset of this became a regression recently (the past 5-6 weeks) while the rest is still positive (if yellow). To do so, I will schedule four half-throughput tests, each testing a single one of the four tweaked values. I don't expect any to pass on their own, but in particular, I'm watching to see if one fails red very quickly.
18-05-21 31m connectivity_pins diff
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 7358 W: 1426 L: 1520 D: 4412
sprt @ 10+0.1 th 1 S(5, 5), to compare to the 2*Connectivity = S(6, 2) used in the best test so far (47K yellow, +1.15 Elo).
18-05-20 31m connectivity_pins diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 47614 W: 9575 L: 9484 D: 28555
sprt @ 10+0.1 th 1 Still double effect, but exclude any pinned piece that the opponent attacks twice or more while we defend it only once.
18-05-20 31m connectivity_pins diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 11862 W: 2389 L: 2463 D: 7010
sprt @ 10+0.1 th 1 Narrowing the double-effect version appears to have substantially improved it, so try narrowing it further: include ~attackedBy[Them][PAWN] as a condition for the pinned piece. The resulting code is inspired by the definition of "weak", but applied to friendly rather than enemy pieces.
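One way to read the narrowed condition is as a filter on the pinned pieces, mirroring the structure of "weak" but from our own side. This is a sketch of that interpretation only, with integer bitboards; the function and parameter names are hypothetical and the actual patch may differ.

```python
def eligible_pinned(pinned: int, attacked_by_them_pawn: int,
                    attacked2_them: int, attacked2_us: int) -> int:
    """Pinned pieces that still qualify for the doubled Connectivity bonus."""
    # Squares the opponent attacks at least twice but we cover at most once.
    outnumbered = attacked2_them & ~attacked2_us
    # Drop pieces attacked by an enemy pawn, and pieces that are outnumbered.
    return pinned & ~attacked_by_them_pawn & ~outnumbered


def bit(square: int) -> int:
    return 1 << square


pinned = bit(3) | bit(11) | bit(19)
result = eligible_pinned(
    pinned,
    attacked_by_them_pawn=bit(3),        # piece on 3 is hit by an enemy pawn
    attacked2_them=bit(11) | bit(19),    # both are attacked twice by them
    attacked2_us=bit(19),                # but 19 is also defended twice by us
)
```

In the example, only the piece on square 19 remains eligible.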
18-05-20 31m connectivity_pins diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10035 W: 2021 L: 2103 D: 5911
sprt @ 10+0.1 th 1 Triple effect, without the new pawn-attacked condition.
18-05-20 31m connectivity_pins diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 4739 W: 910 L: 1017 D: 2812
sprt @ 10+0.1 th 1 Triple effect, with the new pawn-attacked condition.
18-05-20 31m connectivity_pins diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 32227 W: 6450 L: 6430 D: 19347
sprt @ 10+0.1 th 1 Borrowing from @xoroshiro's tests of applying double Connectivity in certain circumstances, but applied to different ones: promote solid defense by applying Connectivity a second time for our defended king blockers. Although this is a small bonus (perhaps too small) for a one-time application, the bench appears to change by a non-trivial amount, so I'm interested in seeing what this means in practice.
18-05-20 31m connectivity_pins diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 17299 W: 3509 L: 3557 D: 10233
sprt @ 10+0.1 th 1 Currently, the result appears essentially neutral. Ensure that this isn't due to inadequate bonus by doubling the effect.
18-05-19 31m soleDefendedPin diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 25171 W: 5101 L: 5113 D: 14957
sprt @ 10+0.1 th 1 Since the recommendation from "Alt Doom" was very specific, here's an attempt to interpret and implement it precisely. S(10, 10). I exclude the queen- and rook-specific cases for now to first evaluate how this performs on its own and to minimize the number of new bonuses to optimize.
18-05-19 31m weakpin diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10686 W: 2142 L: 2221 D: 6323
sprt @ 10+0.1 th 1 Closer in spirit to what the user "Alt Doom" posted on the FishCooking suggestions thread than to the current Overload, namely by removing ~attackedBy2[Us] as a condition. However, this differs slightly: it excludes enemy pawns and uses weak rather than ~attackedBy2[Them]. Give an S(10, 10) bonus for enemy king blockers that are weak.
18-05-19 31m tweak_threatOnQueen2 diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 9302 W: 1758 L: 1884 D: 5660
sprt @ 10+0.1 th 1 The original tests were mostly mg effects, so try that alone. -20% to ThreatByRook[QUEEN] and ThreatByMinor[QUEEN] middlegame values.
18-05-17 31m tweak_threatOnQueen2 diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 11013 W: 2165 L: 2285 D: 6563
sprt @ 10+0.1 th 1 Double effect: -20% of TOQ. Rescheduled with correct bounds (thanks Michael Chaly!).
18-05-17 31m tweak_threatOnQueen2^ diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 9622 W: 1868 L: 1993 D: 5761
sprt @ 10+0.1 th 1 5-6 weeks ago, increasing TOQ reliably led to long yellows or even a green. A repeat of that 33K green (+2.5 Elo) yesterday failed red in 6400 games (-7 Elo). I presume something has changed in SF to make this a regression (any ideas what?), but since increasing TOQ is now terrible, maybe decreasing TOQ is a gain. Here, -10%.
18-05-17 31m tweak_threatOnQueen2 diff
LLR: 0.93 (-2.94,2.94) [0.00,5.00]
Total: 3795 W: 795 L: 737 D: 2263
sprt @ 10+0.1 th 1 Double effect: -20% of TOQ.
18-05-17 31m overload_pinned2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 10750 W: 2184 L: 2263 D: 6303
sprt @ 10+0.1 th 1 S(10, 10) was a 52K yellow. Try S(15, 15), and hope it's enough.
18-05-17 31m overload_pinned2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 23688 W: 4766 L: 4785 D: 14137
sprt @ 10+0.1 th 1 I'm surprised but optimistic that eg > mg might be the answer I've been searching for. Try S(10, 15).