Stockfish Testing Queue

Finished - 977 tests

18-04-22 31m ocb_strongPasser diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 2801 W: 479 L: 593 D: 1729
sprt @ 10+0.1 th 1 Similar to the most recent attempt, but try a slower rate of rise.
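Reading the result lines: LLR is the current log-likelihood ratio, the pair in parentheses holds the SPRT stopping bounds, and the bracket holds the Elo hypotheses [elo0, elo1] under test. The displayed bounds match Wald's formulas ln(beta / (1 - alpha)) and ln((1 - beta) / alpha) if one assumes alpha = beta = 0.05 (ln 19 is about 2.944). That error-rate choice is an assumption inferred from the displayed +/-2.94. Below is a minimal sketch with hypothetical function names. It does not compute the LLR itself, which comes from fishtest's own statistical model.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Wald SPRT stopping bounds for error rates alpha and beta.
struct SprtBounds {
    double lower;  // LLR at or below this: the test stops and the patch fails
    double upper;  // LLR at or above this: the test stops and the patch passes
};

SprtBounds sprtBounds(double alpha, double beta) {
    assert(alpha > 0.0 && alpha < 1.0 && beta > 0.0 && beta < 1.0);
    return { std::log(beta / (1.0 - alpha)), std::log((1.0 - beta) / alpha) };
}

// Classify a test from its current log-likelihood ratio.
std::string sprtVerdict(double llr, const SprtBounds& b) {
    if (llr <= b.lower) return "fail";
    if (llr >= b.upper) return "pass";
    return "running";
}
```

With alpha = beta = 0.05, a line such as "LLR: -2.95 (-2.94,2.94)" classifies as a failure, because -2.95 lies below -2.944.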
18-04-21 31m ocb_strongPasser^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 34585 W: 7002 L: 6970 D: 20613
sprt @ 10+0.1 th 1 +4 was better than +2, so maybe there's a pattern: try +6 to the scale factor for each passer whose promotion square the weak side's bishop cannot defend.
18-04-21 31m ocb_strongPasser diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 12936 W: 2652 L: 2720 D: 7564
sprt @ 10+0.1 th 1 Also try +8.
18-04-21 31m ocb_rooks diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 19813 W: 3965 L: 4002 D: 11846
sprt @ 10+0.1 th 1 +10 was better than +5, but there's not much room to continue increasing the scale factor (+10 is 56). Out of curiosity, let's try 64: do not scale down OCB endings at all if both sides have rooks. (I'll rewrite this more cleanly if it passes.)
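For reference, the scale factor multiplies only the endgame part of the evaluation, as a fraction of 64. That is why 56 keeps 7/8 of the endgame score and 64 means no scaling at all. The sketch below assumes the phase blend Stockfish used around this time: phase 128 is a pure middlegame and 0 a pure endgame. The constant and function names are stand-ins.

```cpp
#include <cassert>

// Stand-ins for the engine constants (assumed values).
constexpr int PhaseMidgame = 128;      // phase of a full middlegame
constexpr int ScaleFactorNormal = 64;  // scale factor meaning "no scaling"

// Blend middlegame and endgame scores by game phase, scaling only the
// endgame component by sf / 64. Integer division truncates, as in the engine.
int blendedEval(int mg, int eg, int phase, int sf) {
    assert(phase >= 0 && phase <= PhaseMidgame);
    assert(sf >= 0);
    int v = mg * phase + eg * (PhaseMidgame - phase) * sf / ScaleFactorNormal;
    return v / PhaseMidgame;
}
```

In a pure endgame (phase 0), an endgame score of 100 cp becomes 71 with sf = 46 and 87 with sf = 56, and stays 100 with sf = 64.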
18-04-21 31m ocb_rooks^^ diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 30832 W: 6160 L: 6147 D: 18525
sprt @ 10+0.1 th 1 Take 2: +10 to the scale factor if both sides have rooks.
18-04-21 31m ocb_rooks^^^ diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 21034 W: 4189 L: 4221 D: 12624
sprt @ 10+0.1 th 1 Wikipedia makes the interesting claim that, as opposed to other non-KBP material, "If each side has a rook in addition to the bishop, the stronger side has many more winning prospects." This shouldn't be hard to test, but choosing the right scale factors is tricky. I would like to schedule some SPRTs to get an idea of the correct values, or of whether there is any Elo to be gained at all. Take 1: +5 to the scale factor if both sides have rooks.
18-04-21 31m ocb_rooks^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 19814 W: 3963 L: 4000 D: 11851
sprt @ 10+0.1 th 1 Take 3: +5 to the scale factor if both sides have rooks, otherwise -5.
18-04-21 31m ocb_rooks diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 10305 W: 2039 L: 2120 D: 6146
sprt @ 10+0.1 th 1 Take 4: +10 to the scale factor if both sides have rooks, otherwise -5.
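The four ocb_rooks takes differ only in two numbers: the bonus when both sides have rooks, and the adjustment otherwise. Below is a hypothetical sketch with base 46, matching "+10 is 56". Clamping the result to 64 (no scaling) is my own choice, not something taken from the patches.

```cpp
#include <algorithm>
#include <cassert>

// Adjust the OCB scale factor depending on whether both sides have a rook.
// Takes 1-4 correspond to (bothBonus, otherwiseDelta) =
// (+5, 0), (+10, 0), (+5, -5), (+10, -5).
int ocbRookScale(int baseSf, bool bothSidesHaveRooks, int bothBonus, int otherwiseDelta) {
    assert(baseSf >= 0);
    int sf = baseSf + (bothSidesHaveRooks ? bothBonus : otherwiseDelta);
    return std::clamp(sf, 0, 64);  // 64 = leave the endgame score unscaled
}
```

Take 4, for example, is ocbRookScale(46, bothSidesHaveRooks, 10, -5), which gives 56 or 41.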
18-04-21 31m ocb_strongPasser diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 22539 W: 4573 L: 4597 D: 13369
sprt @ 10+0.1 th 1 More aggressive version: +4 rather than +2.
18-04-21 31m ocb_strongPasser diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 19617 W: 3906 L: 3944 D: 11767
sprt @ 10+0.1 th 1 Similar to take 1 (the best result so far), but do nothing if the weak side has a passed pawn.
18-04-21 31m ocb_strongPasser diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 11373 W: 2318 L: 2394 D: 6661
sprt @ 10+0.1 th 1 Instead of an all-or-none +5 to the scale factor, apply +2 for each passed pawn that meets the condition. Apply in OCB endgames with non-KBP material only.
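Below is a sketch of the per-passer version using bitboards. It rests on a few assumptions: White is the strong side (a real version would mirror this for Black), squares are numbered a1 = 0 to h8 = 63, and the function names are hypothetical. The dark-square mask has a1, c1, e1, ... set.

```cpp
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;

// Dark squares with a1 = bit 0 (a1, c1, e1, g1, b2, ... are dark).
constexpr Bitboard DarkSquares = 0xAA55AA55AA55AA55ULL;

// Count White passers whose promotion square (same file, 8th rank) is of the
// opposite color to the weak side's bishop, i.e. a square it can never reach.
int undefendablePassers(Bitboard whitePassers, bool weakBishopOnDark) {
    int count = 0;
    for (int sq = 0; sq < 64; ++sq) {
        if (!((whitePassers >> sq) & 1)) continue;
        int promo = 56 + (sq & 7);  // same file, 8th rank
        bool promoDark = (DarkSquares >> promo) & 1;
        if (promoDark != weakBishopOnDark) ++count;
    }
    return count;
}

// +perPasser for each such passer, applied only when there is material other
// than kings, bishops and pawns; otherwise the base scale factor is kept.
int ocbPasserScale(int baseSf, Bitboard whitePassers, bool weakBishopOnDark,
                   bool hasOtherPieces, int perPasser) {
    assert(perPasser >= 0);
    if (!hasOtherPieces) return baseSf;
    return baseSf + perPasser * undefendablePassers(whitePassers, weakBishopOnDark);
}
```

With White passers on c5 and e5 and a dark-squared black bishop, both promotion squares (c8, e8) are light, so the scale factor rises by 2 x 2 = 4.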
18-04-20 31m ocb_strongPasser diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 19656 W: 4071 L: 4107 D: 11478
sprt @ 10+0.1 th 1 Expected to fail, but try sf = strongPassedPawn ? 31 : 26 (that is, reduce the scale factor by 5 for some KBP-only OCB endings).
18-04-20 31m ocb_strongPasser diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 8805 W: 1717 L: 1805 D: 5283
sprt @ 10+0.1 th 1 I'm not sure this will work, but try the opposite: only increase the scale factor in opposite-colored KBP endgames (i.e., no knights, rooks, or queens). sf = strongPassedPawn ? 36 : 31
18-04-20 31m ocb_strongPasser diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 28043 W: 5715 L: 5713 D: 16615
sprt @ 10+0.1 th 1 Apply take 1, but only when there are pieces on the board other than kings, bishops, and pawns. sf = strongPassedPawn ? 51 : 46
18-04-20 31m ocb_strongPasser diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 10296 W: 2022 L: 2103 D: 6171
sprt @ 10+0.1 th 1 Take 5: take 1 with +20% rather than +10%.
18-04-20 31m ocb_strongPasser diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 22287 W: 4509 L: 4534 D: 13244
sprt @ 10+0.1 th 1 A more aggressive version of the last test, only when there are pieces on the board other than kings, bishops, and pawns. sf = strongPassedPawn ? 56 : 46
18-04-20 31m ocb_strongPasser^^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 35879 W: 7419 L: 7379 D: 21081
sprt @ 10+0.1 th 1 I know a lot of ideas have been tested in this area, so I apologize if I'm inadvertently repeating someone else's idea. If the strong side has a passer whose promotion square the weak side's bishop cannot defend, boost the scale factor by 10%.
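The condition only needs square colors. A bishop never leaves its own color, so it cannot cover a promotion square of the other color. Below is a minimal sketch of the all-or-none version with hypothetical names and squares numbered a1 = 0 to h8 = 63. The percentage boost is rounded to the nearest integer, so +10% of 46 gives 51, matching the "51 : 46" used when take 1 is reapplied later.

```cpp
#include <cassert>
#include <vector>

// Square color with a1 = 0, b1 = 1, ..., h8 = 63; a1 (file 0 + rank 0) is dark.
bool isDark(int sq) {
    assert(sq >= 0 && sq < 64);
    return ((sq >> 3) + (sq & 7)) % 2 == 0;
}

// True if the bishop on bishopSq can never reach the pawn's promotion square.
bool bishopCannotDefendPromotion(int pawnSq, bool pawnIsWhite, int bishopSq) {
    int promo = pawnIsWhite ? 56 + (pawnSq & 7) : (pawnSq & 7);
    return isDark(promo) != isDark(bishopSq);
}

// All-or-none boost: if any strong-side passer qualifies, raise the scale
// factor by pct percent (10 in take 1, 20 in take 5), rounded to nearest.
int strongPasserScale(int baseSf, const std::vector<int>& passers,
                      bool strongIsWhite, int weakBishopSq, int pct) {
    for (int sq : passers)
        if (bishopCannotDefendPromotion(sq, strongIsWhite, weakBishopSq))
            return (baseSf * (100 + pct) + 50) / 100;
    return baseSf;
}
```

For a white passer on e5 (promoting on light e8), a black bishop on f8 (dark) triggers the boost, while one on c8 (light) does not.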
18-04-20 31m ocb_strongPasser^ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 18735 W: 3757 L: 3799 D: 11179
sprt @ 10+0.1 th 1 Take 2: also reduce scale factor by 10% if the condition is not true.
18-04-20 31m ocb_strongPasser diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 19907 W: 4007 L: 4043 D: 11857
sprt @ 10+0.1 th 1 Take 4: Similar to take 2, but with a key difference. Take 2 also reduces the scale factor if the strong side has no passers; this patch does not.
18-04-20 31m ocb_strongPasser diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 9928 W: 1975 L: 2058 D: 5895
sprt @ 10+0.1 th 1 Decrease the scale factor by 10% if all strong-side passers promote on squares the weak side's bishop can defend, or if the strong side has no passers. I expect this test to fail, but it will inform further development. I am also considering applying these tests only when there is also non-bishop material on the board, but I would like to see the results of these tests first.
18-04-20 31m rookOpenFiles2^ diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 22387 W: 4548 L: 4572 D: 13267
sprt @ 10+0.1 th 1 Reviving the best attempt from my first test. I've since learned how to check an average value across bench positions: this patch changed the evaluation of a side's rooks by 14 cp on average, or by 37 cp when only one side had rooks, so I'm no longer surprised it failed. Adjust the middlegame and endgame base rook values down to compensate (here, -37).
18-04-20 31m rookOpenFiles2 diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 20037 W: 4054 L: 4090 D: 11893
sprt @ 10+0.1 th 1 Take 2: Similar to the last test, but adjust rook base values by -14 cp instead.
18-04-19 31m overload_extraQ3 diff
LLR: -2.99 (-2.94,2.94) [0.00,5.00]
Total: 15877 W: 3251 L: 3307 D: 9319
sprt @ 10+0.1 th 1 I was surprised that the most recent test in this branch actually performed rather well. Try a larger reduction in the base queen middlegame value (-25).
18-04-19 31m overload_extraQ3 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 49750 W: 10105 L: 10003 D: 29642
sprt @ 10+0.1 th 1 The previous version was unexpectedly terrible, so as a sanity check, try the opposite change to the queen middlegame value.
18-04-19 31m combo_TOQ_captLMR diff
LLR: -2.94 (-2.94,2.94) [0.00,4.00]
Total: 19985 W: 4013 L: 4099 D: 11873
sprt @ 10+0.1 th 1 Combo of two unrelated [0, 4] patches that appeared promising at STC, but failed yellow after very long LTC runs. tweak_threatOnQueen passed STC in 33K games, then failed yellow at LTC in 105K games. @jerrydonaldwatson's captLMR passed STC in 32K games, then failed yellow at LTC in 151K games.
18-04-18 31m overload_extraQ3 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 5361 W: 989 L: 1092 D: 3280
sprt @ 10+0.1 th 1 Try to revive this 48.5K yellow patch. Since the change applies a substantial middlegame penalty to overloaded queens, compensate by increasing the base middlegame value of the queen.
18-04-14 31m overload_ConnectedR diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 3953 W: 760 L: 871 D: 2322
sprt @ 10+0.1 th 1 Since the queue is empty, here's a quick sketch of an idea. Exclude the enemy's connected rooks from the overload calculation.
18-04-13 31m rookOnOnlyOpen diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 6908 W: 1376 L: 1473 D: 4059
sprt @ 10+0.1 th 1 Take 2: smaller bonus, S(5, 5).
18-04-13 31m rookOnOnlyOpen diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 13810 W: 2770 L: 2835 D: 8205
sprt @ 10+0.1 th 1 Based on a comment by Bryan in the forum, although I've modified the idea slightly. Extra S(10, 5) bonus for a rook that is not only on an open file, but on the only open file. The idea is that this file is especially valuable.
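Below is a sketch of the "only open file" test. The names are hypothetical, and Score stands in for Stockfish's S(mg, eg) pair. The ordinary open-file bonus is assumed to be applied elsewhere, so only the extra part is computed here.

```cpp
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;

struct Score { int mg, eg; };  // (middlegame, endgame) pair, like S(mg, eg)

constexpr Bitboard FileA = 0x0101010101010101ULL;  // a1, a2, ..., a8

// A file is open when it holds no pawns of either color.
bool isOpenFile(Bitboard allPawns, int file) {
    return (allPawns & (FileA << file)) == 0;
}

// Extra bonus for a rook on the board's only open file:
// S(10, 5) in the first attempt, S(5, 5) in take 2.
Score onlyOpenFileBonus(Bitboard allPawns, int rookSq, Score bonus) {
    assert(rookSq >= 0 && rookSq < 64);
    int openFiles = 0;
    for (int f = 0; f < 8; ++f)
        openFiles += isOpenFile(allPawns, f);
    if (openFiles == 1 && isOpenFile(allPawns, rookSq & 7))
        return bonus;
    return {0, 0};
}
```

With pawns on every file except the e-file, a rook on e1 gets the extra S(10, 5). As soon as a second file opens, the bonus disappears.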
18-04-10 31m combo_TOQ_DP diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 35179 W: 7195 L: 7224 D: 20760
sprt @ 10+0.1 th 1 @Rocky640 recommended waiting for another promising yellow parameter patch and then submitting a combo. @xoto10's tweak looked very promising at STC but was more or less neutral at LTC; I'm curious to see how the combination with threatOnQueen performs at STC.
18-04-10 31m tune_threatOnQueen diff
29589/30000 iterations
59608/60000 games played
60000 @ 20+0.2 th 1 Previous tuning did not converge, but showed significant trends. Restrict tuning to the two variables that had meaningful changes and continue tuning.
18-04-10 31m tweak_threatOnQueen diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 110808 W: 22599 L: 22347 D: 65862
sprt @ 10+0.1 th 1 The (hopefully) good news: SPSA has resulted in steady, consistent changes in the parameters. The bad news: it hasn't converged yet. Let's test what we have so far. If the current test fails, I would advocate holding off on further LTC tests of these tweaks for now: almost all the tests have required huge numbers of games to fail, and we've already spent more than 288,000 (and counting) expensive LTC games. Let's try to find the best possible STC first. I think repeated SPSA sessions until convergence give us the best possible chance, and this should actually be more efficient than several 100K-game yellow LTC SPRTs.
18-04-09 31m tune_threatOnQueen diff
27393/30000 iterations
55132/60000 games played
60000 @ 20+0.2 th 1 I think I've been overthinking this. +50% of Hanging was seemingly an improvement, but there's no reason further tweaks need to be proportional to Hanging. Return to the 50% test (STC 33K green, LTC 105K yellow) for initial parameters, then tune with SPSA.
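SPSA does the heavy lifting in these tuning runs. In each iteration it perturbs every parameter at once by +/- c_k, measures the objective at the two perturbed points, and moves all parameters along the resulting gradient estimate. The toy version below maximizes a deterministic function using Spall's usual decay exponents (0.602, 0.101). Fishtest's tuner instead works from noisy game results (the progress lines show two games per iteration) and uses its own gain schedule. The names and gains here are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Minimal SPSA maximizer on an objective f over a parameter vector theta.
std::vector<double> spsaMaximize(const std::function<double(const std::vector<double>&)>& f,
                                 std::vector<double> theta, int iterations,
                                 double a, double c, double A, unsigned seed) {
    assert(iterations > 0 && a > 0 && c > 0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);
    const std::size_t n = theta.size();
    std::vector<double> delta(n), plus(n), minus(n);
    for (int k = 0; k < iterations; ++k) {
        double ak = a / std::pow(k + 1 + A, 0.602);  // step size, decays slowly
        double ck = c / std::pow(k + 1, 0.101);      // perturbation size
        for (std::size_t i = 0; i < n; ++i) {
            delta[i] = coin(rng) ? 1.0 : -1.0;       // random +/-1 per parameter
            plus[i] = theta[i] + ck * delta[i];
            minus[i] = theta[i] - ck * delta[i];
        }
        double diff = f(plus) - f(minus);            // two measurements per iteration
        for (std::size_t i = 0; i < n; ++i)
            theta[i] += ak * diff / (2.0 * ck * delta[i]);  // ascend the estimate
    }
    return theta;
}
```

On a toy quadratic with its peak at (3, -1), starting from (0, 0), a few thousand iterations bring both parameters close to the peak.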
18-04-09 31m tweak_threatOnQueen diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 56865 W: 11613 L: 11562 D: 33690
sprt @ 10+0.1 th 1 I remain convinced that there is Elo to be gained here. Try a smaller reduction in base values (-5).
18-04-09 31m tweak_threatOnQueen diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 27093 W: 5400 L: 5460 D: 16233
sprt @ 10+0.1 th 1 Since modifying ThreatByMinor[QUEEN] alone (50%) was decent, try applying a larger bonus (66%) to ThreatByMinor[QUEEN] only.
18-04-09 31m tweak_threatOnQueen diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 66458 W: 13556 L: 13469 D: 39433
sprt @ 10+0.1 th 1 A milder, more spread-out version of the same idea: compensate by reducing base minor and rook values (middle and endgame) by 10.
18-04-09 31m tweak_threatOnQueen diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 14426 W: 2870 L: 2977 D: 8579
sprt @ 10+0.1 th 1 As far as I can tell, the green STC substantially changes the average static eval (thanks @Stefano80 for teaching me how to inspect this). For some reason, subtracting the main tweak from the rook base value restores the average bench evaluation to roughly the same as master. Perhaps this is an improvement (or perhaps terrible).
18-04-08 31m tweak_threatOnQueen diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 97344 W: 19739 L: 19539 D: 58066
sprt @ 10+0.1 th 1 +50% of Hanging, ThreatByMinor[QUEEN] only.
18-04-08 31m tweak_threatOnQueen diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 116184 W: 23799 L: 23526 D: 68859
sprt @ 10+0.1 th 1 Take 2 failed yellow at LTC after 105,815 games, very close to passing. Take 3: +66% of Hanging.
18-04-08 31m tweak_threatOnQueen diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 34902 W: 7148 L: 7178 D: 20576
sprt @ 10+0.1 th 1 +50% of Hanging, ThreatByRook[QUEEN] only.
18-04-08 31m tweak_threatOnQueen diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 28768 W: 5779 L: 5833 D: 17156
sprt @ 10+0.1 th 1 As discussed with @Stefano80, here's the first of 4 variants of the 50% test. Middlegame only.
18-04-08 31m tweak_threatOnQueen diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 16275 W: 3271 L: 3371 D: 9633
sprt @ 10+0.1 th 1 +50% of Hanging, endgame only.
18-04-08 31m tweak_threatOnQueen diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 39713 W: 8121 L: 8133 D: 23459
sprt @ 10+0.1 th 1 Take 4: +33% of Hanging.
18-04-08 31m tweak_threatOnQueen diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 69622 W: 14216 L: 14117 D: 41289
sprt @ 10+0.1 th 1 I apologize for submitting so many tests; please lower the priority of this one if necessary. Based on the results of hangingQueen and my discussion with @Rocky640 (see GitHub), try increasing ThreatByMinor[QUEEN] and ThreatByRook[QUEEN]. For now, a modest increase (compared to hangingQueen): 25% of the Hanging bonus.
18-04-08 31m tweak_threatOnQueen diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 105815 W: 15856 L: 15698 D: 74261
sprt @ 60+0.6 th 1 LTC for passed take 2. Increase ThreatByMinor[QUEEN] and ThreatByRook[QUEEN] by 50% of the Hanging bonus. See discussion with @Rocky640 on the hangingQueen test.
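The tweak itself is simple arithmetic on (middlegame, endgame) pairs. Below is a sketch with hypothetical names. The example values are made up and are not Stockfish's actual Hanging or ThreatByMinor numbers.

```cpp
#include <cassert>

struct Score { int mg, eg; };  // stand-in for an S(mg, eg) pair

// Raise a threat-on-queen term by pct percent of the Hanging bonus,
// component-wise (integer division truncates).
Score addFractionOf(Score threat, Score hanging, int pct) {
    assert(pct >= 0);
    return { threat.mg + hanging.mg * pct / 100,
             threat.eg + hanging.eg * pct / 100 };
}
```

With made-up inputs ThreatByMinor[QUEEN] = S(10, 20) and Hanging = S(50, 30), +50% gives S(35, 35).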
18-04-08 31m tweak_threatOnQueen diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 9146 W: 1804 L: 1931 D: 5411
sprt @ 10+0.1 th 1 Take 3: +75% of Hanging.
18-04-08 31m tweak_threatOnQueen diff
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 33782 W: 6919 L: 6634 D: 20229
sprt @ 10+0.1 th 1 Since take 1 (+25% of Hanging) is performing reasonably well for a first attempt, here's take 2: +50%.
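This is the one entry here that finished with a positive LLR. As a rough sanity check, the W/L/D counts can be converted into a score fraction and a logistic Elo estimate. Fishtest's own Elo figures may use a different model, so treat this as back-of-the-envelope arithmetic (function names hypothetical).

```cpp
#include <cassert>
#include <cmath>

// Score fraction from game counts: a win counts 1, a draw 1/2.
double scoreFraction(long wins, long losses, long draws) {
    long games = wins + losses + draws;
    assert(games > 0);
    return (wins + 0.5 * draws) / games;
}

// Logistic Elo difference implied by a score fraction 0 < s < 1.
double logisticElo(double s) {
    assert(s > 0.0 && s < 1.0);
    return -400.0 * std::log10(1.0 / s - 1.0);
}
```

For W 6919, L 6634, D 20229 the score is about 50.42%, which corresponds to roughly +2.9 Elo.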
18-04-08 31m overload_extraQ2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 32647 W: 6616 L: 6593 D: 19438
sprt @ 10+0.1 th 1 It seems logical to cross my test with @Rocky640's. Use more_than_one from his test and S(25, 0) from mine.
18-04-08 31m overload_extraQ2 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 23027 W: 4664 L: 4686 D: 13677
sprt @ 10+0.1 th 1 @Rocky640 and I apparently both had the idea to submit all-or-none queen overload within a few minutes of each other! Here's the version I was working on submitting. It differs by giving S(25, 0) and using bool() rather than more_than_one(). Includes some cleanup based on @Rocky640's code.
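The two all-or-none versions differ only in how many overloaded targets they require. Stockfish's more_than_one() checks for two or more set bits with b & (b - 1), and the sketch below reimplements that trick. The function names are hypothetical, and so is the bitboard argument, which is assumed to hold the pieces the queen is overloaded with defending.

```cpp
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;

// True when b has at least two bits set (clearing the lowest bit leaves a bit).
constexpr bool moreThanOne(Bitboard b) { return (b & (b - 1)) != 0; }

// All-or-none middlegame bonus; requireTwo selects more_than_one() over bool().
int queenOverloadMg(Bitboard overloadedByQueen, bool requireTwo, int bonusMg) {
    assert(bonusMg >= 0);
    bool applies = requireTwo ? moreThanOne(overloadedByQueen)
                              : overloadedByQueen != 0;
    return applies ? bonusMg : 0;
}
```

With a single overloaded target, the bool() version awards the S(25, 0) bonus and the more_than_one() version does not.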
18-04-08 31m overload_extraQ diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 15639 W: 3152 L: 3208 D: 9279
sprt @ 10+0.1 th 1 Now that the bonus is applied in fewer cases (due to more_than_one), try increasing the bonus to S(30, 0) again.