Stockfish Testing Queue

Finished - 24151 tests

21-10-14 pr 3fold_1stMove diff
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 89417 W: 18169 L: 18175 D: 53073
sprt @ 15+0.05 th 1 fixing draw score on first repetition without affecting the search
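The LLR line of an sprt entry can be approximately recomputed from its W/L/D totals. The sketch below is a minimal illustration, not the framework's own code, and it rests on assumptions this page does not state: that the test uses the BayesElo draw model with the draw parameter fitted from the observed results, that the bracketed pair (here [-3.00,1.00]) gives the two hypotheses elo0 and elo1 in BayesElo units, and that alpha = beta = 0.05, which is consistent with the (-2.94,2.94) stopping bounds. All function names are hypothetical.

```python
import math

def bayeselo_to_probs(elo, drawelo):
    """Win/loss/draw probabilities under the (assumed) BayesElo draw model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss

def fitted_drawelo(p_win, p_loss):
    """Draw parameter that reproduces the observed win and loss rates."""
    return 200.0 * math.log10((1 - p_loss) / p_loss * (1 - p_win) / p_win)

def sprt_llr(wins, losses, draws, elo0, elo1):
    """Log-likelihood ratio of H1 (elo1) against H0 (elo0) for a W/L/D record."""
    n = wins + losses + draws
    drawelo = fitted_drawelo(wins / n, losses / n)
    w0, l0, d0 = bayeselo_to_probs(elo0, drawelo)
    w1, l1, d1 = bayeselo_to_probs(elo1, drawelo)
    return (wins * math.log(w1 / w0)
            + losses * math.log(l1 / l0)
            + draws * math.log(d1 / d0))

def sprt_bounds(alpha=0.05, beta=0.05):
    """Stopping bounds: accept H0 below the first, accept H1 above the second."""
    return math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)

# Totals and hypotheses taken from the 3fold_1stMove entry above.
llr = sprt_llr(18169, 18175, 53073, elo0=-3.0, elo1=1.0)
lower, upper = sprt_bounds()
print(f"LLR: {llr:.2f} ({lower:.2f},{upper:.2f})")
```

Under these assumptions the result lands close to the reported LLR of 2.95, just past the upper bound, which is why the test stopped as passed.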
20-10-14 jo previousDepth diff
ELO: -68.52 +-4.6 (95%) LOS: 0.0%
Total: 10000 W: 1337 L: 3284 D: 5379
10000 @ 5+0.05 th 1 do we not measure noise this time?
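The ELO, error margin, and LOS figures of a fixed-games entry follow from its W/L/D totals. The sketch below is one plausible reconstruction, not the framework's own code: it assumes the logistic Elo formula, a normal approximation for the 95% interval, and LOS defined as the probability that the true score exceeds 50%. The function names are hypothetical.

```python
import math

def elo_from_score(score):
    """Logistic Elo difference implied by an average score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def summarize(wins, losses, draws):
    """Return (elo, 95% margin, LOS in percent) for a W/L/D record."""
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    score = w + 0.5 * d
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = w * (1 - score) ** 2 + l * score ** 2 + d * (0.5 - score) ** 2
    stdev = math.sqrt(var / n)
    # Map the 95% interval on the score to Elo and take half its width.
    low = elo_from_score(score - 1.96 * stdev)
    high = elo_from_score(score + 1.96 * stdev)
    # LOS: normal approximation to P(true score > 0.5).
    los = 0.5 * (1.0 + math.erf((score - 0.5) / (stdev * math.sqrt(2.0))))
    return elo_from_score(score), (high - low) / 2.0, 100.0 * los

# Totals taken from the previousDepth entry above.
elo, margin, los = summarize(1337, 3284, 5379)
print(f"ELO: {elo:.2f} +-{margin:.1f} (95%) LOS: {los:.1f}%")
```

With these assumptions the totals W: 1337 L: 3284 D: 5379 give roughly -68.5 Elo with a margin near 4.6, in line with the reported line.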
20-10-14 Ro KS_Corner_3 diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 10814 W: 2135 L: 2212 D: 6467
sprt @ 15+0.05 th 1 KS_Corner_3_1: only test that showed some ELO gain. Testing whether that was just luck before testing at LTC.
20-10-14 jo previousDepth diff
ELO: -0.02 +-3.1 (95%) LOS: 49.6%
Total: 19895 W: 4055 L: 4056 D: 11784
20000 @ 15+0.05 th 1 even more aggressive settings...
20-10-14 jo previousDepth diff
ELO: -0.02 +-3.1 (95%) LOS: 49.5%
Total: 18995 W: 3840 L: 3841 D: 11314
20000 @ 15+0.05 th 1 previousDepth with more aggressive settings.
20-10-14 Fi TTpolicy diff
ELO: -0.78 +-3.1 (95%) LOS: 30.9%
Total: 20000 W: 4052 L: 4097 D: 11851
20000 @ 15+0.05 th 1 Make TT replacement policy more symmetric and discerning of generations, always saving the new entry first. Tuning 3
20-10-14 pe tm_shorter_book diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 36462 W: 6208 L: 6244 D: 24010
sprt @ 60+0.05 th 1 STC. TM changes are now done versus the shorter book. But after the change to the shorter book, the number of moves per game made by the engine increased, and the results of current TM tests may be influenced by this. So check whether the relevant parameter is still optimal.
20-10-14 sg backward2 diff
LLR: -0.87 (-2.94,2.94) [-1.50,4.50]
Total: 7324 W: 1475 L: 1486 D: 4363
sprt @ 15+0.05 th 1 less penalty if backward and stopper pawn far away
20-10-14 sg backward1 diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 23182 W: 4745 L: 4787 D: 13650
sprt @ 15+0.05 th 1 Double penalty for very weak backward pawns (stopped by two enemy pawns)
16-10-14 jh sp_nodes diff
LLR: 4.93 (-2.94,2.94) [-3.00,1.00]
Total: 118661 W: 21899 L: 21842 D: 74920
sprt @ 15+0.05 th 3 Try a small simplification.
19-10-14 pe tm_shorter_book diff
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 24191 W: 5004 L: 4829 D: 14358
sprt @ 15+0.05 th 1 STC. TM changes are now done versus the shorter book. But after the change to the shorter book, the number of moves per game made by the engine increased, and the results of current TM tests may be influenced by this. So check whether the relevant parameter is still optimal.
19-10-14 jo previousDepth diff
ELO: 2.43 +-3.1 (95%) LOS: 93.8%
Total: 20000 W: 4204 L: 4064 D: 11732
20000 @ 15+0.05 th 1 teststuff
19-10-14 fw tm_depthbased_simplifie diff
ELO: -0.43 +-3.0 (95%) LOS: 39.0%
Total: 20000 W: 3994 L: 4019 D: 11987
20000 @ 15+0.05 th 1 Simplification (functionally equivalent to version that passed)
17-10-14 lb master diff
ELO: 27.58 +-1.9 (95%) LOS: 100.0%
Total: 40000 W: 7782 L: 4613 D: 27605
40000 @ 60+0.05 th 1 Regression test, standard conditions. Previous one 22.80 +-1.9
19-10-14 sn threats diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 26864 W: 5468 L: 5500 D: 15896
sprt @ 15+0.05 th 1 Take 3 : use maximum threat idea for hanging pieces.
19-10-14 fw tm_depthbased_simplifie diff
ELO: 0.10 +-3.0 (95%) LOS: 52.7%
Total: 20000 W: 4001 L: 3995 D: 12004
20000 @ 15+0.05 th 1 Simplification (slightly more aggressive than version that passed)
19-10-14 Fi TTpolicy diff
LLR: -2.95 (-2.94,2.94) [0.00,6.00]
Total: 39354 W: 6835 L: 6744 D: 25775
sprt @ 60+0.05 th 1 Make TT replacement policy more symmetric and discerning of generations. Tuning 1. Pri -1 Also got +3 in a local 10k test so let's see.
18-10-14 Fi TTpolicy diff
ELO: -0.02 +-3.1 (95%) LOS: 49.6%
Total: 20000 W: 4107 L: 4108 D: 11785
20000 @ 15+0.05 th 1 Make TT replacement policy more symmetric and discerning of generations. Tuning 2. Pri -1
19-10-14 Fi TTpolicy diff
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 36129 W: 7471 L: 7262 D: 21396
sprt @ 15+0.05 th 1 Make TT replacement policy more symmetric and discerning of generations. Tuning 1. Pri -1 Also got +3 in a local 10k test so let's see.
18-10-14 lb threats diff
ELO: -5.90 +-2.3 (95%) LOS: 0.0%
Total: 40000 W: 8704 L: 9383 D: 21913
40000 @ 9+0.03 th 1 take 3 = take 2 + spsa tuned values (40k iterations)
18-10-14 sn threats diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 3390 W: 630 L: 728 D: 2032
sprt @ 15+0.05 th 1 Take 2 : add some multithreats ideas
18-10-14 Fi TTpolicy diff
ELO: 1.22 +-3.1 (95%) LOS: 78.1%
Total: 20000 W: 4108 L: 4038 D: 11854
20000 @ 15+0.05 th 1 Make TT replacement policy more symmetric and discerning of generations. Tuning 1. Pri -1
17-10-14 lb threats diff
ELO: -8.71 +-2.3 (95%) LOS: 0.0%
Total: 40000 W: 8489 L: 9491 D: 22020
40000 @ 9+0.03 th 1 how far are we now?
17-10-14 jo delay_aspiration diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 46904 W: 9582 L: 9558 D: 27764
sprt @ 15+0.05 th 1 Reset aspiration window two iterations later.
17-10-14 sn threats diff
LLR: -2.96 (-2.94,2.94) [0.00,6.00]
Total: 42947 W: 7347 L: 7241 D: 28359
sprt @ 60+0.05 th 1 LTC: get rid of lsb() in threat evaluation by calculating maximum threat of each type
17-10-14 lb threats diff
ELO: -12.04 +-2.3 (95%) LOS: 0.0%
Total: 40000 W: 8364 L: 9750 D: 21886
40000 @ 9+0.03 th 1 how far do we get with that very crude/untuned replacement ?
17-10-14 lb threats^^ diff
ELO: -19.16 +-2.4 (95%) LOS: 0.0%
Total: 37824 W: 7654 L: 9738 D: 20432
40000 @ 9+0.03 th 1 what is the value of threats ?
17-10-14 fw timemanagement_depthbas diff
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 27110 W: 4707 L: 4472 D: 17931
sprt @ 60+0.05 th 1 using interpolated values between initial and tuned values as a conservative estimate. This makes sense as values did not converge.
17-10-14 My pawn_checks diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 14824 W: 2993 L: 3059 D: 8772
sprt @ 15+0.05 th 1 Safe pawn checks
17-10-14 fw timemanagement_depthbas diff
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 11562 W: 2421 L: 2281 D: 6860
sprt @ 15+0.05 th 1 using interpolated values between initial and tuned values as a conservative estimate. This makes sense as values did not converge.
16-10-14 aj multithreats_D diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 47512 W: 9828 L: 9801 D: 27883
sprt @ 15+0.05 th 1 Multithreats, final attempt :STC
17-10-14 sn threats diff
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 10077 W: 2160 L: 2023 D: 5894
sprt @ 15+0.05 th 1 Get rid of lsb() in threat evaluation by calculating maximum threat of each type
16-10-14 fw timemanagement_depthbas diff
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 33329 W: 6897 L: 6696 D: 19736
sprt @ 15+0.05 th 1 tuned values
16-10-14 sn exchange diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 12342 W: 2488 L: 2561 D: 7293
sprt @ 15+0.05 th 1 Consider explicitly threats of winning an exchange : take 2
16-10-14 sn exchange diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 15648 W: 3191 L: 3255 D: 9202
sprt @ 15+0.05 th 1 Consider explicitly threats of winning an exchange : take 1
16-10-14 fw timemanagement_depthbas diff
9897/10000 iterations
20000/20000 games played
20000 @ 15+0.05 th 1 tuning with more variables starting at the optimum that was reached with fewer variables.
16-10-14 ur more_time_for_change_po diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 18014 W: 3609 L: 3666 D: 10739
sprt @ 15+0.05 th 1 Another change in time management, this time not easy move: simply use more time when the ponder move changes and less time when the ponder move stays the same.
16-10-14 sn king_support diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 15724 W: 3192 L: 3255 D: 9277
sprt @ 15+0.05 th 1 King support and rule of square for passed pawns: last try, with tuned values.
16-10-14 lb outpost diff
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 17662 W: 3073 L: 3259 D: 11330
sprt @ 60+0.05 th 1 LTC for Darius: Check ELO of better tuned outpost values. If results are very bad, I will probably abandon this simplification idea.
16-10-14 lb outpost diff
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 44464 W: 9330 L: 9252 D: 25882
sprt @ 15+0.05 th 1 STC for Darius: Check ELO of better tuned outpost values. If results are very bad, I will probably abandon this simplification idea.
16-10-14 aj multithreats_C diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 9842 W: 1912 L: 1992 D: 5938
sprt @ 15+0.05 th 1 Don't compute the same threat twice while computing multithreats : STC
11-10-14 lu instant_mover diff
ELO: -2.08 +-4.4 (95%) LOS: 17.5%
Total: 10000 W: 2026 L: 2086 D: 5888
10000 @ 15+0.05 th 1 naive try at hyper-fast move following a ponder hit, depending on time consumption and branching factor of the previous one: quick measurement at STC, very low pri
14-10-14 lu history_aware_timemanag diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 2290 W: 425 L: 527 D: 1338
sprt @ 15+0.05 th 1 Take into account some stats from previous move when determining current move's importance (fixed): take 1
13-10-14 do outpost diff
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 87652 W: 15339 L: 15642 D: 56671
sprt @ 60+0.05 th 1 LTC: Outpost simplification with locally tuned parameters
15-10-14 fw easyMove@zeroCost6 diff
9914/10000 iterations
20000/20000 games played
20000 @ 15+0.05 th 1 UCI options added. CriticalNumberOfStableMoves is a measure of the required stability of the preferred move (as a function of depth). Nextness is a measure of when the mechanism starts to get applied, as a function of depth vs. previous depth.
15-10-14 ur king_safety_tune diff
ELO: -4.97 +-2.8 (95%) LOS: 0.0%
Total: 25584 W: 5463 L: 5829 D: 14292
40000 @ 7.5+0.05 th 1 testing at fixed number of games to evaluate the value of the change.
15-10-14 lb outpost diff
ELO: -4.49 +-2.3 (95%) LOS: 0.0%
Total: 40000 W: 8768 L: 9285 D: 21947
40000 @ 9+0.03 th 1 step 2. simplified outposts, untuned. do we partially bridge the elo gap? (if not stop there, if yes, tune).
15-10-14 lb outpost^ diff
ELO: -2.31 +-2.3 (95%) LOS: 2.4%
Total: 40000 W: 8881 L: 9147 D: 21972
40000 @ 9+0.03 th 1 step 1: measure outposts
15-10-14 sn king_support diff
9844/10000 iterations
20000/20000 games played
20000 @ 15+0.05 th 1 Tune parameters for king support and rule of square
15-10-14 do outpost diff
ELO: -0.19 +-2.3 (95%) LOS: 43.5%
Total: 40000 W: 9125 L: 9147 D: 21728
40000 @ 7+0.05 th 1 Check ELO of better tuned outpost values. If results are very bad, I will probably abandon this simplification idea.