Stockfish Testing Queue

Finished - 23222 tests

26-01-15 mc skill diff
ELO: 91.38 +-6.9 (95%) LOS: 100.0%
Total: 10000 W: 6139 L: 3568 D: 293
10000 @ 10+0.05 th 1 Double skill level resolution. Measure ELO gap: Level 6 vs Level 5 (take 2)
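The ELO figure in each fixed-games entry follows directly from the W/L/D counts under the usual logistic model. A minimal sketch of that conversion (not fishtest's exact code, which also derives the error bars and LOS from the game variance), applied to the 6139/3568/293 result above:

    #include <cmath>
    #include <cstdio>

    // Convert a match score (fraction of points scored) into an ELO difference.
    double elo_from_score(double score) {
        return -400.0 * std::log10(1.0 / score - 1.0);
    }

    int main() {
        double wins = 6139, losses = 3568, draws = 293;     // Level 6 vs Level 5 result above
        double games = wins + losses + draws;
        double score = (wins + 0.5 * draws) / games;        // draws count as half a point
        std::printf("ELO: %.2f\n", elo_from_score(score));  // prints roughly 91.4
        return 0;
    }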
26-01-15 lb skill diff
ELO: -23.38 +-6.3 (95%) LOS: 0.0%
Total: 10000 W: 4001 L: 4673 D: 1326
10000 @ 15+0.05 th 1 Compare simplified skill to current one: level 8
26-01-15 mc skill diff
ELO: 26.95 +-6.8 (95%) LOS: 100.0%
Total: 10000 W: 5351 L: 4577 D: 72
10000 @ 10+0.05 th 1 Double skill level resolution. Measure ELO gap: Level 1 vs Level 0 (take 2)
26-01-15 sg scale_endgame diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 17589 W: 3465 L: 3524 D: 10600
sprt @ 15+0.05 th 1 Scale down endgame by 13/16 (Take 2)
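The patch idea named here is to damp the endgame component of the evaluation by 13/16 before the usual middlegame/endgame interpolation. A hedged sketch of that shape; 'mg', 'eg' and 'phase' are illustrative names, not the actual Stockfish code:

    // Hedged sketch: scale the endgame part of the score down by 13/16, then
    // blend middlegame and endgame values with a tapered-eval style interpolation.
    // phase runs from 0 (pure endgame) to 128 (pure middlegame) here.
    int blended_score(int mg, int eg, int phase) {
        eg = eg * 13 / 16;                               // the proposed 13/16 damping
        return (mg * phase + eg * (128 - phase)) / 128;
    }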
26-01-15 mc skill diff
ELO: 28.94 +-6.8 (95%) LOS: 100.0%
Total: 10000 W: 5384 L: 4553 D: 63
10000 @ 10+0.05 th 1 Double skill level resolution. Measure ELO gap: Level 1 vs Level 0
26-01-15 mc skill diff
ELO: 69.24 +-6.9 (95%) LOS: 100.0%
Total: 10000 W: 5940 L: 3973 D: 87
10000 @ 10+0.05 th 1 Double skill level resolution. Measure ELO gap: Level 2 vs Level 1
26-01-15 lb skill diff
ELO: -48.18 +-6.5 (95%) LOS: 0.0%
Total: 10000 W: 3862 L: 5240 D: 898
10000 @ 15+0.05 th 1 Compare simplified skill to current one: level 6
26-01-15 lb skill diff
ELO: -14.88 +-6.2 (95%) LOS: 0.0%
Total: 10000 W: 3882 L: 4310 D: 1808
10000 @ 20+0.05 th 1 Compare simplified skill to current one: level 10
26-01-15 My chk_red diff
LLR: -3.00 (-2.94,2.94) [-1.50,4.50]
Total: 12154 W: 2448 L: 2523 D: 7183
sprt @ 15+0.05 th 1 Try reducing reduction for checks, low pri.
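"Reducing reduction for checks" presumably means that, in late move reductions, a move which gives check is reduced by less. A hypothetical sketch; 'baseReduction' and 'givesCheck' are invented names and the one-ply adjustment is an assumption, not the tested patch:

    // Hedged sketch of an LMR tweak: reduce checking moves one ply less.
    // 'baseReduction' would come from the usual depth/move-count reduction table.
    int lmr_reduction(int baseReduction, bool givesCheck) {
        int r = baseReduction;
        if (givesCheck && r > 0)
            r -= 1;        // reduce less when the move gives check
        return r;
    }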
25-01-15 mc skill diff
ELO: -81.26 +-6.8 (95%) LOS: 0.0%
Total: 10000 W: 3613 L: 5910 D: 477
10000 @ 15+0.05 th 1 Compare simplified skill to current one: level 4
26-01-15 Ro PinnedPawn diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 6161 W: 1169 L: 1259 D: 3733
sprt @ 15+0.05 th 1 Removing Pinned Pawns from attacks. See Git Notes.
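The description suggests leaving pinned pawns out of the attacked-squares information used by the evaluation, on the grounds that a pinned pawn usually cannot follow through on its attack. A minimal bitboard sketch; the identifiers are illustrative, and the actual patch lives in Stockfish's evaluation (see its Git notes):

    #include <cstdint>

    using Bitboard = uint64_t;

    // Hedged sketch: exclude pawns pinned to their own king from the set of pawns
    // whose attacks are accumulated into the evaluation's attack bitboards.
    Bitboard attacking_pawns(Bitboard ourPawns, Bitboard pinnedPieces) {
        return ourPawns & ~pinnedPieces;
    }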
25-01-15 mc skill diff
ELO: -96.98 +-7.0 (95%) LOS: 0.0%
Total: 10000 W: 3495 L: 6216 D: 289
10000 @ 15+0.05 th 1 Compare simplified skill to current one: level 3
25-01-15 mc skill diff
ELO: -130.82 +-7.2 (95%) LOS: 0.0%
Total: 10000 W: 3144 L: 6741 D: 115
10000 @ 10+0.05 th 1 Compare simplified skill to current one: level 1
25-01-15 mc skill diff
ELO: -116.85 +-7.1 (95%) LOS: 0.0%
Total: 10000 W: 3274 L: 6516 D: 210
10000 @ 10+0.05 th 1 Compare simplified skill to current one: level 2
26-01-15 lb skill diff
ELO: -171.16 +-7.6 (95%) LOS: 0.0%
Total: 10000 W: 2671 L: 7234 D: 95
10000 @ 9+0.03 th 1 Compare simplified skill to current one: level 0
25-01-15 mc hashfull diff
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 61432 W: 10307 L: 10251 D: 40874
sprt @ 60+0.05 th 1 LTC: Regression test for hashfull patch.
25-01-15 mc hashfull diff
LLR: 0.16 (-2.94,2.94) [0.00,6.00]
Total: 4400 W: 757 L: 730 D: 2913
sprt @ 60+0.05 th 1 LTC: Regression test for hashfull patch.
25-01-15 lb test diff
LLR: -3.72 (-2.94,2.94) [-3.00,1.00]
Total: 103036 W: 20322 L: 20713 D: 62001
sprt @ 15+0.05 th 1 seems ok in local testing
25-01-15 mc hashfull diff
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 7352 W: 1548 L: 1401 D: 4403
sprt @ 15+0.05 th 1 Regression test for hashfull patch.
24-01-15 sg fix_skill_level diff
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 117279 W: 23585 L: 23642 D: 70052
sprt @ 15+0.05 th 1 Verify the skill level fix is not a regression in standard play
24-01-15 sg fix_skill_level diff
ELO: 534.29 +-11.7 (95%) LOS: 100.0%
Total: 20000 W: 19098 L: 863 D: 39
20000 @ 15+0.05 th 1 Disable move pruning at the root node to fix the reported problem when using skill levels (test with skill level 1).
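The fix described above keeps move-count pruning away from the root so that, with a skill level set, every root move is still searched and the weakened move selection has something to choose from. A minimal hypothetical guard; 'isRootNode', 'moveCount' and 'pruningThreshold' are invented names, not the actual patch:

    // Hedged sketch: never apply move-count pruning at the root node.
    bool allow_move_count_pruning(bool isRootNode, int moveCount, int pruningThreshold) {
        if (isRootNode)
            return false;                      // root moves must all be searched
        return moveCount >= pruningThreshold;  // usual late-move pruning condition
    }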
24-01-15 n_ KingSafetySPSANoise diff
ELO: -0.80 +-3.0 (95%) LOS: 30.4%
Total: 20086 W: 4007 L: 4053 D: 12026
40000 @ 15+0.05 th 1 Since the big movers gave an ELO drop, perhaps the "noise" alone will do better than the original values.
24-01-15 sg scale_endgame diff
LLR: -2.97 (-2.94,2.94) [-1.50,4.50]
Total: 48483 W: 9772 L: 9745 D: 28966
sprt @ 15+0.05 th 1 Tuning indicates my start value is already good, so test this now with SPRT.
24-01-15 sg spsa_scale_endgame diff
19353/20000 iterations
39670/40000 games played
40000 @ 15+0.05 th 1 My first tuning attempt breaks eval symmetry, so I now stick to my original approach. Mea culpa.
24-01-15 ro QuickMove diff
LLR: -1.97 (-2.94,2.94) [-1.50,4.50]
Total: 21000 W: 4223 L: 4236 D: 12541
sprt @ 15+0.05 th 1 Changed the alternate move search depth.
24-01-15 jo check_extension diff
LLR: -3.26 (-2.94,2.94) [-1.50,4.50]
Total: 6308 W: 1242 L: 1343 D: 3723
sprt @ 15+0.05 th 1 Also extend checks with negative SEE if remaining depth is small.
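The tested idea extends checking moves even when their static exchange evaluation (SEE) is negative, but only when little depth remains. A hedged sketch; the names and the two-ply threshold are assumptions, not the patch itself:

    // Hedged sketch: one-ply check extension. Normally only checks with
    // non-negative SEE are extended; this variant also extends losing checks
    // when the remaining depth is small.
    int check_extension_plies(bool givesCheck, int seeValue, int remainingDepth) {
        if (!givesCheck)
            return 0;
        return (seeValue >= 0 || remainingDepth <= 2) ? 1 : 0;
    }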
24-01-15 n_ KingSafetySPSANoise diff
ELO: -18.57 +-18.6 (95%) LOS: 2.5%
Total: 543 W: 96 L: 125 D: 322
40000 @ 15+0.05 th 1 Since the big movers only gave an ELO drop, perhaps the "noise" alone will do better than the original values.
23-01-15 n_ KingSafetySPSAClean diff
ELO: -2.04 +-2.1 (95%) LOS: 2.7%
Total: 43000 W: 8491 L: 8744 D: 25765
40000 @ 15+0.05 th 1 Using the SPSA values but with the noise 'cleaned', i.e. keeping only values that changed by at least 5% and by at least 2. Got stopped yesterday due to a wrong bench (after a few thousand games; shouldn't that be checked automatically and immediately if the bench differs?), but I cannot find why the bench from the test branch should differ from the local one, so retesting in the hope it was a quirk in fishtest.
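"Cleaning the noise" as described above keeps an SPSA result only if it moved both by at least 5% and by at least 2 units; otherwise the original value is kept. A hedged sketch of that filter; 'cleaned_value' and its parameters are illustrative, not code from the branch:

    #include <cstdlib>

    // Hedged sketch: accept a tuned value only if it moved by at least 5% and by
    // at least 2 units from the original; otherwise treat the change as noise.
    int cleaned_value(int original, int tuned) {
        int delta = std::abs(tuned - original);
        bool bigRelative = delta * 100 >= 5 * std::abs(original);
        bool bigAbsolute = delta >= 2;
        return (bigRelative && bigAbsolute) ? tuned : original;
    }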
23-01-15 sg spsa_scale_endgame diff
19833/20000 iterations
40000/40000 games played
40000 @ 15+0.05 th 1 The concept seems promising, so first try to optimize the parameters (including mindbreaker's idea)
23-01-15 sg scale_endgame diff
ELO: 1.99 +-2.5 (95%) LOS: 94.2%
Total: 30000 W: 6103 L: 5931 D: 17966
30000 @ 15+0.05 th 1 Measure the effect of scaling down the endgame score. Perhaps this avoids straight exchanges into endgames a little.
22-01-15 mc KingSafetySPSAClean diff
LLR: -1.55 (-2.94,2.94) [-1.50,4.50]
Total: 2867 W: 548 L: 596 D: 1723
sprt @ 15+0.05 th 1 Using the SPSA values but with the noise 'cleaned', i.e. keeping only values that changed by at least 5% and by at least 2.
22-01-15 n_ KingSafetySPSAClean diff
ELO: -16.40 +-41.2 (95%) LOS: 21.7%
Total: 106 W: 18 L: 23 D: 65
40000 @ 15+0.05 th 1 Using the SPSA values but with the noise 'cleaned', i.e. keeping only values that changed by at least 5% and by at least 2.
21-01-15 n_ KingSafetySPSA diff
ELO: 2.07 +-2.1 (95%) LOS: 97.3%
Total: 34859 W: 5967 L: 5759 D: 23133
30000 @ 60+0.05 th 1 The new king safety values do not look like a significant regression at STC. See if there is a clear ELO gain at LTC.
21-01-15 n_ KingSafetySPSA50 diff
ELO: -0.31 +-2.5 (95%) LOS: 40.1%
Total: 31006 W: 6258 L: 6286 D: 18462
40000 @ 15+0.05 th 1 Try SPSA-values with 50% more change.
20-01-15 n_ KingSafetySPSA diff
ELO: 1.38 +-2.1 (95%) LOS: 90.1%
Total: 41981 W: 8529 L: 8362 D: 25090
40000 @ 15+0.05 th 1 Testing the values obtained from the SPSA-session against the branch HighKingSafety.
21-01-15 pe tm diff
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 22326 W: 3561 L: 3750 D: 15015
sprt @ 60+0.05 th 1 LTC. Remove hard stop on unchanging root moves. Take 3. Test as simplification
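The simplification removes a hard time-management cutoff that stopped the search early once the best root move had been stable for long enough, leaving only the smooth instability-based scaling of the time budget. A purely illustrative before/after sketch; every name and threshold below is an assumption, not the removed code:

    // Hedged sketch: the commented-out branch stands in for the removed hard stop;
    // what remains is the usual soft limit scaled by best-move instability.
    bool should_stop(double elapsedMs, double optimumMs, double instabilityFactor,
                     int iterationsWithSameBestMove) {
        (void)iterationsWithSameBestMove;  // only the removed hard stop consulted this
        // if (iterationsWithSameBestMove > 10 && elapsedMs > optimumMs / 4)
        //     return true;                // <-- hard stop removed by the patch
        return elapsedMs > optimumMs * instabilityFactor;
    }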
20-01-15 pe tm diff
LLR: 3.19 (-2.94,2.94) [-3.00,1.00]
Total: 71981 W: 14341 L: 14300 D: 43340
sprt @ 15+0.05 th 1 Remove hard stop on unchanging root moves. Take 3. Test as simplification
20-01-15 pe tm diff
ELO: 1.33 +-2.4 (95%) LOS: 85.7%
Total: 30000 W: 5884 L: 5769 D: 18347
30000 @ 15+0.05 th 1 Remove hard stop on unchanging root moves. Take 3
20-01-15 sg prune_pv diff
LLR: -2.95 (-2.94,2.94) [0.00,6.00]
Total: 9905 W: 1638 L: 1693 D: 6574
sprt @ 60+0.05 th 1 LTC: move count pruning: don't allow pruning followup moves at PV nodes (Take 3)
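This group of prune_pv tests varies which quiet moves (follow-up moves, killer moves, counter moves) stay exempt from move-count pruning at PV nodes. A hypothetical guard; the names and the single boolean flag are simplifications of the real move classification, not the patch:

    // Hedged sketch: allow move-count pruning except for the protected move class
    // (here, follow-up moves) at PV nodes; sibling tests swap in killers/counters.
    bool may_prune(bool isPvNode, bool isFollowupMove, int moveCount, int threshold) {
        if (isPvNode && isFollowupMove)
            return false;                 // keep follow-up moves at PV nodes
        return moveCount >= threshold;
    }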
20-01-15 sg prune_pv diff
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 14011 W: 2890 L: 2744 D: 8377
sprt @ 15+0.05 th 1 move count pruning: don't allow pruning followup moves at PV nodes (Take 3)
20-01-15 sg prune_pv diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 10017 W: 1888 L: 1967 D: 6162
sprt @ 15+0.05 th 1 move count pruning: don't allow pruning killer moves at PV nodes (Take 2)
18-01-15 jk master diff
ELO: 51.71 +-1.9 (95%) LOS: 100.0%
Total: 40000 W: 9633 L: 3723 D: 26644
40000 @ 60+0.05 th 1 SF6_RC1 vs. SF5
20-01-15 sg prune_pv diff
LLR: -2.95 (-2.94,2.94) [-1.50,4.50]
Total: 19173 W: 3774 L: 3828 D: 11571
sprt @ 15+0.05 th 1 After allowing pruning at PV nodes, try to exclude specific moves. Move count pruning: don't allow pruning counter moves at PV nodes (Take 1)
18-01-15 jk master diff
ELO: 48.10 +-1.9 (95%) LOS: 100.0%
Total: 40000 W: 8971 L: 3468 D: 27561
40000 @ 60+0.05 th 3 SF6_RC1 vs. SF5, 3 threads
19-01-15 lb safety diff
LLR: -2.95 (-2.94,2.94) [-0.50,4.50]
Total: 28016 W: 5588 L: 5613 D: 16815
sprt @ 15+0.05 th 1 try a couple of "logic" tweaks on top of Niklas' patch
19-01-15 sg backward_rank2 diff
LLR: -2.96 (-2.94,2.94) [-1.50,4.50]
Total: 6505 W: 1230 L: 1319 D: 3956
sprt @ 15+0.05 th 1 After a failed SPSA tuning, try a simple linear rank-based penalty for backward pawns
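The evaluation term under test is a penalty for backward pawns that grows linearly with the pawn's rank. A minimal sketch; the 4-centipawn-per-rank weight is an invented placeholder, not the tuned value:

    // Hedged sketch: linear rank-based penalty for a backward pawn.
    // relativeRank is 0..7 counted from the pawn's own side of the board.
    int backward_pawn_penalty(int relativeRank) {
        return 4 * relativeRank;
    }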
18-01-15 n_ HighKingSafety diff
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 11529 W: 2075 L: 1882 D: 7572
sprt @ 60+0.05 th 1 LTC: Tuning attempt on king safety. Using the results of the three SPSA sessions and the ELO gain to make a linear extrapolation. High amount of change. These values seem to be the best at STC.
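The "linear extrapolation" mentioned here amounts to pushing each parameter further along the direction the SPSA sessions moved it, with a scale factor that distinguishes the High/Extreme/Crazy variants. A hedged one-liner; the names are illustrative:

    // Hedged sketch: extrapolate a parameter beyond its SPSA-tuned value.
    // factor = 1.0 reproduces the tuned value; larger factors give the more
    // aggressive "High"/"Extreme" variants of the branch.
    double extrapolated(double original, double tuned, double factor) {
        return original + factor * (tuned - original);
    }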
18-01-15 lb razortune diff
44230/40000 iterations
85000/85000 games played
85000 @ 15+0.05 th 1 Tune razor margins *with* the "hack". Indeed, the "hack" is an ELO gain by itself, as shown by the previous unsuccessful tuning.
18-01-15 lb razortune diff
LLR: -3.88 (-2.94,2.94) [-0.50,4.50]
Total: 22354 W: 4420 L: 4506 D: 13428
sprt @ 15+0.05 th 1 test tuned values
18-01-15 n_ ExtremeKingSafety diff
ELO: 1.50 +-2.2 (95%) LOS: 91.2%
Total: 40000 W: 8257 L: 8084 D: 23659
40000 @ 15+0.05 th 1 Tuning attempt on king safety. Using the results of the three SPSA sessions and the ELO gain to make a linear extrapolation. Extreme amount of change. Let's see if the positive results hold and whether I get to test the CrazyKingSafety.