Stockfish Testing Queue

Pending - 1 tests 0.0 hrs

19-09-17 Ala master diff
ELO: 48.68 +-6.2 (95%) LOS: 100.0%
Total: 3118 W: 640 L: 206 D: 2272
40000 @ 30+0.3 th 8 Multicore regression/progression test against SF10 after "Raise stack size to 8MB for pthreads" of September 16th.
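The ELO, error margin and LOS figures on fixed-length entries such as the one above can be recovered from the W/L/D counts. The following is a minimal sketch, assuming a logistic Elo model, a trinomial variance estimate with a 95% normal interval, and the usual erf-based likelihood of superiority; the function names are illustrative and not taken from the testing framework.

```python
import math

def logistic_elo(score):
    """Convert an expected score in (0, 1) to a logistic Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_summary(wins, losses, draws, z=1.96):
    """Return (elo, half-width of the ~95% interval, LOS) from W/L/D counts."""
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    score = w + 0.5 * d
    # Per-game variance of the score for a win/draw/loss (trinomial) outcome.
    var = w * (1.0 - score) ** 2 + l * (0.0 - score) ** 2 + d * (0.5 - score) ** 2
    half = z * math.sqrt(var / n)
    elo = logistic_elo(score)
    err = (logistic_elo(score + half) - logistic_elo(score - half)) / 2.0
    # Likelihood of superiority: draws carry no information about the sign.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo, err, los

elo, err, los = elo_summary(640, 206, 2272)
print(f"ELO: {elo:.2f} +-{err:.1f} (95%) LOS: {100 * los:.1f}%")
```

With the counts from the pending test (W: 640 L: 206 D: 2272) this lands on the displayed values to within rounding, which suggests, but does not prove, that the page uses an equivalent calculation.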

Active - 0 tests

Finished - 212 tests

19-09-17 Ala ScaleNNB diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 30545 W: 6687 L: 6731 D: 17127
sprt @ 10+0.1 th 1 Scale down eval in NNB vs rook (and potential pawns) endgames, to avoid throwing a win like in SF-Komodo. Doesn't change bench, and probably triggers rarely, so I'm using [0, 4] bounds to avoid the test stopping too quickly.
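For sprt entries, the line "LLR: x (-2.94,2.94) [elo0,elo1]" reads as a log-likelihood ratio, its stopping thresholds, and the two Elo hypotheses. The sketch below assumes a BayesElo-style trinomial model with the draw parameter estimated from the sample, and Wald thresholds with alpha = beta = 0.05. Both are assumptions about what the framework did at the time, and the function names are illustrative.

```python
import math

def bayeselo_probs(elo, drawelo):
    """Win/loss/draw probabilities under the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss

def sprt_llr(wins, losses, draws, elo0, elo1):
    """Log-likelihood ratio of H1 (elo1) against H0 (elo0) for W/L/D counts."""
    n = wins + losses + draws
    w, l = wins / n, losses / n
    # Draw parameter estimated from the observed win and loss frequencies.
    drawelo = 200.0 * math.log10((1.0 - l) / l * (1.0 - w) / w)
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    return (wins * math.log(p1[0] / p0[0])
            + losses * math.log(p1[1] / p0[1])
            + draws * math.log(p1[2] / p0[2]))

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald stopping thresholds: fail below the first, pass above the second."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)

lower, upper = sprt_bounds()
llr = sprt_llr(6687, 6731, 17127, 0.0, 4.0)
print(f"LLR: {llr:.2f} ({lower:.2f},{upper:.2f})")
```

Applied to the ScaleNNB counts with bounds [0.00,4.00], this gives an LLR just below the lower threshold, consistent with the test having stopped as a fail.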
19-09-16 Ala KRP_vars diff
LLR: -2.95 (-2.94,2.94) [0.00,3.50]
Total: 35403 W: 5796 L: 5873 D: 23734
sprt @ 60+0.6 th 1 Test at LTC as this was tuned at LTC. Let's see if this idea shows some promise...
19-09-14 Ala KRP_varsTune diff
57695/60000 iterations
120000/120000 games played
120000 @ 60+0.6 th 1 Experimental tune : different values for a set of parameters in KRP positions. (higher variance)
19-09-13 Ala KRP_varsTune diff
4586/60000 iterations
9545/120000 games played
120000 @ 60+0.6 th 1 Experimental tune : different values for a set of parameters in KRP positions.
19-09-10 Ala EvalParamPart diff
LLR: -2.96 (-2.94,2.94) [0.00,3.50]
Total: 42449 W: 6983 L: 7041 D: 28425
sprt @ 60+0.6 th 1 LTC tune, so test the result at LTC. Partial test with king danger, passed pawns and complexity (other variables were more noisy so I want to see how this compares).
19-09-10 Ala EvalParamFull diff
LLR: -2.96 (-2.94,2.94) [0.00,3.50]
Total: 100779 W: 17320 L: 17209 D: 66250
sprt @ 60+0.6 th 1 Test at LTC since tuning was done at LTC. All tuned parameters.
19-09-02 Ala EvalParamTune diff
93188/100000 iterations
200002/200000 games played
200000 @ 160+1.6 th 1 Eval param tune. Low TP for now so it's there for the next core surge. TC equivalent to 60+0.6 @ 1.6Mnps
19-09-02 Ala EvalParamTune diff
7062/7500 iterations
14734/15000 games played
15000 @ 25+0.25 th 1 2nd STC variance sanity check with updated values after the 1st unsuccessful try.
19-08-30 Ala SeeTune diff
41556/50000 iterations
90212/100000 games played
100000 @ 160+1.6 th 1 Fixed a trivial mistake in the patch (it didn't change the default-depth bench, so I missed it); thanks to rocky for the keen eye. The tune uses nodestime; the effective TC is 60+0.6 @ 1.6Mnps.
19-08-31 Ala EvalParamTune diff
7188/7500 iterations
15000/15000 games played
15000 @ 25+0.25 th 1 15K STC variance sanity check for my eval param tune
19-08-30 Ala SeeTune diff
3252/50000 iterations
7882/100000 games played
100000 @ 160+1.6 th 1 Retry with a higher variance. The tune uses nodestime; the effective TC is 60+0.6 @ 1.6Mnps.
19-08-30 Ala SeeTune diff
2906/50000 iterations
6113/100000 games played
100000 @ 160+1.6 th 1 This is an LTC tune using nodestime; the real TC is equivalent to 60+0.6 @ 1.6Mnps.
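The equivalence between the nominal 160+1.6 clock and 60+0.6 @ 1.6Mnps can be checked with a little arithmetic. With Stockfish's nodestime option the clock is accounted in searched nodes (nodes per millisecond) instead of wall time. The nodestime value used for these tunes is not given in the comments, so the figure computed below is only what the stated numbers imply, and the helper names are hypothetical.

```python
def implied_nodestime(nominal_seconds, effective_seconds, nps):
    """Nodes per millisecond that map the nominal clock to the effective one at `nps`."""
    return nps * effective_seconds / (nominal_seconds * 1000.0)

def effective_seconds(nominal_seconds, nodestime, nps):
    """Wall-clock time that a nominal budget corresponds to on a machine running at `nps`."""
    node_budget = nominal_seconds * 1000.0 * nodestime  # clock in ms times nodes per ms
    return node_budget / nps

# Base time: 160 s nominal, 60 s effective at 1.6 Mnps.
base = implied_nodestime(160, 60, 1.6e6)
# Increment: 1.6 s nominal, 0.6 s effective at the same speed.
inc = implied_nodestime(1.6, 0.6, 1.6e6)
print(base, inc)
print(effective_seconds(160, base, 1.6e6))
```

Base time and increment imply the same ratio, so a single nodestime setting is enough to scale the whole time control.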
19-08-26 Ala master diff
ELO: 44.52 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 8056 L: 2958 D: 28986
40000 @ 30+0.3 th 8 Multicore regression/progression test against SF10 after "Tweak Late Move Reduction at root" of August 26th.
19-08-26 Ala master diff
ELO: 35.63 +-1.9 (95%) LOS: 100.0%
Total: 40000 W: 8109 L: 4021 D: 27870
40000 @ 60+0.6 th 1 Regression/progression test against SF10 after "Tweak Late Move Reduction at root" of August 26th.
19-08-15 Ala FutilityArrayTune diff
42658/45000 iterations
88801/90000 games played
90000 @ 60+0.6 th 1 The first tune results are showing some promise. Further tuning with a lower max depth and a negative lower range bound, so that the value for the improving case at depth 1 can go negative.
19-08-15 Ala UnchallengeableOutpost diff
LLR: -2.96 (-2.94,2.94) [0.50,4.50]
Total: 7877 W: 1683 L: 1809 D: 4385
sprt @ 10+0.1 th 1 First take
19-08-12 Ala FutilityTune diff
47121/50000 iterations
98364/100000 games played
100000 @ 60+0.6 th 1 Futility margin tuning with an array.
19-07-25 Ala master diff
ELO: 37.49 +-1.7 (95%) LOS: 100.0%
Total: 40000 W: 7524 L: 3225 D: 29251
40000 @ 30+0.3 th 8 Multicore regression/progression test against SF10 after "Tweak of SEE pruning condition" of July 25th.
19-07-27 Ala VizKDTune diff
23406/50000 iterations
49276/100000 games played
100000 @ 60+0.6 th 1 Tune for Viz
19-07-26 Ala RookBlockedBP diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 12182 W: 2631 L: 2735 D: 6816
sprt @ 10+0.1 th 1 Test correctly against the latest master (I forgot to update it because bench didn't change).
19-07-26 Ala RookBlockedBP diff
LLR: -0.12 (-2.94,2.94) [0.50,4.50]
Total: 731 W: 156 L: 159 D: 416
sprt @ 10+0.1 th 1 First take. The pattern happens in 0.02% of positions at default bench, in 0.07% of positions with bench 128 1 20.
19-07-25 Ala BishKDTweak diff
LLR: -2.96 (-2.94,2.94) [0.50,4.50]
Total: 18071 W: 4006 L: 4081 D: 9984
sprt @ 10+0.1 th 1 1st take, +16 for same color
19-06-20 Ala master diff
ELO: 30.76 +-1.7 (95%) LOS: 100.0%
Total: 40000 W: 6994 L: 3462 D: 29544
40000 @ 30+0.3 th 8 Multicore regression/progression test against SF10 after "More bonus for free passed pawn" of June 20th.
19-06-20 Ala master diff
ELO: 24.06 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 7313 L: 4547 D: 28140
40000 @ 60+0.6 th 1 Regression/progression test against SF10 after "More bonus for free passed pawn" of June 20th.
19-06-12 Ala 31ac538f96a54b294e79213 diff
ELO: 12.70 +-1.7 (95%) LOS: 100.0%
Total: 40000 W: 5529 L: 4068 D: 30403
40000 @ 30+0.3 th 8 Scheduling this as the framework is near empty. Multicore regression/progression test against SF10 after "A combo of parameter tweaks" of December 13th. Low TP.
19-06-09 Ala master diff
ELO: 19.87 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 7081 L: 4796 D: 28123
40000 @ 60+0.6 th 1 Regression/progression test against SF10 after "Remove depth condition for ttPv" of June 9th.
19-06-07 Ala 651450023619ddea590f301 diff
ELO: 19.77 +-1.7 (95%) LOS: 100.0%
Total: 40000 W: 6002 L: 3728 D: 30270
40000 @ 30+0.3 th 8 Collect more historical data as the framework is not too busy. Multicore regression/progression test against SF10 after "Less king danger if we have a knight near by to defend it." of February 3rd. Low TP.
19-06-08 Ala AspW3 diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 14170 W: 3145 L: 3239 D: 7786
sprt @ 10+0.1 th 1 Take 2
19-06-08 Ala AspW6 diff
LLR: -2.94 (-2.94,2.94) [0.50,4.50]
Total: 12957 W: 2877 L: 2977 D: 7103
sprt @ 10+0.1 th 1 Take 5
19-06-08 Ala AspW5 diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 10009 W: 2179 L: 2294 D: 5536
sprt @ 10+0.1 th 1 Take 4
19-06-08 Ala AspW4 diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 5923 W: 1241 L: 1376 D: 3306
sprt @ 10+0.1 th 1 Take 3
19-06-08 Ala AspW2 diff
LLR: -2.94 (-2.94,2.94) [0.50,4.50]
Total: 5599 W: 1150 L: 1286 D: 3163
sprt @ 10+0.1 th 1 Tweak aspiration window (first take as an elo gainer - if none succeed I'll try a version only affecting high eval values to improve analysis).
19-05-28 Ala 5446e6f408f2ed7fa281dbe diff
ELO: 16.17 +-1.7 (95%) LOS: 100.0%
Total: 40000 W: 5747 L: 3887 D: 30366
40000 @ 30+0.3 th 8 Framework is empty, so collect some historical data. Multicore regression/progression test against SF10 after "Remove pvExact" of January 10th.
19-05-15 Ala master diff
ELO: 28.93 +-1.7 (95%) LOS: 100.0%
Total: 40000 W: 6896 L: 3573 D: 29531
40000 @ 30+0.3 th 8 Multicore regression/progression test against SF10 after "Update failedHighCnt rule" of May 15th. Low TP.
19-05-15 Ala master diff
ELO: 19.76 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 6938 L: 4665 D: 28397
40000 @ 60+0.6 th 1 Regression/progression test against SF10 after "Update failedHighCnt rule" of May 15th.
19-05-06 Ala BadOutpost diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 5647 W: 1205 L: 1342 D: 3100
sprt @ 10+0.1 th 1 Rate a knight outpost depending on how many enemy pieces (non-pawn) are close to it. First attempt with guessed values.
19-05-05 Ala TrappedRookPlus diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 17225 W: 3780 L: 3859 D: 9586
sprt @ 10+0.1 th 1 If a rook on rank 1 is completely surrounded by friendly pieces, consider its mobility to be 0.
19-05-03 Ala BishopPawnShelter diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 15188 W: 3377 L: 3466 D: 8345
sprt @ 10+0.1 th 1 Fixed version
19-05-03 Ala BishopPawnShelter diff
LLR: -2.96 (-2.94,2.94) [0.50,4.50]
Total: 10289 W: 2227 L: 2341 D: 5721
sprt @ 10+0.1 th 1 Bench is the same at default depth, but it changes at higher depth.
19-04-29 Ala GlobalMobility diff
LLR: -1.69 (-2.94,2.94) [0.50,4.50]
Total: 421 W: 63 L: 158 D: 200
sprt @ 10+0.1 th 1 The idea behind this test is to not only evaluate each piece mobility separately, but to look at all the mobilities together to better detect cramped positions. First crude attempt.
19-04-24 Ala master diff
ELO: 16.39 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 6519 L: 4634 D: 28847
40000 @ 60+0.6 th 1 Regression/progression test against SF10 after "Remove useless initializations" of April 24th.
19-04-19 Ala 82ad9ce9cfb0eff33f1d781 diff
ELO: 24.33 +-1.7 (95%) LOS: 100.0%
Total: 40000 W: 6430 L: 3633 D: 29937
40000 @ 30+0.3 th 8 As the framework is empty, try out a multicore regression test (same commit as last regular regression test) as suggested in issue #2094. Very low TP.
19-04-07 Ala KDconversionTune diff
9676/10000 iterations
20000/20000 games played
20000 @ 10+0.1 th 1 STC verification for a new set of improved variance parameters.
19-04-07 Ala KDconversionTune diff
1343/10000 iterations
2779/20000 games played
20000 @ 10+0.1 th 1 The conversion formula from king danger to mg and eg scores is somewhat arbitrary. This tune is meant to check the resilience of the kd->score mapping and, if the values deviate significantly, to guide future attempts at new formulas. This is an STC verification for variance parameters, as it is critical that the values move slowly enough to not mess up ordering, but that they still move.
19-04-06 Ala CenterBlockPawns diff
LLR: -1.78 (-2.94,2.94) [0.00,3.50]
Total: 62974 W: 10958 L: 10883 D: 41133
sprt @ 60+0.6 th 1 This was tuned at LTC and ended up neutral at STC. Spec LTC, low TP.
19-04-06 Ala CenterBlockPawns diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 32508 W: 7235 L: 7238 D: 18035
sprt @ 10+0.1 th 1 Test tune results
19-04-02 Ala CenterBlockPawnsTune diff
71115/75000 iterations
148293/150000 games played
150000 @ 60+0.6 th 1 Variance lowered after the STC test indicated it should still allow values to move. Very low TP to start with, so as not to slow down the current spec LTCs; I may adapt later depending on fishtest load.
19-04-02 Ala CenterBlockPawnsTune diff
6700/7500 iterations
14263/15000 games played
15000 @ 10+0.1 th 1 This short STC tuning run is meant to sanity-check the variance values before running a long LTC tune.
19-04-01 Ala master diff
ELO: 16.58 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 6649 L: 4742 D: 28609
40000 @ 60+0.6 th 1 Regression/progression test against SF10 after "Assorted trivial cleanups 3/2019" of March 31st.
19-03-14 Ala VizMinorPawnTune diff
48030/50000 iterations
100000/100000 games played
100000 @ 30+0.3 th 1 Restart tune for Viz with higher variance as values moved less than expected