Stockfish Testing Queue

Pending - 0 tests 0.0 hrs

None

Active - 0 tests

Finished - 258 tests

19-11-04 Ala master diff
ELO: 52.90 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 8645 L: 2601 D: 28754
40000 @ 30+0.3 th 8
Multicore regression/progression test against SF10 after "Rook PSQT Tuned" of November 5th.
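The ELO, error bar and LOS figures of fixed-games entries such as the one above can be recomputed from the W/L/D counts. The following is a minimal sketch, assuming a logistic Elo model, a trinomial variance estimate and a normal approximation for LOS; the function names are illustrative and not taken from the testing framework's source.

```python
import math


def elo_from_score(score):
    """Logistic Elo difference corresponding to an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_stats(wins, losses, draws, z=1.959964):
    """Return (elo, 95% margin, LOS) for a W/L/D record.

    Assumptions: games are independent, and the margin is half the width
    of the Elo interval obtained from score +- z standard errors.
    """
    n = wins + losses + draws
    pw, pl, pd = wins / n, losses / n, draws / n
    score = pw + 0.5 * pd
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = pw * (1.0 - score) ** 2 + pl * score ** 2 + pd * (0.5 - score) ** 2
    se = math.sqrt(var / n)
    elo = elo_from_score(score)
    margin = (elo_from_score(score + z * se) - elo_from_score(score - z * se)) / 2.0
    # Likelihood of superiority: draws carry no information about the sign.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo, margin, los


if __name__ == "__main__":
    elo, margin, los = elo_stats(8645, 2601, 28754)
    print(f"ELO: {elo:.2f} +-{margin:.1f} (95%) LOS: {100.0 * los:.1f}%")
```

With the counts of the entry above (W: 8645 L: 2601 D: 28754), this should land close to the reported 52.90 +-1.8.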
19-11-04 Ala master diff
ELO: 42.20 +-1.9 (95%) LOS: 100.0%
Total: 40000 W: 8521 L: 3686 D: 27793
40000 @ 60+0.6 th 1
Regression/progression test against SF10 after "Rook PSQT Tuned" of November 5th.
19-10-21 Ala KS_ES_Tune diff
7116/7500 iterations
14842/15000 games played
15000 @ 10+0.1 th 1
Tune inspired by Bryan's suggestion of using safe escape squares to modulate safe check penalty. Short STC tune to sanity-check the variance.
19-10-19 Ala KSTunedFull diff
LLR: -2.94 (-2.94,2.94) [0.00,3.50]
Total: 34468 W: 5560 L: 5640 D: 23268
sprt @ 60+0.6 th 1
I don't really expect this to pass, but as this was tuned at LTC, let's see how it goes... Low TP.
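The LLR figures of sprt entries can be approximated from the W/L/D counts and the bracketed Elo bounds. The sketch below rests on two assumptions that the listing does not state: that the bounds (here [0.00,3.50]) are expressed in BayesElo units, and that the LLR uses a normal approximation of the score distribution. The function names are illustrative.

```python
import math


def sprt_bounds(alpha=0.05, beta=0.05):
    """Lower and upper LLR stopping bounds for the given error rates."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def bayeselo_to_score(belo, drawelo):
    """Expected score for a BayesElo difference and a draw parameter."""
    p_win = 1.0 / (1.0 + 10.0 ** ((-belo + drawelo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((belo + drawelo) / 400.0))
    # score = p_win + p_draw / 2, rewritten with p_draw = 1 - p_win - p_loss
    return 0.5 + 0.5 * (p_win - p_loss)


def sprt_llr(wins, losses, draws, belo0, belo1):
    """Approximate log-likelihood ratio of H1 (belo1) against H0 (belo0)."""
    n = wins + losses + draws
    pw, pl, pd = wins / n, losses / n, draws / n
    # Draw parameter estimated from the observed frequencies (assumption).
    drawelo = 200.0 * math.log10((1.0 - pl) / pl * (1.0 - pw) / pw)
    s0 = bayeselo_to_score(belo0, drawelo)
    s1 = bayeselo_to_score(belo1, drawelo)
    score = pw + 0.5 * pd
    var = pw * (1.0 - score) ** 2 + pl * score ** 2 + pd * (0.5 - score) ** 2
    # Normal approximation of the log-likelihood ratio.
    return n * (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var)


if __name__ == "__main__":
    print(sprt_bounds())
    print(sprt_llr(5560, 5640, 23268, 0.0, 3.5))
```

With 5% error rates the bounds come out near (-2.94, 2.94), and the counts of the entry above give an LLR close to the lower bound, which is consistent with a failed test.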
19-10-19 Ala KSTunedFull diff
LLR: -2.96 (-2.94,2.94) [0.50,4.50]
Total: 11509 W: 2448 L: 2556 D: 6505
sprt @ 10+0.1 th 1
Test final tuned values
19-10-15 Ala KDTweakTune diff
57476/60000 iterations
120003/120000 games played
120000 @ 60+0.6 th 1
King shelter and king danger are usually tweaked or tuned separately. There is a serious flaw in doing so: the king shelter gives a flat eval bonus, and its mg part also contributes to king danger. It is likely that the ideal underlying values for the flat bonus and for the king danger reduction are different, but we would not find out with our usual methodology. This tune is an attempt at changing both together.
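The coupling described in this entry can be illustrated with a small sketch. Everything here is hypothetical: the `Score` type, the function names, the `attack_units` input and the 6/8 weight are stand-ins chosen for illustration, not the engine's actual code or values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Score:
    mg: int  # middlegame component
    eg: int  # endgame component


def king_terms_coupled(shelter: Score, attack_units: int) -> Tuple[Score, int]:
    """One shelter score drives both the flat bonus and the danger reduction."""
    # Illustrative weight: a share of the shelter's mg value lowers king danger.
    king_danger = attack_units - 6 * shelter.mg // 8
    return shelter, king_danger


def king_terms_decoupled(
    shelter_bonus: Score, shelter_danger_mg: int, attack_units: int
) -> Tuple[Score, int]:
    """Separate parameters, so a tuner can move the two effects independently."""
    king_danger = attack_units - 6 * shelter_danger_mg // 8
    return shelter_bonus, king_danger


if __name__ == "__main__":
    print(king_terms_coupled(Score(80, 0), 100))
    print(king_terms_decoupled(Score(80, 0), 40, 100))
```

In the coupled form, any change to the shelter value shifts both the flat bonus and the king danger at once; the decoupled form exposes the two effects as separate tunable quantities.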
19-10-18 Ala RookSemiOpenFile diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 12163 W: 2618 L: 2722 D: 6823
sprt @ 10+0.1 th 1
Take 1
19-10-17 Ala DoubledPawn1 diff
LLR: -2.95 (-2.94,2.94) [0.00,3.50]
Total: 8572 W: 1360 L: 1512 D: 5700
sprt @ 60+0.6 th 1
STC failed. Was tuned at LTC, spec LTC low TP.
19-10-17 Ala DoubledPawn1 diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 21534 W: 4733 L: 4791 D: 12010
sprt @ 10+0.1 th 1
Test final tuned values
19-10-14 Ala DoubledPawnTune diff
71241/75000 iterations
150010/150000 games played
150000 @ 60+0.6 th 1
Now that the new terms have better approximations, continue the tune at LTC and with lower variance
19-10-16 Ala BishopSeventh diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 33051 W: 7157 L: 7159 D: 18735
sprt @ 10+0.1 th 1
Take 2, value 40
19-10-16 Ala BishopSeventh diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 13060 W: 2788 L: 2888 D: 7384
sprt @ 10+0.1 th 1
Penalty for a bishop trapped behind enemy pawns, take 1
19-10-15 Ala KDShelterTuned diff
LLR: -0.88 (-2.94,2.94) [0.00,3.50]
Total: 12642 W: 2073 L: 2090 D: 8479
sprt @ 60+0.6 th 1
The partial results are not doing so well at STC. Does this also hold at LTC? Low TP.
19-10-15 Ala SpaceTweak diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 23703 W: 5185 L: 5232 D: 13286
sprt @ 10+0.1 th 1
Take 1
19-10-15 Ala KDShelterTuned diff
LLR: -1.96 (-2.94,2.94) [0.50,4.50]
Total: 15264 W: 3305 L: 3339 D: 8620
sprt @ 10+0.1 th 1
Test partial tune results
19-10-15 Ala OwnPawnComplexity diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 12205 W: 2578 L: 2682 D: 6945
sprt @ 10+0.1 th 1
Less eg initiative for the strong side if pawnless.
19-10-14 Ala DoubledPawn1 diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 23538 W: 5166 L: 5214 D: 13158
sprt @ 10+0.1 th 1
Test partial tune results
19-10-13 Ala DoubledPawnTune diff
56493/60000 iterations
119993/120000 games played
120000 @ 20+0.2 th 1
Short TC first to get decent approximations faster for the new eval terms.
19-10-13 Ala DoubledIsolated diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 9361 W: 1997 L: 2115 D: 5249
sprt @ 10+0.1 th 1
Extra penalty for pawns that are doubled and isolated.
19-10-11 Ala KingShelterTune diff
34248/75000 iterations
71589/150000 games played
150000 @ 160+1.6 th 1
Experimental King Shelter tune. Uses nodestime, equivalent to 60+0.6s TC (fixed)
19-10-12 Ala KingPawnFile diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 12034 W: 2128 L: 2233 D: 7673
sprt @ 27+2.7 th 1
The bulk of the computations done in this patch depend only on king and pawn positions, and should be cached for efficiency. However, I get bench changes when I try to do so. Before further troubleshooting, I'd like to check if, aside from the slowdown, this is an improvement over the current master. Hence, nodestime to artificially ignore the slowdown affecting the tested version.
19-10-11 Ala KingShelterTune diff
96/75000 iterations
234/150000 games played
150000 @ 160+1.6 th 1
Experimental King Shelter tune. Uses nodestime, equivalent to 60+0.6s TC
19-10-11 Ala BishopFlanks diff
LLR: -2.96 (-2.94,2.94) [0.50,4.50]
Total: 18860 W: 4026 L: 4098 D: 10736
sprt @ 10+0.1 th 1
Endgame bonus for bishops when there are pawns on both sides, take 1.
19-10-11 Ala ImbalanceComplexity2 diff
LLR: -2.96 (-2.94,2.94) [0.50,4.50]
Total: 12668 W: 2699 L: 2801 D: 7168
sprt @ 10+0.1 th 1
Take 2: 50% effect
19-10-11 Ala ImbalanceComplexity diff
LLR: -2.96 (-2.94,2.94) [0.50,4.50]
Total: 11677 W: 2515 L: 2622 D: 6540
sprt @ 10+0.1 th 1
Give a bonus to the attacker for positions with a more complex material imbalance. Take 1.
19-10-11 Ala ImbalanceComplexity3 diff
LLR: -2.96 (-2.94,2.94) [0.50,4.50]
Total: 8848 W: 1895 L: 2016 D: 4937
sprt @ 10+0.1 th 1
Take 3: 150% effect
19-10-06 Ala master diff
ELO: 51.76 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 8543 L: 2628 D: 28829
40000 @ 30+0.3 th 8
Multicore regression/progression test against SF10 after "Introduce separate counter-move tables for captures" of October 6th.
19-10-06 Ala EvalWindow01 diff
LLR: 2.95 (-2.94,2.94) [0.00,3.50]
Total: 155553 W: 25745 L: 25141 D: 104667
sprt @ 60+0.6 th 1
LTC for take 3
19-10-06 Ala EvalWindow01 diff
LLR: 2.96 (-2.94,2.94) [0.50,4.50]
Total: 60102 W: 13327 L: 12868 D: 33907
sprt @ 10+0.1 th 1
Take 3
19-10-06 Ala EvalWindow01 diff
LLR: -2.96 (-2.94,2.94) [0.50,4.50]
Total: 9837 W: 2072 L: 2188 D: 5577
sprt @ 10+0.1 th 1
Take 2
19-10-06 Ala EvalWindow01 diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 16040 W: 3497 L: 3582 D: 8961
sprt @ 10+0.1 th 1
First take
19-10-06 Ala master diff
ELO: 41.96 +-1.9 (95%) LOS: 100.0%
Total: 40000 W: 8553 L: 3746 D: 27701
40000 @ 60+0.6 th 1
Regression/progression test against SF10 after "Introduce separate counter-move tables for captures" of October 6th.
19-10-04 Ala SupportedPawnArray diff
LLR: -2.96 (-2.94,2.94) [0.00,3.50]
Total: 36238 W: 5966 L: 6041 D: 24231
sprt @ 60+0.6 th 1
Spec LTC, as it was tuned at LTC and got a 50K yellow.
19-10-04 Ala SupportedPawnArray diff
LLR: -2.96 (-2.94,2.94) [0.50,4.50]
Total: 50406 W: 11019 L: 10936 D: 28451
sprt @ 10+0.1 th 1
Test tune results.
19-10-03 Ala SPATune2 diff
70495/75000 iterations
148898/150000 games played
150000 @ 60+0.6 th 1
Tuning for a different take on the supported pawn array. Reusing previous data as a starting point.
19-10-03 Ala LoneQueen diff
LLR: -2.94 (-2.94,2.94) [0.50,4.50]
Total: 33798 W: 7340 L: 7338 D: 19120
sprt @ 10+0.1 th 1
Take 2 (SF=20)
19-10-03 Ala LoneQueen diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 35480 W: 7810 L: 7799 D: 19871
sprt @ 10+0.1 th 1
Reduce scale factor in lone queen vs 2+ enemy pieces endgames.
19-10-03 Ala LoneQueen diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 19496 W: 4230 L: 4298 D: 10968
sprt @ 10+0.1 th 1
Take 3 (SF=12)
19-09-28 Ala TrappedRookA diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 18418 W: 4058 L: 4131 D: 10229
sprt @ 10+0.1 th 1
Probably too many bitboard operations to work at STC, but let's see how it fares.
19-09-27 Ala SupportedPawnArray diff
LLR: -0.94 (-2.94,2.94) [0.00,3.50]
Total: 188177 W: 31173 L: 30708 D: 126296
sprt @ 60+0.6 th 1
Tuned at LTC, so test at LTC. Values are still somewhat messy, but they should be close enough overall to see if the idea works.
19-09-26 Ala SPATune diff
71696/75000 iterations
149859/150000 games played
150000 @ 60+0.6 th 1
Values have not stabilized yet, so continue the tune.
19-09-26 Ala 667d24f22743959ceddda6a diff
ELO: 38.97 +-1.9 (95%) LOS: 100.0%
Total: 40000 W: 8325 L: 3857 D: 27818
40000 @ 60+0.6 th 1
Regression/progression test against SF10 after "Increase weight for supported pawns" of September 24th.
19-09-26 Ala PinnableQueen diff
LLR: -2.96 (-2.94,2.94) [0.50,4.50]
Total: 10646 W: 2288 L: 2400 D: 5958
sprt @ 10+0.1 th 1
First take
19-09-26 Ala SPATune diff
56910/60000 iterations
119653/120000 games played
120000 @ 60+0.6 th 1
Restart with higher variance (not sure how having tons of machines influences SPSA, but it looked too static)
19-09-26 Ala SPATune diff
465/60000 iterations
5527/120000 games played
120000 @ 60+0.6 th 1
LTC Tuning: differentiate supported pawn value depending on file and supporting pawn status.
19-09-17 Ala master diff
ELO: 46.57 +-1.8 (95%) LOS: 100.0%
Total: 39250 W: 7994 L: 2764 D: 28492
40000 @ 30+0.3 th 8
Multicore regression/progression test against SF10 after "Raise stack size to 8MB for pthreads" of September 16th.
19-09-17 Ala ScaleNNB diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 30545 W: 6687 L: 6731 D: 17127
sprt @ 10+0.1 th 1
Scale down eval in NNB vs rook (and potential pawns) endgames, to avoid throwing a win like in SF-Komodo. Doesn't change bench, and probably triggers rarely, so I'm using [0, 4] bounds to avoid the test stopping too quickly.
19-09-16 Ala KRP_vars diff
LLR: -2.95 (-2.94,2.94) [0.00,3.50]
Total: 35403 W: 5796 L: 5873 D: 23734
sprt @ 60+0.6 th 1
Test at LTC as this was tuned at LTC. Let's see if this idea shows some promise...
19-09-14 Ala KRP_varsTune diff
57695/60000 iterations
120000/120000 games played
120000 @ 60+0.6 th 1
Experimental tune: different values for a set of parameters in KRP positions. (higher variance)
19-09-13 Ala KRP_varsTune diff
4586/60000 iterations
9545/120000 games played
120000 @ 60+0.6 th 1
Experimental tune: different values for a set of parameters in KRP positions.