Stockfish Testing Queue

Pending - 0 tests 0.0 hrs

None

Active - 0 tests

Finished - 1061 tests

19-04-21 jos psqt2 diff
ELO: -147.40 +-5.6 (95%) LOS: 0.0%
Total: 10000 W: 1204 L: 5209 D: 3587
10000 @ 10+0.1 th 1 A quick check of manual rook values.
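The ELO and error-margin figures in fixed-game entries such as the one above can be reproduced from the W/L/D totals with the usual logistic Elo model. The following is a minimal sketch, assuming a normal approximation for the 95% interval; the function names are illustrative and this is not fishtest's own code.

```python
import math

def elo_from_score(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, losses, draws, z=1.96):
    """Return (elo, margin) for a W/L/D total.

    z=1.96 is an assumption corresponding to a two-sided 95% interval.
    """
    games = wins + losses + draws
    w, l, d = wins / games, losses / games, draws / games
    mean = w + 0.5 * d  # average score per game
    # Per-game variance of the score (win=1, draw=0.5, loss=0).
    var = w * (1.0 - mean) ** 2 + d * (0.5 - mean) ** 2 + l * mean ** 2
    half_width = z * math.sqrt(var / games)
    elo = elo_from_score(mean)
    # Half the distance between the Elo values at the interval ends.
    margin = (elo_from_score(mean + half_width)
              - elo_from_score(mean - half_width)) / 2.0
    return elo, margin

# Totals taken from the psqt2 entry above.
elo, margin = elo_with_margin(1204, 5209, 3587)
print(f"ELO: {elo:.2f} +-{margin:.1f}")
```

With the psqt2 totals this should land on the figures reported in that entry, up to rounding.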
19-04-21 jos rpt diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 17684 W: 3841 L: 3936 D: 9907
sprt @ 10+0.1 th 1 Test manual values against master.
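The bounds (-2.94,2.94) shown on every sprt entry are consistent with Wald's sequential test at error rates alpha = beta = 0.05. Those error rates are an assumption inferred from the bounds, not something the listing states. A small sketch of how the bounds and the stop decision follow from them:

```python
import math

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald SPRT stopping bounds for the log-likelihood ratio (LLR).

    alpha and beta default to 0.05, an assumption that matches the
    (-2.94, 2.94) bounds printed in the listing.
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper

def sprt_state(llr, alpha=0.05, beta=0.05):
    """Classify an LLR value as 'rejected', 'accepted' or 'running'."""
    lower, upper = sprt_bounds(alpha, beta)
    if llr <= lower:
        return "rejected"
    if llr >= upper:
        return "accepted"
    return "running"

lower, upper = sprt_bounds()
print(f"({lower:.2f},{upper:.2f})")
print(sprt_state(-2.96))  # LLR of the rpt entry above
```

A finished test usually shows an LLR slightly past the bound, such as -2.96, because the value is only checked after batches of games complete.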
19-04-17 jos psqt_file_rank diff
ELO: -408.94 +-15.5 (95%) LOS: 0.0%
Total: 3833 W: 106 L: 3274 D: 453
20000 @ 10+0.1 th 1 Check new values. Baseline was -154 elo!
19-04-16 jos tune_psqt_new diff
95512/100000 iterations
200000/200000 games played
200000 @ 20+0.2 th 1 One more try with different settings, now with different ck values for each piece type. tc=20+0.2 nodestime=600, Pawn ck=20, Knight ck=60, Bishop ck=40, Rook ck=20, Queen ck=30, King ck=80, rk=0.010 (x10 compared to 1st session to allow faster change of values!).
19-04-16 jos tune_psqt_new diff
3005/100000 iterations
6275/200000 games played
200000 @ 20+0.2 th 1 One more try with different settings, tc=20+0.2 nodestime=600, ck=80, rk=0.020 (x20 compared to 1st session to allow faster change of values!).
19-04-16 jos psqt_file_rank diff
ELO: -154.91 +-4.0 (95%) LOS: 0.0%
Total: 20000 W: 2337 L: 10707 D: 6956
20000 @ 10+0.1 th 1 Check values after 2nd tuning session for progress. Baseline was -152 elo.
19-04-15 jos tune_psqt_new diff
95510/100000 iterations
199919/200000 games played
200000 @ 20+0.2 th 1 Second tuning session, after which I will check progress to decide whether it's worth continuing or not.
19-04-14 jos tune_psqt_new diff
95466/100000 iterations
199888/200000 games played
200000 @ 20+0.2 th 1 First tuning session, tc=20+0.2 nodestime=600, ck=60, rk=0.0010. Running with 8moves book to tune for more 'common' opening lines. (I hope everything is set up correctly and fishtest is now able to handle 192 parameters ...)
19-04-14 jos psqt_file_rank diff
ELO: -154.32 +-5.7 (95%) LOS: 0.0%
Total: 10000 W: 1189 L: 5360 D: 3451
10000 @ 10+0.1 th 1 Reset PSTs to zero and calculate them by file and rank. This significantly reduces the parameter space. (A quick first measurement as baseline before tuning.)
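The idea of calculating piece-square tables by file and rank can be sketched as the sum of one per-file term and one per-rank term for each square. The code below is only an illustration: the bonus values are hypothetical, and the split into six piece types and two game phases is an assumption that happens to give the 192 parameters mentioned for the tuning sessions. The actual branch may be organised differently.

```python
PIECE_TYPES = ["pawn", "knight", "bishop", "rook", "queen", "king"]
PHASES = ["mg", "eg"]  # assumption: separate middlegame/endgame values

def build_table(file_bonus, rank_bonus):
    """Build a 64-entry table from 8 file terms and 8 rank terms.

    Squares are indexed rank * 8 + file, with a1 = 0 and h8 = 63.
    """
    return [file_bonus[sq % 8] + rank_bonus[sq // 8] for sq in range(64)]

# 8 file terms + 8 rank terms per piece type and phase,
# instead of 64 independent values per table.
param_count = len(PIECE_TYPES) * len(PHASES) * (8 + 8)

# Hypothetical values for a single piece type and phase.
file_bonus = [-3, -1, 0, 2, 2, 0, -1, -3]
rank_bonus = [0, 1, 2, 3, 4, 5, 6, 0]
table = build_table(file_bonus, rank_bonus)

print(param_count)
print(table[27])  # d4: file index 3, rank index 3
```

Under these assumptions a full set of 64-square tables would need 6 x 2 x 64 = 768 values, so the file-and-rank form cuts the parameter space by a factor of four.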
19-03-20 jos vector_pv diff
LLR: 1.76 (-2.94,2.94) [-3.00,1.00]
Total: 43404 W: 7309 L: 7287 D: 28808
sprt @ 60+0.6 th 1 LTC: Using vectors to build the pv. Seems simpler and easier to understand than the current solution. Test as simplification.
19-03-19 jos vector_pv diff
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 38117 W: 8500 L: 8411 D: 21206
sprt @ 10+0.1 th 1 Using vectors to build the pv. Seems simpler and easier to understand than the current solution. Test as simplification.
19-03-11 jos no_qs_at_root diff
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 134343 W: 29782 L: 29884 D: 74677
sprt @ 10+0.1 th 1 Now that we no longer enter qsearch() while still at root node, we can simplify away 2 changes and modify one assert to catch this more easily in the future. Test for no regression.
19-03-06 jos mcts_aspiration diff
ELO: -6.05 +-4.0 (95%) LOS: 0.1%
Total: 10000 W: 1626 L: 1800 D: 6574
10000 @ 60+0.6 th 1 LTC estimate for information purposes only! (1/3 throughput)
19-03-06 jos mcts_aspiration diff
LLR: -2.96 (-2.94,2.94) [0.50,4.50]
Total: 5575 W: 1195 L: 1333 D: 3047
sprt @ 10+0.1 th 1 Shift the aspiration window towards the mcts-like score. I would expect this to work better at longer time controls (if at all!), but try it anyway.
19-03-02 jos bugfix2_red diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 21374 W: 4633 L: 4692 D: 12049
sprt @ 10+0.1 th 1 How is this one doing?
19-02-23 jos bugfix_red diff
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 81307 W: 17583 L: 17907 D: 45817
sprt @ 10+0.1 th 1 Non-regression test for PR#2017
19-02-22 jos multicut_tweak diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 86892 W: 19163 L: 18985 D: 48744
sprt @ 10+0.1 th 1 Not sure if this can be considered a bugfix, so test as parameter tweak. Allow returning a value outside the AB window, as everywhere else (fail-soft).
19-01-24 jos nullR1 diff
LLR: -2.95 (-2.94,2.94) [0.50,4.50]
Total: 19026 W: 4195 L: 4265 D: 10566
sprt @ 10+0.1 th 1 Retry this old idea.
18-12-21 jos knight_outpost6 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 15140 W: 3221 L: 3278 D: 8641
sprt @ 10+0.1 th 1 Take 2.
18-12-21 jos knight_outpost6 diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 14444 W: 3058 L: 3119 D: 8267
sprt @ 10+0.1 th 1 Extra bonus for a knight outpost on the 6th rank.
18-12-20 jos avg_root_score diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 904 W: 154 L: 283 D: 467
sprt @ 10+0.1 th 1 Experimental root score averaging. Take 1.
18-11-24 jos qssimple diff
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 51067 W: 8625 L: 8553 D: 33889
sprt @ 60+0.6 th 1 Also run at LTC because qsearch is on the hot path. Qsearch simplification. Don't do an extra TT update in case of a fail-high, but simply break off the moves loop and let the TT update at the end of qsearch do this job. Same workflow/logic as in our main search function. Test for no regression.
18-11-24 jos qssimple diff
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 30237 W: 6665 L: 6560 D: 17012
sprt @ 10+0.1 th 1 Qsearch simplification. Don't do an extra TT update in case of a fail-high, but simply break off the moves loop and let the TT update at the end of qsearch do this job. Same workflow/logic as in our main search function. Test for no regression.
18-11-21 jos kingRing8 diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 64546 W: 14218 L: 14128 D: 36200
sprt @ 10+0.1 th 1 Extend kingRing also for a king on rank 8. As a consequence we can now use the faster rank_of(s). See also PR#1824 (Test as parameter tweak.)
18-11-13 jos castling diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 23662 W: 5057 L: 5128 D: 13477
sprt @ 10+0.1 th 1 Slightly differentiate between hypothetical and real castling.
18-11-13 jos kss diff
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 3576 W: 714 L: 886 D: 1976
sprt @ 10+0.1 th 1 Take 2, without a castle bonus.
18-11-13 jos kss diff
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15464 W: 3236 L: 3430 D: 8798
sprt @ 10+0.1 th 1 Simplify further and apply a fixed bonus for castling. Pro: possibly faster and scores before and after castling will now be different in almost all cases. Contra: possibly less precise, and another value to tune.
18-10-31 jos no_pv_pruning diff
ELO: -3.61 +-2.4 (95%) LOS: 0.1%
Total: 30000 W: 5219 L: 5531 D: 19250
30000 @ 30+0.3 th 1 No pruning of any kind at PV nodes. Measure at intermediate tc how much we would have to sacrifice to get non-confusing PVs. Patch to also output long fail-high/fail-low lines included. Half throughput.
18-10-30 jos delay_contempt diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 51553 W: 10646 L: 10533 D: 30374
sprt @ 10+0.1 th 1 No static contempt during opening phase. SF's evals during the opening are most often way off!
18-10-27 jos bestmove_pv diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 8737 W: 1830 L: 1918 D: 4989
sprt @ 10+0.1 th 1 Try a bit harder to get a bestMove at PV nodes. Or, skip Late Move Pruning at PV nodes without a bestMove. See also commit notes.
18-08-11 jos no_value_eg diff
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 17469 W: 3820 L: 4020 D: 9629
sprt @ 10+0.1 th 1 Run a first simplification test. The basic question is: do we really need to assign big scores to (some) won endgames? This not only creates eval 'holes', but also leads to more complicated code. (Value and ScaleFactor based endgames.)
18-06-24 jos simplifyKP diff
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 46069 W: 7949 L: 7870 D: 30250
sprt @ 60+0.6 th 1 LTC: Simplify KingProtector penalty. Apply penalty only to knights and bishops.
18-06-24 jos simplifyKP diff
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 20873 W: 4592 L: 4469 D: 11812
sprt @ 10+0.1 th 1 Simplify KingProtector penalty. Apply penalty only to knights and bishops.
18-06-08 jos time_tweak^ diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 60308 W: 12191 L: 12128 D: 35989
sprt @ 10+0.1 th 1 Tweak timeReduction factor 1.20. Take 1.
18-06-08 jos time_tweak diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 11271 W: 2220 L: 2339 D: 6712
sprt @ 10+0.1 th 1 Tweak timeReduction factor 1.30. Take 2.
18-05-22 jos scaling_last diff
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 46540 W: 9166 L: 9412 D: 27962
sprt @ 10+0.1 th 1 Take 2. Move the setting of the scaling function up, but still below the calculation of the imbalance eval, and don't return SCALE_FACTOR_NONE.
18-05-22 jos scaling_last diff
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 8192 W: 1497 L: 1671 D: 5024
sprt @ 10+0.1 th 1 This simplification patch combines several ideas and is a continuation of ceebo's original idea. 1. Make sure the main eval uses consistent data for the imbalance eval in any case. Also don't calculate imbalance when material is even. 2. Make sure e->factor[c] is fully computed before setting a scaling function. The scaling function might return SCALE_FACTOR_NONE with e->factor[c] still set to SCALE_FACTOR_NORMAL. This can happen, for instance, in KBPKB or KBPPKB endgames. 3. This allows moving the ScaleFactor computation from the main evaluation into the material section where it belongs, imho.
18-05-14 jos mcp2 diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 1701 W: 295 L: 416 D: 990
sprt @ 10+0.1 th 1 Another try at movecount pruning.
18-05-07 jos reset_scores diff
ELO: 0.43 +-2.2 (95%) LOS: 65.1%
Total: 40000 W: 8046 L: 7997 D: 23957
40000 @ 10+0.1 th 1 Measure the effect of resetting all root move scores at each new iteration. See https://github.com/official-stockfish/Stockfish/pull/1579
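The LOS (likelihood of superiority) percentage on fixed-game entries can be approximated from wins and losses alone, since draws do not shift the balance. A minimal sketch using the common normal-approximation formula; that fishtest uses exactly this expression is an assumption:

```python
import math

def los(wins, losses):
    """Likelihood of superiority in [0, 1] from decisive games only."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games: no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))

# Totals taken from the reset_scores entry above.
value = los(8046, 7997)
print(f"LOS: {100 * value:.1f}%")
```

For the reset_scores totals this gives roughly 65%, in line with the entry, and it drops to effectively 0% for lopsided results such as the psqt tests.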
18-04-26 jos fullDepthSearch diff
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 37821 W: 7751 L: 7703 D: 22367
sprt @ 10+0.1 th 1 STC: Do a full depth search if we don't LMR a move.
18-04-06 jos ct_experiment diff
ELO: 171.42 +-3.0 (95%) LOS: 100.0%
Total: 30000 W: 15546 L: 1839 D: 12615
30000 @ 10+0.1 th 1 How is this one doing against a weaker opponent (SF7)? Baseline is fisherman's latest test http://tests.stockfishchess.org/tests/view/5ac4a01f0ebc590305f0f425
18-03-31 jos ct_experiment diff
ELO: -1.68 +-2.1 (95%) LOS: 5.8%
Total: 40000 W: 7464 L: 7657 D: 24879
40000 @ 10+0.1 th 1 A first measurement of a contempt experiment. Can we achieve the same by shifting the draw score instead of the evaluation? (Rescheduled, modifying doesn't put it back into queue ...)
18-03-30 jos ct_experiment diff
ELO: -2.01 +-2.8 (95%) LOS: 7.8%
Total: 22847 W: 4273 L: 4405 D: 14169
40000 @ 10+0.1 th 1 A first measurement of a contempt experiment. Can we achieve the same by shifting the draw score instead of the evaluation?
18-03-26 jos qsft diff
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 25816 W: 5244 L: 5253 D: 15319
sprt @ 10+0.1 th 1 Take 2.
18-03-26 jos qsft diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 45327 W: 9305 L: 9222 D: 26800
sprt @ 10+0.1 th 1 After the recent changes retest this old idea. Take 1.
18-03-21 jos init1 diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 7215 W: 1420 L: 1554 D: 4241
sprt @ 10+0.1 th 1 Take 3.
18-03-20 jos init1 diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 7052 W: 1346 L: 1480 D: 4226
sprt @ 10+0.1 th 1 Tuned values (SPSA), take 2.
18-03-18 jos prefetch_probcut diff
ELO: -0.18 +-2.2 (95%) LOS: 43.9%
Total: 27586 W: 4125 L: 4139 D: 19322
30000 @ 60+0.6 th 1 Let's see at LTC with bigger hash if there is any benefit in having the prefetch in ProbCut.
18-03-18 jos prefetch_probcut diff
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 19086 W: 3881 L: 3921 D: 11284
sprt @ 10+0.1 th 1 Add a prefetch in ProbCut. Not sure this has been tried before.
18-03-16 jos init1 diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 19642 W: 4040 L: 4127 D: 11475
sprt @ 10+0.1 th 1 Try some early tuning values for complexity factors.