Stockfish Testing Queue

Pending - 0 tests 0.0 hrs

None

Active - 0 tests

Finished - 20 tests

19-05-17 Cof novoting diff
LLR: -2.95 (-2.94,2.94) [0.00,3.50]
Total: 47835 W: 7077 L: 7127 D: 33631
sprt @ 60+0.6 th 4 After several simplifications, re-check complete removal of the voting scheme. I think the voting scheme is worse with a small number of threads at LTC. Test removal for Elo gain; if it fails, everything is fine. If this test passes, more tests have to be made. No functional change on one thread.
19-04-15 Cof novoting diff
LLR: -2.96 (-2.94,2.94) [0.00,3.50]
Total: 13519 W: 1919 L: 2057 D: 9543
sprt @ 60+0.6 th 4 Prove that current master is bad with 4 threads and 60+0.6 time control
19-04-14 Cof novoting diff
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 10964 W: 1856 L: 2030 D: 7078
sprt @ 10+0.1 th 8 Complete the novoting test series to check scaling with time: 10+0.1 4 threads, 60+0.6 4 threads and 60+0.6 8 threads.
19-03-26 Cof novoting diff
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 117824 W: 15660 L: 15966 D: 86198
sprt @ 60+0.6 th 8 novoting seems to be an Elo gain on 60+0.6 4 threads, so test it on 60+0.6 8 threads, too. No functional change on 1 thread.
19-03-26 Cof novoting diff
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 12549 W: 1942 L: 1810 D: 8797
sprt @ 60+0.6 th 4 LTC to check the scaling of the thread voting scheme. Is it strong enough to refute a [-3,1] on 60+0.6 with 4 threads? I will stop the test if the number of games goes into the 100k range.
19-03-25 Cof novoting diff
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 44740 W: 8529 L: 8768 D: 27443
sprt @ 10+0.1 th 4 Scaling check for the thread voting scheme. I want to test at 10+0.1 4 threads and then (more important!) at 60+0.6 4 threads. No functional change on 1 thread.
19-02-15 Cof noskip2 diff
LLR: 2.95 (-2.94,2.94) [0.00,3.50]
Total: 30243 W: 4052 L: 3815 D: 22376
sprt @ 120+1.2 th 4 Show Elo gain of noskip (derived from latest master) on VLTC 4 threads
19-01-29 Cof novoting diff
LLR: 0.33 (-2.94,2.94) [-3.00,1.00]
Total: 157875 W: 20295 L: 20483 D: 117097
sprt @ 60+0.6 th 8 Quick scaling check for no thread voting on 60+0.6 8 threads. Should fail quickly. No functional change on 1 thread.
19-01-25 Cof noskip diff
ELO: -1.56 +-3.2 (95%) LOS: 17.1%
Total: 10000 W: 1097 L: 1142 D: 7761
10000 @ 60+0.6 th 30 Try to measure the performance of noskip on 30 threads. I hope 2GB hash is OK for this test. We can stop this test if it takes too much time.
19-01-25 Cof noskip diff
Pending...
sprt @ 60+0.6 th 8 2nd run of noskip at 8 threads and 60+0.6, just to rule out that the first run was a fluke.
19-01-20 Cof noskip diff
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 53093 W: 8198 L: 8431 D: 36464
sprt @ 20+0.2 th 8 LTC 20+0.2 8 threads for removing the thread skipping scheme. No functional change for 1 thread. The 5+0.05 test failed quickly; the skipping scheme seems to be worth ~4 Elo at 5+0.05 8 threads. The reason for this test is that the skipping scheme does NOT scale well with time, so I expect between 0 and -2 Elo. In my local tests (AMD FX CPU) it was 0 Elo; I would like to see it run on fishtest noob hardware. I will limit max games to 20000 to limit the resource usage. If I am wrong that the skipping scheme scales badly, this test will fail quickly (like the 5+0.05 run before).
19-01-23 Cof noskip diff
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15138 W: 2005 L: 1877 D: 11256
sprt @ 60+0.6 th 8 Remove thread skipping scheme. For 8 threads we now have 2 data points on fishtest for "noskip": 20000 games at 5+0.05: -4.93 +-3.0; 21500 games at 20+0.2: -0.05 [-2.66,2.56]. So what happens at a longer time control? Let's try 60+0.6. It will take a lot of resources, so I will set a limit of 20000 games.
19-01-19 Cof noskip diff
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 5042 W: 880 L: 1047 D: 3115
sprt @ 5+0.05 th 8 Remove skipping scheme, no functional change on 1 thread
18-12-04 Cof master diff
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 18796 W: 3002 L: 3096 D: 12698
sprt @ 60+0.6 th 1 Do the same test for LTC, too. I think 64MB is sufficient for 60+0.6, but let's see.
18-12-04 Cof master diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 45589 W: 9878 L: 9864 D: 25847
sprt @ 10+0.1 th 1 The last test clearly showed that 8MB is better for STC. Let's double the hash size one more time and check whether there is still an Elo gain [0,4].
18-12-04 Cof master diff
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 18116 W: 4089 L: 3853 D: 10174
sprt @ 10+0.1 th 1 According to my local tests, 8MB TT is stronger than 4MB on 10+0.1. Test on fishtest with [0,4]; if it passes, we should change the default TT size for STC.
18-09-06 Cof pullreq1663 diff
LLR: -0.16 (-2.94,2.94) [0.00,4.00]
Total: 10085 W: 1618 L: 1595 D: 6872
sprt @ 5+0.05 th 63 Test pullreq1663 for speed gain. 2nd try, I wanted to test with [0,4] only. Target machine: AMD 64-core computer.
18-02-07 Cof O2 diff
LLR: -3.88 (-2.94,2.94) [0.00,4.00]
Total: 22746 W: 4918 L: 5044 D: 12784
sprt @ 10+0.1 th 1 Switch to O2 in general, enable lto for Windows-mingw. Huge gain on my Windows PC with gcc 7.3. I am not sure whether it will work on every Windows machine, and hopefully there is no regression under Linux. Test for Elo gain.
17-12-17 Cof NumaBug diff
ELO: 9.24 +-4.6 (95%) LOS: 100.0%
Total: 5000 W: 634 L: 501 D: 3865
5000 @ 10+0.1 th 39 No functional change for Linux! Only relevant for Windows AND multi-NUMA-node machines. I will put a 40-core machine on fishtest when the test is approved. Please note the Priority +1.
17-06-17 Cof rev_pref_earlier diff
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 71365 W: 13006 L: 12919 D: 45440
sprt @ 10+0.1 th 1 Sorry, it's my first test; I wrote the wrong numbers for the signature. Non-functional change: revert "prefetch earlier", as a parameter patch [0,4].
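
The ELO, LOS and LLR-bound figures in the entries above can be roughly reproduced from the W/L/D totals. The following is a minimal sketch, not fishtest's own code: it assumes the standard logistic Elo formula, a normal approximation for the 95% margin and for LOS, and Wald's SPRT bounds with alpha = beta = 0.05 (the listing does not state alpha and beta; they are inferred from the displayed (-2.94,2.94)). All function names are made up for illustration.

```python
import math


def score(w, l, d):
    """Mean score per game: a win counts 1, a draw 0.5, a loss 0."""
    return (w + 0.5 * d) / (w + l + d)


def elo_from_score(s):
    """Logistic Elo difference corresponding to an expected score s (0 < s < 1)."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def elo_with_margin(w, l, d, z=1.96):
    """Elo estimate and approximate 95% half-width (normal approximation)."""
    n = w + l + d
    s = score(w, l, d)
    # Per-game variance of the score under the observed W/L/D frequencies.
    var = (w * (1.0 - s) ** 2 + l * (0.0 - s) ** 2 + d * (0.5 - s) ** 2) / n
    half = z * math.sqrt(var / n)
    elo = elo_from_score(s)
    margin = (elo_from_score(s + half) - elo_from_score(s - half)) / 2.0
    return elo, margin


def los(w, l):
    """Likelihood of superiority; draws do not enter this approximation."""
    return 0.5 * (1.0 + math.erf((w - l) / math.sqrt(2.0 * (w + l))))


def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's lower and upper LLR stopping bounds (alpha, beta are assumed)."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


if __name__ == "__main__":
    # Totals taken from the 19-01-25 noskip entry (10000 games, 30 threads).
    elo, margin = elo_with_margin(1097, 1142, 7761)
    print(f"ELO: {elo:.2f} +-{margin:.1f} (95%) LOS: {100 * los(1097, 1142):.1f}%")
    lower, upper = sprt_bounds()
    print(f"SPRT bounds: ({lower:.2f},{upper:.2f})")
```

An SPRT run stops as failed once its LLR drops below the lower bound and as passed once it exceeds the upper bound, which is why the finished entries show LLR values just outside +-2.94. How fishtest computes the LLR itself from the [elo0,elo1] interval is not shown in this listing and is not sketched here.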