self q-learning : experiments

Today I'm trying the UCI option Self Q-learning from Brainlearn. For each run, cutechess-cli plays 2000 games at tc30+1 with no opening book, then I merge the individual experience files from each thread (concurrency = 39) and ordo calculates the elo gain.
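
For reference, here is how such a run can be scripted end to end. This is only a minimal sketch: it assumes the brainlearn binary is on the PATH, that the UCI option names match the settings listed in the runs below, and that your cutechess-cli and ordo builds accept these flags (check their help output first).

Code:

# Sketch of one run: 2000 games at tc 30+1, no opening book, 39 concurrent
# games, then an ordo rating report. Engine path, names and flags are
# assumptions, not a verified recipe.
import subprocess

def play_run(pgn_out: str = "run1.pgn") -> None:
    subprocess.run([
        "cutechess-cli",
        # reference engine: reads experience but never writes it
        "-engine", "cmd=brainlearn", "name=brainlearn_only",
        "option.Read only learning=true",
        # learning engine: writes one experience file per concurrent game slot
        "-engine", "cmd=brainlearn", "name=brainlearn_experience",
        "option.Read only learning=false",
        "option.Self Q-learning=true",
        "option.Concurrent Experience=true",
        "-each", "proto=uci", "tc=30+1", "option.Hash=128", "option.Threads=1",
        "-rounds", "1000", "-games", "2",   # 2000 games in total
        "-concurrency", "39",
        "-pgnout", pgn_out,
    ], check=True)

def rate(pgn_in: str = "run1.pgn", report: str = "ratings.txt") -> None:
    # ordo turns the PGN results into rating tables like the ones quoted
    # below; -W asks it to estimate the white advantage (flags from memory).
    subprocess.run(["ordo", "-p", pgn_in, "-W", "-o", report], check=True)

if __name__ == "__main__":
    play_run()
    rate()
    # The 39 per-thread experience files still have to be merged into one
    # file before the next run.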


RUN 1 : "Self Q-learning from scratch"
brainlearn only          : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn experience : 1t, 128MB, no opening, no experience data, Read only learning false, Self Q-learning true, Concurrent Experience true

Code:

  # PLAYER                   :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn only          :       0   ----  1018.0    2000  50.9  118  1800   82  90.0      -6     1
   2 brainlearn experience    :      -6     11   982.0    2000  49.1   82  1800  118  90.0       0     1

White advantage = 27.42 +/- 5.55
Draw rate (equal opponents) = 50.00 % +/- 0.00

A lot of draws. [brainlearn experience] also suffered from having to access 39 individual experience files.
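
As a quick sanity check on that -6: the rating ordo prints is close to what the plain logistic Elo formula gives for a 49.1 % score (ordo additionally models the white advantage, so its number differs a little):

Code:

import math

def elo_diff(score: float) -> float:
    """Elo difference implied by an expected score (logistic model)."""
    return -400 * math.log10(1 / score - 1)

# run 1: brainlearn experience scored 982 / 2000 points = 49.1 %
print(round(elo_diff(982 / 2000), 1))   # -6.3, in line with ordo's -6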


RUN 2 : "Self Q-learning after 2000 games"
brainlearn only          : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn experience : 1t, 128MB, no opening, experience data run1, Read only learning false, Self Q-learning true, Concurrent Experience true

Code:

  # PLAYER                   :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn only          :       0   ----  1003.0    2000  50.1   78  1850   72  92.5      -1     1
   2 brainlearn experience    :      -1     11   997.0    2000  49.9   72  1850   78  92.5       0     1

White advantage = 17.54 +/- 5.62
Draw rate (equal opponents) = 50.00 % +/- 0.00

A lot of draws again, for sure. [brainlearn experience] loses fewer games (118 => 78) but still doesn't win enough (72 vs 78).


RUN 3 : "Self Q-learning after 4000 games"
brainlearn only          : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn experience : 1t, 128MB, no opening, experience data run2, Read only learning false, Self Q-learning true, Concurrent Experience true

Code:

  # PLAYER                   :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn experience    :       8     11  1022.0    2000  51.1   84  1876   40  93.8       0     1
   2 brainlearn only          :       0   ----   978.0    2000  48.9   40  1876   84  93.8       8     1

White advantage = 16.49 +/- 5.62
Draw rate (equal opponents) = 50.00 % +/- 0.00

Still a lot of draws, but [brainlearn only] wins three times fewer games in this run (118 => 40). [brainlearn experience] impresses here.


Conclusions :

Code:

  # PLAYER                   :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn experience    :       0      6  3001.0    6000  50.0  238  5526  236  92.1       0     1
   2 brainlearn only          :       0   ----  2999.0    6000  50.0  236  5526  238  92.1       0     1

White advantage = 20.47 +/- 3.14
Draw rate (equal opponents) = 50.00 % +/- 0.00

After 6000 games, the "0 elo" is misleading: [brainlearn experience] loses time reading the experience files, so it started out weaker from scratch but came back level on score.

[image: LzPfcFx]
Self Q-learning works slowly.

[brainlearn_experience] is almost the only one to have tried c4, Nf3, g3, e3 :
[image: Y1KOMFu]

1 thread at tc30+1 leads to an average D20.
6000 games add about 230k moves to the cumulative experience file:
[image: XHrsEIr]

All data will be uploaded to my MEGA account soon... (more runs to come)

Re: self q-learning : experiments

What's inside the experience file after 6000 games with Self Q-learning:
[image: FTskDkL]
[image: BSStzNG]
[image: https://i.imgur.com/In5BNJa.png]
[image: Up5A4UT]
[image: W6zTBBS]

Re: self q-learning : experiments

During the following runs, I tested without activating Self Q-learning, i.e. with the default learning function.

RUN 4 : "default learning from scratch"
brainlearn only          : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn experience : 1t, 128MB, no opening, no experience data, Read only learning false, Self Q-learning false, Concurrent Experience true

Code:

  # PLAYER                   :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn only          :       0   ----  1000.5    2000  50.0  107  1787  106  89.3      -0     1
   2 brainlearn experience    :       0     11   999.5    2000  50.0  106  1787  107  89.3       0     1

White advantage = 28.63 +/- 5.56
Draw rate (equal opponents) = 50.00 % +/- 0.00

Still a lot of draws, but unlike what we saw in run 1 (Self Q-learning true), here [brainlearn experience] suffered less from accessing the 39 individual experience files.

RUN 5 : "default learning after 2000 games"
brainlearn only          : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn experience : 1t, 128MB, no opening, experience data run4, Read only learning false, Self Q-learning false, Concurrent Experience true

Code:

  # PLAYER                   :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn experience    :       5     11  1013.5    2000  50.7   81  1865   54  93.3       0     1
   2 brainlearn only          :       0   ----   986.5    2000  49.3   54  1865   81  93.3       5     1

White advantage = 18.42 +/- 5.62
Draw rate (equal opponents) = 50.00 % +/- 0.00

After "only" 2000 games, [brainlearn experience] even took the lead !?

RUN 6 : "default learning after 4000 games"
brainlearn only          : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn experience : 1t, 128MB, no opening, experience data run5, Read only learning false, Self Q-learning false, Concurrent Experience true

Code:

  # PLAYER                   :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn experience    :       8     11  1022.5    2000  51.1  174  1697  129  84.8       0     1
   2 brainlearn only          :       0   ----   977.5    2000  48.9  129  1697  174  84.8       8     1

White advantage = 43.17 +/- 5.54
Draw rate (equal opponents) = 50.00 % +/- 0.00

Fewer draws here because [brainlearn experience] got more than twice as many wins (81 => 174), an awesome performance!

Conclusions :

Code:

  # PLAYER                   :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn experience    :       4      6  3035.5    6000  50.6  361  5349  290  89.2       0     1
   2 brainlearn only          :       0   ----  2964.5    6000  49.4  290  5349  361  89.2       4     1

White advantage = 30.04 +/- 3.13
Draw rate (equal opponents) = 50.00 % +/- 0.00

With the default learning function, [brainlearn experience] is already far ahead (+71 points vs +2 points with Self Q-learning).

[image: ZiDp6AO]
We see the same trend (negative elo at the beginning, then the curve climbs), but the default learning function dipped less, so the positive elo came sooner and ended higher.

[brainlearn_experience] is almost the only one to have tried c4 :
[image: YrQp3SL]
Only the 4 main first moves...

1 thread at tc30+1 leads to an average D20.
6000 games add about 230k moves to the cumulative experience file:
[image: DWQg7wP]
As expected, with less variety in the first moves there is also less variety in the common openings.

Re: self q-learning : experiments

What's inside the experience file after 6000 games without Self Q-learning (= the default learning function):
[image: 75ceVzY]
[image: L29Qfop]
[image: IT0m17A]
[image: OkMjGC7]

Re: self q-learning : experiments

With Self Q-learning, the scores looked smaller. With the default learning function, the scores look more typical. I suppose Self Q-learning adjusts the scores according to the results of the played games...
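
For context, this is what a Q-learning style update looks like in its textbook form: the stored value is nudged toward the observed result plus the discounted value of the best follow-up, which naturally damps the scores toward the game results. This is only a generic sketch with made-up learning rate and discount values, not Brainlearn's actual code:

Code:

# Textbook Q-learning update -- illustrative only, NOT Brainlearn's source.
def q_update(q_old: float, reward: float, best_next_q: float,
             alpha: float = 0.5, gamma: float = 0.99) -> float:
    """Nudge the stored value toward reward + discounted best follow-up."""
    return q_old + alpha * (reward + gamma * best_next_q - q_old)

# A position the search scored at +40 cp, but the games through it were
# drawn (reward 0, no better follow-up): the stored score shrinks toward 0.
print(q_update(q_old=40.0, reward=0.0, best_next_q=0.0))   # 20.0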

Re: self q-learning : experiments

https://mega.nz/folder/b5h0RIwC#XcwOGgtjM-uG62Ky6-PSTA

Re: self q-learning : experiments

@deeds wrote:
With Self Q-learning, the scores looked smaller. With the default learning function, the scores look more typical. I suppose Self Q-learning adjusts the scores according to the results of the played games...


Hi Deeds,

If you repeat the tests with concurrency = 1, I think the results would change...
Because I think concurrency = 39 affects the correctness of the results.

All the best.

Re: self q-learning : experiments

0 time forfeits during 12 000 games, 1 thread left free for the system, all cores between 51 and 53 degrees, no problem. Both learning functions work well.

Re: self q-learning : experiments

During the following runs, I tested whether the default learning function without NNUE can reduce the gap against the engine using NNUE.

RUN 7 : "default learning without nnue from scratch"
brainlearn only          : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn class. exp. : 1t, 128MB, no opening, Use NNUE false, EvalFile <empty>, no experience data, Read only learning false, Self Q-learning false, Concurrent Experience true

Code:


   # PLAYER                    :  RATING  ERROR  POINTS  PLAYED   (%)     W     D     L  D(%)  OppAvg  OppN
   1 brainlearn only           :       0   ----  8370.0   10000  83.7  6809  3122    69  31.2    -301     1
   2 brainlearn class. exp.    :    -301      7  1630.0   10000  16.3    69  3122  6809  31.2       0     1

White advantage = 86.54 +/- 3.66
Draw rate (equal opponents) = 50.00 % +/- 0.00

So without NNUE, [brainlearn classical experience] is at -300 elo. The net doesn't really bring 300 elo by itself, because the classical evaluation has become weaker with every release since the NNUE era began.


RUN 8 : "default learning without nnue after 10 000 games"
brainlearn only          : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn class. exp. : 1t, 128MB, no opening, Use NNUE false, EvalFile <empty>, experience data run7, Read only learning false, Self Q-learning false, Concurrent Experience true

Code:


   # PLAYER                    :  RATING  ERROR  POINTS  PLAYED   (%)     W     D     L  D(%)  OppAvg  OppN
   1 brainlearn only           :       0   ----  8116.5   10000  81.2  6310  3613    77  36.1    -265     1
   2 brainlearn class. exp.    :    -265      7  1883.5   10000  18.8    77  3613  6310  36.1       0     1

White advantage = 72.42 +/- 3.54
Draw rate (equal opponents) = 50.00 % +/- 0.00

The default learning function reduces the gap thanks to the experience file merged from the 39 "run 7" experience files.
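
On the merging step itself, the sketch below shows one way a merge could be done, but the record layout used here (key / depth / score / move fields and their sizes) is purely hypothetical; the real Brainlearn .exp binary format is not documented in this thread, so if the engine offers its own merge mechanism, that is the safer route.

Code:

# HYPOTHETICAL experience-file merge. The struct layout below is invented
# for illustration and very likely does not match Brainlearn's real format.
import glob
import struct

RECORD = struct.Struct("<QiiI")   # hypothetical: key, depth, score, move

def merge_experience(pattern: str, out_path: str) -> None:
    best = {}   # (key, move) -> (depth, raw record bytes)
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            while (chunk := f.read(RECORD.size)) and len(chunk) == RECORD.size:
                key, depth, score, move = RECORD.unpack(chunk)
                # keep the deepest entry seen for each (position, move) pair
                if (key, move) not in best or depth > best[(key, move)][0]:
                    best[(key, move)] = (depth, chunk)
    with open(out_path, "wb") as f:
        for _, chunk in best.values():
            f.write(chunk)

# e.g. merge_experience("experience*.exp", "experience.exp")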


RUN 9 : "default learning without nnue after 20 000 games"
brainlearn only          : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn class. exp. : 1t, 128MB, no opening, Use NNUE false, EvalFile <empty>, experience data run8, Read only learning false, Self Q-learning false, Concurrent Experience true

Code:


   # PLAYER                    :  RATING  ERROR  POINTS  PLAYED   (%)     W     D     L  D(%)  OppAvg  OppN
   1 brainlearn only           :       0   ----  8133.0   10000  81.3  6326  3614    60  36.1    -265     1
   2 brainlearn class. exp.    :    -265      7  1867.0   10000  18.7    60  3614  6326  36.1       0     1

White advantage = 64.55 +/- 3.42
Draw rate (equal opponents) = 50.00 % +/- 0.00

There is a kind of plateau: the default learning function is no longer teaching anything.


Conclusions :

Code:


   # PLAYER                    :  RATING  ERROR   POINTS  PLAYED   (%)      W      D      L  D(%)  OppAvg  OppN
   1 brainlearn only           :       0   ----  24619.5   30000  82.1  19445  10349    206  34.5    -276     1
   2 brainlearn class. exp.    :    -276      4   5380.5   30000  17.9    206  10349  19445  34.5       0     1

White advantage = 73.53 +/- 2.04
Draw rate (equal opponents) = 50.00 % +/- 0.00

After 30k games without NNUE, the default learning function managed to gain about 25 elo from the start position.

[image: 4E0zopf]
Considering the pace of learning, perhaps a million games would not be enough to completely close such a gap.
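
A rough check on that claim, using only the figures above and assuming (very optimistically) that the rate of the first 30k games could simply be repeated:

Code:

# Naive linear extrapolation -- deliberately optimistic, since run 9 already
# showed the learning flattening out.
gap_elo = 276                 # pooled deficit over runs 7-9
gained_elo = 25               # recovered during the first 30k games
games_per_elo = 30_000 / gained_elo   # ~1200 games per elo point
print(round(gap_elo * games_per_elo))  # ~331k games even in this best case

Even the linear estimate needs hundreds of thousands of games, and since the gain had already stalled by run 9, the real figure would be far larger, if the gap closes at all.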

[brainlearn_classical_experience] is almost the only one to have tried Nf3, e3, c4 :
[image: Chid9ey]

1 thread at tc30+1 leads to an average D20.
30 000 games add about 762k moves to the cumulative experience file:
[image: OMl55Vk]

Re: self q-learning : experiments

What's inside the experience file after 30 000 games with the default learning function, without NNUE:
[image: POQCGJB]
[image: QubBfx5]
[image: MCzLNUV]
[image: Hp4o9rq]
[image: EQHMTOd]

Re: self q-learning : experiments

During the following runs, I tested whether the Self Q-learning function without NNUE can reduce the gap against the engine using NNUE.

RUN 10 : "Self Q-learning without nnue from scratch"
brainlearn only          : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn class. exp. : 1t, 128MB, no opening, Use NNUE false, EvalFile <empty>, no experience data, Read only learning false, Self Q-learning true, Concurrent Experience true

Code:


   # PLAYER                    :  RATING  ERROR  POINTS  PLAYED   (%)     W    D     L  D(%)  OppAvg  OppN
   1 brainlearn only           :       0   ----  1629.5    2000  81.5  1271  717    12  35.9    -273     1
   2 brainlearn class. exp.    :    -273     15   370.5    2000  18.5    12  717  1271  35.9       0     1

White advantage = 85.79 +/- 7.62
Draw rate (equal opponents) = 50.00 % +/- 0.00

So without NNUE, the Self Q-learning function starts with a better result (-273 elo after 2000 games vs -301 elo after 10 000 games in run 7).


RUN 11 : "Self Q-learning without nnue after 2000 games"
brainlearn only          : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn class. exp. : 1t, 128MB, no opening, Use NNUE false, EvalFile <empty>, experience data run10, Read only learning false, Self Q-learning true, Concurrent Experience true

Code:


   # PLAYER                    :  RATING  ERROR  POINTS  PLAYED   (%)     W    D     L  D(%)  OppAvg  OppN
   1 brainlearn only           :       0   ----  1610.0    2000  80.5  1230  760    10  38.0    -271     1
   2 brainlearn class. exp.    :    -271     15   390.0    2000  19.5    10  760  1230  38.0       0     1

White advantage = 113.80 +/- 7.67
Draw rate (equal opponents) = 50.00 % +/- 0.00

Not sure the Self Q-learning function is learning anything here...


RUN 12 : "Self Q-learning without nnue after 4000 games"
brainlearn only          : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn class. exp. : 1t, 128MB, no opening, Use NNUE false, EvalFile <empty>, experience data run11, Read only learning false, Self Q-learning true, Concurrent Experience true

Code:


   # PLAYER                    :  RATING  ERROR  POINTS  PLAYED   (%)     W    D     L  D(%)  OppAvg  OppN
   1 brainlearn only           :       0   ----  1619.5    2000  81.0  1252  735    13  36.8    -267     1
   2 brainlearn class. exp.    :    -267     15   380.5    2000  19.0    13  735  1252  36.8       0     1

White advantage = 86.11 +/- 7.53
Draw rate (equal opponents) = 50.00 % +/- 0.00

Without using nnue, the Self Q-learning function finds almost nothing.


Conclusions :

Code:


   # PLAYER                    :  RATING  ERROR  POINTS  PLAYED   (%)     W     D     L  D(%)  OppAvg  OppN
   1 brainlearn only           :       0   ----  4859.0    6000  81.0  3753  2212    35  36.9    -270     1
   2 brainlearn class. exp.    :    -270      8  1141.0    6000  19.0    35  2212  3753  36.9       0     1

White advantage = 95.98 +/- 4.44
Draw rate (equal opponents) = 50.00 % +/- 0.00

These conditions don't allow the Self Q-learning function to learn enough good moves. I don't think more games would help either.

[image: WNuckY6]
For sure, a million games would not be enough to completely close such a gap here.

[brainlearn_classical_experience] is almost the only one to have tried Nf3, e3, c3, c4, Nc3 :
[image: MrJnWid]
The Self Q-learning function shows better variety here.

1 thread at tc30+1 leads to an average D20.
6000 games add about 184k moves to the cumulative experience file:
[image: C8Xc3iz]

Re: self q-learning : experiments

What's inside the experience file after 6000 games with the Self Q-learning function, without NNUE:
[image: 509b170]
[image: TSyHDZt]
[image: VKtLean]
[image: WD0ZDpu]

Re: self q-learning : experiments

During the following runs, I tested whether the Self Q-learning function can improve the MCTS Multi search algorithm.

RUN 13 : "MCTS multi with Self Q-learning from scratch"
brainlearn only             : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn mcts exp.   : 1t, 128MB, no opening, MCTS Multi, no experience data, Read only learning false, Self Q-learning true, Concurrent Experience true

Code:


   # PLAYER                  :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn only         :       0   ----  1005.0    2000  50.3   93  1824   83  91.2      -2     1
   2 brainlearn mcts exp.    :      -2     11   995.0    2000  49.8   83  1824   93  91.2       0     1

White advantage = 24.22 +/- 5.59
Draw rate (equal opponents) = 50.00 % +/- 0.00

MCTS Multi did a better job here than the default search algorithm (-2 elo vs -6 elo @ run1).


RUN 14 : "MCTS multi with Self Q-learning after 2000 games"
brainlearn only             : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn mcts exp.   : 1t, 128MB, no opening, MCTS Multi, experience data run13, Read only learning false, Self Q-learning true, Concurrent Experience true

Code:


   # PLAYER                  :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn mcts exp.    :       1     11  1003.5    2000  50.2   65  1877   58  93.8       0     1
   2 brainlearn only         :       0   ----   996.5    2000  49.8   58  1877   65  93.8       1     1

White advantage = 15.26 +/- 5.63
Draw rate (equal opponents) = 50.00 % +/- 0.00

The Self Q-learning really improved the MCTS Multi search algorithm. The default search algorithm did -1 elo after 2000 games (run2).


RUN 15 : "MCTS multi with Self Q-learning after 4000 games"
brainlearn only             : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn mcts exp.   : 1t, 128MB, no opening, MCTS Multi, experience data run14, Read only learning false, Self Q-learning true, Concurrent Experience true

Code:


   # PLAYER                  :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn mcts exp.    :       9     11  1024.5    2000  51.2   79  1891   30  94.5       0     1
   2 brainlearn only         :       0   ----   975.5    2000  48.8   30  1891   79  94.5       9     1

White advantage = 11.75 +/- 5.59
Draw rate (equal opponents) = 50.00 % +/- 0.00

Here, [brainlearn only] gets three times fewer wins (93 => 30)! Self Q-learning mostly taught the MCTS Multi search algorithm to avoid losing.


Conclusions :

Code:


   # PLAYER                  :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn mcts exp.    :       3      6  3023.0    6000  50.4  227  5592  181  93.2       0     1
   2 brainlearn only         :       0   ----  2977.0    6000  49.6  181  5592  227  93.2       3     1

White advantage = 17.07 +/- 3.14
Draw rate (equal opponents) = 50.00 % +/- 0.00

The MCTS Multi search algorithm confirms it does better than the default one (+3 elo vs 0 elo @ run1+2+3).

[image: Vws54Hc]
Self Q-learning also works with the MCTS Multi search algorithm.

[brainlearn_mcts_exp] is almost the only one to have tried c4, Nf3, e3 :
[image: 5Vhphqb]
93% of draws

1 thread at tc30+1 leads to an average D20.
6000 games add about 216k moves to the cumulative experience file:
[image: DuTJR8B]

Re: self q-learning : experiments

What's inside the experience file after 6000 games with Self Q-learning and MCTS Multi:
[image: Yt6KMKm]
[image: C6rvq5y]
[image: VYgbv1Q]
[image: 6BK3aXw]
[image: L3WAbmR]
The move scores are low because the Self Q-learning function updates them according to the results of the played games.

Re: self q-learning : experiments

[image: HYsH1Eo]

Re: self q-learning : experiments

A score of 50.9 to 49.1 on two almost equal engines is no indication that learning works!
Take a single opening and play several games. Then you will see if the engine learns or not. And please don't play hyperbullet games. Give the engines time.
I have NEVER found that the kind of learning Eman, Brainlearn and Shashchess do really helps to increase playing strength.

Re: self q-learning : experiments

Sorry for you, but here both of Brainlearn's learning functions work in every configuration I have tested so far.

About "taking a single opening", maybe you are misinformed but i already did it several times with Eman :
https://outskirts.altervista.org/forum/viewtopic.php?f=6&t=2379
https://outskirts.altervista.org/forum/viewtopic.php?f=6&t=2417
https://www.open-chess.org/viewtopic.php?f=3&t=3317

Brainlearn used 1 thread at tc30+1 and Eman used 7 threads at tc60+1, and their learning functions worked according to the chosen openings, so giving them more time or not changed nothing.

Re: self q-learning : experiments

During the following runs, I tested whether the default learning function can improve the MCTS Multi search algorithm.

RUN 16 : "MCTS multi with default learning function from scratch"
brainlearn only           : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn mcts exp.   : 1t, 128MB, no opening, MCTS Multi, no experience data, Read only learning false, Self Q-learning false, Concurrent Experience true

Code:


   # PLAYER                  :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn only         :       0   ----  1004.5    2000  50.2  115  1779  106  89.0      -2     1
   2 brainlearn mcts exp.    :      -2     11   995.5    2000  49.8  106  1779  115  89.0       0     1

White advantage = 29.33 +/- 5.54
Draw rate (equal opponents) = 50.00 % +/- 0.00

MCTS Multi lost more games here than the default search algorithm (-2 elo vs 0 elo @ run4).


RUN 17 : "MCTS multi with default learning function after 2000 games"
brainlearn only           : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn mcts exp.   : 1t, 128MB, no opening, MCTS Multi, experience data run16, Read only learning false, Self Q-learning false, Concurrent Experience true

Code:


   # PLAYER                  :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn mcts exp.    :      10     11  1028.0    2000  51.4  145  1766   89  88.3       0     1
   2 brainlearn only         :       0   ----   972.0    2000  48.6   89  1766  145  88.3      10     1

White advantage = 32.72 +/- 5.58
Draw rate (equal opponents) = 50.00 % +/- 0.00

After 2000 games, [brainlearn mcts experience] took the lead and even did better than the default search algorithm (+10 elo vs +5 elo @ run5).


RUN 18 : "MCTS multi with default learning function after 4000 games"
brainlearn only           : 1t, 128MB, no opening, no experience data, Read only learning true
brainlearn mcts exp.   : 1t, 128MB, no opening, MCTS Multi, experience data run17, Read only learning false, Self Q-learning false, Concurrent Experience true

Code:


   # PLAYER                  :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn mcts exp.    :      11     11  1030.5    2000  51.5  144  1773   83  88.7       0     1
   2 brainlearn only         :       0   ----   969.5    2000  48.5   83  1773  144  88.7      11     1

White advantage = 33.96 +/- 5.58
Draw rate (equal opponents) = 50.00 % +/- 0.00

After 4000 games, [brainlearn mcts experience] keeps the lead and again does better than the default search algorithm (+11 elo vs +8 elo @ run6).
+11 elo is the best run shown here so far!


Conclusions :

Code:


   # PLAYER                  :  RATING  ERROR  POINTS  PLAYED   (%)    W     D    L  D(%)  OppAvg  OppN
   1 brainlearn mcts exp.    :       6      6  3054.0    6000  50.9  395  5318  287  88.6       0     1
   2 brainlearn only         :       0   ----  2946.0    6000  49.1  287  5318  395  88.6       6     1

White advantage = 31.99 +/- 3.12
Draw rate (equal opponents) = 50.00 % +/- 0.00

As with the Self Q-learning function (run13+14+15), with the default learning function the MCTS Multi search algorithm also does better than the default search algorithm (+6 elo vs +4 elo @ run4+5+6).

[image: 2YCaRc4]
The default learning function works at its best with the MCTS Multi search algorithm.

[brainlearn_mcts_exp] is almost the only one to have tried Nf3 and [brainlearn_only] is the only one to have tried c4 :
[image: 2KktEOd]
Only the 3 main first moves (c4 was played only 2 times here).

1 thread at tc30+1 leads to an average D20.
6000 games add about 232k moves to the cumulative experience file:
[image: IRPxOIz]
Less variety than with the Self Q-learning function, too.

Re: self q-learning : experiments

What's inside the experience file after 6000 games with the default learning function and MCTS Multi:
[image: MKtRua5]
[image: LcDbg3I]
[image: A8IIi0C]
[image: SbsUSpD]
