based on the statistics and probability in math

the question is

how many matches should you run for one engine Vs. one engine

when we consider the sweet spot

cuz of course running a tourney for 40k rounds is good

cuz the error fluctuation is +2/-2

so the result is very accurate

but ppl can't run that many rounds

unless ppl have supercomputers

so realistically we have to decrease the total number of rounds for one Vs. one
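
for reference, the +2/-2 at 40k rounds is about what the usual normal approximation gives. A rough sketch, assuming two roughly equal engines, independent games, a 95% interval and a draw ratio $d$ (the symbols $N$, $d$ and $\sigma$ are my own labels, not taken from any rating tool):

```latex
\sigma = \tfrac{1}{2}\sqrt{1-d}, \qquad
\Delta_{\text{score}} \approx 1.96\,\frac{\sigma}{\sqrt{N}}, \qquad
\Delta_{\text{Elo}} \approx \frac{1600}{\ln 10}\,\Delta_{\text{score}}
\approx \frac{681\,\sqrt{1-d}}{\sqrt{N}}
```

with $N = 40000$ this comes out to about +3.4/-3.4 with no draws and about +2/-2 when roughly 65% of the games are drawn. The important part is the $1/\sqrt{N}$: cutting the error in half costs four times as many games.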

***

if engine A and engine B play 30 matches

I think the error fluctuation is +100 and -100 for both engines

so the total error level is +200 or -200

so the tourney result is useless cuz the error gap is 200

if engine A and engine B play 100 matches

I think the error fluctuation is +50 and -50 for both engines

so the total error level is +100 or -100

so the tourney result is still useless cuz the error gap is 100

if engine A and engine B play 300 matches

I think the error fluctuation is +36 and -36 for both engines

so the total error level is +72 or -72

so the tourney result is kind of usable, though the error gap is still big at 72
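
the three guesses above can be sanity-checked with a few lines of Python. This is only a sketch under assumptions: both engines are close in strength, games are independent, the interval is 95% (`z=1.96`), and the Elo curve is linearized around a 50% score. The function name `elo_margin` and the `draw_ratio` parameter are made up for this example.

```python
import math

def elo_margin(n_games, draw_ratio=0.0, z=1.96):
    """Approximate +/- Elo margin for a match between two near-equal engines."""
    # Per-game score standard deviation when wins and losses are equally likely
    # (assumption: equal engines, draws score 0.5).
    sigma = 0.5 * math.sqrt(1.0 - draw_ratio)
    # Margin on the average score, using the normal approximation.
    score_margin = z * sigma / math.sqrt(n_games)
    # Slope of the Elo curve at a 50% score: 1600 / ln(10) Elo per unit of score.
    return 1600.0 / math.log(10) * score_margin

# Print the margin with no draws and with an assumed 40% draw ratio.
for n in (30, 100, 300, 40000):
    print(n, round(elo_margin(n), 1), round(elo_margin(n, draw_ratio=0.4), 1))
```

under these assumptions the no-draw margins are about 124, 68 and 39 Elo for 30, 100 and 300 games, and smaller when many games are drawn, so the guesses above are in the right ballpark. Note that in this sketch the margin is already on the rating difference between A and B, so it would not be doubled again.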

but I guess realistically

ppl can usually run 100 matches

and 300 at most for most occasions

**so we have to find the sweet spot**

**the sweet number of total matches between 2 engines**

**while not decreasing the accuracy of the tourney result**
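
one way to look for that sweet spot is to turn the question around: pick the error you can live with and see how many games it costs. A minimal sketch, with the same assumptions as the usual normal approximation (near-equal engines, independent games, 95% interval, linearized Elo curve). `games_needed` and its parameters are hypothetical names for illustration only.

```python
import math

def games_needed(target_margin_elo, draw_ratio=0.0, z=1.96):
    """Approximate games needed so the +/- Elo margin is at most target_margin_elo."""
    # Per-game score standard deviation for two equal engines (assumption).
    sigma = 0.5 * math.sqrt(1.0 - draw_ratio)
    # Elo per unit of score near a 50% result.
    elo_per_score = 1600.0 / math.log(10)
    # Solve z * sigma * elo_per_score / sqrt(N) <= target for N and round up.
    return math.ceil((z * sigma * elo_per_score / target_margin_elo) ** 2)

# Print games needed with no draws and with an assumed 40% draw ratio.
for target in (50, 30, 20, 10):
    print(target, games_needed(target), games_needed(target, draw_ratio=0.4))
```

the cost grows with the square of the accuracy you ask for, so there is no free sweet spot: going from +50/-50 to +10/-10 takes about 25 times as many games. What counts as "enough" depends on how big an Elo difference you are trying to detect.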