An interesting article that should raise some questions:

Help and comments from Tim Mann, Peter Berger, Andreas Schwartmann, Severi Salminen, Roger Brown,

Verifying Fairness
"Be wary: some programs can hog CPU resources. You must ensure that no program in your tournament uses up all the resources while it runs or otherwise gains an unfair advantage. For example, the choice of interface can affect results. The old Chessbase Winboard adapter deliberately put all Winboard engines at a disadvantage compared to its own products when run under the Fritz interface. Even the new Chessbase UCI adapter can cause problems, especially in setting non-native hash sizes correctly. Workarounds for these problems sometimes exist, but you may have to dig deep to find them.

Computer chess results can be adversely affected by poor opening books. In fact, it has been suggested that some commercial programs have strong “killer” books, which account for much of their strength, and that each new version improves because of a better book rather than through the strength of the engine. This is, of course, an extreme view. But no one can deny that opening books play a big part in determining the results.

Using Nunn positions for testing is an attempt to avoid the variability in the quality of opening books by starting all the chess programs from the same 16 default opening positions. However, just as humans use openings that suit their playing style, opening books are designed to allow each chess engine to play to its best capability and are an integral part of the chess program. Another problem is that the Nunn test assumes that one program is stronger than another only if it demonstrates its strength over its rival in the 16 opening positions that are played. This is extremely artificial. Replace “two programs” with “two human GMs” in your mind and consider the consequences. Forcing one opening book on all engines, or removing all opening books, runs into exactly the same problems as Nunn testing.

[Speaking as a chess engine author, I can say that Nunn tournaments and common-book tournaments have much less meaning to me than tournaments where my engine is allowed its own book. I have spent many, many hours tuning my opening book and correcting mistakes. It is quite frustrating to see the book I have labored hard over taken away because of flawed reasoning. If opening books are disabled, Nunn positions are used, or common books are employed, I feel it is quite proper to ask the tournament director what, exactly, he is trying to test. Certainly it will not be a real-world comparison, because people are expected to use the opening book supplied with an engine – it is there for a specific purpose! The questions that come to my mind are these: Does the proposed Nunn test serve any useful purpose whatsoever? Is forcing a common opening book just a way to promote the tournament director's own book? – It may be a fine book, and I am not arguing otherwise, but what does it have to do with testing the relative strength of engines?]

Similarly, disabling tablebases or using only a subset of the available ones will handicap some engines (those without their own endgame logic) and reward others (those that have it). All of these factors are important in deciding what should be common to all engines and what should not.

There is also the question of whether learning should be turned on, especially in a tournament that includes engines without this feature. My own view is that learning should be turned on, since the lack of a learning feature in one engine shouldn't prevent its use in an engine that has it. On the other hand, the non-learning engine may suffer many repeated losses if the learning engine has aggressive book learning. Of course, all learning files should be purged before the tournament so that learning engines do not start with an unfair advantage.

Some engines hold their opening books in memory while others read theirs on the fly from disk. Engines holding the book in memory necessarily use more memory than those that don't. Some tournament directors seem to be concerned about total memory usage and subtract an engine's book memory from its hash size in an attempt to “equalize”. It seems to me that this is not equalization but an unfair advantage given to the disk-reading engines, or to engines with a very compact way of storing the opening moves."
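To make the last point concrete, here is a small arithmetic sketch of the "subtract the book from the hash" policy. The 256 MB budget, the per-engine book sizes and the function name `hash_after_subtraction` are hypothetical, chosen only for illustration; they are not measurements of any real engine or settings of any real interface.

```python
# Hypothetical numbers for illustration only; real book sizes vary by engine.
HASH_BUDGET_MB = 256  # memory the tournament director allows each engine

# Opening-book RAM per engine (0 means the book is read from disk on the fly).
BOOK_RAM_MB = {
    "in-memory book": 40,
    "compact in-memory book": 5,
    "disk-read book": 0,
}


def hash_after_subtraction(budget_mb: int, book_mb: int) -> int:
    """Hash size an engine gets when its in-memory book is deducted from the budget."""
    return budget_mb - book_mb


for kind, book_mb in BOOK_RAM_MB.items():
    hash_mb = hash_after_subtraction(HASH_BUDGET_MB, book_mb)
    # The disk-read engine keeps the whole budget, so the shortfall is book_mb.
    print(f"{kind:>22}: hash {hash_mb} MB "
          f"({HASH_BUDGET_MB - hash_mb} MB less than the disk-read engine)")
```

Under these assumed figures, the engine with the largest in-memory book ends up with the smallest hash table, while the disk-reading engine keeps the full budget, so the policy shifts search resources away from engines that store their books in RAM rather than equalizing them.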