janalsncm
4 days ago
I am curious why the author chose a genetic algorithm rather than standard backprop to distill the evil. Logistic regression seems like a pretty reasonable choice and it’ll be a lot faster than a genetic algorithm. Add an L1 penalty for sparsity.
In the past I’ve tried distilling not just the eval function but the outcome of the search (something like depth 20) in a neural net. It kind of works, but not very well until the net is pretty big. Deepmind did something similar.
obrhubr
4 days ago
I didn’t think of that, but I guess the conclusion as to performance of the model would have remained the same.
janalsncm
4 days ago
Reading over your article again, it seems like you made the same “mistake” as I did. In other words, the evals you’re seeing in the Lichess PGN aren’t the raw outputs of the Stockfish evaluation function, they’re the output of the search, which is a highly nonlinear function of millions of evals. If I recall from their docs, it’s something like depth 18, so your chances of distilling it with 20k parameters is essentially zero.
(I put mistake in quotes here because Deepmind showed that with a server farm of TPUs it is possible to distill the search as well. So it’s not impossible per se.)
But that’s ok! You’ve created a super fast evaluation function. Instead of trying to distill the depth 18 output, it will be much more realistic to distill the depth zero output, the NNUE. If you rerun those FENs in Stockfish you can pretty quickly create an easier dataset.