AsyncVibes
12 hours ago
I've been working on GENREG (Genetic Regulatory Networks), an evolutionary learning system that trains neural networks without gradients or backpropagation. Instead of calculating loss derivatives, genomes accumulate "trust" based on task performance and reproduce through trust-based selection. Training runs on a GPU, but inference runs on low-end CPUs. Today I hit 81.47% accuracy on the official MNIST test set using pure evolutionary pressure.

The Setup

- Architecture: simple MLP (784 → 64 → 10)
- Population: 200 competing genomes
- Selection: trust-based (high performers reproduce)
- Mutation: Gaussian noise on offspring weights
- Training time: ~600 generations, ~40 minutes

(A minimal sketch of this loop appears after the results.)

Results

MNIST (64 neurons, ~50K params): 81.47% test accuracy

- Best digits: 0 (94%), 1 (97%), 6 (85%)
- Hardest: 5 (61%), 8 (74%), 3 (75%)

The 32-neuron version (~25K params) achieved 72.52%: competitive performance with half the parameters.

UMAP embeddings reveal the learning strategy:

- 32-neuron model: can't separate all 10 digits. It masters 0 and 1 (>90%), but confusable digits like 5/3/8 collapse into overlapping clusters.
- 64-neuron model: clean 10-cluster topology with distinct regions; errors sit at decision boundaries between similar digits.
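To make the setup concrete, here is what a trust-based generation loop can look like. GENREG's actual internals aren't shown in this post, so everything below is a sketch built from the description above: the trust decay factor, `MUTATION_STD`, the ReLU hidden layer, and the replace-the-worst-half scheme are all my own placeholder assumptions, not GENREG's code.

```
import numpy as np

rng = np.random.default_rng(0)

IN, HID, OUT = 784, 64, 10   # MLP shape from the post: 784 -> 64 -> 10
POP = 200                    # population size from the post
MUTATION_STD = 0.05          # noise scale: a guess, not from the post

def init_genome():
    # A genome is just one MLP's weights; evolution acts on these directly.
    return {"w1": rng.normal(0, 0.1, (IN, HID)), "b1": np.zeros(HID),
            "w2": rng.normal(0, 0.1, (HID, OUT)), "b2": np.zeros(OUT)}

def forward(g, x):
    h = np.maximum(0.0, x @ g["w1"] + g["b1"])  # ReLU hidden layer (assumed)
    return h @ g["w2"] + g["b2"]

def fitness(g, x, y):
    # The trust signal: plain classification accuracy on the eval batch.
    return (forward(g, x).argmax(axis=1) == y).mean()

def mutate(g):
    # Gaussian noise on offspring weights only, per the post.
    return {k: v + rng.normal(0.0, MUTATION_STD, v.shape) for k, v in g.items()}

def generation(pop, trust, x, y):
    # Genomes accumulate trust from task performance (decay factor assumed).
    trust = 0.9 * trust + np.array([fitness(g, x, y) for g in pop])
    # Trust-based selection: reproduction probability proportional to trust.
    parents = rng.choice(POP, size=POP // 2, p=trust / trust.sum())
    children = [mutate(pop[p]) for p in parents]  # snapshot before replacing
    child_trust = trust[parents].copy()           # children inherit trust (assumed)
    for slot, child, t in zip(np.argsort(trust)[:POP // 2], children, child_trust):
        pop[slot] = child                         # lowest-trust half is replaced
        trust[slot] = t
    return pop, trust

# Smoke test on random data, just to show the loop turning over:
x, y = rng.normal(size=(200, IN)), rng.integers(0, OUT, 200)
pop, trust = [init_genome() for _ in range(POP)], np.full(POP, 1e-3)
for gen in range(3):
    pop, trust = generation(pop, trust, x, y)
print(f"best batch accuracy: {max(fitness(g, x, y) for g in pop):.3f}")
```

The real system presumably evaluates the whole population in parallel on the GPU; this CPU version only shows the selection mechanics.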
Key Discoveries

- Fitness signal stability is critical: training plateaued at 65% with 1 random image per digit because fitness variance was too high. Switching to 20 images per digit fixed this immediately (quantified in the sketch after this list).
- Child mutation drives exploration: mutation during reproduction matters far more than mutating the existing population. Disabling it completely flatlined learning.
- Capacity forces trade-offs: the 32-neuron model initially masters the easy digits (0, 1), then evolutionary pressure forces it to sacrifice some accuracy there to improve the hard digits. That's a different optimization dynamic than gradient descent.
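The first discovery is easy to quantify. With 1 image per digit, a genome's fitness is a 10-sample accuracy estimate, so near 65% accuracy its standard error is sqrt(0.65 · 0.35 / 10) ≈ 0.15, and lucky genomes routinely out-score genuinely better ones. With 20 images per digit (200 samples) that drops to ≈ 0.034. Here's a sketch of a class-balanced eval batch; the stratified sampling is my guess at what the post describes:

```
import numpy as np

rng = np.random.default_rng(0)

def eval_batch(images, labels, per_digit=20):
    # Draw `per_digit` random examples of each digit so every genome in a
    # generation is scored on the same class-balanced sample. per_digit=1
    # gives a 10-sample fitness estimate (std ~0.15 near 65% accuracy);
    # per_digit=20 gives 200 samples (std ~0.034), stable enough for trust
    # differences between genomes to be meaningful.
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == d), size=per_digit, replace=False)
        for d in range(10)
    ])
    return images[idx], labels[idx]
```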
Most MNIST baselines reach 97-98% using 200K+ parameters. GENREG achieves 81% with ~50K params and 72% with ~25K, showing strong parameter efficiency despite the lower absolute ceiling.

Other Results

- Alphabet recognition (A-Z): 100% mastery in ~1800 generations
- Currently testing generalization across 30 font variations

Limitations

- Speed: ~40 minutes to 81%, vs. ~5-10 minutes for gradient descent
- Accuracy ceiling: haven't beaten gradient baselines yet
- Scalability: unclear how this scales to larger problems

Current Experiments

- Architecture sweep (16/32/64/128/256 neurons; parameter counts per size in the sketch below)
- Mutation rate ablation studies
- Curriculum learning emergence
- Can we hit 90%+ on MNIST?
- What's the minimum viable capacity for digit recognition?
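For the architecture sweep, the parameter counts per hidden size are worth pinning down, since "minimum viable capacity" is really a question about these numbers. For a single-hidden-layer MLP:

```
def param_count(hidden, inputs=784, outputs=10):
    # Weights plus biases for an inputs -> hidden -> outputs MLP.
    return inputs * hidden + hidden + hidden * outputs + outputs

for h in (16, 32, 64, 128, 256):
    print(f"{h:>3} neurons: {param_count(h):,} params")
# 16: 12,730 / 32: 25,450 / 64: 50,890 / 128: 101,770 / 256: 203,530
# matching the ~25K and ~50K figures quoted above.
```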