ttul
20 hours ago
This is a cool result. Deep learning image models are trained on enormous amounts of data, and the information recorded in their weights continues to astonish me. Over in the Stable Diffusion space, hobbyists (as opposed to professional researchers) keep finding new ways to squeeze intelligence out of models trained back in 2022, which are considerably out of date compared with the latest “flow matching” models like Qwen Image and Flux.
Makes you wonder what intelligence is lurking in a 10T parameter model like Gemini 3 that we may not discover for some years yet…
cheald
12 hours ago
Stable Diffusion 1.5 is a great model for hacking on. It's powerful enough that it encodes some really rich semantics, but small and light enough that iterative hacking on it is quick enough that it can be done by hobbyists.
I've got a new potential LoRA implementation that I've been testing locally (it uses a transformed S matrix with frozen U and V weights from an SVD decomposition of the base matrix), and it seems to work really well. I've also been playing with changes to the forward-noising schedule and the loss functions, which seem to yield empirically superior results to the standard way of doing things. Epsilon prediction may be old and busted (and working on it makes me really appreciate flow matching!), but there's some really cool stuff happening in its training dynamics that's a lot of fun to explore.
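To give a flavor of the core idea, here's a simplified sketch (not my actual module — I'm assuming a diagonal, zero-initialized delta on a rank-truncated spectrum here, and the real thing transforms S more generally):

    import torch
    import torch.nn as nn

    class SVDLoRALinear(nn.Module):
        # Wrap a frozen nn.Linear; train only a small transform of its
        # singular values, with U and V held fixed.
        def __init__(self, base: nn.Linear, rank: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)
            # base.weight = U @ diag(S) @ Vh; keep the top-`rank` directions
            U, S, Vh = torch.linalg.svd(base.weight.data, full_matrices=False)
            self.register_buffer("U", U[:, :rank].contiguous())    # frozen
            self.register_buffer("Vh", Vh[:rank, :].contiguous())  # frozen
            # Trainable delta on the spectrum, zero-initialized so the
            # adapted weight U @ diag(S + delta) @ Vh starts equal to W.
            self.delta = nn.Parameter(torch.zeros(rank))

        def forward(self, x):
            # base(x) plus the low-rank correction (U diag(delta) Vh) x
            correction = ((x @ self.Vh.T) * self.delta) @ self.U.T
            return self.base(x) + correction

Because delta starts at zero, the wrapped layer reproduces the base model exactly at initialization, and training only touches `rank` parameters per layer while staying in the base weight's own singular basis.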
It's just a lot of fun. Great playground for both learning how these things work and for trying out new ideas.
throwaway314155
7 hours ago
I’d love to follow your work. Got a GitHub?
cheald
6 hours ago
I do (same username), but I haven't published any of this (and in fact my GitHub has sadly languished lately); I keep working on it with the intent to publish eventually. The big problem with models like this is that the training dynamics have so many degrees of freedom that every time I get close to something publishable, I end up chasing down another set of rabbit holes.
https://gist.github.com/cheald/7d9a436b3f23f27b8d543d805b77f... - here's a quick dump of my SVDLora module. I wrote it for use in OneTrainer, but it should be adaptable to other frameworks easily enough. If you want to try it out, I'd love to hear what you find.
ethmarks
7 hours ago
Gemini 3 is a 10 trillion parameter model?
smerrill25
19 hours ago
Hey, how did you figure all this out? I would be super curious to keep track of current ad-hoc ways of pushing older models to do cooler things. LMK
ttul
13 hours ago
1) Reading papers. 2) Reading "Deep Learning: Foundations and Concepts". 3) Taking Jeremy Howard's Fast.ai course.