cooljoseph
9 days ago
This sounds somewhat like a normalizing flow from a discrete space to a continuous space. I think there's a way you can rewrite your DDN layer as a normalizing flow which avoids the whole split and prune method.
1. Replace the DDN layer with a flow between images and a latent variable. During training, compute in the direction image -> latent. During inference, compute in the direction latent -> image.
2. For your discrete options 1, ..., k, have trainable latent variables z_1, ..., z_k. This is a "code book".
Training looks like the following: Start with an image and run a flow from the image to the latent space (with conditioning, etc.). Find the closest option z_i, and compute the L2 loss between z_i and your flowed latent variable. Additionally, add a loss corresponding to the log determinant of the Jacobian of the flow. This second loss is the way a normalizing flow avoids mode collapse. Finally, I think you should divide the resulting gradient by the softmax of the negative L2 losses for all the latent variables. This gradient division is done for the same reason as dividing the gradient when training a mixture-of-experts model.
During inference, choose any latent variable z_i and flow from that to a generated image.
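To make this concrete, here's a very rough PyTorch sketch of the training step and sampling I have in mind. The `flow` object (which I assume returns a latent plus log|det J| and has an `.inverse()`), its conditioning argument, and all names and shapes are placeholders, not your actual DDN code:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical pieces: `flow` is an invertible net, `codebook` holds the trainable z_1..z_k.
    k, latent_dim = 512, 128
    codebook = nn.Parameter(torch.randn(k, latent_dim))  # trainable z_1, ..., z_k

    def training_step(flow, image, cond=None):
        z, log_det = flow(image, cond)                    # image -> latent direction
        d2 = ((z.unsqueeze(1) - codebook) ** 2).sum(-1)   # squared L2 to every z_i, shape (B, k)
        p = F.softmax(-d2, dim=-1)                        # softmax of the negative L2 losses
        i = d2.argmin(dim=-1)                             # index of the closest code
        l2 = d2.gather(1, i.unsqueeze(1)).squeeze(1)      # L2 loss to the chosen z_i
        nll = l2 - log_det                                # add the log|det J| term, flow-style
        scale = 1.0 / p.gather(1, i.unsqueeze(1)).squeeze(1).detach()  # "divide the gradient"
        return (scale * nll).mean()

    @torch.no_grad()
    def sample(flow, i, cond=None):
        return flow.inverse(codebook[i].unsqueeze(0), cond)  # latent -> image direction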
diyer22
8 days ago
Thanks for the idea, but DDN and flow can’t be flipped into each other that easily.
1. DDN doesn’t need to be invertible.
2. Its latent is discrete, not continuous.
3. As far as I know, flow keeps input and output the same size so it can compute log|detJ|. DDN’s latent is 1-D and discrete, so that condition fails.
4. To me, “hierarchical many-shot generation + split-and-prune” is simpler and more general than “invertible design + log|detJ|.”
5. Your design seems to have abandoned the characteristics of DDN. (ZSCG, 1D tree latent, lossy compression)
The two designs start from different premises and are built differently. Your proposal would change so much that whatever came out wouldn’t be DDN any more.
godelski
4 days ago
Fwiw, I'm not convinced it's a Flow, and that's my niche. But there are some interesting similarities that actually make me uncertain. A deeper dive is needed.
But to address your points:
> 1. DDN doesn’t need to be invertible
The flow doesn't need to be invertible at every point in the network. As long as you can do the mapping, the condition will hold. Like the classic coupling layer is [x_0, s(x_0)*x_1 + t(x_0)], where s and t are parametrized by arbitrary neural networks. But some of your layers look more like invertible convolutions. I think it is worth checking. FWIW I don't think an equivalence would undermine the novelty here.
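To be concrete, here's a minimal sketch of the coupling layer I mean (RealNVP-style affine coupling; the tiny one-layer s and t are just placeholders for arbitrary networks):

    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        # Invertible as a whole even though s and t themselves are arbitrary, non-invertible nets.
        def __init__(self, dim):
            super().__init__()
            half = dim // 2
            self.s = nn.Sequential(nn.Linear(half, half), nn.Tanh())  # placeholder scale net
            self.t = nn.Linear(half, half)                            # placeholder shift net

        def forward(self, x):
            x0, x1 = x.chunk(2, dim=-1)
            s, t = self.s(x0), self.t(x0)
            y1 = x1 * torch.exp(s) + t          # transform x1, conditioned on the untouched x0
            log_det = s.sum(dim=-1)             # log|det J| of this step is just sum(s)
            return torch.cat([x0, y1], dim=-1), log_det

        def inverse(self, y):
            y0, y1 = y.chunk(2, dim=-1)
            x1 = (y1 - self.t(y0)) * torch.exp(-self.s(y0))
            return torch.cat([y0, x1], dim=-1)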
> 2. Its latent is discrete, not continuous.
That's perfectly fine. Flows aren't restricted that way. Technically all flows aren't exactly invertible, since you noise the data to dequantize it. Also note that there are discrete flows. I'm not sure I've seen an implementation where each flow step is discrete, but that's more an implementation issue.
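By dequantization I mean the usual trick of adding uniform noise to the integer pixel values so the discrete data has a proper density (a minimal sketch, assuming 8-bit images):

    import torch

    def dequantize(x_int, num_levels=256):
        # x_int: integer pixel values in [0, num_levels - 1]
        u = torch.rand_like(x_int.float())       # uniform noise in [0, 1)
        return (x_int.float() + u) / num_levels  # continuous values in [0, 1)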
> 3. As far as I know, flow keeps input and output the same size so it can compute log|detJ|.
You have a U-Net, right? Your full network is doing T: R^n -> R^n? Or at least excluding the extra embedding information? Either way, I think you might be interested in "Approximation Capabilities of Neural ODEs and Invertible Residual Networks". At minimum, their dimensionality discussion and reference to the Whitney Embedding Theorem is likely valuable to you (I don't think they say it by name?). You may also want to look at RealNVP, since they have a hierarchical architecture which does splitting.
Do note that NODEs are flows. You can see Ricky Chen's work on i-ResNets.
As for the Jacobian, I actually wouldn't call that a condition for a flow but it sure is convenient. The typical Flows people are familiar with use a change of variables formula via the Jacobian but the isomorphism is really the part that's important. If it were up to me I'd change the name but it's not lol.
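For reference, the change-of-variables objective I'm talking about is (with f the flow from data x to latent z, and p_Z the base density):

    log p_X(x) = log p_Z(f(x)) + log|det df(x)/dx|

but like I said, the bijection is the part that really matters, not this particular way of scoring it.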
> 5. Your design seems to have abandoned the characteristics of DDN. (ZSCG, 1D tree latent, lossy compression)
I think you're on the money here. I've definitely never seen something like your network before. Even if it turns out to not be its own class, I don't think that's an issue. It's not obviously something else, but I think it's worth digging into. FWIW I think it looks more like a diffusion model. A SNODE. Because I think you're right that the invertibility conditions likely don't hold. But in either case, remember that even though you're estimating multiple distributions, that's equivalent to estimating a single distribution.
I think the most interesting thing you could do is plot the trajectories like you'll find in Flow and diffusion papers. If you get crossings, you can quickly rule out flows, since ODE flow trajectories can't cross.
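Something like the sketch below; `intermediate_outputs` is a made-up hook that would return the selected sample at each DDN level (coarse to fine), not your real API:

    import torch
    import matplotlib.pyplot as plt

    @torch.no_grad()
    def plot_trajectories(model, latents, coord=(0, 0, 0)):
        for z in latents:
            states = model.intermediate_outputs(z)   # hypothetical: one tensor per level
            ys = [s[coord].item() for s in states]   # track one coordinate through depth
            plt.plot(range(len(ys)), ys, alpha=0.5)
        plt.xlabel("level"); plt.ylabel("value at tracked coordinate")
        plt.show()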
I'm definitely going to spend more time with this work. It's really interesting. Good job!