hansvm
9 months ago
The general strategy of creating a differentiable representation of a problem and simply describing the constraints is pretty powerful. See also databases (allowing arbitrary knowledge storage to be a tightly integrated part of a larger ML problem), graph layouts (you can do _way_ better than something like graphviz if you add arbitrary differentiable characteristics as constraints in generation -- mixing and matching the better parts of normal layout routines, using your human intuition to say what's important about this graph in particular), ...
nobuyuki83
9 months ago
I am one of the authors, Nobuyuki Umetani from the University of Tokyo, who implemented all the code. I am delighted to see so many good discussions here!
I fully understand that the quality of the result is low by architectural standards. This research is not about computing a perfect floor plan ready for architectural design; it's more that it gives the architect some rough design choices in the very early design stage.
This research presents a new shape representation (differentiable Voronoi) for solving layout problems. These challenging problems typically involve geometric and topological constraints. Existing shape representations are classified mainly into explicit (mesh) and implicit (using an implicit function) representations. While mesh representations struggle to handle topological changes, implicit representations are problematic because the boundary is not explicitly defined, making it difficult to compute a loss function based on the boundary shape. The differentiable Voronoi representation has properties of both implicit and explicit representations (the walls are defined explicitly, while the distance function from the sites determines the shape) and is suitable for optimizing geometry and topology together.
Thank you! I am happy that our research reaches such a large audience!!
CMCDragonkai
9 months ago
Could you expand on the graph layout constraints? I'm working on something that expresses social system architecture and I want something better than force-directed layout; yEd's tooling was interesting but not open source.
hansvm
9 months ago
Sure! The core of the concept is just building up a library of differentiable constraints you might like to apply to a graph and then mixing and matching (including higher/lower weights on some of the constraints) according to the human intuition you have about the graph you'd like to display.
- Suppose you want the graph to roughly fit on a page, being neither too big nor too small. This is the basic force-directed graph constraint (edges pull nodes together, all pairs of nodes push each other apart). It's often sub-par by itself because it doesn't explicitly do anything interesting with labels or edges and because it tends to emphasize local clusters while letting the rest of the graph remain fairly busy. If you _want_ to spot clusters and don't care much about other edges (maybe in disease tracking for example), that might not be a bad final result, but it's more useful as a building block onto which you can tack on other behaviors.
- Suppose the graph has meaningful topology you want to highlight. E.g., you've done a topological level sort on a DAG and want nodes in the same level to be closer to each other than nodes in other levels. Add an error term pulling nodes in the same level together. The above force-directed constraint will then handle the rest of the layout.
- Laying out nodes is a bit different from laying out edges. If you model your edges as something curvy and differentiable (splines aren't a bad choice, though perhaps add a term minimizing wiggliness too), you can add an error term penalizing too much "busyness" in the graph, e.g. by saying that it's good if you have large regions of whitespace (add up the squared areas of all the connected regions created by your edges and the edges of your bounding box, then negate that to get an error term). This will tend to coalesce edges into meaningful clusters.
- You might want to add a "confusion" term to your edges. One I like is maximizing the _average_ (not total, otherwise you'll encourage crossings) cosine distance at edge crossings.
- Force-directed layouts naturally minimize edge crossings, but you might want to add an explicit term for that if you're adding in other competing constraints.
- If you have any human intuition as to what's important in the graph (apologies, "social system architecture" isn't parsing in my brain as a cohesive concept, so I can't provide any concrete examples) then you can tag nodes (and/or edges) with an "importance" factor. The error term you'll likely want is some combination of higher repulsive forces between those nodes and the nodes in the rest of the graph (making the important information distinct), and weaker repulsive forces between the nodes in the other part of the graph (so they take up less space relative to the important structure). A fairly clean way to handle that is a repulsive force proportional to the largest importance squared divided by the smallest importance (or some monotonic function on the constituent large/small parts).
- Labels are something else you might want to handle well. As with the rest of this, there are lots of error terms you might choose. An easy one is adding a repulsive force from the edges of the label to the other nodes, an attractive force to the parts of the graph the labels should be labeling, and a repulsive force from the edges of the label to the graph edges.
- An important point is that you can represent basically all of the other interesting graph layouts as just some projected [0] gradient constraint being optimized. E.g., to get a hive plot [1] you impose a radial constraint on nodes according to their assigned groups and just use rules minimizing edge crossings, minimizing confusion at the existing crossings, and maximizing large chunks of whitespace in the graph to meaningfully cluster the edges (also something minimizing wiggliness if your edge description is complicated).
- And so on. This is quite long. Maybe a good general principle is that if you don't like something about a given layout, if you can quantify what you don't like then you can add that as an explicit error term and get a better graph.
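As a rough sketch of how a couple of these error terms combine in practice (a toy example: made-up weights, a naive O(n^2) loop, and hand-written gradients rather than autodiff -- not a tuned layout engine), here is gradient descent on a force-directed energy plus the same-level attraction term from above:

```python
import numpy as np

def layout_step(pos, edges, levels, lr=0.05, k_rep=1.0, k_att=1.0, k_lvl=0.5):
    """One gradient-descent step on a force-directed energy plus an extra
    term pulling same-level nodes together. Weights are illustrative."""
    n = len(pos)
    grad = np.zeros_like(pos)
    for i in range(n):                      # all-pairs repulsion
        for j in range(n):
            if i == j:
                continue
            d = pos[i] - pos[j]
            dist2 = d @ d + 1e-9
            grad[i] -= k_rep * d / dist2    # gradient of -k_rep * log|d|
    for i, j in edges:                      # spring attraction along edges
        d = pos[i] - pos[j]
        grad[i] += k_att * d
        grad[j] -= k_att * d
    for i in range(n):                      # extra term: same-level attraction
        for j in range(n):
            if i != j and levels[i] == levels[j]:
                grad[i] += k_lvl * (pos[i] - pos[j])
    return pos - lr * grad

# toy DAG: a 4-node chain, two levels, widely scattered start positions
rng = np.random.default_rng(0)
pos = rng.normal(scale=10.0, size=(4, 2))
edges = [(0, 1), (1, 2), (2, 3)]
levels = [0, 0, 1, 1]
start = np.mean([np.linalg.norm(pos[i] - pos[j]) for i, j in edges])
for _ in range(300):
    pos = layout_step(pos, edges, levels)
end = np.mean([np.linalg.norm(pos[i] - pos[j]) for i, j in edges])
```

Adding another constraint is just another `grad +=` block, and re-weighting is just turning the `k_*` knobs.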
[0] If you have "hard" constraints which can't be violated, replace gradient descent with projected gradient descent. Take a step according to the gradient, then "project" the parameter vector being optimized to the nearest location not violating those constraints (or, more simply, just project each node/edge/... individually, though convergence will likely be a bit slower).
[1] https://ggraph.data-imaginist.com/articles/Layouts.html#hive...
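The projected-gradient step from [0] can be sketched with the hive-plot radial constraint from [1] as the hard constraint. (Snapping the angle while keeping the radius, as below, is a simplification of the exact nearest-point projection onto the axis, but it enforces the constraint.)

```python
import numpy as np

def project_to_axes(pos, angles):
    """Snap each node back onto its group's axis (a ray at a fixed angle),
    keeping its distance from the origin."""
    r = np.linalg.norm(pos, axis=1, keepdims=True)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    return r * dirs

def projected_gradient_step(pos, grad, angles, lr=0.1):
    """Ordinary gradient step, then projection onto the constraint set."""
    return project_to_axes(pos - lr * grad, angles)

# a node at (1, 1) constrained to the angle-0 axis lands at (sqrt(2), 0)
snapped = project_to_axes(np.array([[1.0, 1.0]]), np.array([0.0]))
```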
CMCDragonkai
9 months ago
Thanks for the detailed response, these are all very interesting constraints, but how would you convert this into an algorithm? It sounds a bit like simulated annealing? I'm aware of gradient descent algorithms used in machine learning, but how would you end up applying them to a graph layout problem?
Regarding social system architecture, I just mean system architecture, but applied to social systems.
> - Suppose the graph has meaningful topology you want to highlight. E.g., you've done a topological level sort on a DAG and want nodes in the same level to be closer to each other than nodes in other levels. Add an error term pulling nodes in the same level together. The above force-directed constraint will then handle the rest of the layout.
An example is a loose graph of nodes where we have 2 kinds of nodes: identity nodes and authority nodes. Identities are important because they are publicly discoverable. I want the identity nodes to be more important - and I guess that's where things like topological sorting become interesting.
sleepydog
9 months ago
Do you know of a good introduction to this topic?
myhf
9 months ago
Differentiable Programming from Scratch
Aeolun
9 months ago
> It all starts with the definition you learn in calculus class
Whoops, went a step too far already xD
notjoemama
9 months ago
Me too. While I recall calculus class, I’ve been paid to operate at the algebra level for years. I think this walkthrough is a retread of differential equations, in particular the kind that attempts to approximate the result of a function as it approaches the limit at a point. I get lost at the bits involving dimensionality. I think it’s evident, but it’s been years since I’ve read mathematics at this level.
philipov
9 months ago
It took me a moment to realize you were using "Algebra" in a pejorative sense to mean a level of mathematics beneath Calculus, rather than as "The study of composable operators on quantitative spaces." Algebra is a very deep subject, no less advanced than Analysis (The branch of mathematics that includes Calculus).
kqr
9 months ago
A lot of people say "algebra" when they mean "arithmetic"!
hansvm
9 months ago
Not really. I can find or write one if you're very interested.
- Gradients (slopes) are incredibly powerful computationally. If you can compute a function, you can compute its slope with almost no overhead. Relating to physical objects, imagine walking on the top of an igloo (where it's pretty flat and you won't lose your balance) vs. the sides of an igloo (where it's vertical, and you'll fall over or otherwise break something). For real-world functions, you can ask where you are (top, bottom, middle, ... of the igloo) _and also_ what the slope is (flat, steep, ...), with the second step happening nearly for free.
- The whole "technique" is that when the things you want aren't being found, walk in the direction of least badness. Gradients define that direction, and since they're nearly free computationally you can abuse them as much as you want.
- The rest of the "technique" is figuring out how to describe ordinary comp-sci data structures as differentiable entities. E.g., the simplest way to differentiate most discrete objects (like database insertions) is to insert them probabilistically and apply your derivatives/slopes to the likelihood of the insertion succeeding.
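A minimal sketch of that "walk in the direction of least badness" loop, with a hand-written slope for a toy badness function (in practice automatic differentiation hands you the slope nearly for free):

```python
def minimize(grad, x, lr=0.1, steps=100):
    """Gradient descent: repeatedly step opposite the slope."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# badness(x) = (x - 3)**2, whose slope is 2 * (x - 3);
# walking downhill from x = 0 converges to the minimum at x = 3
x_star = minimize(lambda x: 2 * (x - 3), x=0.0)
```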
rsktaker
9 months ago
I wish all explanations made complicated concepts this manageable!
> the simplest way to differentiate most discrete objects (like database insertions) is to insert them probabilistically and apply your derivatives/slopes to the likelihood of the insertion succeeding.
I hope it wouldn't trouble you to go a bit further with this idea? Namely, the idea of differentiating an object or database insertion sounds like it shouldn't be structurally possible.
And why would there be a probability function for a db insertion succeeding? How could it fail - if the db was full or if there are too many requests?
hansvm
9 months ago
Apologies, I'll flesh that out a bit:
Imagine the particular data structure you'd like to use for your database has if/else branching logic. A simple example is some kind of BinaryTree<Key, Value> type. Until you hit your termination condition, you walk down the tree: if the key is bigger, go one direction; else, go the other.
The problem you'll find is that the derivative of the error with respect to the _key_ is always zero. Intuitively, nudging the key just a little bit won't change the query result. Imagine some kind of a step function which is 0 for negative inputs and 1 for positive inputs. The slope at every input (except for something interesting happening at 0 -- not a "slope" mathematically, but for your real-world problem you might care about what happens there) is exactly zero.
Phrased another way, the optimization problem you'd like to solve via derivatives and gradients is discrete, requiring combinatorial techniques instead of gradient descent to solve -- at least, that's true if choosing the right database key is an important part of the problem.
If you're able to somehow make the composite algorithm tolerant of probabilistic answers, by far the easiest tweak you can make is to replace if/else logic with a probabilistic if/else. When the key is a lot bigger, almost certainly walk right in the tree. When it's a little bigger, you have a nontrivial chance of walking left instead. When it's a little smaller, you have a bigger chance of walking left and a nontrivial chance of walking right.
Doing so allows you to represent the output as `p * f(right) + (1 - p) * f(left)`, which is differentiable in `p`. There's a bit of nuance in how you get the proper gradient if you only actually take one path (basically, some of the time you'll take the right branch, some of the time you'll take the left, and you want to re-weight the gradient whenever you take a given branch so that if you made that decision many times and averaged the results you'd converge to the formula above), but since you have something differentiable in `p` you're able to propagate that information back to the key you used to derive the probability.
The space where I most see people wanting differentiable databases is in creating giant ML thingamabobs. In that case, the high dimensionality of the value being stored can work in your favor. E.g., you could query the DB a few times, record the probabilities of each path, and use those to do a weighted average of the values you found. High-probability paths will dominate the result, and low-probability paths might still mix in a little interesting information.
That technique would likely not work if you wanted to, e.g., just drop in a differentiable database into your favorite web app and try to follow the gradients to find the inputs causing a certain bug. You'd need something fancier, probably not based on trees at all (at least not explicitly; you might use a tree internally to represent the data, but the conceptual computation wouldn't have those if/else step-function cliffs).
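A minimal numeric sketch of that probabilistic if/else (hypothetical names: `f_left`/`f_right` stand in for whatever each branch ultimately produces, and a real system would use a score-function or straight-through estimator when only one path is actually taken):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_lookup(key, split, f_left, f_right):
    """Expected value of a probabilistic if/else: take the right branch
    with probability p = sigmoid(key - split)."""
    p = sigmoid(key - split)
    return p * f_right + (1 - p) * f_left

def soft_lookup_dkey(key, split, f_left, f_right):
    """Analytic d(expected value)/d(key), via the chain rule through p."""
    p = sigmoid(key - split)
    return (f_right - f_left) * p * (1 - p)

# finite-difference check that the gradient w.r.t. the key is now nonzero
h = 1e-5
numeric = (soft_lookup(1.2 + h, 0.5, 2.0, 5.0)
           - soft_lookup(1.2 - h, 0.5, 2.0, 5.0)) / (2 * h)
analytic = soft_lookup_dkey(1.2, 0.5, 2.0, 5.0)
```

Compare that with the hard if/else, where the same finite difference would be exactly zero almost everywhere.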
rsktaker
9 months ago
So there will be some key given to the database. And the key either gets results from the right branch or the left (at decision points, as opposed to leaves which would be the result itself). But if you hardcode the branching condition to be something like a greater than or less than, it's not very useful in finding the best result for the key.
But if you have a probabilistic setup, then the key has some chance of retrieving data from the right and from the left. Then we have {expected goodness = p * f(right) + (1 - p) * f(left)}.
Where f is some objective function that tells us the 'goodness' of the resulting data the key got from going right or left. If you differentiate the expected goodness with respect to p you get f(right) - f(left). Which is how much the expected goodness will change if I increase the probability of going right.
Suppose the derivative f(right) - f(left) was positive, so going right was better than going left, then I can write some script to change p positively so that next time when I get that key the probability of going right is higher. That way we can optimize p for given keys.
Very interesting! I hope I got everything right; where I couldn't understand, I used gpt to help me out (when it told me f was a 'goodness' function, I had a breakthrough lol). The discrete way feels like clean, understandable logic while the probabilistic way with changing p values feels really messy, but I can see intuitively why the latter gives better results.
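Something like this little script, I think (made-up goodness values; nudging the logit of p so it stays between 0 and 1):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

f_right, f_left = 5.0, 2.0   # hypothetical 'goodness' of each branch
z = 0.0                      # logit of p, so p = sigmoid(z) starts at 0.5
lr = 0.5
for _ in range(200):
    p = sigmoid(z)
    d_goodness_dp = f_right - f_left        # the derivative worked out above
    z += lr * d_goodness_dp * p * (1 - p)   # chain rule through the sigmoid
p = sigmoid(z)  # ends up near 1: going right is better here
```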
CMCDragonkai
9 months ago
I think in the same way that the sigmoid is to the step function. Fuzzy logic too.
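E.g. (a tiny sketch, `temperature` just being an illustrative sharpness knob):

```python
import math

def step(x):
    """Hard step: zero slope everywhere, so gradients can't see through it."""
    return 1.0 if x > 0 else 0.0

def soft_step(x, temperature=1.0):
    """Sigmoid as a smooth stand-in for the step function; lowering the
    temperature sharpens it back toward the hard step."""
    return 1.0 / (1.0 + math.exp(-x / temperature))
```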