cgadski
3 hours ago
> The implications of these geometric properties are staggering. Let's consider a simple way to estimate how many quasi-orthogonal vectors can fit in a k-dimensional space. If we define F as the degrees of freedom from orthogonality (90° - desired angle), we can approximate the number of vectors as [...]
If you're just looking at minimum angles between vectors, you're doing spherical codes. So this article is an analysis of spherical codes… that doesn't reference any work on spherical codes… seems to be written in large part by a language model… and has a bunch of basic inconsistencies that make me doubt its conclusions. For example: in the graph showing the values of C for different values of K and N, is the x axis K or N? The caption says the x axis is N, the number of vectors, but later they say the value C = 0.2 was found for "very large spaces," and in the graph we only get C = 0.2 when N = 30,000 and K = 2---that is, 30,000 vectors in two dimensions! On the other hand, if the x axis is K, then this article is extrapolating a measurement done for 2 vectors in 30,000 dimensions to the case of 10^200 vectors in 12,888 dimensions, which obviously is absurd.
I want to stay positive and friendly about people's work, but the amount of LLM-driven stuff on HN is getting really overwhelming.
sdenton4
35 minutes ago
Spherical codes are kind of obscure: I hadn't heard of them before, and Wikipedia seems to have barely heard of them. And most of the Google results seem to be about playing golf in small dimensions (i.e., how many vectors can we optimally pack in n < 32 dimensions?).
People do indeed rediscover previously existing math, especially when the old content is hidden under non-obvious jargon.
jryio
3 hours ago
Agreed. What writing is better for understanding the geometric properties of information in high-dimensional vector spaces and spherical codes?
cgadski
10 minutes ago
There's a lot of beautiful writing on these topics on the "pure math" side, but it's hard to figure out which results are important for deep learning and to put them in a form that doesn't take too much of an investment in pure math.
I think the first chapter of [1] is a good introduction to general facts about high-dimensional stuff. I think this is where I first learned about "high-dimensional oranges" and so on.
For something more specifically about the problem of "packing data into a vector" in the context of deep learning, last year I wrote a blog post meant to give some exposition [2].
One really nice approach to this general subject is to think in terms of information theory. For example, take the fact that, for a fixed epsilon > 0, we can find exp(C d) vectors in R^d with all pairwise inner products smaller than epsilon in absolute value. (Here C is some constant depending on epsilon.) People usually find this surprising geometrically. But now, say you want to communicate a symbol by transmitting d numbers through a Gaussian channel. Information theory says that, on average, I should be able to use these d numbers to transmit C d nats of information. (C is called the channel capacity, and depends on the magnitude of the noise and e.g. the range of values I can transmit.) The statement that there exist exp(C d) vectors with small inner products is related to a certain simple protocol to transmit a symbol from an alphabet of size exp(C d) with small error rate. (I'm being quite informal with the constants C.)
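If you want to see the geometric fact concretely, here's a minimal numpy sketch (the choices of n and d are arbitrary on my part, not from the post): sample n random unit vectors in R^d and look at the largest pairwise inner product. With d fixed, you can grow n by orders of magnitude while the maximum inner product creeps up only slowly, which is the exp(C d) phenomenon in miniature.

    import numpy as np

    rng = np.random.default_rng(0)

    def max_abs_inner_product(n, d):
        # n random unit vectors in R^d
        v = rng.standard_normal((n, d))
        v /= np.linalg.norm(v, axis=1, keepdims=True)
        g = np.abs(v @ v.T)        # |<v_i, v_j>| for every pair
        np.fill_diagonal(g, 0.0)   # ignore <v_i, v_i> = 1
        return g.max()

    for d in (64, 256, 1024):
        for n in (100, 1000, 4000):
            print(f"d={d:5d}  n={n:5d}  max |<v_i, v_j>| = {max_abs_inner_product(n, d):.3f}")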
[1] https://people.math.ethz.ch/~abandeira//BandeiraSingerStrohm... [2] https://cgad.ski/blog/when-numbers-are-bits.html
jvanderbot
3 hours ago
The problem with saying something is LLM-generated is that it can't be proven, and it's a less helpful way of saying the piece has errors.
Pointing out the errors is a more helpful way of stating problems with the article, which you have also done.
In that particular picture, you're probably correct to interpret it as C vs N as stated.
Blackthorn
an hour ago
> The problem with saying something is LLM generated is it cannot be proven and is a less-helpful way of saying it has errors.
It's a very helpful way of saying it isn't worth reading. After all, if they couldn't be bothered to write it, I can't be bothered to read it.