Animats
9 months ago
Does anyone know what they mean by "wave synchronization"? That's supposedly their trick to prevent all those parallel CPUs from blocking while waiting for data. I found a reference to something by that name for transputers, from 1994.[1] It may be something else.
Historically, this has been a dead end. Most problems are hard to cut up into pieces for such machines. But now that there's much interest in neural nets, there's more potential for highly parallel computers. Neural net operations are very regular. The inner loop for backpropagation is about a page of code (see the sketch below). This is a niche, but it seems to be a trillion-dollar niche.
Neural net operations are so regular they belong on purpose-built hardware. Something even more specialized than a GPU. We're starting to see "AI chips" in that space. It's not clear that something highly parallel and more general-purpose than a GPU has a market niche. What problem is it good for?
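To make the regularity concrete, here is a minimal sketch (my illustration, not any vendor's code) of the backward pass for one dense layer in CUDA; the kernel name and the row-major memory layout are assumptions:

    // One CUDA "thread" per weight w[i*n_in + j] (row-major, illustrative).
    // dy is the gradient arriving from the next layer; dx must be zeroed first.
    __global__ void dense_backward(const float *x,  // inputs, length n_in
                                   const float *dy, // upstream grads, length n_out
                                   const float *w,  // weights, n_out x n_in
                                   float *dw,       // weight grads, n_out x n_in
                                   float *dx,       // input grads, length n_in
                                   int n_in, int n_out)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= n_in * n_out) return;
        int i = idx / n_in;                 // output index
        int j = idx % n_in;                 // input index
        dw[idx] = dy[i] * x[j];             // outer product
        atomicAdd(&dx[j], dy[i] * w[idx]);  // transposed matrix-vector product
    }

Every element does the same multiply-accumulates with no data-dependent control flow, which is why this kind of inner loop maps so well onto very wide, very regular hardware.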
[1] https://www.sciencedirect.com/science/article/abs/pii/014193...
bhouston
9 months ago
GPUs have wavefronts, so I assume it is similar? Here is a page that explains it:
adrian_b
9 months ago
Nope.
AMD's "wavefront" is an obfuscated word for what NVIDIA calls "warp".
NVIDIA's "warp" is an obfuscated word for what has been called for many decades in the computer literature as "thread". (NVIDIA's "thread" is an obfuscated word that means something else than what it means in the non-NVIDIA literature.)
NVIDIA thought it was a good idea to create its own terminology, renaming many traditional terms for no reason. AMD then thought it was a good idea to take the entire NVIDIA terminology and replace all the terms with yet other words.
mystified5016
9 months ago
I'd assume that 'warp' is taken from textiles: https://en.m.wikipedia.org/wiki/Warp_and_weft
A warp is a thread, but a thread within a matrix of other threads.
I'm not into GPU programming, but doesn't nvidia have some notion of arranging threads in a matrix sort of like this?
adrian_b
9 months ago
Nope.
In NVIDIA parlance, a thread is the body of a "parallel for" structure, i.e. the sequence of operations performed for one array element, which is executed by one SIMD lane of a GPU.
A "warp" is a set of "threads", normally 32 of them on NVIDIA GPUs; the number of "threads" in a "warp" is the number of SIMD lanes of the execution units.
CUDA uses what Hoare (1978) named an "array of processes", which in many programming languages is called "parallel for" or "parallel do".
This looks like a "for" loop, but its body is not executed sequentially in a loop; the execution is performed concurrently for all elements of the array.
A modern CPU or GPU consists of many cores; each core can execute multiple threads, and each thread can execute SIMD instructions that perform an operation on multiple array elements, in distinct SIMD lanes.
When a parallel for is launched, the array elements are distributed over all available cores, threads and SIMD lanes. In the case of NVIDIA, the distribution is handled by the CUDA driver, so it is transparent to the programmer, who does not have to know the structure of the GPU.
NVIDIA's use of the word "thread" would correspond to reality if the GPU did not use SIMD execution units. Real GPUs use SIMD instructions that process a number of array elements, typically between 16 and 64. NVIDIA's "warp" is the real thread executed by the GPU, which processes multiple array elements, while NVIDIA's "thread" is what would be executed by a thread of a fictitious GPU without SIMD, which would process only one array element per thread.
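To make the mapping concrete, here is a minimal CUDA sketch of such a "parallel for" (the classic saxpy example; my illustration, not from the comment above):

    // "Parallel for" over n array elements: each NVIDIA "thread" is the loop
    // body for one element; the hardware actually executes 32 of them at a
    // time, as one warp occupying the SIMD lanes.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // this element's index
        if (i < n)
            y[i] = a * x[i] + y[i];                     // one SIMD lane's work
    }

    // Launch; the driver distributes the n elements over cores, warps and lanes:
    // saxpy<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);

Written sequentially this would be "for (i = 0; i < n; i++) y[i] = a*x[i] + y[i]"; the kernel is exactly that loop body, with the loop itself replaced by the hardware-wide distribution described above.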
mystified5016
9 months ago
I dunno, it still sounds to me like nvidia is taking their (admittedly inaccurate) concept of a thread, putting a bunch of them in parallel, and calling that a warp to be cute.
I think the analogy still makes a kind of sense if you accept it at face value and don't worry about the exact definitions. Which is really all it needs to do, IMO.
Again, I don't really know anything about GPUs, just speculating on the analogy.
gregw2
9 months ago
Agreed that "warp" is a marketing term, but it is definitely not something that should be called "threads", except in the very loosest sense of the term.
A bunch of threads in parallel implies MIMD parallelism: multiple instructions, multiple data.
A warp implies SIMD parallelism: single instruction, multiple data (although technically SIMT, single instruction, multiple threads: https://en.wikipedia.org/wiki/Single_instruction,_multiple_t...).
From both a hardware and software perspective those are very different types of parallelism that Nvidia's architects and the architects of its predecessors at Sun/SGI/Cray/elsewhere were intimately familiar with. See: https://en.wikipedia.org/wiki/Flynn%27s_taxonomy
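To illustrate the difference (a minimal sketch of my own, assuming the standard CUDA model): real MIMD threads taking the two branches below would proceed independently, but within a warp the SIMT hardware executes both paths with the inactive lanes masked off, so divergence costs time:

    // Branch divergence inside a warp: even and odd lanes cannot take their
    // branches independently; the warp executes both paths in turn, masking
    // whichever lanes are inactive.
    __global__ void diverge(float *y, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        if (i % 2 == 0)
            y[i] = y[i] * 2.0f;  // even lanes active, odd lanes masked
        else
            y[i] = y[i] + 1.0f;  // odd lanes active, even lanes masked
    }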
narag
9 months ago
> We're starting to see "AI chips" in that space.
"Positronic" came to my mind.
darby_nine
9 months ago
My god, the future sucks far more than we could ever have imagined. Imagine being sold a chatbot and being told it's an android!
narag
9 months ago
FWIW, I already carry an android in my pocket.
mikewarot
9 months ago
The reason problems are hard to fit onto most of what's been tried is that everyone is trying to save precious silicon area and fit a specific problem, adding special-purpose blocks, etc. It's my belief that this is an extremely premature optimization to make.
Why not break everything down into homogeneous bitwise operations? That way everything will always fit. It would also simplify compilation.
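As a minimal sketch of the idea (my illustration, not a description of any actual design): even arithmetic decomposes into a small set of homogeneous bitwise operations, e.g. a 32-bit add built from nothing but AND, XOR and shift:

    // Addition from homogeneous bitwise operations: repeat the same three
    // ops until the carry dies out (plain C, also valid in CUDA code).
    unsigned add_bitwise(unsigned a, unsigned b)
    {
        while (b != 0) {
            unsigned carry = a & b;  // bits that generate a carry
            a = a ^ b;               // sum without carries
            b = carry << 1;         // propagate carries one position left
        }
        return a;
    }

A fabric that offers only uniform bitwise primitives can therefore express any arithmetic; the open question is whether it can do so competitively in area and power.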