codingdave
a day ago
The discussions around this point are taking it too seriously, even when they are 100% correct. LLMs are not deterministic, so they are not compilers. Sure, if you specify everything - every tiny detail - you can often get them to mostly match. But not 100%. Even if you do fix that, at that point you are coding in English, which is an inefficient language for that level of detail in a specification. And even if you accept that problem, you have still done a ton of work just to fight the fundamental non-deterministic nature of LLMs.
It all feels to me like the guys who make videos of using electric drills to hammer in a nail - sure, you can do that, but it is the wrong tool for the job. Everyone knows the phrase: "When all you have is a hammer, everything looks like a nail." But we need to also keep in mind the other side of that coin: "When all you have is nails, all you need is a hammer." LLMs are not a replacement for everything that happens to be digital.
alpaylan
a day ago
I think the point I wanted to make was that even if it were deterministic (which you can technically make it, I guess?) you still shouldn't live in a world where you're guided by the "guesses" the model makes when solidifying your intent into concrete code. Discounting hallucinations (I know this is a big concession; I'm trying to make the argument from a disadvantaged point again), I think you need a stronger argument than determinism against someone who claims they can write in English and have no need for code anymore; which is what I tried to make here. I get your point that I might be taking the discussion too seriously though.
liveoneggs
a day ago
The future is about embracing absolute chaos. The great reveal of LLMs is that, for the most part, nothing actually mattered except the most shallow approximation of a thing.
belZaah
20 hours ago
This is true only for a small subset of problems. If you write crypto or hardware drivers, details do matter.
holden_nelson
7 hours ago
I’m not an AI evangelical, but I think it remains to be seen what the size of that subset is. Those that write crypto and hardware drivers are certainly a small subset of programmers. Most of us are pumping out enterprise crud and arguing with our PMs.
pjmlp
3 hours ago
Glueing SaaS systems together with low code tools, many of which now also have AI driven configurations.
wizzwizz4
a day ago
The great reveal of LLMs is that our systems of checks and balances don't really work, and allow grifters to thrive, but despite that most people were actually trying to do their jobs properly. Perhaps nothing matters to you except the most shallow approximation of a thing, but there are usually people harmed by such negligence.
liveoneggs
21 hours ago
I'm just as upset as you are about it, believe me. Unfortunately I have to live in the world as I see it and what I've observed in the last 18-ish months is a complete breakdown of prior assumptions.
skydhash
a day ago
Imagine if the amount of a bank transfer did not matter and could only be an approximation - and the selected account could be approximated too. Or the system monitoring the temperature of blood storage for transfusions…
Often it seems like tech maximalists are the most against tech reliability.
wavemode
20 hours ago
Well, the person who vibe-coded the banking app also vibe-coded a bunch of test cases, so this will only affect a small percentage of customers. When it does and they lose a bunch of money, well, you have a PR team and they don't, so just sweep the story under the rug.
Imagine that - you got your project done ahead of schedule (which looks great on your OKRs) AND finally achieved your dream of no longer being dependent on those stupid overpaid, antisocial software engineers, and all it cost you was the company's reputation. Boeing management would be proud.
Lots of business leaders will do the math and decide this is the way to operate from now on.
snovv_crash
21 hours ago
No need to be so practical.
I suggest that when their pointer is dereferenced, it can go a bit forward or backward in memory, as long as it is mostly correct.
SecretDreams
21 hours ago
Let's give people a choice. My banking will be deterministic, others can have probabilistic banking. Every so often, they transfer me some money by random chance, but at least they can say their banking is run by LLMs. Totally fair trade.
ModernMech
18 hours ago
I think the exact opposite is true: LLMs revealed that when you average everything together, it's really bland and uninteresting no matter how technically good. It's the small choices that bring life into a thing and transform it from slop into something interesting and worthy of attention.
liveoneggs
18 hours ago
I think we agree but my prediction is that the slop will win
raw_anon_1111
20 hours ago
Before LLMs, now more than a decade ago in my career, I was assigned a task and my job was to translate that task into a working implementation. I was guided by the "guesses" that other developers made. I had to trust that they could do FizzBuzz competently without having to tell them to use the mod operator.
Then my job became I am assigned a larger implementation and depending on how large the implementation was, I had to design specifications for others to do some or all of the work and validate the final product for correctness. I definitely didn’t pore over every line of code - especially not for front end work that I stopped doing around the same time.
The same is true for LLMs. I treat them like junior developers, and I am slowly starting to treat them like halfway competent mid-level ticket takers.
blazinglyfast
a day ago
> even if it was deterministic (which you can technically make it to be I guess?)
No. LLMs are undefined behavior.
xixixao
a day ago
OP means "given the same input, produce the same output" determinism. This isn't really much different from normal compilers: you might have a language spec, but at the end of the day the results are determined by the concrete compiler's implementation.
But most LLM services deliberately introduce randomness, so you don't get the same result for the same input you control as a user.
zaphar
19 hours ago
You can get deterministic output if you just turn the temperature all the way down. The problem is that you usually get really bad results, deterministically. It turns out the randomness helps in finding solutions.
recursive
18 hours ago
You can also get deterministic output if you use whatever temperature you want and use an arbitrary fixed RNG seed.
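Roughly, at the sampling layer (a toy sketch with a made-up logits vector; this ignores the kernel-level floating-point issues discussed elsewhere in the thread):

    import numpy as np

    def sample_token(logits, temperature, rng):
        # Temperature-scaled softmax, then a draw from the seeded generator.
        z = np.asarray(logits, dtype=np.float64) / temperature
        z -= z.max()  # numerical stability
        p = np.exp(z)
        p /= p.sum()
        return int(rng.choice(len(p), p=p))

    logits = [2.0, 1.0, 0.5, -1.0]

    rng = np.random.default_rng(42)
    run1 = [sample_token(logits, 0.8, rng) for _ in range(10)]
    rng = np.random.default_rng(42)  # re-seed: same draws despite temperature > 0
    run2 = [sample_token(logits, 0.8, rng) for _ in range(10)]
    print(run1 == run2)  # True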
CGMthrowaway
a day ago
>LLMs are not deterministic, so they are not compilers.
"Deterministic" is not the the right constraint to introduce here. Plenty of software is non-deterministic (such as LLMs! But also, consensus protocols, request routing architecture, GPU kernels, etc) so why not compilers?
What a compiler needs is not determinism, but semantic closure. A system is semantically closed if the meanings of its outputs are fully defined within the system, correctness can be evaluated internally and errors are decidable. LLMs are semantically open. A semantically closed compiler will never output nonsense, even if its output is nondeterministic. But two runs of a (semantically closed) nondeterministic compiler may produce two correct programs, one being faster on one CPU and the other faster on another. Or such a compiler can be useful for enhancing security, e.g. programs behave identically, resist fingerprinting.
Nondeterminism simply means the compiler selects any element of an equivalence class. Semantic closure ensures the equivalence class is well‑defined.
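To make that concrete, a toy sketch of a nondeterministic-but-semantically-closed "compiler" pass (using the classic pair of correct lowerings for zeroing a register):

    import random

    # Both lowerings mean "set the register to zero", so any choice stays inside
    # the equivalence class, even though repeated runs may emit different code.
    def lower_zero(reg: str) -> str:
        return random.choice([f"xor {reg}, {reg}", f"mov {reg}, 0"])

    print(lower_zero("eax"))  # either "xor eax, eax" or "mov eax, 0"; both correct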
thwarted
21 hours ago
No, deterministic means that given the same inputs—source code, target architecture, optimization level, memory and runtime limits (because if the optimizer has more space/time it might find better optimizations), etc—a compiler will produce the same exact output. This is what reproducible builds is about: tightly controlling the inputs so the same output is produced.
That a compiler might pick among different specific implementations in the same equivalency class is exactly what you want a multi-architecture optimizing compiler to do. You don't want it choosing randomly between different optimization choices within an optimization level, that would be non-deterministic at compile time and largely useless assuming that there is at most one most optimized equivalent. I always want the compiler to choose to xor a register with itself to clear it if that's faster than explicitly setting it to zero if that makes the most sense to do given the inputs/constraints.
CGMthrowaway
20 hours ago
Determinism may be required for some compiler use cases, such as reproducible builds, and several replies have pointed that out. My point isn't that determinism is unimportant, but that it isn't intrinsic to compilation itself.
There are legitimate compiler use cases, e.g. search-based optimization, superoptimization, diversification, etc., where reproducibility is not the main constraint. It's worth leaving conceptual space for those use cases rather than treating deterministic output as a defining property of all compilers.
thwarted
16 hours ago
Given the same inputs, the desire for search-based optimization, superoptimization, or diversification should still be predictable and deterministic, even if it produces something that is initially unanticipated. It makes no sense that a given superoptimization search would produce different output—would determine some other method is now more optimized than another—if the initial input and state is exactly the same. It is either the most optimal given the inputs and the state or it is not.
You are attempting to hedge and leave room for a non-deterministic compiler, presumably to argue that something like vibe-compilation is valuable. However, you've offered no real use cases for a non-deterministic compiler, and I assert that such a tool would largely be useless in the real world. There is already a huge gap between requirements gathering, the expression of those requirements, and their conversion into software. Adding even more randomness at the layer of translating high level programming languages into low level machine code would be a gross regression.
sureglymop
21 hours ago
Don't LLMs create the same outputs based on the same inputs if the temperature is 0? Maybe I'm just misunderstanding.
AlotOfReading
19 hours ago
Unfortunately not. Various implementation details like attention are usually non-deterministic. This is one of the better blog posts I'm aware of:
https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
8note
10 hours ago
i dont think theres anything that makes it essential that llms are non-deterministic though
if you rewrote the math to be all fixed point precision on big ints, i think you would still get the useful LLM results? (sketch below)
if somebody really wanted to make a compiler in an LLM, i dont think that nondeterminism is a problem
id really imagine an llm compiler being a set of specs, dependency versions, and test definitions to use though, and you'd introduce essential nondeterminism by changing a version number, even if the only change was the version name from "experimental" to "lts"
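A toy sketch of why fixed point helps: float accumulation depends on order, while big-int accumulation is exact (Python ints are arbitrary precision):

    import random

    random.seed(0)
    vals = [random.uniform(-1, 1) for _ in range(100_000)]
    reordered = sorted(vals)  # same numbers, different summation order

    print(sum(vals) == sum(reordered))  # usually False: float addition isn't associative

    SCALE = 1 << 48  # fixed-point scale
    ints = [round(v * SCALE) for v in vals]
    print(sum(ints) == sum(sorted(ints)))  # True: integer addition is exact in any order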
moregrist
21 hours ago
Perhaps you're comfortable with a compiler that generates different code every time you run it on the same source with the same libraries (and versions) and the same OS.
I am not. To me that describes a debugging fiasco. I don't want "semantic closure," I want correctness and exact repeatability.
pjmlp
3 hours ago
That is exactly how JIT compilers work: you cannot guarantee 100% identical machine code generation across runs, unless you can reproduce the whole universe that led to the same heuristics and decision tree.
candiddevmike
21 hours ago
I wish these folks would tell me how you would do a reproducible build, or reproducible anything really, with LLMs. Even monkeying with temperature, different runs will still introduce subtle changes that would change the hash.
mvr123456
21 hours ago
This reminds me of how you can create fair coins from biased ones and vice versa. You toss your coin repeatedly, and then get the singular "result" in some way by encoding/decoding the sequence. Different sequences might map to the same result, and so comparing results is not the same as comparing the sequences.
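The classic construction, for anyone who hasn't seen it (a minimal sketch of the von Neumann extractor, assuming a 70%-heads coin for illustration):

    import random

    def biased_flip(p_heads=0.7):
        # A coin that comes up heads 70% of the time.
        return random.random() < p_heads

    def fair_flip():
        # Von Neumann extractor: flip the biased coin twice; HT -> heads, TH -> tails.
        # HH and TT are discarded, so the two remaining outcomes are equally likely.
        while True:
            a, b = biased_flip(), biased_flip()
            if a != b:
                return a

    flips = [fair_flip() for _ in range(100_000)]
    print(sum(flips) / len(flips))  # ~0.5 regardless of the coin's bias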
Meanwhile, you press the "shuffle" button, and code-gen creates different code. But this isn't necessarily the part that's supposed to be reproducible, and isn't how you actually go about comparing the output. Instead, maybe two different rounds of code-generation are "equal" if the test-suite passes for both. Not precisely the equivalence-class stuff parent is talking about, but it's a simple way of thinking about it that might be helpful.
cjbgkagh
21 hours ago
There is nothing intrinsic to LLMs that prevents reproducibility. You can run them deterministically without adding noise; it would just be a lot slower to have a deterministic order of operations, which takes an already bad idea and makes it worse.
candiddevmike
21 hours ago
Please tell me how to do this with any of the inference providers or a tool like llama.cpp, and make it work across machines/GPUs. I think you could maybe get close to deterministic output, but you'll always risk having some level of randomness in the output.
cjbgkagh
21 hours ago
Just because you can’t do it with your chosen tools it does not mean it cannot be done. I’ve already granted the premise that it is impractical. Unless there is a framework that already guarantees determinism you’ll have to roll your own, which honestly isn’t that hard to do. You won’t get competitive performance but that’s already being sacrificed for determinism so you wouldn’t get that anyway.
wat10000
18 hours ago
It's just arithmetic, and computer arithmetic is deterministic.
On a practical level, existing implementations are nondeterministic because they don't take care to always perform mathematically associative operations in the same order every time. Floating-point arithmetic is not associative, so those variations change the output. It's absolutely possible to fix this and perform the operations in the same order every time, implementors just don't bother. It's not very useful, especially when almost everything runs with a non-zero temperature.
I think the whole nondeterminism thing is overblown anyway. Mathematical nondeterminism and practical nondeterminism aren't the same thing. With a compiler, it's not just that identical input produces identical output. It's also that semantically identical input produces semantically identical output. If I add an extra space somewhere whitespace isn't significant in the language I'm using, this should not change the output (aside from debug info that includes column numbers, anyway). My deterministic JSON decoder should not only decode the same values for two runs on identical JSON, a change in one value in the input should produce the same values in the output except for the one that changed.
LLMs inherently fail at this regardless of temperature or determinism.
raw_anon_1111
18 hours ago
Once I create code with an LLM, the code is not going to magically change between runs because it was generated by an LLM unless it did an “#import chaos_monkey”
SecretDreams
21 hours ago
Agree. I'm not sure what circle of software hell the OP is advocating for. We need consistent outputs from our most basic building blocks, not probabilistic performance functions. A lot of software runs congruently across multiple nodes; what a nightmare it would be if you had to balance that even for identical hardware.
bigstrat2003
21 hours ago
> What a compiler needs is not determinism, but semantic closure.
No, a compiler needs determinism. The article is quite correct on this point: if you can't trust that the output of a tool will be consistent, you can't use it as a building block. A stochastic compiler is simply not fit for purpose.
pjmlp
3 hours ago
Kind of. Dynamic compilers are called dynamic exactly because they depend on profiling and heuristics.
What matters is observable execution.
hackinthebochs
21 hours ago
Compiler output can be inconsistent and correct. For any source code there is an infinite number of machine code sequences that maintain the semantic constraints of the source code. Correctness is defined semantically, not by consistency.
cv5005
21 hours ago
Bitwise identical output from a compiler is important for verification to protect against tampering, supply chain attacks, etc.
8note
10 hours ago
its a useful way to solve those problems, but i dont think that means its the only way?
tjr
21 hours ago
Sometimes determinism is exactly what one wants. For avionics software, being able to claim complete equivalence between two builds (minus an expected, manually-inspected timestamp) is used to show that the same software was used / present in both cases, which helps avoid redundant testing, and ensure known-repeatable system setups.
bee_rider
a day ago
Are conventional compilers actually deterministic, with all the bells and whistles enabled? PGO seems like it ought to have a random element.
vlovich123
21 hours ago
No - modulo bugs, the same set of inputs to a compiler is generally guaranteed to produce the same output bit for bit, which is the definition of determinism.
There are even efforts to guarantee this for many packages on Linux - it's a core property of security, because it lets you verify that the compilation process or environment wasn't tampered with illicitly by building from scratch and comparing.
Now, actually managing to fix all inputs and get deterministic output can be challenging, but that's less to do with the compiler and more to do with the challenge of completely controlling the entire environment: the profile you are using for PGO, paths on the build machine being injected into the binary, programs that have things in their source or build system that are non-deterministic (e.g. incorporating the build time into the binary), and so on.
jcranmer
20 hours ago
It is generally considered a bug in a compiler if its output is nondeterministic. Of course, compilers are large, complex beasts, and nondeterminism is so easy to accidentally introduce (e.g., do a "for each" in a map where the key is a pointer), that it's probably not too hard to find cases that have nondeterminism.
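A close analog in Python, where str hashes are salted per process much like pointer keys vary with allocator addresses (run this twice and the order usually differs, unless PYTHONHASHSEED is pinned):

    # Iteration order of a set of strings depends on per-process hash salting,
    # so two runs of this script usually print different orders.
    passes = {"inline", "unroll", "vectorize", "dce"}
    print(list(passes))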
> PGO seems like it ought to have a random element.
PGO should be deterministic based on the runs used to generate the profile. The runs are tracking information that should be deterministic--how many times the branch gets taken versus not taken, etc. HWPGO, which relies on hardware counters to generate profiling information, may be less deterministic because the hardware counters end up having some statistical slip to them.
9rx
11 hours ago
> It is generally considered a bug in a compiler if its output is nondeterministic.
Reproducible builds are oft considered a desirable goal, but not having them is hardly a bug. Not even some of the most well known compilers out there, like gcc and clang, are deterministic by default — and not by accident; in those cases you can enable determinism if you wish.
pjmlp
19 hours ago
Not at all, when talking about managed runtimes.
Hence why it is hard to do benchmarks with various kinds of GC and dynamic compilers.
You can't even expect deterministic code generation for the same source code across various compilers.
123malware321
a day ago
Well, considering you use components like DFAs to build compilers, yes, they are deterministic. You also have reproducible builds, etc.
or does your binary always come out differently each time you compile the same file??
You can try it. try to compile the same file 10 times and diff the resultant binaries.
Now try to prompt a bunch of LLMs 10 times and diff the returned rubbish.
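The compile-and-diff experiment in script form (a sketch assuming gcc on PATH and a local hello.c; the environment caveats others mention - embedded paths, __DATE__ - still apply):

    import hashlib
    import subprocess

    def build_hash(out: str) -> str:
        # Compile with fixed flags and hash the resulting binary.
        subprocess.run(["gcc", "-O2", "-o", out, "hello.c"], check=True)
        with open(out, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    # Same input, same toolchain, same flags: hashes match run after run.
    print(build_hash("out1") == build_hash("out2"))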
sigbottle
21 hours ago
I think one of the best ways to understand the "nice property" of compilers we like isn't necessarily determinacy, but "programming models".
There's this really good blog post about how autovectorization is not a programming model https://pharr.org/matt/blog/2018/04/18/ispc-origins
The point is that you want to reliably express semantics in the top level language, tool, API etc. because that's the only way you can build a stable mental model on top of that. Needing to worry about if something actually did something under the hood is awful.
Now of course, that depends on the level of granularity YOU want. When writing plain code, even if it's expressively rich in the logic and semantics (e.g. c++ template metaprogramming), sometimes I don't necessarily care about the specific linker and assembly details (but sometimes I do!)
The issue I think is that building a reliable mental model of an LLM is hard. Note that "reliable" is the key word - consistent. Be it consistently good or bad. The frustrating thing is that it can sometimes deliver great value and sometimes brick horribly and we don't have a good idea for the mental model yet.
To constrain said possibility space, we tether to absolute memes (LLMs are fully stupid or LLMs are a superset of humans).
Idk where I'm going with this
whattheheckheck
7 hours ago
Now you know how directors and executives feel
wat10000
16 hours ago
PGO takes the profile as one of the inputs. Give it the same profile and you should get the same output. If you have a pipeline that does something like build, run and profile performance tests, then rebuild with PGO, then that won't be deterministic. But you've brought it on yourself in that case.
candiddevmike
21 hours ago
Yes, they will output the same file hash every time, short of some build time mutation. Thus we can have nice things like reproducible builds and integrity checks.
WithinReason
a day ago
LLMs are deterministic at minimal temperature. Talking about determinism completely misses the point. The human brain is also non-deterministic and I don't see anybody dismiss human written code based on that. If you remove randomness and choose tokens deterministically, that doesn't magically solve the problems of LLMs.
SecretDreams
21 hours ago
> The human brain is also non-deterministic and I don't see anybody dismiss human written code based on that.
Humans, in all their non deterministic brain glory, long ago realized they don't want their software to behave like their coworkers after a couple of margaritas.
WithinReason
21 hours ago
You seem to be under the impression that I'm promoting LLMs, not sure where you got that idea. The argument is that non-determinism has nothing to do with the issues of LLMs.
9rx
a day ago
> LLMs are not deterministic
They are designed to be where temperature=0. Some hardware configurations are known to defy that assumption, but when running on perfect hardware they most definitely are.
What you call compilers are also nondeterministic on 'faulty' hardware, so...
xigoi
19 hours ago
While they’re technically deterministic, they’re still chaotic, in the sense that changing irrelevant details in the input (such as writing “color” versus “colour”) can make the output completely different.
vlovich123
21 hours ago
Even with nonzero temperature, a batch size of 1 and a fixed seed should make LLMs deterministic. Of course, a batch size of 1 is not economical.
troupo
21 hours ago
With temperature=0 and no context. That is, a clean run with t=0, top_k=0, etc. will produce the same output for the same question. However, if you ask the same question in the same session, the output will be different.
To say the least, this is garbage compared to compilers
9rx
21 hours ago
> However if you ask the same question in the same session, output will be different.
When isn't that true?
    #include <stdio.h>

    int main() {
        printf("Continue?\n");
    }

and

    #include <stdio.h>

    int main() {
        printf("Continue?\n");
        printf("Continue?\n");
    }

do not see the compiler produce equivalent outputs, and I am not sure how they ever could. They are not equivalent programs. Adding additional instructions to a program is expected to see a change in what the compiler does with the program.
troupo
17 hours ago
If you ask the compiler to compile the same input, it will produce the same output.
With LLMs the output depends on the phases of the moon.
9rx
17 hours ago
> If you ask the compiler to compile the same input, it will produce the same output.
As with LLMs, unless you ask for the output to be nondeterministic. But any compiler can be made nondeterministic if you ask for it. That's not something unique to LLMs.
> With LLMs the output depends on the phases of the moon.
If you are relying on a third-party service to run the LLM, quite possibly. Without control over the hardware, configuration, etc. then there is all kinds of fuckery that they can introduce. A third-party can make any compiler nondeterministic.
But that's not a limitation of LLMs. By design, they are deterministic.
troupo
13 hours ago
> But any compiler can be made nondeterministic if you ask for it. That's not something unique to LLMs.
Not unique as in: no one makes their compilers deterministic, and you have to work to make a non-deterministic one. LLMs are non-deterministic by default, and you have to contort them to the point of uselessness to make them deterministic
> If you are relying on a third-party service to run the LLM, quite possibly. Without control over the hardware, configuration, etc.
Again. Even if you control everything, the only time they produce deterministic output is when they are completely neutered:
- workaround for GPUs with num_thread 1
- temperature set to 0
- top_k to 0
- top_p to 0
- context window to 0 (or always do a single run from a new session)
9rx
12 hours ago
> no one makes their compilers deterministic
Go (gc) was designed for reproducible builds by default, so clearly that's not true, but you are right that it isn't the norm.
Even the most widely recognized and used compilers, like gcc, clang, even rustc, are non-deterministic by default. Only if you work hard and control all the variables (e.g. -frandom-seed) can you make these compilers deterministic.
It's fascinating that anyone on HN thinks that compilers converge on always being deterministic or always being non-deterministic. I thought we were supposed to know things about computers around here?
troupo
5 hours ago
> no one makes their compilers deterministic
I was typing too fast. "No one makes their compilers nondeterministic."
> Even the most widely recognized and used compilers, like gcc, clang, even rustc, are non-deterministic by default.
Wat?
In which world are these compilers producing non-deterministic output if you run them again and again?
> It's fascinating that anyone on HN thinks that compilers converge on always being deterministic
It's called reality that is also very trivially verified.
> I thought we were supposed to know things about computers around here?
That's what I thought, too.
9rx
3 hours ago
> In which world are these compilers producing non-deterministic output if you run them again and again?
The one where deterministic iteration order (think use of unordered maps/sets, parallel execution, etc.) isn't considered a priority or is even seen as something to avoid, as determinism here can lead to slower compile times. Where there is use of heuristics that can have a "tie score" without effort to choose the winner deterministically. Where there is a need to support features like __DATE__ and __TIME__. So on and so on. It is a little out of the way so you've probably never heard of it, but its inhabitants call it "Earth".
troupo
2 hours ago
And those fictional compilers are? Like which fictional compilers, when you run them on source code, produce a program that may or may not run, may or may not produce the desired result, etc.?
Strange how literally no one is talking about this fictional non-determinism where the output of the compiler is completely different from run to run, but for the "LLMs are strictly deterministic" crowd non-determinism is the norm and everyone expects it.
Edit: also note how you went from "any compiler can be made nondeterministic if you ask for it" (compilers are deterministic, and you have to work to make them non-deterministic) to "most widely recognized and used compilers are non-deterministic by default."
There are specific things (undefined behavior) and specific cases (like floating point precision between implementations) that may be specified as non-deterministic. To pretend that this is somehow equal to LLMs is not even being unrealistic. It's to exist on a plane orthogonal to our reality.
9rx
2 hours ago
> Like what fictional compilers when you run them on source code produce a program that may or may not run, may or may not produce desired result etc.?
Oh, you are talking about program determinism. I don't know about fictional compilers, but gcc and clang will produce programs that behave differently from compile to compile where __DATE__ or __TIME__ is used. There are languages like Church that are explicitly designed for non-deterministic programs, so a Church compiler would obviously need to adhere to that. And, of course, ick has the mysterious -mystery flag!
But we were talking about compiler determinism. Where for a stable input the compiler produces a stable output. A non-deterministic compiler does not necessarily equate to a non-deterministic program. Obvious to anyone who has used a computer before, the structure of the binary produced by a compiler and the execution of that binary are separate concerns.
If you wanted to talk about something completely different why not start a new thread?