hackernews client

LLMs could be, but shouldn't be compilers

117 pointsposted a day ago

134 Comments

codingdave

a day ago

The discussions around this point are taking it too seriously, even when they are 100% correct. LLMs are not deterministic, so they are not compilers. Sure, if you specify everything - every tiny detail, you can often get them to mostly match. But not 100%. Even if you do fix that, at that point you are coding using English, which is an inefficient language for that level of detail in a specification. And even if you accept that problem, you still have gone to a ton of work just to fight the fundamental non-deterministic nature of LLMs.

It all feels to me like the guys who make videos of using using electric drills to hammer in a nail - Sure, you can do that, but it is the wrong tool for the job. Everyone knows the phrase: "When all you have is a hammer, everything looks like a nail." But we need to also keep in mind the other side of that coin: "When all you have is nails, all you need is a hammer." LLMs are not a replacement for everything that happens to be digital.

alpaylan

a day ago

I think the point I wanted to make was that even if it was deterministic (which you can technically make it to be I guess?) you still shouldn’t live in a world where you’re guided by the “guesses” that the model makes when solidifying your intent into concrete code. Discounting hallucinations (I know this a is a big preconception, I’m trying to make the argument from a disadvantaged point again), I think you need a stronger argument than determinism in the discussion against someone who claims they can write in English, no reason for code anymore; which is what I tried to make here. I get your point that I might be taking the discussion to seriously though.

liveoneggs

a day ago

The future is about embracing absolute chaos. The great reveal of LLMs is that, for the most part, nothing actually mattered except the most shallow approximation of a thing.

belZaah

20 hours ago

This is true only for a small subset of problems. If you write crypto or hardware drivers, details do matter.

holden_nelson

7 hours ago

I’m not an AI evangelical, but I think it remains to be seen what the size of that subset is. Those that write crypto and hardware drivers are certainly a small subset of programmers. Most of us are pumping out enterprise crud and arguing with our PMs.

pjmlp

3 hours ago

Glueing SaaS systems together with low code tools, many of which now also have AI driven configurations.

wizzwizz4

a day ago

The great reveal of LLMs is that our systems of checks and balances don't really work, and allow grifters to thrive, but despite that most people were actually trying to do their jobs properly. Perhaps nothing matters to you except the most shallow approximation of a thing, but there are usually people harmed by such negligence.

liveoneggs

21 hours ago

I'm just as upset as you are about it, believe me. Unfortunately I have to live in the world as I see it and what I've observed in the last 18-ish months is a complete breakdown of prior assumptions.

skydhash

a day ago

Imagine if the amount of a bank transfer does not matter, but it can only be an approximation, also you can approximate the selected account too. Or the system for monitoring the temperature of blood stockage for transfusion…

Often it seems like tech maximalists are the most against tech reliability.

wavemode

20 hours ago

Well, the person who vibe-coded the banking app also vibe-coded a bunch of test cases, so this will only affect a small percentage of customers. When it does and they lose a bunch of money, well, you have a PR team and they don't, so just sweep the story under the rug.

Imagine that - you got your project done ahead of schedule (which looks great on your OKRs) AND finally achieved your dream of no longer being dependent on those stupid overpaid, antisocial software engineers, and all it cost you was the company's reputation. Boeing management would be proud.

Lots of business leaders will do the math and decide this is the way to operate from now on.

snovv_crash

21 hours ago

No need to be so practical.

I suggest when their pointer dereferences, it can go a bit forward or backwards in memory as long as it is mostly correct.

SecretDreams

21 hours ago

Let's give people a choice. My banking will be deterministic, others can have probabilistic banking. Every so often, they transfer me some money by random chance, but at least they can say their banking is run by LLMs. Totally fair trade.

ModernMech

18 hours ago

I think the exact opposite is true: LLMs revealed that when you average everything together, it's really bland and uninteresting no matter how technically good. It's the small choices that bring life into a thing and transform it from slop into something interesting and worthy of attention.

liveoneggs

18 hours ago

I think we agree but my prediction is that the slop will win

raw_anon_1111

20 hours ago

Before LLMs and now more than a decade ago in my career, I was assigned a task and my job was to translate that task into a working implementation. I was guided by the “guesses” that other developers made. I had to trust that they could do FizzBuzz competently without having to tell them to use the mod operator

Then my job became I am assigned a larger implementation and depending on how large the implementation was, I had to design specifications for others to do some or all of the work and validate the final product for correctness. I definitely didn’t pore over every line of code - especially not for front end work that I stopped doing around the same time.

The same is true for LLMs. I treat them like junior developers and slowly starting to treat them like halfway competent mid level ticket takers.

blazinglyfast

a day ago

> even if it was deterministic (which you can technically make it to be I guess?)

No. LLMs are undefined behavior.

xixixao

a day ago

OP means “given the same input, produce the same output” determinism. This isn’t really much different from normal compilers, you might have a language spec, but at the end of the day the results are determined by the concrete compiler’s implementation.

But most LLM services on purpose introduce randomness, so you don’t get the same result for the same input you control as a user.

zaphar

19 hours ago

You can get deterministic output if just turn the temperature all the way down. The problem is that you usually get really bad results, deterministically. It turns out the randomness helps in finding solutions.

recursive

18 hours ago

You can also get deterministic output if you use whatever temperature you want and use an arbitrary fixed RNG seed.

CGMthrowaway

a day ago

>LLMs are not deterministic, so they are not compilers.

"Deterministic" is not the the right constraint to introduce here. Plenty of software is non-deterministic (such as LLMs! But also, consensus protocols, request routing architecture, GPU kernels, etc) so why not compilers?

What a compiler needs is not determinism, but semantic closure. A system is semantically closed if the meanings of its outputs are fully defined within the system, correctness can be evaluated internally and errors are decidable. LLMs are semantically open. A semantically closed compiler will never output nonsense, even if its output is nondeterministic. But two runs of a (semantically closed) nondeterministic compiler may produce two correct programs, one being faster on one CPU and the other faster on another. Or such a compiler can be useful for enhancing security, e.g. programs behave identically, resist fingerprinting.

Nondeterminism simply means the compiler selects any element of an equivalence class. Semantic closure ensures the equivalence class is well‑defined.

thwarted

21 hours ago

No, deterministic means that given the same inputs—source code, target architecture, optimization level, memory and runtime limits (because if the optimizer has more space/time it might find better optimizations), etc—a compiler will produce the same exact output. This is what reproducible builds is about: tightly controlling the inputs so the same output is produced.

That a compiler might pick among different specific implementations in the same equivalency class is exactly what you want a multi-architecture optimizing compiler to do. You don't want it choosing randomly between different optimization choices within an optimization level, that would be non-deterministic at compile time and largely useless assuming that there is at most one most optimized equivalent. I always want the compiler to choose to xor a register with itself to clear it if that's faster than explicitly setting it to zero if that makes the most sense to do given the inputs/constraints.

CGMthrowaway

20 hours ago

Determinism may be required for some compiler use cases, such as reproducible builds, and several replies have pointed that out. My point isn't that determinism is unimportant, but that it isn't intrinsic to compilation itself.

There are legitimate compiler use cases e.g. search‑based optimization, superoptimization, diversification etc where reproducibility is not the main constraint. It's worth leaving conceptual space for those use cases rather than treating deterministic output as a defining property of all compilers

thwarted

16 hours ago

Given the same inputs, the desire for search-based optimization, superoptimization, or diversification should still be predictable and deterministic, even if it produces something that is initially unanticipated. It makes no sense that that a given superoptimization search would produce different output—would determine some other method is now more optimized than another—if the initial input and state is exactly the same. It is either the most optimal given the inputs and the state or it is not.

You are attempting to hedge and leave room for a non-deterministic compiler, presumably to argue that something like vibe-compilation is valuable. However, you've offered no real use cases for a non-deterministic compiler, and I assert that such a tool would largely be useless in the real world. There is already a huge gap between requirements gathering, the expression of those requirements, and their conversion into software. Adding even more randomness at the layer of translating high level programming languages into low level machine code would be a gross regression.

sureglymop

21 hours ago

Don't LLMs create the same outputs based on the same inputs if the temperature is 0? Maybe I'm just misunderstanding.

AlotOfReading

19 hours ago

Unfortunately not. Various implementation details like attention are usually non-deterministic. This is one of the better blog posts I'm aware of:

https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

8note

10 hours ago

i dont think theres anything that makes it essentiall that llms are non-deterministic though

if you rewrote the math to be all fixed point precision on big ints, i think you would still get the useful LLM results?

if somebody really wanted to make a compiler in an LLM, i dont think that nondetermism is problem

id really imagine an llm compiler being a set of specs, dependency versions, and test definitions to use though, and you'd introduce essential nondetermism by changing a version number, even if the only change was the version name from "experimental" to "lts"

moregrist

21 hours ago

Perhaps you're comfortable with a compiler that generates different code every time you run it on the same source with the same libraries (and versions) and the same OS.

I am not. To me that describes a debugging fiasco. I don't want "semantic closure," I want correctness and exact repeatability.

pjmlp

3 hours ago

That is exactly how JIT compilers work, you cannot guarantee 100% machines code generation across runs, unless you can reproduce the whole universe that lead to the same heuristics and decision tree.

candiddevmike

21 hours ago

I wish these folks would tell me how you would do a reproducible build, or reproducible anything really, with LLMs. Even monkeying with temperature, different runs will still introduce subtle changes that would change the hash.

mvr123456

21 hours ago

This reminds me of how you can create fair coins from biased ones and vice versa. You toss your coin repeatedly, and then get the singular "result" in some way by encoding/decoding the sequence. Different sequences might map to the same result, and so comparing results is not the same as comparing the sequences.

Meanwhile, you press the "shuffle" button, and code-gen creates different code. But this isn't necessarily the part that's supposed to be reproducible, and isn't how you actually go about comparing the output. Instead, maybe two different rounds of code-generation are "equal" if the test-suite passes for both. Not precisely the equivalence-class stuff parent is talking about, but it's simple way of thinking about it that might be helpful

cjbgkagh

21 hours ago

There is nothing intrinsic to LLM prevents reproducibility. You can run them deterministically without adding noise, it would just be a lot slower to have a deterministic order of operations, which takes an already bad idea and makes it worse.

candiddevmike

21 hours ago

Please tell me how to do this with any of the inference providers or a tool like llama.cpp, and make it work across machines/GPUs. I think you could maybe get close to deterministic output, but you'll always risk having some level of randomness in the output.

cjbgkagh

21 hours ago

Just because you can’t do it with your chosen tools it does not mean it cannot be done. I’ve already granted the premise that it is impractical. Unless there is a framework that already guarantees determinism you’ll have to roll your own, which honestly isn’t that hard to do. You won’t get competitive performance but that’s already being sacrificed for determinism so you wouldn’t get that anyway.

wat10000

18 hours ago

It's just arithmetic, and computer arithmetic is deterministic.

On a practical level, existing implementations are nondeterministic because they don't take care to always perform mathematically commutative operations in the same order every time. Floating-point arithmetic is not commutative, so those variations change the output. It's absolutely possible to fix this and perform the operations in the same order every time, implementors just don't bother. It's not very useful, especially when almost everything runs with a non-zero temperature.

I think the whole nondeterminism thing is overblown anyway. Mathematical nondeterminism and practical nondeterminism aren't the same thing. With a compiler, it's not just that identical input produces identical output. It's also that semantically identical input produces semantically identical output. If I add an extra space somewhere whitespace isn't significant in the language I'm using, this should not change the output (aside from debug info that includes column numbers, anyway). My deterministic JSON decoder should not only decode the same values for two runs on identical JSON, a change in one value in the input should produce the same values in the output except for the one that changed.

LLMs inherently fail at this regardless of temperature or determinism.

raw_anon_1111

18 hours ago

Once I create code with an LLM, the code is not going to magically change between runs because it was generated by an LLM unless it did an “#import chaos_monkey”

SecretDreams

21 hours ago

Agree. I'm not sure what circle of software hell the OP is advocating for. We need consistent outputs from our most basic building blocks. Not performance probability functions. Many softwares run congruently across multiple nodes. What a nightmare it would be if you had to balance that for identical hardware.

bigstrat2003

21 hours ago

> What a compiler needs is not determinism, but semantic closure.

No, a compiler needs determinism. The article is quite correct on this point: if you can't trust that the output of a tool will be consistent, you can't use it as a building block. A stochastic compiler is simply not fit for purpose.

pjmlp

3 hours ago

Kind of, dynamic compilers, are called dynamic exactly because they depending on profiling and heuristics.

What matters is observable execution.

hackinthebochs

21 hours ago

Compiler output can be inconsistent and correct. For any source code there is an infinite number of machine code sequences that maintain the semantic constraints of the source code. Correctness is defined semantically, not by consistency.

cv5005

21 hours ago

Bitwise identical output from a compiler is important for verification to protect against tampering, supply chain attacks, etc.

8note

10 hours ago

its a useful way to solve those problems, but i dont think that means its the only way?

tjr

21 hours ago

Sometimes determinism is exactly what one wants. For avionics software, being able to claim complete equivalence between two builds (minus an expected, manually-inspected timestamp) is used to show that the same software was used / present in both cases, which helps avoid redundant testing, and ensure known-repeatable system setups.

bee_rider

a day ago

Are conventional compilers actually deterministic, with all the bells and whistles enabled? PGO seems like it ought to have a random element.

vlovich123

21 hours ago

No, modulo bugs generally the same set of inputs to a compiler are guaranteed to produce the same output bit for bit which is the definition of determinism.

There’s even efforts to guarantee this for many packages on Linux - it’s a core property of security because it lets you validate that the compilation process or environment wasn’t tampered with illicitly by being able to verify by building from scratch.

Now actually managing to fix all inputs and getting deterministic output can be challenging, but that’s less to do with the compiler and more to do with the challenge of completely taking the entire environment (the profile you are using for PGO, isolating paths on the build machine being injected into the binary, programs that have things in their source or build system that’s non deterministic (e.g. incorporating the build time into the binary)

jcranmer

20 hours ago

It is generally considered a bug in a compiler if its output is nondeterministic. Of course, compilers are large, complex beasts, and nondeterminism is so easy to accidentally introduce (e.g., do a "for each" in a map where the key is a pointer), that it's probably not too hard to find cases that have nondeterminism.

> PGO seems like it ought to have a random element.

PGO should be deterministic based on the runs used to generate the profile. The runs are tracking information that should be deterministic--how many times does the the branch get taken versus not taken, etc. HWPGO, which relies on hardware counters to generate profiling information, may be less deterministic because the hardware counters end up having some statistical slip to them.

9rx

11 hours ago

> It is generally considered a bug in a compiler if its output is nondeterministic.

Reproducible builds are oft considered a desirable goal, but not having them is hardly a bug. Not even some of the most well known compilers out there, like gcc and clang, are deterministic by default — and not by accident; in those cases you can enable determinism if you wish.

pjmlp

19 hours ago

Not at all, when talking about managed runtimes.

Hence why it is hard to do benchmarks with various kinds of GC and dynamic compilers.

You can't even expect deterministic code generation for the same source code across various compilers.

123malware321

a day ago

well considering you use components like DFA to build compilers, yes they are determenistic. you also have reproducible builds etc.

or does your binary always come out differently each time you compile the same file??

You can try it. try to compile the same file 10 times and diff the resultant binaries.

Now try to prompt a bunch of LLMs 10 times and diff the returned rubbish.

sigbottle

21 hours ago

I think one of the best ways to understand the "nice property" of compilers we like isn't necessarily determinacy, but "programming models".

There's this really good blog post about how autovectorization is not a programming model https://pharr.org/matt/blog/2018/04/18/ispc-origins

The point is that you want to reliably express semantics in the top level language, tool, API etc. because that's the only way you can build a stable mental model on top of that. Needing to worry about if something actually did something under the hood is awful.

Now of course, that depends on the level of granularity YOU want. When writing plain code, even if it's expressively rich in the logic and semantics (e.g. c++ template metaprogramming), sometimes I don't necessarily care about the specific linker and assembly details (but sometimes I do!)

The issue I think is that building a reliable mental model of an LLM is hard. Note that "reliable" is the key word - consistent. Be it consistently good or bad. The frustrating thing is that it can sometimes deliver great value and sometimes brick horribly and we don't have a good idea for the mental model yet.

To constrain said possibility space, we tether to absolute memes (LLMs are fully stupid or LLMs are a superset of humans).

Idk where I'm going with this

whattheheckheck

7 hours ago

Now you know how directors and executives feel

wat10000

16 hours ago

PGO takes the profile as one of the inputs. Give it the same profile and you should get the same output. If you have a pipeline that does something like build, run and profile performance tests, then rebuild with PGO, then that won't be deterministic. But you've brought it on yourself in that case.

candiddevmike

21 hours ago

Yes, they will output the same file hash every time, short of some build time mutation. Thus we can have nice things like reproducible builds and integrity checks.

WithinReason

a day ago

LLMs are deterministic at minimal temperature. Talking about determinism completely misses the point. The human brain is also non-deterministic and I don't see anybody dismiss human written code based on that. If you remove randomness and choose tokens deterministically, that doesn't magically solve the problems of LLMs.

SecretDreams

21 hours ago

> The human brain is also non-deterministic and I don't see anybody dismiss human written code based on that.

Humans, in all their non deterministic brain glory, long ago realized they don't want their software to behave like their coworkers after a couple of margaritas.

WithinReason

21 hours ago

You seem to be under the impression that I'm promoting LLMs, not sure where you got that idea. The argument is that non-determinism has nothing to do with the issues of LLMs.

9rx

a day ago

> LLMs are not deterministic

They are designed to be where temperature=0. Some hardware configurations are known defy that assumption, but when running on perfect hardware they most definitely are.

What you call compilers are also nondeterministic on 'faulty' hardware, so...

xigoi

19 hours ago

While they’re technically deterministic, they’re still chaotic, in the sense that changing irrelevant details in the input (such as writing “color” versus “colour”) can make the output completely different.

vlovich123

21 hours ago

Even with temperature and a batch size of 1 and fixed seed LLMs should be deterministic. Of course batch size of 1 is not economical.

troupo

21 hours ago

with temperature=0 and no context. That is, a clean run with t=0, pk=0 etc. etc. will produce the same output for the same question. However if you ask the same question in the same session, output will be different.

To say the least, this is garbage compared to compilers

9rx

21 hours ago

> However if you ask the same question in the same session, output will be different.

When isn't that true?

    int main() {
        printf("Continue?\n");
    }

and

    int main() {
        printf("Continue?\n");
        printf("Continue?\n");
    }

do not see the compiler produce equivalent outputs and I am not sure how they ever could. They are not equivalent programs. Adding additional instructions to a program is expected to see a change in what the compiler does with the program.

troupo

17 hours ago

If you ask the compiler to compile the same input, it will produce the same output.

With LLMs the output depends on the phases of the moon.

9rx

17 hours ago

> If you ask the compiler to compile the same input, it will produce the same output.

As with LLMs, unless you ask for the output to be nondeterministic. But any compiler can be made nondeterministic if you ask for it. That's not something unique to LLMs.

> With LLMs the output depends on the phases of the moon.

If you are relying on a third-party service to run the LLM, quite possibly. Without control over the hardware, configuration, etc. then there is all kinds of fuckery that they can introduce. A third-party can make any compiler nondeterministic.

But that's not a limitation of LLMs. By design, they are deterministic.

troupo

13 hours ago

> But any compiler can be made nondeterministic if you ask for it. That's not something unique to LLMs.

Not unique as in: no one makes their compilers deterministic, and you have to work to make a non-deterministic one. LLMs are non-deterministic by default, and you have to contort them to the point of uselessness to make them deterministic

> If you are relying on a third-party service to run the LLM, quite possibly. Without control over the hardware, configuration, etc.

Again. Even if you control everything, the only time they produce deterministic output is when they are completely neutered:

- workaround for GPUs with num_thread 1

- temperature set to 0

- top_k to 0

- top_p to 0

- context window to 0 (or always do a single run from a new session)

9rx

12 hours ago

> no one makes their compilers deterministic

Go (gc) was designed for reproducible builds by default, so clearly that's not true, but you are right that it isn't the norm.

Even the most widely recognized and used compilers, like gcc, clang, even rustc, are non-deterministic by default. Only if you work hard and control all the variables (e.g. -frandom-seed) can you make these compilers deterministic.

It's fascinating that anyone on HN thinks that compilers converge on always being deterministic or always being non-deterministic. I thought we were supposed to know things about computers around here?

troupo

5 hours ago

> no one makes their compilers deterministic

I was typing too fast. "No one makes their compilers nondeterministic.

> Even the most widely recognized and used compilers, like gcc, clang, even rustc, are non-deterministic by default.

Wat?

In which world are these compilers producing non-deterministic output if you run them again and again?

> It's fascinating that anyone on HN thinks that compilers converge on always being deterministic

It's called reality that is also very trivially verified.

> I thought we were supposed to know things about computers around here?

That's what I thought, too.

9rx

3 hours ago

> In which world are these compilers producing non-deterministic output if you run them again and again?

The one where deterministic iteration order (think use of unordered maps/sets, parallel execution, etc.) isn't considered a priority or is even seen as something to avoid as determinism here can lead to slower compiler times. Where there is use of heuristics that can have a "tie score" without effort to choose the winner deterministically. Where there is a need to support features like __DATE__ and __TIME__. So on and so on. It is a little out of the way so you've probably never heard of it, but its inhabitants call it "Earth".

troupo

2 hours ago

And those fictional compilers are? Like what fictional compilers when you run them on source code produce a program that may or may not run, may or may not produce desired result etc.?

Strange how literally no one is talking about this fictional non-determinism where the output of the compiler is completely different from run to run, but for the "LLMs are strictly deterministic" non-determinism is the norm and everyone expects it.

Edit also note how you went from "any compiler can be made nondeterministic if you ask for it." (compilers are deterministic, and you have to work to make them non-deterministic) to "most widely recognized and used compilers are non-deterministic by default."

There are specific things (undefined behavior) and specific things (like floating point precision between implementations) that may be specified as non-deterministic. To pretend that this is somehow equal to LLMs is not even being unrealistic. It's to exist on a plane orthogonal to our reality.

9rx

2 hours ago

> Like what fictional compilers when you run them on source code produce a program that may or may not run, may or may not produce desired result etc.?

Oh, you are talking about program determinism. I don't know about fictional compilers, but gcc and clang will produce programs that behave differently from compile to compile where __DATE__ or __TIME__ is used. There are languages like Church that are explicitly designed for non-deterministic programs, so a Church compiler would obviously need to adhere to that. And, of course, ick has the mysterious -mystery flag!

But we were talking about compiler determinism. Where for a stable input the compiler produces a stable output. A non-deterministic compiler does not necessarily equate to a non-deterministic program. Obvious to anyone who has used a computer before, the structure of the binary produced by a compiler and the execution of that binary are separate concerns.

If you wanted to talk about something completely different why not start a new thread?

mikewarot

4 hours ago

As a ham radio operator (KA9DGX), I tend to view all of this through the lens of impedance matching, it's my metaphor of choice.

You could use a badly designed antenna with a horrible VSWR at the end of a coax, and effectively communicate with some portion of the world, by using a tuner, which helps cover up the inefficiencies involved. However, doing so loses signal, in both directions. You can add amplification at the antenna for receive (a pre-amp) and transmit with more power, but eventually the coax will break down, possibly well before the legal limit.

It is far better to use a well designed antenna and matching system at the feed point. It maximizes signal transmission in both directions, by reducing losses as much as possible.

A compiler matches our cognitive impedance to that of the computer. We don't handle generating opcodes and instruction addresses manually very well. I don't see how an LLM is going to do that any better. Compilers, on the other hand, do it reliably, and very efficiently.

The best cognitive impedance matches happened a while ago, when Visual Basic 6 and Delphi for Windows first came out. You might think LLMs make it easier that that, but you'd be mistaken, for any problem of sufficient complexity.

mickdarling

a day ago

This is where the desire to NOT anthropomorphize LLMs actually gets in the way.

We have mechanisms for ensuring output from humans, and those are nothing like ensuring the output from a compiler. We have checks on people, we have whole industries of people whose whole careers are managing people, to manage other people, to manage other people.

with regards to predictability LLMs essentially behave like people in this manner. The same kind of checks that we use for people are needed for them, not the same kind of checks we use for software.

bigstrat2003

21 hours ago

> The same kind of checks that we use for people are needed for them...

The whole benefit of computers is that they don't make stupid mistakes like humans do. If you give a computer the ability to make random mistakes all you have done is made the computer shitty. We don't need checks, we need to not deliberately make our computers worse.

raw_anon_1111

18 hours ago

The same thing happens when I have a project that I’m leading where I have 3-4 other developers. It’s not deterministic that they will follow my specs completely, correctly and not have subtle bugs.

If they are junior developers working in Java they may just as well build an AbstractFactoryConcurrentSingletonBean because that’s what they learned in school as an LLM would be from training on code it found on the Internet.

skydhash

a day ago

> The same kind of checks that we use for people are needed for them

Those checks works for people because humans and most living beings respond well to rewards/punishment mechanisms. It’s the whole basis of society.

> not the same kind of checks we use for software.

We do have systems that are non deterministic (computer vision, various forecasting models…). We judge those by their accuracy and the likely of having false positive or false negatives (when it’s a classifier). Why not use those metrics?

wizzwizz4

21 hours ago

Because by those metrics, LLMs aren't very good.

LLM code completion compares unfavourably to the (heuristic, nigh-instant) picklist implementations we used to use, both at the low-level (how often does it autocomplete the right thing?) and at the high-level (despite many believing they're more effective, the average programmer is less effective when using AI tools). We need reasons to believe that LLMs are great and do all things, therefore we look for measurements that paint it in a good light (e.g. lines of code written, time to first working prototype, inclination to output Doom source code verbatim).

The reason we're all using (or pretending to use) LLMs now is not because they're good. It's almost entirely unrelated.

mvr123456

a day ago

Looking at LLMs as a less-than-completely-reliable compiler is a good idea, but it's misleading to think of them as natural-language-to-implementation compiler because they are actually an anything-to-anything compiler.

If you don't like the results or the process, you have to switch targets or add new intermediates. For example instead of doing description -> implementation, do description -> spec -> plan -> implementation

raw_anon_1111

20 hours ago

> My stance has been pretty rigid for some time: LLMs hallucinate, so they aren’t reliable building blocks. If you can’t rely on the translation step, you can’t treat it as a serious abstraction layer because it provides no stable guarantees about the underlying system.

This is technically true. But unimportant. When I write code in a higher level language and it gets compiled to machine code, ultimately I am testing statically generated code for correctness. I don’t care what type of weird tricks the compiler did for optimizations.

How is that any different than when someone is testing LLM generated C code? I’m still testing C code that isn’t going to magically be changed by the LLM without my intervention anymore than my C code is going to be changed without my recompiling it.

On this latest project I was on, the Python generated code by Codex was “correct” with the happy path. But there were subtle bugs in the distributed locking mechanics and some other concurrency controls I specified. Ironically, those were both caught by throwing the code in ChatGPT in thinking mode.

No one is using an LLM to compute is a number even or odd at runtime.

rileymichael

20 hours ago

> I don’t care what type of weird tricks the compiler did for optimizations.

you might not, but plenty of others do. on the jvm for example, anyone building a performance sensitive application has to care about what the compiler emits + how the jit behaves. simple things like accidental boxing, megamorphic call preventing inlining, etc. have massive effects.

i've spent many hours benchmarking, inspecting in jitwatch, etc.

raw_anon_1111

19 hours ago

And 95%+ developers aren’t writing performance sensitive code. In my career, most bottlenecks I’ve seen are because of bad database design, network latency, or other infrastructure related issuesor in the cloud days startup latency for anything serviceless.

Yes I know every millisecond a company like Google can shave off, is multiplied by billions of transactions a day and can save real money on infrastructure. But even at a second tier company like Salesforce, it probably doesn’t matter

rileymichael

16 hours ago

it all matters. if more people took pride in their craft and understood the behavior of their tools, modern software wouldn’t be so horrid

raw_anon_1111

16 hours ago

To a first approximation, no one gets paid to write bespoke hand crafted software. We get paid to make the company more money or save the company more money than the fully allocated cost to employ us to make computers do things. I take “pride” in the fact that software and implementations I designed meets the requirements of the people that paid me to write it - whether that be by a combination of my work and my delegated work to humans or LLMs

Over the past decade, part of my job has been to design systems, talk to “stakeholders” and delegate some work and do some myself. I’m neither a web developer nor a mobile developer.

I don’t look at a line of code for those types of implementations. I do make sure they work. From my perspective, those that I delegated to might as well be “human LLMs”.

pjmlp

19 hours ago

Which is a good example on how managed runtimes are already not deterministic and how hard it is to reproduce scenarios.

raw_anon_1111

18 hours ago

I agree, in my original comment, I went out of the way to say “C” in my hypothetical argument.

But even with C, it’s still not completely deterministic with out of order and predictive branching, cache hits vs misses etc. Didn’t exactly this cause some of the worse processor level security issues we had seen in years?

skydhash

20 hours ago

Because for all high level languages, errors happen at the same level of the language. You do not write programs in Go and then verify it in opcodes with a dissasembler. Incorrect syntax and runtime reference the Go files and symbols, not CPU registers.

The same thing happens in JavaScript. I debug it using a Javascript debugger, not with gdb. Even when using bash script, you don’t debug it by going into the programs source code, you just consult the man pages.

When using LLM, I would expect not to go and verify the code to see if it actually correct semantically.

raw_anon_1111

20 hours ago

If it works with all of your human or even generated test cases, why do I care if it decided to use a while loop or a for loop?

Like I said above, I do know to watch out for implementations that “Work on my Machine” but don’t work at scale or involve concurrency. But I have had to check for the same issues when I delegate work to more junior developers.

This is not meant to be an insult toward you. But my not doing front end development for well over a decade, a front end developer might as well be a “human LLM” to me. I’m going to give you the business requirements and constraints and you are going to come back with a website. I am just going to check it meets the business requirements and not tell you the how. I’m definitely not going to look at the code.

I just had a web project I had to modify for a new project, I used Codex and didn’t look at a line of code. Yeah I know JavaScript. But I have no idea whether the initial developer who worked on on another project I led or whether the Codex changes were idiomatic. I know the developer and Codex met my functional requirements.

lubujackson

19 hours ago

I agree LLMs shouldn't be "compilers" because that implies abstracting away all decisions embedded in the code. Code is structured decisions and we will always want access and control over those decisions. We might not care about many of those decisions, but some of those we absolutely do. Some might be architectural, some might be we want the button to always be red.

This is why I think the better goal is an abstraction layer that differentiates human decisions from default (LLM) decisions. A sweeping "compiler" locks humans out of the decision making process.

raw_anon_1111

18 hours ago

Have you ever led a project where you had to give the specs to other developers? Have you ever contracted out complete implementation to a consulting company? Those are just really slow Mechanical Turk style human LLMs

Martin_Silenus

9 hours ago

Maybe God was so angry seeing His fellows embrassing LLMs. So He asked vaguely one of those lame things, for the first time:

  0. "Make something cool out of this insane amount of energy." (temp: 10^42 Kelvin)
  1. He slept for a while.
  2. Datacenter exploded His realm.
  3. ~380 000 years passed and fiat lux.
  4. ~13 billions years passed and here we are.
  5. JMP 0.

olivia-banks

12 hours ago

I’m not sure I entirely agree with this. The example that comes to mind is text rendering in browsers. You can define a website, and how it will look, pretty well but not perfectly. There’s going to be some minor differences, like in the text rendering pipeline.

I think it’s more productive to chart all of these systems, LLMs included, on a line of abstraction leakiness. Even disregarding their stochastic nature, I think they’re a much too leaky abstraction to find any use in compilers. There’s a giant mismatch that I think is too big to reconcile.

skybrian

a day ago

Here's an experiment that might be worth trying: temporarily delete a source file, ask your coding agent to regenerate it, and examine the diffs to see what it did differently.

This could be a good way to learn how robust your tests are, and also what accidental complexity could be removed by doing a rewrite. But I doubt that the results would be so good that you could ask a coding agent to regenerate the source code all the time, like we do for compilers and object code.

raw_anon_1111

18 hours ago

I just had Claude rewrite a simple utility as far code, but complex if you didn’t know the gotchas of a particular AWS Service. It was much better than my implementation and it already knew how things work underneath,

For context, my initial implementation went through the official AWS open source process (no longer there) five years ago and I’m still getting occasional emails and LinkedIn Messages because it’s one of the best ways to solve the problem that is publicly available - the last couple of times, I basically gave the person the instructions I gave ChatGPT (since I couldn’t give them the code) and told them to have it regenerate the code in Python and it would do much better than what I wrote when I didn’t know the service as well as I do now, and the service has more features that you have to be concerned about

jasfi

8 hours ago

I'm actually building this and believe I've overcome the most difficult aspects mentioned here. It will be released as Open Source next week. https://intentcode.dev

explosion-s

a day ago

This is an interesting problem, one I've thought a lot about myself. On one hand, LLMs have the capacity to greatly help people, and I think, especially in the realm of gradually learning how to program, on the other hand, the non-determinism is such a difficult problem to work around.

One current idea of mine, is to iteratively make things more and more specific, this is the approach I take with psuedocode-expander ([0]) and has proven generally useful. I think there's a lot of value in the LLM instead of one shot generating something linearly, building from the top down with human feedback, for instance. I give a lot more examples on the repo for this project, and encourage any feedback or thoughts on LLM driven code generation in a more sustainable then vibe-coding way.

[0]: https://github.com/explosion-Scratch/psuedocode-expander/

Tade0

21 hours ago

> on the other hand, the non-determinism is such a difficult problem to work around.

Well, you can always set temperature to 0, but that doesn't remove hallucinations.

dpweb

a day ago

Compilation is transforming one computing model to another. LLMs aren't great at everything, but seem particularly well suited for this purpose.

One of the first things I tried to have an llm do is transpile. These days that works really well. You find an interesting project in python, i'm a js guy, boom js version. Very helpful.

echelon

a day ago

Forest for the trees.

You see a business you like, boom competing business.

These are going to turn into business factories.

Anthropic has a business factory. They can make new businesses. Why do they need to sell that at all once it works?

We're focusing on a compiler implementation. Classic engineering mindset. We focus on the neat things that entertain us. But the real story is what these models will actually be doing to create value.

throwaway2027

15 hours ago

Well they could be if we had a way to restore error state, like setting a trap and or catching signals by setting handlers and saving, restoring stack/registers then just like some JIT compilation we could progressively "fix" the assembly/machine instructions. Most "functions" are pretty short and the transformer architecture should be able to do it but the trickier part will be referencing global memory constants I think.

pjmlp

19 hours ago

The Pandora box is already open.

There are people playing around with straight machine code generation, or integrating ML into the optimisation backend, finally compiling via a translation to an existing language is already a given in vibe coding with agents.

Speaking of which, using agentic runtimes is hardly any different from writing programs, there are some instructions which then get executed just like any other applications, and if it gets compiled before execution or plainly interpreted, becomes a runtime implementation detail.

Are we there yet without hallucinations?

Not yet, however the box is already open, and there are enough people trying to make it happen.

plastic-enjoyer

a day ago

> It’s that the programming interface is functionally underspecified by default. Natural language leaves gaps; many distinct programs can satisfy the same prompt. The LLM must fill those gaps.

I think this is an interesting development, because we (linguists and logicians in particular) have spent a long time developing a highly specified language that leaves no room for ambiguity. One could say that natural language was considered deficient – and now we are moving in the exact opposite direction.

somesortofthing

18 hours ago

One thing that's missing from this is that the specification itself only matters insofar as it meets its own meta-specification of "what people will use/pay for". LLMs may have an easier time understanding that than what a specific developer wants from them - a perfect implementation of an un-marketable product is mostly pointless.

aethrum

a day ago

Can't we just turn the temp down to 0?

kibwen

21 hours ago

That doesn't make a difference here. Even with a nonzero temperature, an LLM could still be deterministic as long as you have control of its random seed. As the article says:

"This gets to my core point. What changes with LLMs isn’t primarily nondeterminism, unpredictability, or hallucination. It’s that the programming interface is functionally underspecified by default."

helloplanets

21 hours ago

Even if you turn the temperature down to 0, it's not deterministic. Floating points are messy. If there is even a tiny difference when it comes to the order of operations on the actual GPU that's running the billions of parallelized floating point operations over and over, it's very possible to end up with changing top probability logits.

abm53

a day ago

More to the point: is randomness of representation or implementation an inherent issue if the desired semantics of a program are still obeyed?

This is not really a point about whether LLMs can currently be used as English compilers, but more questioning whether determinism of the final machine code output is a critical property of a build system.

MyHonestOpinon

18 hours ago

I suppose that even with temp down to zero the model itself changes over time.

calebm

19 hours ago

My biggest AI win so far was using ChatGPT as a transpiler to convert from vanilla JS code to GLSL. It took 7 prompts and about 1.5 hours, but without the AI, I would have been thrilled to have completed the project in a week.

lambda-lollipop

18 hours ago

cf. Dijkstra "On the foolishness of "natural language programming" https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...

>From one gut feeling I derive much consolation: I suspect that machines to be programmed in our native tongues —be it Dutch, English, American, French, German, or Swahili— are as damned difficult to make as they would be to use.

hollowturtle

21 hours ago

They're giant pattern regurgitators, impressive for sure, but they only can be as good as their training data, reason why they seems to be more effective for TypeScript, Python etc. Nothing less nothing more. No AGI, no Job X is done. Hallucinations are a feature, otherwise they would just spit out training data. The thing is the whole discussion around these tools is so miserable that I'm pondering the idea of canceling from every corner of the internet, the fatigue is real and pushing back the hype feels so exausting, worse than crypto, nft and web3. I'm a user of these tools me pushing back the hype is because its ripple effects arrive inside my day job and I'm exausted of people handing to you generated shit just to try making a point and saying "see? like that"

slopusila

20 hours ago

two engineers implementing the same task are not-deterministic

yet nobody complained about this

in fact engineers appreciate that, "we are not replaceable code monkeys cogs in the machine as management would like"

MyHonestOpinon

18 hours ago

But once the code is written, tested, etc. it becomes deterministic.

smallnix

20 hours ago

> Computer science has been advancing language design by building higher and higher level languages

Why? Because new languages have an IR in their compilation path?

behnamoh

a day ago

> Specifying systems is hard; and we are lazy.

The more I use LLMs, the more I find this true. Haskell made me think for minutes before writing one line of code. Result? I stopped using Haskell and went back to Python because with Py I can "think while I code". The separation of thinking|coding phases in Haskell is what my lazy mind didn't want to tolerate.

Same goes with LLMs. I want the model to "get" what I mean but often times (esp. with Codex) I must be very specific about the project scope and spec. Codex doesn't let me "think while I vibe", because every change is costly and you'd better have a good recovery plan (git?) when Codex goes stray.

lfsss

21 hours ago

You want to fly on an AI-developed airplane. I don't (just kidding haha).

ryanschneider

a day ago

I’m kind of surprised no one has mentioned this one yet: https://www.lesswrong.com/posts/gQyphPbaLHBMJoghD/comp-sci-i...

Daviey

21 hours ago

If you have decent unit and functional tests, why do you care how the code is written?

This feels like the same debate assembly programmers had about C in the 60s. "You don’t understand what the compiler is doing, therefore it’s dangerous". Eventually we realised the important thing isn’t how the code was authored but whether the behaviour is correct, testable, and maintainable.

If code generated by an LLM:

  - passes a real test suite (not toy tests),
  - meets performance/security constraints,
  - goes through review like any other change,

then the acceptance criteria haven’t changed. The test suite is part of the spec. If the spec is enforced in CI, the authoring tool is secondary.

The real risk isn’t "LLMs as compilers", it’s letting changes bypass verification and ownership. We solved that with C, with large dependency trees, with codegen tools. Same playbook applies here.

If you give expected input and get expected output, why does it matter how the code was written?

shauhss

20 hours ago

Because testing at this level is a likely impossible across all domains of programming. You can narrow the set of inputs and get relatively far, but the more complex the systems the broader the space of problems becomes. And even a simple crud app on an EC2 has a lot more failure modes than people are able to test for with current tools.

> passes a real test suite (not toy tests)

“not toy tests” is doing a lot of heavy lifting here. Like an immeasurable amount of lifting.

lunarboy

a day ago

Are LLMs not already compilers? They translate human natural language to code pretty well now. But yeah, they probably don't fit the bill of English based code to machine code

rvz

21 hours ago

> Are LLMs not already compilers? They translate human natural language to code pretty well now.

Can you formally verify prose?

> But yeah, they probably don't fit the bill of English based code to machine code

Which is why LLMs cannot be compilers that transform code to machine code.

jerf

a day ago

A lot of people are mentally modeling the idea that LLMs are either now or will eventually be infinitely capable. They are and will stubbornly persist in being finite, no matter how much capacity that "finite" entails. For the same reason that higher level languages allow humans to worry less about certain details and more about others, higher level languages will allow LLMs to use more of their finite resources on solving the hard problems as well.

Using LLMs to do something like what a compiler can already do is also modelling LLMs as infinite rather than finite. In fact in this particular situation not only are they finite, they're grotesquely finite, in particular, they are expensive. For example, there is no world where we just replace our entire infrastructure from top to bottom with LLMs. To see that, compare the computational effort of adding 10 8-digit numbers with an LLM versus a CPU. Or, if you prefer something a bit less slanted, the computational costs of serving a single simple HTTP request with modern systems versus an LLM. The numbers run something like LLMs being trillions of times more expensive, as an opening bid, and if the AIs continue to get more expensive it can get even worse than that.

For similar reasons, using LLMs as a compiler is very unlikely to ever produce anything even remotely resembling a payback versus the cost of doing so. Let the AI improve the compiler instead. (In another couple of years. I suspect today's AIs would find it virtually impossible to significatly improve an already-optimized compiler today.)

Moreover, remember, oh, maybe two years back when it was all the rage to have AIs be able to explain why they gave the answer they did? Yeah, I know, in the frenzied greed to be the one to grab the money on the table, this has sort of fallen by the wayside, but code is already the ultimate example of that. We ask the LLM to do things, it produces code we can examine, and the LLM session then dies away leaving only the code. This is a good thing. This means we can still examine what the resulting system is doing. In a lot of ways we hardly even care what the LLM was "thinking" or "intending", we end up with a fantastically auditable artifact. Even if you are not convinced of the utility of a human examining it, it is also an artifact that the next AI will spend less of its finite resources simply trying to understand and have more left over to actually do the work.

We may find that we want different programming languages for AIs. Personally I think we should always try to retain that ability for humans to follow it, even if we build something like that. We've already put the effort into building AIs that produce human-legible code and I think it's probably not that great a penalty in the long run to retain that. At the moment it is hard to even guess what such a thing would look like, though, as the AIs are advancing far faster than anyone (or any AI) could produce, test, prove out, and deploy such a language, against the advantage of other AIs simply getting better at working with the existing coding systems.

rvz

a day ago

Anyone who knows 0.1% about LLMs should know that they are not deterministic systems and are totally unpredictable with their outputs meaning that they cannot become compilers at all.

The obvious has been stated.

pjmlp

19 hours ago

Anyone that knows 0.1% about GC and JIT compilers also knows how hard is to have deterministic behaviours, and how much their behaviours are driven by heuristics.

WithinReason

a day ago

Anyone who knows 0.2% about LLMs should know that they can be sampled deterministically, and yet that doesn't change the argument.

rvz

a day ago

We do not trust them (LLMs) 100% to reliably emit correct assembled code (why would anyone) compared with a compiler which the latter is deterministic and the former is fundamentally stochastic, no matter how you sample them.

LLMs are not designed for that.

hackinthebochs

21 hours ago

There's almost a good point here, but you're misusing concepts that obfuscate the point you're trying to make. Determinism is about producing the same output given the same input. In this sense, LLMs are fundamentally deterministic. Inference produces scores for every word in their vocabulary. This score map is then sampled from according to the temperature to produce the next token. But this non-determinism is artificially injected.

But the determinism/non-determinism axis isn't the core issue here. The issue is that they are trained by gradient descent which produces instability/unpredictability in its output. I can give it a set of rules and a broad collection of examples in its context window. How often it will correctly apply the supplied rules to the input stream is entirely unpredictable. LLMs are fundamentally unpredictable as a computing paradigm. LLMs training process is stochastic, though I hesitate to call them "fundamentally stochastic".

rvz

20 hours ago

> Determinism is about producing the same output given the same input. In this sense, LLMs are fundamentally deterministic.

You cannot formally verifiy prose or the text that LLMs generates when attempting to compare what a compiler does. So even in this sense that is completely false.

No-one can guarrantee that the outputs will be 100% to what the instructions you are giving to the LLM, which is why you do not trust it. As long as it is made up of artificial neurons that predict the next token, it is fundamentally a stochastic model and unpredictable.

One can maliciously craft an input to mess up the network to get the LLM to produce a different output or outright garbage.

Compilers have reproducable builds and formal verification of their functionality. No such thing with LLMs exist. Thus, comparing LLMs to a compiler and suggesting that LLMs are 'fundamentally deterministic' or is even more than a compiler is completely absurd.

hackinthebochs

19 hours ago

You're just using words incorrectly. Deterministic means repeatable. That's it. Predictable, verifiable, etc are tangential to deterministic. Your points are largely correct but you're not using the right words which just obfuscates your meaning.

rvz

10 hours ago

Nope. You have not shown how a large scale collection of neural networks irrespective of their architecture is more deterministic when compared to a 'compiler' and only repeating a known misconception of tweaking the temperature to 0 which does not bring the determinism you claim it brings with LLMs [0] [1] [2], otherwise you would not have this problem in the first place.

By even doing that, the result of the outputs are useless anyway. So this really does not help your point at all. So therefore:

> You're just using words incorrectly. Deterministic means repeatable. That's it. Predictable, verifiable, etc are tangential to deterministic.

There is nothing deteministic or predictable about an LLM even when you compare it to a compiler, unless you can guarrantee that the individual neurons through inference give a predictable output which would be useful enough for being a drop-in compiler replacement.

[0] https://152334h.github.io/blog/non-determinism-in-gpt-4/

[1] https://arxiv.org/pdf/2506.09501

[2] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

hackinthebochs

8 hours ago

Yes, there's some unknown sources of non-determinism when running production LLM architectures at full capacity. But that's completely irrelevant to the point. The core algorithm is deterministic. And you're still conflating deterministic and predictable. It's strange to have such disregard for the meaning of words and their correct usage.

rvz

5 hours ago

> Yes, there's some unknown sources of non-determinism when running production LLM architectures at full capacity. But that's completely irrelevant to the point.

It is directly relevant and supports my whole point which just debunked your assertions on LLMs being ‘deterministic’ which doesn’t exist at a fundamental sense which you can’t guarantee that the behaviour and even the outputs will be the same.

> The core algorithm is deterministic. And you're still conflating deterministic and predictable.

The entire LLM is still non-deterministic and it is still considered to be unpredictable even if you take that to account.

> It's strange to have such disregard for the meaning of words and their correct usage.

Nope. Not only you have shown absolutely zero sources at all to prove the deterministic nature of LLMs to where it can function as a “compiler”, you ultimately conceded by agreeing with the linked paper(s) recognising that LLMs still do not have deterministic or predictable properties at all; even if you tweak the temp, parameters, etc.

Therefore, once again LLMs are NOT compilers as even feeding them adversarial inputs can mess up the entire network up to become useless.

WithinReason

4 hours ago

from your first source:

> As a consequence, the model is no longer deterministic at the sequence-level, but only at the batch-level

therefore they are deterministic when the batch size is 1

Your second source lists a large number of ways how to make LLMs determnistic. The title of your third source is "Defeating Nondeterminism in LLM Inference" which also means that they can be made deterministic.

Every single one of your sources proves you wrong, so no more sources need to be cited.

fragmede

a day ago

In the comparison to compilers, it relevant to point out that work began on them in the 1950's. That they're basically solid by the time most people here used them, should be looked at with that time frame in mind. ChatGPT came out in 2022, 3-4 years ago. Compilers have had around three quarters of a century years to get where they are today. I'll probably be dead in seventy years, nevermind have any idea what AI (or society) is going to look like then!

But for reference, we don't (usually) care which register three compiler uses for which variable, we just care that it works, with no bugs. If the non-dertetminism of LLMs mean the variable is called file, filename, or fileName, file_name, and breaking with convention, why do we care? At the level Claude let's us work with code now, it's immaterial.

Compilation isn't stable. If you clear caches and recompile, you don't get a bit-for-bit exact copy, especially on today's multi-core processors, without doing extra work to get there.

SpicyLemonZest

a day ago

But the reason we don't care which register the compiler uses is that compilers, even without strict stability, reliably enforce abstractions that free us from having to care. If your compiler decided on 5% of inputs that it just doesn't feel like using more than two data registers, you'd have to think about it on 100% of inputs.

kittikitti

21 hours ago

"LlMs HAlLuCinATE"

Stop this. This is such a stupid way way of describing mistakes from AI. Please try to use the confusion matrix or any other way. If you're going to try and make arguments, it's hard to take them seriously if you keep regurgitating that LLM's hallucinate. It's not a well defined definition so if you continually make this your core argument, it becomes disingenuous.

dgxyz

21 hours ago

How about "expected poor ratio of corn to shit".?

jtrn

21 hours ago

That was a painfull read for me. It reminds me of a specific annoyance I had at university with a professor who loved to make sweeping, abstract claims that sounded incredibly profound in the lecture hall but evaporated the moment you tried to apply them. It was always a hidden 'I-am-very-smart' attempt that fell apart if you actually deconstructed the meaning, the logic, or the claimed results. This article is the exact same breed of intellectualizing. It feels deep, but there is no actual logical hold if you break up the claims and deductive steps.

You can see it clearly if you just translate the article's expensive vocabulary into plain English. When the author writes, 'When you hand-build, the space of possibilities is explored through design decisions you’re forced to confront,' they are just saying, 'When you write code yourself, you have to choose how to write it.' When they claim, 'contextuality is dominated by functional correctness,' they just mean, 'Usually, we just care if the code works.' When they warn about 'inviting us to outsource functional precision itself,' they really mean, 'LLMs let you be lazy.' And finaly, 'strengthening the will to specify,' is just a dramatic way of saying, 'We need to write better requirements.' It is obscurantism plain and simple. using complexity to hide the fact that the insight is trivial.

But that is just an estethical problem to me. Worse. The argument collapses entirely when you look at the logical leap between the premises.

The author basically argues that because Natural Language is vague, engineers will inevitably stop caring about the details and just accept whatever reasonable output the AI gives. This is pure armchair psychology. It assumes that just because the tool allows for vagueness, professionals will suddenly abandon the concept of truth or functional requirements. That is a massive, unsubstantiated jump.

If we use fuzzy matching to find contacts on our phones all the time. Just because the search algorithm is imprecise doesn't mean we stop caring if we call the right person. We don't say, 'Well, the fuzzy match gave me Bob instead of Bill, I guess I'll just talk to Bob now.' The hard constraint, the functional requirement of talking to the specific person you need, remains absolute. Similarly, in software, the code either compiles and passes the tests, or it doesn't. The medium of creation might be fuzzy, but the execution environment is binary. We aren't going to drift into accepting broken banking software just because the prompt was in English.

This entire essay feels like those social psychology types that now have been thoroughly been discredited by the replication crisis in psychology. The ones who are where concerned with dazzling people with verbal skills than with being right. It is unnecessarily complex, relying on projection of dreamt up concepts and behavior, rather than observation. THIS tries to sound profound by turning a technical discussion into a philosophical crisis, but underneath the word salad, it is not just shallow, it is wrong.