wenc
2 days ago
Once GPT is tuned more heavily on Lean (proof assistant) -- the way it is on Python -- I expect its usefulness for research level math to increase.
I work in a field related to operations research (OR), and ChatGPT 4o has ingested enough of the OR literature that it's able to spit out very useful Mixed Integer Programming (MIP) formulations for many "problem shapes". For instance, I can give it a logic problem like "I need to put m items in n buckets based on a score, but I want to fill each bucket sequentially" and it actually spits out a very usable math formulation. I usually just need to tweak it a bit. It also warns against weak formulations where the logic might fail, which is tremendously useful for avoiding pitfalls. Compare this to the old way, which was to rack my brain over a weekend to figure out a water-tight formulation of a MIP optimization problem (often not straightforward for non-intuitive problems). GPT has saved me so much time in this corner of my world.
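For readers outside OR, here is a hedged sketch (my own illustration, not the commenter's actual model) of what such a "fill buckets sequentially" MIP formulation can look like, with binary x_{ij} assigning item i to bucket j and binary y_j marking bucket j as open:

```latex
% Illustrative sketch only: x_{ij} = 1 if item i goes in bucket j,
% y_j = 1 if bucket j is used at all.
\begin{align}
\sum_{j=1}^{n} x_{ij} &= 1      &&\forall i        &&\text{(each item lands in exactly one bucket)} \\
x_{ij} &\le y_j                 &&\forall i, j     &&\text{(items only go in open buckets)} \\
y_j &\le y_{j-1}                &&\forall j \ge 2  &&\text{(bucket $j$ opens only after bucket $j-1$)} \\
x_{ij},\, y_j &\in \{0, 1\}
\end{align}
```

The third constraint is the "sequential" part: it forces the set of open buckets to be a prefix, which is exactly the kind of logical condition that is easy to state in English and easy to get wrong in a weak formulation.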
Yes, you probably wouldn't be able to use ChatGPT well for this purpose unless you understood MIP optimization in the first place -- and you do need to break down the problem into smaller chunks so GPT can reason in steps -- but for someone who can and does, the $20/month I pay for ChatGPT more than pays for itself.
side: a lot of people who complain on HN that paid/good LLMs (only Sonnet 3.5 and GPT-4o are in this category) are useless to them probably (1) do not know how to use LLMs in a way that maximizes their strengths; (2) have expectations that are too high based on the hype, expecting one-shot magic bullets; or (3) work in a domain LLMs are really not good at. But many of the low-effort comments seem to mostly fall into (1) and (2) -- cynicism rather than cautious optimism.
Many of us who have discovered how to exploit LLMs in their areas of strength -- and know how to check for their mistakes -- often find them providing significant leverage in our work.
WhatIsDukkha
2 days ago
I entirely agree about their utility.
HN, and the internet in general, have become just an ocean of reactionary sandbagging and blather about how "useless" LLMs are.
Meanwhile, in the real world, I've found that I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
It's entirely a learned skill, the models (and very importantly the tooling around them) have arrived at the base line they needed.
A much, much more productive world, just by knuckling down and learning how to do the work.
edit: https://aider.chat/ + paid 3.5 sonnet
skydhash
2 days ago
> A much, much more productive world, just by knuckling down and learning how to do the work.
The fact is, everyone who says they've become more productive with LLMs won't say how exactly. I can talk about how Vim has made it more enjoyable to edit code (keybindings and motions), how Emacs is a good environment for text tooling (a Lisp machine), how I use technical books to further my learning (so many great books out there). But no one really shows how they're actually solving problems with LLMs and how the alternatives were worse for them. It's all claims that it's great, with no further elaboration on the workflows.
> I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
Code is intent described in terms of machine actions. Those actions can be masked by abstracting them into more understandable units, so that we don't have to write opcodes and can use Python instead. Programming is basically making the intent clear enough that we know which units we can use. Software engineering is mostly selecting the units in a way that requires minimal rework once the intent changes, or the foundational actions do.
Chatting with an LLM looks to me like your intent is either vague or you don't know which units to use. If it's the former, then I guess you're assuming it is the expert and will guide you to the solution you seek, which means you believe it understands the problem better than you do. The latter is stranger, as it looks like playing around with car parts while ignoring the manuals they come with.
What about boilerplate and common scenarios? I agree that LLMs help a great deal there, but there are already perfectly good tools for that, like snippets, templates, and code generators.
Nadya
2 days ago
Ever seen someone try to search for something on Google when they are just AWFUL at it? They can never find what they're looking for, and then you try and can pull it up in a single search? That's what it's like watching some people try to use LLMs. Learning how to prompt an LLM is as much a learned skill as learning how to phrase internet searches is a learned skill. And as much as people decried that "searching Google isn't a real skill", tech-savvy people knew better.
Same thing, except now many tech-savvy people are joining the tech-unsavvy in saying that prompting isn't a real skill... but people who know better know that it is.
On average, people are awfully bad at describing exactly what they want. Ever speak with a client? And you have to go back and forth for a few hours to finally figure out what they wanted? In that scenario, you're the LLM. Except the LLM won't keep asking probing questions and clarifications; it will simply give them what they originally asked for (which isn't what they want). Then they think the LLM is stupid and stop trying to ask it for things.
Utilizing an LLM to its full potential is a lot of iterative work and, at least for the time being, requires having some understanding of how it works underneath the hood (eg. would you get better results by starting a new session or asking it to forget previous, poorly worded instructions?).
skydhash
2 days ago
I'm not arguing that you can't get results with LLMs; I'm just asking whether it's worth the actual effort, especially when there's a better way to get the result you're seeking (or whether that result is really something you want).
An LLM is a word (token?) generator which can be amazingly consistent according to its model. But rarely is my end goal to generate text. It's either to do something, to understand something, or to communicate. For the first, there are guides (books, manuals, ...), for the second, there are explanations (again books, manuals,...), and the third is just using language to communicate what's on my mind.
That's the same thing with search engines. I use them to look for something. What I need first is a description of that something, not how to do the "looking for". Then once you know what you want to find, it's easier to use the tool to find it.
If your end goal can be achieved with LLMs, be my guest and use them. But I'm wary of people taking them at face value and then pushing the workload onto everyone else (like developers using Electron).
Nadya
2 days ago
It's hard to quantify how much time learning how to search saves because the difference can range between infinite (finding the result vs not finding it at all) to basically no difference (1st result vs 2nd result). I think many people agree it is worth learning how to "properly search" though. You spend much less time searching and you get the results you're looking for much more often. This applies outside of just Google search: learning how to find and lookup information is a useful skill in and of itself.
ChatGPT has helped me write some scripts for things that otherwise probably would have taken me at least 30+ minutes; it wrote them in <10 seconds and they worked flawlessly. I've also had times where I worked with it for 45 minutes and only ever got error-ridden code, where I had to fix the obvious errors and rewrite parts myself to get it working. Sometimes during this process it has actually taught me a new approach to doing something. If I had started coding it from scratch myself, it probably would have taken me only ~10 minutes. But if I were better at prompting, what if that 45 minutes was <10 minutes? It would go from a time loss to a time save and be worth using. So improving my ability to prompt is worthwhile as long as doing so trends toward me spending less time prompting.
Which is thankfully pretty easy to track and test. On average, as I get better at prompting, do I need to spend more or less time prompting to get the results I am looking for? The answer to that is largely that I spend less time and get better results. The models constantly changing and improving over time can make this messy - is it the model getting better or is it my prompting? But I don't think models change significantly enough to rule out that I spend less time prompting than I have in the past.
panarky
2 days ago
> how much time learning how to search saves
>>> you do need to break down the problem into smaller chunks so GPT can reason in steps
To search well, you need good intuition for how to select the right search terms.
To LLM well, you can ask the LLM to break the problem into smaller chunks, and then have the LLM solve each chunk, and then have the LLM check its work for errors and inconsistencies.
And then you can have the LLM write you a program to orchestrate all of those steps.
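A minimal sketch of that decompose/solve/check loop, assuming a hypothetical `ask_llm` helper in place of a real chat-completion API (here it returns canned strings so the control flow is runnable):

```python
# Sketch of the decompose -> solve -> verify loop described above.
# `ask_llm` is a hypothetical stand-in for a real LLM API call; it
# returns canned text purely so the orchestration shape is visible.

def ask_llm(prompt: str) -> str:
    canned = {
        "plan": "1. parse input\n2. compute result\n3. format output",
        "check": "OK",
    }
    for key, value in canned.items():
        if key in prompt.lower():
            return value
    return "step done"

def orchestrate(task: str) -> list[str]:
    # 1. Ask the model to break the problem into smaller chunks.
    plan = ask_llm(f"Break this task into numbered steps. Plan: {task}")
    steps = [line for line in plan.splitlines() if line.strip()]
    results = []
    for step in steps:
        # 2. Have the model solve each chunk...
        answer = ask_llm(f"Solve this step: {step}")
        # 3. ...and then check its own work for errors.
        verdict = ask_llm(f"Check this answer for errors: {answer}")
        results.append(f"{step} -> {answer} [{verdict}]")
    return results
```

With a real API behind `ask_llm`, the same three-prompt shape (plan, solve, verify) is the whole orchestration program the comment describes.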
gtirloni
a day ago
Yes you can. What was the name of the agent that was going to replace all developers? Devin or something? It was shown that it took more time to iterate over a problem and created terrible solutions.
LLMs are in the evolutionary phase, IMHO. I doubt we're going to see revolutionary improvements from GPTs. So I say time and time again: the technology is here, show it doing all the marvelous things today. (btw, this is not directed at your comment in particular and I digressed a bit, sorry).
smallnamespace
2 days ago
> asking is it worth the actual effort
If prompting ability varies then this is not some objective question, it depends on each person.
For me I've found more or less every interaction with an LLM to be useful. The only reason I'm not using it continually for 8 hours a day is because my brain is not able to usefully manage that torrent of new information and I need downtime.
rvnx
2 days ago
It works quite nicely if you consider LLMs as a translator (and that’s actually why Transformers were created).
Enter technical specifications in English as input language, get code as destination language.
lolc
2 days ago
English as an input language works in simple scenarios but breaks down very quickly. I have to get extremely specific and deliberate. At some point I have to write pseudocode to get the machine to get, say, double-checked locking right. Because I have enough experience of varying the prompting and it still not working, I revert to just writing the code when I see the generator struggling.
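For reference, here is double-checked locking, the example named above, sketched in Python (in CPython the GIL makes the pattern less treacherous than in Java or C++, but the shape, including the second check that generators tend to botch, is the same):

```python
import threading

# Double-checked locking for lazy one-time initialization.

class LazyResource:
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get(cls):
        # First check, without the lock: cheap fast path once initialized.
        if cls._instance is None:
            with cls._lock:
                # Second check, under the lock: another thread may have
                # initialized the instance while we were waiting.
                if cls._instance is None:
                    cls._instance = cls._create()
        return cls._instance

    @staticmethod
    def _create():
        return object()  # stand-in for an expensive construction
```

Dropping the inner check is the classic mistake: two threads can both pass the outer check and construct the resource twice.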
When I encounter somebody who says they do not write code anymore, I assume that they either:
1. Just don't do anything beyond the simplest tutorial-level stuff
2. or don't consider their post-generation edits as writing code
3. or are just bullshitting
I don't know which it is for each person in question, but I don't trust that their story would work for me. I don't believe they have some secret-sauce prompting that works for scenarios where I've tried to make it work but couldn't. Sure, I may have missed some ways, and my map of what works and what doesn't may be very blurry at the border, but the surprises tend to be on the "doesn't work" side. And no, Claude doesn't change this.
ben_w
a day ago
I definitely still write code. But I also prefer to break down problems into chunks small enough that an LLM could probably do them natively, if only you could convince it to use the real API instead of inventing a new API each time. A concrete example from ChatGPT-3.5: I tried getting it to make and then use a Vector2D class; in one place it had sub(), mul(), etc., and in the other it had subtract(), multiply(), etc.
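A minimal sketch of the kind of pinned interface that prevents the drift described (my own illustration; nothing is assumed about the original class beyond its name and the sub/mul naming):

```python
from dataclasses import dataclass

# A Vector2D with one fixed naming convention (add/sub/mul). The anecdote's
# failure mode -- sub() in one place, subtract() in another -- is exactly
# what pinning a small interface like this up front avoids.

@dataclass(frozen=True)
class Vector2D:
    x: float
    y: float

    def add(self, other: "Vector2D") -> "Vector2D":
        return Vector2D(self.x + other.x, self.y + other.y)

    def sub(self, other: "Vector2D") -> "Vector2D":
        return Vector2D(self.x - other.x, self.y - other.y)

    def mul(self, k: float) -> "Vector2D":
        return Vector2D(self.x * k, self.y * k)
```

Pasting a definition like this into the prompt, rather than asking the model to invent the class, is the practical fix: the model then has one concrete API to call instead of two imagined ones.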
It can write unit tests, but it makes similar mistakes there, so I have to rewrite them… it nevertheless still makes writing those tests easier.
It writes good first-drafts for documentation, too. I have to change it, delete some stuff that's excess verbiage, but it's better than the default of "nobody has time for documentation".
andrepd
a day ago
Exactly! What is this job that you can get where you don't code and just copy-paste from ChatGPT? I want it!
My experience is just as you describe: I ask a question whose answer is on Stack Overflow or fucking GeeksforGeeks? Then it produces a good answer. Anything more is an exercise in frustration as it tries to sneak nonsense code past me with the same confident spiel with which it produces correct code.
ben_w
a day ago
It's absolutely a translator, but LLMs are similarly good/bad/weird/hallucinatory at natural-language translation, too.
Consider this round-trip in Google Translate:
"དེ་ནི་སྐད་སྒྱུར་པ་ཞིག་ཡིན། འོན་ཀྱང་ཁོང་ཚོ་རང་བྱུང་སྐད་སྒྱུར་གྱི་སྐད་སྒྱུར་ནང་ལ་ཡག་པོ/ངན་པ/ཁྱད་མཚར་པོ/མགོ་སྐོར་གཏོང་བ་འདྲ་པོ་ཡོད།"
"It's a translator. But they seem to be good/bad/weird/delusional in natural translations. I have a"
(Google translate stopped suddenly, there).
I've tried using ChatGPT to translate two Wikipedia pages from German to English, as it can keep citations and formatting correct when it does so; it was fine for the first 2/3rds, then it made up mostly-plausible statements that were not translated from the original for the rest. (Which I spotted and fixed before saving, because I was expecting some failure).
Don't get me wrong, I find them impressive, but I think the problem here is the Peter Principle: the models are often being promoted beyond their competence. People listen to that promotion and expect them to do far more than they actually can, and are therefore naturally disappointed by the reality.
People like me, who remember being thrilled to receive a text adventure cassette tape for the Commodore 64 as a birthday or Christmas gift when we were kids…
…compared to that, even the Davinci model (that really was autocomplete) was borderline miraculous, and ChatGPT-3.5 was basically the TNG-era Star Trek computer.
But anyone who reads me saying that last part without considering my context, will likely imagine I mean more capabilities than I actually mean.
ben_w
2 days ago
> On average, people are awfully bad at describing exactly what it is they want. Ever speak with a client? And you have to go back and forward for a few hours to finally figure out what it is they wanted?
With one client, that back-and-forth lasted the entire duration of my working for them.
They didn't understand why it was taking so long despite constantly changing what they asked for.
codr7
2 days ago
Building the software is usually like 10% of the actual job, we could do a better job of teaching that.
The other 90% is mostly mushy human stuff, fleshing out the problem, setting expectations etc. Helping a group of people reach a solution everyone is happy with has little to do with technology.
ben_w
2 days ago
Mostly agree. Until ChatGPT, I'd have agreed with all of that.
> Helping a group of people reach a solution everyone is happy with has little to do with technology.
This one specific thing, is actually something that ChatGPT can help with.
It's not as good as the best human, or even a middling human with 5 years' business experience, but rather it's useful because it's good enough at so many different domains that it can be used to clarify thoughts and explain the boundaries of the possible — Google Translate for business jargon, though like Google Translate it is also still often wrong — the ultimate "jack of all trades, master of none".
codr7
a day ago
We're currently in the shiny toy stage, once the flaws are thoroughly explored and accepted by all as fundamental I suspect interest will fade rapidly.
There's no substance to be found, no added information; it's just repeating what came before, badly, which is exactly the kind of software that would be better off not written if you ask me.
The plan to rebuild society on top of this crap is right up there with basing our economy on manipulating people into buying shit they don't need and won't last so they have to keep buying more. Because money.
threecheese
a day ago
The worry I have is that the net value will become great enough that we’ll simply ignore the flaws, and probabilistic good-enough tools will become the new normal. Consider how many ads the average person wades through to scroll an Insta feed for hours - “we’ve” accepted a degraded experience in order to access some new technology that benefits us in some way. To paraphrase comedian Mark Normand: “Capitalism!”
codr7
a day ago
Scary thought, difficult to unthink.
I'm afraid you might be right.
We've accepted a lot of crap lately just to get what we think we want, convenience is a killer.
ben_w
a day ago
Indeed, even if I were to minimise what LLMs can do, they are still achieving what "targeted advertising" very obviously isn't.
codr7
a day ago
They're both short sighted attempts at extracting profit while ignoring all negative consequences.
ben_w
18 hours ago
To an extent I agree; I think that's true of all tech since the plough, fire, and the axle.
But I would otherwise say that most (though not all*) AI researchers seem to be deeply concerned about the set of all potential negative consequences, including mutually incompatible outcomes where we don't know which one we're even heading towards yet.
* And not just Yann LeCun — though, given his position, it would still be pretty bad even if it was just him dismissing the possibility of anything going wrong
SmartHypercube
20 hours ago
> That's what it's like watching some people try to use LLMs.
Exactly. I made a game testing prompting skills a few days ago to share with some close friends, and it was your comment that inspired me to translate the game into English and submit it to HN. ( https://news.ycombinator.com/item?id=41545541 )
I am really curious about how other people write prompts, so while my submission only got 7 points, I'm happy that I can see hundreds of people's own ways to write prompts thanks to HN.
However, after reading most of the prompts (I may have missed some), I found exactly 0 prompts containing any kind of common prompting technique, such as "think step by step", explaining specific steps to solve the problem instead of only asking for the final result, or few-shot prompting (showing example inputs and outputs). Half of the prompts simply ask the AI to do the thing (at least asking correctly). The other half do not make sense; even if we showed the prompt to a real human, they wouldn't know what to reply with.
Well... I expected that SOME complaints about AI online are from people not familiar with prompting / not good at prompting. But now I realize there are a lot more people than I thought who don't know even basic prompting techniques.
Anyway, a fun experience for me! Since it was your comment made me want to do this, I just want to share it with you.
pbrowne011
2 days ago
> But no one really shows how they're actually solving problems with LLMs and how the alternatives were worse for them. It's all claims that it's great with no further elaboration on the workflows.
To give an example, one person (a researcher at DeepMind) recently wrote about specific instances of his uses of LLMs, with anecdotes about the alternatives for each example. [1] People on HN responded with similar claims, elaborating on how it has changed some of their workflows. [2]
While it would be interesting to see randomized controlled trials on LLM usage, hearing people's anecdotes brings to mind the (often misquoted) phrase: "The plural of anecdote is data". [3] [4]
[1] https://nicholas.carlini.com/writing/2024/how-i-use-ai.html
[2] https://news.ycombinator.com/item?id=41150317
[3] http://blog.danwin.com/don-t-forget-the-plural-of-anecdote-i...
[4] originally misquoted as "Anecdote is the plural of data."
kristianp
2 days ago
> (often misquoted) phrase
You misquoted it there! It should be: The plural of anecdote is data.
pbrowne011
2 days ago
Thank you! Another instance of a variant of Muphry's Law.
stavros
2 days ago
It's actually "the plural of 'anecdote' is not 'data'".
kristianp
2 days ago
In the CUDA example [1] from Carlini's "How I Use AI", I would guess that o1 would need less handholding to do what he wanted.
[1] https://chatgpt.com/share/1ead532d-3bd5-47c2-897c-2d77a38964...
sweeter
2 days ago
Or people say "I've been pumping out thousands of lines of perfectly good code by writing paragraphs and paragraphs of text explaining what I want!" It's like, what are you programming, dog? And they will never tell you, and then you look at their GitHub and it's a dead simple starter project.
I recently built a Brainfuck compiler and TUI debugger, and I tested out a few LLMs just to see if I could get some useful output regarding a few niche and complicated issues, and they just gave me garbage that looked mildly correct. Then I'm told it's because I'm not prompting hard enough... I'd rather just learn how to do it at that point. Once I solve that problem, I can solve it again in the future in 0.25x the time.
rtsil
2 days ago
Here's the thing. 99% of people aren't writing compilers or debuggers, they're writing glorified CRUDs. LLM can save a lot of time for these people, just like 99% of people only use basic arithmetic operations, and MS Excel saves a lot of time for these people. It's not about solving new problems, it's about solving old and known problems very fast.
csmpltn
a day ago
> "99% of people aren't writing compilers or debuggers"
Look, I get the hype - but I think you need to step outside a bit before saying that 99% of the software out there is glorified CRUDs...
Think about the aerospace/defense industries, autonomous vehicles, cloud computing, robotics, sophisticated mobile applications, productivity suites, UX, gaming and entertainment, banking and payment solutions, etc. Those are not small industries - and the software being built there is often highly domain-specific, has various scaling challenges, and takes years to build and qualify for "production".
Even a simple "glorified CRUD", at a certain point, will require optimizations, monitoring, logging, debugging, refactoring, security upgrades, maintenance, etc...
There's much more to tech than your weekend project "Facebook but for dogs" success story, which you built with ChatGPT in 5 minutes...
williamcotton
2 days ago
This is almost entirely written by LLMs:
https://github.com/williamcotton/guish
I was the driver. I told it to parse and operate on the AST, to use a plugin pattern to reduce coupling, etc. The machine did the tippy-taps for me and at a much faster rate than I could ever dream of typing!
It’s all in a Claude Project and can easily and reliably create new modules for bash commands because it has the full scope of the system in context and a ginormous amount of bash commands and TypeScript in the training corpus.
KHRZ
2 days ago
One good use case is unit tests, since they can be trivial while at the same time being cumbersome to make. I could give the LLM code for React components, and it would make the tests and setup all the mocks which is the most annoying part. Although making "all the tests" will typically involve asking the LLM again to think of more edge cases and be sure to cover everything.
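The comment is about React, but the shape of such a test (stub the I/O boundary, assert on behavior) is the same everywhere; a minimal sketch in Python, with a hypothetical `fetch_greeting` function and a mocked client standing in for the component and its network mocks:

```python
import unittest
from unittest import mock

# Hypothetical unit under test: a function that hits an API client
# and transforms the response.
def fetch_greeting(client) -> str:
    response = client.get("/greeting")
    return response["message"].upper()

class FetchGreetingTest(unittest.TestCase):
    def test_uppercases_message(self):
        # The mock replaces the real network call -- the tedious setup
        # work the comment describes handing off to an LLM.
        client = mock.Mock()
        client.get.return_value = {"message": "hello"}
        self.assertEqual(fetch_greeting(client), "HELLO")
        client.get.assert_called_once_with("/greeting")
```

Wiring up this boilerplate (the mock, the canned return value, the call assertion) is mechanical, which is exactly why it delegates well; deciding which edge cases deserve a test still takes the follow-up prompting the comment mentions.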
remoroid
2 days ago
Really? Come on. You think trying to make it solve "niche and complicated issues" for a Brainfuck compiler is reasonable? I can't take this seriously. Do you know what most developer jobs entail?
I never need to type paragraphs to get the output I want. I don't even bother with correct grammar or spelling. If I need code for some CRUD web app, who is going to type it faster, me or the LLM? This is really not hard to understand.
itsoktocry
a day ago
For many of us programming is a means to an end. I couldn't care less about compilers.
hcks
a day ago
> I recently built a Brainfuck compiler and TUI debugger
Highly representative of what devs make all day indeed
sweeter
15 hours ago
Yeah, obviously not, but the smaller problems this bigger project was composed of are things you could see anywhere. I made heavy use of string manipulation that could be applied to basically anything.
px1999
2 days ago
Specifically within the last week, I have used Claude and Claude via cursor to:
- write some moderately complex powershell to perform a one-off process
- add typescript annotations to a random file in my org's codebase
- land a minor feature quickly in another codebase
- suggest libraries and write sample(ish) code to see what their rough use would look like to help choose between them for a future feature design
- provide text to fill out an extensive sales RFT spreadsheet based on notes and some RAG
- generate some very domain-specific, realistic-sounding test data (just naming)
- scaffold out some PowerPoint slides for a training session
There are likely others (LLMs have helped with research and in my personal life too)
All of these are things that I could do (and probably do better) but I have a young baby at the moment and the situation means that my focus windows are small and I'm time poor. With this workflow I'm achieving more than I was when I had fully uninterrupted time.
ben_w
2 days ago
> But no one really shows how they're actually solving problems with LLMs and how the alternatives were worse for them.
I'm an iOS dev, my knowledge of JS and CSS is circa 2004. I've used ChatGPT to convert some of my circa 2009 Java games into browser games.
> Chatting with a LLM look to me like your intent is either vague or you don't know the units to use
Or that you're moving up the management track.
Managers don't write code either. Some prefer it that way.
Workaccount2
2 days ago
I have used ChatGPT to write test systems for our (physical) products. I have a pretty decent understanding of how code/programs work structurally; I just don't know the syntax/language (Python in this case).
So I can translate things like
"Create an array, then query this instrument for xyz measurements, then store those measurements in the array. Then store that array in the .csv file we created before"
It works fantastically and has saved us from outsourcing.
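The quoted instruction maps to only a few lines of Python; a sketch with `read_measurement` as a hypothetical stand-in for the real instrument query (which would typically go over VISA or serial):

```python
import csv
import random

# Hypothetical instrument query -- a real version would talk to the
# hardware (e.g. via pyvisa or pyserial); a fake reading stands in here.
def read_measurement() -> float:
    return random.uniform(0.0, 5.0)

def collect_and_store(path: str, n_samples: int) -> list[float]:
    # Collect readings into a list, then append them as one row
    # to the CSV file -- the workflow quoted above.
    measurements = [read_measurement() for _ in range(n_samples)]
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(measurements)
    return measurements
```

Opening the file in append mode matches "the .csv file we created before": each test run adds a row rather than overwriting earlier data.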
joseluis
2 days ago
The key difference is that this is a multidisciplinary conversational interface, and a tool in itself for interrelating structured meaning and reshaping it coherently enough so that it can be of great value both in the specific domain of the dialog, and in the potential to take it on any tangent in any way that can be expressed.
Of course it has limitations and you can't be asleep at the wheel, but that's true of any tool or task.
DiogenesKynikos
2 days ago
For one, I spend less time on Stackoverflow. LLMs can usually give you the answer to little questions about programming or command-line utilities right away.
lifeformed
2 days ago
I think people who are successfully using it to write code are just chaining APIs together to make the same web apps you see everywhere.
imiric
2 days ago
The vast majority of software is "just chaining APIs together". It makes sense that LLMs would excel at code they've been trained on the most, which means they can be useful to a lot of people. This also means that these people will be the first to be made redundant by LLMs, once the quality improves enough.
elbear
2 days ago
I would say all software is chaining APIs together.
imiric
2 days ago
Well, that depends on how you look at it.
All software calls APIs, but some rely on literally "just chaining" these calls together more than writing custom behavior from scratch. After all, someone needs to write the APIs to begin with. That's not to say that these projects aren't useful or valuable, but there's a clear difference in the skill required for either.
You could argue that it's all APIs down to the hardware level, but that's not a helpful perspective in this discussion.
yuppiemephisto
2 days ago
> The fact is, everyone who says they've become more productive with LLMs won't say how exactly. But no one really shows how they're actually solving problems with LLMs and how the alternatives were worse for them.
A pretty literal response: https://www.youtube.com/@TheRevAlokSingh/streams
Plenty of Lean 4 and Cursor.
fragmede
2 days ago
Here's one from simonw
https://gist.github.com/simonw/97e29b86540fcc627da4984daf5b7...
There are more to be found on his blog on the ai-assisted-programming tag. https://simonwillison.net/tags/ai-assisted-programming/
sumedh
2 days ago
> The fact is, everyone who says they've become more productive with LLMs won't say how exactly.
I have Python scripts which do a lot of automation: downloading PDFs, bookmarking PDFs, processing them, etc. Thanks to LLMs I don't write the Python code myself; I just ask an LLM to write it and provide the requirements. I copy the code generated by the model and run it. If there are any errors, I just ask the AI to fix them.
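One small automation of the kind described, sketched stdlib-only with a hypothetical folder layout: sweep a downloads directory and file each PDF into a subfolder named after its modification date (the bookmarking/processing steps would need a PDF library such as pypdf and are omitted):

```python
import datetime
import shutil
from pathlib import Path

# Illustrative sketch: move every *.pdf in `downloads` into a
# subfolder named after the file's modification date (YYYY-MM-DD).
def organize_pdfs(downloads: Path) -> list[Path]:
    moved = []
    # Materialize the glob before creating subfolders inside `downloads`.
    for pdf in sorted(downloads.glob("*.pdf")):
        day = datetime.date.fromtimestamp(pdf.stat().st_mtime).isoformat()
        target_dir = downloads / day
        target_dir.mkdir(exist_ok=True)
        target = target_dir / pdf.name
        shutil.move(str(pdf), target)
        moved.append(target)
    return moved
```

Scripts of this shape (walk a folder, decide, move/rename) are the bread and butter of the workflow described: easy to specify in one sentence, tedious to type, and trivially verifiable by looking at the resulting folders.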
NetOpWibby
2 days ago
> The fact is, everyone who says they've become more productive with LLMs won't say how exactly.
Anecdotally, I no longer use StackOverflow. I don’t have to deal with random downvotes and feeling stupid because some expert with a 10k+ score on 15 SE sites each votes my question to be closed. I’m pretty tech savvy, been doing development for 15 years, but I’m always learning new things.
I can describe a rough idea of what I want to an LLM and get just enough code for me to hit the ground running…or, I can ask a question in forum and twiddle my thumbs and look through 50 tabs to hopefully stumble upon a solution in the meantime.
I’m productive af now. I was paying for ChatGPT but Claude has been my goto for the past few months.
Kiro
2 days ago
You clearly have made up your mind that it can't be right but to me it's like arguing against breathing. There are no uncertainties or misunderstandings here. The productivity gains are real and the code produced is more robust. Not in theory, but in practice. This is a fact for me and you trying to convince me otherwise is just silly when I have the result right in front of me. It's also not just boilerplate. It's all code.
mhuffman
2 days ago
>There are no uncertainties or misunderstandings here. The productivity gains are real and the code produced is more robust. Not in theory, but in practice.
So, that may be a fact for you but there are mixed results when you go out wide. For example [1] has this little nugget:
>The study identifies a disconnect between the high expectations of managers and the actual experiences of employees using AI.
>Despite 96% of C-suite executives expecting AI to boost productivity, the study reveals that, 77% of employees using AI say it has added to their workload and created challenges in achieving the expected productivity gains. Not only is AI increasing the workloads of full-time employees, it’s hampering productivity and contributing to employee burnout.
So not everyone is feeling the jump in productivity the same way. On this very site, there are people claiming they are blasting out highly-complex applications faster than they ever could, some of them also claiming they don't even have any experience programming. Then others claiming that LLMs and AI copilots just slow them down and cause much more trouble than they are worth.
It seems like just with programming itself, that different people are getting different results.
[1]https://www.forbes.com/sites/bryanrobinson/2024/07/23/employ...
Workaccount2
2 days ago
Just be mindful that it is in one's interest to push the "LLMs suck, don't waste your time with them" narrative once they've figured out how to harness LLMs.
"Jason is a strong coder, and he despises AI tools!"
cjbgkagh
2 days ago
In my view these models produce above average code which is good enough for most jobs. But the hacker news sampling could be biased towards the top tier of coders - so their personal account of it not being good enough can also be true. For me the quality isn't anywhere close to good enough for my purposes, all of my easy code is already done so I'm only left working on gnarly niche stuff which the LLMs are not yet helpful with.
As for the effect on the industry, I generally make the point that even if AI only replaces the below-average coder, it will put downward pressure on above-average coders' compensation expectations.
Personally, I think humans appear to be getting dumber at the same time that AI is getting smarter, and while, for now, the crossover point is at a low threshold, that threshold will of course increase over time. I used to try to teach ontologies, stats, and SMT solvers to humans before giving up and switching to AI technologies, where success is not predicated on human understanding. I used to think that the inability of most humans to understand these topics was a matter of motivation, but I have rather recently come to understand that these limitations are generally innate.
rvnx
2 days ago
It is also a problem of ego.
It is difficult if you have been told all your life that you are the best, to accept the fact that a computer or even other people might be better than you.
It requires a lot of self-reflection.
Real top-tier programmers actually don't feel threatened by LLMs. For them it is just one more tool in the toolbox, like syntax highlighting or code completion.
They choose to use these tools based on productivity gains or losses, depending on the situation.
hatefulmoron
2 days ago
Not to diminish your point at all: I think it's also just a fear that the fun or interesting part of the task is being diminished. To say that the point of programming is to solve real world problems ('productivity') is true, but in my experience it's not necessarily true for the person doing the solving. Many people who work as programmers like to program (as in, the process of working with code, typing it, debugging it, building up solutions from scratch), and their job is an avenue to exercise that part of their brain.
Telling that sort of person that they're going to be more productive by skipping all the "time consuming programming stuff" is bound to hurt.
elbear
2 days ago
The solution to this is to code your own things for fun.
p-e-w
2 days ago
> Real top-tier programmers actually don't feel threatened by LLMs.
They should, because LLMs are coming for them also, just maybe 2-3 years later than for programmers that aren't "real top-tier".
The idea that human intellect is something especially difficult to replicate is just delusional. There is no reason to assume so, considering that we have gone from punched-card programming to LLMs competing with humans within a single human lifetime.
I still remember when elite chessplayers were boasting "sure, chess computers may beat amateurs, but they will never beat a human grandmaster". That was just a few short years before the Deep Blue match.
The difference is that nobody will pay programmers to keep programming once LLMs outperform them. Programmers will simply become as obsolete as horse-drawn carriages, essentially overnight.
kaoD
2 days ago
> They should, because LLMs are coming for them also, just maybe 2-3 years later than for programmers that aren't "real top-tier".
Would you be willing to set a deadline (not fuzzy dates) when my job is going to be taken by an LLM and bet $5k on that?
Because the more I use LLMs and I see their improvement rate, the less worried I am about my job.
The only thing that worries me is salaries going down because management cannot tell how bad they're burying themselves into technical debt and maintenance hell, so they'll underpay a bunch of LLM-powered interns... which I will have to clean up and honestly I don't want to (I've already been cleaning enough shit non-LLM code, LLMs will just generate more and more of that).
smallnamespace
a day ago
> Would you be willing to set a deadline (not fuzzy dates) when my job is going to be taken by an LLM and bet $5k on that?
This is just a political question and of course so long as humans are involved in politics they can just decide to ban or delay new technologies, or limit their deployment.
Also in practice it's not like people stopped traditional pre-industrial production after industrialization occurred. It's just that pre-industrial societies fell further and further behind and ended up very poor compared to societies that chose to adopt the newest means of production.
I mean, even today, you can make a living growing and eating your own crops in large swathes of the world. However you'll be objectively poor, making only the equivalent of a few dollars a day.
In short I'm willing to bet money that you'll always be able to have your current job, somewhere in the world. Whether your job maintains its relative income and whether you'd still find it attractive is a whole different question.
airspresso
2 days ago
> The difference is that nobody will pay programmers to keep programming once LLMs outperform them. Programmers will simply become as obsolete as horse-drawn carriages, essentially overnight.
I don't buy this. A big part of the programmer's job is to convert vague and poorly described business requirements into something that is actually possible to implement in code and that roughly solves the business need. LLMs don't solve that part at all since it requires back and forth with business stakeholders to clarify what they want and educate them on how software can help. Sure, when the requirements are finally clear enough, LLMs can make a solution. But then the tasks of testing it, building, deploying and maintaining it remain too, which also typically fall to the programmer. LLMs are useful tools in each stage of the process and speed up tasks, but not replacing the human that designs and architects the solution (the programmer).
concordDance
2 days ago
> > Real top-tier programmers actually don't feel threatened by LLMs.
> They should, because LLMs are coming for them also, just maybe 2-3 years later than for programmers that aren't "real top-tier".
Not worrying about that because if they've gotten to that point (note: top tier programmers also need domain knowledge) then we're all dead a few years later.
bcoates
2 days ago
Re: Compensation expectations, I figured out a long time ago that bad programmers create bad code, and bad code creates work for good programmers.
If the amount of bad code is no longer limited by the availability of workers who can be trained up to "just below average" and instead anyone who knows how to work a touchscreen can make AI slop, this opens up a big economic opportunity.
cjbgkagh
2 days ago
One could hope, but in my view perception precedes reality, and even if that is the reality, the perception is that AI will lower compensation demands, and those doing the layoffs/hiring will act accordingly.
You could also make the same claims about outsourcing, and while it appears that in most cases the outsourcing doesn't pay off, the perception that it would has really damaged CS as a career.
tzs
2 days ago
And like with outsourcing it starts with the jobs at the lower end of the skill range in an industry, and so people at the higher end don't worry about it, and later it expands and they learn that they too are not safe.
What happened a couple of decades ago in poetry [1] could happen now with programming:
> No longer is it just advertising jingles and limericks made in Haiti and Indonesia. It's quatrains, sonnets, and free-form verse being "outsourced" to India, the Philippines, Russia, and China.
...
> "Limericks are a small slice of the economy, and when people saw globalization creating instability there, a lot said, 'It's not my problem,'" says Karl Givens, an economist at Washington's Economic Policy Institute. "Now even those who work in iambic pentameter are feeling it."
acedTrex
2 days ago
Anything that makes fewer people get into programming is good for the field of CS. Only those who truly care go into it
delusional
2 days ago
What sort of problems do you solve? I tried to use it. I really did. I've been working on a tree edit distance implementation based on a paper from '95. Not novel stuff. I just can't get it to output anything coherent. The code rarely runs, it's written in absolutely terrible style, and it doesn't follow any good practices for performant code. I've struggled to get it to even implement the algorithm correctly, even though it's in the literature I'm sure it was trained on.
Even test cases have brought me no luck. The code was poorly written, being too complicated and dynamic for test code in the best case and just wrong on average. It constantly generated test cases that would be fine for other definitions of "tree edit distance" but were nonsense for my version of a "tree edit distance".
What are you doing where any of this actually works? I'm not some jaded angry internet person, but I'm honestly so flabbergasted about why I just can't get anything good out of this machine.
macrolime
2 days ago
This kind of problem is really not where LLMs shine.
Where you save loads of time is when you need to write lots of code against unfamiliar APIs. Especially when it's APIs you won't work with a lot, and spending loads of time learning them would just be a waste. In these cases LLMs can tell you the correct API calls, and it's easy to verify. The LLM isn't really solving some difficult technical problem, but it saves lots of work.
throwaway765123
2 days ago
This exactly. LLMs can't reason, so we shouldn't expect them to try. They can do translation extremely well, so things like converting descriptions to 90-95% correct code in 10-100x less time, or converting from one language to another, are the killer use cases IMO.
But expecting them to solve difficult unsolved problems is a fundamental misunderstanding of what they are under the hood.
delusional
2 days ago
I picked this problem specifically because it's about "converting from one language to another". The problem is already solved in the literature. I understand that doing cutting-edge research is a different problem, and that is explicitly not what I'm doing here, nor what I am expecting of the tool. I have coauthored an actual published computer science paper, and this exercise is VERY far from the complexity of that.
Could you share some concrete experience of a problem where aider, or a tool like it, helped you? What was your workflow, and how was the experience?
kaoD
2 days ago
I'm a senior engineer (as in, really senior, not only years of experience). I can get familiar with unfamiliar APIs in a few hours and then I can be sure I'm doing the right thing, instead of silently failing to meet edge cases and introducing bugs because I couldn't identify what was wrong in the LLM output (because, well, I'm unfamiliar with the API in the first place).
In other words: LLMs don't solve any noteworthy problems, at least yet.
delusional
2 hours ago
I feel sort of the same way but I'm desperate to understand what I'm missing. So many people sing such high praises. Billions are being invested. People are proclaiming the end of software developers. What I'm looking at can't be the product they are talking about.
I'm perfectly happy reading man pages personally. Half the fun of programming to me is mastering the API to get something out of it nobody expected was in there. To study the documentation (or implementation) to identify every little side effect. The details are most of the fun to me.
I don't really intend to use the AI for myself, but I do really wish to see what they see.
Tainnor
2 days ago
Maybe for happy path cases. I've tried to ask ChatGPT how you can do a certain non-obvious thing with Kafka, and it just started inventing things. Turns out, that thing isn't actually possible to do with Kafka (by design).
thesz
2 days ago
I think that contemporary models are trained for engagement, not for actual help.
My experience is the same as yours, but I noticed that while LLMs circa two years ago tried to come up with the answer, the current generation of LLMs tries to make me come up with the answer. And that's not helping at all.
bongodongobob
2 days ago
Did you tell it that? Are you trying to converse and discuss or are you trying to one shot stuff? If it gets something wrong, tell it. Don't just stop and try another prompt. You have to think of it as another person. You can talk to it, question it, guide it.
Try starting from ground zero and guiding it to the solution rather than trying to one shot your entire solution in one go.
I want you to implement this kind of tree in language x.
Ok good, now I want you to modify it to do Y.
Etc.
delusional
2 days ago
I've tried both. One time I actually tried so hard that I ran out of context, and aider just dumped me back to the main prompt. I don't think it's possible to guide it any more than that.
My problem is that the solution is right there in the paper. I just have to understand it. Without first understanding that paper, I can't possibly guide the AI towards a reasonable implementation. The process of finding the implementation is exactly the understanding of the paper, and the AI just doesn't help me with that. In fact, all too often I would ask it to make some minor change, and it would start making random changes all over the file, completely destroying my mental model of how the program worked. Making it change that back completely pulls me out of the problem.
When it's a junior at my job, at least I can feel like I'm developing a person. They retain the conversation and culture I impart as part of the problem solving process. When I struggle against the computer, it's just a waste of my time. It's not learning anything.
I'm still really curious what you're doing with it.
minkles
2 days ago
That’s fine until your code makes its way to production, an unconsidered side effect occurs and then you have to face me.
You are still responsible for what you do regardless of the means you used to do it. And a lot of people use this not because it’s more productive but because it requires less effort and less thought because those are the hard bits.
I’m collecting stats at the moment, but the general trend is that quality, measured as functional defects produced, declines when an LLM is involved in the process.
So far it’s not a magic bullet but a push for mediocrity in an industry with a rather bad reputation. Never a good story.
blargey
2 days ago
Wasn't there a recent post about many research papers getting published with conclusions derived from buggy/incorrect code?
I'd put more hope in improving LLMs/derivatives than improving the level of effort and thought in code across the entire population of "people who code", especially the subset who would rather be doing something else with their time and effort / see it as a distraction from the "real" work that leverages their actual area of expertise.
a_wild_dandan
2 days ago
> You are still responsible for what you do regardless of the means you used to do it. And a lot of people use this not because it’s more productive but because it requires less effort and less thought because those are the hard bits.
Yeah, that's...the whole point of tools. They reduce effort. And they don't shift your responsibility. For many of us, LLMs are overwhelmingly worth the tradeoffs. If your experience differs, then it's unfortunate, and I hate that for you. Don't use 'em!
bongodongobob
2 days ago
Ugh, dude, I used to push bad code into production without ChatGPT. It is such a stupid argument. Do you really think people are just blindly pushing code they can't make heads or tails of? That they haven't tested? Do you seriously think people are just one shotting code and blasting it into prod? I am completely baffled by people in this industry that just don't get it. Learn to prompt. Write tests. Wtf.
hughesjj
2 days ago
My problem is that, for a surprising number of applications, it's taken me longer to have the conversation with chatgpt to get the code I want than just doing it myself.
Copilot and the likes are legit for boilerplate, some test code, and posix/power shell scripting. Anything that's very common it's great.
Anything novel, though, and it suffers. Did AWS just release some new functionality that only like 4 people have touched so far on GitHub? Are the source docs incomplete, or spread out amongst multiple pages with some implicit/between-the-lines spec? Eh, good luck, you're probably better off just reading the docs yourself or guessing and checking.
Same goes for versioning: sometimes it'll fall back to an older version of the system (e.g. Kafka with KRaft vs. ZooKeeper).
Personally, the best general use case of LLMs for me is focus. I know how to break down a task, but sometimes I have an issue staying focused on doing it and having a reasonably competent partner to rubber duck with is super useful. It helps that the chat log then becomes an easy artifact to more or less copy paste, and chatgpt doesn't do a terrible job reformatting either. Like for 90% of the stuff it's easier than using vim commands.
lanstin
2 days ago
It seems great for like straightforward linear code, elisp functions, python data massage scripts, that sort of thing. I had it take a shot at some new module for a high volume Go server with concurrency/locking concerns and nil pointer receivers. I got more panics from the little bit of code GPT wrote than all my own code, not because it was bad code but because when I use dangerous constructs like locking and pointers that can be nil, I have certain rigid rules for how to use them and the generated code did not follow those rules.
sensanaty
a day ago
> Do you really think people are just blindly pushing code they can't make heads or tails of? That they haven't tested?
Yes, most definitely. I've recently been introduced to our CTOs little pet project that he's been building with copious help from ChatGPT, and it's genuinely some of the most horrid code I've ever seen in my professional career. He genuinely doesn't know what half of it even does when I quizzed him about some of the more egregious crap that was in there. The real fun part is that now that it's a "proven" PoC some poor soul is going to have to maintain that shit.
We also have a mandate from the same CTO to use more AI in our workflows, so I have 0 doubts in my mind that people are blindly pushing code without thinking about it, and people like myself are left dealing with this garbage. My time & energy is being wasted sifting through AI-generated garbage that doesn't pass the smell test if you spend a singular minute of effort reading through the trash it generates.
minkles
2 days ago
Yes that's exactly what they are doing.
I literally had someone with the balls to tell me that it was ChatGPT's fault.
Due diligence and intelligence has shit the fucking bed quite frankly.
hobs
2 days ago
Do you think ChatGPT has changed any of those answers from Yes to No? Because it hasn't.
People blindly copied Stack Overflow code, they blindly copied every example off of MSDN, and they blindly copy from ChatGPT. Your holier-than-thou statements are funny, and frankly most LLMs cannot leave a local maximum, so when someone says they don't write any code anymore, I frankly think they are not capable of telling the mistakes, both architectural and specific, that they are making.
More and different prompting will not dig you out of the hole.
kaoD
2 days ago
This. Most people I know that use LLMs to be super productive are like "make me a button, it's red" (hyperbolic statement but you know what I mean). I can do that faster and better myself.
When I'm deeply stuck on something and I think "let's see if an LLM could help here", I try (and actually tried many times) to recruit those prompting gurus around me that swear LLMs solve all their problems... and they consistently fail to help me at all. They cannot solve the problem at all and I'm just sitting there, watching the gurus spend hours prompting in circles until they give up and leave (still thinking LLMs are amazing, of course).
This experience is what makes me extremely suspicious of anyone on the internet claiming they don't write code anymore but refusing to show (don't tell!) -- when actually testing it in real life it has been nothing but disappointment.
scubbo
2 days ago
> Do you really think people are just blindly pushing code they can't make heads or tails of? That they haven't tested? Do you seriously think people are just one shotting code and blasting it into prod?
Yes, and I see proof of it _literally every day_ in Code Reviews where I ask juniors to describe or justify their choices and they shrug and say "That's what Copilot told me to put".
mewpmewp2
2 days ago
That sounds more like poor hiring decisions.
benterix
2 days ago
> I've found that I haven't written a line of code in weeks
Which is great until your next job interview. Really, it's tempting in the short run but I made a conscious decision to do certain tasks manually only so that I don't lose my basic skills.
vasco
2 days ago
ChatGPT voice interface plugged into the audio stream, with the prompt:
- I need you to assist me during a programming interview, you will be listening to two people, the interviewer and me. When the interviewer asks a question, I'd like you to feed me lines that seem realistic for an interview where I'm nervous, don't give me a full-blown answer right away. Be very succinct. If I think you misunderstood something, I will mention the key phrase "I'm nervous today and had too much coffee". In this situation, remember I'm the one that will say the phrase, and it might be because you've mistaken me for the interviewer and I want you to "reset". If I want you to dig deeper than what you've provided me with, I'll say the key phrase "Let's dig deeper now". If I think you've hallucinated and want you to try again, I'll say "This might be wrong, let me think for just a minute please". Remember, other than these key phrases, I'll only be talking to the interviewer, not you.
On a second screen of some sort. Other than that, interviewers will just have to accept that nobody will be doing the job without these sort of assistants from now on anyway. As an interviewer I let candidates consult online docs for specific things already because they'll have access to Google during the job, this is just an extension of that.
gcanyon
2 days ago
I recently interviewed a number of people about their SQL skills. The format I used was to share two queries with them a couple days ahead of time in a google doc, and tell them I will ask them questions about those queries during the interview.
Out of maybe twenty people I interviewed this way, only three pointed out that one of the queries had an error that would make it fail. It was something any LLM would immediately point out.
Beyond that: the first question I asked was: "What does this query do, what does it return?" I got responses ranging from people who literally read the query back to me word by word, giving the most shallow and direct explanation of what each bit did step-by-step, to people who clearly summarized what the query did in high-level, abstract terms, as you might describe what you want to accomplish before you write the query.
I don't think anyone did something with ChatGPT live, but maybe?
apsurd
2 days ago
This made me laugh. I can't deny it isn't already happening. But wow people work so hard to avoid working hard.
throwaway765123
2 days ago
It's not about avoiding hard work - the audience on HN skews wealthy due to heavy representation of skilled devs in their 30s+, but the average person does not earn anything close to FAANG salaries. Even most devs in general don't earn like that. The interview process being fairly well understood in general, any advantage that can possibly get a person from $60k/year to generationally-life-changing $300k/year will be used eventually.
vasco
2 days ago
And I wrote this as a knee-jerk reaction after reading the parent, I imagine people will be putting way more effort if it can get them a great job. And to be honest, if they can fool you, they can most likely do the job as well. Most of the industry tests at a higher skill level than what they actually require on the day to day anyway.
bessbd
2 days ago
It's almost inspiring, isn't it?
elbear
2 days ago
I think the point is to avoid pointless hard work.
jamesmotherway
2 days ago
Not everyone is doing coding interviews. Some might struggle with a particular language due to lack of muscle memory, but can dictate the logic in pseudocode and can avoid pitfalls inferred from past experience. This sort of workflow is compatible with LLMs, assuming a sufficient background (otherwise one can't recognize when the output diverges from your intent).
I personally treat the LLM as a rubber duck. Often I reject its output. In other cases, I can accept it and refactor it into something even better. The name of the game is augmentation.
dmd
2 days ago
I sometimes get the idea from statements like this - and HN's focus on interviewing in general - that people are switching jobs a dozen times a year or something. How often are most people switching jobs? I've had 5 jobs in the last 20 years.
macintux
2 days ago
I'm old, and well-paid for my geographic region (but for various mostly stupid reasons utterly broke). No amount of value created (at least, for my skill level) will protect me from ageism and/or budget cuts.
ed
2 days ago
This. I’ve been using elixir for ~6 months (guided by Claude) and probably couldn’t solve fizz buzz at a whiteboard without making a syntax error. Eek.
stavros
2 days ago
Who cares? If I'm hiring you to make a product, I care that the higher order logic is correct, that the requirements are all catered for, and that the code does reasonable things in all cases. Things I don't care about are FizzBuzz, programming on whiteboards, and not making syntax errors.
kaoD
2 days ago
This is how companies fail. 5 years down the line no one is able to change anything in the system because it's so poorly architected (by being a bunch of Claude copypastes cobbled together) that it takes one month to do a one-day task (if it's even possible).
stavros
2 days ago
I guess we should change our hiring practices to optimize for FizzBuzz and getting all the syntax right first try.
kaoD
2 days ago
I can see how you got that impression from my comment (if you ignore how I mentioned architecture), so let me elaborate:
It's the opposite. FizzBuzz and getting the syntax right is what LLMs are good at... but there's so much more nuance at being experienced with a language/framework/library/domain which senior engineers understand and LLMs don't.
Being able to write Elixir assisted by an LLM does not mean you can produce proper architecture and abstractions even if the high level ideas are right. It's the tacit knowledge and second-order thinking that you should hire for.
But the thing is, if someone cannot write Elixir without syntax errors unless using an LLM, well, that's an extremely good proxy that they don't know the ins and outs of the language, ecosystem, best practices... years of tacit knowledge that LLMs fail to capture because they're trained on a huge amount of tutorial and entry-level code ridden with the wrong abstractions.
The only code worse than one that doesn't work is one that kinda works unless your requirements change ever so slightly. That's a liability and you will pay it with interests.
To give a concrete example: I am very experienced with React. Very. A lot. The code that LLMs write for it is horrid, bug-ridden, inflexible, and often misuses its footgun-y APIs like `useEffect` the way a junior fresh out of a boot camp would, directly contradicting the known best practices for maintainable (and often even just "correct") code. But yeah, it superficially solves the problem. Kinda. But good luck when the system needs to evolve. If it cannot produce proper code under 500 lines, how do you expect it to deal with massive systems that need to scale to tens of KLOC across an ever-growing codebase?
But management will be happy because the feature shipped and time to market was low... until you can no longer ship anything new and you go out of business.
stavros
2 days ago
Ah, sorry, I read your comment as disagreeing with me, now I see it's the opposite. Exactly, LLMs (for now) are good at writing low-level code, but we need someone to work on architecture.
I had an idea the other day of an LLM system that would start from a basic architecture of an app, and would zoom down and down on components until it wrote the entire codebase, module by module. I'll try that, it sounds promising.
LouisSayers
2 days ago
You need to prep for job interviews anyway. I'd rather spend the majority of my time being productive.
calmworm
2 days ago
Job interview? You might be surprised at the number of us who don’t code for a job.
__loam
2 days ago
I'd bet most people on this forum program professionally.
calmworm
2 days ago
I would take that bet.
idiotsecant
2 days ago
Me too.
atomic128
2 days ago
Somebody tested people on Hacker News to evaluate programming competency.
This was part of a larger evaluation comparing the Hacker News population to people on Reddit programming subreddits.
Here is a very heated discussion of the result:
https://news.ycombinator.com/item?id=33293522
It appears that Hacker News is perhaps NOT populated by the programming elite. In contrast, there are real wizards on Reddit.
Surprising, I know.
calmworm
a day ago
Not surprised there would be a “heated” discussion as a result of this one link, which measured only those who engaged with it (and how?). I opened the link and hit Submit just to see what would happen… now the percentage of HN users who are competent programmers is even lower than before, by that metric.
__loam
2 days ago
Not surprising given how bad the takes here are and how many of the users here are dumb kids right out of college who are aspiring founders.
calmworm
a day ago
Unnecessarily negative. Maybe rethink it.
whamlastxmas
2 days ago
I’ve made the decision to embrace being bad at coding but getting a ton of work done using an LLM and if my future employer doesn’t want massive productivity and would prefer being able to leetcode really well then I unironically respect that and that’s ok.
I’m not doing ground breaking software stuff, it’s just web dev at non massive scales.
__loam
2 days ago
Your future employer might expect you to bring some value through expertise that doesn't come from an LLM. If you want to insist on degrading your own employability like this, I guess it's your choice.
fragmede
2 days ago
For the most part, businesses don't care how you deliver value, just that you do. If programmer A does a ticket in 3 days with an LLM, and programmer B takes a week to do the same ticket, but doesn't use an LLM, with programmer B choosing not to out of some notion of purity, who's more employable?
__loam
2 days ago
Productivity is not the only aspect of our profession that matters, and in fact it's probably not even the most important part. I'm not suggesting we get stuck or handcraft every aspect of our code, and there are multitudes of abstractions and tools that enhance productivity, including everything from frameworks to compilers.
What I'm saying is that what the original comment is doing, having the LLM write all their code, will make them a less valuable employee in the long term. Participating in the act of programming makes you a better programmer. I'd rather have programmer B if they take the time to understand their code, so that when that code breaks at 4am and they get the call, they can actually fix it rather than be in a hole they dug with LLMs that they can't dig out of.
roenxi
2 days ago
You don't need to call them at 4am, you can keep a git log of the prompts that were used to generate the code and some professional 4am debugger can sit there and use an LLM to fix it.
Probably not a practical option yet, but if we're looking at the long term that is where we are heading. Or, realistically, the even longer term where the LLM self-heals broken systems.
dvfjsdhgfv
2 days ago
While a git log of prompts seems like a novel idea to me, I don't believe it would work - not because of temperature and LLMs being non-deterministic and the context window overflowing, but because at a certain level of complexity LLMs simply fail, even though they are excellent at fixing simple bugs.
__loam
2 days ago
Lol, yeah the prompt is definitely going to help clarify what the code actually does.
Der_Einzige
2 days ago
See, if you work in AI, say, as an AI researcher, barring them from using AI models in the interview is basically not an option.
Also, often folks in this space are better at cheating than you will be at detecting them. Don't believe me? https://bigvu.tv/captions-video-maker/ai-eye-contact-fix
apsurd
2 days ago
LLMs are certainly not useless.
But "lines of code written" is a hollow metric to prove utility. Code literacy is more effective than code illiteracy.
Lines of natural language vs discrete code is a kind of preference. Code is exact which makes it harder to recall and master. But it provides information density.
> by just knuckling down and learning how to do the work?
This is the key for me. What work? If it's the years of learning and practice toward proficiency to "know it when you see it" then I agree.
smileson2
2 days ago
we're a post illiteracy society now
acedTrex
2 days ago
> I've found that I haven't written a line of code in weeks
How are people doing this? I never use any of the code that gpt4o/copilot/sonnet spits out because it never meets my standards. How are other people accepting the shit it spits out?
viraptor
2 days ago
You're listing plain models, so I'm assuming you're using them directly. Aider and similar agents use those models but they don't stop at the first answer. You can add test running and a linter to the request and it will essentially enter a loop like: what are the steps to solve (prompt)?; here's a map of the repository, which files do you need?; what's your proposed change?; here's the final change and the test run, do you think the problem has been solved?; (go back to the beginning if not)
See the video at https://plandex.ai/ to get an idea how it works.
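A minimal sketch of that loop in Python (purely illustrative -- `llm` and `run_tests` here are stand-ins, not aider's or Plandex's actual API):

```python
# Hypothetical aider-style loop: propose a change, run the tests,
# feed failures back to the model until tests pass or we give up.
def agent_loop(llm, task, run_tests, max_rounds=5):
    change = llm(f"Propose a change for: {task}")
    for _ in range(max_rounds):
        ok, log = run_tests(change)
        if ok:
            return change  # tests pass: accept the edit
        change = llm(f"Tests failed:\n{log}\nRevise the change.")
    return None  # still failing: a human takes over
```

The point is that the test run and the revision happen automatically; you only see the final state.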
acedTrex
2 days ago
That just sounds/looks like more work than just doing it normally? What am I missing?
viraptor
2 days ago
Depends on the task but if you're going high level enough, it's not more work. Think about it this way: if you're doing proper development you're going to write code, tests and commit messages. Since you know what you want to achieve, write a really good commit message as the prompt, start writing tests and let the agent run in the meantime. Worst case, it doesn't work and you do the code yourself. Best case, it worked and you saved time.
(Not sure if that was clear but the steps/loop described before happens automatically, you're not babysitting it)
freeone3000
2 days ago
You put it behind an API call and run the loop automatically for every coding query
namanyayg
2 days ago
I'm using Cursor and till now the "test run" part is manual, like Cursor doesn't care about testing or actually checking the code it wrote works
Any tips how I could integrate that? Do I need to switch to aider/plandex?
anujsjpatel
2 days ago
As someone who didn't study a STEM subject or CS in school, I've gone from 0 to publishing a production modern-looking app in a matter of a few weeks (link to it on my profile).
Sure, it's not the best (most maintainable, non-redundant styling) code that's powering the app but it's more than enough to put an MVP out to the world and see if there's value/interest in the product.
threeseed
2 days ago
> HN, and the internet in general, have become just an ocean of reactionary sandbagging and blather about how "useless" LLMs are.
This is cult like behaviour that reminds me so much of the crypto space.
I don't understand why people are not allowed to be critical of a technology or not find it useful.
And if they are they are somehow ignorant, over-reacting or deficient in some way.
wenc
2 days ago
I think it's perfectly ok to be critical of technology as long as one is thoughtful rather than dismissive. There is a lot of hype right now and pushing back against it is the right thing to do.
I'm more reacting against simplistic and categorical pronouncements of straight-up "uselessness," which to me seem incurious and deeply cynical, especially since they are evidently untrue in many domains (though true for some). I just find this kind of emotional cynicism (not a healthy skepticism, but cynicism) to be contrary to the spirit of innovation and openness, and indeed contrary to evidence. It's also an overgeneralization -- "I don't find it useful, so it's useless" -- rather than "Why don't I find it useful, and why do others? Let me learn more."
As future-looking HNers, I'd expect we would understand the world through a lens of "trajectories" rather than "current state". Just because LLMs hallucinate and make mistakes with a tone of confidence today -- a deep weakness -- doesn't mean they are altogether useless. We've witnessed that despite their weaknesses, we are getting a lot of value from them in many domains today and they are getting better over time.
Take neural networks themselves for instance. For most of the 90s-2000s, people thought they were a dead end. My own professor had great vitriol against Neural Networks. Most of the initial promises in the 80s truly didn't pan out. Turns out what was missing was (lots of) data, which the Internet provided. And look where we are today.
Another area of cynicism is self-driving cars (Level 5). Lots of hype and overpromise, and lots of people saying it will never happen because it requires a cognitive model of the world, which is too complicated, and there are too many exceptional cases for there to ever be Level 5 autonomy. Possibly true, but I think "never" is a very strong sentiment that is unworthy of a curious person.
rainsford
2 days ago
I generally agree, although an important aspect of thinking in terms of "trajectories" is recognizing when a particular trajectory might end up at a dead end. One perspective on the weaknesses of current LLMs is that it's just where the things are today and they can still provide value even while the technology improves. But another perspective is that the persistence of these weaknesses indicates something more fundamentally broken with the whole approach that means it's not really the path towards "real" AI, even if you can finesse it into doing useful things in certain applications.
There's also an important nuance differentiating rejection of a general technological endpoint (e.g. AGI or Level 5 self-driving cars) with a particular technological approach to achieving those goals (e.g. current LLM design or Tesla's autopilot). As you said, "never" is a long time and it takes a lot of unwarranted confidence to say we will never be able to achieve goals like AGI or Level 5 self-driving. But it seems a lot more reasonable to argue Tesla or OpenAI (and everyone else doing essentially the same thing as OpenAI) are fundamentally on the wrong track to achieving those goals without significantly changing their approach.
I agree that none of that really warrants dismissive cynicism of new technology, but being curious and future-looking also requires being willing to say when you think something is a bad approach even if it's not totally useless. Among other reasons, our ability to explore new technology is not limitless, and hype for a flawed technology isn't just annoying but may be sucking all the oxygen out of the room not leaving any for a potentially better alternative. Part of me wants to be optimistic about LLMs, but another part of me thinks about how much energy (human and compute) has gone into this thing that does not seem to be providing a corresponding amount of value.
wenc
2 days ago
I appreciate this thoughtful comment.
You are absolutely right that the trajectories, if taken linearly, might hit a dead end. I should clarify that when I mentioned "trajectories" I don't mean unpunctuated ones.
I am myself not convinced that LLMs -- despite their value to me today -- will eventually lead to AGI as a matter of course, nor the type of techniques used in autopilot will lead to L5 autonomy. And you're right that they are consuming a lot of our resources, which could well be better invested in a possibly better alternative.
I subscribe to Thomas Kuhn's [1] idea of scientific progress happening in "paradigms" rather than through a linear accumulation of knowledge. For instance, the path to LLMs itself was not linear, but through a series of new paradigms disrupting older ones. Early natural language processing was more rule-based (paradigm), then it became more statistical (paradigm), and then LLMs supplanted the old paradigms through transformers (paradigm) which made it scale to large swaths of data. I believe there is still significant runway left for LLMs, but I expect another paradigm must supplant it to get closer to AGI. (Yann Lecun said that he doesn't believe LLMs will lead to AGI).
Does that mean the current exuberant high investments in LLMs are misplaced? Possibly, but in Kuhn's philosophy, typically what happens is a paradigm will be milked for as much as it can be, until it reaches a crisis/anomaly when it doesn't work anymore, at which point another paradigm will supplant it.
At present, we are seeing how far we can push LLMs, and LLMs as they are have value even today, so it's not a bad approach per se even though it will hit its limits at some point. Perhaps what is more important are the second-order effects: the investments we are seeing in GPUs (essentially we are betting on linear algebra) might unlock the kind of commodity computational power the next paradigm needs to disrupt the current one. I see parallels between this and investments in NASA resulting in many technologies that we take for granted today, and military spend in California producing the technology base that enabled Silicon Valley today. Of course, these are just speculations and I have no more evidence that this is happening with LLMs than anyone else.
I appreciate your point however and it is always good to step back and ask, non-cynically, whether we are headed down a good path.
[1] https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Re...
threeseed
2 days ago
This entire comment can be summarised as: everyone who doesn't think like me is wrong.
Not everyone is interested in seeing the world through the hopes and dreams of e/acc types and would prefer to see it as it is today.
LLMs are a technology. Nothing more. It can be as amazing or useless as anyone likes.
fragmede
2 days ago
And this comment can be summarized as "Nuh uh, I'm right". When summarizing longer bits of text down to a single sentence, nuance and meaning get lost, making the summarization ultimately useless, contributing nothing to the discussion.
ben_w
2 days ago
Crypto and AI have similarities and differences.
The similarities include intense "true believer" pitches and governments taking them seriously.
The differences include that the most famous cryptocurrency can't even function as a direct payment mechanism for the lunch purchases of a single city like Berlin (IIRC its throughput isn't enough for all interbank transactions either, so it can't even be a behind-the-scenes system by itself), while GenAI output keeps ending up in places people would rather not find it, like homework, or that person on Twitter who's telling you Russia Did Nothing Wrong (and also giving you a nice cheesecake recipe, because they don't do any input sanitization).
wenc
2 days ago
Also, I'm deeply skeptical of crypto too due to its present scamminess, but I am keeping an open mind that there is a future in which crypto -- once it gets over this phase of get-rich-quick schemers -- will be seen as just another asset class.
I read somewhere that historically bonds in their early days were also associated with scamminess but today they're just a vanilla asset.
rainsford
2 days ago
I'm honestly more optimistic about cryptocurrency as a mechanism of exchange rather than an asset. As a mechanism of exchange, cryptocurrency has some actually novel properties like distributed consensus that could be useful in certain cases. But an asset class which has zero backing value seems unworkable except for wild speculation and scams. Unfortunately the incentives around most cryptocurrencies (and maybe fundamental to cryptocurrency as an idea) greatly emphasize the asset aspects, and it's getting to be long enough since it became a thing that I'm starting to become skeptical cryptocurrency will be a real medium of exchange outside of illegal activities and maybe a few other niche cases.
evilfred
2 days ago
bonds have utility, crypto does not
evilfred
2 days ago
just like with crypto and NFTs and the metaverse, they are always focused on what is supposedly coming down the pipe in the future and not what is actually possible today
rafaelmn
2 days ago
I use Sonnet 3.5 and while it's actually usable for codegen (compared to gpt/copilot) it's still really not that great. It does well at tasks like "here's a stinky collection of tests that accrued over time - clean this up in style of x", but actually writing code still shows a fundamental lack of understanding of the underlying API and problem (the most banal example being constantly generating the `x || Array.isArray(x)` test).
wokwokwok
2 days ago
> I've found that I haven't written a line of code in weeks
Please post a video of your workflow.
It’s incredibly valuable for people to see this in action, otherwise they, quite legitimately, will simply think this is not true.
Kiro
2 days ago
Who cares what they think? In fact, the fewer who use this the better for the ones that do. It's not in my self-interest to convert anyone and I obviously don't need to convince myself when I have the result right in front of me. Whether you believe it or not does not make me less productive.
wokwokwok
2 days ago
The obvious answer is you'll get called a liar and a shill.
I’m not saying you are; I think there are a lot of legitimate AI workflows people use.
…but, there are a lot of people trying to sell AI, and that makes them say things about it which are just flat out false.
/shrug
But you know; freedom of speech; you can say whatever you want if you don’t care what people think of you.
My take on it is showing people things (videos, blogs, repos, workbooks like Terence posted) moves the conversation from “I don’t believe you” to “let’s talk about the actual content”. Wow, what an interesting workflow, maybe I’ll try that…
If you don’t want to talk to people or have a discussion that extends beyond meaningless trivia like “does AI actually have any value” (obviously flame bait opinions only comment threads)… why are you even here?
If you don’t care, then fine. Maybe someone else will and they’ll post an interesting video.
Isn’t that the point of reading HN threads? What do you win by telling people not to post examples of their workflow?
It’s incredibly selfish.
perching_aix
2 days ago
> HN, and the internet in general, have become just an ocean of reactionary sandbagging and blather about how "useless" LLMs are.
Now imagine how profoundly depressing it is to visit a HN post like this one, and be immediately met with blatant tribalism like this at the very top.
Do you genuinely think that going on a performative tirade like this is what's going to spark a more nuanced conversation? Or would you rather just the common sentiment be the same as yours? How many rounds of intellectual dishonesty do we need to figure this out?
riku_iki
2 days ago
> Meanwhile, in the real world, I've found that I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
could it be that you are mostly engaged in "boilerplate coding", where LLMs are indeed good?
holoduke
2 days ago
People in general don't like change and naturally defend against it. And the older people get, the greater the percentage of people fighting against it. A very useful and powerful skill is to be flexible and adaptable. You've positioned yourself among the happy few.
_wire_
a day ago
> Meanwhile, in the real world, I've found that I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
Comment on first principles:
Following the dictum that you can't prove the absence of bugs, only their presence, the idea of what constitutes "working code" deserves much more respect.
From an engineering perspective, either you understand the implementation or you don't. There's no meaning to an iterative loop of producing "working" code.
Stepwise refinement is a design process under the assumption that each step is understood in a process of exploration of the matching of a solution to a problem. The steps are the refinement of definition of a problem, to which is applied an understanding of how to compute a solution. The meaning of working code is in the appropriateness of the solution to the definition of the problem. Adjust either or both to unify and make sense of the matter.
The discipline of programming is rotting when the definition of "working" is copying code from an oracle and running it to see if it goes wrong.
The measure of works must be an engineering claim of understanding the chosen problem domain and solution. Understanding belongs to the engineer.
LLMs do not understand and cannot be relied upon to produce correct code.
If use of an LLM puts the engineer in contact with proven principles, materials and methods which he adapts to the job at hand, while the engineer maintains understanding of correctness, maybe that's a gain.
But if the engineer relies on the LLM transformer as an oracle, how does the engineer locate the needed understanding? He can't get it from the transformer: he's responsible for checking the output of the transformer!
OTOH if the engineer draws on understanding from elsewhere, what is the value of the transformer but as a catalog? As such, who has accountability for the contents of the catalog? It can't be the transformer because it can't understand. It can't be the developer of the transformer because he can't explain why the LLM produces any particular result! It has to be the user of the transformer.
So a system of production is being created whereby the engineer's going-in position is that he lacks the understanding needed to code a solution and he sees his work as integrating the output of an oracle that can't be relied upon.
The oracle is a peculiar kind of calculator with an unknown probability of generating relevant output that works at superhuman speeds, while the engineer is reduced to an operator in the position of verifying that output at human speeds.
This looks like a feedback system for risky results and a slippery slope towards heretofore unknown degrees of incorrectness and margins for error.
At the same time, the only common vernacular for tracking oracle veracity is in arcane version numbers, which are believed, based on rough experimentation, to broadly categorize the hallucinatory tendencies of the oracle.
The broad trend of adoption of this sketchy tech is in the context of industry which brags about seeking disruption and distortion, regards its engineers as cost centers to be exploited as "human resources", and is managed by a specialized class of idiot savants called MBAs.
Get this incredible technology into infrastructure and in control of life sustaining systems immediately!
ijustlovemath
2 days ago
How much do you typically pay in a month of tokens?
skybrian
2 days ago
What sort of code do you write this way?
sph
2 days ago
Probably nothing a junior programmer wouldn't be able to do relatively easily.
amrrs
2 days ago
Curious why Aider? Why not Cursor ?
evilfred
2 days ago
writing code is the easy part, designing is hard and not LLMable
fragmede
2 days ago
Given how hard we thought programming was a year or two ago, I wouldn't bank my future on design being too hard for an LLM. They're already quite good at helping writing design docs.
bongodongobob
2 days ago
Lol nope. When I'm trying to get it to make something big/complicated, I start by telling it it's a software project manager and having it build a spec sheet on the design with me. Then I hand that off to an architect to flesh out the languages, libraries, files needed, etc. Then from that list you can have it work on individual files and functions.
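Roughly this, as a toy sketch (`llm` here is a stand-in for whatever chat call you use, not a real API):

```python
# Hypothetical staged pipeline: project manager -> architect -> per-file.
def build_project(llm, idea):
    spec = llm(f"You are a software project manager. Draft a spec sheet for: {idea}")
    plan = llm("You are a software architect. From this spec, list the "
               f"languages, libraries and files needed:\n{spec}")
    # One file per plan line (simplified); each file gets its own focused prompt.
    files = {name: llm(f"Write {name} according to this plan:\n{plan}")
             for name in plan.splitlines()}
    return spec, plan, files
```

Each stage keeps the context small enough that the model stays on track.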
sterlind
2 days ago
I also do OR-adjacent work, but I've had much less luck using 4o for formulating MIPs. It tends to deliver correct-looking answers with handwavy explanations of the math, but the equations don't work and the reasoning doesn't add up.
It's a strange experience, like taking a math class where the proofs are weird and none of the lessons click for you, and you start feeling stupid, only to learn your professor is an escaped dementia patient and it was gobbledygook to begin with.
I had a similar experience yesterday using o1 to see if a simple path exists from s to t through v using max flow. It gave me a very convincing-looking algorithm that was fundamentally broken. My working solution used some techniques from its failed attempt, but even after repeated hints it failed to figure out a working answer (it stubbornly kept finding s->t flows, rather than realizing v->{s,t} was the key.)
It's also extremely mentally fatiguing to check its reasoning. I almost suspect that RLHF has selected for obfuscating its reasoning, since obviously-wrong answers are easier to detect and penalize than subtly-wrong answers.
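For the curious, the construction it kept missing, as a self-contained sketch (a simplified reimplementation, not o1's code): a simple s-t path through v is the same as two vertex-disjoint paths v->s and v->t, so split each node to enforce unit vertex capacities and ask for a max flow of 2 from v into a super-sink behind s and t.

```python
from collections import deque

def max_flow(cap, s, t):
    # Edmonds-Karp on a dict-of-dicts capacity graph (residual updated in place).
    flow = 0
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:
            u = q.popleft()
            for w, c in cap.get(u, {}).items():
                if c > 0 and w not in parent:
                    parent[w] = u
                    q.append(w)
        if t not in parent:
            return flow
        # walk back from t to recover the augmenting path and its bottleneck
        path, w = [], t
        while parent[w] is not None:
            path.append((parent[w], w))
            w = parent[w]
        aug = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= aug
            cap.setdefault(w, {}).setdefault(u, 0)
            cap[w][u] += aug
        flow += aug

def path_through(edges, s, t, v):
    cap, nodes = {}, {x for e in edges for x in e}
    for u in nodes:  # split u into u_in -> u_out, capacity 1 (2 for v itself)
        cap.setdefault(("i", u), {})[("o", u)] = 2 if u == v else 1
    for a, b in edges:  # undirected edges, unit capacity each way
        cap.setdefault(("o", a), {})[("i", b)] = 1
        cap.setdefault(("o", b), {})[("i", a)] = 1
    cap.setdefault(("o", s), {})["T"] = 1  # each endpoint absorbs one path
    cap.setdefault(("o", t), {})["T"] = 1
    return max_flow(cap, ("i", v), "T") == 2
```

The part o1 never found was putting the source at v and the sink behind {s, t}, instead of forcing an s->t flow.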
mjburgess
a day ago
Yip. We need research into how long it takes experts to repair faulty answers, vs. generate them on their own.
Benchmarking 10,000 attempts on an IQ test is irrelevant if on most of those attempts the time taken to repair an answer is longer than the time to complete the test yourself.
I find it's useful for generating exemplars in areas you're roughly familiar with, but want to see some elaboration on or a refresher. You can stitch it all together to get further, but when it comes time to actually build something -- you need to start from scratch.
The time taken to reproduce what it's provided, now that you understand it, is trivial compared to the time needed to repair its flaws.
CJefferson
2 days ago
I'm currently teaching a course on MIP, and out of interest I tried asking 4o about some questions I ask students. It could give the 'basic building blocks' (how to do x!=y, how to do a knapsack), but as soon as I asked it a vaguely interesting question that wasn't "bookwork", I don't think any of its models were right.
I'm interested in how you seem to be getting better answers than me (or maybe I just discard the answer once I can see it's wrong, and write it myself?)
In fact, I just asked it to do (and explain) x!=y for x,y integer variables in the range {1..9}, and while the constraints are right, the explanation isn't.
wenc
2 days ago
I had to prompt it correctly (tell it to exclude the x=y case in the x≠y formulation), but ChatGPT seems to have arrived at the correct answer:
https://chatgpt.com/share/66e652e1-8e2c-800c-abaa-92e29e0550...
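One standard way to write x ≠ y (and the shape I was steering it toward) is the big-M disjunction, where a binary z picks which side is larger and M = 9 suffices for the {1..9} range:

```latex
x \le y - 1 + M z, \qquad x \ge y + 1 - M (1 - z), \qquad z \in \{0, 1\}, \quad M = 9
```

With z = 0 the first constraint forces x < y; with z = 1 the second forces x > y; x = y is infeasible in both branches.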
CJefferson
2 days ago
OK, but at that point you've told it basically everything, and this is a really basic book problem!
As another example I just gave it a network flow problem, and asked it to convert to maximum flow (I'm using the API, not chatGPT).
Despite numerous promptings, it never got it right -- it would not stop putting a limit on the source and sink (usually 1), which meant the flow was always exactly 1. Here's the bit of wrong code (it's the last part; it shouldn't be putting any restrictions on nmap['s'] and nmap['t'], as they represent the source and sink), and I couldn't persuade it this was wrong after several prods:
    # Constraints: Ensure flow conservation at each vertex
    A_eq = np.zeros((len(namelist), num_edges))
    b_eq = np.zeros(len(namelist))
    for i, (u, v, capacity) in enumerate(edges):
        A_eq[nmap[u], i] = 1   # Outflow from u
        A_eq[nmap[v], i] = -1  # Inflow to v

    # Source 's' has a net outflow, and sink 't' has a net inflow
    b_eq[nmap['s']] = 1
    b_eq[nmap['t']] = -1
wenc
2 days ago
Sure, but that is the nature of LLM prompting. It does take some doing to set up the right guardrails. It's still a good starting point.
Also a trick when the LLM fights you: start from scratch, and put guardrails in your initial prompt.
LLM prompting is a bit like gradient descent in a bumpy nonconvex landscape with lots of spurious optima and saddle points -- if you constrain it to the right locality, it does a better job at finding an acceptable local optimum.
CJefferson
2 days ago
I think this is just a case of different people wanting to work differently (and that's fine).
I can only tell this is wrong because I fully understand it -- and if I fully understand it, why not just write it myself rather than fight against an LLM. If I was trying to solve something I didn't know how to do, then I wouldn't know it was wrong, and where the bug was.
wenc
2 days ago
That's true, except an LLM can sometimes propose a formulation that one has never thought of. In nuanced cases, there is more than one formulation that works.
For MIPs, correctness can often (not always, but usually) be checked by simply flipping the binaries and checking the inequalities. Coming up with the inequalities from scratch is not always straightforward, so LLMs often provide good starting points. Sometimes the formulation is something specific from a paper that one has never read. LLMs are a way to "mine" those answers (some sifting required).
I think this is the mindset that is needed to get value out of LLMs -- it's not about getting perfect answers on textbook problems, but working with an assistant to explore the space quickly at a fraction of the effort.
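To make "flipping the binaries" concrete: for the x ≠ y example upthread, the big-M pair x ≤ y − 1 + Mz, x ≥ y + 1 − M(1 − z) can be verified exhaustively in a few lines:

```python
# Verify a big-M formulation of x != y over {1..9} by enumerating
# all (x, y) pairs and both values of the binary z:
# z = 0 forces x < y, z = 1 forces x > y, and M = 9 covers the range.
M = 9
for x in range(1, 10):
    for y in range(1, 10):
        feasible = any(x <= y - 1 + M * z and x >= y + 1 - M * (1 - z)
                       for z in (0, 1))
        assert feasible == (x != y)  # feasible exactly when x differs from y
```

This is the cheap sanity check that catches most weak formulations before they go into a larger model.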
l33t7332273
2 days ago
I am also working in OR and I have had the complete opposite experience with respect to MILP optimization (and the research actually agrees; there was a big survey paper published earlier this year showing LLMs were mostly correct on textbook problems but got more and more useless as complexity and novelty increased.)
The results are boilerplate at best, but misleading and insidious at worst, especially when you get into detailed tasks. Ever tried asking an LLM what a specific constraint does, or worse, asking it to explain the mathematical model behind some proprietary CPLEX syntactic sugar? It hallucinates the math, the syntax, the explanation, everything.
wenc
2 days ago
Can you point me to that paper? What version of the model were they using?
Have you tried again with the latest LLMs? ChatGPT-4 actually (correctly) explains what each constraint does in English -- it doesn't just provide the constraint when you ask it for the formulation. Also, not sure if CPLEX should be involved at all -- I usually just ask it for mathematical formulations, not CPLEX calling code (I don't use CPLEX). The OR literature primarily contains math formulations, and that's where LLMs can best pattern-match the problem shape.
Many of the standard formulations are in here:
https://msi-jp.com/xpress/learning/square/10-mipformref.pdf
All the LLM is doing is fitting the problem description to a combination of these formulations (and others).
l33t7332273
2 days ago
I was referring to section 4 of A Survey for Solving Mixed Integer Programming via Machine Learning (2024): https://arxiv.org/pdf/2401.03244.
I’ve heard (but not so much observed) that there is substantial difference between recent models, so it’s possible that they are better than when this was written.
Anyways, CPLEX has an associated modeling language that features syntactic sugar which has the effect of providing opaqueness to the underlying MILP that it solves. I find LLMs essentially unable to even make an attempt at determining the MILP from that language.
PS: How is Xpress? Is there some reason to prefer it to Gurobi or Mosek?
wenc
2 days ago
Thanks for sharing that, I appreciate it. It looks like they used open-source Llama models, which are not great. I tested these models offline using Ollama and outside of being character chat bots, they weren't very good at much (the only models that give good answers are Sonnet 3.5 and ChatGPT 4). However the paper's conclusion is essentially correct even for state-of-the-art models:
"Overall, while LLM made several errors, the provided formulations can serve as a starting point for OR experts to create mathematical models. However, OR experts should not rely on LLM to accurately create mathematical models, especially for less common or complex problems. Each output needs to be thoroughly verified and adjusted by the experts to ensure correctness and relevance."
I wouldn't recommend that anyone inexperienced use LLMs to create entire models from scratch, but rather use LLMs as a search tool for specific formulations which are then verified and plugged into a larger model. For this, it works really well and saves me a ton of time. As a MIP modeler, I have an intuition for the shape of the answer, so even if ChatGPT makes mistakes, I know how to extract the correct bits and it still saves me a ton of time.
The CPLEX API doesn't have a lot of good examples out in the wild, so I don't expect the training to be good. I've always used CPLEX through a modeling language like AMPL, and even AMPL code is rare so I can't expect an LLM to decipher any of it. On the other hand, MIP formulations abound in PDFs of journal publications.
In the vibes department, I feel Xpress is second to Gurobi and CPLEX and it does the job just fine. But it's been a while since I used CPLEX and Gurobi so I have no recent points of comparison (corporate licensing is prohibitively expensive).
marmakoide
2 days ago
I had the same experience with computational geometry.
Very good at giving a textbook answer ("give a Python/NumPy function that returns the Voronoi diagram of a set of 2D points").
Now I ask for the Laguerre diagram, a variation that is not mentioned in textbooks but very useful in practice. I can spend a lot of time spoon-feeding it the answer; I just get bullshitting-student answers.
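(For reference, the brute-force version of what I was after is tiny -- a Laguerre diagram just swaps squared distance for the power distance |x − p|² − w; a hypothetical helper, not from any textbook:)

```python
import numpy as np

def laguerre_labels(points, weights, query):
    # Laguerre (power) diagram membership by brute force: each query
    # point belongs to the site minimizing |x - p_i|^2 - w_i.
    d2 = ((query[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d2 - weights[None, :], axis=1)
```

Enough for rasterized diagrams; the exact cell geometry needs the lifting map / regular triangulation, which is exactly the part the model kept bullshitting about.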
I tried other problems like numerical approximation, physics simulation, same experience.
I don't get the hype. Maybe it's good at giving variations of glue code, i.e. Stack Overflow meets autocomplete? As a search tool it's bad because it's so confidently incorrect that you may be fooled by bad answers.
CamperBob2
2 days ago
But many of the low-effort comments seem to mostly fall into (1) and (2) -- cynicism rather than cautious optimism.
One good riposte to reflexive LLM-bashing is, "Isn't that just what a stochastic parrot would say?" Some HN'ers would dismiss a talking dog because the C code it wrote has a buffer overflow error.
Workaccount2
2 days ago
It's understandable that people whose careers and lifelong skill sets are seemingly on the precipice of obsolescence are going to be extremely hostile to that threat.
How many more years is senior SWE work going to be a $175k/yr gig instead of a $75k check-what-the-robot-does gig?
CamperBob2
a day ago
It depends. If you got into computing because it seemed like the most lucrative career choice to which you might be suited, then yes, I can imagine feeling threatened. Bummer, sucks to be you. But if you got into computing because it seemed like the most interesting thing available to work on, then no, I can't imagine not being fascinated by, and supportive of, the progress being made in the ML field today. Any hostility you feel should be directed at the people who want to lock it all up behind legislative, judicial, or proprietary doors.
In my case, it's all I can do not to walk away from everything else I'm doing to follow this particular muse. I don't have a lot of sympathy for my colleagues who see it as a threat. If you're afraid of new ideas, technologies, and methodologies, you picked the wrong line of work.
jazzyjackson
2 days ago
I'd rather live in a world without talking dogs if their main utility is authoring buggy code.
airstrike
2 days ago
It also doesn't help that Lean has had so many breaking changes in such little time. When I tried using GPT-4 for it, it mostly rendered old code that would fail to run unless you already knew the answer and how to fix it, which basically made it entirely unhelpful.
RayVR
a day ago
I’m amazed you have had any luck with 4o. I found 4 was much better than 4o but still quite bad.
I tried to use 4/4o for a MIP several months ago. Frequently, it would iterate through three or four bad implementations over and over.
Claude 3.5 has been a significant improvement. I don't really use ChatGPT for anything at this point.
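As a concrete example of the kind of MIP formulation discussed upthread: "fill buckets sequentially" logic is typically encoded with a linking constraint y[b] >= y[b+1] on bucket-use indicators. The sketch below is a toy, with made-up sizes and a brute-force feasibility check standing in for a real MIP solver, just to show which assignments the constraints admit.

```python
from itertools import product

n_items, n_buckets, cap = 3, 3, 2

def feasible(choice):
    """choice[i] = bucket index of item i."""
    counts = [choice.count(b) for b in range(n_buckets)]
    if any(c > cap for c in counts):   # capacity constraint per bucket
        return False
    used = [c > 0 for c in counts]     # plays the role of y[b] in the MIP
    # sequential fill: y[b] >= y[b+1], i.e. a bucket may only be used
    # if the previous bucket is also used
    return all(used[b] or not used[b + 1] for b in range(n_buckets - 1))

feasible_assignments = [c for c in product(range(n_buckets), repeat=n_items)
                        if feasible(c)]
print(len(feasible_assignments))  # 12 on this toy instance
```

In a real formulation the same logic appears as linear constraints (x[i][b] <= y[b] and y[b] >= y[b+1]) handed to a solver; the brute force here is only to make the feasible set inspectable.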
benterix
2 days ago
> people who complain on HN that (paid/good - only Sonnet 3.5 and GPT4o are in this category)
Correction: I complain that the only decent model in "Open"AI's arsenal, GPT-4, has been replaced by the cheaper GPT-4o, which gives subpar answers to most of my questions (I don't care that it does so faster). Since they've moved it to the "old, legacy" models, I expect they will phase it out, at which point I'll cancel my OpenAI subscriptions and Sonnet 3.5 will become the clear leader for my daily tasks.
Kudos to Anthropic for their great work, you guys are going in the right direction.
bongodongobob
2 days ago
Nah, o1 is fucking impressive. It's really fucking good. I'm guessing you haven't used it yet.
benterix
a day ago
I used it and was really disappointed, maybe because of the hype. It generates a long page of the kind of output I used to generate myself, often with better results. Note I use it for code generation, not for problem solving.
So I cancelled two of my 3 subscriptions as I realized OpenAI goes in a direction that is not useful for me at all. Claude, on the other hand, is incredibly useful.
EvgeniyZh
2 days ago
There are ~3 orders of magnitude more Python code on the internet than Lean code (200 GB vs. 200 MB in The Stack v2). You can't tune it "the same way".
agumonkey
2 days ago
Fair point, but a lot of Python code is redundant and low-quality.
Davidzheng
2 days ago
I'm not sure Lean's coverage of pure-math research is that extensive (maybe ~1% is represented in mathlib). But I think a system like AlphaProof could be useful to mathematicians even today. I mostly dislike systems like o1, which confidently say nonsense with such high frequency. But I think the value is already there.
lanstin
2 days ago
The point of using Lean is that you don't have to trust; you can verify.
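A minimal Lean 4 illustration of that point, using a trivial made-up theorem: if the file compiles, the kernel has checked the proof, regardless of who or what wrote it.

```lean
-- A trivial statement proved via a core-library lemma. Once this
-- elaborates, Lean's kernel has verified the proof term itself,
-- so no trust in the author (human or LLM) is required.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```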
Davidzheng
2 days ago
No, I agree; I just don't think the existing Lean codebase is approaching useful coverage. That should change soon.
lanstin
2 days ago
I keep asking people in my department about using Lean, but zero interest so far.
po76
2 days ago
Give it a few months. ChatGPT will be recommending GPTs to use or do it automatically.
Nothing is static in the way things are moving.
riffraff
2 days ago
_can_ GPT be tuned more heavily on Lean? It looks like the amount of python code in the corpus would outnumber Lean something like 1000:1. Although I guess OpenAI could generate more and train on that.
agumonkey
2 days ago
Side question: are there good OR websites/platforms (Reddit, Mastodon) for getting involved in the field?
thelastparadise
2 days ago
> but for someone who can and does, the $20/month I pay for ChatGPT more than pays for itself.
Would you be willing to pay even more, if it meant you were getting proportionally more valuable answers?
E.g. $200/month or $2,000/month (assuming the $2,000/month tier gets into employee/intern/contractor-level results).
This might drive a positive feedback loop.
eab-
2 days ago
Why do you expect GPT being tuned on Lean will help it for research-level math?
threeseed
2 days ago
> side
Or (4) LLMs simply do not work properly for many use cases, in particular where large volumes of training data don't exist in their corpus.
And in these scenarios rather than say "I don't know" it will over and over again gaslight you with incoherent answers.
But sure, condescendingly blame the user for their ignorance and inability to understand or use the tool properly. Or call their criticism low-effort.
lanstin
2 days ago
Yeah, I have been using them to help with learning graduate maths as a grad student. Claude Sonnet 3.5 was unparalleled and the first quite useful one. GPT-4o preview seems about equal (based on cutting and pasting the past six months of prompts into it).
andrepd
a day ago
I take cynicism over unbridled optimism. People speak as if we were on the cusp of technological singularity, but I've seen nothing to indicate we're not already past the inflection point of the logistic curve, and well into diminishing returns territory.