Insanity
11 days ago
I wonder how much the 'inflection point' is a thing vs marketing. I'm sure the models got somewhat better, but even now when I'm trying to 'vibe code' a game with the latest models (combination of Codex w/ gpt5.5 and gpt5.3-codex), they really do struggle.
They definitely get something barebones up and running, but it's far from a fully fledged application.
kvakkefly
11 days ago
I remember this very clearly myself. Before opus 4.5, I was doing a lot of hand holding and was coding a lot myself, but I have not written code since that day more or less.
I did write some stuff myself just to learn how the Enigma encryption machine worked, so I wrote that by hand to learn. But professionally, I stopped coding in November.
krzyk
11 days ago
It is sad. I like programming; if I couldn't do it and had to write text (which I do hate, I'm not a writer), it would make for quite a sad world.
bloppe
11 days ago
A pattern I've settled into is to write code but leave a TODO for every narrow thing I want the LLM to do for me. Then just tell the agent to fix the todos. It's often faster and easier to give "instructions" this way
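A minimal sketch of what that can look like, assuming Python. The `TODO(agent)` marker, the function names, and the `collect_todos` helper are all made up for illustration, not part of any tool:

```python
import re

# Hand-written skeleton: the structure is mine, each TODO is one narrow task for the agent.
SOURCE = '''
def parse_price(text):
    # TODO(agent): strip currency symbols and thousands separators, return a float
    raise NotImplementedError

def total(prices):
    return sum(parse_price(p) for p in prices)
'''

# Hypothetical marker convention; any consistent tag would do.
TODO_PATTERN = re.compile(r"#\s*TODO\(agent\):\s*(.+)")

def collect_todos(source):
    """Return (line_number, instruction) pairs for every agent TODO in the source."""
    todos = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            todos.append((lineno, match.group(1).strip()))
    return todos

for lineno, instruction in collect_todos(SOURCE):
    print(f"line {lineno}: {instruction}")
```

The listing step is optional; the point is that each instruction sits next to the code it applies to, so the prompt can just be "fix the TODOs".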
krzyk
9 days ago
I never thought about that, good idea.
yen223
11 days ago
Nothing stopping you from doing that in a post-LLM world
satvikpendem
11 days ago
Of course you can always program by hand, no one is stopping you.
junga
11 days ago
Not sure this is true for all of us. I bet many/some (unsure here) are told to use ai for their daily programming tasks.
chasd00
11 days ago
“A tool so good its use is mandatory” :)
I actually use Claude Code a lot; where it works, it works very well for me.
LtWorf
11 days ago
Plenty of companies are forcing people to use AI.
tonyedgecombe
11 days ago
In most cases you could work around that. For instance write the code yourself and make the AI write the tests. Or keep it busy writing superfluous documentation. Very few people are micromanaged to the extent that they can’t subvert the system.
AussieWog93
11 days ago
Exact same experience here. Prior to Opus 4.5 I'd sometimes use AI for some frontend webdev stuff (I am a C/C++/Python programmer; my HTML/CSS/JS knowledge is probably on par with a first-year uni student) and I'd have to manually edit things and retry, tell it not to attempt a paradigm that had failed before or cycle between models in Cursor just to try and get one that could make a simple widget that worked properly.
Now, I'm using Claude or Codex (GPT-5.5) for frontend and backend and it just gets it right first time more often than not. I've been making use of things like LSPs, Context7 and CLAUDE.md (global and per-repo) and it just stops doing the dumb LLM things that I hate.
viccis
11 days ago
How do you justify your salary given that you're just using a tool that any of us could use for $20 an hour in your role?
rafaelmn
11 days ago
How do you justify your salary given that you're just using an OSS compiler/editor any of us could use for free in your role?
AI just changed how I edit code - I still see coworkers (senior developers) failing with Claude/Codex and getting stuck when there are trivial solutions if you understand the full problem space. Right now AI is just a productivity tool.
manmal
11 days ago
Can you share how you use it to edit code? I‘ve seen a couple approaches, curious what you are doing:
1. Spec -> plan -> code (all agent driven, maybe with grill-me or ultraplan)
2. Handwritten spec -> agent driven plan -> agent driven code
3. Agent driven spec -> vibed code -> Fix by handholding until ok-ish
4. Vibed throwaway prototypes -> extract useful patterns -> rewrite with handholding
5. Generate file structure with handholding -> manual TODO comments -> Fill in blanks with handholding
rafaelmn
11 days ago
Usually I describe the problem, explore a bit with LLM iteratively. Then I switch to creating a plan when I have enough insight (and the LLM has it in context/same session as exploration), specifying all the things I'm trying to accomplish.
Then I just iterate with LLM - I let it start writing stuff in YOLO mode and check on what it's doing in the code steering it in the direction I want.
Usually the code LLM generates will work but is kind of garbage - but I can easily steer it towards better implementations.
Sometimes using an LLM is theoretically slower than hand-rolling - if I just sat down and focused I could outperform the iteration and the waiting, especially considering how stupid agents are at running expensive builds/test suites (with a bunch of explicit instructions in skills/claude/agents.md). But the practical improvement of going with an LLM is that you have a bunch of thinking traces saved as part of your iteration process - it's really easy to get back into flow. This is a huge productivity win for me given how many interruptions I have in my work day. As so many people like to point out, writing code ends up taking less and less of your time as you level up in your career.
manmal
11 days ago
Thank you for elaborating. Really interesting.
peepee1982
11 days ago
I don't feel the need to justify my salary, since I'm simply lucky in that regard. But I'm pretty sure you couldn't do my job just because you had access to a coding agent. Most of my time at the office is spent discussing high-level architecture and strategy, ideas, customer requests, backward compatibility, safety, security, quality assurance, etc.
Writing the actual code is a significant part of that, but the codebase is so complex that even Opus 4.7 and GPT-5.5 struggle with it without being fed a *lot* of context and constraints. And even then, they need a *lot* of steering due to making bad decisions that only someone with an intimate knowledge of the theory behind our software is able to catch.
I can only assume that people who think coding agents can completely replace an actual developer mostly deal with trivial software regarding both scope and the type of customers they serve (individuals instead of big companies in industry).
altmanaltman
11 days ago
They're using a tool that anyone can use for $20 an hour, sure. But that's not what they're "just" doing. This is what is so insane about non-technical people talking about code - writing the actual syntax is not really the hard part.
What you're saying is like "how do you justify your salary as a NASA engineer when anyone can use Simulink and generate the code?"
It is extremely ignorant.
musebox35
11 days ago
Please see Ben Evans’ podcast for a good take on this. Coding is just one of the task you do in your job, it is not the job or at least it probably is not. You do not get paid to code, you get paid to make a set of decisions that create value for the company. If this is automated then yes sadly your salary is not justified.
Timwi
11 days ago
> Coding is just one of the task[s] you do in your job
But it's by far the most fun part and the only reason to take such a job...
peepee1982
11 days ago
I agree, but the reality is that most people work to make a living, not to have fun. If you enjoy your job because you mostly get to write code in a tight feedback loop instead of doing the "hard" work of planning, writing and reviewing specs, balancing customer requirements, and the lot, you have a very privileged life. And those jobs are probably going to get fewer now.
It's kind of sad. But on the other hand, I am glad I don't have to write every little line of code myself *on top* of having to do all the other stuff.
musebox35
11 days ago
I totally agree. I loved coding because of its closed feedback loop. Since last November, I also delegated it mostly to agents. Now I concentrate more on the design part, which is not the same. However, you move with the times and hope something else will become exciting. I do not know a more worthwhile and satisfying way than computing to spend my work hours.
OakNinja
11 days ago
To me, LLM's free up time for me so that I can spend time on the fun parts of coding. Less boilerplate, more focus on the interesting problems. This is no different from using high level languages. The problem domain is less around memory management and garbage collection and closer to the problem you're actually trying to solve.
dawnerd
11 days ago
But we’ve had tools to automate out the boilerplate for years. We don’t need ai for that. It’s seriously like we all forgot we could run one command and scaffold a project. AI isn’t even that great at it. Last I tried a month ago it used a really out of date version of nextjs and picked all sorts of random deps that weren’t in the plan.
I could have just used the next project scaffold tool and been on my way before the ai even started returning output.
skydhash
11 days ago
Or copy-paste another file and edit the 10 lines that are actually different. The nice thing is when you have the epiphany that you’ve already done this twice and that it’s for the same purpose, so you abstract the code and remove 100 lines from the project.
mekael
11 days ago
You have no idea how many times I’ve asked “why are we not using the project generator” or “why did you write 200 lines to parse a csv? Here’s a library and five lines to get it done” in the last year. It’s easily up 20x compared to pre-AI, and getting worse.
stepbeek
11 days ago
I agree with this. I feel like there’s a false dichotomy right now in a lot of these discussions where one can only vibe code or only code by hand. It is possible to do both…
BOOSTERHIDROGEN
11 days ago
Which episode ?
musebox35
11 days ago
I watched the last one S5:E17 What jobs are AI jobs and I think it gives the right framing to think about this. It is not prescriptive, it does not give a list which is smart. The job title might be the same but the actual role might have different context so the best is to have the right frame to explore your particular situation.
pastel8739
11 days ago
How do you justify your salary given that you sit in a chair all day, likely making the world worse, and make 5x as much as someone saving lives, building houses, or teaching kids how to read?
IshKebab
11 days ago
Supply and demand. Not many people are good at programming and it's highly in demand.
The question is how many people will be good at vibe coding? If the answer is "lots" then we can definitely expect programming salaries to return to "normal" levels. His question is very relevant; you can't dismiss it as easily as that.
apsurd
11 days ago
it can be easily dismissed because "anyone can use the tool that costs $20" makes no meaningful sense.
this was always true; in fact $20 is more than the $0 it costs for notepad++
it's a flippant statement. Go down the line of any tool; its cost has basically nothing to do with the skill difference to operate it. See basically everything. There's levels.
IshKebab
11 days ago
I have no idea what you're trying to say. If anyone really can vibe code then programming salaries are pretty much guaranteed to come down. The critical question is whether it really is true that anyone can do it, or if it still requires rare skill.
apsurd
11 days ago
are you a programmer? it 100% requires skill. AI or not.
i'm trying to say there's levels to this. if you don't agree then you don't agree. but i can buy commodity tools for any skill and that doesn't make me professional grade at that skill.
IshKebab
11 days ago
Yes I am. The vibe coding I've tried didn't work very well so I agree that still required my skill. But I also don't have access to the latest models and supposedly they're a lot better (see this article for example!).
So is it possible for non-programmers to vibe code if they have the latest models? If not now, what about in a few years?
AI is clearly a different class of tool to something like a welder.
mianos
11 days ago
Never feed the trolls ... but, how does my carpenter deserve $100 an hour when he is using an electric drill and power saw I can get at Home Depot for $100?
Most good developers are not employed just because they can code well.
What is over is: fizzbuzz and trivial CS algorithm regurgitation as a gate.
aspenmartin
11 days ago
Someone competent using them is today a requirement and for awhile will make the marginal utility of skilled workers greater than that of unskilled. The justification is that they are much more productive than they were before.
yieldcrv
11 days ago
no engineers on staff and stakeholders think the company is incompetent
Coinbase is paying the price for that for every UX glitch, after the CEO was gleeful about HR personnel shipping production code
MikeNotThePope
11 days ago
You can build things quickly with AI, but you can’t delegate your responsibilities to AI. Once the AI starts struggling, you’ll need to take over and figure it out.
skor
11 days ago
This is _the_ question we must all be able to answer, so here goes my attempt - we all have access to the same tools; before stackoverflow it was forums, books/manuals, so it's always been about “getting there, showing up, figuring it out”. Your hypothetical boss has other things to do than kick an LLM around at that price.
piva00
11 days ago
I don't think you understand how programming as a job works, writing code is the final output of the process but it's not the job in itself.
komali2
11 days ago
There is no good justification for anyone's salary really, except perhaps doctors and underwater welders.
bsder
11 days ago
Because the tool will happily give you a "solution" that kinda works for a few inputs. It will happily correct itself when you give it more incorrect tests.
It will almost never converge on the general solution that will pass tests you haven't given it yet.
This is why AI is sooo good at Javascript and related slop. A solution that "kinda works" is good enough 9 times out of 10 and if some tests fail well ... YOLO and the web page will probably render anyway.
Contrast that to using Scheme or Lisp where AI will have trouble simply keeping the parentheses balanced.
FeepingCreature
11 days ago
To be fair, take away a human's paren highlighting and see how well they do.
dkersten
11 days ago
While I certainly like parentheses highlighting and rainbow parentheses, I've programmed Clojure without syntax highlighting and while it’s not as nice as it would be with, it’s fine.
I’ve also written C++ and Java in Notepad long ago. Not ideal, but hardly a problem.
sampullman
11 days ago
You adjust pretty quickly. Taking away compiler error messages would be fun though.
hansmayer
11 days ago
Not everyone is a "coder" you know, some of us are engineers.
NobleLie
10 days ago
With very different final results though..?
wilg
11 days ago
They don't need to justify it!
troupo
11 days ago
> Before opus 4.5, I was doing a lot of hand holding and was coding a lot myself, but I have not written code since that day more or less.
I still must hand hold it every day, as it always does things wrong. Especially after it got seriously nerfed in March.
Note: experiences vary a lot depending on the programming language used, and projects. And the experience of the person coding.
jackzhuo
9 days ago
Same experience here. I now think AI writes much better code than me. So I shifted my focus to finding requirements, analyzing possibilities, and making good plans.
bluegatty
11 days ago
Paradox - you can get multiple inflection points even as systems start to have diminishing marginal returns in core capability. I think this is due to 'threshold crossing', where something 'becomes good enough for a specific purpose' - it just unlocks capabilities.
'Nail guns' used to be heavy, required heavy power cords, and were extremely expensive. When they got lighter, cheaper, battery-powered ... at some point, they blended seamlessly into the roofer's process and dramatically multiplied the work that can be done. Marginal improvements beyond that may not yield the same 'unlocks' because the threshold has been crossed.
asdff
11 days ago
Nitpick but commercial roofers prefer pneumatic over battery.
smackeyacky
11 days ago
This is a great analogy. Jan/Feb this year was when the models crossed from useful to essential.
magicalhippo
11 days ago
I've "vibed" some non-trivial stuff lately using a combination of Codex with 5.5 and Claude Code with Opus 4.7.
Key has been to spend a fair amount of time on initial overall design document, which is split into tangible and limited phases. I go back and forth between them on this document until we're all happy.
For each phase an implementation plan is made. At the end, a summary document of what was delivered and what was discovered. This becomes input to next phase.
I do check the documents, and what they're doing. I also check the tests, some more thorough. And some spot checks on the code to see if I like the structure.
I have mainly used Claude for coding and Codex for design and code review after phases. I ask both to check test coverage after phases.
Managed to implement some tools and libraries without writing a single line of code this way, which have been very beneficial to us.
Since it's so async I can work on other stuff while they plod along.
I think it's not universal though. But stuff that can be tested easily and which you have a firm grasp of what you want to achieve, but not necessarily exactly how, that I've been impressed with.
WesolyKubeczek
11 days ago
> Key has been to spend a fair amount of time on initial overall design document, which is split into tangible and limited phases.
> For each phase an implementation plan is made. At the end, a summary document of what was delivered and what was discovered.
> I do check the documents, and what they're doing. I also check the tests, some more thorough.
Sounds like programming, but with extra steps.
magicalhippo
11 days ago
It's software development, but with much less actual programming (in my case none).
When I said I check the documents, the initial design document was the only one I really took a hard look at. The intermediary ones I just skimmed, looking for red flags or something I had forgotten to tell them. Those documents served as a basis for their work, and as a record of what was done.
Overall I spent perhaps a few hours on each project, over the course of a few days. I'd check in every half hour or whenever I had time, tell Claude "Great, let's do the next deliverable", or GPT "We're done with phase 4, please do a detailed code review, reference the design document and documentation of previous phases". Then I'd leave them cooking.
dawnerd
11 days ago
Also the least fun part of development. Maybe I’m the weird one but I like to just jump right in, planning every last detail before writing code is boring.
magicalhippo
11 days ago
For me, the fun in programming is sometimes to actually write code, solving a problem in a specific way or try some new approach. Other times the fun is to create something that works, and the code is more a means to an end.
The first case I'll probably still do by hand, like handmade vases, even though factory-made ones are cheap and readily available.
For the second case I think these newfangled tools have made it even more fun, since writing lots of boiler plate, repetitive event handles and whatnot is not my idea of fun.
skydhash
11 days ago
> I think these newfangled tools have made it even more fun, since writing lots of boiler plate, repetitive event handles and whatnot is not my idea of fun
That’s what code generators, snippets plugins, macros, and the old copy-paste are here for. I wonder if you were using notepad to code. Because even nano had macros.
magicalhippo
11 days ago
Those tools only get you so far, especially if you write something novel to you. Using a new framework or programming language say.
Sometimes using a new framework or programming language is the fun part.
But sometimes it's just the best way of solving a problem incidental to the fun part.
One of the two projects I vibed included a web frontend. I didn't touch a single line of HTML, CSS or JavaScript of the frontend. And I didn't touch the API on the backend. I'm not a web dev, so this isn't something I've got snippets for or whatever, and in this case wasn't the interesting part.
The interesting part for me in that case was making a tool that could help us, not the details how exactly how that was done.
QuercusMax
10 days ago
For someone who did a lot of webdev 20 years ago and hasn't done much in the last decade, and does mostly backend development now, being able to vibe code up a quick web or text-mode UI is killer. It might look like crap and not be very maintainable, but who cares - it's a temporary dashboard we'll throw away next quarter when we're done with our migration.
skydhash
11 days ago
> The interesting part for me in that case was making a tool that could help us, not the details how exactly how that was done.
And I wouldn’t argue about the economics of getting an MVP out. But with software, you often have one happy path and myriad ways of getting into an incoherent state (and crashing early would be a boon in this case) and/or returning the wrong response. When you care about failure, you also care that your code is semantically right. The devil is very much in the details, especially if you have N>1 users.
Getting things done, for me, includes high confidence that the code will do the right thing. And that means reviewing each line and checking the semantics (only when it’s a few lines of code) or building a test harness and making sure I handle contracts and invariants.
Snippets, code generators, and copy-paste give me samples that I can trust, although I may need to edit them. But an LLM doesn’t. And I’m doubly doubtful when it’s something I’m not familiar with.
mrcsharp
11 days ago
> planning every last detail before writing code is boring
Not only that but you can't really plan everything. It is impossible. Without LLMs, with every line of code you are making a decision or discovering something new that must be dealt with or realizing how the current thing might impact something else and so on.
There is no way for a programmer to consider all of these little things ahead of time and if an attempt is made, it will take as long as actually writing that code.
magicalhippo
11 days ago
> Without LLMs, with every line of code you are making a decision or discovering something new that must be dealt with or realizing how the current thing might impact something else and so on.
Part of this is true, but the agents catch at least a non-trivial portion of it. If you prompt them to do a review, especially with a specific angle like ensuring sustained write performance, or how it will work when the future extensions are implemented, they do often catch a lot of issues.
I agree you lose a fair bit of the sense of "it feels like I'm doing something wrong", or "this doesn't seem optimal" etc. I think the skill in using these tools is to determine when you need that control and where it doesn't really matter.
manmal
11 days ago
That’s not vibing, but waterfall development.
whatshisface
11 days ago
Waterfall was famous for wasting developer time and extending delivery dates in exchange for simplifying management. If Claude time is comparatively inexpensive, but human oversight remains necessary, we will switch back to waterfall because the relative importance of the two resources will invert.
magicalhippo
11 days ago
It's vibing in the sense that I'm not really writing code, and I'm leaving a lot of decisions to the models. I let them drive a lot of the design document details, I just made sure it contained the salient points. Implementation plans I just skimmed. Didn't write any code, just did some checks here and there.
But yes, I did think that it sorta felt like being a team lead for some eager programmers.
nopurpose
11 days ago
Do you use anything to orchestrate multiple agents pitted against each other (coder, reviewer, tester, etc)?
magicalhippo
11 days ago
Currently just manual. I'm not pushing the frontier here, just getting my feet wet.
While both Claude Code and Codex are capable harnesses, I definitely think there's a lot more to be gained from the harnesses. Quite a few of the times I needed to nudge the steering wheel it was things that a separate agent with the right prompt could have picked up on.
nothinkjustai
11 days ago
None of it is non-trivial tho. You might think so, but it’s not.
magicalhippo
11 days ago
It wasn't trivial in that I used a lot of my programming and domain knowledge, both when iterating on the design document and skimming implementation plans.
I didn't use it often, but when it was needed it was needed.
ryanjshaw
11 days ago
I find it gets you past the starting line but when you dig into the code it’s a mess of duplicated code, muddled responsibilities, poor architecture, 10k line files that eat your tokens, etc.
I’m building something using LLMs to scrape websites/socials for unstructured event data from combined text/images and the only way I’ve managed to get 100% consistent results for a reasonable cost is to break the task down into very small pieces that reduce the scope of mistakes significantly.
At present, for reasonable complex tasks, Codex/Claude will happily code you into an expensive corner.
ben_w
11 days ago
Indeed. To add to this, the obvious solution (ask the AI to break down the tasks to whatever METR says they'd be capable of 80% of the time) is of limited utility, as the AI are only so-so at estimating task complexity.
(Even when they're getting the planning part right, I do also recommend checking the LLM-generated unit tests, because in my experience some of those are "regex the source code" not "execute functions and check outputs").
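To make the difference concrete, here is a small sketch in Python. The `apply_discount` function and its bug are invented for illustration; the implementation is kept as a string only so both test styles can be shown against the same code:

```python
import re

# Hypothetical buggy implementation: it adds the discount instead of subtracting it.
BUGGY_SOURCE = '''
def apply_discount(price, percent):
    return price * (1 + percent / 100)
'''

# Load the function so the behavioral test can actually call it.
namespace = {}
exec(BUGGY_SOURCE, namespace)
apply_discount = namespace["apply_discount"]

def source_regex_test():
    """Weak style: only checks that the source text contains the expected signature."""
    return re.search(r"def apply_discount\(price, percent\)", BUGGY_SOURCE) is not None

def behavior_test():
    """Stronger style: executes the function and checks the output."""
    return abs(apply_discount(200.0, 25) - 150.0) < 1e-9

print("source regex test passes:", source_regex_test())
print("behavior test passes:", behavior_test())
```

The regex-style check stays green on the broken function, while the check that runs the code fails, which is why the first kind is worth hunting for in generated test suites.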
minimaxir
11 days ago
Opus 4.5 in November 2025 was legitimately, unironically an inflection point and is the sole reason for the current hysteria.
GPT 5.5 is a significant improvement over GPT 5.4 but I wouldn't call it an inflection.
baq
11 days ago
5.2 and the first codex model were step function changes in capability
halflife
11 days ago
I feel the change. It went from an autocomplete tool, to an agent running 5 tasks in parallel while I just supervise. The improvement is enormous.
orrito
11 days ago
While some people got it to work better, for me vibe coding games still hasn't reached the level of regular sites/web apps. Physics, creativity, assets and UI/UX still need a lot of handholding with the models. Games that are more interface based, like point and click or something like Reigns, are easier though.
adgjlsfhk1
11 days ago
It's very real. Just in the past 2 months or so IMO there's been a pretty big improvement in claude for local dev (although I think a lot of that is less model strength and more harness capability). 1m context is a huge difference (~30 min vs 2.5hr between compact significantly increases the scope of what I get the AI to do before it goes stupid). The other biggest difference I've noticed is a better balance of actually doing the work vs pushing back on bad ideas. I want the AI to tell me if it thinks the thing I am telling it is wrong or a bad idea, but if I confirm, I want it to do that anyway. A couple months ago, Claude was a lot more likely to either say "This is too much work I'm not going to do all of it", tell me the idea was genius (and then pretend to do it) or something equally useless.
DeathArrow
11 days ago
>1m context is a huge difference (~30 min vs 2.5hr between compact significantly increases the scope of what I get the AI to do before it goes stupid)
I think the smart zone stays within the first 100k tokens, no matter if the context window is 240k or 1 million.
I divide the work to fit within that 100k and use subagents for the tasks.
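A rough sketch of that splitting, assuming Python and assuming you already have a token estimate per task. The task names, the estimates, and the `pack_tasks` helper are hypothetical; only the 100k figure comes from the above:

```python
def pack_tasks(tasks, budget=100_000):
    """Greedily group (name, estimated_tokens) tasks into batches under a token budget."""
    batches, current, used = [], [], 0
    for name, tokens in tasks:
        if tokens > budget:
            raise ValueError(f"{name} alone exceeds the budget; split it further")
        if used + tokens > budget:
            # Current batch is full: close it and start a fresh one.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        batches.append(current)
    return batches

# Hypothetical tasks with made-up token estimates.
tasks = [("schema", 40_000), ("api", 50_000), ("frontend", 30_000), ("tests", 60_000)]
print(pack_tasks(tasks))
```

Each batch would then go to its own subagent with a fresh context. The greedy grouping keeps tasks in order, which matters when later tasks depend on earlier ones; it does not try to find the tightest possible packing.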
danielbln
11 days ago
In my experience it's more like 400-500k tokens.
ReptileMan
11 days ago
Anecdata of 1 but it is real. At the end of last year they passed some invisible threshold and became useful. I don't think it is models themselves, but mostly the much more powerful harnesses and I guess their tool calling abilities.
What changed, I think, was the context harvesting capability of the models. For most programmers, debugging and figuring out how something works were the time-consuming part - the fix was usually trivial. And now models can do in seconds what took a developer an hour or more.
If right now we created a smart grep that just gathers everything relevant to a piece of code, and outlawed LLMs, we would not regress to the previous level. Developers needed this context as much as LLMs do to do their job.
iLoveOncall
11 days ago
It is all marketing. The easiest way to tell is that a year ago the same people said the inflection point was X or Y model.
When people claim LLMs just don't work for them, the first question is whether they're using the latest model or not, and if not, dismissing the poster.
The thing is that that same question was being asked a year ago, and even a year before that, but with the models that lead to a dismissal today.
Just make the experiment yourself, wait 6 months, say LLMs just aren't working for the software engineering that you do, and people will dismiss you if you say that you use Opus 4.5 and not the latest model Claude MegaMind 8.8 pro max gigathinking. Despite this model being touted as the inflection point in this article.
harshitaneja
11 days ago
I think it's because both sides are talking about different things. If you go in expecting it to be good enough to make developers obsolete today (a reasonable impression to get from the way a lot of people hype it), you would be disappointed, and after the first couple of tries every few months you would probably not try it much with the next generations. Reasonable if it's considered a dichotomy.
But a lot of people excited about new generations (including me, now) are not seeing it as a dichotomy but rather a spectrum where models are getting better, and indeed once a year, or even every 6 months at times, there comes a sudden growth which feels like an inflection point from what came before. Practically, it's a tool like any other: you evaluate it based on whether it's worth the effort and cost for the benefit you get from it, and if it is and it has a good DX, you use it. If the calculation doesn't work for you, it doesn't. For me, it has gone from a novelty, to good for some kind of quick manual search, to "I guess it can debug some kinds of errors at times in very specific conditions", to "hey, I think I am getting a bit addicted to the autocomplete they provide in the IDE; even if I don't use them for anything intelligent, this part is becoming indispensable", to "it's good for areas I lack expertise in", to "agentic sucks, I will stick with discussing algorithms and architecture with it on greenfield projects", to "holy shit, it can do agentic decently well now, but I am skeptical of giving it access in more than limited cases", to "I am getting close to letting it run free on my device in the not so distant future, I guess". Some of these were big jumps, and at each point I was skeptical of growth. Every time I thought the growth would slow down now, from the days of a 2k context window to millions now. From basic chat completion to working on complex adaptive systems, game-theoretic modelling, heuristics and constraint modelling and other things I throw at it. I am still needed in the loop; it can be so smart at times and then will do something so stupid, but the frequency of stupidity is rapidly decreasing. I am still needed; I don't think it could accomplish alone all that it has done for me. But at times I do remain awake at night reflecting on my self-worth for the potential day when I don't add that value. When I have a harder time keeping up.
Also, had someone told me even in 2019 that in 2026 we could have NLP models do what they do today, I would have dismissed it all as sci-fi, and here I am waking up in awe of the world we live in and how quickly we adapt.
iLoveOncall
11 days ago
You're completely twisting what I said. I've never talked about people claiming it's not making developers obsolete. We are obviously extremely far from that. I'm talking about people who say it doesn't work to build basic features in their projects correctly.
Just take a look at this comment on a different topic, which lists all the prerequisites for those AI models to work well, from the perspective of someone who has bought into the hype: https://news.ycombinator.com/item?id=48157235
If this is everything needed for an LLM to generate acceptable code, what is even the point of them?
harshitaneja
11 days ago
Maybe we come from different cultures, and context is harder to grasp in text alone, so maybe for those reasons your response feels ruder than I hope it was intended to be.
I am sorry for not being clear in my response, but I didn't intend to twist your words, and I am not sure where I did so. My response was intended as a more general remark on the kind of discourse I see on this topic: I think both sides are right from the context they are looking at it from, which is also why I think both sides come out of this discussion exhausted with each other. I'm not discounting the presence of bad actors, but generally I think most people are engaging in good faith, as you probably are.
Coming specifically to your last response, I don't think one needs all of these prerequisites to get value out of LLMs. In fact, LLMs have helped me untangle some very messy balls of mud on projects where we had previously deemed it not worth the effort and basically carried the codebases as legacy. Now we can write enough tests to feel confident and do a port against those tests, all in the span of a few days, which we found impressive.
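A minimal sketch of what "write enough tests, then port against them" can look like, assuming a golden-master (characterization test) approach. The functions `legacy_total` and `ported_total` and the input cases are hypothetical stand-ins for illustration, not code from any real project:

```python
# Hypothetical stand-ins: `legacy_total` plays the tangled old code,
# `ported_total` plays the rewrite. Neither comes from a real codebase.

def legacy_total(prices, discount_code):
    total = 0
    for p in prices:
        total = total + p
    if discount_code == "TEN":
        total = total - total * 10 // 100
    if total < 0:
        total = 0
    return total


def ported_total(prices, discount_code):
    subtotal = sum(prices)
    discount = subtotal * 10 // 100 if discount_code == "TEN" else 0
    return max(subtotal - discount, 0)


def record_golden(func, cases):
    """Pin down current behaviour: run the legacy code and store its outputs."""
    return [(args, func(*args)) for args in cases]


def check_against_golden(func, golden):
    """Return the cases where the port disagrees with the recorded behaviour."""
    return [(args, expected, func(*args))
            for args, expected in golden
            if func(*args) != expected]


# Illustrative inputs only; a real suite would cover far more cases.
CASES = [([100, 250], None), ([100, 250], "TEN"), ([], "TEN"), ([5], "OTHER")]
golden = record_golden(legacy_total, CASES)
mismatches = check_against_golden(ported_total, golden)
```

The idea is that the recorded outputs describe what the legacy code actually does, quirks included, so an empty `mismatches` list is evidence that the port preserved behaviour on those inputs, not proof that either version is correct.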
Now having said all this, I think I understand your perspective a bit better on your original comment.
While it's a very versatile hammer, if it doesn't work for your use case, that's all fine. I just think that a bit more patience in honing it could maybe help you find areas where it could work for you. If not, cheers!
vikramkr
10 days ago
That's a list of like 6 things. And each of those is a less complicated question than the seven thousand questions people throw at you when you complain about something not working right on a Linux distro, or about speeding up build times for a new tool, or configuring webpack, or pretty much any software tool. What lint rules are you using? Are you using poetry or uv? Are you running on Mac, Windows, Linux or WSL? How are your security groups configured in AWS? Some tools are more plug and play, but it's quite the stretch to say that asking "how is your code organized, do you have your agents.md config file set up, do you have tests, and how large is the codebase" is some sort of unmanageable list of questions for a software engineer to think through when figuring out wtf is going on with some new tooling they're using.
vikramkr
10 days ago
My take is there was one big inflection point around opus 4.5 when they got the agentic stuff working and now whether or not it works depends on whether your use case/area of software engineering is profitable enough for the companies to have spent a bunch of money generating synthetic data to RL on, or if it's similar enough to areas that they've done that for. With similar enough being a very loose constraint given how much overlap there is in a lot of coding fundamentals. Tbh if the models aren't working for you now I don't think they're gonna be working for you in 6 months
vikramkr
10 days ago
It's very real but probably very domain specific. It got really good at a lot of traditional web dev stuff, bash, sql, and writing one off scripts to accomplish random tasks (hence all the agent stuff taking off). And they got good at staying on task. That may not translate to game dev because from what I understand a lot of these gains are basically around post training methods driven by synthetic data generation etc (with potential caveats on how synthetic that data actually is lol). I wouldn't be surprised if the areas of code the llms are good at now are straight up just product decisions of where to allocate budget for generating those synthetic data sets, and game dev stuff might not be at the top of the list because the customer base for that might not be as big
sofixa
11 days ago
Counterpoint, I'm also vibecoding a game, and even before doing the "proper" setup (a good AGENTS.md, skills people have published for my chosen game engine, Godot), mechanically, the game was pretty spot on. It looked boring, so I used Claude Design to create a few mockups to choose from, chose the one I liked the most, and told Claude Code to redo the game UI with it.
There have been plenty of small issues like tables not having the columns aligned, or the game menu being a bit offset, or one graph being a placeholder instead of connected to the actual value. And of course I've had to instruct it on all the flavour I want.
But honestly, for a simulation strategy game, especially without doing the "proper" setup from the start, it's been _very_ good.
QuercusMax
10 days ago
UI fit and finish is really hard for these models, even with text-mode UIs. The super fiddly stuff still needs to be done by hand, at least for now.
DeathArrow
11 days ago
Pure vibe coding won't work. You need to define an excellent architecture, have great specs and a solid plan, divide the plan into small phases that fit well in a context window, use TDD and automated code reviews for implementing each phase, and do QA and some code review.
At every point you need to have agents review, verify and test the other agents' output and iterate until the output is perfect.
And also, have good e2e tests.
IMO, if you don't spend at least a few tens of millions of tokens per day, you aren't doing it properly.
fluder_tw
11 days ago
Sounds very self-confident to claim such a thing. Something like "If you don't do it how I do it, then you are doing it wrong".
ssdspoimdsjvv
11 days ago
At what point is it easier and faster to just code it yourself? I don't trust myself to write better specs than code.
xbmcuser
11 days ago
It's real for me as a non-coder. Previously, uploading a Python script and asking it to add this function or that function used to break it; now it usually just works, at least with Claude and ChatGPT models. Google Gemini still breaks stuff, but rumors are that their new flash model, which will be announced soon, is very good. I am usually working with data in CSV files and generating spreadsheets, PDFs etc., and the results for that have improved dramatically.
Scoundreller
11 days ago
That’s me. Built a scraper to dump stuff to a CSV listing images for further OCR and OpenCV processing. Now I have a convenient list of hits once I run the batch, which used to be a loooot of manual sifting.
Once I work out the kinks, I’ll be able to further automate it.
Would have taken 10-100x as long for me to build it without AI and the AI version is probably better.
But yeah, I have enough knowledge to know what prompts are needed, to figure out those “oh, I think it’s running slow or failing because of xyz” moments, and to prompt further to improve it based on what I think it should do instead.
And I know where to make slight changes without burning my allotments.
LAC-Tech
11 days ago
"flash" or "fast" AI models are worse than useless at coding for me. they make my codebase much worse. It's a maintenance burden.
Gemini Pro on the other hand can be quite a pleasant experience.
righthand
11 days ago
I mean this blog post and many from this author are pure evangelism and marketing. Can you find anything critical or any dissent from this author about LLMs?