wg0
2 days ago
I write detailed specs. Multifile with example code. In markdown.
Then hand over to Claude Sonnet.
With hard requirements listed, I found that the generated code missed requirements, duplicated code, and even added unnecessary data wrangling (mapping objects into new objects of narrower types that were never needed), along with tests that faked results and worked around failures just to pass.
So it turns out I'm not writing code; I'm reading lots of it.
What I knew first-hand even before Gen AI is that writing code is the easy part. Reading code, understanding it, and building a mental model of it is far more labour-intensive.
Therefore I need more time and effort with Gen AI than I needed before, because I have to read a lot of code, understand it, and ensure it adheres to the mental model I have.
Hence Gen AI, at the price point Anthropic offers, is a net negative for me. I am not vibe coding; I'm building real software that real humans depend upon, and my users deserve better attention and focus from me. So I'll be cancelling my subscription shortly.
gwerbin
2 days ago
Or just don't use AI to write code. Use it as a code reviewer assistant along with your usual test-lint development cycle. Use it to help evaluate 3rd party libraries faster. Use it to research new topics. Use it to help draft RFCs and design documents. Use it as a chat buddy when working on hard problems.
I think the AI companies all stink to high heaven and the whole thing being built on copyright infringement still makes me squirm. But the latest models are stupidly smart in some cases. It's starting to feel like I really do have a sci-fi AI assistant that I can just reach for whenever I need it, either to support hard thinking or to speed up or entirely avoid drudgery and toil.
You don't have to buy into the stupid vibecoding hype to get productivity value out of the technology.
You of course don't have to use it at all. And you don't owe your money to any particular company. Heck for non-code tasks the local-capable models are great. But you can't just look at vibecoding and dismiss the entire category of technology.
onlyrealcuzzo
2 days ago
> Or just don't use AI to write code.
Anecdata, but I'm still finding CC to be absolutely outstanding at writing code.
It regularly writes, in hours and with minimal babysitting, systems-level code that would take me months to write by hand - with basically no "specs", just coherent, sane direction: make sure it tests things in several different ways, for several different cases, including performance; compare directly to similar implementations; and constantly triple-check that it actually did what you asked after it says "done".
For $200/mo, I can still run 2-3 clients almost 24/7 pumping out features. I rarely clear my session. I haven't noticed quality declines.
Though, I will say, one random day - I'm not sure if it was dumb luck or if I was in a test group - CC was literally doing 10x the amount of work / speed that it typically does. I guess strange things are bound to happen if you use it enough?
Related anecdata: IME, there has been a MASSIVE decline in the quality of claude.ai (the chatbot interface). It is so different recently. It feels like a wanna-be, crappier version of ChatGPT, instead of what it used to be: something that tried to be factual and useful rather than conversational, addictive, and sycophantic.
mlinsey
2 days ago
My anecdata is that it heavily depends on how much of the relevant code and instructions it can fit in the context window.
A small app, or a task that touches one clear smaller subsection of a larger codebase, or a refactor that applies the same pattern independently to many different spots in a large codebase - the coding agents do extremely well, better than the median engineer I think.
Basically "do something really hard on this one section of code, whose contract for how it interacts with other code is clear, documented, and respected" is an ideal case for these tools.
As soon as the codebase is large and there are gotchas, edge cases where one area of the code affects the other, or old requirements - things get treacherous. It will forget something was implemented somewhere else and write a duplicate version, it will hallucinate what the API shapes are, it will assume how a data field is used downstream based on its name and write something incorrect.
IMO you can still work around this and move net-faster, especially with good test coverage, but you certainly have to pay attention. Larger codebases also work better when you started them with CC from the beginning, because its older code is more likely to actually work the way it expects/hallucinates.
onlyrealcuzzo
2 days ago
> My anecdata is that it heavily depends on how much of the relevant code and instructions it can fit in the context window.
Agreed, but I'm working on something >100k lines of code total (a new language and a runtime).
It helps when you can implement new things as if they're green-field-ish AND THEN integrate and plumb them in later.
antonvs
2 days ago
In a well-designed system, you can point an agent at a module of that system and it's perfectly capable of dealing with it. Humans also have a limited context window, and divide and conquer is always how we've dealt with it. The same approach works for agents.
janalsncm
2 days ago
How can a person reconcile this comment with the one at the root of this thread? One person says Claude struggles to even meet the strict requirements of a spec sheet, another says Claude is doing a great job and doesn’t even need specific specs?
I have my own anecdata but my comment is more about the dissonance here.
oefrha
2 days ago
One aspect you have to consider is the differences in the human beings doing the evaluation. I had a coworker/report who would hand me obvious garbage-tier code with glaring issues even in its output, and it would take multiple iterations to address very specific review comments (once, in frustration, I showed a snippet of their output to my nontechnical mom, and even my mom wtf'ed and pointed out the problem unprompted); I'm sure all the AI-generated code I painstakingly spec, review and fix is totally amazing to them and needs very little human input. I'm not saying it must be the case here, that was extreme, but it's a very likely factor.
rhubarbtree
a day ago
This is plausible. Assuming it’s true, we would see the adoption of vibe coding at a faster rate amongst inexperienced developers. I think that’s true.
A counterpoint is Google saying the vast majority of their code is written by AI. The developers at Google are not inexperienced. They build complex critical systems.
But it still feels odd to me, this contradiction. Yes there’s some skill to using AI but that doesn’t feel enough to explain the gap in perception. Your point would really explain it wonderfully well, but it’s contradicted by pronouncements by major companies.
One thing I would add is that code quality is absolutely tanking. PG mentioned YC companies adopted AI generated code at Google levels years ago. Yesterday I was using the software of one such company and it has “Claude code” levels of bugginess. I see it in a bunch of startups. One of the tells is they seem to experience regressions, which is bizarre. I guess that indicates bugs with their AI generated tests.
SpaceNoodled
a day ago
You don't think Sundar would do that, just go on the Internet and tell lies?
tclancy
2 days ago
This is magical because you are both on the exact right path and not right. My theory is that there's a sort of skill to teasing code from AI (or maybe not, and it's alchemy all over again), and this is all new enough, and we so lack a common vocabulary for it, that it's hard for one person who is having a good experience and one person who is not to meaningfully sort out what they are doing differently.
Alternatively, it could be that there's a large swath of people out there so stupid they are proud of code your mom, despite being nontechnical, can somehow review and suggest improvements to.
DennisP
2 days ago
I just read Steve Yegge's book Vibe Coding, and he says learning to use AI effectively is a skill of its own, and takes about a year of solid work to get good at it. It will sometimes do a good job and other times make a mess, and he has a lot of tips on how to get good results, but also says a lot of it is just experience and getting a good feel for when it's about to go haywire.
stevenicr
11 hours ago
I think it matters what the project and tech stack is, and how much you try to get done before starting a fresh chat.
I've had interesting chats where it explained that its choice of Tailwind, for example, was because it had a ton of training knowledge on it.
I've also had it try to build more in one chat than it should many times.
For some reason OpenAI Codex is better at building too much in one go without failing - but that is total anecdata from my particular projects, and YMMV.
I've had these things try to build big when a little nudge gets them to change direction and not build so much. Explaining which libraries to use, asking it to change the tech stack, and spelling out the steps to build at once all seem to make things much better for my use cases.
Also, running extra checks and cleanup later is a thing: sure, a human might have seen an obvious issue at build time, but we have a bigger memory context comparatively, IMHO.
sarchertech
2 days ago
One person is rigorously checking to see if Claude is actually following the spec and one person isn’t?
hunterpayne
2 days ago
One is getting paid by a marketing department program and the other isn't. Remember how much has been spent making LLMs and they have now decided that coding is its money maker. I expect any negative comment on LLM coding to be replied to by at least 2 different puppets or bots.
riquito
2 days ago
Then you should expect any positive comment to be replied to negatively by a competitor's puppet or bot too
SpaceNoodled
a day ago
Not necessarily; rising tide and all that. When a new scam like this emerges, it behooves all of the grifters to cooperate and not muddy the waters with distrust.
sarchertech
a day ago
I’m normally very skeptical of conspiracy theories. But I saw an AI booster bot responding to a negative AI post I made here.
Someone pointed out to me in the comments that the username had posted long replies to 3 completely different threads in the same minute. That and looking back at its post history confirmed it was a bot.
flyinglizard
2 days ago
... or one person has a very strong mental model of what he expects to do, but the LLM has other ideas. FWIW I'm very happy with CC and Opus, but I don't treat it as a subordinate but as a peer; I leave it enough room to express what it thinks is best and guide later as needed. This may not work for all cases.
sarchertech
2 days ago
If you don’t have a very strong mental model for what you are working on, Claude can very easily guide you into building the wrong thing.
For example I’m working on a huge data migration right now. The data has to be migrated correctly. If there are any issues I want to fail fast and loud.
Claude hates that philosophy. No matter how many different ways I add my reasons and instructions to stop it to the context, it will constantly push me towards removing crashes and replacing them with “graceful error handling”.
If I didn’t have a strong idea about what I wanted, I would have let it talk me into building the wrong thing.
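The fail-fast style described above can be sketched in Go roughly like this (the function and field names are hypothetical, not from the actual migration; a real version would sit inside the Kafka consumer loop):

```go
package main

import (
	"errors"
	"fmt"
)

// migrateRecord maps one source record into the target schema.
// A missing required field is an error, not something to patch over.
func migrateRecord(raw map[string]string) (map[string]string, error) {
	id, ok := raw["id"]
	if !ok {
		return nil, errors.New("record missing required field: id")
	}
	return map[string]string{"user_id": id}, nil
}

// migrateAll stops at the first bad record and reports which one,
// rather than skipping it and continuing with partial data
// ("graceful error handling").
func migrateAll(records []map[string]string) ([]map[string]string, error) {
	out := make([]map[string]string, 0, len(records))
	for i, r := range records {
		m, err := migrateRecord(r)
		if err != nil {
			return nil, fmt.Errorf("migration aborted at record %d: %w", i, err)
		}
		out = append(out, m)
	}
	return out, nil
}

func main() {
	records := []map[string]string{
		{"id": "42"},
		{"name": "no-id"}, // malformed on purpose: triggers the abort
	}
	if _, err := migrateAll(records); err != nil {
		// Loud failure with enough context to find the bad record.
		fmt.Println(err)
	}
}
```

The "graceful" variant an assistant tends to suggest would log the error and `continue` the loop, which is exactly what silently corrupts a migration.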
Claude has no taste and its opinions are mostly those of the most prolific bloggers. Treating Claude like a peer is a terrible idea unless you are very inexperienced. And even then I don’t know if that’s a good idea.
timr
2 days ago
> Claude has no taste and its opinions are mostly those of the most prolific bloggers.
I often think that LLMs are like a reddit that can talk. The more I use them, the more I find this impression to be true - they have encyclopedic knowledge at a superficial level, the approximate judgement and maturity of a teenager, and the short-term memory of a parakeet. If I ask for something, I get the statistical average opinion of a bunch of goons, unconstrained by context or common sense or taste.
That’s amazing and incredible, and probably more knowledgeable than the median person, but would you outsource your thinking to reddit? If not, then why would you do it with an LLM?
prmph
a day ago
> they have encyclopedic knowledge at a superficial level, the approximate judgement and maturity of a teenager, and the short-term memory of a parakeet. If I ask for something, I get the statistical average opinion of a bunch of goons, unconstrained by context or common sense or taste.
Love this paragraph; it's exactly how I feel about the LLMs. Unless you really know what you are doing, they will produce very sub-optimal code, architecturally speaking. I feel like a strong acumen for proper software architecture is one of the main things that defines the most competent engineers, along with naming things properly. LLMs are a long, long way from having architectural taste
flyinglizard
a day ago
Try asking it to review your code as if it were Linus Torvalds. No, really.
sarchertech
a day ago
I’ve tried that. I’ve experimented with a whole council of 13 personas, including many famous developers. It’s definitely different, but it hasn’t performed significantly better in my tests.
datavirtue
a day ago
Holding it wrong.
oops
2 days ago
That’s interesting to hear as for me Claude has been quite good about writing code that fails fast and loud and has specifically called it out more than once. It has also called out code that does not fail early in reviews.
sarchertech
a day ago
If you add a single space to a prompt, you’ll get a completely different output, so it’s no surprise that feeding entirely different programs into the prompt produces radically different output.
My guess is that there must be something about the language(go) or the domain (a data migration tool that uses Kafka) that triggers this.
aforwardslash
2 days ago
Have you created a plan where the requirement is not to bother you with x and y, and to use some predetermined approach? What you describe sometimes happens to me, but it happens less when it's part of the spec.
sarchertech
a day ago
Yes. That’s one of the things included in this.
> No matter how many different ways I add my reasons and instructions to stop it to the context
justinclift
2 days ago
> it will constantly push me towards removing crashes and replacing them with “graceful error handling”.
Is it generating JS code for that?
sarchertech
a day ago
No, this is a Kafka consumer written in Go.
flyinglizard
a day ago
You're right, data migration is a specific case where you have a very strong set of constraints.
I, on the other hand, am doing a new UI for an existing system, which is exactly where you want more freedom and experimentation. It's great for that!
mojuba
2 days ago
I think it depends on both the complexity and the quality bars set by the engineer.
From my observations, generally AI-generated code is average quality.
Even with average quality it can save you a lot of time on some narrowly specialized tasks that would otherwise take you a lot of research and understanding. For example, you can code some deep DSP thingie (say audio) without understanding much what it does and how.
For simpler things like backend or frontend code that doesn't require any special knowledge other than basic backend or frontend - this is where the bars of quality come into play. Some people will be more than happy with AI generated code, others won't be, depending on their experience, also requirements (speed of shipping vs. quality, which almost always resolves to speed) etc.
sameerds
2 days ago
It could just be that each of the two reviewers is merely focussing on different sides of the same coin? I use Claude all the time. It saves me a lot of effort that I would have otherwise spent looking up specific components. The magically autocompleted pieces of boilerplate are a tangible relief. It also catches issues that I missed. But when it is wrong, it can be subtly or embarrassingly or spectacularly wrong depending on the situation.
justinclift
2 days ago
Note that one person is mentioning they use Claude Sonnet, which is less capable than the higher tiers (Opus, etc).
aforwardslash
2 days ago
It boils down to scope. I use CC in both very specific one-language systems and broad backend-frontend-db-cache systems. You can guess where the difficulty lies. (Hint: it's the stuff with at least 3 distinct languages.)
ghurtado
2 days ago
> basically no "specs" - just giving it coherent sane direction
This is one variable I almost always see in this discussion: the more strict the rules that you give the LLM, the more likely it is to deeply disappoint you
The earlier in the process you use it (ie: scaffolding) the more mileage you will get out of it
It's about accepting fallibility and working with it, rather than trying to polish it away with care
phatskat
2 days ago
To me this still feels like it would be a net negative. I can scaffold most any project with a language/stack specific CLI command or even just checking out a repo.
And sure, AI could “scaffold” further into controllers and views and maybe even some models, and they’d probably work OK. It’s when they don’t, or when I need something tweaked, that the worry becomes “do I really understand what’s going on under the hood? Is the time to understand that worth it? Am I going to run across a small thread that I end up pulling until my 80%-done sweater is 95% loose yarn?”
To me the trade-off hasn’t proven worth it yet. Maybe for a personal pet project, and even then I don’t like the idea of letting something else nondeterministically touch my system. “But use a VM!” they say, but that’s more overhead than I care for. Just researching the safest way to bootstrap this feels like more effort than value to me.
Lastly, I think that a big part of why I like programming is that I like the act of writing code, understanding how it works, and building something I _know_.
michaelmrose
2 days ago
A lot of the benefit of scaffolding is building basic context, which you can also build by feeding it the files produced by whatever CLI tool and talking through your design, forcing it to think, for lack of a better word, about it. You can also force-feed it design and API documentation. If you think you have given it too much, you are almost certainly wrong.
If it's doing nonsensical things with a library, feed it the documentation; if it's still busted, make it read the source.
prmph
2 days ago
But, how do you know the code is good?
If you do spot checks, that is woefully inadequate. I have lost count of the number of times when, poring over code a SOTA LLM has produced, I notice a lot of subtle but major issues (and many glaring ones as well), issues a cursory look is unlikely to pick up on. And if you are spending more time going over the code, how is that a massive speed improvement like you make it seem?
And what do you even mean by 10x the amount of work? I keep saying that anybody who starts to spout these sorts of anecdotes absolutely does NOT understand real-world, production-level, serious software engineering.
Is the model doing 10x the amount of simplification, refactoring, and code pruning an effective senior level software engineer and architect would do? Is it doing 10x the detailed and agonizing architectural (re)work that a strong developer with honed architectural instincts would do?
And if you tell me it's all about accepting the LLM being in the driver's seat and embracing vibe coding, it absolutely does NOT work for anything exceeding a moderate level of complexity. I have tried that several times. Up to now, no model has been able to write a simple markdown viewer with certain specific features I have wanted for a long time. I really doubt the stories people tell about creating whole compilers with vibe coding.
If all you see and appreciate is that it is pumping out 10x the features, 10x more code, you are missing the whole point. In my experience you are actually producing a ton of sh*t, sorry.
datavirtue
a day ago
Way better than the random India dev output. I seriously don't know what everyone around here is doing. All I see are complaints while I produce the output of ten devs. Clean code, solid design.
Spend a few hours writing context files. Spend the rest of the week sipping bourbon.
sarchertech
a day ago
So what have you released?
10x means you could have built something that would have taken 4 or 5 years in the time you've had since Opus 4.5 came out.
Where's your operating system, game engine, new programming language, or complex SaaS app?
hirvi74
2 days ago
> But, how do you know the code is good?
Honestly, this more of a question about scope of the application and the potential threat vectors.
If the GP is creating software that will never leave their machine(s) and is for personal usage only, I'd argue the code quality likely doesn't matter. If it's some enterprise production software that hundreds to millions of users depend on, software that manages sensitive data, etc., then I would argue code quality should asymptotically approach perfection.
However, I have many moons of programming under my belt. I would honestly say that I am not sure what good code even is. Good to who? Good for what? Good how?
I truly believe that most competent developers (however one defines competent) would be utterly appalled at the quality of the human-written code on some of the services they frequently use.
I apply the Herbie Hancock philosophy when defining good code. When once asked what is Jazz music, Herbie responded with, "I can't describe it in words, but I know it when I hear it."
sarchertech
2 days ago
> I apply the Herbie Hancock philosophy when defining good code. When once asked what is Jazz music, Herbie responded with, "I can't describe it in words, but I know it when I hear it."
That’s the problem. If we had an objective measure of good code, we could just use that instead of code reviews, style guides, and all the other things we do to maintain code quality.
> I truly believe that most competent developers (however one defines competent) would be utterly appalled at the quality of the human-written code on some of the services they frequently use.
Not if you have more than a few years of experience.
But what your point is missing is the reason that software keeps working in the first place, or stays in a good enough state that development doesn’t grind to a halt.
There are people working on those code bases who are constantly at war with the crappy code. At every place I’ve worked over my career, there have been people quietly and not so quietly chipping away at the horrors. My concern is that with AI those people will be overwhelmed.
They can use AI too, but in my experience, the tactical tornadoes get more of a speed boost than the people who care about maintainability.
hirvi74
2 days ago
I had a long reply to your comment, then decided it was not truly worth reading. However, I do have one question remaining:
> the tactical tornadoes get more of a speed boost than the people who care about maintainability.
Why are these not the same people? In my job, I am handed a shovel. Whatever grave I dig, I must lie in. Is that not common? Seriously, I am not being facetious. I’ve had the same job for almost a decade.
sarchertech
a day ago
That’s because you’ve been there a decade. It’s very common for people to skip jobs every 2 years so that they never end up seeing the long term consequences of their actions.
The other common pattern I’ve seen goes something like this.
Product asks Tactical Tornado if they can build something. TT says sure, it will take 6 weeks. TT doesn’t push back or ask questions; he builds exactly what product asks for in an enormous feature branch.
At the end of 6 weeks he tries to merge it and he gets pushback from one or more of the maintainability people.
Then he tells management that he’s being blocked. The feature is already done and it works. Also, the concerns other engineers have can’t be addressed because “those are product requirements”. He’ll revisit it later to improve on it. He never does, because he’s on to the next feature.
Here’s the thing. A good engineer would have worked with product to tweak the feature up front so that it’s maintainable, performant etc…
This guy uses product requirements (many that aren’t actually requirements) and deadlines to shove his slop through.
At some companies management will catch on and he’ll get pushed out. At other companies he’ll be praised as a high performer for years.
sameerds
2 days ago
> I can still run 2-3 clients almost 24/7 pumping out features.
Honest question: how does one do that? My workflow is to create one git worktree per feature and start one session per worktree. And then I spend two hours in a worktree talking to Opus and reviewing what it is doing.
Peritract
a day ago
> It's regularly writing systems-level code that would take me months to write by hand in hours, with minimal babysitting
Has your output kept pace with the code? Because months in hours means, even pushing those ratios quite far, years in days.
Has your roadmap accelerated multiple years in the last few months in terms of verifiable results?
kobe_bryant
2 days ago
months you say? how incredible. it beggars belief in fact
hirvi74
2 days ago
Not sure about ChatGPT, but Claude was (is it still?) an absolute ripper at cracking some software if one has even a little bit of experience/low-level knowledge. At least, that's what my friend told me... I would personally never ever violate any software ToS.
buredoranna
2 days ago
> the whole thing being built on copyright infringement
I am not a lawyer, but am generally familiar with two "is it fair use" tests.
1. Is it transformative?
I take a picture, I own the copyright. You can't sell it. But if you take a copy, and literally chop it to pieces, reforming it into a collage, you can sell that.
2. Does the alleged infringing work devalue the original?
If I have a conversation with AI about "The Lord of the Rings", even if it reproduces good chunks of the original, it does not devalue the original... in fact, I would argue, it enhances it.
Have I failed to take into account additional arguments and/or scenarios? Probably.
But, in my opinion, AI passes these tests. AI output is transformative, and in general, does not devalue the original.
taikahessu
2 days ago
In order for an LLM to be useful, you need to copy and steal all of the work. Yes, you can argue you don't need the whole work, but that's what they took and fed in.
And they are making money off of other people's work. Sure, you can use mental jiujutsu to make it fair use. But fair use for LLMs means you basically copy the whole thing. All of it. It sounds more like a total use to me.
I hope the free market and technology catches up and destroys the VC backed machinery. But only time will tell.
ragequittah
2 days ago
I always wonder if anyone out there thinks they're not making money off of other people's work. If you're coding, writing a fantasy novel, taking a photograph or drawing a picture from first principles you came up with yourself, I applaud you though.
taikahessu
2 days ago
You are absolutely right.
Seriously though, I do think that is the case. It would be self-righteous to argue otherwise. It's just the scale and the nature of this that make it so repulsive. For my taste, copying something without permission is stealing. I don't care what a judge somewhere thinks of it. Using someone's good will for profit is disgusting. And I hope we all get to profit from it someday, not just a select few. But that is just my opinion.
IcyWindows
2 days ago
This kind of thinking seems like a road for people to have to pay a license for the rest of their life after going to school for the knowledge they "stole" from their textbooks.
taikahessu
2 days ago
Except the school paid royalties for that specific book. Every book. The money was distributed. Writers, publishers and so on. The normal stuff.
Or if you had to buy the book yourself, same thing, distributed, royalties paid.
IcyWindows
2 days ago
So your complaint is that they didn't pay for training data by buying every book found online?
That does seem more reasonable, but makes public libraries also evil.
taikahessu
a day ago
Except libraries pay the fees for the books, they serve only a dedicated local region of people, and when you borrow a book, you know who the author is.
For LLMs the transformative part is then removing the copyright info and serving it to you as OpenAI whatever.
Sure, you can query multiple books at the same time and the technology is godlike. But the underlying issue remains. Without the original content, the LLM is useless. Someone took all the books, fed them in, and didn't pay anything back to the authors.
I'm not sure whether you're arguing in good faith here. This information you could easily check for yourself, too. The problem is not the information itself. It's the massive machinery that steals all the works, while one day we'll be staring at a paywall. And the artists are still not funded. I'd rather just do something nice offline in the future.
IcyWindows
7 minutes ago
I'm talking about the knowledge people "steal" by reading. LLMs and humans both absorb knowledge by reading. You want to tax using that knowledge that was absorbed.
It will be applied to people soon after.
ragequittah
2 days ago
I understand, but I think this will seem quite a quaint idea soon, in all honesty. Imagine these things are able to advance the world of science, math, physics, and whatever else (they already are), and we stopped them because someone didn't make enough royalties first. That to me would be more repulsive. We stop/slow the progress of all humanity because there wasn't enough temporary gain for individual x who wrote book y. And if it all turns out to be bogus nonsense, then I doubt individual x who wrote book y loses much in the process anyway.
taikahessu
a day ago
Yeah, it's not an easy puzzle piece. How far are we going to go in the name of science and progress again? Are you buying it, that it's all for the greater good? Quite a lot of money involved here. Everyone wants a piece of it. But I digress. Dropping the big bomb, stealing the lands and riches of the natives, using slaves and colonies to power the whole civilization into a new era might be powerful and efficient. But it doesn't make it right. I don't buy the narrative. Do no evil until you can no longer say no?
ragequittah
a day ago
I think comparing intellectual property theft to slavery and stealing land is where I start leaning towards the argument being absurd. The stolen books are still on store shelves. People are likely still buying them at about the same rate as before.
And as far as it being for the greater good that seems to be the promise of many of these companies. What will inevitably get in the way is greed and money, the very same reasons we're arguing about IP theft. Good or bad I see no way out of this but through at this point.
jjwiseman
2 days ago
And in Bartz v. Anthropic, the court found that Anthropic training their LLMs on books was "highly transformative."
verve_rat
2 days ago
The US is not the only legal jurisdiction these services are being sold in.
Madmallard
2 days ago
What in the mental gymnastics?
They just stole everyone's hard work over decades to make this or it wouldn't have been useful at all.
NewsaHackO
2 days ago
That's a statement. The comment you are replying to had actual reasoning behind its claim. Do you have any actual reasoning behind yours?
Madmallard
a day ago
Let's not ignore the entirety of reality and what has been going on for the last few years to defend a pestilence on mankind you probably have stock invested in. I'm not going to acknowledge how insane the argument you're making is. It's like you've heard of zero leaks, zero lawsuits, zero open-source complaints. Zero anything. Just either intentionally or unintentionally astroturfing.
Thanks.
idiotsecant
2 days ago
This is a tiresome and well trod road.
The fact of the matter is that for-profit corporations consumed the sum of mankind's knowledge with the intent to make money on it by encoding it into a larger and better-organized corpus of knowledge. They cited no sources and paid no fees (to any regular humans, at least).
They are making enormous sums of money (and burning even more, ironically) doing this.
If that doesn't violate copyright, it violates some basic principle of decency.
michaelmrose
2 days ago
You are assuming intellectual property has an intrinsic basis when it's at best functional, not foundational. It's only useful if the net value to society is positive, which is extremely dubious.
idiotsecant
2 days ago
I'm assuming human creativity has intrinsic value, or what's the point of being human?
michaelmrose
a day ago
You are assuming that somehow human creativity was born with intellectual property and will somehow die with it. It's just not so.
idiotsecant
a day ago
Ok captain pedant, instead of making vague handwavey negations exclusively how about you say something.
michaelmrose
21 hours ago
Intellectual property is supposed to feed creativity by securing for creators exclusive rights to benefit from their creations. It mostly feeds uncreative leeches whose business it is to own things in exchange for crumbs for the creativity, and it drags down both the inherent enjoyment of the fruits of creativity and even its creation. It belonged in the bin back when we first thought of it, and it is only going to be more unfit for purpose as time goes on.
Aurornis
2 days ago
Writing detailed specs and then giving them to an AI is not the optimal way to work with AI.
That's vibecoding with an extra documentation step.
Also, Sonnet is not the model you'd want to use if you want to minimize cleanup. Use the best available model at the time if you want to attempt this, but even those won't vibecode everything perfectly for you. This is the reality of AI, but at least try to use the right model for the job.
> Therefore I need more time and effort with Gen AI than I needed before
Stop trying to use it as all-or-nothing. You can still make the decisions, call the shots, write code where AI doesn't help and then use AI to speed up parts where it does help.
That's how most non-junior engineers settle into using AI.
Ignore all of the LinkedIn and social media hype about prompting apps into existence.
EDIT: Replaced a reference to Opus and GPT-5.5 with "best available model at the time" because it was drawing a lot of low-effort arguments
wg0
2 days ago
> Writing detailed specs and then giving them to an AI is not the optimal way to work with AI.
It is NOT the way to work with humans, basically, because most software engineers I worked with in my career were incredibly smart and damn good at identifying edge cases and weird scenarios, even when they were not told and the domain wasn't theirs to begin with. You didn't need to write lengthy, several-page-long Jira tickets. Just a brief paragraph, and that's it.
With AI, you need to spell everything out in detail. But that's NO guarantee either, because these models are NOT deterministic in their output: the same prompt produces different output each time. That's why every chat box has that "Regenerate" button. So even a correct, detailed prompt might not lead to correct output. You're literally rolling dice with a random number generator.
Lastly, no matter how smart and expensive the model is, the underlying working principles are the same as GPT-2's: the same transformers with RL on top, the same random sampling, the same list of token probabilities, and the same temperature used to randomly select one token to complete the output, which is then fed back in for the next token.
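The temperature-based sampling loop this describes can be sketched in a few lines. This is a toy illustration with made-up logits, not any real model's code, but it shows why the same prompt can yield a different token at each draw when the temperature is above zero:

```python
import math
import random

def sample_next_token(logits, temperature=0.8, rng=random):
    # Scale logits by temperature, softmax them into probabilities,
    # then draw one token index at random from that distribution.
    scaled = [score / temperature for score in logits]
    m = max(scaled)  # subtract max before exp for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Toy scores for 4 candidate tokens: two calls on the same "prompt"
# (the same logits) can return different tokens, because the draw is random.
logits = [2.0, 1.5, 0.2, -1.0]
first = sample_next_token(logits)
second = sample_next_token(logits)
```

As the temperature approaches 0 the distribution collapses onto the highest-scoring token and the output becomes effectively deterministic; at the temperatures chat products typically use, the draw is genuinely random, which is the dice-rolling the comment describes.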
throwaway7783
2 days ago
This is not true in my experience at all. I never write such detailed specs for AI, and that is my value as the human in the loop: to be iterative, to steer, and to make decisions. The AI in fact catches more edge cases than I do, and can point me to things I never considered myself. Our productivity has increased manyfold, and code quality has increased significantly, because writing tests is no longer a chore or an afterthought, and the biggest blocker for us, "test setup is too complicated," is gone. And it is showing in a decrease in customer-reported issues.
aforwardslash
2 days ago
> It is NOT the way to work with humans basically because most software engineers I worked with in my career were incredibly smart and were damn good at identifying edge cases and weird scenarios even when they were not told and the domain wasn't theirs to begin with.
I have no clue what AI you're using, but with both Claude and Codex, you just explain the outcome and they are pretty smart at figuring out stuff on complex codebases. You don't even need a paragraph, just say "doing this I got an error".
> NO guarantee either because these models are NOT deterministic in their output. Same prompt different output each time.
So, exactly like humans. But a bit more predictable and way more reliable.
> That's why every chat box has that "Regenerate" button.
If you're using the chat box to write code, that's a human error, not an LLM one. Don't blame "AI" for your ignorance.
> no matter how smart and expensive the model is, the underlying working principles are the same as GPT-2.
Sure. Every machine is a smoke machine if operated wrong enough. This tells me you should not get your insight from random YT videos. As a nugget: some of the underlying working principles of the chat system also powered search engines; and their engineers also drank water, like Hitler.
snarkconjecture
2 days ago
> the underlying working principles are the same as GPT-2
I don't think anyone was claiming otherwise. Sonnet is still better at writing code than GPT-2, and worse than Opus. Workflows that work with Opus won't always work with Sonnet, just as you can't use GPT-2 in place of Sonnet to do code autocomplete.
jonas21
2 days ago
> That's why every chat box has that "Regenerate" button.
Wait, are you doing this in the web chat interface?!
That's definitely not a good way. You need to be using a harness (like Claude Code) where the agent can plan its work, explore the codebase, execute code, run tests, etc. With this sort of set up, your prompts can be short (like 1 to 5 sentences) and still get great results.
wg0
2 days ago
I use the Claude CLI or OpenCode. The "Regenerate" example is just to illustrate that the same prompt produces different output each time. You're rolling dice.
hnfong
a day ago
Not sure what your point is.
Sure, AI output is kind of random.
But that's also basically true for humans. It's harder to "prove" humans are random, but wouldn't you think a person would do things slightly differently when given the same tasks but on different days? People change their minds a lot, it's just that there's no "reconsider" button for people so you feel a bit of social friction if you pester somebody to rethink an issue. But it's no different.
I'd be really surprised if your point is that humans, unlike AI, are super deterministic and that's why they are so much more trustworthy and smarter than AI...
rafram
2 days ago
> Opus or GPT-5.5 are the only ways to even attempt this.
It’s pretty funny to claim that a model released 22 hours ago is the bare minimum requirement for AI-assisted programming. Of course the newest models are best at writing code, but GPT-* and Claude have written pretty decent systems for six months or so, and they’ve been good at individual snippets/edits for years.
Aurornis
2 days ago
> It’s pretty funny to claim that a model released 22 hours ago is the bare minimum requirement for AI-assisted programming.
Not what I said.
The OP was trying to write specs and have an AI turn it into an app, then getting frustrated with the amount of cleanup.
If you want the AI to write code for you and minimize your cleanup work, you have to use the latest models available.
They won't be perfect, but they're going to produce better results than using second-tier models.
rafram
2 days ago
Is it actually the case that 5.5 is that much better at implementing specs than its very capable predecessor released a month ago? Just seems like a baseless and silly claim about a model that has barely been out long enough for anyone to do serious work with it.
Aurornis
2 days ago
> Is it actually the case that 5.5 is that much better at implementing specs than its very capable predecessor released a month ago?
The OP comment was talking about Claude Sonnet. I was comparing to that.
I should have just said "use the best model available"
ghurtado
2 days ago
> Is it actually the case that 5.5 is that much better
Nobody was talking about how much better it is until you wrote this though
It's like you're building your own windmills brick by brick
munk-a
2 days ago
> Stop trying to use it as all-or-nothing. You can still make the decisions, call the shots, write code where AI doesn't help and then use AI to speed up parts where it does help.
You're assuming that finding the places where AI needs help isn't already a larger task than just writing it yourself. AI can be helpful in development in very limited scenarios but the main thrust of the comment above yours is that it takes longer to read and understand code than to write it and AI tooling is currently focused on writing code.
We're optimizing the easy part at the expense of the difficult part - in many cases it simply isn't worth the trouble (cases where it is helpful, imo, exist when AI is helping with code comprehension but not new code production).
Aurornis
2 days ago
> You're assuming that finding the places where AI needs help isn't already a larger task than just writing it yourself.
Not assuming anything, I'm well versed in how to do this.
Anyone who defers to having AI write massive blocks of code they don't understand is going to run into this.
You have to understand what you want and guide the AI to write it.
The AI types faster than me. I can have the idea and understand and then tell the LLM to rearrange the code or do the boring work faster than I can type it.
kakacik
2 days ago
If you are trying to sell it, you are doing a poor job and effectively siding with OP while desperately trying to write the opposite.
Juniors mostly behave better than what you describe; I certainly never had to correct as much after any junior as OP did. If you have 'boring code' in your codebase, maybe that signals not-so-great architecture (and I presume we aren't talking about code generators, which have existed since the '90s at least).
Also, any senior worth their salt wants to intimately understand their code, the only way you can anyhow guarantee correctness. Man, I could go on and on and pick your statements one by one but that would take long.
Exoristos
2 days ago
The number of devs I've worked with who can't touch-type and don't use or know their way around a proper IDE is depressingly large.
Aurornis
2 days ago
Same with debuggers. I run into people with 10 years of experience who are still trying to printf debug complex problems that would be easy with 5 minutes in a debugger.
I think we're seeing something similar with AI: There are devs who spend a couple days trying to get AI to magically write all of their code for them and then swear it off forever, thinking they're the only people who see the reality of AI and everyone else is wrong.
munk-a
2 days ago
At the same time, there are devs that spend two days setting up a debugger for a simple problem that would be easy with five minutes and printf. AI is a tool, and it's a useful tool, but it's not always the best tool for the job, and the real skill is in knowing when to use it and when not to.
It's a sort of fact of life that the easy problems are solved: those where an extreme answer is always correct are things we no longer even consider problems. Most of the options that remain have their advantages and disadvantages, so the true answer is somewhere in the middle.
hunterpayne
2 days ago
Right, but then the AI doesn't have a positive ROI. In all fairness, it never has a positive ROI, but now it's much more negative, to the point that the accountants will put an end to the experiment after year end reveals how negative it really is.
throwuxiytayq
2 days ago
This isn't about touch typing or IDE tricks. I'm an IDE power user and - reasoning aside - I used to run circles around my peers when it comes to raw code editing efficiency. This is increasingly an obsolete workflow. LLMs can execute codebase-wide refactors in seconds. You can use them as a (foot-)shotgun, or as a surgical tool.
Exoristos
2 days ago
So many are masters of AI marketing, it's thinkable one of them has mastered AI.
ryan_n
2 days ago
You've come full circle and are essentially just describing what the OP was saying in their initial post lol.
_puk
2 days ago
The problem I have with this take is it's focused on solving the right now problem.
Yes, it's quicker to do it yourself this time, but if we build out the artifacts to do a good enough job this time, next time it'll have all the context it needs to take a good shot at it, and if you get overtaken by AI in the meantime you've got an insane head start.
Which side of history are you betting on?
munk-a
2 days ago
I don't believe that investing more of my time in a slower process now would result in an advantage if that other process was refined. I've toyed around with these tools and know enough to get an environment up and running so what would I gain from using them more right now if those tools may significantly change before they're adapted to more efficient usage?
I'm okay not being at the bleeding edge - I can see the remains of the companies that aggressively switch to the new best thing. Sometimes it'll pay off and sometimes it won't. I am comfortable being a person that waits until something hits a 2.0 and the advantages and disadvantages are clear before seriously considering a migration.
torben-friis
2 days ago
If you don't do it yourself and you don't get overtaken by AI, you've lost the head start to be better next time - humans learn, and they atrophy as well.
afro88
2 days ago
> Writing detailed specs and then giving them to an AI is not the optimal way to work with AI.
> That's vibecoding with an extra documentation step.
Read uncharitably, yeah. But you're making a big assumption that the writing of spec wasn't driven by the developer, checked by developer, adjusted by developer. Rewritten when incorrect, etc.
> You can still make the decisions, call the shots
One way to do this is to do the thinking yourself, tell it what you want it to do specifically and... get it to write a spec. You get to read what it thinks it needs to do, and then adjust or rewrite parts manually before handing off to an agent to implement. It depends on task size of course - if small or simple enough, no spec necessary.
It's a common pattern to hand off to a good instruction following model - and a fast one if possible. Gemini 3 Flash is very good at following a decent spec for example. But Sonnet is also fine.
> Stop trying to use it as all-or-nothing
Agree. Some things just aren't worth chasing at the moment. For example, in native mobile app development, it's still almost impossible to get accurate idiomatic UI that makes use of native components properly and adheres to HIG etc
yonaguska
2 days ago
This is my workflow: converse with it to write a spec. I review the spec myself. Ask it to trace out how it would implement it. I know the codebase because it was originally written mostly by hand. Correct it with my best practices. Have it challenge my assumptions and read the code to do so. Then it's usually good enough to go on its own. The beauty of having a well-defined spec is that once it's done, I can have another agent review the work, and it generates good feedback if the implementation deviates from the spec at all.
I'm unsure if this is actually faster than me writing it myself, but it certainly expends less mental energy for me personally.
The real gains I'm getting are with debugging prod systems, where normally I would have to touch five different interfaces to track down an issue; I've just encompassed it all within an MCP and direct my agent through the debugging steps (check these logs, check this in the db, etc.).
mandeepj
2 days ago
Sure, Opus is a level above Sonnet, but it still doesn't free OP from these handcuffs: it is reading the code, understanding it, and making a mental model that's way more labour intensive.
Aurornis
2 days ago
The OP's problem was treating the situation as two extremes: Either write everything myself, or defer entirely to the AI and be forced to read it later.
I was trying to explain that this isn't how successful engineers use AI. There is a way to understand the code and what the AI is doing as you're working with it.
Writing a spec, submitting it to the AI (a second-tier model at that) and then being disappointed when it didn't do exactly what you wanted in a perfect way is a tired argument.
hunterpayne
2 days ago
Is doing that faster than just writing it by hand? Remember to include the time you need to review the code afterwards. The research so far says it isn't faster. Yet people keep doubling down on it and thinking winning an Internet argument is going to matter when it hits the fan in the near future.
WesolyKubeczek
2 days ago
But when you write code by hand, you at least are there as it’s happening, which makes reading and understanding way easier.
ost-ing
2 days ago
> That's vibecoding with an extra documentation step.
This sounds like an LLM talking.
Either you're a bot, or our human languages are being modified in realtime by the influence of these tools.
elAhmo
2 days ago
Funny hearing you’re saying only GPT 5.5 (and Opus) can do this, having in mind that it came out last night.
Aurornis
2 days ago
To be clear, I'm not saying that they can do this.
I'm saying that if you're trying to have AI write code for you and you want to do as little cleanup as possible, you have to use the best model available.
ForOldHack
2 days ago
"Writing detailed specs and then giving them to an AI is not the optimal way to work with AI." Perfect. I loosely define things, then correct it and tell it to make the corrections, and it gets trained, but you have to constantly watch it. It's like a glorified auto-typer.
"Ignore all of the LinkedIn and social media hype about prompting apps into existence." Absolutely, it's not hype, it's pure marketing bullshitzen.
scuderiaseb
2 days ago
I must be doing something very different from everyone else, but I write what I want and how I want it, and Opus 4.7 plans it for me; then I carefully review. Oftentimes I need to validate and check things, and sometimes I've revised the plan multiple times. Then comes implementation, which I still use Opus for, because I get a warning that my current model holds the cache so Sonnet shouldn't implement. And honestly, I'm mostly within my Pro subscription; granted, I also have ChatGPT Plus, but I've mostly only used that as the chat/quick-reference model. But yeah, it takes some time to read and understand everything, and a lot of the time I make manual edits too.
wg0
2 days ago
>Then implementation which I still use Opus for because I get a warning that my current model holds the cache so Sonnet shouldn’t implement.
This is based on the premise that, given a detailed plan, the model will produce exactly the same thing, i.e. that the model is deterministic, which is NOT the case. These models are NOT deterministic, no matter how detailed a plan you feed them. If you doubt it, give the model the same plan twice and see something different churned out each time.
> And honestly, I’m mostly within my Pro subscription, granted I also have ChatGPT Plus but I’ve mostly only used that as the chat/quick reference model. But yeah takes some time to read and understand everything, a lot of the time I make manual edits too.
I do not know how you can do it on a Pro plan with Claude Opus 4.7, which is 7.5x heavier in terms of limit consumption; any small-to-medium codebase would easily consume up to 50% of your limits in just the planning phase, in a single prompt, on a Pro plan (the $20/month one that they are planning to eliminate).
scuderiaseb
2 days ago
> I do not know how you can do it on a Pro plan with Claude Opus 4.7 which is 7.5x more in terms of limit consumption and any small to medium size codebase would easily consume your limits in just the planning phase up to 50% in a single prompt
I also don't understand, because all I ever hear is people saying the $100 Max plan is the minimum for serious work. I made 3-4 plans today. I'm familiar with the codebase and pointed the LLM in the direction it needed to go. I described the functionality I wanted, which wasn't a huge rewrite; it touched like 4 files, of which one was just a module of pydantic models. But one plan was 30% of usage, and I had this over two sessions because I got a reset. I did read and understand every line of code, so that part takes me some time.
aforwardslash
2 days ago
One of the simple "reasons" is to keep context clean; if you're doing planning, you're not loading source code, it's just the plan. Also, it may happen that if you're running parallel manual sessions, the cache expires after 1h, so a prompt on an idle session will re-trigger re-evaluating the whole context (something quite heavy with a 1M context window). This burns a lot of credit.
_puk
2 days ago
Rather than vibe, write your thoughts and get the model to challenge you / flesh it out is my preferred approach.
Get it to write a context capsule of everything we've discussed.
Chuck that in another model and chat around it, flesh out the missing context from the capsule. Do that a couple of times.
Now I have an artifact I can use to one-shot a hell of a lot of things.
This is amazing for 0-1.
For brownfield development, add in a step to verify against the current code base, capture the gotchas and bounds, and again I've got something an agent has a damn good chance of one-shotting.
coldtea
2 days ago
>I write detailed specs. Multifile with example code. In markdown. Then hand over to Claude Sonnet. With hard requirements listed, I found out that the generated code missed requirements, had duplicate code or even unnecessary code wrangling data (mapping objects into new objects of narrower types when won't be needed) along with tests that fake and work around to pass.
Stop doing that. Micromanage it instead. Don't give it the specs for the system, design the system yourself (can use it for help doing that), inform it of the general design, but then give it tasks, ONE BY ONE, to do for fleshing it out. Approve each one, ask for corrections if needed, go to the next.
Still faster than writing each of those parts yourself (a few minutes instead of multiple hours), but much more accurate.
dieortin
2 days ago
Might as well just write the code yourself at that point. And as a bonus, end up with a much better understanding of the codebase (and way better code)
coldtea
2 days ago
>Might as well just write the code yourself at that point
"We have this thing that can speed your code writing 10x"
"If it isn't 1000x and it doesn't give me a turnkey end to end product might as well write the whole thing myself"
People have forgotten balance. Which is funny, because the inability of the AI to just do the whole thing end to end correctly is what stands between 10 developers having a job versus 1 developer having a job telling 10 or 20 agents what to do end to end and collecting the full results in a few hours.
And if you do it the way I describe you get to both use AI, AND have "a much better understanding of the codebase (and way better code)".
dieortin
a day ago
Writing the code is usually not the bottleneck, so you don’t gain that much speeding it up. And as I said, you lose a lot of knowledge about the code when you don’t write it yourself.
Unless coding is most of your job, which is rare, you’re giving up really knowing what your software does in order to achieve a very minor speed up. Just to end up having to spend way more time later trying to understand the AI generated code when inevitably something breaks.
> And if you do it the way I describe you get to both use AI, AND have "a much better understanding of the codebase (and way better code)".
Using AI is not a goal in itself, so I don’t care about “getting to use AI”. I care about doing my job as efficiently as possible, considering all parts of my job, not just coding.
coldtea
a day ago
>Writing the code is usually not the bottleneck
I hear this repeated often and it's false.
Writing the code is A bottleneck. Except if by "writing the code" people just mean the mere physical act of typing it in. Which is not what I mean.
But if someone thinks that just the design/architecture decisions take time, and the fleshing out in actual code does not, they're wrong.
Some coders seem to think they're high-end architects, and that fleshing out the design is a triviality that's very fast. Watching high-end coders write, e.g. in coding-session streams or just someone at your office, will show you it's never that fast.
In actual programming practice, even if you know the design end to end, even if it's a 100-line thing, writing it takes time.
Look up how to call those APIs you need. Debug when you inevitably get some of them wrong. Figure out that regex you need to write. Fix the 2-3 things you do wrong on the first pass of the "trivial" algorithm. Add some logic to catch and report errors and handle edge cases. Add tests.
All these are "trivial", but combined can take a couple of hours for something the AI will most of the time spit out correct the first time in a minute. And of course as you write you also explore dozens of decisions that could go either way, even with the same exact design and external interface to your code.
Getting that ready from the LLM within a minute means you can explore alternative designs, handle new issues that occurred, add more functionality to make it more usable and smarter, etc., all the while you'd still be writing the original cruder version.
>Using AI is not a goal in itself, so I don’t care about “getting to use AI”. I care about doing my job as efficiently as possible, considering all parts of my job, not just coding.
Not the point. Nobody said AI is a goal in itself.
AI however does speed up the work, and if you take the black-and-white "if AI can't do it all by itself end-to-end without me intervening then I'd rather write everything myself" (what I respond to), then you're not doing your job "as efficiently as possible".
Aurornis
2 days ago
The goalposts move every month. We’re at the stage where handing an entire specification to a mid-tier AI and walking away while it does all the work and then being disappointed that it wasn’t perfect means it’s useless.
sensanaty
a day ago
If I still have to do a ton of work to clean up whatever the AI shits out then it might as well have done nothing. The promise of these systems from the hypesters is that it can do everything, so don't be surprised when people expect exactly that.
coldtea
a day ago
>If I still have to do a ton of work to clean up whatever the AI shits out then it might as well have done nothing.
Either you find what AI produces is in general "shit" (which is not realistic to think for latest LLMs, but ok).
Or you take a knee-jerk all or nothing black and white attitude to it.
"If you have to do a ton of work"? Is that work much less than what you'd have done without any AI assistance?
hintymad
2 days ago
> With hard requirements listed, I found out that the generated code missed requirements,
This is hardly a surprise, no? No matter how much training we run, we are still producing a generative model. And a generative model doesn't understand your requirements and cross them off; it predicts the next most likely token from a given prompt. If the most statistically plausible way to finish a function looks like a version that ignores your third requirement, the model will happily follow through. There are really no rules in your requirements doc. They are just the conditioning events X in a glorified P(Y|X). I'd venture to guess that sometimes missing a requirement may increase the probability of the generated tokens, so the model will happily allow the miss. Actually, "allow" is too strong a word. The model does not allow shit. It just generates.
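As a toy illustration of that structural point, consider the crudest possible generative model: a bigram model fit to a made-up corpus (nothing like a real LLM, but the same shape of machinery). There is nowhere in it for a "requirement" to live; the only thing it knows is the conditional distribution over next tokens.

```python
from collections import defaultdict

# Fit a toy bigram "model": counts[prev][next] approximates P(next | prev).
corpus = "the cat sat on the mat the cat ate the fish".split()
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def most_likely_next(word):
    # Greedy decoding: pick the highest-probability continuation.
    # Note there is no slot here for "rules" or "requirements";
    # only the conditional counts learned from the corpus matter.
    followers = counts[word]
    return max(followers, key=followers.get) if followers else None
```

Real models are vastly larger and sample rather than taking the argmax, but the point carries over: a requirement only constrains the output insofar as it shifts P(Y|X).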
teucris
2 days ago
But agents do keep task lists and check the tasks off as they go. Of course it’s not perfect either but it’s MUCH better than an LLM can offer on its own.
If you are seeing an agent missing tasks, work with it to write down the task list first and then hold it accountable to completing them all. A spec is not a plan.
mathisfun123
2 days ago
bro, do you really not understand that that's a game played for your sake? It checks boxes, yes, but you have no idea what effect checking the boxes actually has. Like, do you not realize/understand that Anthropic/OpenAI are baking this kind of stuff into models/UI/UX to give the sensation of rigor?
jwitthuhn
2 days ago
The checkboxes inform the model as well as the user, and you can observe this yourself. For example in a C++ project with MyClass defined in MyClass.cpp/h:
I ask the model to rename MyClass to MyNewClass. It will generate a checklist like:
- Rename references in all source files
- Rename source/header files
- Update build files to point at new source files
Then it will do those things in that order.
Now you can re-run it, but inject the start of the model's response with the order of that list changed. It will follow the new order. The list plainly provides real information that influences future predictions and isn't just a facade for the user.
dragandj
a day ago
And when it doesn't, it politely apologizes, at least :)
_puk
2 days ago
Not to knee jerk on a bro comment, but, bro..
Are you seriously saying that breaking a large complex problem down into its constituent steps, and then trying to solve each one of them as an individual problem, is just a sensation of rigour?
stvltvs
2 days ago
I believe they're saying that the checkboxes are window dressing, not an accurate reflection of what the LLM has done.
kazinator
2 days ago
To some extent, I could agree with that idea. One purpose of that process is to match the impedance between the problem and human cognition. But that presumes problem solving inherently requires human cognition, which is false; that's just the tool that we have for problem solving. When the problem-solving method matches the cognitive strengths and weaknesses of the problem solvers, they do have a certain sensation of having an upper hand over the problem. Part of that comes from the chunking/division allowing the problem solvers to more easily talk about the problem, to have conversations and narratives around it. The ability to spin coherent narratives feels like rigor.
mathisfun123
2 days ago
I'm saying that's not what the stupid bot is actually doing; it's what Anthropic added to the TUI to make you feel good in your feelies about what the bot is actually doing (spamming).
Edit: I'll give you another example that I realized because someone pointed it out here: when the stupid bot tells you why it fucked up, it doesn't actually understand anything about itself - it's just generating the most likely response given the enormous amount of pontification on the internet about this very subject...
_puk
2 days ago
I'm not disagreeing in principle, but the detritus left after an Anthropic outage is usually quite usable in a completely fresh session. The amount of context pulled and stored in the sandbox is quite hefty.
Whilst I can't usually start from the exact same point in the decisioning, I can usually bootstrap a new session. It's not all ephemeral.
To your edit: that's what I find most galling, finding out that the thinking is discarded at cache clear. Reconstructing the logical route it took to get to the end state is just not the same as the step-by-step process it took in the first place, which again I feel counters your "feelies".
mathisfun123
2 days ago
> I find that the most galling thing about finding out about the thinking being discarded at cache clear
There's a really simple solution to this galling sensation: simply always keep in mind it's a stupid GenAI chat bot.
bmurphy1976
2 days ago
I'm starting to think a lot of the problem people are having is just that they have unrealistic expectations.
I'm not having the same problem as you, and I follow a very similar methodology. I'm producing code faster and at much higher quality, with a significant reduction in strain on my wrists. I doubt I'm typing that much less, but what I am typing is prose, which is much more compatible with a standard QWERTY keyboard.
I think part of it is that I'm not running forward as fast as I can and I keep scope constrained and focused. I'm using the AI as a tool to help me where it can, and using my brain and multiple decades of experience where it can't.
Maybe you're expecting too much and pushing it too hard/fast/prematurely?
I don't find the code that hard to read, but I'm also managing scope and working diligently on the plans to ensure it conforms to my goals and taste. A stream of small well defined and incremental changes is quite easy to evaluate. A stream of 10,000 line code dumps every day isn't.
I bet if you find that balance you will see value, but it might not be as fast as you want, just as fast as is viable which is likely still going to be faster than you doing it on your own.
dragandj
a day ago
If the main problem is programming languages' incompatibility with QWERTY, that problem was solved many decades ago. Programmers can switch to Colemak and save many trillions of dollars of AI expenses.
sevenseacat
4 hours ago
Colemak ftw
rsanek
2 days ago
I'm confused. If you have detailed, specific expectations, why aren't you using the best model available? Even if you were using Opus 4.7, I would ask whether you're using high/xhigh effort by default.
Feels crazy to me for people to use anything other than the best available.
lp0_on_fire
2 days ago
> Feels crazy to me for people to use anything other than the best available.
Not everyone has unlimited budgets to burn on tokens.
MattRix
a day ago
Yeah but in a discussion about technology it’s a little silly. It’s like someone complaining about their phone and then finding out they still use a Nokia.
xpe
2 days ago
I also have the same question. That said, for some problems, at least over the last week or so, I did sometimes get better results from lower-effort Opus or even Sonnet. Sometimes I get (admittedly this is by feel) a better experience from voice mode, which uses Haiku. This is somewhat surprising in some ways but maybe not in others. Some possible explanations include: (a) bugs relating to Anthropic's recent post-mortem [1], or (b) a tendency for a more loquacious Claude to go off into the weeds rather than offering a concise answer that invites short back-and-forth conversation and iteration.
[1]: https://www.anthropic.com/engineering/april-23-postmortem ... but also see the September 2025 one at https://www.anthropic.com/engineering/a-postmortem-of-three-...
linsomniac
2 days ago
>Then hand over to Claude Sonnet.
Have you tried Opus 4.6 with "/effort max" in Claude Code? That's pretty much all I use these days, and it is, honestly, doing a fantastic job. The code it's writing looks quite good to me. Doesn't seem to matter if it's greenfield or existing code.
If code is harder to read than to write, you're doing yourself a disservice by having the output stage not be top shelf.
dragandj
a day ago
I find it works even better with "/effort ultra".
mschulkind
a day ago
They renamed ultra to max about a week ago.
jwpapi
2 days ago
I have the same feeling.
Like there is no way in the world that Gen AI is faster than an actual cracked coder shooting the exact bash/sql commands he needs to explore and writing a proper intent-communicating abstraction.
I’m thinking the difference is orders of magnitude.
On top of that it adds context loss, risk of distraction, the extra work of reading after the job is done + you’ll have less of a mental model no matter how good you read, because active > passive.
Man, it was really the weirdest thing when Claude Code started hiding more and more changes. That's exactly what you need: staying closely in the loop.
eweise
2 days ago
I give Claude small incremental tasks to do and it usually does them flawlessly. I know how to design the software and break it into incremental tasks. Claude does the work. The productivity increase has been incredible. I think I'll be able to bootstrap a single-person lifestyle business just using Claude.
throwaway7783
2 days ago
I don't know. I don't write detailed specs, but make it very iterative, with two sessions. One for coding and one for reviews at various levels.
On its own, the coding session makes mistakes, duplicates code, and doesn't follow the patterns. The reviewer catches most of this, and the coder fixes it all, after first rationalizing it.
Works pretty well for me. This model is somewhat institutionalized in my company as well.
I use CC Opus 4.7 or Codex GPT 5.4 High (more and more Codex of late).
meroes
2 days ago
This is how I feel with AI math proofs. I’m not sure where they’re at now, but a year ago it took so much more time to check if an LLM proof was technically correct even if hard to understand, compared to a well structured human proof.
Maybe it was Timothy Gowers who commented on this.
Lots of human proofs have the unfortunate “creative leap” that isn’t fully explained but carries some detectable subtlety. LLMs end up making large leaps too, but too often the subtle ways mathematicians think and communicate are lost, and so the proof becomes so much more laborious to check.
Like you don’t always see how a mathematician came up with some move or object to “try”, and to an LLM it appears random large creative leaps are the way to write proofs.
baranul
2 days ago
Now that there is Claw Code[1], seems like many of these cancellations are easier to do.
abustamam
2 days ago
This may be a bit silly, but I do what you do and then I tell Claude to review the code it wrote and compare it to the specs. It will often find issues and fix them. Then I review the reviewed code, and it's leagues better than the pre-review code.
This may be worth trying out.
arikrahman
2 days ago
I use open spec to negotiate requirements before the handoff, it's helped me a lot. You could also use GSD2 or Amazon's Kiro, or Spec Kit but I find they have too many stages and waste tokens.
moribunda
2 days ago
And it leaves 25 TODO comments in code silently, reporting to you that everything is done.
dannersy
2 days ago
Beautifully stated and I couldn't agree more. This is my experience.
GoToRO
2 days ago
you are holding it wrong. For real this time.
rob
2 days ago
I use the "Superpowers" plugin that creates an initial spec via brainstorming together, and then turns that spec into an implementation spec file. It also has other agents make sure the spec doesn't drift between those two stages, and it does its own self-reviews. Almost every time, it finds and fixes a bunch of self-review issues before writing the final plan. Then I take that final plan and run it through the actual execution phase, which does its own reviews after everything.
Just saying that I know a lot of people like to raw dog it and say plugins and skills and other things aren't necessary, but in my case I've had good success with this.
varispeed
2 days ago
You can quickly get something "working" until you realise it has a ton of subtle bugs that make it unusable in the long run.
You then spend months cleaning it up.
Could just have written it by hand from scratch in the same amount of time.
But the benefit is not having to type code.
hirvi74
2 days ago
That is why I still use the Chatbots and not the CLI/desktop tools. I am in 100% control. I mainly ask question surrounding syntax with languages I am not well experienced in, snippets/examples, and sometimes feedback on certain bits of logic.
I feel like I have easily multiplied my productivity because I do not really have to read more than a single chat response at a time, and I am still familiar with everything in my apps because I wrote everything.
I've been working on Window Manager + other nice-to-haves for macOS 26. I do not need a model to one-shot the program for me. However, I am thrilled to get near instantaneous answers to questions I would generally have to churn through various links from Google/StackOverflow for.
tengbretson
2 days ago
> or even unnecessary code wrangling data (mapping objects into new objects of narrower types when won't be needed)
Dude! The amount of ad-hoc, interface-specific DTOs that LLM coding agents define drives me up the wall. Just use the damn domain models!
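The anti-pattern being complained about, and the simpler alternative, can be sketched in TypeScript (all names here are hypothetical, for illustration only):

```typescript
// Hypothetical domain model.
interface User {
  id: string;
  name: string;
  email: string;
  createdAt: string;
}

// The anti-pattern: an ad-hoc, interface-specific DTO that narrows the
// domain model for a single call site, plus a mapping function to feed it.
interface UserSummaryDto {
  id: string;
  name: string;
}

function toSummary(user: User): UserSummaryDto {
  // Unnecessary wrangling: copies fields into a new object even though
  // every consumer could have accepted the full User.
  return { id: user.id, name: user.name };
}

// The alternative: declare the narrower requirement at the parameter.
// TypeScript's structural typing means the full User satisfies it directly,
// so no mapping step or extra DTO type is needed.
function greet(user: Pick<User, "id" | "name">): string {
  return `Hello, ${user.name} (${user.id})`;
}

const u: User = {
  id: "u1",
  name: "Ada",
  email: "ada@example.com",
  createdAt: "2024-01-01",
};

console.log(greet(u)); // prints "Hello, Ada (u1)" with no intermediate DTO
```

The `toSummary`/`UserSummaryDto` pair is the kind of narrowing boilerplate the comment describes; `Pick<User, ...>` (or just passing the domain model) expresses the same constraint without allocating and maintaining a parallel type.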
CamperBob2
2 days ago
> Then hand over to Claude Sonnet.
Well, there's your problem. Why aren't you using the best tool for the job?
xpe
2 days ago
I very much value and appreciate the first four paragraphs! [3] This is my favorite kind of communication in a social setting like this: it reads more like anthropology and less like judgment or overgeneralization.
The last two paragraphs, however, show what happens when people start trying to use inductive reasoning, and that part is really hard:
> Therefore I need more time and effort with Gen AI than I needed before because I need to read a lot of code, understand it and ensure it adheres to what mental model I have.
I don't disagree that the above is reasonable to say. But it isn't all, not even most, of what needs to be said. The rate of change is high, and the amount of adaptation required is large. This, in a nutshell, is why asking humans to adapt to AI is going to feel harder and harder. I'm not criticizing people for feeling this. But I am criticizing the one-sided logic people often reach for.
We have a range of options in front of us:
A. sharing our experience with others
B. adapting
C. voting with your feet (cancelling a subscription)
D. building alternatives to compete
E. organizing at various levels to push back
(A) might start by sounding like venting. Done well, it progresses into clearer understanding and hopefully even community building towards action plans: [1]
> Hence Gen AI at this price point which Anthropic offers is a net negative for me because I am not vibe coding, I'm building real software that real humans depend upon and my users deserve better attention and focus from me hence I'll be cancelling my subscription shortly.
The above quote is only valid under some pretty strict (implausible) assumptions: (1) "GenAI" is a valid generalization for what is happening here; (2) the person cannot learn and adapt; (3) the technology won't get better.
[1]: I'm at heart more of a "let's improve the world" kind of person than "I want to build cool stuff" kind of person. This probably causes some disconnect in some interactions here. I think some people primarily have other motives.
Some people cancel their subscriptions and kind of assume "the market and public pushback will solve this". The market's reaction might be too slow or too slight to actually help much. Some people put blind faith into markets helping people on some particular time scales. This level of blind faith reminds me of Parable of the Drowning Man. In particular, markets often send pretty good signals that mean, more or less, "you need to save yourself, I'm just doing my thing." Markets are useful coordinating mechanisms in the aggregate when functioning well. One of the best ways to use them is to say "I don't have enough of a cushion or enough skills to survive what the market is coordinating" so I need a Plan B!
Some people go further and claim markets are moral by virtue of their principles; this becomes moral philosophy, and I think that kind of moral philosophy is usually moral confusion. Broadly speaking, in practice, morality is a complex human aspiration. We probably should not abdicate our moral responsibilities and delegate them to markets, any more than we would say "Don't worry, people who need significant vision correction (or face some other barrier to modern life)... evolution will 'take care' of you."
One subscription cancellation is a start (if you actually have a better alternative, and that alternative is better for the world, which is debatable given the current set of alternatives!)
Talking about it, i.e. here on HN, might be one place to start. But HN is also kind of a "where frustration turns into entertainment, not action" kind of place, unfortunately. Voting is cheap. Karma sometimes feels more like a measure of conformity than of quality thinking. I often feel like I am doing better when I write thoughtfully and still get downvotes; maybe it means I got some people out of their comfort zone.
Here's what I try to do (but fail often): Do the root cause analysis, vent if you need to, and then think about what is needed to really fix it.
[2]: https://en.wikipedia.org/wiki/Parable_of_the_drowning_man
[3]: The first four are:
I write detailed specs. Multifile with example code. In markdown.
Then hand over to Claude Sonnet.
With hard requirements listed, I found out that the generated code missed requirements, had duplicate code or even unnecessary code wrangling data (mapping objects into new objects of narrower types when won't be needed) along with tests that fake and work around to pass.
So turns out that I'm not writing code but I'm reading lots of code.