GolDDranks
an hour ago
I feel like I'm taking crazy pills. The article starts with:
> you give it a simple task. You’re impressed. So you give it a large task. You’re even more impressed.
That has _never_ been the story for me. I've tried, and I've gotten some good pointers and hints about where to go and what to try, a result of LLMs' extensive if shallow reading. But in the sense of concrete problem solving or code/script writing, I'm _always_ disappointed. I've never gotten a satisfactory code/script result from them without a tremendous amount of pushback: "do this part again with ...", do that, don't do that.
Maybe I'm just a crank with too many preferences. But I hardly think so. The minimum requirement should be for the code to work. It often doesn't. Feedback helps, right? But if you've got a problem where a simple, contained feedback loop isn't easy to build, the only source of feedback is yourself. And that's when you are exposed to the stupidity of current AI models.
b33j0r
30 minutes ago
I usually do most of the engineering myself, and it works great for writing the code. I'll say:
> There should be a TaskManager that stores Task objects in a sorted set, with the deadline as the sort key. There should be methods to add a task and pop the current top task. The TaskManager owns the memory when the Task is in the sorted set, and the caller to pop should own it after it is popped. To enforce this, the caller to pop must pass in an allocator and will receive a copy of the Task. The Task will be freed from the sorted set after the pop.
> The payload of the Task should be an object carrying a pointer to a context and a pointer to a function that takes this context as an argument.
> Update the tests and make sure they pass before completing. The test scenarios should relate to the use-case domain of this project, which is home automation (see the readme and nearby tests).
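To make the ownership contract concrete, here's a hand-written sketch of the shape I'm after (C-flavored because of the explicit allocator; the names are illustrative, and a sorted linked list stands in for the sorted set):

    #include <stdlib.h>
    #include <string.h>

    /* Payload: a context pointer plus a function taking that context. */
    typedef struct {
        void *ctx;
        void (*run)(void *ctx);
    } Payload;

    typedef struct Task {
        long deadline;              /* sort key */
        Payload payload;
        struct Task *next;
    } Task;

    typedef struct { Task *head; } TaskManager;  /* earliest deadline first */

    /* The TaskManager owns the Task while it sits in the set. */
    void task_manager_add(TaskManager *tm, long deadline, Payload payload) {
        Task *t = malloc(sizeof *t);             /* error handling elided */
        t->deadline = deadline;
        t->payload = payload;
        Task **p = &tm->head;
        while (*p && (*p)->deadline <= deadline) p = &(*p)->next;
        t->next = *p;
        *p = t;
    }

    /* The caller supplies an allocator and receives an owned copy;
       the set's own Task is freed, so ownership transfers on pop. */
    Task *task_manager_pop(TaskManager *tm, void *(*alloc)(size_t)) {
        Task *top = tm->head;
        if (!top) return NULL;
        Task *copy = alloc(sizeof *copy);        /* error handling elided */
        memcpy(copy, top, sizeof *copy);
        copy->next = NULL;
        tm->head = top->next;
        free(top);
        return copy;
    }

The point is that the ownership rules are pinned down before the model writes anything; a caller would do task_manager_pop(&tm, malloc) and free the copy when done.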
gedy
17 minutes ago
What you're describing makes sense, but that type of prompting is not what people are hyping.
jasondigitized
13 minutes ago
I feel like I am taking crazy pills. I am getting code that works from Opus 4.5. It seems like people are living in two separate worlds.
SCdF
23 minutes ago
I am getting workable code with Claude on a 10 kLOC TypeScript project. I ask it to make plans, then execute them step by step. I have yet to try something larger, or something more obscure.
jasondigitized
11 minutes ago
This. I feel like folks are living in two separate worlds. You need to narrow the aperture and take the LLM through discrete steps. Are people just saying it doesn't work because they are pointing it at 1M LOC monoliths and trying to one-shot a giant epic?
echohack5
30 minutes ago
I have found AI great in a lot of scenarios, but if I have a specific workflow, then the answer is specific and the AI will get it wrong 100% of the time. You have a great point here.
A trivial example is your happy path git workflow. I want:
- Pull main
- Make a new branch in user/feature format
- Commit, always signed with my SSH key
- Push
- Open a PR
but it always will:
- not sign commits
- not pull main
- not know to rebase if changes are in flight
- make a million unnecessary commits
- not squash when making a million unnecessary commits
- have no guardrails when pushing to main (oops!)
- add too many comments
- write commit messages that are too long
- spam the PR comment with hallucinated test plans
- incorrectly attribute itself as co-author in some guerrilla marketing effort (fixable with config, but whyyyyyy -- also this isn't just annoying, it breaks compliance in a lot of places and fundamentally misunderstands the whole point of authorship, which is copyright -- and AIs can't own copyright)
- not make DCO-compliant commits ...
Commit spam is particularly bad for bisect bug hunting, and it creates ref performance issues at scale. Sure, I can enforce squash-and-merge on my repo, but why am I relying on that if the AI is so smart?
All of these things are fixed with aliases / magit / CLI usage, using the thing the way we have always done it.
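For what it's worth, the whole happy path fits in a few lines of config plus one alias. A sketch, assuming git >= 2.34 for SSH signing and the GitHub CLI for the PR step (the key path is whatever yours is):

    [user]
        signingkey = ~/.ssh/id_ed25519.pub
    [gpg]
        format = ssh                # sign with an SSH key instead of GPG
    [commit]
        gpgsign = true              # always sign
    [alias]
        # git feature <name>: update main, then branch off as user/<name>
        feature = "!f() { git checkout main && git pull --rebase && git checkout -b \"$USER/$1\"; }; f"

git push -u origin HEAD and gh pr create cover the last two steps.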
furyofantares
22 minutes ago
> why am I relying on that if the AI is so smart?
Because it's not? I use these things very extensively to great effect, and the idea that you'd think of it as "smart" is alien to me; it seems like it would hurt your ability to get much out of them.
dev_l1x_be
43 minutes ago
Well, one way of solving this is to keep giving it simple tasks.
hmaxwell
32 minutes ago
Exactly, 100%.
I read these comments and articles and feel like I am completely disconnected from most people here. Why not use GenAI the way it actually works best: like autocomplete on steroids? You stay the architect, and you have it write code function by function. Don't show up in Claude Code or Codex asking it to "please write me GTA 6 with no mistakes or you go to jail, please."
It feels like a lot of people are using GenAI wrong.
GolDDranks
41 minutes ago
Just a supplementary fact: I'm in an advantageous position compared to the AI, in that where it's hard to provide that automatic feedback loop, I can run and test the code at my discretion, whereas the AI model can't.
Yet. Most of my criticism comes not after running the code, but after _reading_ it. It wrote code. I read it. And I am not happy with it. No need to even run it; it's shit at a glance.
elevation
25 minutes ago
Over the weekend I generated a for-home-use-only PHP app with a popular CLI LLM product. The app met all my requirements, but the generated code was mixed. It correctly used a prepared query to avoid SQL injection. But then, instead of the obvious:
"SELECT * FROM table WHERE id=1;"
it gave me: $result = $db->query("SELECT * FROM table;");
for ($row in $result)
if ($["id"] == 1)
return $row;
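For reference, the obvious version (assuming PDO, which the $db->query style suggests) would have been something like:

    // parameterized query: filter in the database, not in PHP
    $stmt = $db->prepare("SELECT * FROM table WHERE id = ?");
    $stmt->execute([1]);
    return $stmt->fetch();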
With additional prompting I arrived at code I was comfortable deploying, but this kind of flaw cuts into the total time-savings.
__MatrixMan__
31 minutes ago
You might get better code out of it if you give the AI some more restrictive handcuffs. Spin up a tester instance and have it tell the developer instance to try again until it's happy with the quality.
ReverseCold
38 minutes ago
> I can run and test the code at my discretion, whereas the AI model can't.
It sounds like you know what the problem with your AI workflow is? Have you tried using an agent? (sorry somewhat snarky but… come on)
GolDDranks
34 minutes ago
Yeah, you're right, and the snark might be warranted. I should consider it the same as my stupid (but cute) robot vacuum cleaner that goes in random directions but gets the job done.
The thing that differentiates LLMs from my stupid but cute vacuum cleaner is that the AI model (at least OpenAI's) is cocksure and wrong, which is infinitely more infuriating than being a bit clueless and wrong.
storystarling
17 minutes ago
I've been trying to solve this by wrapping the generation in a LangGraph loop. The hope was that an agent could catch the errors, but it seems to just compound the problem. You end up paying for ten API calls where the model confidently doubles down on the mistake, which gets expensive very quickly for no real gain.
yaur
19 minutes ago
Give Claude Code a go. It still makes a lot of stupid mistakes, but it's a vastly different experience from pasting back and forth with ChatGPT.
tayo42
9 minutes ago
There's no free trial or anything?
t55
30 minutes ago
skill issue
GolDDranks
23 minutes ago
I don't love these kinds of throwaway comments without any substance, but...
"It Is Difficult to Get a Man to Understand Something When His Salary Depends Upon His Not Understanding It"
...might be my issue indeed. Trying to balance it by not being too stubborn though. I'm not doing AI just to be able to dump on them, you know.
antonvs
16 minutes ago
Skill comes from experience. It takes a good amount of working with these models to learn how to use them effectively, when to use them, and what to use them for. Otherwise, you end up hitting their limitations over and over and they just seem useless.
They're certainly not perfect, but many of the issues that people post about as though they're show-stoppers are easily resolved with the right tools and prompting.