sorcercode
5 days ago
> when we are yet to confidently have any model complete a single simple instruction???
i understand the author might be a little frustrated and employing hyperbole here, but are most folks genuinely having similar problems?
at this point I have found that LLMs follow my instructions more often than not. It requires diligent pruning of instructions, effective prompting, and planning. But once you get a sense of how to do those three things, it's possible to fly with these coding agents.
it does get it wrong occasionally, but anecdotally this is like 1/10 in my experience. And interrupting and course-correcting quickly gets me right back on track.
I'm just surprised at the skepticism about the usefulness of these tools in the HN comments. there's plenty of reasons to be worried and upset (cost, job transformation and displacement, etc.), but doubt about the effectiveness of coding agents being such a common theme, in the comments here as well, is surprising to me.
bradfa
5 days ago
My experience has been that if you take the time to explain what the current state is, what your desired state should be, and to give information on how you want the agent to proceed, that then you can work with the agent to craft a plan, refine the plan, and finally execute the plan. In this mode of operation, the current state of the art is quite impressive.
You can't just give it a single sentence and expect it to do something complex correctly. It takes real effort and human time, just like if you were trying to get a smart and capable intern with no real-world experience to do something technical correctly. It's just that the AI agents work significantly faster than a human intern.
bigstrat2003
4 days ago
> My experience has been that if you take the time to explain what the current state is, what your desired state should be, and to give information on how you want the agent to proceed, that then you can work with the agent to craft a plan, refine the plan, and finally execute the plan.
People say this a lot, and I'm not even saying you're wrong. But that isn't useful to me. In the time it takes me to do all that, I can just solve the problem myself. If I have to hold its hand through finding a solution, then it is a time suck, not a time saver.
drcxd
4 days ago
Agreed. I also have to check if it has implemented the idea correctly.
If my workflow is:
1. Write documentation so that the problem, and even the solution to the problem, is well explained.
2. Instruct coding agents to work as the document describes.
3. Check whether its implementation is correct, and improve it if necessary.
then I feel the experience is not as good as implementing the solution myself, and it may even take more time.
bradfa
4 days ago
There are definitely times when it’s not faster to use the tool to do the full job. But sometimes just using the tool to plan the job helps to clarify the task so a human can do it better/faster.
But then there are also tasks where using the tool is a HUGE speed up.
andoando
4 days ago
You can literally write "Add a feature on the UI where we get a live update of new posts using a websocket connection in the backend server at /app/backend."
And it will integrate websockets in your UI and backend, and create the models, service logic, etc. in under 20 seconds. Can you really do that?
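For reference, the server-side half of that kind of prompt is mostly boilerplate. A minimal sketch, assuming a Go backend and the gorilla/websocket package (the endpoint path, the Post type, and the newPosts channel are made up here; the comment doesn't specify the actual stack):

    package main

    import (
        "log"
        "net/http"

        "github.com/gorilla/websocket"
    )

    // Post is a stand-in for whatever the real application's post model is.
    type Post struct {
        ID    int    `json:"id"`
        Title string `json:"title"`
    }

    var upgrader = websocket.Upgrader{
        // Allowing all origins is only for the sake of the sketch.
        CheckOrigin: func(r *http.Request) bool { return true },
    }

    // livePosts upgrades the HTTP connection to a websocket and pushes each
    // new post to the client as JSON until the client disconnects.
    func livePosts(newPosts <-chan Post) http.HandlerFunc {
        return func(w http.ResponseWriter, r *http.Request) {
            conn, err := upgrader.Upgrade(w, r, nil)
            if err != nil {
                log.Println("upgrade:", err)
                return
            }
            defer conn.Close()
            for post := range newPosts {
                if err := conn.WriteJSON(post); err != nil {
                    return // client disconnected
                }
            }
        }
    }

    func main() {
        newPosts := make(chan Post) // fed elsewhere by the application
        http.HandleFunc("/ws/posts", livePosts(newPosts))
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

The UI half would just open a WebSocket to that endpoint and append each incoming post; generating both halves plus the models is exactly the kind of well-trodden glue these agents are fast at.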
mcv
4 days ago
That's boilerplate stuff. That's what it's best at. But the moment I want something slightly off the beaten path (and I always do), it struggles and makes mistakes.
credit_guy
3 days ago
Look at it differently: getting to the point where AI is a productivity multiplier and not a productivity sink is hard. It takes a lot of work. Or rather, a lot of experimentation. You will generate some AI slop, you will annoy some people, you will embarrass yourself, you will retrace your steps, etc, etc. It is hard. Ok, let me say that again. It is hard. It is not easy. It is a tool with a steep learning curve.
It is ok to decide not to use such a tool. But you will be left behind.
qazxcvbnmlp
5 days ago
> My experience has been that if you take the time to explain what the current state is, what your desired state should be, and to give information on how you want the agent to proceed,
I have a pet theory:
1. This skill requires a strong theory of mind [1].
2. Theory of mind is more difficult in those with autism.
3. The same autism that makes people really good at coding, and gives them the time to post on online forums like HN, makes it hard to understand how to work with LLMs and how others work with LLMs.
To provide good context to the LLM you need to have a good understanding of (1) what it will and will not know, (2) what you know and take for granted (i.e. a theory of your own mind), and (3) what your expectations are. None of this is needed when you are coding on your own, but all of it is critical to getting a good response from the LLM.
See also the black and white thinking that is common in the responses on articles like this.[2]
[1] https://en.wikipedia.org/wiki/Theory_of_mind
[2] https://www.simplypsychology.org/black-and-white-thinking-in...
saulpw
4 days ago
An LLM has no mind! What is your strong theory of mind for an LLM? That it knows the whole internet and can regurgitate it like a mindless zombie?
xg15
4 days ago
Whether or not it has a mind is irrelevant to the problem. I think the point is, if you pretend it had a mind and write your prompt accordingly, you will get the best results.
kelseyfrog
4 days ago
Source?
It sounds more like an unprovable metaphysical statement than something that is supported by scientific evidence.
fao_
4 days ago
The burden of proof is on people stating that an AI has a theory of mind, not the reverse. Until recently it was highly debated whether dogs have a theory of mind, and it took decades of evidence to come to the conclusion that yes, they do.
judahmeek
4 days ago
GGP didn't say that AI has a theory of mind. GGP said that using AI productively requires a theory of mind, a.k.a. being able to build a mental model of the LLM's context.
kelseyfrog
4 days ago
The burden of proof is on the person making the claim. It doesn't matter whether the claim is positive or negative. The default position is "We don't know if AI has a ToM."
sirtaj
4 days ago
Am I incorrect in thinking this is as much true of the linux kernel or emacs as it is of an LLM?
rmwaite
3 days ago
If you read carefully you will see that they never said AI has a theory of mind.
wild_egg
4 days ago
This actually makes a disturbing amount of sense and I think I'm going to need to chew on it for a while. Thanks for sharing!
bigchillin
4 days ago
That simply psych article is a psyop
imtringued
4 days ago
That HN username is also bad news. Meanwhile yours is pretty cool. I really enjoyed the social credit memes with John Cena.
How exactly do you come up with a pet theory out of nowhere, randomly diagnose people on the internet with autism based on how they use LLMs, and then start linking to a most likely AI-generated blog post (there was simply too much repetition) that ascribes a lot of negative attributes to them, all with a username that is meant to be unrecognisable?
The post is basically a Kafka trap or engagement bait.
theshrike79
4 days ago
People don't understand that LLMs aren't humans. There's a lot of implicit context when humans are communicating. LLMs don't do that.
They do have biases, like if you tell them to do something with data, they'll pretty likely grab Python as the tool.
And different models have different biases and styles, you can try to guide them to your specific style with prompts, but it doesn't always work - depending on how esoteric your personal style is.
bradfa
4 days ago
Imagining the tool is like a college intern helps me. It has no idea how the real world works. It blindly follows things it previously found online. It's great at very common boilerplate coding tasks. But it's super naive and will need hand-holding, or for you to provide a huge amount of context, so it can operate on its own.
I’m still very much learning how to give it good instructions to accomplish tasks. Different tasks require different types and methods of instruction. It’s extremely interesting to me. I’m far from an expert.
theshrike79
4 days ago
I imagine LLMs as an endless stream of consultants, each can work only one day (context).
Every day you need to bring them up to speed (prompt, accessible documentation) and give them the task of the day. If it looks like they can't finish the task (context runs out), you need to tell them to write down where they left (store context to a memory, markdown file is fine) and kick them out the door.
Then GOTO 10, get the next one in.
satvikpendem
4 days ago
> that then you can work with the agent to craft a plan, refine the plan, and finally execute the plan
And Cursor just introduced a separate plan mode themselves, so it gets even better.
stpedgwdgfhgdd
4 days ago
The person in the Cursor platform is raising a different question and a valid one. We have tons of these frameworks out there, openspec, amplifier, etc. The ultimate dream is to have these subagents work in the background autonomously.
However, reality tells us that you constantly have to keep Claude on the right track. Nudge here, nudge there. Close code reviews. Test, test more. Very interactive. Superpowers to the engineer.
It is this contradiction that also makes me believe it will take another year for agents to work on enterprise codebases autonomously. Maybe more; look at autonomous self-driving, where the last 10% is surprisingly hard.
mfdupuis
3 days ago
I think this is the challenge and the dissonance. For something to truly run autonomously you need to provide it with so many constraints that it almost loses its usefulness. I've tried using AI, or at least looked into what I could use AI for to automate marketing tasks, and I just don't think I can seriously set up a workflow in n8n or AgentKit that would produce sufficiently good results without me jumping in. That said, AI is incredibly helpful in this semi-autonomous mode with the right parameters, to the point of the parent comment.
sharts
4 days ago
Moreover they’re not even that great as a search tool. Often just giving incorrect or outdated synthesized results. Marginally better than a raw google search because I can skip all the sponsored/SEO hack results with garbage info.
hattmall
4 days ago
Could you, or someone, make a video showing a real-life coding scenario where an agentic system or just an LLM provided significant value, beyond what one would get (or could previously have gotten) from a Google search or Stack Overflow post?
pas
4 days ago
I asked it to generate a simple EffectTS script and it did. (It was my first piece of code using Effect, but I know a bit of Scala ZIO, so it was helpful to get started.)
Effect's docs are quite sparse/terse.
And this is usually what I've found: it's good when I have a blank page. (Adding new functionality to a legacy system, for example.)
But I get upset in no time when I can't really give feedback to it. The only options are accepting/rejecting the diff or getting stuck editing manually, but then it's in some strange limbo, so IDE tools usually don't work well, and you can't chat with it about the diff, only about what it already thinks is in the files.
(And while this seems like "just" a tooling issue, fundamentally the whole frenzy soured me, I tried Cursor/Windsurf/ClaudeCode and now I'm out of fucks to give ... even though chatting about solutions and writing Markdown docs would be great for everyone, for the projects I work on, etc.)
geldedus
4 days ago
no I won't. The more people think agentic-AI-assisted programming doesn't work, the bigger the advantage I have over them.
theshrike79
4 days ago
I just literally sat down at my computer after walking the dogs.
During my walk Codex (gpt-5-high) rewrote a Python application in Go in one shot. I sat down, tested it, and it works. Except now I can just distribute a single binary instead of a virtualenv mess =)
Maybe the instruction was SUPER simple? Dunno.
bn-l
4 days ago
Maybe the code and task were super simple. Maybe the whole rewrite was already in its training data (enough) in some way.
theshrike79
4 days ago
Naturally it wasn't breaking new ground in computer science :)
It was a "script" (application? the line is vague) that reads my Obsidian movie/anime/TV watchlist markdown files from my vault, grabs the title, and searches for that title in Themoviedb.
If there are multiple matches, it displays a dialog to pick the correct one.
Then it fills the relevant front matter, grabs a cover for Obsidian Bases and adds some extra info on the page for that item.
The Python version worked just fine, but I wanted to share the tool and I just can't be arsed to figure that out with Python. With Go it's a single binary.
Without LLM assistance I could've easily spent a few nights doing this, and most likely would've just not done it.
Now it was the effort of me giving the LLM a git worktree to safely go wild in (I specifically said that it can freely delete anything to clean up old crap) and that's what it did.
And this isn't the first Python -> Go transition I've done, I did the same for a bunch of small utility scripts when GPT3/3.5 was the new hotness. Wasn't as smooth then (many a library and API was hallucinated), but still markedly faster than doing it by hand.
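For a sense of scale, the lookup step at the core of a tool like that is small. A rough sketch, assuming TMDb's v3 /search/movie endpoint; the field selection, the example title, and the TMDB_API_KEY environment variable are illustrative guesses rather than details of the actual tool:

    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
        "net/url"
        "os"
    )

    // searchResult holds only the fields we care about from TMDb's /search/movie response.
    type searchResult struct {
        Results []struct {
            ID          int    `json:"id"`
            Title       string `json:"title"`
            ReleaseDate string `json:"release_date"`
            PosterPath  string `json:"poster_path"`
        } `json:"results"`
    }

    // searchTitle queries TMDb for a title and returns the candidate matches.
    func searchTitle(apiKey, title string) (*searchResult, error) {
        u := "https://api.themoviedb.org/3/search/movie?api_key=" + url.QueryEscape(apiKey) +
            "&query=" + url.QueryEscape(title)
        resp, err := http.Get(u)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()

        var res searchResult
        if err := json.NewDecoder(resp.Body).Decode(&res); err != nil {
            return nil, err
        }
        return &res, nil
    }

    func main() {
        res, err := searchTitle(os.Getenv("TMDB_API_KEY"), "Perfect Blue")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        // List the candidates; the real tool shows a picker, then writes the chosen
        // match back into the note's front matter along with a cover image.
        for _, m := range res.Results {
            fmt.Printf("%d\t%s (%s)\n", m.ID, m.Title, m.ReleaseDate)
        }
    }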
hattmall
3 days ago
Now, that's actually a very reasonable usage, because it isn't coding, it's translation. The counter to that being beneficial is that many code translation programs already exist, and they can certainly be built in a way that guarantees security, proper practices, etc., which you can't guarantee with an LLM; a specialized program is also orders of magnitude less resource intensive.
Of course it's nice as a hobbyist end user to do exactly what you did for a simple script, and that's to the credit of the LLM. The over-arching issue is that this extremely inefficient process is only possible thanks to subsidization from venture capital.
theshrike79
3 days ago
I personally prefer Go with LLMs because it has a relatively large amount of analyzers and other tooling to statically check that there are no major issues with the code.
Also, the compiler being a stickler for unused code etc. keeps the agentic models in check; they can't YOLO stuff as hard as in, say, Python.
nasmorn
4 days ago
Rewriting something in another language is super simple. The original code has all the context. Even simple for a human, but it takes time. I literally did the same thing recently and had Claude rewrite a simple Python lib to Elixir so I don't need to run Python somewhere. It wasn't perfect actually, but it was very easy to fix the issues.
YouAreWRONGtoo
4 days ago
Millions of compilers have been written and those languages have simple grammars. As such, those are trivial problems.
theshrike79
3 days ago
Trivial with an LLM, yes, but in reality nobody will bother with a full rewrite in another language just for fun if the original program works but is a bit annoying to run.
This way it's pretty close to zero effort.
Yondle
3 days ago
I've had the exact same reaction, and figured that devs who like to say AI is useless either are trying to make themselves feel smarter by claiming that what they are working on is beyond anything AI has been trained on, or aren't able to effectively break down the problems they have into smaller, digestible chunks.
true_religion
5 days ago
I'm not sure what people are counting as instructions, but it talks about the topics that I tell it to talk about, and when working with prose it will take specific instructions as to word choice or tone.
This is true across Gemini, ChatGPT, and Qwen.
frogperson
4 days ago
That just sounds like probabilistic programming to me. Call me old fashioned, but I still prefer my code to be deterministic.
thfuran
4 days ago
"More often than not" is zero 9s.
mcv
4 days ago
More often than not is not good enough for me. But worse than not following instructions is simply being wrong, and then doubling down on it.
This is something that happens to me a lot. When I ask it to do something moderately complex, or to analyse an error I get, quite often it leaps to the wrong conclusion and keeps doubling down on it when it doesn't work, until I tell it what the actual problem is.
It's great with simple boilerplate code, and I have actually used it to implement a new library successfully with a little bit of feedback from me, but it gets stuff wrong so often that I really don't need it doing anything beyond that. Although when I'm stuck, I still use it to spew ideas. Even if they're wrong, they can help me get going.
sorcercode
4 days ago
i'm genuinely curious - what do your prompts look like? do you have agent instructions for your repo that you've spent a little time pruning or maintaining? when you execute tasks are you planning these out or one-shotting them?
i ask because your comment sounds like, more often than not, you're actually getting *worse* results? (if i'm reading that right). and i want to understand: is it a perception problem (like do you just have a higher bar for what you expect from a coding agent), or is it actually producing bad results, in which case i want to understand what's different in the ways we're using these agents.
(also can you provide a concrete example of how it got something wrong? like what were you asking in that moment and what did it do, and how did it double down.)
mcv
4 days ago
A few weeks ago, I was testing the difference between two libraries to draw graphs: cytoscape and NVL. I started with NVL, told it to implement a very basic case: just some simple mock data, and draw the thing on the screen. (Those weren't my prompts; I didn't save those.) I also took the advice of using a context prompt file to lay down coding standards.
It didn't get it right in one go, but after 7 attempts of analysing the error, it suddenly worked, and I was quite amazed. Step by step adding more features was terrible, however; it kept changing the names of functions and properties, and it turned out that was because I was asking for features the library didn't support, or at least didn't document. I ended up rejecting NVL because it was very immature, poorly documented, and there weren't many code examples available. I suspect that's what crippled the AI's ability to use it. But at no point did it tell me I was asking the impossible; it happily kept trying stuff and inventing nonexistent APIs until I was the one to discover it was impossible.
I'm currently using it to connect a neo4j backend to a cytoscape frontend; both of those are well established and well documented, but it still struggles to get details right. It can whip up something that almost works really quickly, but then I spend a lot of time hunting down little errors it made, and they often turn out to be based on a lack of understanding of how these systems work. Just yesterday it offered 4 different approaches to solve a problem, one of which blatantly didn't work, one used a library that didn't exist, one used a library that didn't do what I needed, and only one was a step in the right direction but still not quite correct in its handling of certain node properties.
I'm using Claude 3.7 thinking and Claude 4 for this.