hackernews client

muzani

3 months ago

Cursor has been doing this in production since the start of 2025. You give it instructions or heck, a screenshot of a bug. It searches for relevant code based on the problem. It searches in an area around the code, for tests, behaviors. If you've written a comment pointing to Jira on a certain bug that this code is responsible for fixing (instead of writing a test), it can check that ticket for expected behavior. It may write tests to fill in these gaps or it may just write the code. Then it runs the tests if possible. If a test fails, it compares to the new code. It writes new code and reruns tests.

Tool-use is common in most of the major AI models now and it's really the differentiator between how they perform when writing code. Few write correct code the first time. What makes them different is the ability to read and modify complex code across multiple files, without being told which files.

I think by next year, we could see this extend across the UI domain - it writes code, runs it, views the UI, critiques the results, then tweaks things like font and whitespace. I did a prototype mid-year which would even show it to a user, and it would talk them through what they liked or didn't like. But you can even chain it between multiple LLMs (designer, programmer, customer roles) and it would fit your definition.

spacemnstr42069

3 months ago

Agreed, but as mentioned in the original post, I am looking for use cases out of coding agents. Coding agents are scaling, yes! What are other use cases?

scosman

3 months ago

No they work fine. Not a panacea, but when the use case permits.

Usually layers of tools clustered under sub-agents, and fairly detailed orchestration prompts at higher levels. Orchestration via agent-prompts can be better than hard-coded workflows when they require qualitative assessments.

spacemnstr42069

3 months ago

any example of such agent you see?

iamflimflam1

3 months ago

A lot of things are happening behind closed doors.

The real value is in horribly manual internal processes where the solutions are agents driving very specific tools that drive weird and wacky systems.

Generic out of the box agents that will solve your particular problem are not a thing yet.

In regard to it’s just an “API pipeline” - the power of agents should be - which set of API calls do I string together to solve a user’s request.

Ask HN: Are Agents Just Hype?

8 Comments

muzani

spacemnstr42069

scosman

spacemnstr42069

iamflimflam1

spacemnstr42069

AIinyourAI

spacemnstr42069