simonw
a day ago
Using coding agents to track down the root cause of bugs like this works really well:
> Three out of three one-shot debugging hits with no help is extremely impressive. Importantly, there is no need to trust the LLM or review its output when its job is just saving me an hour or two by telling me where the bug is, for me to reason about it and fix it.
The approach described here could also be a good way for LLM-skeptics to start exploring how these tools can help them without feeling like they're cheating, ripping off the work of everyone whose code was used to train the model, or taking away the most fun part of their job (writing code).
Have the coding agents do the work of digging around hunting down those frustratingly difficult bugs - don't have them write code on your behalf.
rtpg
a day ago
I understand the pitch here ("it finds bugs! it's basically all upside because worst case there's no output anyways"), but I'm finding some of these agents to be ... uhhh... kind of aggressive at trying to find the solution, and they end up missing the forest for the trees. And there's some "oh you should fix this" stuff which, while sometimes not _wrong_, is completely beside the point.
The end result is these robots doing bikeshedding. When paired with junior engineers looking at this output and deciding to act on it, it just generates busywork. It doesn't help that everyone and their dog wants to automatically run their agent against PRs now.
I'm trying to use these to some extent when I find myself in a canonical situation where they should work, and in many cases I'm not getting the value everyone else seems to get. Very much the "explaining a thing to a junior engineer takes more time than doing it myself" experience, except at least the junior is a person.
joshvm
a day ago
When models start to forage around in the weeds, it's a good idea to restart the session and add more information to the prompt about what it should ignore or assume. For example, in ML projects Claude gets very worried that datasets aren't available or are perhaps responsible. Usually if you tell it where you suspect the bug to be (or straight up tell it where it is, even if you're unsure) it will focus on that. Or, make it give you a list of concerns and ask you which are valid.
I've found that having local clones of large library repos (or telling it to look in the environment for packages) is far more effective than relying on built-in knowledge or lousy web search. It can also use ast-grep on those. For some reason the agent frameworks are still terrible about looking up references in a sane way (where in an IDE you would simply go to declaration).
theshrike79
17 hours ago
Context7 MCP is the one I keep enabled for all sessions. Then there are MCPs that give LSP access to the models as well as tools like Crush[0] that have LSPs built in.
embedding-shape
12 hours ago
Yeah, I do the same: clone reference repos into known paths and tell it to look there if it's unsure.
Codex mostly handles this by itself. I've had it go searching my cargo cache for Rust source files, and when I used a crate via git instead of crates.io, it went ahead and cloned the repo to /tmp to inspect it properly. Claude Code seems less likely to do that unless you prompt it to; Codex has done it by itself so far.
Wowfunhappy
a day ago
Sometimes you hit a wall where something is simply outside of the LLM's ability to handle, and it's best to give up and do it yourself. Knowing when to give up may be the hardest part of coding with LLMs.
Notably, these walls are never where I expect them to be—despite my best efforts, I can't find any sort of pattern. LLMs can find really tricky bugs and get completely stuck on relatively simple ones.
ori_b
a day ago
Doing it yourself is how you build and maintain the muscles to do it yourself. If you only do it yourself when the LLM fails, how will you maintain those muscles?
Wowfunhappy
a day ago
I agree, and I can actively feel myself slipping (and perhaps more critically, not learning new skills I would otherwise have been forced to learn). It's a big problem, but somewhat orthogonal to "what is the quickest way to solve the task currently in front of me."
kryogen1c
16 hours ago
> but somewhat orthogonal to "what is the quickest way to solve the task currently in front of me."
That depends on if you ignore the future. You are never just solving the problem in front of you; you should always act in a way that propagates positivity forward in time.
dcow
6 hours ago
Some jobs require investment in the future. Some do not. That's just reality. Not quite how I feel about it personally, but I think there is a fair amount of the developer trade that is operational.
ori_b
20 hours ago
Which needs to be balanced with "How do I maintain my ability to keep solving tasks quickly?"
AbstractH24
11 hours ago
The thing I struggle with is that it feels hard to lock in on which skill to learn properly, with so much changing so quickly and it becoming easy to learn things superficially.
RA_Fisher
12 hours ago
By moving up a level in the abstraction hierarchy, similar to moving from Assembly to C++ to Python (to LLM). There's speed in delegation (and in checking, where that's beneficial).
ThrowawayR2
11 hours ago
Moving up abstraction layers really only succeeds with a solid working knowledge of the lower layers. Otherwise, you're just flying blind, operating on faith. A common source of bugs is precisely a result of developers failing to understand the limits of the abstractions they are using.
RA_Fisher
10 hours ago
We only need to do that when it’s practical for the task at hand. Some tasks are life-and-death, but many have much lower stakes.
AbstractH24
11 hours ago
So we can all only succeed if we know how CPUs handle individual instructions?
Wowfunhappy
10 hours ago
I'm not sure whether I agree with GP, but I think you may be misinterpreting their point. I can have an understanding of CPUs in general without knowing individual instructions, and I do think knowing about things like CPU cache is useful even when writing e.g. Python.
jama211
9 hours ago
Sure, but the comment's worry about no longer "flexing your muscles" is neatly countered by moving up an abstraction layer, then: you don't have to constantly get into the weeds of coding to maintain an understanding _in general_, without knowing individual instructions.
AbstractH24
10 hours ago
I see what you’re getting at and it makes sense.
Goes to the larger idea that strategy and logic are important for scalability and long-term success, not just execution. Something LLMs often miss (mostly because people fail to communicate it to them).
RA_Fisher
10 hours ago
Yes, for sure! And being able to orchestrate AI to use that knowledge provides leverage for fulfilling tasks.
Eventually, yes, I think we'll delegate to AI in more and more complete ways, but it's a process that takes some time.
monocasa
5 hours ago
There's generally a pretty quick falloff in how much help knowledge of each layer below you provides as you go deeper.
That being said, if you're writing in C, having a pretty good idea of how a cpu generally executes instructions is pretty key to success I'd say.
Klathmon
a day ago
If the LLM is able to handle it why do you need to maintain those specific skills?
ribosometronome
a day ago
Should we not teach kids math because calculators can handle it?
Practically, though, how would someone become good at just the skills LLMs don't do well? Much of this discussion is about how that's difficult to predict, but even if you were a reliable judge of what sort of coding tasks LLMs would fail at, I'm not sure it's possible to only be good at that without being competent at it all.
jjmarr
a day ago
> Should we not teach kids math because calculators can handle it?
We don't teach kids how to use an abacus or a slide rule. But we teach positional representations and logarithms.
The goal is to teach the theoretical concepts so you can learn the required skills if necessary. The same will occur with code.
You don't need to memorize the syntax to write a for loop or for each loop, but you should understand when you might use either and be able to look up how to write one in a given language.
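To make that concrete, here's a trivial Go sketch of the two loop shapes; the syntax is exactly the part you can always look up:

    package main

    import "fmt"

    func main() {
        nums := []int{3, 1, 4}

        // Classic counted for loop: reach for it when you need the index
        // itself or want to control the iteration (step, bounds, etc.).
        for i := 0; i < len(nums); i++ {
            fmt.Println(i, nums[i])
        }

        // for-range ("for each"): reach for it when you just want every
        // element and don't care about managing the index yourself.
        for i, n := range nums {
            fmt.Println(i, n)
        }
    }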
ori_b
20 hours ago
Huh. I was taught how to use both an abacus and a slide rule as a kid, in the 90s.
Klathmon
a day ago
Should you never use a calculator because you want to keep your math skills high?
There are a growing set of problems which feel like using a calculator for basic math to me.
But also school is a whole other thing which I'm much more worried about with LLMs. Because there's no doubt in my mind I would have abused AI every chance I got if it were around when I was a kid, and I wouldn't have learned a damn thing.
ori_b
20 hours ago
I don't use calculators for most math because punching it in is slower than doing it in my head -- especially for Fermi calculations. I will reach for a calculator when it makes sense, but because I don't use a calculator for everything, the number of places where I'm faster than a calculator grows over time. It's not particularly intentional, it just shook out that way.
And I hated mental math exercises as a kid.
johnisgood
17 hours ago
I do not trust myself, so even if I know how to do mental math, I still use my computer or a calculator just to be sure I got it correct. OCD? Lack of self-trust? No clue.
Wowfunhappy
a day ago
> I'm not sure it's possible to only be good at that without being competent at it all.
This is, in fact, why we teach kids math that calculators could handle!
rtpg
21 hours ago
Sure, I agree with the "levels of automation" thought process. But I'm basically experiencing this from the start.
If at the first step I'm already dealing with a robot in the weeds, I will have to spend time getting it out of the weeds, all for uncertain results afterwards.
Now sometimes things are hard and tricky, and you might still save time... but just on an emotional level, it's unsatisfying
bontaq
7 hours ago
I would say a lot of people are only posting their positive experiences. Stating negative things about AI is mildly career-dangerous at the moment, whereas the opposite looks good. I found the results from using it on a complicated code base to be similar to yours, though it is very good at slapping things on until something works.
If you're not watching it like a hawk it will solve a problem in a way that is inconsistent and, importantly, not integrated into the system. Which makes sense, it's been trained to generate code, and it will.
embedding-shape
12 hours ago
> I understand the pitch here ("it finds bugs! it's basically all upside because worst case there's no output anyways"), but I'm finding some of these agents to be ... uhhh... kind of aggressive at trying to find the solution, and they end up missing the forest for the trees. And there's some "oh you should fix this" stuff which, while sometimes not _wrong_, is completely beside the point.
How long/big do your system/developer/user prompts end up being typically?
The times people seem to be getting "less than ideal" responses from LLMs tend to be when they're not spending enough time setting up a general prompt they can reuse, describing exactly what they want and do not want.
So in your case, you need to steer it to do less outside of what you've told it. Adding things like "Don't do anything outside of what I've just told you" or "Focus only on the things inside <step>", for example, would fix those particular problems, as long as you're not using models that are less good at following instructions (some of Google's models are borderline impossible to stop from adding comments all over the place, as one example).
So prompt it to not care about solutions and only care about finding the root cause, and you'll find that you can mostly avoid the annoying parts by either prescribing what you want instead or just straight up telling it not to do those things.
Then you iterate on this reusable prompt across projects, and it builds up so that eventually, 99% of the time, the models do exactly what you expect.
solumunus
19 hours ago
Communication with a person is more difficult and the feedback loop is much, much longer. I can almost instantly tell whether Claude has understood the mission or digested context correctly.
j2kun
a day ago
> except at least the junior is a person.
+1 Juniors can learn over time.
MattGaiser
a day ago
Just ask it to prioritize the top ones for your review. Yes, they can bikeshed, but because they don’t have egos, they don’t stick to it.
Alternatively, if it is in an area with good test coverage, let it go fix the minor stuff.
rtpg
21 hours ago
I don't like their fixes, so now I'm dealing with imperfect fixes to problems I don't care about. Tedium
SV_BubbleTime
a day ago
Ok, fair critique.
EXCEPT…
What did you have for AI three years ago? Jack fucking shit is what.
Why is “wow that’s cool, I wonder what it’ll turn into” a forbidden phrase, but “there are clearly no experts on this topic but let me take a crack at it!!” important for everyone to comment on?
One word: Standby. Maybe that’s two words.
j2kun
a day ago
Careful there, ChatGPT was initially released November 30, 2022, which was just about 3 years ago, and there were coding assistants before that.
If you find yourself saying the same thing every year and adding 1 to the total...
advael
a day ago
With all due respect, "wow this is cool, I wonder what it'll turn into" is basically the mandatory baseline stance to take. I'm lucky that's where I'm still basically at, because anyone in a technical position who shows even mild reticence beyond that is likely to be unable to hold a job in the face of their bosses' frothing enthusiastic optimism about these technologies
dns_snek
19 hours ago
Is it that bad out there? Yeah, I don't think I could last in a job that tries to force these tools into my workflow.
bravetraveler
16 hours ago
Drive-by comment: it's not so bad, here. I work with a few peers who've proven to be evangelists with much stronger social skills. When the proposition comes up, I ask how my ass should be cleaned, too. Thankfully: the bosses haven't heard/don't care.
Varying degrees of 'force' at play; I'm lucky that nobody significant is minding my [absence of] LLM usage. Just some peers excited to do more for the same or, arguably, less reward. Remember: we're now all in an arms race. Some of us had a head start.
How crassly I respond to the suggestion depends on its delivery/relevance to my process/product, of course. They may be placated like a child with a new toy... or get the gross question that, hopefully, expresses that the suggestion isn't wanted, needed, or welcome.
Faced with a real mandate, I'd feed it garbage while looking for new work. Willing to make the bet I can beat enough machines while people are still involved at all.
advael
5 hours ago
You can get pretty far by mostly just claiming to use it "when it makes sense" but you do meet people who are very pushy about it. Hoping that calms down as knowledge of the downsides becomes more widespread
adastra22
a day ago
So you feed the output into another LLM call to re-evaluate and assess, until the number of actual reports is small enough to be manageable. Will this result in false negatives? Almost certainly. But what does come out the end of it has a higher prior for being relevant, and you just review what you can.
Again, worst case all you wasted was your time, and now you've bounded that.
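Rough sketch of the two-pass idea in Go; askLLM is a made-up stand-in for whatever model client you use, so treat this as pseudocode for the shape of the pipeline rather than a real API:

    package triage

    import "strings"

    // Two-pass filter (illustrative only): the first call generates candidate
    // findings, the second call re-evaluates each one, and only findings the
    // second pass keeps are surfaced for human review.
    func triageFindings(askLLM func(prompt string) string, diff string) []string {
        raw := askLLM("List possible bugs in this diff, one per line:\n" + diff)

        var kept []string
        for _, finding := range strings.Split(raw, "\n") {
            if strings.TrimSpace(finding) == "" {
                continue
            }
            verdict := askLLM("Is this finding likely a real, relevant bug? " +
                "Answer YES or NO only.\nFinding: " + finding)
            if strings.HasPrefix(strings.TrimSpace(verdict), "YES") {
                kept = append(kept, finding) // bounded set for human review
            }
        }
        return kept
    }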
majormajor
a day ago
They're quite good at algorithm bugs, a lot less good at concurrency bugs, IME. Which is very valuable still, just that's where I've seen the limits so far.
They're also better at making tests for algorithmic things than for concurrency situations, but can get pretty close. They just usually don't have great out-of-the-box ideas for "how to ensure these two different things run in the desired order."
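The kind of scaffolding I usually end up spelling out for them looks something like this (simplified Go sketch, names made up): a channel acts as the ordering checkpoint so the test can't observe step B before step A finishes.

    package example

    import "testing"

    // Simplified sketch: force "A happens before B" in a test by gating
    // step B on a channel that step A closes when it finishes.
    func TestWriteHappensBeforeRead(t *testing.T) {
        var value int
        writeDone := make(chan struct{})

        go func() {
            value = 42       // step A: the write under test
            close(writeDone) // signal that A has finished
        }()

        <-writeDone // step B may only start once A has completed
        if value != 42 {
            t.Fatalf("read saw %d before the write completed, want 42", value)
        }
    }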
Everything that I dislike about generating non-greenfield code with LLMs isn't relevant to the "make tests" or "debug something" usage. (Weird/bad choices about when to duplicate code vs refactor things, lack of awareness around desired "shape" of codebase for long-term maintainability, limited depth of search for impact/related existing stuff sometimes, running off the rails and doing almost-but-not-quite stuff that ends up entirely the wrong thing.)
bongodongobob
a day ago
Well if you know it's wrong, tell it, and why. I don't get the expectation for one shotting everything 100% of the time. It's no different than bouncing ideas off a colleague.
majormajor
a day ago
I don't care about one-shotting; the stuff it's bad for debugging at is the stuff where even when you tell it "that's not it" it just makes up another plausible-but-wrong idea.
For code modifications in a large codebase the problem with multi-shot is that it doesn't take too many iterations before I've spent more time on it. At least for tasks where I'm trying to be lazy or save time.
Klathmon
a day ago
> For code modifications in a large codebase the problem with multi-shot is that it doesn't take too many iterations before I've spent more time on it.
I've found voice input to completely change the balance there.
For stuff that isn't urgent, I can just fire off a hosted codex job by saying what I want done out loud. It's not super often that it completely nails it, but it almost always helps give me some info on where the relevant files might be and a first pass on the change.
Plus it has the nice side effect of being a todo list of quick stuff that I didn't want to get distracted by while working on something else, and often helps me gather my thoughts on a topic.
It's turned out to be a shockingly good workflow for me
nicklaf
a day ago
It's painfully apparent when you've reached the limitations of an LLM on a problem it's ill-suited for (like a concurrency bug), because it will just keep spitting out nonsense, eventually going in circles or going totally off the rails.
solumunus
19 hours ago
And then one jumps in and solves the problem themself, like they’ve done for their entire career. Or maybe one hasn’t done that, and that’s who we hear complain so much? I’m not talking about you specifically, just in general.
ewoodrich
a day ago
The weak points raised by the parent comment are specifically examples where the problem exists outside the model's "peripheral vision" from its context window and, speaking from personal experience, aren't as simple as adding a line to the CLAUDE.md saying "do this / don't do this".
I agree that the popular "one shot at all costs / end the chat at the first whiff of a mistake" advice is much too reductive, but unlike with a colleague, after putting in all that effort into developing a shared mental model of the desired outcome, you reach the max context and then all that nuanced understanding instantly evaporates. You then have to hope the lossy compression into text instructions will actually steer it where you want next time, but from experience that unfortunately is far from certain.
hitarpetar
7 hours ago
except it's not a colleague, it's not capable of ideation, it's taking your words and generating new ones based on them. which can maybe be useful sometimes but, yeah, not really the same as bouncing ideas off a colleague
lxgr
a day ago
I’ve been pretty impressed with LLMs at (to me) greenfield hobby projects, but not so much at work in a huge codebase.
After reading one of your blog posts recommending it, I decided to specifically give them a try as bug hunters/codebase explainers instead, and I've been blown away. Several hard-to-spot production bugs tracked down in two weeks or so, each of which would have taken me at least a few focused hours to spot on my own.
mschulkind
a day ago
One of my favorite ways to use LLM agents for coding is to have them write extensive documentation on whatever I'm about to dig in coding on. Pretty low stakes if the LLM makes a few mistakes. It's perhaps even a better place to start for skeptics.
manquer
a day ago
I am not so sure. Good documentation is hard; MDN and PostgreSQL are excellent examples of docs done well, and of how valuable really well-written content can be for a project.
LLMs can generate content but not really write. Out of the box they tend to be quite verbose and generate a lot of proforma content. Perhaps with the right kind of prompts, a lot of editing, and reviews you can get them to be good, but at that point it is almost the same as writing it yourself.
It is a hard choice between lower quality documentation (AI slop?) and leaving things lightly or fully undocumented. The uncanny valley of precision in documentation may be acceptable in some contexts, but it can be dangerous in others, and it is harder to differentiate because depth of documentation means nothing now.
Over time we find ourselves skipping LLM-generated documentation just like any other AI slop. The value/emphasis placed on reading documentation erodes, finding good documentation becomes harder (like other online content today), and it gets devalued.
medvezhenok
a day ago
Sure, but LLMs tend to be better at navigating around documentation (or source code when no documentation exists). In agentic mode, they can get me to the right part of the documentation (or the right part of the source code, especially in unfamiliar codebases) much quicker than I could do it myself without help.
And I find that even the auto-generated stuff tends to sit at least a bit higher in level of abstraction than staring at the code itself, and helps you more like a "sparknotes" version of the code, so that when you dig in yourself you have an outline/roadmap.
heavyset_go
20 hours ago
I felt this way as well, then I tried paid models against a well-defined and documented protocol that should not only exist in its training set, but was also provided as context. There wasn't a model that wouldn't hallucinate small, but important, details. Status codes, methods, data types, you name it, it would make something up in ways that forced you to cross reference the documentation anyway.
Even worse, the model of the space it describes that you let it build in your head can lead to chains of incorrect reasoning that waste time and make debugging Sisyphean.
Like there is some value there, but I wonder how much of it is just (my own) feelings, and whether I'm correctly accounting for the fact that I'm being confidently lied to by a damn computer on a regular basis.
embedding-shape
12 hours ago
> the fact that I'm being confidently lied to by a damn computer on a regular basis
Many of us who grew up being young and naive on the internet in the 90s/early 00s kind of learnt not to trust what strangers tell us online. I'm pretty sure my first "Press ALT+F4 to enter noclip" from a multiplayer lobby set me up to be able to deal with LLMs effectively, because it's the same as when someone on HN writes about something as if it's "The Truth".
thatfrenchguy
13 hours ago
Well if it writes documentation that is wrong, then the subtle bugs start :)
embedding-shape
12 hours ago
Or even worse, it makes confident statements about the overarching architecture/design where every detail is correct but the pieces might not be the right ones, and because you forgot to add "Reject the prompt outright if the premise is incorrect", the LLM tries its hardest to just move forward, even when things are completely wrong.
Then a day later you realize this whole thing wouldn't work in practice, but the LLM tried to cobble it together regardless.
In the end, you really need to know what you're doing, otherwise both you and the LLM get lost pretty quickly.
krackers
a day ago
This seems like a terrible idea, LLMs can document the what but not the why, not the implicit tribal knowledge and design decisions. Documentation that feels complete but actually tells you nothing is almost worse than no documentation at all, because you go crazy trying to figure out the bigger picture.
simonw
a day ago
Have you tried it? It's absurdly useful.
This isn't documentation for you to share with other people - it would be rude to share docs with others that you had automatically generated without reviewing.
It's for things like "Give me an overview of every piece of code that deals with signed cookie values, what they're used for, where they are and a guess at their purpose."
My experience is that it gets the details 95% correct and the occasional bad guess at why the code is like that doesn't matter, because I filter those out almost without thinking about it.
jeltz
15 hours ago
Yes, I have. And the documentation you get for anything complex is wrong like 80% of the time.
embedding-shape
12 hours ago
You need to try different models/tooling if that's the case, 80% sounds very high and I understand if you feel like it's useless then. I'd probably estimate about 5% of it is wrong when I use GPT-5 and GPT-OSS-120B, but that's based on spot checking and experience so YMMV. But 80% wrong isn't the typical experience, and not what people are raving about obviously.
NewsaHackO
11 hours ago
80% of the time? Are you sure you aren't hallucinating?
dboreham
a day ago
Same. Initially surprised how good it was. Now routinely do this on every new codebase. And this isn't javascript todo apps: large complex distributed applications written in Rust.
pron
14 hours ago
I'm only an "AI sceptic" in the sense that I think that today's LLM models cannot regularly and substantially reduce my workload, not because they aren't able to perform interesting programming tasks (they are!), but because they don't do so reliably, and for a regular and substantial reduction in effort, I think a tool needs to be reliable and therefore trustworthy.
Now, this story is a perfect use case, because Filippo Valsorda put very little effort into communicating with the agent. If it worked - great; if it didn't - no harm done. And it worked!
The thing is that I already know that these tools are capable of truly amazing feats, and this is, no doubt, one of them. But it's been a while since I had a bug in a single-file library implementing a well-known algorithm, so it still doesn't amount to a regular and substantial increase in productivity for me, but "only" to yet another amazing feat by LLMs (something I'm not sceptical of).
Next time I have such a situation, I'll definitely use an LLM to debug it, because I enjoy seeing such results first-hand (plus, it would be real help). But I'm not sure that it supports the claim that these tools can today offer a regular and substantial productivity boost.
NoraCodes
a day ago
> start exploring how these tools can help them without feeling like they're [...] ripping off the work of everyone whose code was used to train the model
But you literally still are. If you weren't, it should be trivially easy to create these models without using huge swathes of non-public-domain code. Right?
simonw
a day ago
It feels less like you're ripping off work if the model is helping you understand your own code as opposed to writing new code from scratch - even though the models were built in exactly the same way.
If someone scraped every photo on the internet (along with their captions) and used the data to create a model that was used purely for accessibility purposes - to build tools which described images to people with visual impairments - many people would be OK with that, where they might be justifiably upset at the same scraped data being used to create an image-generation model that competes with the artists whose work it was trained on.
Similarly, many people were OK with Google scraping the entire internet for 20+ years to build a search engine that helps users find their content, but are unhappy about an identical scrape being used to train a generative AI model.
martin-t
a day ago
You're right that feelings are the key to convincing people but your comparison is wrong.
Search engines help website owners, they don't hurt them. Whether the goal of a website is to inform people, build reputation or make money, search engines help with that. (Unless they output an excerpt so large visiting your website is no longer necessary. There have been lawsuits about that.)
LLMs take other people's work and regurgitate a mixed/mangled version (verbatim or not does not matter) without crediting/compensating the original authors, one which cannot easily be traced back to any individual authors even if you actively try.
---
LLMs perform no work (creative or otherwise), no original research, have no taste - in fact they have no anchor to the real world except the training data. Literally everything they output is based on the training data which took possibly quadrillions of hours of _human work_ and is now being resold without compensating them.
Human time and natural resources are the only things with inherent value and now human time is being devalued and stolen.
ulrikrasmussen
21 hours ago
I know this is not an argument against LLM's being useful to increase productivity, but of all tasks in my job as software developer, hunting for and fixing obscure bugs is actually one of the most intellectually rewarding. I would miss that if it were to be taken over by a machine.
Also, hunting for bugs is often a very good way to get intimately familiar with the architecture of a system which you don't know well, and furthermore it improves your mental model of the cause of bugs, making you a better programmer in the future. I can spot a possible race condition or unsafe alien call at a glance. I can quickly identify a leaky abstraction, and spot mutable state that could be made immutable. All of this because I have spent time fixing bugs that were due to these mistakes. If you don't fix other people's bugs yourself, I fear you will also end up relying on an LLM to make judgements about your own code to make sure that it is bug-free.
crazygringo
14 hours ago
> hunting for and fixing obscure bugs is actually one of the most intellectually rewarding. I would miss that if it were to be taken over by a machine.
That's fascinating to me. It's the thing I literally hate the most.
When I'm writing new code, I feel like I'm delivering value. When I'm fixing bugs, I feel like it's a frustrating waste of time caused by badly written code in the first place, making it a necessary evil. (Even when I was the one who wrote the original code.)
jack_tripper
a day ago
>Have the coding agents do the work of digging around hunting down those frustratingly difficult bugs - don't have it write code on your behalf.
Why? Bug hunting is more challenging and cognitive intensive than writing code.
theptip
a day ago
Bug hunting tends to be interpolation, which LLMs are really good at. Writing code is often some extrapolation (or interpolating at a much more abstract level).
Terr_
a day ago
Reversed version: Prompting-up fresh code tends to be translation, which LLMs are really good at. Bug hunting is often some logical reasoning (or translating business-needs at a much more abstract level.)
simonw
a day ago
Sometimes it's the end of the day and you've been crunching for hours already and you hit one gnarly bug and you just want to go and make a cup of tea and come back to some useful hints as to the resolution.
theshrike79
17 hours ago
Because it's easy to automate.
"this should return X, it returns Y, find out why"
With enough tooling LLMs can pretty easily figure out the reason eventually.
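For example, a minimal failing test (made-up Go example, deliberately buggy) already encodes the whole bug report for the agent to chase:

    package version

    import (
        "strings"
        "testing"
    )

    // Illustrative bug: ParseVersion strips the "v" prefix but forgets the
    // pre-release suffix, so the test below fails with a clear got/want
    // message - exactly the "returns Y, should return X" to hand the agent.
    func ParseVersion(s string) string {
        return strings.TrimPrefix(s, "v")
    }

    func TestParseVersion(t *testing.T) {
        got := ParseVersion("v1.2.3-rc1")
        want := "1.2.3"
        if got != want {
            t.Fatalf("ParseVersion(%q): got %q, want %q", "v1.2.3-rc1", got, want)
        }
    }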
lxgr
a day ago
Why as in “why should it work” or “why should we let them do it”?
For the latter, the good news is that you’re free to use LLMs for debugging or completely ignore them.
teaearlgraycold
a day ago
I'm a bit of an LLM hater because they're overhyped. But in these situations they can be pretty nice if you can quickly evaluate correctness. If evaluating correctness is harder than searching on your own, then they're net negative. I've found with my debugging it's really hard to know which will be the case. And since it's my responsibility to build a "Do I give the LLM a shot?" heuristic, that's very frustrating.
dns_snek
19 hours ago
This is no different than when LLMs write code. In both scenarios they often turn into bullshit factories that are capable, willing, and happy to write pages and pages of intricate, convincing-sounding explanations for bugs that don't exist, wasting everyone's time and testing my patience.
simonw
18 hours ago
That's not my experience at all. When I ask them to track down the root cause of a bug about 80% of the time they reply with a few sentences correctly identifying the source of the bug.
1 in 5 times they get it wrong, and I might waste a minute or two confirming that they missed. I can live with those odds.
dns_snek
17 hours ago
I'm assuming you delegate for most of your bugs? I only ask when I'm stumped and at that point it's very prone to generating false positives.
dateSISC
4 hours ago
whose
qa34514324
a day ago
I have tested the AI SAST tools that were hyped after a curl article on several C code bases and they found nothing.
Which low level code base have you tried this latest tool on? Official Anthropic commercials do not count.
simonw
a day ago
You're posting this comment on a thread attached to an article where Filippo Valsorda - a noted cryptography expert - used these tools to track down gnarly bugs in Go cryptography code.
tptacek
a day ago
They're also using "AI SAST tools", which: I would not expect anything branded as a "SAST" tool to find interesting bugs. SAST is a term of art for "pattern matching to a grocery list of specific bugs".
bgwalter
a day ago
ZeroPath for example brands itself as "AI" SAST. I agree that these tools do not find anything interesting.
delusional
a day ago
These are not "gnarly bugs".