pshirshov
4 days ago
Essentially, I tried to throw Claude a task which, I thought, it wouldn't handle. It did, with minimal supervision. Some things had to be done in "adversarial" mode, where Claude coded and Codex criticized/reviewed, but it is what it is. An LLM was able to implement generics and many other language features with very little supervision in less than a day o_O.
I've been thrilled to see it using GDB with inhuman speed and efficiency.
yosefk
17 hours ago
I am very impressed with the kind of things people pull out of Claude's жопа but can't see such opportunities in my own work. Is success mostly the result of it being able to test its output reliably, and of how easy it is to set up the environment for this testing?
pshirshov
17 hours ago
> Is success mostly the result of it being able to test its output reliably, and of how easy it is to set up the environment for this testing?
I wouldn't say so. From my experience the key to success is the ability to split big tasks into smaller ones and to help the model with solutions when it's stuck.
Reproducible environments (Nix) help a lot, yes, same for sound testing strategies. But the ability to plan is the key.
orbifold
17 hours ago
One other thing I've observed is that Claude fares much better in a well-engineered pre-existing codebase. It adapts to most of the style and has plenty of "positive" examples to follow. It also benefits from the existing test infrastructure. It will still tend to go in infinite loops or introduce bugs and then oscillate between them, but I've found it to be scarily efficient at implementing medium-sized features in complicated codebases.
pshirshov
17 hours ago
Yes, that too, but this particular project was an ancient C++ codebase with extremely tight coupling, manual memory management and very little abstraction.
tekacs
14 hours ago
жопа -> jopa (zhopa, Russian for "ass"), for those who don't spot the joke
UncleEntity
16 hours ago
Claude will also tend to go for the "test-passing" development style, where it gets super fixated on making the tests pass with no regard to how the features will work with whatever is intended to be built later.
I had to throw away a couple of days' worth of work because the code it built to pass the tests wasn't able to do the actual thing it was designed for, and the only workaround was to go back and build it correctly while, ironically, still keeping the same tests.
You kind of have to keep it on a short leash but it'll get there in the end... hopefully.
UncleOxidant
15 hours ago
> Some things had to be done in "adversarial" mode where Claude coded and Codex criticized/reviewed
How does one set up this kind of adversarial mode? What tools would you need to use? I generally use Cline or KiloCode - is this possible with those?
KronisLV
8 hours ago
You can either use the orchestrator mode and tell it that it must run a review subtask after every successful subtask is done (works in RooCode; I'm guessing KiloCode should also have the feature).
Or you can just switch the models in a regular conversation and tell one to review everything up until now, optionally telling it to get a git diff of all the unstaged changes.
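The "review a git diff of all the unstaged changes" step can be scripted. A minimal sketch in Python, assuming `git` is on PATH; `build_review_prompt` and its wording are hypothetical, and the resulting prompt would be handed to whichever second model you switch to:

```python
import subprocess

def unstaged_diff() -> str:
    # `git diff` with no arguments shows unstaged changes
    # in the working tree relative to the index.
    result = subprocess.run(
        ["git", "diff"], capture_output=True, text=True, check=True
    )
    return result.stdout

def build_review_prompt(diff: str) -> str:
    # Wrap the diff in a plain-text prompt for the reviewing model.
    return (
        "Review the following unstaged changes. Point out bugs, "
        "missed edge cases, and style problems:\n\n" + diff
    )
```

The same approach works for staged changes by swapping in `git diff --cached`.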
pshirshov
15 hours ago
My own (very dirty) tool; there are some public ones, and I'll probably try to migrate to one of the more mature tools later. Example: https://github.com/ruvnet/claude-flow
> is this possible with those?
You can always write to stdin/read from stdout even if there is no SDK available, I guess. Or create your own agent on top of an LLM provider.
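The stdin/stdout approach can be sketched in a few lines of Python. This is a generic pipe wrapper, not any specific agent's API: `command` is a placeholder for whatever coding-agent binary you actually have installed, and its flags are up to that tool.

```python
import subprocess

def ask_cli_agent(command: list[str], prompt: str) -> str:
    # Spawn the CLI, write the prompt to its stdin, and read the
    # full response from its stdout. Works for any line-oriented
    # tool; a real agent may need extra flags for non-interactive use.
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate(prompt)
    return out
```

An adversarial loop is then just two such calls in alternation: feed the coder's output to the reviewer, and the reviewer's critique back to the coder.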
1024bees
17 hours ago
How did you get gdb working with Claude? There are a few MCP servers that look fine; curious what you used.
pshirshov
16 hours ago
Well, I just told it to use gdb when necessary; MCP wasn't required at all! It also helps to tell it to integrate cpptrace and to always look at the stacks.
formerly_proven
16 hours ago
MCP is more or less obsolete for code generation since agents can just run CLI tools directly.