gtirloni
a day ago
I was using this and superpowers, but eventually Plan mode became enough, and I prefer to steer Claude Code myself. These frameworks are great for fire-and-forget tasks, especially when there is some research involved, but in my experience they burn 10x more tokens. I was always hitting the Max plan limits for no discernible benefit in the outcomes I was getting. But this will vary a lot depending on how people prefer to work.
marcus_holmes
17 hours ago
I ended up grafting the brainstorm, design, and implementation planning skills from Superpowers onto a Ralph-based implementation layer that doesn't ask for my input once the implementation plan is complete. I have to run it in a Docker sandbox because of the dangerously set permissions but that is probably a good idea anyway.
It's working, and I'm enjoying how productive it is, but it feels like a step on a journey rather than the actual destination. I'm looking forward to seeing where this journey ends up.
LogicFailsMe
7 hours ago
I find that a simple Ralph loop with an implementer and a reviewer, repeating until everything passes review and unit tests, is 90% of the job.
I would love to do something more sophisticated but it's ironic that when I played both agents in this loop over the past few decades, the loop got faster and faster as computers got faster and faster. Now I'm back to waiting on agentic loops just like I used to wait for compilations on large code bases.
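A minimal sketch of the implementer/reviewer loop described above. The three callables are hypothetical stand-ins for whatever invokes the implementer agent, the reviewer agent, and the test suite; none of the names come from a real tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoopResult:
    passed: bool             # True if review and tests both passed
    iterations: int          # how many implement/review rounds were run
    feedback: Optional[str]  # last outstanding feedback, if any


def review_loop(
    implement: Callable[[Optional[str]], str],   # hypothetical: feedback -> new work
    review: Callable[[str], Optional[str]],      # hypothetical: work -> feedback, or None if approved
    run_tests: Callable[[str], bool],            # hypothetical: work -> did unit tests pass?
    max_iterations: int = 10,
) -> LoopResult:
    """Repeat implement -> review -> test until both gates pass or we give up."""
    feedback: Optional[str] = None
    for iteration in range(1, max_iterations + 1):
        work = implement(feedback)
        feedback = review(work)
        if feedback is None and run_tests(work):
            return LoopResult(True, iteration, None)
        if feedback is None:
            # Reviewer approved but the tests failed; feed that back instead.
            feedback = "unit tests failed"
    return LoopResult(False, max_iterations, feedback)
```

The `max_iterations` cap is an assumption added here so a loop that never converges still terminates.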
hatmanstack
3 hours ago
Curious what you mean by "played both agents" and "faster and faster"? API calls are API Calls or are you running an open-source model locally?
gavinray
2 hours ago
Rephrasing of the post in case it's clearer:
"I would love to do something more sophisticated, but it's ironic that when I performed both of the duties done nowadays by agents, the development loop got faster and faster as computers got faster and faster."
auggierose
9 hours ago
If it is working, why is it just a step on a journey? What is missing?
jghn
7 hours ago
did you hand modify the superpowers skills or are you managing this some other way?
hatmanstack
3 hours ago
For me, I just created my own prompt pipeline, with a nod towards GANs. All of the necessary permissions get surfaced so I don't need to babysit it, and all are relatively simple. No need for YOLO mode or dangerously setting permissions.
jghn
a day ago
I've gone the other way recently, shifting from pure plan mode to superpowers. I was reminded of it due to the announcement of the latest version.
It is perhaps confirmation bias on my part but I've been finding it's doing a better job with similar problems than I was getting with base plan mode. I've been attributing this to its multiple layers of cross checks and self-reviews. Yes, I could do that by hand of course, but I find superpowers is automating what I was already trying to accomplish in this regard.
gtirloni
21 hours ago
Yes, it does help in that way. Maybe I'm still struggling to let go and let AI take the wheel from beginning to end but I enjoy the exploratory part of the whole process (investigating possible solutions, trying theories, doing little spikes, etc, all with CC's assistance). When it's time to actually code, I just let it do its own thing mostly unsupervised. I do spend quite a lot of time on spec writing.
jghn
21 hours ago
That’s part of what I’ve liked about it over plan mode. Again, not a scientific measurement, but I feel it’s better at interactive brainstorming and researching the big picture with me. And its multiple built-in checkpoints also give me more space to pivot or course correct.
healsdata
16 hours ago
Just tried GSD and Plan Mode on the same exact task (prompt in an MD file). Plan Mode had a plan and then base implementation in twenty minutes. GSD ran for hours to achieve the same thing.
I reviewed the code from both and the GSD code was definitely written with the rest of the project and possibilities in mind, while the Claude Plan was just enough for the MVP.
I can see both having their pros and cons depending on your workflow and size of the task.
Rapzid
16 hours ago
I use GitHub Copilot and unfortunately there has been a weird regression in the bundled Plan mode. When they added the new plan memory, it suddenly started getting both VERY verbose in the plan output and vague in the details. It's adding a lot of steps that are like "design" and "figure out", and it railroads you into implementation without asking follow-up questions.
NSPG911
12 hours ago
> VERY verbose in the plan output
Is that an issue? GitHub charges per request, not per token, so a verbose output and a short output will cost the same.
What model are you using?
jounker
3 hours ago
The problem might be that our brains charge per token, which makes reviewing hard. :)
whalesalad
15 hours ago
I find that even with Opus 4.6, Copilot feels like it’s handicapped. I’m not sure if it’s related to memory or what, but if I give two tasks to Opus 4.6, one in CC and one in Copilot, CC is substantially better.
I’ve been really enjoying Codex CLI recently though. It seems to do just as well as Opus 4.6, but using the standard GPT 5.4.
gtirloni
4 hours ago
I think this shows that the model alone isn't the complete story and that these "harnesses" (as people seem to be calling them) shape a lot of the experienced behavior of these tools.
Atotalnoob
6 hours ago
Copilot feels like being a caveman, Claude code feels like modern times comparatively.
nfg
13 hours ago
As a matter of interest are you using the copilot cli?
whalesalad
6 hours ago
yeah. copilot cli using opus 4.6 vs claude code using opus 4.6
chaostheory
8 hours ago
I have the same experience with Antigravity and Gemini CLI, both using Gemini 3 Pro. Gemini CLI works on the problem with more effort and time. Meanwhile, Antigravity writes shitty Python scripts for a few seconds and calls it a day. The agent harness matters a lot.
sigbottle
5 hours ago
Yup yup yup. I burned literally a week's worth of the $20 Claude subscription and then $20 worth of API credits on gsdv2. To get like 500 LOC.
And that was AFTER literally burning a week's worth of Codex and Claude $20 plans and $50 of API credits and getting completely bumfucked: the AI was faking out tests, etc.
I had better experiences just guiding the thing myself. It definitely was not a set and forget experience (6 hours of constant monitoring) but I was able to get a full research MVP that informed the next iteration with only 75% of a codex weekly plan.
FromTheFirstIn
4 hours ago
You spent $25 on 500 LOC?
sigbottle
2 hours ago
Well, there were milestones and docs and extra scaffolding that the GSD system produces, but yes. And it didn't seem like progress was going to go any faster.
SayThatSh
18 hours ago
I've played around a bit with the plugins and, as you've said, plan mode really handles things fine for the most part. I've got various workflows I run through in Claude, and I've found that having CC create custom skills/agents for them gets me 80% of the way there. It also goes a long way to let the Claude file refer to them rather than trying to define entire workflows within it. It'll still forget things here and there, leading to wasted tokens as it realizes it's being dumb and corrects itself, but nothing too crazy. At least, it's more than enough to let me continue using it naturally rather than memorizing a million slash commands to invoke manually.
abhisek
18 hours ago
I have been using superpowers for Gryph development for a while. Love the brainstorming and exploration that it brings in. I haven’t really compared token usage, but it's something on my list.
hatmanstack
21 hours ago
Why are we using cli wrappers if you're using Claude Code? I get if you need something like Codex but they released sub agents today so maybe not even that, but it's an unnecessary wrapper for Claude Code.
roncesvalles
17 hours ago
So that you can have a fresh context for every little thing. These harnesses basically marry LLMs with deterministic software logic. The harness programmatically generates the prompts and stores the output, step by step.
You never want the LLM to do anything that deterministic software does better, because it inflates the context and is not guaranteed to be done accurately. This includes things like tracking progress, figuring out dependency ordering, etc.
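A rough sketch of that split, assuming a hypothetical `call_llm` function that takes a prompt and returns text. Dependency ordering and progress tracking are done by ordinary code (here the standard library's `graphlib`), and each step gets a freshly generated prompt containing only the outputs it depends on.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_harness(
    tasks: Dict[str, str],            # task name -> instruction text
    depends_on: Dict[str, Set[str]],  # task name -> names of prerequisite tasks
    call_llm: Callable[[str], str],   # hypothetical stand-in for a fresh-context LLM call
) -> Dict[str, str]:
    """Run each task once, in dependency order, storing every output."""
    unknown = {dep for deps in depends_on.values() for dep in deps} - set(tasks)
    if unknown:
        raise ValueError(f"dependencies with no task definition: {sorted(unknown)}")

    # Deterministic part: the ordering is computed by code, not by the model.
    graph = {name: depends_on.get(name, set()) for name in tasks}
    order = TopologicalSorter(graph).static_order()

    outputs: Dict[str, str] = {}
    for name in order:
        # Build a small prompt from stored results instead of one long conversation.
        context = "\n".join(
            f"[{dep}] {outputs[dep]}" for dep in sorted(depends_on.get(name, set()))
        )
        prompt = (
            f"Task: {tasks[name]}\n"
            f"Results of prerequisite tasks:\n{context or '(none)'}"
        )
        outputs[name] = call_llm(prompt)
    return outputs
```

The returned dictionary doubles as the progress record: a task is done exactly when its name is a key in it.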
odie5533
19 hours ago
Wrappers are useful for some tasks. I use ralph loops for things that are extremely complicated and take days of work. Like reverse engineering projects or large scale migration efforts.
hatmanstack
18 hours ago
Even with the 1 mil context windows? Can't you just keep the orchestrator going and run sub agents? Maybe the added space is too new? I also haven't tested out the context rot from 300K and up. Would love some color on it from first hand exp.
odie5533
18 hours ago
It's not a context issue so much as a focus issue. The agent will complete part of a task and then ask if I want it to continue. Even if I told it I want it to keep going until all tasks are complete. Using a wrapper deals with that behavior.
Most projects I do take 20 minutes or less for an agent to complete and those don't need a wrapper. But for longer tasks, like hours or days, it gets distracted.
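The wrapper behavior described here can be sketched as a loop that keeps re-invoking the agent on whatever is left, rather than waiting for it to ask permission to continue. `run_agent` is a hypothetical stand-in that returns the set of tasks it finished in one invocation; the `max_rounds` cap is an added assumption.

```python
from typing import Callable, List, Set, Tuple


def drive_to_completion(
    tasks: List[str],
    run_agent: Callable[[List[str]], Set[str]],  # hypothetical: remaining -> tasks completed this round
    max_rounds: int = 50,
) -> Tuple[List[str], int]:
    """Re-invoke the agent until no tasks remain or the round limit is hit."""
    remaining = list(tasks)
    rounds = 0
    while remaining and rounds < max_rounds:
        completed = run_agent(remaining)
        # Keep only the tasks the agent did not finish, preserving order.
        remaining = [task for task in remaining if task not in completed]
        rounds += 1
    return remaining, rounds
```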
mrhaugan
6 hours ago
Damn, what kind of tasks are you making your agents work on that take days???
gtirloni
21 hours ago
GSD and superpowers aren't CLI wrappers?
hatmanstack
21 hours ago
It's a cli wrapper. Don't know how you could say it wasn't.
edit: GSD is a cli wrapper, Superpowers not so much. Both are over-engineered for an easy problem IMHO.
ramoz
20 hours ago
Both are dramatically over-engineered. And that's okay. I find them to be products of an industry reconciling how to really work with AI as well as optimize workflows around it. Similar to Gastown et al.
Otherwise, if you can own your own thinking, orchestrating, and steering of agents, you're in a more mature place.
mycall
19 hours ago
I also see it as fleeting: right when you have it figured out, a new model will work differently and may or may not need all their engineering layers.
hatmanstack
20 hours ago
I think that's fair. If they were created today, I'm sure the creators would make different decisions; that's the penalty of getting there first.
hermanzegerman
20 hours ago
No, it's not. It uses Skills and Agents and always runs inside of Claude Code, Gemini CLI, etc.
swingboy
17 hours ago
GSD delegates a lot of the deterministic work to a JavaScript CLI. That might be what the poster is talking about.
gtirloni
4 hours ago
That's definitely not a CLI wrapper. But people are calling Claude Code (clearly a TUI) a CLI so :shrug:
GSD is a collection of skills, commands, MCPs(?), helper scripts, etc that you use inside Claude Code (and others). If anything, Claude Code is the wrapper around those things and not the other way around.
Re: helper scripts. Anyone doing extensive work in any AI-assisted platform has experienced the situation where the agent wants to update 10k files individually and it takes ages. CC is often smart enough to code a quick Python script for those changes, and the GSD helper scripts help in the same way. It's just trying to save tokens. Hardly a wrapper around Claude Code.
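For illustration, this is the kind of quick bulk-edit script meant here: a plain string replacement across a directory tree. It is a generic sketch with a made-up function name, not one of the actual GSD helper scripts.

```python
from pathlib import Path


def replace_in_tree(root: Path, pattern: str, old: str, new: str) -> int:
    """Replace `old` with `new` in every file under `root` matching `pattern`.

    Returns the number of files that were actually modified.
    """
    changed = 0
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if old not in text:
            continue  # leave untouched files alone
        path.write_text(text.replace(old, new), encoding="utf-8")
        changed += 1
    return changed
```

One call such as `replace_in_tree(Path("src"), "*.py", "old_name", "new_name")` does in a single pass what would otherwise be thousands of individual agent edits.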
whalesalad
15 hours ago
Same experience. Superpowers is a little overzealous at times. For coding especially, I don’t like seeing a comprehensive design spec written (good) and then turned into effectively the same doc but macro-expanded to become a complete implementation, with the literal code for the entire thing in a second doc (bad). Even for trivial changes I’d end up with a good and succinct -design.md, then an -implementation.md, then end with a swarm of sub agents getting into races while more or less just grabbing a block from the implementation file and writing it.
A mess. I still enjoy superpowers brainstorming but will pull the chute towards the end and then deliver myself.
gtirloni
4 hours ago
Yes. I sometimes had to specifically ask it to NOT add any code to the specs because that would be done at a later stage.
andai
18 hours ago
What's happening with the other 90%?
locknitpicker
13 hours ago
> I was using this and superpowers but eventually, Plan mode became enough and I prefer to steer Claude Code myself.
Plan mode is great, but to me that's just prompting your LLM agent of choice to generate an ad-hoc, imprecise, and incomplete spec.
The downside of specs is that they can consume a lot of context window with things that are not needed for the task. When that is a concern, passing the spec to plan mode tends to mitigate the issue.