tunesmith
6 hours ago
I feel like I must have plateaued and don't know what to do next to level up. I'm currently on the $100/month codex plan and it seems fine using 5.5-xhigh all the time. I think of what to do next, have a chat session to determine exactly what to ask for up to the point of being ready to implement, and then codex churns on a commit-sized task whereupon I briefly check it on my local dev server. If necessary I ask for a change. Then I ask it to commit and recommend the next step based off the spec. Oftentimes I have to "approve" an out-of-sandbox request anyway.
I haven't found anything that requires running all night. I could tell it to one-shot a big plan but given how often I realize I want an intermediary thing to be slightly different it seems like a waste of effort.
I'm guessing the next thing I should probably look into is some sort of machine vm I can tunnel my codex-gui requests to so I don't have to deal with the sandbox approvals (I don't want to give it "dangerous" access to my entire mac).
I don't understand what people are doing with their side projects that is leading them to churn through tokens so quickly, to the point of requiring two $200/month subscriptions and a bunch of token charges besides.
vitally3643
4 hours ago
That's because you're treating the problem as an engineer instead of an "influencer" or "10xer" or whatever. You're treating it as a problem to be solved with engineering and AI is merely a tool to do so. It is, in my experience, vanishingly rare for an engineer to have a problem that needs to be solved with multiple hours of unattended AI code generation.
I've only found one single application where it makes even the slightest amount of sense to have an AI grind away for hours on end. I'm reverse engineering a widget which contains five separate firmware images. I've dumped the binary from the widget and I set the AI to decompile and reverse engineer these interrelated firmware projects. It's a complex task, but very well bounded. It's not complicated work, but it's a lot of work, and the end result is a C-shaped pile of text that is only informative; it never would be compilable on its own even if I did it by hand. The quality of the output is tightly bounded by the input assembly and the overall output artifact is documentation in the shape of code.
I don't have any qualms about letting an AI go ham on it unattended because the stakes are zero. But if the AI can beat the assembly into a recognizable C project, it's much easier for me to read and reason about. Easy win, I think.
rbalicki
4 hours ago
I'll add another use case for letting an AI go ham: many small, atomic refactors where the name of the game is never breaking anything.
My personal OSS projects don't have the scale to necessarily make this worth it, but at work I run three pipelines using Barnum (https://barnum-circus.github.io/). First, one that ingests files, identifies refactors (from a pre-approved list), and places a precise description of the refactor to be done in a queue; second, one that reads from said queue, implements and creates PRs (there is a lot of "check that the PR is correct" here as well); and a third that babysits PRs until they land. I've landed hundreds of PRs in this way, with very little effort on my part.
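A minimal sketch of the three-stage shape described above, written in plain Python rather than Barnum (whose API is not shown in this thread). Every name here is hypothetical: `APPROVED_REFACTORS`, `RefactorTask`, `PullRequest`, and the string-replace "refactor" are stand-ins for whatever the real pipelines do, and the correctness check is deliberately trivial.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical pre-approved list: refactor name -> (old text, new text).
APPROVED_REFACTORS = {"rename_old_helper": ("old_helper(", "new_helper(")}


@dataclass
class RefactorTask:
    path: str
    refactor: str


@dataclass
class PullRequest:
    task: RefactorTask
    new_source: str
    landed: bool = False


def identify(files: dict[str, str]) -> deque[RefactorTask]:
    """Stage 1: scan files and queue one task per applicable approved refactor."""
    queue: deque[RefactorTask] = deque()
    for path, source in sorted(files.items()):
        for name, (old, _new) in APPROVED_REFACTORS.items():
            if old in source:
                queue.append(RefactorTask(path, name))
    return queue


def implement(queue: deque[RefactorTask], files: dict[str, str]) -> list[PullRequest]:
    """Stage 2: drain the queue, apply each refactor, keep only checked PRs."""
    prs: list[PullRequest] = []
    while queue:
        task = queue.popleft()
        old, new = APPROVED_REFACTORS[task.refactor]
        new_source = files[task.path].replace(old, new)
        # Stand-in for the "check that the PR is correct" step.
        if old not in new_source:
            prs.append(PullRequest(task, new_source))
    return prs


def babysit(prs: list[PullRequest], files: dict[str, str]) -> int:
    """Stage 3: land each PR. A real version would wait on CI, rebase, retry."""
    landed = 0
    for pr in prs:
        files[pr.task.path] = pr.new_source
        pr.landed = True
        landed += 1
    return landed
```

The point of the split is that each stage only talks to the next through a queue or a list of PRs, so any one of them can be rerun or throttled without touching the others.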
frizlab
4 hours ago
At $COMPANY, I recently had a coworker try Fable to do a refactor where not breaking anything was the game.
It broke something at the first PR.
I think we’re not there yet.
sunrunner
3 hours ago
I've found that adding "Make no mistakes." to my prompt usually helps with this kind of problem...
cubano
2 hours ago
perhaps simply threatening to fire it would also do the trick...it sure has worked well on us for a long time now.
A_D_E_P_T
7 minutes ago
You laugh, but this is real, and PUA means what you think it means: https://github.com/tanweai/pua
Also, it works amazingly well, which is just lol.
dozerly
3 hours ago
We are so many layers deep in AI hype that I honestly can’t tell if this is /s or not
12_throw_away
an hour ago
"Make no mistakes" is, I thought, a phrase used to make fun of "prompt engineering," not something people really do?
efavdb
20 minutes ago
Pleading has worked for me. “My job depends on this, please help me” and ChatGPT would do a task it previously claimed it wasn’t able to (extract text from an image, it claimed it couldn’t make it out at first)
ynxshiny
23 minutes ago
"Claude make me 1 million by tomorrow, no mistakes"
lemming
2 hours ago
Or if the code is really important, sometimes even “please make no mistakes” is necessary.
albertgoeswoof
5 hours ago
I’ve watched a bunch of layman videos where they create stuff with AI, these people burning through 12 hour tasks are literally not reading the output or understanding what it’s doing. Like they’ll ask for a program, and then right after it’s been created they ask the AI how to run it. Then when there’s a bug, they ask the AI what went wrong, or scrap the entire thing and switch model/harness and try again.
Here’s an example https://m.youtube.com/watch?v=xc1296HY8Fw&ra=m
It’s completely different to a professional workflow (what you described). It’s a toy for consumers
MrGilbert
4 hours ago
Amazingly, there are people out there (apart from creators), that work that way in their day-to-day job. I had the pleasure to work with such a person. After several months, he got removed from the position. He left a mess that hasn't been cleaned up completely to this point.
albertgoeswoof
4 hours ago
It won’t be long till employers get wise to this stuff, they just need to get burned a couple of times.
It seems AI is good, great even, at many things. But it doesn’t seem like it’s going to change the world as much as some people believe it will. And if it does, it’s going to take time.
galaxyLogic
4 minutes ago
It's more power to power-users. And more dumbness for dumbos
fishfasell
4 hours ago
Yeesh that sounds painful. There's definitely a fine line between vibe coding as a professional engineer and vibe coding as an outsider.
kapperchino
9 minutes ago
On the topic of access control, I’m building a coding agent with no shell access. It currently only supports Rust, though. https://github.com/Kapperchino/agent-joe
calgoo
5 hours ago
I have downgraded my Claude to the $20 one, and basically only use it for the web chat right now. For coding, I use DeepSeek at API rates, configured in Claude Code. I have spent around $4.8 for 320,000,000 tokens. I always felt like I was not using my Claude plan enough, that I had to have the LLM working on something all the time to justify the price. Now with DeepSeek I don't think about it anymore. I don't feel bad when not using the subscription anymore, and I don't worry about limits as I just pay more. Where I really felt this was on running things in parallel, as there are no hourly limits anymore!
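For scale, the figures above work out to a blended rate of about a cent and a half per million tokens. A quick sketch of the arithmetic, assuming the $4.8 and 320,000,000 numbers are totals across input, output, and cached tokens (the comment does not break them down); `cost_per_million` is just an illustrative helper name.

```python
def cost_per_million(total_usd: float, total_tokens: int) -> float:
    """Blended cost in USD per one million tokens."""
    return total_usd / (total_tokens / 1_000_000)


# Figures quoted in the comment above.
rate = cost_per_million(4.80, 320_000_000)
print(f"${rate:.3f} per million tokens")  # prints "$0.015 per million tokens"
```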
rjh29
3 hours ago
Gemini changed their rate limits recently and I find the free plan is sufficient for any 'hard' problems that DeepSeek might have trouble with. The combination of the two has reduced my AI spend to $5/month. I agree that it's nice not to have to worry about maxing out your subscription - I'm not doing personal projects 24/7.
wrs
4 hours ago
>I think of what to do next
As everyone trying to do real work is finding, that's the actual bottleneck. If the system is keeping up with your thinking, you're doing fine. You can't "level up" your thinking by paying for more tokens. The people doing more automatic stuff are probably outpacing their own thinking, and that will bite them eventually.
gaflo
an hour ago
Can I ask what exactly you are building? Your experience tracks for me when building a real product -- something I want other people to use. Most of my time on these projects is spent talking to my users and carefully refining my requirements and design.
For personal pet projects I can definitely see how you can blow through your token budget very quickly. If I just point my coding agent to iteratively come up with some heuristics for some NP-hard problem, it will read intermediary outputs and constantly make small changes "in the dark" until it either finds a small improvement or gives up. In a similar vein I found that you can burn many many tokens if you try to let the agent reverse engineer something where you don't have the source code. If you just give it a binary or some interface to work with and a vague task you can easily burn your entire budget with 1 prompt.
I wouldn't want anyone to use these fully vibe coded toy projects though; it is more of an exploratory curiosity for me where I learn more about some problems I'm interested in as well as gauge how good the agents are at tasks that I seem to have a much better intuition on how to approach.
wincy
5 hours ago
I’m using $200 a month Codex working on a game for my kids, for fun and curiosity: I’m a dev and I’ve played games, but I’ve never done dev for games. I have all-night tasks, but mostly they’re “spend time tending to and adding stuff to my 3D asset pipeline”. My RTX 5090 runs Trellis2 -> ultrashapes -> Trellis2 -> wiring up rigging and setting up animations.
But like 99% of that task is just Codex waiting for the output. So it’ll run for 12 hours but mostly it’s just setting lots of sleeps. I haven’t gotten close to running out of tokens. The $100 a month codex I hit usage limitations almost immediately, about 3 days in of working like crazy with 10 agents going at once, mostly coding an asset pipeline, I ran into my weekly limit and upgraded. So with the $200 a month plan at 4x more credits I haven’t hit any walls at all and can absolutely cook.
59nadir
3 hours ago
This sounds like you're overcomplicating things a lot and like you're very unlikely to be learning anything useful, I would suggest making something simple yourself to get a handle on what making the different parts of a game actually means in practice.
Knowing LLMs and their output I would also bet that you're getting nonsense output that sucks.
bthornbury
34 minutes ago
promote yourself to PM only and use agents for authoring, verification, tests, checking the tests
orchestrator -> parallel subagents with investigation, authoring, verification, benchmarking subagents and integration / final verification handled by parent has improved my productivity too.
I feel like from here it's agent swarms against a whole spec but haven't got there yet.
Still getting plenty of bugs in the more complex scenarios, but mostly (in some projects) i never have to look at the code and treat it like a black box
dnautics
6 hours ago
I have been on $100/mo Claude and it has been churning out quite good software for months now. I estimate it has done what would have taken me three-ish years, assuming I didn't burn out from failure (I would have). I only hit limits when I double-fisted Claude with my main project and my side project. Just the other day I noticed I had been stuck on 4.5 because I failed to update the npm package.
PeterStuer
6 hours ago
I'm on $100 Claude. I have a setup with bespoke local services that mitigates some high token consumption scenarios with local LAN services. I screen mcp's and hooks for cache poisoning. I run 100% on Opus with max effort, and never came close to hitting 5 hour or weekly limits before the Fable release. I am in Claude Code at least 20hrs a week.
I see people just completely wasting tokens with ridiculous setups, 100% hitting cache misses as well as dumping huge files into context all the time.
Just learn how these things work, or pay the price I guess.
seviu
3 hours ago
I usually hit the limit when I am frustrated and I don’t want to understand what the problem is.
I am an engineer, and when I understand what’s going on, I never hit any limit.
sheremetyev
6 hours ago
> I don't want to give it "dangerous" access to my entire mac
I'm running Claude/Codex inside native macOS sandbox, configured with a simple script - https://github.com/sheremetyev/sandfence
always in "bypass permissions" mode - it works until the task is solved, sometimes 1 hour or more (which includes running tests etc)
contingencies
6 hours ago
recommend converting to https://github.com/apple/container
sheremetyev
5 hours ago
Linux VM doesn't run native macOS toolchain and requires copying files back and forth
contingencies
3 hours ago
I am skeptical there are many real use cases that require native macOS not arbitrary unix. For files, use a readonly mount https://github.com/apple/container/blob/main/docs/how-to.md#... (ie. /path:ro)
aerhardt
4 hours ago
Well, if you believe the people who sell the tokens, you should be creating loops that keep yanking the bandit’s arm.
rsanek
2 hours ago
While it's a little unstable, I've found Docker's sbx to be a great sandbox to run agents with --dangerously-skip-permissions
tchock23
5 hours ago
Same boat here. I’m able to get a lot done on CC at $100/mo and feel like I’m not being creative or productive enough somehow when I hear of people blowing past that in a day.
hedgehog
4 hours ago
Patches to existing sizable codebases and reverse engineering binaries both can run a long time and use a lot of tokens without wandering off into the weeds.
greyb
4 hours ago
Claude allows you to reverse engineer binaries now? That's pretty cool. I'm quite surprised to hear that, I thought it was one of their guardrails. Most of the reverse engineering projects I've seen seem to rely on Chinese models.
dyauspitr
3 hours ago
I usually say run the full regression suite, all the simulator tests, install simulators and take a screenshot of every page on all applicable devices and do comprehensive fuzzing and chaos testing before I go to bed. It usually takes at least 3-4 hours, usually longer, especially the UI/simulator tests.
apsurd
3 hours ago
I just recently learned about hooks[1] from another HN comment. Conceptually, running CI doesn't have to impose an agentic tax, right?
In other words, isn't there a way to orchestrate this NOT as a long-running token-maxxing setup, given that triggers and CI runs can be run deterministically?
disclaimer: I haven't done this, just interested.
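One way the idea above could look, as an untested-in-practice sketch rather than anything a specific tool provides: run the checks as an ordinary subprocess, which costs no tokens while it runs, and only hand the log to an agent when something fails. `gate` and `on_failure` are hypothetical names; how the failure log actually reaches an agent (a hook, a queued prompt, a CI comment) is left as an assumption.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_checks(command: Sequence[str]) -> tuple[bool, str]:
    """Run a test/CI command deterministically and capture its output."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def gate(command: Sequence[str], on_failure: Callable[[str], None]) -> bool:
    """Invoke on_failure (e.g. 'wake the agent') only when the checks fail."""
    ok, log = run_checks(command)
    if not ok:
        on_failure(log)
    return ok


if __name__ == "__main__":
    # Stand-in for a real suite: a child Python process that fails on purpose.
    failing = [sys.executable, "-c", "import sys; print('1 test failed'); sys.exit(1)"]
    gate(failing, on_failure=lambda log: print("would hand to agent:", log.strip()))
```

The hours-long part (simulators, fuzzing, screenshots) then runs without a model in the loop, and the agent is only billed for reading the failure log and proposing a fix.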
coldtea
3 hours ago
>I feel like I must have plateaued and don't know what to do next to level up.
Why do you need to "level up"? To have it shit out slop faster?
Just use it rationally for what you need to do.