mmaunder
18 hours ago
We were heavy users of Claude Code ($70K+ spend per year) and have almost completely switched to codex CLI. I'm doing massive lifts with it on software that would never before have been feasible for me personally, or any team I've ever run. I'll use Claude Code maybe once every two weeks as a second set of eyes to inspect code and document a bug, with mixed success. But my experience has been that initially Claude Code was amazing and a "just take my frikkin money" product. Then Codex overtook CC and is much better at longer runs on hard problems. I've seen Claude Code literally just give up on a hard problem and tell me to buy something off the shelf. Whereas Codex's ability to profoundly increase the capabilities of a software org is a secret that's slowly getting out.
I don't have any relationship with any AI company, and honestly I was rooting for Anthropic, but Codex CLI is just way way better.
Also Codex CLI is cheaper than Claude Code.
I think Anthropic are going to have to somehow leapfrog OpenAI to regain the position they were in around June of this year. But right now they're being handed their hat.
latexr
4 hours ago
Feels like with every announcement there’s the same comment: “this LLM tool I’m using now is the real deal, the thing I was using previously and spending stupid amounts of money on looked good but failed at XYZ, this new thing is where it’s at”. Rinse and repeat.
Which means it wasn’t true any of the previous times, so why would it be true this time? It feels like an endless loop of the “friendship ended” meme with AI companies.
https://knowyourmeme.com/editorials/guides/what-is-the-frien...
It’s much more likely commenters are still in the honeymoon hype phase and (again) haven’t found the problems because they’re hyper focused on what the new thing is good at that the previous one wasn’t, ignoring the other flaws. I see that a lot with human relationships as well, where people latch on to new partners because they obviously don’t have the big problem that was a strain on the previous relationship. But eventually something else arises. Rinse and repeat.
igneo676
an hour ago
People just aren't used to how LLMs and their tools are developed
Depending on the time of the year you can expect fresh updates from any given company on how their new models and tools perform and they'll generally blow the competition out of the water.
The trick is to either realize that your current tools will just become magically better in a few months OR lean in and switch companies as their tools and models update.
iammrpayments
2 hours ago
It could also be covert advertising like you see in reddit
jimmydoe
35 minutes ago
trust your instinct. internet is dead.
raducu
3 hours ago
> “this LLM tool I’m using now is the real deal".
GPT-5 is not the final deal, but it's incredibly good as is at coding.
Anecdotal, but it's something completely else in terms of capabilities, ignore it at your own peril, but I think it will profoundly change software development.
latexr
2 hours ago
> ignore it at your own peril
I’m not arguing for ignoring it, my point is different.
> but I think it will profoundly change software development.
The point is that this is said every time, together with “the previous thing to which the exact same praise was given, wasn’t it”. So it’s several rounds of “yes yes, the previous time the criticisms were right, but this time it’s different, trust me”. So everyone else is justified in being skeptical.
No one wants AI to have the problems it has (technical, ethical, and others). If they didn’t it would be better for everyone. Criticism is a way of surfacing issues so they can be fixed.
And sure, I’ll grant that some people want to bash the other side more than they want to arrive at the truth, but those exist in all sides of the argument (probably in roughly equal measure?). So to have a productive conversation we need to go in with the mindset of “we’re on the same side in the goal of not having this suck”.
blizdiddy
32 minutes ago
So what is your thesis? The tools keep getting better, so that’s some kind of gotcha that the emporer has no clothes? Some people prefer the absolute latest and greatest so people on the previous gen were all fakers making Pelican svgs?
Maybe the productive thing is actually to ignore naysayers and goalpost movers and use the tools.
You aren’t enlightened for not liking a tool. “Oh, hammers? Absolutely a bubble, after all they never fixed the hit-your-thumb issue i blogged about, and nail guns just let you hurt your thumbs faster”
tim333
an hour ago
I think it's just a function of the models getting better. One is the best and then next month another overtakes it as so on.
zsoltkacsandi
an hour ago
From my experience this mostly happened to Antrophic models, and not because of some honeymoon period, but after the introduction of their models, the model quality and the limits are starting to decline.
Many people are complaning about this on HN and Reddit. I do not have any proof, but there is a pattern, I suppose Antrophic first attracts customers, then starts to optimize costs/margins.
latexr
an hour ago
> Many people are complaning about this on HN and Reddit. I do not have any proof, but there is a pattern, I suppose Antrophic first attracts customers, then starts to optimize costs/margins.
jswny
16 hours ago
I find Codex CLI to be very good too, but it’s missing tons of features that I use in Claude Code daily that keep me from switching full time.
- Good bash command permission system
- Rollbacks coupled with conversation and code
- Easy switching between approval modes (Claude had a keybind that makes this easy)
- Ability to send messages while it’s working (Codex just queues them up for after it’s done, Claude injects them into the current task)
- Codex is very frustrating when I have to keep allowing it to run the same commands over and over, Claude this works well when I approve it to run a command for the session
- Agents (these are very useful for controlling context)
- A real plan mode (crucial)
- Skills (these are basically just lazy loaded context and are amazing)
- The sandboxing in codex is so confusing, commands fail all the time because they try to log to some system directory or use internet access which is blocked by default and hard to figure out
- Codex prefers python snippets to bash commands which is very hard to permission and audit
When Codex gets to feature parity, I’ll seriously look at switching, but until then it’s just a really good model wrapped in an okay harness
libraryofbabel
15 hours ago
I don't think anyone can reasonably argue against Claude Code being the most full-featured and pleasant to use of the CLI coding agent tools. Maybe some people like the Codex user experience for idiosyncratic reasons, but it (like Gemini CLI) still feels to me rather thrown together - a Claude Clone with a lot of rough edges.
But these CLI tools are still fairly thin wrappers around an LLM. Remember: they're "just an LLM in a while loop with access to tool calls." (I exaggerate, and I love Claude Code's more advanced features like "skills" as much as anyone, but at the core, that's what they are.) The real issue at stake is what is the better LLM behind the agent: is GPT-5 or Sonnet 4.5 better at coding. On that I think opinion is split.
Incidentally, you can run Claude Code with GPT-5 if you want a fair(er) comparison. You need a proxy like LiteLLM and you will have to use the OpenAI api and pay per-token, but it's not hard to do and quite interesting. I haven't used it enough to make a good comparison, however.
ants_everywhere
13 hours ago
> but it (like Gemini CLI) still feels to me rather thrown together - a Claude Clone with a lot of rough edges.
I think this is because they see it as a checkbox whereas Anthropic sees it as a primary feature. OpenAI and Google just have to invest enough to kill Anthropic off and then decide what their own vision of coding agents looks like.
paulddraper
9 hours ago
You can run the Claude code router and choose the model you want (including based on dynamic conditions)
jessmartin
4 hours ago
Can you say more? Link?
fragmede
9 hours ago
Thick or thin, the wrapper so that users aren't manually copy and pasting code around is material to it being used and useful. Plus the systems prompt is custom to each tool and greatly affect how well the tool works.
Palmik
7 hours ago
I am not sure copying your competitors feature-by-feature is always a good strategy. It can make the onboarding of your competitor's users easier, but lead to a worse product overall.
This is especially the case in a fast moving field such as this. You would not want to get stuck in the same local minimum as your competitor.
I would rather we have competing products that try different things to arrive at a better solution overall.
jacurtis
14 hours ago
Yeah I think the argument is the tooling vs agent. Maybe the OpenAI agent is performing better now, but the tooling is significantly better from anthropic.
The anthropic (ClaudeCode) tooling is best-in-class to me. You listed many features that I have become so reliant on now, that I consider them the Ante that other competitors need to even be considered.
I have been very impressed with the Anthropic agent for code generation and review. I have found the OpenAI agent to be significantly lacking by comparison. But to be fair, the last time I used OpenAI's agent for code was about a month ago, so maybe it has improved recently (not at all unreasonable in this space). But at least a month ago when using them side-by-side the codex CLI was VERY basic compared to the wealth of features and UI in the ClaudeCode CLI. The agents for Claude were also so much better than OpenAI, that it wasn't even close. OpenAI has always delivered me improper code (non-working or invalid) at a very high rate, whereas Claude is generally valid code, the debate is just whether it is the desired way to build something.
stared
2 hours ago
Claude Code has a lot or UX polish: https://newsletter.pragmaticengineer.com/p/how-claude-code-i...
Footprint0521
12 hours ago
I agree!! But this repo
https://github.com/just-every/code
Fixed all of these in a heartbeat. This has been a game changer
ryuuseijin
8 hours ago
I'm using opencode which I think is now very close to covering all the functionality of claude code. You can use GPT5 Codex with it along with most other models.
jatora
11 hours ago
to fix having to approve commands over and over - use windows WSL. codex does not play nice with permissions/approvals on windows. WSL solves that completely
virtualritz
39 minutes ago
When you say Claude Code, what model do you refer to? CC with Opus still outperforms Codex (gpt-5-codex) for me for anything I do (Rust, computer graphics-related).
However, Anthropic restricted Opus use for Max plan users 10 days or so ago severly (12-fold from 40h/week down to 5h week) [1].
Sonnet is a vastly inferioir model for my use cases (but still frequently writes better Rust code than Codex). So now I use Codex for planning and Sonnet for writing the code. However, I usually need about 3--5 loops with Codex reviewing, Sonnet fixing, rinse & repeat.
Before I could use one-shot Opus and review myself directly, and do one polish run following my review (also via Opus). That was possible from June--mid October but no more.
deaux
4 minutes ago
Agreed that Opus is stronger than Sonnet 4.5 and GPT-5 High. It's the bitter pill - bigger, more expensive models are just "smarter", even if it doesn't always show in synthetic benchmarks. Similar with o1-pro (now almost a year old, an eternity in this space) vs GPT-5 high. There's also GPT-5 Pro now, which comes at an API cost of $120/M output, and is also noticeably smarter, just like Opus.
They all like to push synthetic benchmarks for marketing, but to me there's zero doubt that both Anthropic and OpenAI are well aware that they're not representative of logical thinking and creativity.
pkreg01
17 hours ago
I totally agree. I remember the June magic as well - almost overnight my abilities and throughput were profoundly increased, I had many weeks of late nights in awe and wonder trying things that were beyond my ability to implement technically but within the bounds of my conceptual understanding.
Initially, I found Codex CLI with GPT-5 to be a substitute for Claude Code - now GPT-5 Codex materially surpasses it in my line of work, with a huge asterisk. I work in a niche industry, and Codex has generally poor domain understanding of many of the critical attributes and concepts. Claude happens to have better background knowledge for my tasks, so I've found that Sonnet 4.5 with Claude Code generally does a better job at scaffolding any given new feature. Then, I call in Codex to implement actual functionality since Codex does not have the "You're absolutely right" and mocked/placeholder implementation issues of CC, and just generally writes clean, maintainable, well-planned code. It's the first time I've ever really felt the whole "it's as good as a senior engineer" hype - I think, in most cases, GPT5-Codex finally is as good as a senior engineer for my specific use case.
I think Codex is a generally better product with better pricing, typically 40-50% cheaper for about the same level of daily usage for me compared to CC. I agree that it will take a genuinely novel and material advancement to dethrone Codex now. I think the next frontier for coding agents is speed. I would use CC over Codex if it was 2x or 3x as fast, even at the same quality level. Otherwise, Codex will remain my workhorse.
thecoppinger
15 hours ago
> trying things that were beyond my ability to implement technically but within the bounds of my conceptual understanding
This is a really neat way of describing the phenomenon I've been experiencing and trying to articulate, cheers!
Arisaka1
5 hours ago
When I was in high school, I would see the algebra teacher work through expressions and go "ohhh, that makes sense". But when I got back home to work with the homework, I couldn't make the pieces fit.
Isn't that the same? Just because you recognize something someone else wrote and makes you go "ohh, I understand it conceptually" doesn't mean that you can apply that concept in a few days or weeks.
So when the person you responded to says:
>almost overnight *my abilities* and throughput were profoundly increased
I'd argue the throughput did but his abilities really weren't, because without the tool in question you're just as good as before the tool. To truly claim that his abilities were profoundly increased, he has to be able to internalize the pattern, recognize the pattern, and successfully reproduce it across variable contexts.
Another example would be claiming that my painting abilities and throughput were profoundly increased, because I used to draw stick figures and now I can draw Yu-Gi-Oh! cards by using the tool. My throughput was really increased, but my abilities as a painter really haven't.
catigula
16 hours ago
>I think, in most cases, GPT5-Codex finally is as good as a senior engineer for my specific use case.
This is beyond bananas to me given that I regularly see codex high and Gpt-5-high both fail to create basic react code slightly off the normal distribution.
evilduck
an hour ago
If you really want to see it fail at something easy, try to have write something that can use JSX but doesn't use React (Bun, Hono, etc). Seems like no amount of context management and detailed instructions will keep it from reaching for React-isms.
hansvm
15 hours ago
That might say something about the understandability of the react framework/paradigm ;)
Quality varies a lot based on what you're doing, how you prompt it, how you orchestrate it, and how you babysit and correct it. I haven't seen anything I'd call senior, but I have seen it, for some classes of tasks, turn this particular engineer into many seniors. I still have to supply all the heavy lifting (here's the concurrency model, how you'll ensure exactly-once-delivery, particular functions and classes you definitely want, a few common pitfalls to avoid, etc), but then it can flesh out the details extremely well.
aaronblohowiak
15 hours ago
It makes me waaayyyy faster but, like you, that’s because I already know what has to be done.
pkreg01
15 hours ago
Do you mind if I ask what kind of React code you're working on? I've had good success using Codex for my frontend development, especially since all of my projects consistently rely on a pretty widely used and well documented component library. I realize that makes my use case fairly narrow, so I don't think I've discovered the limits you have.
catigula
15 hours ago
Normal legacy react enterprise application.
Today I was trying to get it to temporarily shim in for development and consume the value of a redux store via merely putting a default in the reducer. Depending on that value, the application would present different state.
It failed to accomplish this and added a disgusting amount of defensive nonsense code in my saga, reducer and component to ensure the value was there. It took me a very short time to correct it but just watching it completely fail at this task was borderline absurd.
pkreg01
15 hours ago
Thanks for the context! I feel the same way. When it fails it fails hard. This is why I'm extremely skeptical of any of the non-cli cloud solutions - as you observed, I think the failures compound and cascade if you don't stop them early, which requires a compelling interface and the ability to manually intervene very fast.
bad_haircut72
14 hours ago
Im not saying this is a paid endorsement but the internet is dead and I wonder what openAI would pay, if they could, to get such a glowing review as top comment on HN
neya
11 hours ago
For what it's worth, I'm not affiliated with Open AI (you can verify by my comment history [1] and account age) and I agree with the top comment. I do Elixir consulting primarily and nothing beats OpenAI's model at the moment for Elixir. Previously, their O3 models were quite decent. But, GPT-5 is really damn good. Claude code will unnecessarily try to complicate a problem solution.
dns_snek
8 hours ago
This is hilarious because for me Cursor with GPT-5 often generates Elixir that isn't even syntactically correct. It needs to be told not to use return statements, and not to try to index linked lists as arrays. Code is painfully non-idiomatic to the point of being borderline useless even in the simpler cases. Claude Sonnet 4.5 is marginally better, but not by much. Any ambitious overhaul, refactoring or large feature ends in tears and regret.
Neither tool is worth paying even $20 a month for when it comes to Elixir, that's how little value I get out of them, and it's not because I can't afford it.
neya
8 hours ago
Gemini is also good, I recommend you try it as well. Usually my workflow is GPT-5 as the primary, but yes, as you mentioned it is not perfect. But Gemini surprisingly compliments GPT-5 for my use cases atleast. It's good at LiveView related stuff, whereas GPT-5 is more of architecting side.
Both LLMs suck if you let it do everything without architecting the solution first. So, I always instruct the high level architecture of how I want something, specifically around how the data should flow and be consumed and what I really want to avoid. With these constraints and bit of some prompt engineering, they are actually quite good.
dns_snek
4 hours ago
> Both LLMs suck if you let it do everything without architecting the solution first.
I always do that. Last time I spent an hour planning, going through the requirements, having it ask questions, only for it to completely botch the implementation.
Sure, I can treat it like a junior and spend 2-3 hours planning everything down to the individual function level and it's going to implement it alright. The code will work but it won't be idiomatic. Or I can just do it myself in 3 hours total to a much higher standard of quality, without gambling on a successful outcome, while simultaneously improving my own knowledge, understanding, and abilities.
No matter how I try to use them, agentic coding is always a net negative on my productivity (disposable one-off scripts excluded).
johnisgood
7 hours ago
Personally I found Claude to be relatively OK at Elixir. With a lot of hand holding. My main problem when it comes to Elixir and Erlang is many amount of files. For that kind of boilerplate, it is good. Otherwise just use "erlang-skels.el" with Emacs. :D
Palmik
7 hours ago
I'm not saying this was a paid comment, but if we're going to speculate, we could just as easily ask what Anthropic would pay, if they could, to drown out a strongly pro-OpenAI take sitting at the top of their own promotional HN thread.
That said, you're right that the broader internet (Reddit especially) is heavily astroturfed. It's not unusual to see "What's the best X?" threads seeded by marketers, followed by hoard of suspiciously aligned comments.
But without actual evidence, these kind of meta comments like yours (and mine) are just a cynical noise.
vietvu
11 hours ago
I heard this opinion a lot recently. Codex is getting better, and Claude is getting worse so it's must happen sooner or later. Well, it's competition so waiting for Claude to catch up. The web Claude Code is good, but they really need to fix their quota. It's unusable. I would choose a worse model (maybe at 90%), but has better quota and usable. Not to mention GPT-5 and GPT-5-codex seems catch up or even better now.
dbbk
an hour ago
You're absolutely right!
hluska
8 hours ago
Are you really going to call someone a shill? I’d argue that you’re why the internet is dying - a million options and you had to choose the most offensive?
h34t
4 hours ago
to be fair, they spent a lot on compute.
a_victorp
13 hours ago
This is an underrated comment
brigandish
9 hours ago
The only way to tell human from AI now is disagreeableness, it’s the one thing the GPTs refuse to do. I can’t stand their cloying sycophancy but at least it means that serial complainers will gain some trust, at least for as long as Americans are leading the hunt and deciding to baby us.
dr_dshiv
5 hours ago
On the other hand, formulaic disagreement underpins most of modern media; made by humans or not, it ends up as dehumanizing as a train wreck.
visiondude
12 hours ago
I completely agree with this. The amount of unprompted “I used to love Claude Code but now…” content that follows the exact same pattern feels really off. All of these people post without any prompts for comparison, and OP even refused to share specifics so we have to take his claim as ‘trust me bro’
loveparade
12 hours ago
It doesn't feel off to me because that's the exact experience I've had as well. So it's unsurprising to me that many other people share that experience. I'm sure there is a bunch of paid promotion going on for all kinds of stuff on HN (especially what gets onto the front page), but I don't think this is one of those cases.
visiondude
11 hours ago
Oh cool, can you share concrete examples of times codex out performed Claude Code? I’m my experience both tools needs to be carefully massaged with context to fulfill complex task.
loveparade
5 hours ago
I don't really see how examples are useful because you're not going to understand the context. My prompt may be something like "We recently added a new transcription backend api (see recent git commits), integrate it into the service worker. Before implementing, create a detailed plan, ask clarifying questions, and ask for approval before writing code"
Does that help you? I doubt it. But there you go.
typpilol
10 hours ago
In my experience. Claude wants to try and finish everything as quickly as possible where codex is happy to take 5x the length.
The best answer is each has its uses. Using codex to do bulk edits is dumb because it takes forever, etc etc
hluska
8 hours ago
Nobody has to give you examples. People can express opinions. If you disagree, that’s fine but requesting entire prompt and response sets is quite demanding. Who are you to be that demanding?
dns_snek
4 hours ago
> Who are you to be that demanding?
Let's call it the skeptical public? We've been listening to a group of people rave about how revolutionary these tools are, how they're able to perform senior level developer work, how good their code is, and how they're able to work autonomously through the use of sub-agents (i.e. vibe coding), without ever providing evidence that would support any of those grandiose claims.
But then I use these tools myself[1] and I speak to real developers who have used them and our evaluation centers around lukewarm, e.g. good at straightforward, junior level tasks, or good for prototyping, or good for initially generating tests, or good for answering certain types of questions, or good for one-off scripts, but approximately none of them would trust these LLMs to implement a more complex feature like a mid-level or senior developer would without very extensive guidance and hand-holding that takes longer than just doing it ourselves.
Given the overwhelming absence of evidence, the most charitable conclusion I can come to is that the vast majority of people making these claims have simply gone from being 0.2X developers to being 0.3X developers who happen to generate 5X more code per unit of time.
[1] e.g. my reply to https://news.ycombinator.com/item?id=45651948
hattmall
12 hours ago
I'm not saying it is, but if ANYTHING was the exact combination of prerequisites to be considered paid promotion on HN, this is the type of comment it would be.
hluska
8 hours ago
So, let’s see if I get this straight. A highly identifiable person whose company sells a security product is the ideal shill? That doesn’t make any sense whatsoever. On the other hand, someone with a different opinion makes complete sense.
WXLCKNO
18 hours ago
I agree with this and actually Claude Code agrees with it too. I've had Codex cli (gpt-5-codex high) and claude code 4.5 sonnet (and sometimes opus 4.1) do the same lengthier task with the same prompt in cloned folders about 10x now and then I ask them to review the work in the other folder and determine who did the best job.
100% of the time Codex has done a far better job according to both Codex and Claude Code when reviewing. Meeting all the requirements where Claude would leave things out, do them lazily or badly and lose track overall.
Codex high just feels much smarter and more capable than Claude currently and even though it's quite a bit slower, it's work that I don't have to go over again and again to get it to the standards I want.
pkreg01
17 hours ago
I share your observations. It's strange to see Anthropic loosing so much ground so fast - they seemed to be the first to crack long-horizon agentic tasks via what I can only assume is an extremely exotic RL process.
Now, I will concede that for non-coding long-horizon tasks, GPT-5 is marginally worse than Sonnet 4.5 in my own scaffolds. But GPT-5 is cheaper, and Sonnet 4.5 is about 2 months newer. However, for coding in a CLI context, GPT-5-Codex is night-and-day better. I don't know how they did it.
typpilol
10 hours ago
Every since 4.5, I can't get Claude to do anything that takes a while
4.0 would chug a long for 40 mins. 4.5 refuses and straight up says the scope is too big sometimes.
My theory is anthropic is super compute constrained and even though 4.5 is smarter, the usage limits and it's obsession with rushing to finish was put in mainly to save their servers compute.
swah
15 hours ago
I haven't been able to get anything done with Codex. Claude Code is fast and "gets it". Also does better at running and testing its own stuff.
Its very odd because I was hoping they were very on par.
didibus
13 hours ago
Same, I find Codex not good to be honest. I have better success manually copy/pasting into GPT5 chat. There's something about Codex that just wants to change everything and use the weirdest tool commands.
It also often fails to escalate a command, it'll even be like, oh well I'm in a sandbox so I guess I can't do this, and will just not do it and try to find a workaround instead of escalating permission to do the command.
jacurtis
13 hours ago
The last time I used them both side by side was a month ago, so unless its significantly improved in the past month, I am genuinely surprised that someone is making the argument that Codex is competitive with ClaudeCode, let alone it somehow being superior.
ClaudeCode is used by me almost daily, and it continues to blow me away. I don't use Codex often because every time I have used it, the output is next to worthless and generally invalid. Even if it does get me what I eventually want, it will take much more prompting for me to get the functioning result. ClaudeCode on the other hand gets me good code from the initial prompt. I'm continually surprised at exactly how little prompting it requires. I have given it challenges with very vague prompts where it really exceeds my expectations.
clarkmoreno
10 hours ago
OpenAI astroturfing is a real thing. It's all over Twitter. Unsurprising but still wild to see it here on HN.
barneybooroo
4 hours ago
I think the enthusiasm for Codex coincided with the extended period of degraded quality CC was experiencing around a couple of months ago? During that time I cancelled my Claude sub and tried out Codex, which by comparison was feeling significantly better. I haven't tried them out side by side since Claude has been de-borked but even if Codex is objectively poorer I could believe that flattering comparison has stuck for people who switched?
acangiano
14 hours ago
I use both but I agree that they are generally not on par. I find Claude Code does a better job and doesn't overengineer as much. Where sometime Codex does better is in debugging a tough bug that stumps Claude Code. Codex is also more likely to get lazy and claiming to have finished a large task, when it reality it just wrote some placeholder lines. Claude has never done that. They might be on par soon, however, and I think Anthropic is playing a dangerous game with their limit enforcement on people who are on subscriptions.
spoiler
4 hours ago
My experience is that if I know what I want, CC will produce better code, given I specify it correctly. The planning mode is great for this too, as we can "brainstorm" and what I have seen help a lot is if I ask questions about why it did a certain way. Often it'll figure out on its own why that's wrong, but sometimes it requires a bit of course correction.
On the other hand, last time I tried GPT-5 from Cursor, it was so disappointing. It kept getting confused while we were iterating on a plan, and I had to explain to it multiple times that it's thinking about the problem the same way. After a while I gave up, opened a new chat and gave it my own summary of the conversation (with the wrong parts removed) and then it worked fine. Maybe my initial prompt was vague, but it continually seemed to forget course corrections in that chat.
I mostly tend to use them more to save me from typing, rather than asking it to design things. Occasionally we do a more open ended discussion, but those have great variance. It seems to do better with such discussions online than within the coding tool (I've bounced maths/implementation ideas off of while writing shaders on a personal project)
baq
4 hours ago
gpt-5-high is amazing, but so slow I'll revert to sonnet when I know what I need done on a low level.
when making boilerplatish changes in the product in areas I'm not familiar with (it's a large codebase) gpt-5-high is a monster.
cesarvarela
18 hours ago
Can you share an example of the tasks you found Codex being much better? From my experience Claude Code is much better.
mordymoop
18 hours ago
I'm on the same page here. I have seen this sentiment about Codex suddenly being good a few times now, so I booted Codex CLI thinking-high back up after a break and asked it to look for bugs. It promptly found five bugs that didn't actually exist. It was the kind of truly impressively stupid mistake that I haven't seen Claude Code make essentially ever, and made me wonder if this isn't the sort of thing that's making people downplay the power of LLMs for agentic coding.
stavros
13 hours ago
I asked Sonnet 4.5 to find bugs in the code, it found five high-impact bugs that, when I prompted it a second time, it admitted weren't actually bugs. It's definitely not just Codex.
throwaway-0001
13 hours ago
In my case codex fixed a bug in one shot. Took 10 min to debug and find it.
Claude struggled long time and still didn’t find.
intellectronica
17 hours ago
Codex works much better for long-running tasks that require a lot of planning and deep understanding.
Claude, especially 4.5 Sonnet, is a lot nicer to interact with, so it may be a better choice in cases where you are co-working with the agent. Its output is nicer, it "improvises" really well even if you give it only vague prompts. That's valueable for interactive use.
But for delegating complete tasks, Codex is far better. The benchmarks indicate that, as do most practicioners I talk to (and it is indeed my own experience).
In my own work, I use Codex for complete end-to-end tasks, and Claude Sonnet for interactive sessions. They're actually quite different.
incoming1211
16 hours ago
I disagree, Codex always gets stuck and wants to double check and clarify things, its like "dammit just execute the plan and don't tell me until its completely finished"
The output of codex is also not as great. Codex is great at the planning and investigation portion but sucks at execution and code quality.
ewoodrich
15 hours ago
I've been dealing with this on Codex a lot lately. It confidently wraps up a task, I go to check it's work... and it's not even close.
Then I do a double take and re-read the summary message and realize that it pulled a "and then draw the rest of the owl", seemingly arbitrarily picking and choosing what it felt like doing in that session and what it punted over to "next steps to actually get it running".
Claude is more prone to occasional "cheating" with mocked data or "tbd: make this an actual conditional instead of hardcoded If True" stuff when it gets overwhelmed which is annoying and bad. But it at least has strong task adherence for the user's prompt and doesn't make me write a lawyer-esque contract to avoid any loopholes Codex will use to avoid doing work.
aaronblohowiak
14 hours ago
Are you using something like spec-kit?
shmoogy
17 hours ago
Can / Does Codex actually check docker logs and other things for feedback while iterating on something that isnt working ? That is where the true magic of Claude comes for me. Often things cant be one shot, but being able to iteratively check logs, make an adjustment, rebuild the docker containers, send a curl, and confirm fixed is huge improvement.
intellectronica
16 hours ago
Yes, in this regard it's very similar. It works as an agent and does whatever you need it to do to complete the task. In comparison to Claude it tends to plan more and improvise less.
simplify
17 hours ago
Same here. I tried codex a few days ago for a very simple task (remove any references of X within this long text string) and it fumbled it pretty hard. Very strange.
fragmede
16 hours ago
yeah I'm in the same boat. Codex can't do this one task, and constantly forgets what I've told it, and I'm reading these comments saying how is so great to the point that I'm wondering if I'm the one taking the crazy pills. Maybe we're being A/B tested and don't know about it?
hattmall
12 hours ago
No, no one that's super boosting the LLMs ever tells you what they are working on or give any reasonable specifics about how and why it's beneficial. When someone does, it's a fairly narrow scope and typically inline with my experience.
They can save you some time by doing some fairly complex basic tasks that you can write in plain language instead of coding. To get good results you really need a lot of underlying knowledge yourself and essentially, I think of it as a translator. I can write a program in very good detail using normal language and then the LLM can convert it to code with reasonable accuracy.
I haven't been able to depend on it to do anything remotely advanced. They all make up API endpoints or methods or fill in data with things that simply don't exist, but that's the nature of the model.
fragmede
12 hours ago
You misread me. I'm one of the people you're complaining about. Claude code has been great in my experience and no I don't have a GitHub repo of code that's been generated for you to tell me that's trivial and unadvanced and that a child could do it.
What I'm saying was to compare my experience with Claude code vs Codex with GPT-5. CC's better than codex in my experience, contrary to GP's comment.
FuckButtons
11 hours ago
Maybe, just maybe, people are lying on the internet. And maybe those people have a financial interest in doing so.
Palmik
6 hours ago
Curiously, you yourself did not provide an example where, from your experience, Claude Code was much ebtter.
the_duke
17 hours ago
IMO gpt5-codex medium is much better as soon as the task becomes slightly complex, or the context grows a bit.
Sora 4.5 tends to randomly hallucinate odd/inappropriate decisions and goes to make stupid changes that have to be patched up manually.
jacurtis
13 hours ago
Yes Sora hallucinates significantly more than Claude.
I find that Codex generally requires me to remove code to get to what I want, whereas Claude I tend to use what it gives me and I add to it. Whether this is from additional prompting or from manual typing, i just find that codex requires removal to get to desired state, and Claude requires adding to get to desired state. I prefer adding incrementally than removing.
mmaunder
18 hours ago
I can not. We're all racing very hard to take full advantage of these new capabilities before they go mainstream. And to be honest, sharing problem domains that are particularly attractive would be sharing too much. Go forth and experiment. Have fun with it. You'll figure it out pretty fast. You can read my other post here about the kinds of problem spaces I'm looking at.
deadbabe
18 hours ago
Ah, super secret problem domains that have been thoroughly represented in the LLM training data. Nice.
aprilthird2021
an hour ago
Why would you even comment that Codex CLI is potentially worth switching an enormous amount of spend over ($70k) and give literally 0 evidence of why it's better? That's all you've got? "Trust me bro"?
mmaunder
17 hours ago
I'm seeing the downvotes. I'm sorry folks feel that way. I'm regretting my honesty.
Edit: I'd like to reply to this comment in particular but can't in a threaded reply, so will do that here: "Ah, super secret problem domains that have been thoroughly represented in the LLM training data. Nice."
This exhibits a fundamental misunderstanding of why coding agents powered by LLMs are such a game changer.
The assumption this poster is making is that LLMs are regurgitating whole cloth after being trained on whole cloth.
This is a common mistake among lay people and non-practitioners. The reality is that LLMs have gained the ability to program, by learning from the code of others. Much like a human would learn from the code of others, and then be able to create a completely novel application.
The difference between a human programmer an an agentic coder is that the agent has much broader and deeper expertise across more programming languages, and understands more design patterns, more operating systems, more about programming history, etc etc and it uses all this knowledge to fulfill the task you've set it to. That's not possible for any single human.
It's important for the poster to take two realities on board: Firstly, agentic coding agents are not regurgitating whole cloth from whole cloth. Instead they are weaving new creations because they have learned how to program. Secondly, agentic coding agents have broader and deeper knowledge than any human that will ever exist, and they never tire, and their mood and energy level never changes. In fact that improves on a continuous basis as the months go by and progress continues. This means we can, as individual practitioners or fast moving teams, create things that were never before possible for us without raising huge amounts of money and hiring large very expensive teams, and then having the overhead of lining everyone up behind a goal AND dealing with the human issues that arise, including communication overhead.
This is a very exciting time. Especially if you're curious, energetic, and are willing to suspend disbelief to go and take a look.
nik_0_0
15 hours ago
I don't have any particular horse in this race, but looking at this exchange, I hope its clear where the issue is coming from.
The original post states "I am seeing Codex do much better than Claude Code", and when asked for examples, you have replied with "I don't have time to give you examples, go do it yourself, its obvious."
That is clearly going to rub folks (anyone) the wrong way. This refrain ("Wheres the data?") pops up frequently on HN, if its so obvious, giving 1 prompt where Codex is much greater than Claude doesn't seem like a heavy lift.
In absence of such an example, or any data, folks have nothing to go on but skepticism. Replying with such a polarizing comment is bound to set folks off further.
Vegenoid
15 hours ago
We've all been hearing from people talking about how amazing AI coding agents are for a while now. Many skeptics have tried them out, looked into how to make good use of them, used modern agentic tools, done context engineering, etc. and found that they did not live up to the claims being made, at least for their problem domain.
Talk is cheap, and we're tired of hearing people tell us how it's enabling them to make incredible software without actually demonstrating it. Your words might be true, or they might be just another over-exaggeration to throw on the pile. Without details we have no way of knowing, and so many make the empirically supported choice.
chaboud
14 hours ago
I agree. It’s pretty easy to put-up or shut up.
I recently vibe coded a video analysis pipeline with some related arduino-driven machine control. It was work to prototype an experience on some 3D printed hardware I’ve been skunking out.
By describing the pipeline and filters clearly, I had the analysis system generating useful JSON in an hour or so, including machine control simulation, all while watching TV and answering emails/slacks. Notable misses were that the JSON fields were inconsistent, and the python venvs were inconsistent for the piped way that I wanted the system to operate with.
Small fixes.
Then I wired up the hardware, and the thing absolutely crapped itself, swapping libraries, trying major structural changes, and creating two whole new copies of the machine control host code (asking me each time along the way). This went on for more than three hours, with me debugging the mess for about 20 minutes before resorting to 1) ChatGPT, which didn’t help, followed by 2) a few minutes of good old fashioned googling on serial port behavior on Mac, which, with an old sitting on the shelf Uno R3, meant that I needed to use the cu.* ports instead of tty.*, something that Claude Code had buried deeply in a tangle of files.
Curious about the failure, I told Claude Code to stop being an idiot and use a web browser to go research the problem of specifically locking up on the open operation. 30 seconds later, and with some reflective swearing from Opus 4.1, which I appreciate, I had the code I should have had 3 hours prior (along with other garbage code to clean up).
For my areas of sensing, computer vision, machine learning, etc., these systems are amazingly helpful if the algorithms can be completely and clearly described (e.g., Kalman filter to IoU, box blur followed by subsampling followed by split exponential filtering, etc.).
Attempts to let the robots work complex pipelines out for themselves haven’t gone as well for me.
zamadatix
17 hours ago
Never hold regret for having honesty, it tends to lose its value completely if you only care about it when you have good news to deliver. If for anything, hold regret for when you didn't have something better appreciated to be honest about.
The easier threading-focused approach to the conversation might be to add the additional comment as an edit at the end of the original and reply to the child https://news.ycombinator.com/item?id=45649068 directly. Of course, I've broken the ability to do that by responding to you now about it ;).
mmaunder
17 hours ago
Thanks. I wasn't able to reply in a thread earlier - I guess HN has a throttle on that. So I edited the comment above to add a few more thoughts. It's a very exciting time to be alive.
jamiek88
15 hours ago
Just click on the time. Where yours says ‘two hours ago’ now, if you click on that you can reply directly to any sub comment in a thread.
mmaunder
17 hours ago
lol, thanks.
johnfn
16 hours ago
You’re getting downvoted because the amount of weight I place on your original comment is contingent on whether or not you’re actually using AI to do meaningful work ot not. Without clarifying what you’re doing, it’s impossible to distinguish you from one of those guys that says he’s using AI to do tons of work and then you peek under the hood and he’s made like 15 markdown files and his code is a mess that doesn’t do anything.
Well, that, and it’s just a bit annoying to claim that you’ve found some amazing new secret but that you refuse to share what the secret is. It doesn’t contribute to an interesting discussion whatsoever.
preommr
15 hours ago
> I'm seeing the downvotes. I'm sorry folks feel that way. I'm regretting my honesty.
What honesty? We're not at the point of "the Godfather was a good/bad movie", we're at "no, trust, there's a really good movie called the Godfather".
Your honesty means nothing for an issue that isn't about taste or mostly subjectivness. How useful AI is and in what way is a technical discussion where the meat of the subject matter is. You've shared nothing on that front. I am not saying you have to, but like obviously people are going to downvote you - not because they might agree/disagree but because it's contributed nothing different from every other ai-hype man selling a course or something.
kobe_bryant
17 hours ago
this is absurd. no one needs or wants your AI generated answer that's a whole lot of nothing
mmaunder
17 hours ago
Comments like this reveal the magnitude of polarization around this issue in tech circles. Most people actually feel this kind of animosity towards AI, and so having comment threads like this even be visible on HN is unusual. Needless to say, all my comments here are hand written. But the poster knows that, of course.
maherbeg
18 hours ago
Yeah this has been my experience as well. The Claude Code UI is still so much better, and the permissioning policy system is much better. Though I'm working on closing that gap by writing a custom policy https://github.com/openai/codex/blob/main/codex-rs/execpolic...
Kinda sick of Codex asking for approval to run tests for each test instance
mmaunder
18 hours ago
Ah the tension between cybersecurity best practices and productivity is brutal right now.
maherbeg
16 hours ago
lol yeah, but mostly just want to allow more types of reads for getting context, and primarily for test running / linting etc. I shouldn't have to approve every invocation of `pytest` or `bazel test`.
fragmede
16 hours ago
--dangerously-bypass-approvals-and-sandbox isn't enough for you?
rtfeldman
18 hours ago
You don't have to use Codex in its terminal UI - e.g. you can use it in the Zed IDE out-the-box:
PantaloonFlames
17 hours ago
And also in emacs or neovim
Rebuff5007
4 hours ago
> We were heavy users of Claude Code ($70K+ spend per year)
Claude code has only been generally available since May last year (a year and half ago)... I'm surprised by the process that you are implying; within a year and a half, you both spent 70k on claude code, and knew enough about it and its competition to switch away from it? I dont think I'd be able to due diligence even if LLM evaluation was my fulltime job. Let alone the fact that the capabilities of each provider are changing dramatically every few weeks.
bcrosby95
18 hours ago
Yeah, after correcting it several times I've gotten Claude Code to tell me it didn't have the expertise to work in one of my problem domains. It was kinda surprising but also kinda refreshing that it knew when to give up. For better or worse I haven't noticed similar things with Codex.
mmaunder
18 hours ago
I've chosen problems with non-negotiable outcomes. In other words, problem domains where you either are able to clearly accomplish the very hard thing, or not, and there's no grey area. I've purposely chosen these kinds of problems to prove what AI agents are capable of, so that there is no debate in my mind. And with Codex I've accomplished the previously impossible. Unambiguously. Codex did this. Claude gave up.
It's as if there are two vendors saying they can give up incredibly superpowers for an affordable price, and only one of them actually delivers the full package. The other vendor's powers only work on Tuesdays, and when you're lucky. With that situation, in an environment as competitive as things currently stand, and given the trajectory we're on, Claude is an absolute non-starter for me. Without question.
Aeolun
16 hours ago
I don’t think Claude is actually incapable, you just spend a lot of time telling it to yes, please actually do the difficult thing. Do not give up halfway through.
Codex says “This is a lot of work, let me plan really well.”
Claude says “This is a lot of work, let me step back and do something completely different that you didn’t ask for.”
corndoge
17 hours ago
Can you expound a bit on the problem domains? I am curious
skybrian
16 hours ago
We need product reviewers who can demonstrate things like this in public. Without details, "it works for me on my projects" only goes so far.
lherron
18 hours ago
Still a toss-up for me which one I use. For deep work Codex (codex-high) is the clear winner, but when you need to knock out something small Claude Code (sonnet) is a workhorse.
Also CC tool usage is so much better! Many, many times I’ve seen Codex writing a python script to edit a file which seems to bypass the diff view so you don’t really know what’s going on.
hn_saver
10 hours ago
How did you spend $70k per year for a tool that's not a single year old?
TkTech
10 hours ago
API pricing rates probably. If I take a look at my current usage since it came out, it'd be about $12000 CAD if paid at API rates. Ridiculously easy to rack up absurd bills via the API, and I'm mostly just using it for code review. Someone using it heavily could easily, easily get way over 70k.
tstrimple
10 hours ago
Also the statement was "We". It's not a single user's billable usage and we have zero details as to how many people made up "We". So any analysis into the cost or value are meaningless.
koakuma-chan
10 hours ago
Why did they not buy a subscription? It would be a flat fee.
NiloCK
3 hours ago
At that spend, no subscription is available to serve that much traffic - they are all rate limited.
I understand the 70K spend as a corporate expense, not an individual... right?
p337
15 hours ago
On the topic of comparing OpenAI models with Anthropocene models, I have a hybrid approach that seems really nice.
I set up an MCP tool to use gpt-5 with high reasoning with Claude Code (like tools with "personas" like architect, security reviewer, etc), and I feel that it SIGNIFICANTLY amplifies the performance of Claude alone. I don't see other people using LLMs as tools in these environments, and it's making me wonder if I'm either missing something or somehow ahead of the curve.
Basically instead of "do x (with details)" I say "ask the architect tool for how you should implement X" and it gets into this back and forth that's more productive because it's forcing some "introspection" on the plan.
jrk
14 hours ago
This is an established, though advanced, idea.
Sourcegraph Amp (https://sourcegraph.com/amp) has had this exact feature built in for quite a while: "ask the oracle" triggered an O1 Pro sub-agent (now, I believe, GPT-5 High), and searching can be delegated to cheaper, faster, longer-context sub-agents based on Gemini 2.5 Flash.
kelvinjps10
16 hours ago
I did the opposite I switched to Claude code once the released the new model last week of the one before, I tried using codex, but there was issues with the terminal and prompting (multiple characters getting deleted) I found Claude code to have more features and less bugs, like the edit on vim for the prompt being really useful and find it better to iterate. Also I like more its tool usage and the use of the shell. Sometimes codex prefer to use python instead of doing the equivalent shell command. Maybe it's like the other people say here, that codex it's better for long running tasks, I prefer to give Claude small tasks and I'm usually satisfied with the result and I like to work alongside the agent
didibus
13 hours ago
Interesting, I find codex CLI is really bad, like the worst coding agent I've tried.
Fails to escalate permissions, gets derailed, loves changing too many things everywhere.
GPT5 is good, but codex is not.
CompoundEyes
15 hours ago
Claude Code is still good but I don’t TRUST it. With Claude Code and Sonnet I’m expecting failure. I can get things done but there’s an administrative overhead of futzing around with markdown files, defensive commit hooks and unit tests to keep it on rails while managing the context panic. Codex CLI with gpt-5-codex high reasoning is next gen. I’m sure Sonnet 5 will match it soon. At that point I think a lot of the workflows people use in Claude Code will be obsolete and the sycophancy will disappear.
slaymaker1907
15 hours ago
I haven’t used Codex a lot, but GPT-5 is just a bit smarter in agent mode than Claude 4.5. The most challenging thing I’ve used it for is for code review and GPT-5 somewhat regularly found intricate bugs that Claude missed. However, Claude seemed to be better at following directions exactly vs GPT-5 which requires a lot more precision.
pythonbase
3 hours ago
What is your general use case with Claude Code / Codex? $70K/year is a significant spend.
jakenuts
an hour ago
Same!
mi_lk
18 hours ago
What model are you using respectively? Not sure I share your observations
mmaunder
18 hours ago
Have tried all and continue to eval regularly. I spend up to 14 hours a day. Currently recovering from a herniated disk because I spent 6 weeks sitting at a dining room table, 14 hours a day, leaning foward. Don't do that. lol. So my coverage is pretty good. I'm using GPT5-codex-high for 99% of my work. Also I have a team of 40 folks, about a third of which are software engineers and the other third are cybersecurity analysts, so I get feedback from them too and we go deep on our engineering calls re the latest learnings and capabilities.
sabareesh
17 hours ago
Similar feeling. Seems it is good at certain things and if something doesnt work it want to do things simply and in turn becomes something that you didnt ask for and certain times opposite of what you wanted. On the other hand with codex certain time you feel the AGI but that is like 2 out of 10 sessions. This is primarily may be due to how complete the prompt and how well you define the problems.
durron
18 hours ago
Do you find this to still be true with the Sonnet 4.5 model?
extr
18 hours ago
IMO Sonnet 4.5 is great but it just isn’t as comprehensive of a thinker. I love Anthropic and primarily use CC day to day but for any tricky problems or “high stakes, this must not have bugs” issues, I turn to Codex. I do find if you let Codex run on it its own too long it will produce comparably sloppy or lacking-in-vision type issues that people criticize Sonnet for, however.
PantaloonFlames
17 hours ago
That’s a curious approach. Why would you use both? Why not just use the more reliable dependable option for all purposes?
extr
17 hours ago
Sonnet 4.5/CC is faster, more direct, and is generally better at following my intent rather than the letter of my prompt. A large chunk of my tasks are not "solve this concurrency bug" or "write this entire feature" but rather "CLI ops", merging commits, running a linter, deploying a service, etc. I almost use it like it was my shell.
Also while not quite as smart, it's a better pair programmer. If I'm feeling out a new feature and am not sure how exactly it should work yet, I prefer to work with Sonnet 4.5 on it. It typically gives me more practical and realistic suggestions for my codebase. I've noticed that GPT-5 can jump right into very sophisticated solutions that, while correct, are probably not appropriate.
Sonnet 4.5: "Why don't we just poll at an interval with exponential backoff?"
GPT-5: "The correct solution is to include the data in the event stream...let us begin by refactoring the event system to support this..."
That said, if I do want to refactor the event system, I definitely want to use Codex for that.
NiloCK
2 hours ago
Even inside the claude-code ecosystem, more than ever there are tradeoffs on raw speed vs intelligence vs cost.
Moving a bunch of verbose templated HTML around while watching results on a devserver? Haiku all day. It's a bonus that it's cheaper, but the real treat is its speed.
Adding a feature whose planning will involve intake of several files? Sonnet.
Working specifically on 'copy' or taste issues? Still I tend to prefer Opus here.
Individual experiences may vary!
macNchz
15 hours ago
I frequently have multiple coding assistants going at once—Gemini 2.5 Pro via Aider as the workhorse for most standard changes, Sonnet 4.5 via Claude Code for question answering, documentation, test case development, or broad based changes to many files in a project, then GPT-5 for more complex diagnostic or architectural type things—I don’t generally like the code it writes, but it will often be able to fix situations where the other models get stuck in some kind of local maxima.
wrs
17 hours ago
In my experience, there isn’t a model that is more dependable for all purposes. They each have some unique strengths.
theshrike79
18 hours ago
I'm like 80% sure Sonnet 4.5 is just rebranded Opus.
Sonnet 4 was a coding companion, I could see what it was doing and it did what I asked.
Sonnet 4.5 is like Opus, it generates massive amounts of "helper scripts" and "bootstrap scripts" and all kinds of useless markdown documentation files even for the tinies PoC scripts.
esafak
18 hours ago
I don't. Sonnet is faster too.
mmaunder
18 hours ago
Yes. Sadly. And it really does make me sad. I was rooting for Anthropic. Still kinda am.
bgirard
18 hours ago
I have a very similar experience. I was heavily invested in Anthropic/Claude Code, and even after Sonnet 4.5, I'm finding that Codex is performing much better for my game development project.
mmaunder
16 hours ago
It seems particularly good at high performance programming in low level languages.
purnesh
16 hours ago
My experience is similar, but for me, Claude Code is still better when designing or developing a frontend page from scratch. I have seen that Codex follows instructions a bit too literally, and the result can feel a little cold.
CC on the other hand feels more creative and has mostly given better UI.
Of course, once the page is ready, I switch to Codex to build further.
blueside
14 hours ago
As we all know here, if the the title of this post was about Codex on the web, the top comment would have been about using Claude instead.
YMMV, but this definitely doesn't track with everything I've been seeing and hearing, which is that Codex is inferior to Claude on almost every measure.
dakom
4 hours ago
fwiw I'm happy to see this - been trying to tackle a hairy problem (rendering bugs) and both models fail, but:
1. Codex takes longer to fail and with less helpful feedback, but tends to at least not produce as many compiler errors 2. Claude fails faster and with more interesting back-and-forth, though tends to fail a bit harder
Neither of them are fixing the problems I want them to fix, so I prefer the faster iteration and back-and-forth so I can guide it better
So it's a bit surprising to me when so many people are pickign a "clear winner" that I prefer less atm
citizenpaul
15 hours ago
Does no one use Blocks Goose CLI anymore? I went to a hackathon in SF at the beginning of the year and it seemed like 90% of the groups used Goose to do something in their Agent project. I get that the CLI agent scene has exploded since then I just wonder what what is so much better in the competition?
poorman
17 hours ago
Totally agree. I was just thinking that I wouldn't want this feature for Claude Code but for Codex right now it would be great! I can simply let tasks run in Codex and I know it's going to eventually do what I want. Where as with Claude Code I feel like I have to watch it like a hawk and interrupt it when it goes off the rails.
013
6 hours ago
What are the usage limits for Codex compared to Claude Code?
catigula
16 hours ago
This is such an interesting perspective because I feel codex is hugely impressive but falls apart on any even remotely difficult task and is too autonomous and not eager enough.
Claude feels like a better fit for an experienced engineer. He's a positive, eager little fellow.
nadermx
15 hours ago
I'm just happy alternatives exist.
tstrimple
10 hours ago
> I've seen Claude Code literally just give up on a hard problem and tell me to buy something off the shelf
I've been seeing more of this lately despite initial excellent results. Not sure what's going on, but the value is certainly dropping for me. I'll have to check out codex. CLI integration is critical for me at this point. For me it is the only thing that actually helps realize the benefits of LLM models we have today. My last NixOS install was completely managed by Claude Code and it worked very well. This was the result of my latest frustrations:
https://i.imgur.com/C4nykhA.png
Though I know the statement it made isn't "true". I've had much better luck pursuing other implementation paths with CC in the same space. I could have prompted around this and should have reset the context much earlier but I was drunk "coding" at that point and drove it into a corner.
asdev
17 hours ago
do you use the CLI or the web UI? or both?
tonyhart7
8 hours ago
is is that better tho????
I thought claude code is still better in tool calling and something like that
dboreham
17 hours ago
This is going to be situation normal for 10 years: everyone will need to keep track of "model-du-jour" as each vendor makes incremental improvements.
mvkel
16 hours ago
This is why Anthropic is a zombie company.
They put all of their eggs in the coding basket, with the rest of their mission couched as "effective altruism," or "safetyism," or "solving alignment," (all terms they are more loudly attempting to distance themselves from[0], because it's venture kryptonite).
Meanwhile, all OpenAI had to do was point their training cannon at it for a run, and suddenly Anthropic is irrelevant. OpenAI's focus as a consumer company (and growing as a tool company) is a safe, venture-backable bet.
Frontier AI doesn't feel like a zero-sum game, but for now, if you're betting on AI at all, you can really only bet on OpenAI, like Tesla being a proxy for the entire EV industry.
[0] https://forum.effectivealtruism.org/posts/53Gc35vDLK2u5nBxP/...
F7F7F7
16 hours ago
For non-vibe coding purposes I've found that my $200 Claude (Claude Code) account regularly outperformed my $200 ChatGPT (Codex) account. This was after 2 months of heavily testing both mostly in Terminal TUI/CLI form and most recently with the latest VSCode/Cursor incarnations.
Even with the additional Sora usage and other bells & whistles that ChatGPT @ $200 provides, Claude provides more value for my use cases.
Claude Code is just a lot more comfortable being in your workflow and being a companion or going full 'agent(s)' and running for 30 minutes on one ticket. It's also a lot happier playing with Agents from other APIs.
There's nothing wrong with Anthropic wanting to completely own that segment and not have aspirations of world domination like OpenAI. I don't see how that's a negative.
If anything, the more ChatGPT becomes a 'everything app' the less likely I am to hold on to my $20 account after cancelling the $200 account. I'm finding the more it knows about me the more creeped out and "I didn't ask for this" I become.
mvkel
15 hours ago
> There's nothing wrong with Anthropic wanting to completely own that segment and not have aspirations of world domination
It's very clear by their actions (not words) that they are shooting for the moon in order to survive. There is no path to sustainability as a collection of dev tools.
fragmede
16 hours ago
Especially now that sama wants us to sext with ChatGPT