Topfi
10 hours ago
Very preliminary testing is very promising: Haiku 4.5 seems far more precise in its code changes than the GPT-5 models, which tend to pull code sections irrelevant to the task at hand into their changes, often making GPT-5 take longer than expected as a coding assistant. With that being the case, in actual day-to-day use Haiku 4.5 may be less expensive than the raw cost breakdown initially suggests, even though the price increase is significant.
Branding is the true issue Anthropic has, though. Haiku 4.5 may (not saying it is, far too early to tell) be roughly equivalent to Sonnet 4 in code output quality, which would serve a lot of users amazingly well, but between the connotations smaller models carry and the recent performance degradations that have made users more suspicious than before, getting them to adopt Haiku 4.5 over even Sonnet 4.5 will be challenging. I'd love to know whether Haiku 3, 3.5, and 4.5 are roughly in the same ballpark in terms of parameters, and of course, nerdy old me would like that to be public information for all models, but in fairness to the companies, many users would just go for the largest model thinking it serves every use case best. GPT-5 to me is still the most impressive because of its pricing relative to performance, and Haiku may end up similar, though with far less adoption. Everyone believes their task requires no less than Opus, it seems.
For reference:
Haiku 3: I $0.25/M, O $1.25/M
Haiku 4.5: I $1.00/M, O $5.00/M
GPT-5: I $1.25/M, O $10.00/M
GPT-5-mini: I $0.25/M, O $2.00/M
GPT-5-nano: I $0.05/M, O $0.40/M
GLM-4.6: I $0.60/M, O $2.20/M
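(For anyone who wants to play with these numbers, a minimal Python helper turning the rates above into per-request cost. The 20k-in/2k-out example turn is a made-up figure, not from my testing.)

    # $/M-token rates from the table above: (input, output)
    PRICES = {
        "haiku-3":    (0.25, 1.25),
        "haiku-4.5":  (1.00, 5.00),
        "gpt-5":      (1.25, 10.00),
        "gpt-5-mini": (0.25, 2.00),
        "gpt-5-nano": (0.05, 0.40),
        "glm-4.6":    (0.60, 2.20),
    }

    def request_cost(model: str, in_tok: int, out_tok: int) -> float:
        """USD cost of a single request at the listed rates."""
        in_rate, out_rate = PRICES[model]
        return (in_tok * in_rate + out_tok * out_rate) / 1_000_000

    # Hypothetical agentic coding turn: 20k tokens in, 2k out.
    for m in PRICES:
        print(f"{m:11s} ${request_cost(m, 20_000, 2_000):.4f}")

The point being: if Haiku 4.5 really does pull in fewer irrelevant code sections, the input side shrinks per task, which is where agentic loops burn most of their tokens, so the effective cost gap narrows.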
Topfi
8 hours ago
Update: Haiku 4.5 is not just very targeted in its changes but also really fast. Averaging 220 tokens/sec is almost double most other models I'd consider comparable (though again, far too early for a proper judgement), and if this can be kept up, that is a massive value add over other models. For context, that is nearly Gemini 2.5 Flash Lite speed.
Yes, Groq and Cerebras get up to 1000 tokens/sec, but not with models that seem comparable (again, early, not a proper judgement). Anthropic has historically been the most consistent at matching its public benchmark results on my personal benchmarks, for what that is worth, so I am optimistic.
If Anthropic can keep speed, performance, and pricing consistent long term (i.e. no regressions), Haiku 4.5 really is a great option for most coding tasks, with Sonnet something I'd tag in only for very specific scenarios. Past Claude models have had a deficiency in longer chains of tasks; beyond roughly 7 minutes, performance does appear to worsen with Sonnet 4.5, as an example. That could be an Achilles heel for Haiku 4.5 as well. If not, this really is a solid step in efficiency, but I have not done any longer task testing yet.
That being said, Anthropic once again seems to have a rather severe issue casting a shadow on this release. From what I am seeing and others are reporting, Claude Code currently counts Haiku 4.5 usage the same as Sonnet 4.5 usage, despite the latter being significantly more expensive. They also have not yet updated the Claude Code support pages to reflect the new model's usage limits [0]. I really think such information should be public by launch day, and I hope they can improve their tooling and overall testing; it continues to throw a shadow over their impressive models.
[0] https://support.claude.com/en/articles/11145838-using-claude...
qingcharles
2 hours ago
It's insanely fast. I didn't know it had even been released, but I went to select the Copilot SWE test model in VS Code, and it was missing and Haiku 4.5 was there instead. I asked for a huge change to a web app, and Haiku's output scrolled faster than Windows could keep up, from a cold start. It wrote a huge chunk of code in about 40 seconds. Unreal.
p.s. it also got the code 100% correct on the one-shot
p.p.s. Microsoft is pricing it at 30% of the cost of frontier models (e.g. Sonnet 4.5, GPT-5)
katchu11
5 hours ago
Hey! I work on the Claude Code team. Both PAYG and subscription usage look to be configured correctly in accordance with Haiku 4.5's pricing ($1/$5 per M input/output tokens).
Feel free to DM me your account info on Twitter (https://x.com/katchu11) and I can dig deeper!
peddling-brink
4 hours ago
lol, I don’t know if you work there or not, but directing folks to send their account info to a random Twitter address is not considered best practice.
ethbr1
an hour ago
Being charitable, let's assume parent wasn't talking about secrets.
rbitar
7 hours ago
Where do you get the 220 tokens/sec? Genuinely curious, as that would be very impressive for a model comparable to Sonnet 4. OpenRouter is currently publishing around 116 tps [1].
Topfi
7 hours ago
I was just about to post that Haiku 4.5 does something I have never encountered before [0]: there is a massive delta in tokens/sec depending on the query. Some variance, including task-specific variance, is of course nothing new, but never as pronounced and reproducible as here.
A few examples, prompted at UTC 21:30-23:00 via T3 Chat [0]:
Prompt 1 — 120.65 token/sec — https://t3.chat/share/tgqp1dr0la
Prompt 2 — 118.58 token/sec — https://t3.chat/share/86d93w093a
Prompt 3 — 203.20 token/sec — https://t3.chat/share/h39nct9fp5
Prompt 4 — 91.43 token/sec — https://t3.chat/share/mqu1edzffq
Prompt 5 — 167.66 token/sec — https://t3.chat/share/gingktrf2m
Prompt 6 — 161.51 token/sec — https://t3.chat/share/qg6uxkdgy0
Prompt 7 — 168.11 token/sec — https://t3.chat/share/qiutu67ebc
Prompt 8 — 203.68 token/sec — https://t3.chat/share/zziplhpw0d
Prompt 9 — 102.86 token/sec — https://t3.chat/share/s3hldh5nxs
Prompt 10 — 174.66 token/sec — https://t3.chat/share/dyyfyc458m
Prompt 11 — 199.07 token/sec — https://t3.chat/share/7t29sx87cd
Prompt 12 — 82.13 token/sec — https://t3.chat/share/5ati3nvvdx
Prompt 13 — 94.96 token/sec — https://t3.chat/share/q3ig7k117z
Prompt 14 — 190.02 token/sec — https://t3.chat/share/hp5kjeujy7
Prompt 15 — 190.16 token/sec — https://t3.chat/share/77vs6yxcfa
Prompt 16 — 92.45 token/sec — https://t3.chat/share/i0qrsvp29i
Prompt 17 — 190.26 token/sec — https://t3.chat/share/berx0aq3qo
Prompt 18 — 187.31 token/sec — https://t3.chat/share/0wyuk0zzfc
Prompt 19 — 204.31 token/sec — https://t3.chat/share/6vuawveaqu
Prompt 20 — 135.55 token/sec — https://t3.chat/share/b0a11i4gfq
Prompt 21 — 208.97 token/sec — https://t3.chat/share/al54aha9zk
Prompt 22 — 188.07 token/sec — https://t3.chat/share/wu3k8q67qc
Prompt 23 — 198.17 token/sec — https://t3.chat/share/0bt1qrynve
Prompt 24 — 196.25 token/sec — https://t3.chat/share/nhnmp0hlc5
Prompt 25 — 185.09 token/sec — https://t3.chat/share/ifh6j4d8t5
I ran each prompt three times and got the same tokens/sec results for each respective prompt within expected variance (less than ±5%). Each run used Claude Haiku 4.5 with "High reasoning". I will continue testing, but this is beyond odd. I will add that my very early evals leaned heavily into pure code output, where 200 tokens/sec is consistently possible at the moment, but it is certainly not the average as I claimed before; there I was mistaken. That said, even across a wider range of challenges, we are above 160 tokens/sec, and if you focus solely on coding, whether Rust or React, Haiku 4.5 is very swift.
[0] I normally don't use T3 Chat for evals; it is just easier to share prompts this way, though I was disappointed to find that the model information (tokens/sec, TTF, etc.) can't be enabled without an account. Also, these aren't the prompts I usually use for evals. Those I try to keep somewhat out of training data by only benchmarking through the paid API. As anything on Hacker News is most assuredly part of model training, I wrote some quick and dirty prompts to highlight what I have been seeing.
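(If anyone wants to reproduce this outside T3 Chat, here is a minimal sketch using the Anthropic Python SDK. The model ID and prompt are placeholders, and tokens/sec is computed as output tokens over total wall-clock stream time, so it includes time-to-first-token and won't exactly match T3 Chat's figure.)

    import time
    from anthropic import Anthropic

    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def tokens_per_sec(prompt: str, model: str = "claude-haiku-4-5") -> float:
        start = time.monotonic()
        with client.messages.stream(
            model=model,
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}],
        ) as stream:
            for _ in stream.text_stream:  # drain the stream
                pass
            usage = stream.get_final_message().usage
        return usage.output_tokens / (time.monotonic() - start)

    print(f"{tokens_per_sec('Write a quicksort in Rust.'):.1f} tok/s")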
rbitar
4 hours ago
Interesting. If they are using speculative decoding, that variance would make sense. Also, your numbers line up with what OpenRouter is now publishing: 169.1 tps [1].
Anthropic mentions this model is more than twice as fast as Claude Sonnet 4 [2], which OpenRouter averaged at 61.72 tps [3]. If these numbers hold (169.1 / 61.72 ≈ 2.7), we're really looking at almost a 3x improvement in throughput and less than half the initial latency.
[1] https://openrouter.ai/anthropic/claude-haiku-4.5
[2] https://www.anthropic.com/news/claude-haiku-4-5
[3] https://openrouter.ai/anthropic/claude-sonnet-4
cromulen
6 hours ago
That's what you get when you use speculative decoding and focus/overfit the draft model on coding. When an answer is out of distribution for the draft model, the main model rejects more of its tokens and throughput suffers. This probably still makes sense for them if they expect a lot of their load to come from Claude Code and they need to make it economical.
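(For anyone unfamiliar with the mechanism: a small draft model cheaply proposes a run of tokens and the big model verifies them in a single pass, keeping the longest prefix it agrees with. Below is a toy greedy-verification sketch; real systems accept/reject probabilistically against full token distributions, and the toy "models" here are obviously stand-ins, not anything about Anthropic's actual stack.)

    def speculative_step(draft, target, ctx, k=4):
        """One decode step: draft proposes k tokens, target keeps the
        longest agreeing prefix. draft/target map a context tuple to
        the next token (stand-ins for argmax over real model logits)."""
        proposed = []
        for _ in range(k):
            proposed.append(draft(tuple(ctx + proposed)))
        accepted = []
        for tok in proposed:
            if target(tuple(ctx + accepted)) == tok:
                accepted.append(tok)  # draft guessed right: cheap token
            else:
                # First disagreement: emit the target's token instead
                # and discard the rest of the draft.
                accepted.append(target(tuple(ctx + accepted)))
                break
        else:
            # All k accepted: the target emits one bonus token for free.
            accepted.append(target(tuple(ctx + accepted)))
        return accepted

    # Draft overfit to "code-like" contexts: agrees with the target early,
    # diverges later, so output collapses from k+1 tokens/step toward 1.
    draft  = lambda c: len(c) % 3
    target = lambda c: len(c) % 3 if len(c) < 8 else len(c) % 5
    print(speculative_step(draft, target, [0, 1, 2]))  # 5 tokens emitted
    print(speculative_step(draft, target, [0] * 10))   # 1 token emitted

If most traffic is Claude Code-style requests, the draft stays in distribution and you see the ~200 tok/s runs; prose or odd queries fall back toward the main model's native speed, which would explain the reproducible per-prompt split Topfi measured.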
larodi
9 hours ago
Been waiting for the Haiku update, as I still do a lot of dumb work with the old one, and it is darn cheap for what you get out of it with smart prompting. Very neat they finally released this; updating all my bots... sorry, agents :)
deadbabe
10 hours ago
Those numbers don’t mean anything without average token usage stats.
distalx
37 minutes ago
Exactly. Per-token rates are useful, but without knowing the typical input/output token distribution for each model on this specific task, the numbers alone don’t give a full picture of cost.
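(A tiny worked example with made-up token counts, using the rates from Topfi's table upthread: a model that is cheaper per token can still lose per task once the input/output mix is factored in.)

    # $/M-token rates from upthread: (input, output)
    HAIKU_45  = (1.00, 5.00)
    GPT5_MINI = (0.25, 2.00)

    def cost(rates, in_tok, out_tok):
        return (in_tok * rates[0] + out_tok * rates[1]) / 1e6

    # Hypothetical: the pricier-per-token model reads and writes
    # fewer tokens because its edits are more targeted.
    print(cost(HAIKU_45,   30_000, 1_500))  # $0.0375
    print(cost(GPT5_MINI, 120_000, 6_000))  # $0.0420

Flip the token counts and the ranking flips too; the rate table alone can't settle it.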