Aurornis
14 hours ago
This is their hosted-only model, not an open weight model like they’ve become known for. They got a lot of good publicity for their open weight model releases, which was the goal. The hard part is pivoting from an open weight provider to being considered as a competitor to Claude and ChatGPT. Initial reactions are mostly anger from everyone who didn’t realize that the play all along was to give away the smaller models as advertising, not because they were feeling generous.
Comparing to Opus 4.5 instead of the current 4.6 and other last-gen models is clearly an attempt to deceive, which isn’t winning them any points either.
I think there is a moderately large market for models like this that aren’t quite SOTA level but can be served up much cheaper. I don’t know how successful they’ll be in the race to the bottom in this market niche, though. Most users of cheap API tokens are not loyal to any brand and will change providers overnight each time someone releases a slightly better model.
zozbot234
13 hours ago
> not an open weight model like they’ve become known for.
Right, they state that they'll release "smaller" variants openly at some point, with few details as to what that means. Will there be a ~300B variant as with Qwen 3.5? The blog post doesn't say.
dietr1ch
8 hours ago
I wish they had a revenue goal that would trigger an open release; that way, spending money on them would contribute to better open models in the long run.
This is how I see the public funding something and eventually getting it for free, just like properly organized private highways end up with the state/society owning a new highway once the private entity that built it has made the profits it required to make the project possible.
drob518
7 hours ago
As a publicity stunt, releasing a 300B open model is pretty smart. You can talk about its strong performance and it being “open” and “available,” but it’s so large that most people can’t use it themselves and might try out the cloud-based offering.
mogili1
2 hours ago
There are plenty of model providers that can serve them though at cheaper prices and cannibalize Alibaba revenue.
zozbot234
7 hours ago
The large models are actually MoE these days so they're usable on ordinary hardware with weights streaming from SSD, just very slow. You're nonetheless right that it makes the cloud-based offering more popular, since you can use that for convenience after testing a few inferences locally.
echelon
10 hours ago
I'm not interested in adopting an inferior closed source weight from a geopolitical rival. The open source weights argument was the one thing China had going and that I was seriously cheering them on for. They could have been our saviors and disrupted the US tech giants - and if it was open, I'd have welcomed it.
Now they show their true colors. They want to train models on our engineering to replace us, while simultaneously giving nothing back? No thanks. I'd rather fund the shitty US hyperscalers. At least that leads to jobs here.
If there's a company willing to develop and foster large scale weights in the open, I'll adopt their tooling 100%. It doesn't matter if they're a year behind. Just do it open and build an entire ecosystem on top of it.
The re-AOLization of the internet into thin clients is bullshit, and all it takes is one player to buck the rules to topple the whole house of cards.
Zetaphor
9 hours ago
> I'm not interested in adopting an inferior closed source weight from a geopolitical rival. The open source weights argument was the one thing China had going and that I was seriously cheering them on for. They could have been our saviors and disrupted the US tech giants - and if it was open, I'd have welcomed it.
Qwen is not the only Chinese lab, and the others have shown no change in their commitment to open source. Allegedly Qwen hasn't either if their recent statements are to be believed. They're just hoping to capture market share with *-claw customers before releasing an open weights version. We'll have to wait and see how long before they decide to release that.
zozbot234
8 hours ago
> the others have shown no change in their commitment to open source
I wouldn't call this totally accurate, especially as of late. What's closer to the truth however is that there's lots of second-rate players in China doing open models, that will be getting a lot more attention from local AI proponents if the big names seriously slow down their AI releases. The local AI scene as a whole is quite healthy.
zozbot234
10 hours ago
> I'm not interested in adopting an inferior closed weights model from a geopolitical rival.
That's a very reasonable stance. It doesn't change the fact that we do have plenty of local models (up to and including Qwen 3.5) that are still quite useful.
jhancock
an hour ago
z.ai models are open weights. GLM-5.1 is very close to Opus with the obvious exception of session length.
Only academic models will be true open source as companies can't legally afford to disclose learning inputs.
In regards to "They want to train models on our engineering to replace us". Some software engineers in China can run circles around some of the best teams in Silicon Valley. Days of U.S. hegemony are over. I recommend you make peace and make friends.
evilduck
9 hours ago
This is not even the first closed weights Qwen model.
benatkin
7 hours ago
> I'm not interested in adopting an inferior closed source weight from a geopolitical rival.
I'm USian myself, but I don't think the site should be very US-centric.
cmrdporcupine
10 hours ago
Whereas I, as a Canadian, am absolutely eager to see a serious competitor from a rival to the US, because the alternative is sending money south to Anthropic and OpenAI, who think it's ok to spy on (or worse) their non-American customers and are headquartered in a country that is trying to crush my country's economy, interfere in our domestic politics, put us out of work, and is making threats against political allies.
I'd prefer them to be open weight, but I'd love to sub a decent competitive coding plan from a European or Chinese provider. Right now they're not quite there. If closing it and charging for it brings them closer to competitive, that's ok.
If the US tech and AI industry long term wants customers and a broad market outside of their own domestic base, they need to reconsider who they are bending the knee to, and how they are defining their policies in relation to the Trump administration.
Bring on the Chinese competition.
zozbot234
9 hours ago
China (meaning the Chinese government specifically, not the people of course) is widely considered to be a low-key geopolitical rival to the developed West in general including Canada and Europe, not just the U.S. I don't exactly like this and would certainly prefer that this wasn't the case, but we can't exactly ignore the facts. This matters when we choose whom to rely on for things like certain hosted third-party services, including AI inference. GP's stance actually makes a lot of sense from this POV, even though it's just as true that many Chinese folks are doing wonderful work on open-weight local AI.
Filligree
9 hours ago
China has never threatened war against my country; America has. Between the two, it’s clearly safer to lean towards the Chinese options if EU ones aren’t available.
andsoitis
7 hours ago
That’s incredibly naïve.
3acctforcom
6 hours ago
Meh, people have their own interests and values. And you can't force people to spend money no matter how much you may disagree with them
Bring on the Chinese, fuck the Americans.
whimblepop
5 hours ago
More naive than blithely blowing off threats of war?
NonHyloMorph
7 hours ago
How so?
jhancock
an hour ago
I've been using z.ai and codex latest models since last September. Each release has been an improvement.
codex handles longer sessions but the quality seems to decline and it tends to over engineer and lose focus. It will happily add slop on top of slop...which may pass immediate tests of "code works" but doesn't pass my criteria of "code as craft"
I'm using z.ai GLM with opencode. It's obvious when GLM loses its mind when the session gets too long.
I've been using AI to support programming for around 3 years now. The models have gotten amazing. However, unless there is a significant breakthrough I have determined that it's best for me to focus on short sessions.
I a) organize my work, b) improve my AGENTS.md and ensure the source has appropriate comments to guide the models to the patterns and separation of concerns, c) use shorter sessions, and d) review and test without AI. This approach means I still own my code. The AI is just an assistant.
With this approach GLM-5.1 is an excellent model. I never run out of token allotment on z.ai or codex plans. At this point, I only keep my OpenAI subscription as the ChatGPT desktop app is excellent at long web research tasks and I get codex with it.
mikrotikker
4 hours ago
You're giving up the rest of your country to a geopolitical rival from a separate region, in a separate hemisphere with smiling expansionist goals, even allowing armed Chinese security to protect Chinese installations in country. So why not give the rest of your country to China.
It will help them get a good flank on the USA such that even when that temporarily embarrassed country gets a leader you and the rest of the world do like, it will be too late to do anything.
A perfect definition of cutting off your nose to spite your face laid bare for all to see.
zwaps
4 hours ago
The US under Trump is politically and strategically almost identical to China, and can be trusted about the same.
And then, compared to China, the US acts overtly hostile: threatening us with war, starting a war in order to collapse energy supplies outside of the US. Opportunistic beyond even China, much more hostile.
Will the US even be a democracy in two years? Is it now?
Nah man, balancing between China and the US is the only thing a smaller country can do in order not to be crushed
cmrdporcupine
3 hours ago
"Temporarily embarrassed" doesn't even begin to describe what's happening down there.
We have an American neighbour actively funding and amplifying a formerly extremely fringe separatist movement in Alberta -- shades of the Donbas, North American edition -- and a US "ambassador" who has the behaviour of a 4chan troll.
The bridge has been blown up. Americans might think they are a midterm election away from salvation, but we're on the whole not so naive.
miki123211
12 hours ago
Ah, so that explains the recent wave of Qwen team-member departures.
dhfs
9 hours ago
Interesting! What is your reasoning behind that? I just learned there were closed models from the team before this, so that shouldn’t have been a surprise for the employees? Or do you think the internal communication was: we will release better open models than the existing closed ones to push everything forward, and now that they are getting competitive they are becoming proprietary?
jimbokun
10 hours ago
I’m starting to wonder where the moat is for any of these models.
Sure they are not cheap to train. But if open weight models continue to be trained and continue to become available on cheaper hardware, how do dedicated AI companies protect their margins?
nwienert
9 hours ago
4.5 is better than 4.6 though in practice. 4.6 was purely a cost savings change with enough benchmark gamification to look better.
girvo
6 hours ago
Exactly. 3.6 plus in the exact same coding agent harness is notably worse in all of my testing compared to 3.5 plus.
The former gets stuck in ridiculous thought loops on the exact same tasks I’m testing. Fascinating really, I expected more for some reason.
true_religion
12 hours ago
Opus was released in Feb 2026. Even though it feels like a long 2 months has passed, it's not really clear that they were developing this as a competitor to that product.
There's nothing really strange about not competing directly with the best, but rather showing whom you are as good as.
Aurornis
10 hours ago
I don’t know why anyone would do the mental backflips to defend this.
They posted charts with logos for Claude and others. You had to read the fine details to realize they weren’t comparing to the latest offerings from those companies. They were counting on you not noticing.
There’s zero reason to compare to old models unless you’re trying to mislead.
epolanski
6 hours ago
I use different models in production, and a model's "personality" (its tendency to not go off script, not consume gazillions of tokens recursively, follow instructions, etc.) is more relevant than "brute" power, which is okayish as a metric for agentic coding on generous token plans.
Chinese models are very competitive in that regard; you're often looking at a 70-90% price reduction at the same quality.
cyanydeez
6 hours ago
The business model, however, is lobster in a bucket. Any model that starts gaining as a private model will have competitors release comparable open models, because those locked-in customers will not switch unless you demo the capabilities.
So expect an open model burp every now and then from the trailing frontiers. After all, it's all sunk cost, so once you have it and no customers, there's zero reason not to spike your competition and try again or exit.
cubefox
14 hours ago
> I think there is a moderately large market for models like this that aren’t quite SOTA level but can be served up much cheaper.
There isn't, pretty much everyone wants the best of the best.
PhilippGille
13 hours ago
The OpenRouter usage stats indicate the opposite: https://openrouter.ai/rankings?view=month
jjice
13 hours ago
OpenRouter usage is likely skewed towards LLMs that are more niche and/or self-hostable on solid hardware that's available but that most consumers don't have on hand. I can imagine Anthropic and OpenAI LLMs often get called directly from their APIs instead.
At least from my experience and friends of mine, we use OpenRouter for cases where we want to use smaller LLMs like Qwen, but when I've used ChatGPT and Claude, I use those APIs directly.
elbear
10 hours ago
I use ChatGPT and Claude on OpenRouter, because it's just easier than buying credits on each platform separately.
senordevnyc
12 hours ago
Same, and my little SaaS is pushing more than 0.1% of the TOTAL volume of tokens on OpenRouter, so the reality is they’re TINY.
vorticalbox
12 hours ago
what happened around jan this year(26) that caused such a climb in usage?
wcallahan
11 hours ago
Openclaw
thraxil
12 hours ago
No. Right now I'm upset that Google has removed (or at least is in the process of removing) the Gemini 2.0 flash model. We use it for some pretty basic functionality because it's cheap and fast and honestly good enough for what we use it for in that part of our app. We're being forced to "upgrade" to models that are at least 2.5 times as expensive, are slower and, while I'm sure they're better for complex tasks, don't do measurably better than 2.0 flash for what we need. Yay. We've stuck with the GCP/Gemini ecosystem up until now, but this is kind of forcing us to consider other LLM providers.
toofy
11 hours ago
this is one of the reasons im hearing more and more people are using open/locally hosted models. particularly so we dont have to waste time to entirely redo everything when inevitably a company decides to pull the rug out from under us and change or remove something integral to our flow, which over the years we've seen countless times, and seems to be getting more and more common.
products entirely disappearing or significantly changing will be more and more common in the llm arena as things move forward towards companies shutting down, bubbles deflating, brand priorities drastically reshifting, etc...
i think, we're at or at least close to a time to really put some thought into which pieces of your flow could be done entirely with an open/local model and be honest with ourselves on which pieces of our flow truly needs sota or closed models that may entirely disappear or change. in the long run, putting a little bit of thought into this now will save a lot of headache later.
thraxil
8 hours ago
Yeah. Back when Gemma2 came out we benchmarked it and were looking at open models. For our use case though, while the tasks are pretty simple, we do need a pretty large context window and Gemini had a big lead there over the open models for quite a while. I'll probably be evaluating the current batch of open models in the near future though.
jimbokun
10 hours ago
What’s interesting about this is that for previous technologies you could define a standard and demonstrate compliance with interfaces and behavior.
But with LLMs, how do you know switching from one to another won’t change some behavior your system was implicitly relying on?
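One partial answer is to write the implicit expectations down as checks and run them against both models before switching. The following is only a minimal sketch of that idea: `old_stub` and `new_stub` are hypothetical stand-ins for real model calls, and the two checks are invented examples of behaviour a system might quietly depend on.

```python
from typing import Callable, List, Tuple

ModelFn = Callable[[str], str]      # prompt -> model output
Check = Callable[[str], bool]       # output -> does it meet the expectation?
Case = Tuple[str, str, Check]       # (case name, prompt, check)


def behaviour_regressions(old_model: ModelFn, new_model: ModelFn,
                          cases: List[Case]) -> List[str]:
    """Return names of cases the old model passes but the new model fails."""
    regressions = []
    for name, prompt, check in cases:
        if check(old_model(prompt)) and not check(new_model(prompt)):
            regressions.append(name)
    return regressions


# Hypothetical stand-ins; real code would call the two providers here.
def old_stub(prompt: str) -> str:
    return "POSITIVE"


def new_stub(prompt: str) -> str:
    return "The sentiment is positive."


CASES: List[Case] = [
    # Downstream code parses a bare label, so the exact format matters.
    ("bare_label", "Classify: great product",
     lambda out: out.strip() in {"POSITIVE", "NEGATIVE"}),
    # A looser expectation that both stubs satisfy.
    ("mentions_polarity", "Classify: great product",
     lambda out: "positive" in out.lower()),
]

regressions = behaviour_regressions(old_stub, new_stub, CASES)
print(regressions)
```

This only catches behaviour someone thought to encode as a check, so it narrows the problem rather than solving it.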
elbear
10 hours ago
In case you don't know, Gemini 2.5 flash is hosted on DeepInfra. They also have 1.5 flash but not 2.0 flash.
I have no affiliation with DeepInfra. I use them, because they host open-source models that are good.
thraxil
9 hours ago
Thanks. Yeah, for now we're moving to 3.1 flash lite as that's the new cheapest at $.25/1M and is also still "good enough". 2.5 flash is more expensive at $.30/1M (looks like Deep Infra charges the same as GCP/VertexAI for it). I might check them out for Gemma though. We benchmarked Gemma2 when that came out and it wasn't remotely usable for us largely because the context window was way too small. It looks like 3 or 4 might be worth evaluating though.
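To put the two quoted prices side by side, here is a rough sketch of the arithmetic. The monthly token volume is a made-up figure for illustration, and the calculation assumes a single flat per-token rate, ignoring any input/output price split.

```python
# Prices quoted in the comment above, in USD per 1M tokens.
USD_PER_MILLION = {
    "3.1 flash lite": 0.25,
    "2.5 flash": 0.30,
}

# Hypothetical monthly volume, chosen only for illustration.
MONTHLY_TOKENS = 500_000_000


def monthly_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD for a token volume at a flat per-million rate."""
    return tokens / 1_000_000 * usd_per_million


costs = {model: monthly_cost(MONTHLY_TOKENS, price)
         for model, price in USD_PER_MILLION.items()}
cheapest = min(costs, key=costs.get)

for model, cost in costs.items():
    print(f"{model}: ${cost:,.2f}/month")
print(f"cheapest: {cheapest}")
```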
nl
an hour ago
Xiaomi's mimo-v2-flash is great if you care about speed and performance - it's 1/10 the price of Gemini 3.1 Flash Lite and faster (on OpenRouter).
GCP does serve other non-Google models, but I'm not sure what they have other than Anthropic models. I don't think Haiku is a great model though.
Someone1234
13 hours ago
> There isn't, pretty much everyone wants the best of the best.
For direct user interaction or coding problems, perhaps. But as API calls get cheaper, it becomes more realistic to use them for completely automated workflows against data-sets, or as sub-agents called from expensive SOTA models.
For example, in Claude, using Opus as an orchestrator to call Sonnet sub-agents, is a popular usage "hack." That only gets more powerful, as the Sonnet equivalent model gets cheaper. Now you can spawn entire teams of small specialized sub-agents with small context windows but limited scope.
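A toy sketch of why the delegation pattern above keeps the orchestrator's context small: each sub-agent reads the raw material, and only its short summary flows back. Everything here is hypothetical, `stub_worker` stands in for a call to a cheaper model, and word counts are used as a crude proxy for tokens.

```python
from typing import Callable, List, Tuple

# A worker takes (task, documents) and returns a short summary string.
Worker = Callable[[str, List[str]], str]


def word_count(text: str) -> int:
    """Crude stand-in for a token count."""
    return len(text.split())


def orchestrate(jobs: List[Tuple[str, List[str]]], worker: Worker) -> List[str]:
    """Delegate each job to a worker and keep only its summary.

    The raw documents are passed to the worker and never added to the
    orchestrator's own context.
    """
    context: List[str] = []
    for task, documents in jobs:
        summary = worker(task, documents)
        context.append(f"{task}: {summary}")
    return context


def stub_worker(task: str, documents: List[str]) -> str:
    """Hypothetical cheap sub-agent; a real one would call a smaller model."""
    return f"reviewed {len(documents)} documents"


JOBS = [
    ("summarise logs", ["error " * 200, "warning " * 300]),
    ("scan changelog", ["entry " * 500]),
]

context = orchestrate(JOBS, stub_worker)
raw_words = sum(word_count(doc) for _, docs in JOBS for doc in docs)
context_words = sum(word_count(line) for line in context)
print(f"raw input: {raw_words} words, orchestrator context: {context_words} words")
```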
alexsmirnov
13 hours ago
Exactly.
I did create my own MCP with custom agents that combine several tools into a single one. For example, all WebSearch, WebFetch, Context7 exposed as a single "web research" tool, backed by the cheapest model that passes evaluation. The same goes for codebase research.
Using it with both Claude and Opencode saves a lot of time and tokens.
hadlock
7 hours ago
I'd be interested in seeing the source for this if you have a moment
thinkcontext
13 hours ago
> But as API calls get cheaper, it becomes more realistic to use them for completely automated workflows against data-sets
Seems like a huge waste of money and electricity for processes that can be implemented as a traditional deterministic program. One would hope that tools would identify recurrent jobs that can be turned into simple scripts.
Someone1234
11 hours ago
It depends on the specific task.
For example: "Here is our dataset that contains customer feedback comment fields; look through them, draw out themes, associations, and look for trends." Solving that with a deterministic program isn't a trivial problem, and it is likely cheaper solved via LLM.
jimbokun
10 hours ago
That is a very complex, high level use case that takes time to configure and orchestrate.
There are many simpler tasks that would work fine with a simpler, local model.
freehorse
7 hours ago
Not all tasks require models like opus. If they do not, then it is more efficient to use cheaper and faster models. For most of my tasks now I use the big kimi/qwen/glm models because they are cheap and good enough, if not even the smaller local ones.
I would say that for a significant part of the current market open-source models are good enough to fill a part of it.
joefourier
13 hours ago
Ever hit your daily limit on Claude Code and seen how expensive it is to pay per token?
girvo
5 hours ago
All the time now… it’s wild how little usage you get with Opus on the Pro sub now haha
sidrag22
13 hours ago
maybe there isnt, but as understanding grows people will understand that having an orchestration agent delegate simple work to lesser agents is significant not only for cost savings, but also for preserving context window space.
scoopdewoop
14 hours ago
That isn't true. In a Codex or Claude Code instance, sure... but those are not the main users of APIs. If you are using LLMs in a service for customers, costs matter.
wongarsu
13 hours ago
For coding I want the best. Both I and $work do lots of things besides coding where smaller models like qwen3.5-27b work great, at much lower cost.
Aurornis
14 hours ago
The market for API tokens is bigger than people like you and me (who also want the best) using them for code.
There are a lot of data science problems that benefit from running the dataset through an LLM, which becomes bottlenecked on per-token costs. For these you take a sample subset and run it against multiple providers and then do a cost versus accuracy tradeoff.
The market for API tokens is not just people using OpenCode and similar tools.
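The sample-then-compare approach described above might look roughly like this. It is a sketch under stated assumptions: the provider names, accuracies, and prices are invented, and accuracy is assumed to have already been measured on a labelled sample.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProviderResult:
    name: str                       # hypothetical provider/model label
    accuracy: float                 # fraction correct on the labelled sample
    usd_per_million_tokens: float   # assumed flat price


def projected_cost(result: ProviderResult, total_tokens: int) -> float:
    """Cost in USD of running the full dataset through this provider."""
    return result.usd_per_million_tokens * total_tokens / 1_000_000


def cheapest_meeting_target(results: List[ProviderResult],
                            min_accuracy: float) -> Optional[ProviderResult]:
    """Pick the lowest-priced provider whose sample accuracy is acceptable."""
    eligible = [r for r in results if r.accuracy >= min_accuracy]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.usd_per_million_tokens)


# Invented numbers, standing in for results measured on a sample subset.
SAMPLE_RESULTS = [
    ProviderResult("frontier-model", 0.95, 15.00),
    ProviderResult("mid-tier-model", 0.91, 3.00),
    ProviderResult("budget-model", 0.84, 0.30),
]

choice = cheapest_meeting_target(SAMPLE_RESULTS, min_accuracy=0.90)
if choice is not None:
    full_run = projected_cost(choice, total_tokens=2_000_000_000)
    print(f"{choice.name}: ${full_run:,.2f} for the full dataset")
```

Picking the accuracy floor first and then minimising price is one design choice among several; a weighted score of cost and accuracy would be another.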
wolttam
13 hours ago
Nope. I get very good results from GLM 5 and 5.1. I’m not working on anything so complex and groundbreaking that I need the best.
Coding is a rung on the ladder of model capability. Frontier models will grow to take on more capabilities, while smaller more focused models start becoming the economical choice for coding
girvo
5 hours ago
GLM-5 is surprisingly good to be fair. Punches well above its weight IMO
regularfry
13 hours ago
Everyone may want the best, but the amount of AI-addressable work outstrips the budget available for buying the best by quite a wide margin.
noman-land
13 hours ago
OpenCode allows for free inference tho.
wolvoleo
12 hours ago
Not really. It depends on the usecase. For private stuff I'm very happy to take what was SOTA a year or 2 ago if I can have it all running in my home and don't have to share any of my data with some sleazy big tech cloud.
The price is a concern too of course. But privacy is a bigger one for me. I absolutely don't trust any of their promises not to use data for training purposes.
esafak
13 hours ago
That's only because current models don't saturate people's needs. Once they are fast and smart enough people will pick cheaper ones.
jstummbillig
12 hours ago
> Initial reactions are mostly anger from everyone who didn’t realize that the play along was to give away the smaller models as advertising, not because they were feeling generous.
The naivety around this has been staggering quite frankly. All of a sudden, people thinking that meta etc are releasing free models because they believe in open access and distribution of knowledge. No, they just suck comparatively. There is nothing to sell. Using it to recruit and generate attention is the best play for them.
miki123211
11 hours ago
I thought Qwen was releasing open-weight because China can't compete with America (because of people's privacy concerns), so the only thing they could do is salt the ground economically with open models, and make sure everybody loses.
Apparently that wasn't actually the play here.
zozbot234
11 hours ago
Qwen is actually a pretty strong player in the Chinese market. There is an implied "salt the ground" play but it's mostly from hardware makers, who are trying to keep the big AI players honest and also stand to gain if local inference becomes popular.
Gracana
12 hours ago
I don't think there's so much naivety. People can be aware of the plan and still be frustrated and disappointed when it happens.
GorbachevyChase
7 hours ago
I’m waiting for the grand reversal where Anthropic abuses the Qwen API to train the next Haiku.
Aurornis
10 hours ago
For a brief moment there were a lot of comments about how Chinese tech companies are our saviors in the age of AI because they were releasing their models. It was an edgy contrarian take that was getting a lot of traction, mostly from commenters who were unfamiliar with Alibaba and thought it was the anti-Big-tech.
zozbot234
12 hours ago
I'm not frustrated or disappointed, we have lots of models from Qwen already. We haven't really lost anything. And plenty of players only release "smaller" models anyway, so it's hardly unprecedented.
dev_l1x_be
13 hours ago
How stupid does somebody have to be to mix up Opus with Qwen?
cieplok
12 hours ago
OP wasn't talking about people confusing Opus with Qwen, but rather about people being confused that Qwen3.6-Plus is not available as an "open weight" model for self hosting.