kamranjon
2 days ago
DeepSeek continues to not only push the boundaries but also publish these incredible papers explaining how they achieved their gains - something the American labs no longer do unfortunately. Chinese labs are doing the most interesting work in AI right now.
sigmar
2 days ago
>publish these incredible papers explaining how they achieved their gains - something the American labs no longer do unfortunately.
Google is still releasing a lot of LLM architecture research. They introduced speculative decoding of LLMs in 2022[1], then released the code to perform speculative decoding for their Gemma 4 model this year[2].
[1] https://arxiv.org/abs/2211.17192
[2] https://github.com/google-gemma/cookbook/blob/main/docs/mtp/...
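For anyone unfamiliar with the idea behind [1], here is a minimal sketch of the greedy variant of speculative decoding: a cheap draft model proposes k tokens, the target model checks them, and the longest agreeing prefix is kept. Everything named here (the toy models, `speculative_decode`, `k`) is made up for illustration. The paper's actual scheme uses a probabilistic accept/reject rule over sampled tokens, and a real engine verifies the whole draft in one batched forward pass rather than a Python loop.

```python
from typing import Callable, List

# A "model" here is just a greedy next-token function over integer tokens.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; output matches greedy_decode(target, ...)."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position in order.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the target adds one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:limit]


# Hypothetical toy models: the target counts upward mod 10,
# the draft agrees except right after token 4.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

The point of the design is that the draft can be wrong without changing the output: every kept token is one the target would have produced anyway, so a bad draft only costs speed.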
kamranjon
2 days ago
Thanks for the clarification - Google does publish more than others - and I actually really appreciate the work they are doing with the Gemma models, which are truly competitive open models. I do wish they’d publish more in-depth papers on their Gemma models but appreciate that they are open weights.
DiabloD3
2 days ago
They weren't the first to do MTP like this, and arguably did it wrong: the MTP heads are kept in a separate file and have to be welded in by the inference engine.
Qwen 3.6 shipped with working MTP first, and had working MTP in llama.cpp first.
spijdar
2 days ago
Given the MTP drafter is basically a separate model, keeping it separate makes more sense IMO. It's out of my wheelhouse but it seems like you could adjust the MTP drafter model separately from the main model, too.
Ultimately though the real explanation, I think, is Google doesn't care since for their own purposes (in LiteRT-LM), they do bundle them. As far as I know, anyway.
DiabloD3
2 days ago
MTP models share internal state with the main model, and also refer to parameters in the model.
They are more like a single model that has two separate attention head mechanisms.
girvo
2 days ago
Being grafted onto the main model reduces layer duplication that you’d otherwise have: at least for Step and Qwen 3.6
alfiedotwtf
a day ago
Step 2.7’s MTP seems broken (at least for ik_llama.cpp) where the draft model starts and ends in block 3 but ik_llama bails out looking for block 0 :(
girvo
a day ago
Aw that’s a shame; I’m running the official llama.cpp on my Spark-alike, and it works great now. Proper triple head too which is what it is trained on, gets me up to 35-40tk/s decode
anaisbetts
2 days ago
I mean, just like GGUFs aren't technically necessary yet are _way_ more convenient than using Safetensors and configuring the default Jinja prompt by hand, it makes sense to bundle the draft model too. For all intents and purposes, the only people who will train a draft model are the people who train the original model.
kcb
2 days ago
Nvidia's Nemotron 3 Super also shipped with MTP.
janalsncm
2 days ago
They also shipped Gemma models with their new Matformer architecture which allows for dynamic computation.
tomalaci
2 days ago
Probably because American AI companies are on the hook for quite a lot of investment money. I think they are trying to find the magical moat to justify their valuation.
Revealing optimizations similar to these would pretty much reduce their competitive position.
lwansbrough
2 days ago
Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.
I suspect their tune will change if they ever take the lead.
c7b
2 days ago
The question is also what game they're playing. DeepSeek came out of a hedge fund. I think it's no coincidence that their publications tend to have a large impact on AI stock prices.
Destroying the growth story of overvalued stocks is an interesting investment strategy. It's not even new. Short sellers understandably get a terrible rep from execs, but their actions are more often in the public interest than you'd think. Normally it's exposing fraud, but here we get the really fortunate side benefit of what could eventually amount to the most significant contribution to the general software community since Linux.
CharlesLau
16 hours ago
I always see these malicious speculations without evidence, simply because these companies are from China.
merelydev
2 days ago
> The question is also what game they're playing. DeepSeek came out of a hedge fund. I think it's no coincidence that their publications tend to have a large impact on AI stock prices.
It's revealing that they always seem to publish after some big announcement by American AI companies. But regardless, this is one of the benefits of a duopoly.
nozzlegear
2 days ago
No more revealing than OpenAI, Anthropic and Google always having some new model that just so happens to be waiting in the wings whenever their competitors announce their own model bump.
mycall
2 days ago
That's because OpenAI, Anthropic and Google work on many models in parallel which work cooperatively from the user's POV. So GPT-5.6 is just a checkpoint of their multi-model development.
nl
2 days ago
I think you are reading a lot into this.
There's always an announcement by one of the frontier labs.
jingpostmedia
2 days ago
The framing that Chinese labs open-source because they're behind assumes it's purely a competitive tactic. But there's a structural dimension: DeepSeek operates under a completely different funding model than US labs. They're backed by a quantitative hedge fund that views AI as infrastructure, not as a product to monetize directly. The ROI for them comes from trading alpha, not API revenue.
Chinese AI companies also face a domestic market where open-source distribution is often the only way to reach enterprise clients who won't pay SaaS premiums. The business logic aligns with openness in a way that US labs' VC-funded models don't.
NitpickLawyer
2 days ago
> They're backed by a quantitative hedge fund that views AI as infrastructure, not as a product to monetize directly. The ROI for them comes from trading alpha, not API revenue.
That used to be true, but now they've raised ~$7B, so we'll see how / if that changes.
disgruntledphd2
2 days ago
Yeah, they were in a tough position though. All their competitors were offering equity and they didn't.
yogthos
2 days ago
Also, we’re seeing a classic commoditization spiral with open models rapidly closing the gap and driving prices towards the marginal cost of inference. The reality is that models themselves are general commodities and there's just not enough difference between them. A company can get ahead of others by a few months, but then the rest quickly close the gap. It's a really low margin business because there's no way to differentiate yourself.
Chinese companies understand this and they're treating models as shared infrastructure akin to Linux. The money is going to be in customization niches. Companies will charge to tune models for specific use cases and charge support for that. There's also going to be money at the bottom for hardware vendors making chips and memory. But the middle tier of generic LLMs is seeing involution where there's relentless competition driving profits towards the bottom.
localdeclan
2 days ago
And OpenAI has the audacity to call themselves a nonprofit. All leading US models are closed source for one purpose: money.
try-working
2 days ago
Nope. It is purely a marketing and distribution strategy. Without open sourcing their models, their businesses would have never gotten off the ground. I've written about this here: https://try.works/writing-1#why-chinese-ai-labs-went-open-an...
oefrha
2 days ago
Which is a good thing. Self-serving motives are more reliable than altruistic ones.
intended
2 days ago
The world runs on incentives. Altruism and self-interest are downstream of that.
Wikipedia is altruistic, and serves humanity quite well.
theturtletalks
2 days ago
Open-source is also altruistic. If DeepSeek does become self-serving once they get the top spot, it doesn’t take away from the altruistic contributions that they made towards open models.
wqaatwt
2 days ago
> Open-source is also altruistic
Contributing to it might not necessarily be. Most open source development is funded by large companies, after all, and from their perspective it can function as a cost-saving measure: it lets them focus on their core products and removes the possibility of rivals gaining a competitive advantage from a superior low-level stack under their product.
Which is why open source is so successful in areas where software is a cost center but has mostly failed for consumer products (since spending resources on them would actually be altruistic, unlike e.g. Linux kernel development).
spongebobstoes
2 days ago
altruism is not discernable from the outside
any altruistic act can be perceived as self serving
brookst
2 days ago
And ultimately the motivation for those contributions just doesn’t matter, except to those who like to anthropomorphize companies and argue about their souls.
threethirtytwo
2 days ago
No, parent is right. The root driver of the world is capitalism; open source exists downstream of that.
Software engineers need money to survive. If they exclusively work on open source stuff where are they getting money from to survive? Follow the money trail… even a donation… eventually it leads to an incentive based source or action.
mejutoco
2 days ago
> If they exclusively work on open source stuff where are they getting money from to survive?
From open source. You can earn money from open source. Open source is not opposed to capitalism, idk where you got that idea.
antonvs
2 days ago
> Open source is not opposed to capitalism
You’re right. One of its big uses is for big companies to destroy the ability of individual developers or small companies to make a living selling software. It’s also used to harm larger competitors whose revenue depends on some non open source software. Essentially, open source is the new “dumping”.
That’s not to say open source is bad. But in a capitalist context, it’s certainly a very double-edged sword.
threethirtytwo
2 days ago
Young blood, allow me to explain.
I said open source is derivative to capitalism. Meaning open source cannot exist without capitalism. I never said they oppose each other.
Second, I said you need to follow the money trail. Money given to people who work on open source comes from non-open source places.
mejutoco
a day ago
> I never said they oppose each other.
Are you not implying below, with your words, that working exclusively on open source cannot bring money?
> If they exclusively work on open source stuff where are they getting money from to survive?
You seem to imply that open source is incompatible with making money. You seem to believe that if someone is making money they are not doing "exclusively open source" but something else in addition to open source.
> you need to follow the money trail. Money given to people who work on open source comes from non-open source places.
Money spent on coffee comes from non-coffee places, mostly. Does that mean one cannot make money exclusively selling coffee?
I get your point: it is very uncommon to live only off open source. That I can agree with. It is the exaggeration and dogmatism that is untrue.
threethirtytwo
19 hours ago
> Are you not implying below, with your words, that working exclusively on open source cannot bring money?
No, I am implying that it in itself does not generate money because it is given away freely. Open source can draw donations, but that is not part of a transaction, and the source of that money is incentivized labor.
> You seem to imply that open source is incompatible with making money. You seem to believe that if someone is making money they are not doing "exclusively open source" but something else in addition to open source.
Usually yes but not necessarily. They may exclusively do open source but then in this case they must be supported by some kind of monetary source. That monetary source will be a for profit incentive. For example: donations. Donation money must be generated by for profit ventures. Or salary money paid by companies for employees to work on open source. That salary money is generated from for profit ventures.
> Money spent on coffee comes from non-coffee places, mostly. Does that mean one cannot make money exclusively selling coffee?
Dude. You’re missing the point and making a wrong analogy. It is not about where the money comes from but whether a transaction occurred. When you sell coffee you make a transaction. When you give away software… no transaction occurs. You lose time effort and money for nothing in return. Thus the effort must logically be supported by something else. I am exclusively talking about a logistical issue with a logistical consequence. That’s it.
> That I can agree with. It is the exaggeration and dogmatism that is untrue.
Where did I exaggerate or dogmatize anything? I am talking about a logistical reality and you’re getting emotionally worked up over it. Don’t fucking accuse me of shit I didn’t do.
unknownfuture
2 days ago
> If they exclusively work on open source stuff where are they getting money from to survive?
These are orthogonal. One can have a paid job while contributing to open source for entirely altruistic reasons.
> Follow the money trail… even a donation… eventually it leads to an incentive based source or action.
BS. Humans do things for altruistic reasons devoid of individual reward all the time.
I, myself, maintain multiple OSS projects entirely for fun and with the hope that others will find it useful. That's it, that's all. I also donate entirely anonymously to charities simply because I believe others deserve support and dignity.
This form of cold, American libertarianism you espouse is pure poison in the body politic, both in the US and globally. It degrades all of human interaction to transactions. It's no wonder that the US is where sociopaths like Zuck were birthed.
threethirtytwo
2 days ago
[flagged]
unknownfuture
2 days ago
I'm quite certain I was criticising a set of ideas, not you personally.
That I misunderstood your point in context is a different issue, in which case, yup, my mistake.
threethirtytwo
2 days ago
Bro, this:
> This form of cold, American libertarianism you espouse is pure poison in the body politic, both in the US and globally. It degrades all of human interaction to transactions. It's no wonder that the US is where sociopaths like Zuck were birthed.
Was an outright attack. But whatever, all water under the bridge, stuff like this happens. I might attack someone too if I misinterpreted something. Not a big deal.
> That I misunderstood your point in context is a different issue, in which case, yup, my mistake.
Thank you for taking the extra time to understand.
operatingthetan
2 days ago
>Talking like this is not only against the rules here but it is some of the most vile and direct insults I’ve ever fucking read. And it doesn’t even stem from us disagreeing. It stems from you misunderstanding what was said. Why don’t you read over what I wrote and my explanation before making such a stupid comment. Let me be clear. You’re not stupid, but your reply is stupid. And your reply is stupid because of a misunderstanding. So make yourself understand and clean up your act because shit like this does worse damage for the world than psychopaths. More wars are started over misunderstanding and uncalibrated anger than actual psychopathic tendencies.
I suggest you practice a much greater degree of self awareness.
threethirtytwo
2 days ago
I am self aware. I’m fully aware of the potential feelings that what I said could evoke but reality is reality and a forum is one of the few places we can still talk about reality without cancel culture or feelings muddling everything up.
I’d rather speak the truth and what I believe in rather than cater to the feelings of people who cannot face objective reality.
And I didn’t openly or directly insult anyone. I criticize where it’s deserved and where it is true. He personally attacked me and I criticized his attack as appropriately as I could.
If you disagree with my premise, attack my argument. Don’t make it personal by telling me to be self aware and replying to a section of my comment not meant for you.
threethirtytwo
2 days ago
Responding to the operatingthetan sibling reply because for some reason it’s dead (he just deleted it):
> Your comment is highly uncivil because it relies on aggressive profanity, condescending directives, and absurdist hyperbole to attack the other person. It lacks self-awareness because you hypocritically engage in the exact same hostile, rule-breaking behavior you are condemning.
From my interpretation here he attacked me. A defensive response is appropriate, but one where I do not attack his character directly but one where his unjust actions are criticized honestly. Unfortunately criticism is always somewhat condescending and for people to even be receptive of it you need to match the tone.
1. Profanity helps illustrate my point and I never directed it at him.
2. I disagree with hyperbole. I think you’re outright lying to me here. So this is on you. You’re exaggerating things. He literally associated me with psychopaths and I called what he did a “misunderstanding”. So really you need to be more self aware, not me.
3. I never commented on my own rule breaking behavior and my stance on the rules of HN. I just commented ON his rule breaking behavior just as you are commenting on my behavior while ignoring your own.
I think this is enough. I’m not perfect with the rules here but this is becoming a bit of a flame war and I’m fucking done. I’m ejecting myself from this thread.
microgpt
2 days ago
Is it though? A large number of people get to experience a lot of power over others because they moderate Wikipedia. That's certainly why some of them do it, just like on Reddit
threethirtytwo
2 days ago
This statement is factually true and you are voted down because many people lack knowledge.
Any individual that provides free labor cannot survive off of said free labor. He must work for money to survive or get donations from someone who earned that money from incentive based labor in order to even buy the food he needs to exist as a living human being. Much of the time that labor is actually closed source.
This is a logistical reality. A lot of open source advocates are unable to get their heads around the fact that open source literally cannot exist without incentive-based software supporting it. Who pays for GitHub to exist? Who pays for the food SWEs eat? I just code for open source all day and money falls out of the sky.
My smart friend says there are jobs that pay you to work on open source exclusively. Smart guy. In this case you follow the money trail. How does that company get enough money to pay a guy to work exclusively on open source?
docfort
2 days ago
Free labor enables capitalism, especially if you consider labor arbitrage as a mixture of free labor and properly compensated (according to the real value) labor. From literally being born, to family culture, education, and whatever level of broad social cohesion, it’s all free labor. Without that background, money itself loses its value, since an individual cannot have reasonable confidence in trading it for something of actual tangible value. It is abstract stored value, banked into society for free. Indeed, in many cases, the free labor is in the rational self interest of a group. But stability and love and peace aren’t monetized to their true value. Otherwise, markets should be much less stable. Bubbles are only notable for the large impact of a small group of bad actors. Overall, it’s pretty amazing what free labor does. Open source is just another instance of this long and critical tradition.
threethirtytwo
2 days ago
Free labor is derivative to incentivized labor. Your statement here doesn’t disprove or counter what I said. Again, follow the money trail. For everything you mentioned, if you follow the money to its origin, it comes from paid, incentivized labor. Parents need money to raise kids… where do they get that money? Our economy is called capitalism for a reason: there is literally zero reference to charity or altruism in the vocabulary or even the standard models that describe our economy and economic theory.
Put it another way: if we removed your ability to do incentivized labor and all you can do is charity work… you would run out of money and die from starvation. If we did the opposite and we removed your ability to do charity work… you’d be fine.
All of this re-emphasizes the point of this thread: In our objective reality, the world is driven by incentive based work while altruism is a side effect of surplus wealth generated by incentive based work. That is the fundamental reality.
docfort
a day ago
I know this is such a late reply, but for clarity: my foundational point is the exact opposite hypothesis. Free labor enables incentivized labor. Economic theory has a neat concept of externalities, a necessary mechanism that only that which can be valued can be traded. If we removed all economic activity, then after most people die, the remnants would revert to much earlier models of free cooperation existing in small communities. Our world does require huge numbers of people to forcibly cooperate to create resources to then allocate.
My one other point is that incentivized labor is not the same as the value it creates. Indeed, it must be less. Otherwise, our economic system could support only a fixed number of people (subsistence), which would decay inevitably because there is no margin for error. But my point is that margin in reality isn’t fully realized, even by trillionaires, because then there would be no growth to support more people growing in their standard of living. There must be slack in this distributed system and the slack wasn’t valued: it’s free labor. It’s mixed in with incentivized labor, so I understand if you reject the premise entirely, but I do believe this is the essence of modern (specialized) capitalism. If skilled workers try to optimize or invent, more resources will be available for distribution for the same incentive (i.e. “worker productivity”). You can say “yeah that’s their job,” and I can say “that productivity wasn’t fully monetized because otherwise productivity would be lower overall.”
So, incentivized labor presupposes free labor, and economic productivity is a mix of free and monetized labor.
nozzlegear
2 days ago
I hate to quote pithy proverbs, but "the road to hell is paved with good intentions." One can have an altruistic goal which ends up harming people too, which is where that proverb comes from. Prohibition and The War on Drugs in the US are two good examples of something that had altruistic origins[†] but ended up doing way more harm than good.
[†] Another problem with altruism: we don't all agree on whether a goal is altruistic, and what's altruistic in the enactor's eyes might not be in yours. Curating a fountain of human knowledge like Wikipedia? Probably altruistic. Protecting humanity from itself by installing your company as the stewards of frontier LLMs? Not so altruistic in my view.
dragonwriter
2 days ago
> Prohibition and The War on Drugs in the US are two good examples of something that had altruistic origins
The War on Drugs had the purpose (not just in its origin but in its perpetuation) of inflicting harm on elite-disfavored subsets of the population that could not be openly targeted for Constitutional reasons, which is about as far from an altruistic reason as it is possible to get.
nozzlegear
2 days ago
Yes, that's my point. What looks like altruism to one person is not altruism to another, and those causes can be used by bad actors.
Der_Einzige
2 days ago
Go read Max Stirner. True "Altruism" doesn't exist. It's all egoism, even if and especially if you think it's not.
localdeclan
2 days ago
Agreed
nubg
2 days ago
Very interesting take
broodbucket
2 days ago
Look at how far OpenAI has drifted from their original mission. Everything comes back to greed, so it's ideal for the world if selfish motives happen to coincide with what's good for the world, like advancements in open models
spongebobstoes
2 days ago
can you elaborate? the original mission was "advance digital intelligence in a way that benefits all of humanity"
I don't see an inconsistency. money is pragmatic, the mission needs money
threethirtytwo
2 days ago
Every company on the face of the earth has a mission statement involving some bs goal that sounds altruistic. For a good example look at Google's mission statement.
The real mission statement for most companies is to make as much money as possible.
spongebobstoes
20 hours ago
Google is only a good example because you can point to major inconsistencies between their mission and their actions
OpenAI might have the same flaw, but you need to demonstrate it, not just assume it
roenxi
2 days ago
It's a standard take since it is how markets tend to work. They aren't powered by altruism, it is a big system for turning greed into good results. We don't have all this stuff because people suddenly woke up one morning and decided to be nice.
breezybottom
2 days ago
Yes but there's more to the world than markets.
wqaatwt
2 days ago
On aggregate mainly because humans often tend to behave “irrationally” due to various reasons though
lelanthran
2 days ago
I don't understand what is interesting about it: it's the default.
Markets don't run on altruism.
FooBarWidget
2 days ago
The standard is applied very inconsistently. Nobody accuses the local bakery of being motivated by profit, and that they don't bake bread for you out of altruism.
woctordho
2 days ago
And humans don't run on markets.
wqaatwt
2 days ago
Mostly they kind of do since we do live in an utopian society of unlimited abundance. Extremely few people can afford to (or want to) spend a very large number of working hours without ever getting anything directly in return for it.
idiotsecant
2 days ago
I think you made a typo of saying do instead of don't and totally reversed your argument
throw1234567891
2 days ago
Neither on altruism.
AlecSchueler
2 days ago
Isn't it the entire basis of capitalism?
amelius
2 days ago
You mean more predictable, not more reliable.
IshKebab
2 days ago
I don't think so. I can confidently predict that altruism will give you a very unreliable income stream in the vast majority of cases.
threethirtytwo
2 days ago
Disagree. It’s more reliable.
rrvsh
2 days ago
Could you explain? (asking in good faith)
jmyeet
2 days ago
Projection is a funny thing. It causes people to misread situations all the time. Southern slaveowners feared violent retribution from freed slaves, for example [1]. It was pure projection and said more about the South than it did the slaves. The reality was there was no violent retribution. It was the opposite where the former slaveowners continued to inflict violence on the formerly enslaved.
I say this because we see the same thing used as an argument against China. "If they overtake us, they'll do imperialism (like us)." Again, it says more about us than them.
A better reading (IMHO) of the situation is that China believes that AI shouldn't be used simply to mint a few more trillionaires but the benefits should be shared with society. Why do I say this? Because we now have 70+ years of China doing exactly that. The transformation in China all the way from rural villages to Tier 1 cities has been utterly astounding. China has lifted ~800M people out of extreme poverty.
In some ways we're at a similar point to the late 1990s and 2000s when Microsoft execs complained that Linux, being free, destroyed intellectual property value. Linux should be a perfect example of how people can and do act altruistically, or at least not in a way to bait-and-switch to enrich themselves.
[1]: https://www.reddit.com/r/AskHistory/comments/1d26grm/in_the_...
FooBarWidget
2 days ago
It's even worse than that. China publishes stacks upon stacks of policy documents in which they explain clearly what they will do and why. This includes why they do poverty alleviation and why they believe big monopolies that own everything are bad. But almost no western observers care to read those documents. Instead, western observers, including HN, speculate endlessly about China's intentions, and "it would be naive to believe they would not do X" or drawing equivalences to Soviet Union or whatever. And the "journalists" sell this notion that Chinese state intentions are "untransparent" and "unknowable" while pretending the policy documents don't exist.
Meanwhile, Xi Jinping has published his 5th book on how governance in China works and what they're after. These are not books written for a western audience: they're compilations of speeches that he already gave to the Chinese party and state apparatus, so the contents are not sanitized for foreign audiences. But there are no English reviews or summaries of this 5th book at all by the usual China experts who shape what western audiences know about China.
This extends to beyond the government. Even though "for the people but only against the government" is an often-heard mantra, nobody seems to listen to what Chinese AI companies themselves say about why they publish open models. DeepSeek and GLM have said multiple times publicly what their motivations are, yet people on HN still speculate like they usually do.
Truly mind-boggling. I get that a lot of people don't like China. But setting aside the question of whether their dislike is justified, it would at least be rational to properly understand China, even if it's to defeat it. And listening to what China says themselves is absolutely essential for proper understanding. But people don't bother to? And they seem mostly happy with sticking to speculations that match preconceived notions, even if that hurts their chances of defeating China.
isoprophlex
2 days ago
Extremely interesting comment, thank you. Got some links where I can download this source material? I don't read or speak the language, but will try interrogating it with an LLM
FooBarWidget
2 days ago
The fifth book is on Amazon. https://www.amazon.com/XI-JINPING-GOVERNANCE-CHINA-V/dp/7119... It's already an English translation.
For something shorter, you can see Arnaud Bertrand's recent review. https://arnaudbertrand.substack.com/p/the-book-the-west-refu... The review is behind a paywall, but not expensive.
If you want to read policy documents directly (primary source), try the State Council / Chinese government policy database: https://www.gov.cn/zhengce/ and https://sousuo.www.gov.cn/zcwjk/policyDocumentLibrary
They also provide official translations: https://english.www.gov.cn/policies/
For Central Party documents: https://news.cn/politics/zywj/. It lists recent Central Committee / General Office / joint Party-State documents, e.g. 2026 documents on township duty lists, Party member development rules, carbon evaluation, long-term care insurance, and SOE leadership rules.
isoprophlex
2 days ago
Thanks again, this is more than enough for a clanker-assisted rabbit hole to disappear into
krackers
2 days ago
> The review is behind a paywall, but not expensive.
I think the author wrote a twitter post with a summary of the content, and someone on twitter who had read the original Chinese source also chimed in with a summary
jmyeet
2 days ago
I 100% agree with you and want to add something.
If you simply take what the Chinese government says at face value, you will be correct way more often than 95% of Western policy wonks, media talking heads, "analysts" and so forth. Because, like you say, they tell you everything they're doing.
In the recent US-China summit, Xi Jinping just came out and used the Thucydides Trap metaphor, which tells you everything about where China thinks it is and where it sees the US going, which is to become increasingly belligerent as their power declines. Now whether or not you agree with that assessment (I do agree), it still tells you China wants to avoid open hostilities, it sees itself as continuing to rise and it fears what a declining US might do.
FooBarWidget
2 days ago
The Thucydides Trap mention is different from what you describe. Xi has dismissed the Thucydides Trap multiple times in the past as being hearsay and self-imposed bias (https://www.globaltimes.cn/content/944179.shtml). "We should strictly base our judgment on facts, lest we become victims to hearsay, paranoid or self-imposed bias. There is no such thing as the so-called Thucydides trap in the world. But should major countries time and again make the mistakes of strategic miscalculation, they might create such traps for themselves."
But western politicians keep raising this metaphor. So at some point they're like "okay we'll speak your language". They then used this metaphor to make the point "our rise isn't the threat, your fear of it is. If you resist it you're walking right into the trap Thucydides warned about". So your conclusion is still right, they don't want open hostilities, a stable world is in their interest.
Then western media ran away with this and were like "OMG Xi mentioned the Thucydides Trap", completely ignoring his point.
overfeed
2 days ago
> And they seem mostly happy with sticking to speculations that match preconceived notions
...you've almost achieved enlightenment on the nature of the majority of HN comments: vibes-based, off the cuff braindumps, where an idea is examined as it is being typed. It's great for tech and software discussions where many commenters have good knowledge or even mastery, but on "exotic" topics, an incorrect take can be voted to the top because it sounds right by affirming the biases of the majority. If you're a practitioner in a non-tech field and a topic in your field comes up for discussion on HN, be ready to be disappointed - and be ready to question all the other correct-sounding comments in areas unfamiliar to you.
CMay
2 days ago
Aspiration for the masses and policy reality are different things. They will say monopolies are bad, but what they mean is only if the state doesn't control them. The CCP has huge state run monopolies and there's nothing you can do about it. You can't understand their policy without understanding Marxism-Leninism. When pesky reality gets in the way of policy, it's called corruption.
In communist or post-communist countries like China and Russia, the percentage of government workers is extreme. To them, all social action is political action, and that includes economic. Since there is only one party allowed, any economic action (and thus political action) which threatens the CCP is unacceptable. They leave other private companies alone.
The CCP is bad and this conclusion is justified not only in theory as a distant observation of ideological concepts, but from their behaviors around the world which echo some of the causes of World War 2. That doesn't mean Chinese AI companies are bad, but the CCP will certainly find ways of using it for its purposes. For now, they're quite far behind on AI, but they deserve credit for optimizing for some use cases which masks poorer generalization.
FooBarWidget
a day ago
[flagged]
t00l3
2 days ago
I had the most ironic rollercoaster ride thanks to your comment.
I copied it into DeepSeek because I figured who's better to teach me about greatness of Chinese government policies if not the most popular Chinese LLM?
Anyhow, it must have detected _something_ in your comment because Chinese censorship policy kicked in and DeepSeek refused to talk about it. Funny because I would wager the overall sentiment about China's abilities to govern in your comment was positive but okay.
I literally asked it "expand on it so I can learn more about Chinese policies" and that was enough to get censored!
Anyhow, after saying a few times "huh? but I want to learn what's great about Chinese policies!" it finally gave me a response in... Mandarin. So I asked it to provide me that information in English and... it refused. Talk about difficulty of finding any materials in English ;-)
After starting a new chat and using plenty of positive adjectives to make sure I don't want to learn a single bad thing about China, I finally got a list. Looks like China is 100% successful in everything they do! How neat!
So I said "That's cool. Does this set of policies have one name? Can you recommend any books about it? In English of course" and...
...request denied again.
I mean, maybe it says more about how horrible DeepSeek is for this kind of research but boy, it was so ironic I now have a stack of iron at home.
FooBarWidget
a day ago
Chinese censorship rules are not about criticizing vs praising the state. They are about avoiding any kind of social controversy or collective action. They don't care when you talk about these topics in private or small circles, even criticism is fine, they just don't want them to spread far no matter whether it's positive or negative.
t00l3
a day ago
[dead]
FooBarWidget
a day ago
I can't reply to your reply so let me do that here.
In the Netherlands, the government wants to reduce emissions, so it incentivizes people to insulate their homes better, and to use heat pumps instead of gas heating. On the other hand, if you actually try to install a heat pump, you'll run into all sorts of regulation issues: the unit can't be too big, there are only a few specific places where you're allowed to place it, a ton of people can object to it, permitting takes years if at all. Oh and if you insulate your house, then voila, during the current heat wave it's a constant 35 C in your home. So you try to install AC and you run into permitting/regulation issues. So you then use a super inefficient portable AC that just barely lowers the temperature by 3 C and uses 4x more energy, and that's fine. facepalm
And the government and banks also want to combat money laundering, so they incentivize people to use digital payments and discourage cash. Police could look at you suspiciously merely for having too much cash on hand. On the other hand, NATO and also a bunch of government agencies are warning about war and encouraging people to have lots of cash at home for emergencies.
"They" do not "clearly" want one or the other. Different government branches can have different, conflicting priorities.
The Netherlands is tiny. China has 1.4 billion people, and its state apparatus is orders of magnitude bigger. Forget about coordinating the population, even coordinating the tens of thousands of local government bodies has always been a huge problem. All the previous dynasties have said that governing such a large country is a nightmare.
Xi is not personally in charge of the censorship bureau. The top government sets broad direction and KPIs, while local governments and government agencies are given a lot of leeway for implementation as they see fit. And frankly you cannot run a large organization any other way - there is no large company in the world where the CEO micromanages everything without burning out. The KPI is "social stability", and as long as this is kept and there are no grave problems like corruption, it's not the top government's job to dictate how the censorship bureau does its work. Of course, you may be of the opinion that something like "freedom of speech" is more important than "social stability", but the point is that they value "social stability" more, and that they're motivated by that, and not by some idea of "suppressing freedom". This ties directly into my point of properly understanding them.
Furthermore, many people tend to be risk averse, and would rather instinctively deny something than take chances. There was a famous scene in the Jiang Zemin days in which Jiang said something frank in some meeting with a foreign politician. Then the cameraman was like "uuh should we record this?" and his boss was immediately like "no, cut it away". Then Jiang said "why shouldn't we record this? of course this should be recorded!" This risk-averse attitude is still pervasive in a lot of places. It's not just DeepSeek that's "paranoid", everybody implementing censorship rules is similarly paranoid. On Xiaohongshu/RedNote they don't want you to talk about societal issues at all, even "positive" things like "I think Taiwan belongs to China" - they recently banned a Taiwanese user's account for saying stuff like that, they want you to focus on travel and food or whatever. This attitude likely won't change until the current censorship bureau generation retires, and gets replaced with the next generation that's more confident.
Finally, whatever Poland did pre-1989 has absolutely nothing to do with China. There are no similarities in motives or circumstances. You can't just lazily lump random Soviet-era countries together with China just because you give them both the "communist" label. China's adaptation of and motivation for adoption of communism is wholly different from the Soviet Union.
CMay
2 days ago
This is a really bad, poorly thought out take, and historically inaccurate. Also, we're not afraid China will somehow be like us. If they were like us, that would be great. The problem is that China is already behaving like it is ready to abuse the power it has on a scale that the US never has.
Russia, China and Iran all make public statements as if they abide by international law and everyone else are the law breakers, while their measurable actions are shockingly contradictory.
It is their actions that have caused the US reactions, but many people present them as if they occur in a vacuum.
The AI race sits within this context, with a constellation of concerns that most people do not think about. Of course the AI engineers have their own motivation, the companies will share some of that motivation combined with their business trajectory and governments will get involved with it for the justifications they see.
parineum
2 days ago
> I say this because we see the same thing used as an argument against China. "If they overtake us, they'll do imperialism (like us)." Again, it says more about us than them.
Or because they're human and that's what humans have always done. If the US is no longer a check on China, what will happen to Taiwan?
Frankly, you seem to be arguing that the US is somehow uniquely bad when, in actuality, the US has been hegemonic during a time of incredible peace and minimal imperialism.
r14c
a day ago
Peace for who? Like just look at the last 50-70 years of US intervention in LatAm. Backing a series of coups and extremely violent right wing dictatorships.
The issue is one of incentives. The US needs cheap foreign labor because of deindustrialization policies in the 60s and 70s. These were arguably passed as a check on labor power since socialism was still looking potentially ascendant at the time. Whatever the reason though, the contemporary US is reliant on keeping foreign wages down and domination of the oil trade. Imperialism grows out of this need for external resources to maintain economic growth. It would be less relevant to us if we had better fundamentals, but we traded those away to avoid letting certain demos get wealthy and powerful.
China's play is more mercantile. They benefit most from stable trade conditions. They get richer the more customers they have. They benefitted massively from becoming an industrial trade partner with the US in the 60s and 70s. Because of this, they have completely different foreign policy objectives. All they need to do to win is normalize relations and build trade infrastructure. It's way cheaper than imperialism.
parineum
a day ago
> Peace for who? Like just look at the last 50-70 years of US intervention in LatAm. Backing a series of coups and extremely violent right wing dictatorships.
For the world. Compare those 70 years to the previous 70 years. Regime change and intervention are significantly better than full scale invasion, total war and colonization of other people.
> China's play is more mercantile.
Because they are held in check by US power. Remove the US and China is taking what it wants.
r14c
5 hours ago
Eh a death squad is a death squad tbh.
I mean maybe, but China's industrial strategy isn't well served by war. I just don't see the structural incentive.
resters
2 days ago
They are focused on the things you do when you are not over-capitalized and you can’t get unlimited nvidia hardware to train on. And the results speak for themselves.
Meanwhile we in the US are blocked from buying Huawei GPUs and retirees are boasting about the nvidia in their portfolios.
tw1984
2 days ago
> Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.
The US labs at Google, Meta and SpaceX are not leading; none of them has managed to build something on par with GLM 5.2.
Care to explain to me why they still don't collaborate and still choose to do it in private?
vidarh
2 days ago
I'm not sure I'd put Google in that list, but either way: Because they think they have enough capital that they can catch up and don't need the reputational boost of this.
CuriouslyC
2 days ago
As good as Gemini's visual intelligence is, it's a terrible agent.
7speter
2 days ago
Google at least still releases open source models to the public.
re-thc
2 days ago
Thank Apple?
Those are mostly for embedded devices and the current "sponsor" is Apple.
VorpalWay
2 days ago
Aren't they only open weights, not true open source?
HarHarVeryFunny
2 days ago
The concept of open source doesn't really apply to AI models since their behavior is mostly controlled by the data they were trained on and the complex ways they are trained. Having the source code of the model by itself wouldn't help you.
From a practical POV having all the training data, training infrastructure, and training know-how wouldn't help you either unless you could afford to spend the millions of dollars (hundreds of millions for a SOTA model) in compute to train it each time they released a new training set, in which case you're only talking about the big commercial companies. "open source for the people" just does not apply.
VorpalWay
a day ago
If (and that is a big if) the concept of open source doesn't apply, then the term shouldn't be coopted to mean something else though.
But even if I can't build it from source locally, being able to see what went into the model is an important part of what open source is about.
HarHarVeryFunny
a day ago
> If (and that is a big if) the concept of open source doesn't apply, then the term shouldn't be coopted to mean something else though.
Yes, but for whatever reason this usage seems to have stuck. Open weights is definitely a better name. I assume the reason "open source" has stuck is because you can download and use it for free, but "open source" was always intended to be about "free as in speech", not "free as in beer". That said, I remember when the term "open source" was invented, and it was always a bit different, more commercially aligned, than the goals of the FSF.
> But even if I can't build it from source locally, being able to see what went into the model is an important part of what open source is about.
True. Unfortunately LLMs have become such a big money and closed enterprise (the opposite of OpenAI and Anthropic's altruistic founding principles) that it's hard to see these commercial models releasing their training data, especially since this data is the closest thing they have to a moat other than the cost of training.
The most valuable training data right now seems to be "reasoning data", and the need for this at least may disappear as AI moves beyond pre-trained language models to smarter systems capable of learning for themselves, and that can actually reason, not need to parrot reasoning data.
nullc
11 hours ago
Publishing RL/SFT/self-distillation harnesses would be very impactful even without the data.
Particularly when it comes to tool use w/ self-distillation it can be done without any data... have a tool the model doesn't know? a teacher model RTFMs and the source code, and helps the student learn to get it right.
wqaatwt
2 days ago
Gemini 3.1 is still up there, though? If Google started to compete on price they could be very successful.
disgruntledphd2
a day ago
It'll be their inability to build coherent products that does them in, not their models.
budsniffer952
2 days ago
Wait, are you claiming that these companies haven't contributed to the ecosystem via research and open source?
lwansbrough
2 days ago
No idea I don’t work there.
threethirtytwo
2 days ago
Also, historically, China has always viewed intellectual property as public property. Similar to open source.
re-thc
2 days ago
> Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.
Even if they're ahead they don't have enough GPUs to scale. Open sourcing is hence a good strategy to at least get market share (even if not $).
arj
2 days ago
Not everyone is motivated by greed
tirant
2 days ago
What do you think is the underlying motivation?
arj
2 days ago
You ask me what I think. So far DeepSeek has been very consistently trying to advance state-of-the-art research in a transparent and public way by writing papers and publishing working code. They are also not at the mercy of the stock market in the same way many American companies are. Before anyone assumes too much, I live in Europe.
lwansbrough
2 days ago
Not a greed thing. It’s a national security thing.
DANmode
2 days ago
Are they behind in models, or behind in VC money to burn on subsidized compute offered to the public and early customers?
Genuine question.
alexander2002
2 days ago
True!
colordrops
2 days ago
So the marketplace is working.
abc123abc123
2 days ago
This is the way! Open source models will benefit, and once open source models reach the state of "good enough" the hyped up US AI companies will fear, since the availability of free, good enough, AI models will set the ceiling for how much they can charge. Then the bubble will pop.
VorpalWay
2 days ago
You mean open weights, I guess? There are as far as I know very few open source models, the training data is seldom released. Sadly.
skeledrew
2 days ago
Regardless of where they are, the Chinese will always share their progress, as they're collectivist/cooperative at their core, compared to the individualistic/competitive US.
davedx
2 days ago
I don't really see the moat for frontier AI labs being "more efficient models", although that could help their margins. I think moats will be built through horizontal and vertical market expansion, which Anthropic is doing the most at the moment.
cromka
2 days ago
I seriously am far from fear mongering and doomsday mentality, but I just can't see how OpenAI and Anthropic can have a successful IPO if the quality gap between the free and paid continues to narrow like that...
cyanydeez
2 days ago
[flagged]
72027372920
2 days ago
[dead]
2838383838
2 days ago
[flagged]
zhoBEENG
2 days ago
For real. Reading old comment threads makes me sad, because the level of discourse was so much higher in the past. Although this place is still deeply appreciated, it’s clear that its culture is going monotonically towards reddit.
Is there anywhere public anymore that isn’t being overrun by lobotomized p-zombies (partisan zombies)? Is it even possible to make such a public space? Ressentiment consumes all discourse.
speed_spread
2 days ago
Yet accumulation of power by a very small elite through state and selected corporations happens to be a defining characteristic of that political regime.
cyanydeez
2 days ago
you're right, full of corporate sock puppets shilling their vapor wares, idly dreaming that the world isn't what it is.
janalsncm
2 days ago
Chinese labs are also forced to find performance optimizations since they aren’t allowed to buy the best chips.
baxtr
2 days ago
Who is financing DeepSeek and what are they expecting in return?
nmfisher
2 days ago
Until recently, DeepSeek were self-financed (it was a spin-out from a hedge fund). They just raised ~50 billion RMB (US$7bn), and according to media [0] (which admittedly can be unreliable), the lead investors were:
1) The CEO himself 2) Tencent 3) CATL (the battery company) 4) NetEase (internet/media company) 5) JD.com (ecommerce) 6) Chinese investment firms
What are they expecting in return? I'd say the same thing that all those investors in OpenAI and Anthropic are expecting - profit.
[0] https://finance.sina.com.cn/stock/vcpe/2026-06-11/doc-iniazi...
disgruntledphd2
a day ago
My understanding of DeepSeek's raise is that it's basically so that they can give equity grants, as they were losing lots of people to competitors.
baxtr
a day ago
If they’re expecting profits like the VC backed US companies how come they behave so differently?
nmfisher
a day ago
The investment round only closed in the last few weeks, it would have had zero influence on anything up to & including DeepSeek V4.
Whether that now changes, who knows. It appears the CEO will remain the largest investor/shareholder, though, so I'm hoping not much changes.
gniv
2 days ago
I don't think this question would get to the reason. There could be one or two persons in charge who simply shape the culture of the company, including how much to publish.
archerx
2 days ago
They are self financed, the company that makes DeepSeek is a finance company that trades on the markets.
rsanek
2 days ago
The CCP's approach has historically been to subsidize their companies far more than other countries do. Why would LLMs be any different?
https://www.oecd.org/en/data/dashboards/magic-database-indus...
declan_roberts
2 days ago
Access to everything every American company feeds into the AI is well worth it to the CCP.
u8080
2 days ago
According to EU statistics, yeah
dgellow
2 days ago
OECD isn’t the EU.
And regarding the dataset:
> Unlike most OECD databases, which rely on government data provided at country-level, the OECD MAGIC database uses firm-level data. The subsidy estimates included in the database are based on raw data obtained from firms’ annual reports, financial statements, bond prospectuses, IPO prospectuses, etc. The data are collected and verified manually by the OECD to maximise accuracy, consistency, and comparability. In some cases, additional information is also obtained from government databases, either to verify the firm-level information or to complement it. Care is taken to avoid double-counting where the data mix corporate and government sources.
nixon_why69
2 days ago
Does that figure hold up when we look at Silicon Valley financing? Uber alone was subsidized to the tune of billions. Let alone the recent batch where we're into hundreds of billions.
eagleal
2 days ago
Even the latest World Bank report, from the de facto neoliberal institution, recognized a couple of months ago that letting the focus of industries be dictated purely by capital decisions was bad, as in _really_ bad.
baxtr
2 days ago
Even if they were fully self-financed, which isn’t the case, they would expect something in return.
nickthegreek
2 days ago
You can give them money by using their api. Just because their model is open, doesn’t mean they are a non profit.
archerx
2 days ago
Not everyone has the American “fuck you got mine” zero sum game attitude. Also they’re making some of the American and European AI companies look bad which they can leverage with their trades if they wanted to.
bushido
2 days ago
IMHO to promote that China believes in free markets and making the technology available to all.
Which will likely help them bolster the sales of the MANY new AI chips in development/use in China to international markets. Dislodging Nvidia.
Kinda the opposite of what Jensen Huang (Nvidia) thinks US is doing: https://www.youtube.com/shorts/u3SY8nvjhQA
Edit: I'm a fan of DeepSeek and believe it's good to make the technology open/available. And I do think that also helps business - which I support as well.
Edit 2: No idea why I'm getting downvoted. That's also their official stance https://english.www.gov.cn/news/202601/08/content_WS695f1b55...
panny
2 days ago
Short AI companies
???
Profit!
Not suggesting this is it, but you know, one possible angle.
bluerooibos
2 days ago
> Probably because American AI companies are on the hook for quite a lot of investment money
That's a lot of words to say it's just capitalist greed.
spacebacon
2 days ago
[dead]
budsniffer952
2 days ago
Do you think that DeepSeek are building their models for free, or something? They aren't "on the hook" for anything?
What's with all the China glazing about this stuff? They release some open-source work and people act like they are suddenly the beacon of freedom and transparency.
abc123abc123
2 days ago
This is incorrect binary thinking. Them releasing open source can be good, but that does not commit you to think that china or chinese companies are saints. There are many shades of grey here and one does not exclude the other (nor include it).
budsniffer952
2 days ago
Are you reading the comments?
13415
2 days ago
I think there are some sockpuppet accounts active but what also contributes is that many people are absolutely fed up with US technological hegemony and welcome alternatives to core technologies from elsewhere.
1over137
2 days ago
Not just US technological hegemony, but the USA has threatened to invade Europe (Greenland) and Canada, and has actually invaded Venezuela and Iran. China hasn't. Maybe lots of people that live in those places are now switching sides.
dgellow
2 days ago
Over the past 2y the US also started a trade war with Europe, triggered the worst oil shock the world ever experienced for no reason, threatened to leave NATO, tried to force Ukraine to give up its territory to the invading country, and way more
7speter
2 days ago
I think it’s in our best interests to lever these American AI companies to exhibit at least some degree of freedom and transparency any way we can…
herodoturtle
2 days ago
Publishing by necessity I wonder? American labs on the cutting edge pioneering the way forward, so Deepseek open sourcing what they’ve got is to help even the playing field.
Hopefully the experts here can offer insight. The above is just my hunch and I’m not a specialist in this field.
try-working
2 days ago
Yes, challenger labs publish out of necessity. It is a marketing strategy. People assume open source means giving something up, but the reality is that Z.ai has a revenue of some $100M and it would be about $0M if they never open sourced their models.
jonplackett
2 days ago
Wouldn’t that just help the American labs anyway though? Or do they assume they’ve actually already figured this stuff out and kept it secret?
vintermann
2 days ago
It used to be the case that NSA hired the majority of all math graduates in the US, and were assumed to be years ahead in cryptography. Yet in the 90s, it became clear that they no longer were that - among other things, the cipher of the notorious Clipper chip was broken, and we can rule out that it was made weak on purpose because the whole point of Clipper was that they had a backdoor.
So, despite hiring the cream of the crop of math graduates, who could read the papers of free academia, but whose own results the free world could not access - they fell behind.
I have a theory explaining why. I think it's because science is an interactive process. NSA cryptographers could read papers, but they couldn't talk openly with the authors of those papers, because of secrecy demands - even asking a question might indicate what they were working on. You can easily imagine them spending months on something they could have avoided by going to the original authors and getting told "Oh, we tried that for a long time, it doesn't work".
Whether that theory is right or not, cryptography is a concrete example of a domain where public research with fewer resources beat private research with a lot more resources.
idiotsecant
2 days ago
Everyone in this thread is getting distracted by nationalism, but you hit the nail on the head. In this case for whatever reason the Chinese AI industry is collaborative and the American AI industry is not. This will result in the Chinese companies making progress faster. Full stop. This isn't a judgement on the merits of either system, only an observation of likely results.
tiahura
2 days ago
Hasn't that been the mantra of open source for 40 years? Armies of companies, trillions of valuation, or even just Wayland, suggest that isn't always the case.
eikenberry
2 days ago
So free software can only be considered a successful strategy if every single project succeeds?
idiotsecant
2 days ago
And yet, Linux runs approximately every ounce of computing substrate on earth
tiahura
a day ago
The point that I was responding to was that open source leads to faster development. It's 2026, and it has been "Next year will be the year of Linux on the desktop" since about 2000.
One would have to conclude that there is little correlation b/w openness and progress speed. Sometimes open is faster, sometimes it isn't.
overfeed
2 days ago
The Linux Foundation was bankrolled by the US government (via grants and code donations) to undermine the EU Operating System industry. Symbian was going to be amazing, until Microsoft - an American company with government links - nuked it /s
idiotsecant
a day ago
this is the part where you make a point cogent to the point to which you responded.
parineum
2 days ago
> This will result in the Chinese companies making progress faster. Full stop.
Is this happening? These open models have been a generation or two behind the closed models for quite a while now. They've been keeping pace but clearly behind.
idiotsecant
a day ago
They've been making enormous developments on a tiny fraction of the capital. Right now they've got no reason to devote half the electrical grid to brute forcing models when the Americans will waste their power doing that work and China can distill it for free.
tiahura
a day ago
What happens when they can't just distill from closed models?
NamlchakKhandro
2 days ago
Reminds me of Dot Net from the early 2000s to 2012... No one collaborated
7speter
2 days ago
From what I gather, the Chinese are behind, but a lot of their research amounts to scrappy, clever discoveries in how to use more novel technologies (for Qwen and DeepSeek, it's mixture-of-experts models, which can do inference using a portion of the model at a time). The Chinese also distill information from American models, so there’s that.
The American companies, from my impression, don’t involve themselves with such lowly “hacks” because they have so much money to just push forward with doing everything on big heavy models that run on the most cutting edge nvidia chips that they can, at the moment, kinda sorta get on demand (I say that in some degree of jest).
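For anyone wondering what "inference using a portion of the model at a time" looks like in practice, here is a rough sketch of top-k mixture-of-experts routing. It is a toy under stated assumptions, not how Qwen or DeepSeek actually implement it: the names `moe_forward`, `make_expert` and `router_weights` are made up for illustration, and the "experts" just scale their input where a real model would use feed-forward blocks.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical stand-in for an expert network: multiplies the input by `scale`."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, router_weights, experts, top_k=2):
    """Route input `x` to the `top_k` highest-scoring experts and mix their outputs.

    Assumption: the router is a plain linear layer, one weight row per expert.
    Experts outside the top-k are never called, which is where the compute saving comes from.
    """
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])  # renormalise over the chosen experts only
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    router_weights = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
    out, chosen = moe_forward([1.0, 0.0], router_weights, experts, top_k=2)
    print(chosen, out)
```

With four experts and `top_k=2`, only half of the expert parameters are touched for this input; the total parameter count and the per-token compute are decoupled, which is the property being described.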
idiotsecant
2 days ago
The American companies would love to develop these 'hacks' because it would make them more money, something they are in existential need of right now.
They don't develop them because they don't collaborate publicly anymore.
Where would the whole industry be if Google never allowed publishing the transformers paper?
It's not a coincidence that the American AI industry grew fastest in capability when it was the most open.
tiahura
2 days ago
Why would they collaborate? Why not defect and just keep theirs private and implement the open ones?
mistercheph
2 days ago
this is not an effective long term strategy in a collaborative environment that is advancing for the same reason that having a private secret fork of the linux kernel with a few proprietary improvements is not an effective strategy.
integrating your own work with the latest public advances takes resources. For one or two small changes this is manageable, but the further you diverge from the public, the cost of maintenance rises exponentially if you want to continue to integrate public advances. when you publish your meaningful advance, you offload the maintenance burden onto everyone else (and they only have to pay a linear cost rather than an exponential one) as it's integrated by default in new work.
In most cases, the (exponential) maintenance cost of integrating public advances with secret ones exceeds the value of the public advances, so most that undertake this strategy of advancing the open frontier in secret don't attempt to integrate continually, but instead try to make a breakaway sprint in isolation to grab a few sticky customers before the unstoppable wave of the public frontier catches up.
This is a pattern commonly seen in university research departments when researchers switch into product development mode. Most of these projects are a sprint to advance away from the public frontier once a good idea is found, and they do good work and find a few customers for a little while. But if you check back in a few years you won't find an advanced research department but a zombie IP company that brings in a steady income via IP enforcement and a small number of customers for whom switching is too expensive.
7speter
2 days ago
Just a crazy catch 22, it seems
parineum
2 days ago
> They don't develop them because they don't collaborate publicly anymore.
How do you know they aren't doing this stuff? Something has to account for them leading the industry.
_0ffh
2 days ago
I'm afraid I'm even balking at the word "pioneering" in context with US frontier labs. They are probably doing a few new things, right, but they are not blazing any trails for others to follow along, the Chinese are.
d0gsg0w00f
2 days ago
Or if the US labs are innovating, they're not talking about specifics.
_0ffh
2 days ago
Oh, I assume they're innovating - it's what I meant with "doing new things".
But the word pioneer comes from French pionnier, literally “foot soldier”, a soldier who goes ahead to prepare the way.
If you don't publish you may be advancing, but you're not preparing anyone's way.
epolanski
2 days ago
Chinese papers and techniques have been very influential and copied by US labs.
Multi-head Latent Attention (MLA), Multi-Token prediction, MoE architecture are some of the most famous examples.
HarHarVeryFunny
2 days ago
MoE is from Google (Noam Shazeer)
MTP is from Meta
Another DeepSeek advance that the west are copying is DeepSeek Sparse Attention (DSA)
xgk
2 days ago
Mixture-of-Experts (MoE) was introduced in the 1990s [1, 2], see also [3, 4]. The idea was that MoE scales up model capacity while introducing only a small computational overhead. MoEs did not become viable for high-performance applications until sparse routing was integrated with modern deep networks, made possible by large-scale distributed computation. The breakthrough came with the development of sparsely gated networks [5], which showed that it is possible to maintain model accuracy while activating only a small fraction of a large parameter network during both training and inference.
[1] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, G. E. Hinton, Adaptive mixtures of local experts. (1991)
[2] M. I. Jordan, R. A. Jacobs, Hierarchical mixtures of experts and the EM algorithm. (1993)
[3] L. Xu, M. Jordan, G. E. Hinton, An alternative model for mixtures of experts. (1994)
[4] S. Waterhouse, D. MacKay, A. Robinson, Bayesian methods for mixtures of experts. (1995)
[5] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, J. Dean, Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. (2017)
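To make the "activating only a small fraction" point concrete, here is a minimal sketch of top-k gating in plain Python. It is a toy under stated assumptions, not the exact formulation from [5]: the noise term and load-balancing losses are left out, and `make_expert`, `gate_weights`, `DIM` and `NUM_EXPERTS` are hypothetical names invented for the illustration.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k_gate(x, gate_weights, k):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    scores = [dot(w, x) for w in gate_weights]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax only over the kept scores, so the k weights sum to 1.
    probs = softmax([scores[i] for i in top])
    return dict(zip(top, probs))

def moe_forward(x, gate_weights, experts, k=2):
    """Weighted sum of the outputs of only the selected experts."""
    gates = top_k_gate(x, gate_weights, k)
    out = [0.0] * len(x)
    for i, g in gates.items():
        y = experts[i](x)  # the other experts are never evaluated
        out = [o + g * v for o, v in zip(out, y)]
    return out, gates

def make_expert(scale):
    # Hypothetical stand-in for a feed-forward block.
    return lambda x: [scale * v for v in x]

random.seed(0)
DIM, NUM_EXPERTS = 4, 8
gate_weights = [[random.uniform(-1, 1) for _ in range(DIM)]
                for _ in range(NUM_EXPERTS)]
experts = [make_expert(i + 1) for i in range(NUM_EXPERTS)]

x = [0.5, -1.0, 0.25, 2.0]
y, gates = moe_forward(x, gate_weights, experts, k=2)
print("experts used:", sorted(gates), "of", NUM_EXPERTS)
```

With k=2 and 8 experts, each input pays for two expert evaluations while the layer as a whole holds the parameters of all eight.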
HarHarVeryFunny
2 days ago
Yes - I meant as applied to LLMs/Transformers.
skeledrew
2 days ago
> Publishing by necessity
It's more a cultural thing. Sharing progress is just in their blood.
idiotsecant
2 days ago
This is overly simplistic to the point of glazing. Plenty of Chinese companies maintain industrial secrets to gain an advantage.
skeledrew
a day ago
Yes there are Chinese companies which maintain industrial secrets. Doesn't change the general cultural tendency that they prefer to share.
rvz
2 days ago
Exactly. They did not have to open up their research and this is what happens when smart researchers are forced to squeeze performance gains out of existing hardware.
They don't have TPUs or access to the latest Vera Rubin GPUs either to get performance gains for free. All of the optimizations Deepseek have done are in software and it goes down to the PTX assembly level.
Compared to Anthropic who are celebrating fixing a flickering issue in a terminal app which took months to fix.
HarHarVeryFunny
2 days ago
> All of the optimizations Deepseek have done are in software and it goes down to the PTX assembly level
DeepSeek are still using NVIDIA (PTX) to train on, but for inference have already transitioned to Huawei Ascend chips, and inference speed is what this paper is addressing.
yorwba
2 days ago
Anthropic almost certainly also has optimized software down to the assembly level, considering this take-home interview challenge they published: https://github.com/anthropics/original_performance_takehome/... which is all about instruction-level performance optimizations. That they don't prioritize UI fixes just means they consider other things more important.
lelanthran
2 days ago
Unlikely: that product is written completely by AI, of which they are not lacking.
More likely is that an AI-generated codebase is impossible to fix by humans, and SOTA was not able to figure it out until now.
lionkor
2 days ago
that's pretty silly to use as a measure of what they do internally
saagarjha
2 days ago
It's pretty representative of what they do internally
lionkor
a day ago
You know this from? Any sources? I'd love to learn more because it would be one of the very few industries that still write assembly by hand extensively enough to warrant hiring experts on just that.
saagarjha
a day ago
My source is that I work on this at a non-frontier lab and also I interviewed with that team
lionkor
13 hours ago
Okay that's fascinating. Can you share what kind of things require this? Where are compilers and extensive profiling not enough? Is it just very hot tight loops, or larger routines? Is it for CPU or GPU?
saagarjha
9 hours ago
Taking a step back: I think a lot of people have a misunderstanding of this space. Despite what the "coolest baddest hackers" on social media might have you believe, performance engineers are not thinking about assembly in that they are writing assembly by hand all day. They most certainly know how to do so, and sometimes they end up having to do it themselves, but the goal is always specific workloads and how to make them run as fast as possible, with as little work as possible. If I could have Claude take my model and spit out a perfectly fused kernel for it that I knew was correct and hit 99% MFU I would just use that (well, actually I would probably retire at that point).
Until that happens this remains an unsolved problem, so my job is to take the description of what needs to be done and find which code is on the cold setup path and can just be some PyTorch or whatever the ML researchers can write themselves, and also which part of the algorithm is where all the FLOPs are. As things get more performance critical and run more, I look at the code closer and closer. In the core of the hottest kernels, where most of the work happens, I might be placing individual instructions by hand, or going even below that and thinking about cache behavior or power characteristics.
A good performance engineer is capable of doing this while also being able to find places where they can automate this process. And there are a lot of things you can automate: layouts, schedules, pipelines. There's a lot of work going for compilers and profilers for all kinds of accelerators. Some of these operate on the "assembly" but there are all kinds of assembly. Some of these tools do almost everything for you; some are a very thin layer over the code they generate. You can see this in the interview that was linked above: it's an assembly optimization task, but you will get better results (in the time provided, at least) if you do compiler-like things. IIRC the assembler already operates on named values and in my submission I had extended the instruction selection algorithm to pack bundles based on hazards.
vidarh
2 days ago
> Compared to Anthropic who are celebrating fixing a flickering issue in a terminal app which took months to fix.
It's funny, because if you ran Claude Code on a slow terminal, the cause of the flicker was obvious: They kept dumping the entire history of the chat back into the terminal in a number of situations, and relied on the terminal to then end up in the correct state.
saagarjha
2 days ago
All frontier labs are working down to the PTX level (and lower)
gmerc
2 days ago
Deepseek is commoditizing the performance gains US labs rely on to make their investors money.
jmyeet
2 days ago
Chinese companies (and labs) operate in conjunction with the CCP so whatever they're doing, it's because it's Chinese state policy.
What became clear when DeepSeek came onto the scene was that China was seeking to commoditize LLMs. They consider it an issue of national security not to be beholden to US tech companies when it comes to AI. And I, for one, fully endorse this policy.
Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.
I believe that OpenAI in particular is a bet on a trillion dollar pot of gold that doesn't exist. Google, Microsoft, Amazon and Meta will all be fine. Anthropic is in a far better position than OpenAI (IMHO) but if DeepSeek or some other Chinese open weight model gets as good at coding, they're in real trouble too.
anon373839
2 days ago
I don’t see how Anthropic is in a better position. They have a slight edge in model quality right at a time when we’re getting a taste of what cheap, “good enough” AI looks like. They don’t own their own compute. And their own arrogance and lies have alienated a huge chunk of their customer base and alerted everyone to the dangers of being dependent on them.
jmyeet
2 days ago
I personally think not owning their own compute is going to be an advantage.
There is a meteor headed towards all this AI investment that I don't think has been properly accounted for, and that is what happens to all the existing hardware investments when Nvidia's next architecture comes out. Blackwell (B100/B200) is the current generation. Rubin (R100, presumably R200) is the next and arrives soon. Now, a lot of the investment hasn't been spent yet, so it will likely be spent on Rubin, but at that point, what happens when the next iteration comes out and does 3-4x the compute for the same electricity input and same hardware cost?
Also, what happens when people can run way bigger models on consumer hardware in 5 years? The effective limit for useful local LLMs is currently ~31B parameter models because the RTX 5090 has 32GB of VRAM and Apple's shared memory architecture, which can keep bigger models in memory, just doesn't have the raw processing power.
Anyway, why I argue Anthropic is in a better position (than OpenAI) is that they seem to have captured a market that may well be profitable for them as a company, specifically Claude for coding. So they just haven't burnt quite as much cash as OpenAI so aren't in as deep of a hole.
While I think local models are going to improve massively over the next few years, running them in a data center at scale is always going to be cheaper for a company. Why? Because they can amortize their costs by running 24/7, and powering and cooling them is simply cheaper at scale when you're talking about 1000+ engineers who otherwise might only be using their hardware ~40 hours a week.
IMHO Google is in the best position here of all the US companies, even though their models aren't the best, because their data centers are ruthlessly efficient, their homegrown TPUs will eventually catch up (and thus avoid the NVidia tax) and they simply haven't bet the farm on winning AI.
Schiendelman
2 days ago
I'm generally with you on all of these ideas.
However, Google probably won't catch up. Nvidia has been winning in spite of the fact that their hardware is general purpose rather than tuned for inference.
Rubin has architectural differences I don't understand that are supposed to make inference much cheaper and faster while still retaining those other more generic capabilities. Their next generation after that is going to do even better at being fast for inference and general purpose.
Google is betting that their TPUs won't depreciate faster than the markup they have to pay to Nvidia. I don't think they will be right.
Der_Einzige
2 days ago
Why do people who don't follow the prices of A100 talk like they know things about GPU pricing dynamics?
A100s are ~7 years old and going for more than 2 dollars an hour, significantly more expensive than even 2 years ago. This is because anything with 80GB of VRAM or more and made by Nvidia will have an economically useful lifespan of like, 10 years.
I could see H100s getting 12 years.
Michael Burry doesn't know shit about GPUs.
jmyeet
2 days ago
So I was curious about how A100s would do running DeepSeek v4. I can't find any instances of running v4 Pro on even an 8xA100 cluster. So you need to run Flash at ~284B params. A100s don't support FP8 so you're running FP16 so you're taking a hit that way. But I see estimates of 30-50tok/s for an 8xA100 cluster. They're drawing 300-400W each so you're looking at probably 3500+ Watts, which is roughly 0.01tok/W.
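As a sanity check on the arithmetic above, here is a small sketch that redoes the division. The inputs (30-50 tok/s, ~3500 W for the cluster) are the estimates quoted in this comment, not measurements. Note that tokens per second divided by watts is tokens per joule, which is what "tok/W" is standing in for; the per-million-token energy figure is an extra derived quantity added for illustration.

```python
def tokens_per_joule(tokens_per_second, watts):
    # (tokens / s) / (J / s) = tokens / J
    return tokens_per_second / watts

def kwh_per_million_tokens(tokens_per_second, watts):
    seconds = 1_000_000 / tokens_per_second
    joules = seconds * watts
    return joules / 3.6e6  # 1 kWh = 3.6e6 J

CLUSTER_WATTS = 3500  # quoted estimate for 8 A100s plus overhead

low = tokens_per_joule(30, CLUSTER_WATTS)
high = tokens_per_joule(50, CLUSTER_WATTS)
print(f"{low:.4f} to {high:.4f} tokens per joule")
# prints "0.0086 to 0.0143 tokens per joule"

for rate in (30, 50):
    print(rate, "tok/s:", round(kwh_per_million_tokens(rate, CLUSTER_WATTS), 1),
          "kWh per million tokens")
```

So "roughly 0.01" holds across the quoted throughput range, given those inputs.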
Now jump ahead 2 years and you seem to have a massive jump in performance [1]. The tokens/Watt goes up by at least 2 orders of magnitude. And the B100 is 3-4x that. And we're about to hit the R100 (Rubin) cliff.
That's what this is going to come down to. When hyperscaler DCs are getting to gigawatt power usage, it all comes down to power efficiency. Those A100s aren't far from being sold for scrap.
I've been looking into how different companies are handling depreciation for this. Amazon seems to be saying the life is 3-4 years, Google 4-5 and Meta is saying 8+, which I think is wildly optimistic.
[1]: https://lambda.ai/inference-models/deepseek-ai/deepseek-v4-f...
philjohn
2 days ago
You're focussing on inference ... is it not more likely that A100s are being used for training/fine tuning?
tw1984
2 days ago
> Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.
anyone with IQ higher than 130 (thus qualified for actual AI R&D) would be questioning something obvious here -
if they are already doing such dodgy stuff with the aim to maximize profits, why would those resellers have large amount of logs with actual American model responses to sell to those AI labs in the first place. shouldn't they just post train & customize some leading Chinese open source models to pretend to be Opus or GPT for the vast majority of their users (as classified by some models) who don't know much about expected Opus behaviours & not skilled enough to tell the differences?
that is actually the interesting bit not covered in your censored version of the story line, it is also what happens on the ground. your censored version of the story implies that those dodgy resellers using stolen credit cards, pooling accounts with stolen IDs and illegally selling very personal logs would somehow be honest enough to spend extra $ to ensure their victims (aka paying users) can actually use real Opus and GPT. LOL
dude, you failed this IQ test miserably.
saagarjha
2 days ago
You don't actually need a very high IQ to do AI R&D. More than it takes to post IQ comments on this site, maybe.
jampekka
2 days ago
The galaxy brains in the labs putatively buying the logs wouldn't notice this? Or figure out a structure to prevent this?
tw1984
2 days ago
resellers wouldn't be trying to sell such junk in the first place. they use faked models to avoid the cost of Opus tokens, not to double dip to scam those with arguably the highest IQ in the country.
janalsncm
2 days ago
Their R1 paper was really well-done. But I think it leaves out a few details necessary for stable training.
garn810
2 days ago
Yep. It's about time western world realized Chinese are not the "very bad guys under dictatorship"
3abiton
2 days ago
Honestly it's just a hierarchy difference between the two countries. In the US, tech/fin/military companies have the upper hand compared to the government (fragmented between 2 parties). Despite the charades with Anthropic, Tech-fluencers are in control. Compared to China, where the government (dictatorship) has more control over Tech companies (take any example from the past 10 years). For them, undermining the US AI supremacy is an objective, and releasing open weight models is the way, and I'm all for it.
idiotsecant
2 days ago
Let's not get crazy here. You can acknowledge that the Chinese AI industry has some structural advantages right now without trying to claim anything else. China is still a brutal autocracy.
cloudfudge
2 days ago
I don't think it's very common to believe the Chinese people are bad guys. It's the government and its control of the people that's the problem. And no, I don't think the US is immune to that sort of problem either.
epolanski
2 days ago
R1 was very influential on US models development.
teekert
2 days ago
I'm deep seeking for that open in OpenAI indeed. It’s clear who’s the most anthropocentric in this space.
thesmtsolver2
2 days ago
This is so out of touch. Go to NeurIPS or the top AI conferences to see what is happening.
SubiculumCode
2 days ago
If American labs aren't publishing, it doesn't mean they aren't doing even more interesting work.
etdznots
a day ago
So fascinating, can't wait to never hear about or be affected by this research until it’s discovered elsewhere.
I genuinely wonder how it feels to be working your whole life, actual flesh and blood and heart and mind pouring 40 to make something that is a dead-end on the tree of human progress because its miserly masters are terrified of sharing knowledge.
Days and nights spent playing pretend human pioneer, when you are a lunatic on an island building towers of coconuts.
californical
2 days ago
You could also come up with a cure for cancer, but if nobody knows what you’ve done then there’s not a whole lot we can say about it
dakolli
2 days ago
It's because our culture worships pieces of paper the government tells us are worth something.
IAmGraydon
2 days ago
Money is just a physical representation of the ability to get what you want. The problem is not money. It’s the fact that we live in a “me” society.
mordae
2 days ago
Nope, people seek it out because government tells them to pay taxes _or else_.
utopiah
2 days ago
It's almost as if ... they were what OpenAI was when it started. Sad to see but glad someone is doing it.
nelox
2 days ago
Doing work ≠ publishing work
OtomotO
2 days ago
The difference between greed and power
pmarreck
2 days ago
They push the boundaries, alright. Of obtaining the results of work without doing the work themselves, which, I hate to say it, is classic Chinese machiavellianist business behavior:
https://www.cnbc.com/2026/06/24/anthropic-alibaba-distillati...
etdznots
a day ago
You mean like training off of pirated copyrighted works for example that Anthropic, OpenAI, and Google stole from the internet?
resters
2 days ago
Thank you so much to everyone at DeepSeek who is working on this and who have the courage and generosity to open source this for humanity.
We in the United States will never forget!
For all the harm Trump does to the US at least he is helping China!
godwinson__4-8
2 days ago
The idea that America is going to stay ahead of China is I think at this point clearly delusional. It's also just such silly framing. Why should 350 million people stay ahead of 1 billion people on the other side of the world? If an AI lab in China cures cancer or something do Americans lose?
So many Americans seem to (at least in theory) be ready to sign up for this ongoing confrontation with China. Does anyone think it isn't America who is poking the bear when it comes to the Thucydides trap? Why not try to get along? It occurs to me the only people more Chinese innovation would hurt are the mega cap class in the United States. Elon Musk certainly doesn't want BYD in the United States. Same story all the way down with these super capitalized AI companies. Most average Americans would probably be better off in a world where the United States and China got along. But it's those Americans who will be called upon to suffer most of the burden if that trap ever springs.
thesmtsolver2
2 days ago
By this population-only logic, you should concede that India will overtake China.
Why not talk about how China shut out American companies for decades before complaining about BYD?
Speaking as an Indian immigrant: the PRC has engaged in conflict with almost all its neighbors and started wars in its short history.
China is not so benevolent when they get to the #1 spot:
https://m.economictimes.com/industry/renewables/china-wto-co...
godwinson__4-8
2 days ago
It's not population-only logic, but it does underscore that it is silly to expect the United States to inevitably be ahead.
As for the rest of it:
pertymcpert
2 days ago
I don't necessarily disagree with you on your other points, but the population argument sort of shows a lack of understanding of how China works. It’s not guaranteed at all that they will ever overtake the US.
godwinson__4-8
14 hours ago
It wasn't a claim about "how China works" it was a humanistic point that 350 million people should not envy the progress of 1 billion people on the other side of the world.
It's not in the interests of the vast majority of either country to engage in conflict. American citizens should therefore resist the professed bellicosity of leaders who will never see any consequences, on a battlefield or otherwise. The average American will not be so lucky. Chinese innovation in a world of cooperation need not be less beneficial to Americans than the reverse.
darkoob12
2 days ago
Google and Microsoft publish more than enough, and American universities are publishing the science beyond DeepSeek's engineering. The fact that you don't know about them means you're not following the science, only reading Hacker News.
kamranjon
2 days ago
Google hasn’t published much in depth ML work since T5 (which was hugely influential at the time) - most Gemma releases are 1-3 page model card pdfs these days with no in depth analysis. Even TurboQuant is shaking out to have basically been a rehash of previous work without proper attribution. I do think Microsoft is doing some interesting things with smaller models but haven’t read much research, interested in any refs you might have to share!
darkoob12
2 days ago
Check recent ICLR, ACL, ICML, and NeurIPS proceedings and you will see 10-20 papers from Google Research which are not just simple model cards. They are solid, reproducible research.
DivingForGold
2 days ago
Sure, in part by "stealing" from American AI companies with Distillation attacks:
https://yipzap.com/anthropic-accuses-alibaba-of-largest-ai-d...
pennomi
2 days ago
If your moat is “please don’t copy my outputs”, you don’t have a moat. There is no such thing as a distillation “attack”.
steinvakt2
2 days ago
How does it differ from pirating music or movies?
Balinares
2 days ago
According to US AI labs, training on other people's output is fair use. So that's how.
Zigurd
2 days ago
AI training is considered transformational. That's how AI training gets around copyright and it's probably consistent with copyright precedent. For example, indexing the web is considered transformational, even though you can recover the full text of everything in an inverted index.
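The claim about recovering full text holds when the index stores token positions (a positional inverted index); a plain term-to-document index would only give back a bag of words. A minimal sketch, assuming whitespace tokenization and hypothetical helper names `build_positional_index` and `reconstruct`:

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map token -> doc_id -> list of positions where the token occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.split()):
            index[token][doc_id].append(pos)
    return index

def reconstruct(index, doc_id):
    """Rebuild a document's token sequence from the postings alone."""
    slots = {}
    for token, postings in index.items():
        for pos in postings.get(doc_id, []):
            slots[pos] = token
    return " ".join(slots[i] for i in range(len(slots)))

docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "indexing the web is considered transformational",
}
index = build_positional_index(docs)
print(reconstruct(index, "a"))
```

What comes back is the token sequence; original whitespace and anything the tokenizer normalized away are lost.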
anon373839
a day ago
There is no intellectual property to “pirate”.
Model outputs don't qualify for copyright. They aren’t patented. They aren’t trade secrets - the companies sell them. They aren’t trademarks, obviously. They are nothing, actually.
pornel
2 days ago
Machine-extruded text is not copyrightable, since there was no human creativity involved in producing it.
(and if you argue the US models do produce copyrighted works, then oooops - whose copyright is it huh?)
steinvakt2
a day ago
I see. Fair point
ReptileMan
2 days ago
That when I pay for a model, the copyright of the output belongs to me. This is as work for hire as it gets.
bethekidyouwant
2 days ago
Ow my head.
pmarreck
2 days ago
How very machiavellianist-libertarian of you.
Don't even try to combine it with any notion of "leadership" then, however, since distillation is literally "copying the actual leader"
orbital-decay
2 days ago
It really isn't, you can improve by distilling a weaker model
anon373839
a day ago
Self-distillation is also a technique.
Jonnerz
2 days ago
US AI companies trained their own models on vast amounts of copyrighted and publicly available content without obtaining permission. There's no moral high ground here.
NitpickLawyer
2 days ago
While I don't agree with your comment being downvoted, I don't think distillation is either an "attack" or "stealing". The idea that someone else gets to decide how I use tokens that I pay for is ludicrous.
Imagine if your Casio calculator came with a ToS that says you can't use it to develop a competitor calculator or any other tools. Or that your hammer can't be used to make other tools. Or, closer to the HN crowd, imagine MS in the 90s saying that you can't use their OS to build competing services to MS. They'd be laughed at and be split immediately if they tried that.
The only thing they can do is to refuse serving tokens (and even that's debatable, if we get to tokens being commoditised). But that's gonna be a game of whack-a-mole, and they know it.
orbital-decay
2 days ago
Besides "attack" being a ludicrous name for distillation, note how your article says "accuses"; also, it's mostly about Alibaba, not DeepSeek (although it's mentioned there). Both Dario Amodei and Sam Altman publicly claimed that DS used their outputs to train their models, and knowing the differences between all these models by heart, I believe they're simply lying through their teeth to sway public opinion and/or policy. These models are absolutely nothing alike, and distillation necessarily makes the student's outputs similar to the teacher's. This is very visible in Z.ai models (which were trained on Gemini outputs to the point that they repeated Google's conditional prompt injections in the CoT, and later on Claude where it started repeating their CoT as well) and certain Google models which were trained on Claude's outputs in a roundabout way. Distillation always shows up in the result.
And certainly they have no idea whether these outputs (assuming they ever existed and it wasn't made up) were used for training. The article mentions that DS made 150k requests. This isn't much and might have been just an eval or a benchmark to compare their own model against. It's really hard to believe DeepSeek had any Claude outputs anywhere in their training schedule, since it's just too different. Besides training on random vibecode of course, which is mostly written by Claude.
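The "distillation shows up in the result" argument can be illustrated with a toy sketch. This is not any lab's training recipe: it assumes a single three-token categorical "teacher", a "student" that is just a softmax over three logits, and plain SGD on the cross-entropy of teacher samples. All names and numbers are invented for the illustration.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for two categorical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill(teacher_probs, student_logits, steps=5000, lr=0.05, seed=0):
    """Fit the student to tokens sampled from the teacher."""
    rng = random.Random(seed)
    logits = list(student_logits)
    vocab = range(len(teacher_probs))
    for _ in range(steps):
        token = rng.choices(vocab, weights=teacher_probs)[0]
        probs = softmax(logits)
        # Gradient of -log p[token] w.r.t. the logits is probs - onehot.
        for i in vocab:
            grad = probs[i] - (1.0 if i == token else 0.0)
            logits[i] -= lr * grad
    return logits

teacher = [0.7, 0.2, 0.1]   # hypothetical teacher output distribution
student = [0.0, 0.0, 0.0]   # student starts out uniform

before = kl(teacher, softmax(student))
after = kl(teacher, softmax(distill(teacher, student)))
print(f"KL(teacher || student) before: {before:.3f}, after: {after:.3f}")
```

The student only ever sees sampled outputs, never the teacher's probabilities, yet its distribution drifts toward the teacher's. That is the sense in which a student trained this way tends to carry the teacher's fingerprints.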
pmarreck
2 days ago
You know what, if someone wants to downvote this guy by claiming distillation attacks are not "attacks" or don't cross some ethical bound (especially since I just posted a similar comment), then go right ahead, but if you're combining it with any notion of "leadership", that's like saying that the person in 2nd place in a bike race who is drafting behind the person actually in 1st place is exhibiting "leadership".
There's no "leader" if, absent someone whose results you're copying, you are an emperor without clothes