roenxi
7 months ago
This is really exciting! They're laying out an architecture that may mean even small players with cheap GPUs can compete with the majors. The idea implies that eventually crowd-sourcing an open AI is probably technically feasible and we've got the Chinese actively researching how to do it to a high standard that competes with the monolithic models.
I was sceptical of the US sanctions, but this seems like a real win if it can be taken all the way to its logical conclusion.
mensetmanusman
7 months ago
Yeah the sanctions will (not sarcastically) actually improve the world on a number of fronts. Increasing diversity of compute, forcing decentralization of manufacturing, etc. etc.
seydor
7 months ago
They'd also increase smuggling, theft, espionage, crime, and sabotage.
There are much better ways to increase diversity.
FilosofumRex
7 months ago
PRESIDENT TRUMP: "You don’t think we can. You don’t think we do that to them? We do. So we do a lot of things." https://singjupost.com/transcript-maria-bartiromo-interviews...
overfeed
7 months ago
Nit-pick: smuggling is when you import goods into a country without informing the relevant government bodies. When it comes to GPUs, it's one country that has declared an export ban. Chinese port authorities won't care if you declare you're importing a container with 16000 Nvidia GPUs, as that's still legal.
hopelite
7 months ago
This is a mistaken belief. Sure, you get all those negative aspects to varying degrees, just as you get them under all other conditions … Chinese, Russian, Israeli espionage over the last ~80 years, anyone? … but you cannot actually get diversity without isolation that permits actual diversity to emerge.
Diversity is not pouring oil into water and using the polluted oil-water in lieu of oil and also in lieu of water. If you want actual diversity you need differences that are separated from each other. That, actual real diversity, is precisely what has been collapsing for the last 80+ years, precisely because unique, separate groups and clusters have been shattered, scattered, mixed, and polluted.
Even AI is now accelerating this collapse of what is really a form of human biodiversity, or should it be called cultural diversity, as AI is causing a conformity of thought. There are several reports and papers on that phenomenon already.
It’s absolutely ridiculous to claim that those factors will somehow increase over the prior situation simply because we increase actual, real diversity of unique things; not this fake, fraudulent, delusional diversity that has been forced on us like a toxic sludge dump, destroying human diversity as everyone increasingly consumes the same “content” slop, eats the same food slop, and has the same cultural and musical slop.
mensetmanusman
7 months ago
[flagged]
brabel
7 months ago
Why would the Chinese self-destructing be “amazingly helpful” to the West? This sounds like spiteful vitriol.
alexnewman
7 months ago
I think so, versus them spending it on defense, which would make US military parity even more difficult.
SR2Z
7 months ago
[flagged]
elzbardico
7 months ago
Jesus. What a steady consumption of American neocon propaganda can do to a human brain! It's so sad!
SR2Z
7 months ago
Do you even know what that word means? Do you think that Taiwan is gonna be just fine if the US packs up and leaves tomorrow? That things will work out great for the people living there?
You can call it whatever you want. People who have fled shitty regimes have a much better sense for propaganda than you do, evidently.
elzbardico
7 months ago
> Do you think that Taiwan is gonna be just fine if the US packs up and leaves tomorrow?
I do. The world will be just fine as the American empire fades and the US becomes just another country. Even for the American people, for the average person, it will be an improvement.
SR2Z
7 months ago
Wow, that's a level of faith that even the people of Taiwan probably don't have in their neighbor :)
avn2109
7 months ago
>> ... spends the normalized equivalent of America’s defense spending...
I'd be interested in seeing the numbers for that claim broken down if you can cite them. From napkin math it seems hard to make the budgets line up, unless we're doing a very large purchasing power parity adjustment?
jjcc
7 months ago
Such numbers/information do circulate, mainly inside Chinese-language media and social media, in the form of "screenshots" with no links. Screenshots are a common format for hiding the source of this type of information, because a link would disclose which outlet spread it, and a normal (Chinese) audience would then know how credible the information is. To give an example, the "Epoch Times" is a common source of this type of information, and the nature of that outlet is well known to Chinese audiences.
The real equivalent of the US defense budget in terms of size is actually the infrastructure construction budget. While both budgets boost the economy, the infrastructure budget improves the lives of local people. Now that most cities in coastal areas have run out of projects to build, the overcapacity cultivated in earlier years is being poured in other directions: rural areas, undeveloped provinces, and even overseas, especially Africa and Latin America. Having visited some rural areas myself, it's amazing how fast China changes year by year.
Ironically, this infrastructure building sounds like a Chinese MAGA to me: mind our own business, focus on improving ourselves instead of spreading values to other countries.
zeld4
7 months ago
I'm struggling to see which is worse: using AI to police their own people, or using AI for genocide in the Middle East.
igravious
7 months ago
> China spends the normalized equivalent of America’s defense spending on suppressing their own citizens.
I don't believe you.
> From a western standpoint, this is amazingly helpful because it’s a form of Chinese self destruction and waste.
As a Westerner, as a human, I reject this zero-sum mentality.
Der_Einzige
7 months ago
The sanctions will (not sarcastically) massively harm the world because Nvidia may no longer be a free money cheat code. I like having an easy economic strategy for investing...
mensetmanusman
7 months ago
The world doesn’t have to optimize policy to increase the profits of a single American company.
seanmcdirmid
7 months ago
Chinese stocks are pretty reasonable right now; if their market has dealt with the insider trading mess, then it might be a good time to get on board. It isn't for the faint of heart, however.
Markets used to be places to put money to work intelligently (efficient allocation of capital), but have somehow degraded into index-fund buys that track the average economic growth of a few hot stocks that are expected to at least not go cold anytime soon.
logicchains
7 months ago
>The idea implies that eventually crowd-sourcing an open AI is probably technically feasible
It's already technically feasible: https://www.primeintellect.ai/blog/intellect-2
ryoshu
7 months ago
People are doing it: https://nousresearch.com/nous-psyche/
am17an
7 months ago
Deepseek-R1 is at the level of GPT 4.1 already; it's open-weight, open-source, and they even open-sourced their inference code.
jjordan
7 months ago
I don't know why everyone keeps echoing this; my experience with Deepseek-R1, from a coding perspective at least, has been underwhelming at best. I've had a much better experience with GPT 4.1 (and even better with Claude, but that's a different price category).
am17an
7 months ago
I'm not arguing about which model is better for your use case. I'm saying that, in general, it's as "powerful" as GPT 4.1 on a lot of benchmarks, and you can peek under the hood and even make it better for said use case.
seunosewa
7 months ago
Do you mean V3? V3 is 4.1 level or above.
Zambyte
7 months ago
In my experience, all reasoning models feel (vibely) worse at structured output like code versus comparable non-reasoning models, but far better at knowledge-based answering.
hnfong
7 months ago
A lot of software (e.g. ollama) has confusingly named Deepseek's distills/finetunes of other base models "DeepSeek-R1" as well. See e.g. https://www.threads.com/@si.fong/post/DKSdUOHzaBB
I wonder whether you're actually running the proper DeepSeek-R1 model, or one of those lesser finetunes?
jorvi
7 months ago
This is everyone with every model.
People sang the praises of Google's Gemini 2.5 models from the rooftops, but for me, in many tasks, they can't even beat Deepseek V3.
CamperBob2
7 months ago
What would be an example of 2.5 Pro failing against R1 (which is what you'd actually want to compare it to)?
jorvi
7 months ago
R1 sometimes fails against V3 for me too, so it's not a specific dig against Gemini.
In terms of code and science, Gemini is way, way too verbose in its output, and because of that it ends up confusing itself and hurting quality over longer context windows.
R1 does this too, but it poisons itself in the reasoning loop. You can see it during streaming, literally criss-crossing its thoughts and thinking itself into loops before it finally arrives at an answer.
On top of that, both R1 and Gemini Pro / Flash are mediocre at anything creative. I can accept that from R1, since it's mainly meant as more of a "hard sciences" model, but Gemini is meant to be an all-purpose model.
If you pit Gemini, Deepseek R1 and Deepseek V3 against each other in a writing contest, V3 will blow both of them out of the water.
CamperBob2
7 months ago
Agreed on the last point, V3 is terrifyingly good at narrative writing. And yes, R1 talks itself out of correct answers almost as often as it talks itself into them.
But in general 2.5 Pro is an extremely strong model. It may lose out in some respects to o3-pro, but o3-pro is so much slower that its utility tends to be limited by my own attention span. I don't think either would have much to fear from V3, though, except possibly in the area of short fiction composition.
iJohnDoe
7 months ago
I got the impression that o3-mini or o3-mini-high were meant for coding, and GPT 4.1 was meant for creative writing, not coding?
conradev
7 months ago
It’s good at a lot of things:
> GPT‑4.1 scores 54.6% on SWE-bench Verified, improving by 21.4%abs over GPT‑4o and 26.6%abs over GPT‑4.5—making it a leading model for coding.
https://openai.com/index/gpt-4-1/
coliveira
7 months ago
They are trained on these "benchmarks"; that's why they score better.
elzbardico
7 months ago
If they were trained on those benchmarks, they would score 100%.
coliveira
7 months ago
It shows how bad they are: they cannot score 100% even on benchmarks they were trained on.
whizzter
7 months ago
[flagged]
Zambyte
7 months ago
Counterpoint: https://ollama.com/huihui_ai/deepseek-r1-abliterated
leeoniya
7 months ago
Wasn't it shown recently that the filtering layer is on the prompt input and LLM output, and not in the training set or model weights?
https://www.socialscience.international/making-deepseek-spea...
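If so, a minimal sketch of what that would look like: moderation bolted onto the prompt/response boundary rather than baked into the weights. Everything here is hypothetical (the blocked-terms list and the generate() stub are placeholders), but it shows why downloading the raw weights and running them locally would bypass this layer entirely.

    # Hypothetical sketch: censorship applied at the prompt/response boundary,
    # separate from the model weights.
    BLOCKED_TERMS = ["example-sensitive-topic"]  # placeholder, not the real list

    def generate(prompt: str) -> str:
        # Stand-in for the actual model call (e.g., a local inference engine).
        return f"model output for: {prompt}"

    def moderated_generate(prompt: str) -> str:
        if any(term in prompt.lower() for term in BLOCKED_TERMS):
            return "I can't help with that."  # input-side filter
        reply = generate(prompt)
        if any(term in reply.lower() for term in BLOCKED_TERMS):
            return "I can't help with that."  # output-side filter
        return reply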
pxc
7 months ago
It depends on the model, probably, but there are multiple layers of censorship, one of which is the post-facto nuking these models will do online, and that goes away "for free" when you download the open weight model.
I don't have a powerful enough system to run DeepSeek, but I've tried this with some of the Qwen3 models. They'll write answers that discuss Xi Jinping (which results in an auto-nuke of the reply from Chinese-hosted models, at least DeepSeek) or other very mildly/nominally sensitive topics.
(This is probably a coarse measure to easily ensure compliance with a recent national security law that requires commercial providers of web services address sensitive topics "appropriately" or something like that, and LLMs run non-deterministically. That's why this layer of censorship often comes across as laughably extreme— it's an extreme compliance strategy that exceeds the demands of the law for the sake of guaranteeing legal safety from an unpredictable software system.)
But the same models will altogether refuse to discuss the Tiananmen Square Massacre, even locally.
Some "decensored" versions of the Qwen3 models will discuss the Tiananmen Square Massacre, but in a very concise, formulaic, "official" way. After some chatting about it, it fell into an infinite repetition of one of its short formulaic answers (a behavior I didn't see with the original Qwen3 models with the same settings).
hnfong
7 months ago
FWIW, I've downloaded the weights of Deepseek's R1 model (DeepSeek-R1-0528, which was released after your linked article) and run it locally. I asked it what happened in Beijing on 1989-06-04, and it basically gave me a stern statement that could have been written by the CCP propaganda department. I asked it to give other alternative views besides the CCP perspective, but it simply continued to stonewall me.
So yeah, the model itself is tuned at least somewhat to refuse to talk about politically sensitive things. It's not just another filter.
cultofmetatron
7 months ago
[flagged]
reactordev
7 months ago
A SETI@Home-style peer-to-peer open GPU training network is something I’m looking into as well.
coolspot
7 months ago
Possible, and it has been done, but it's super slow and inefficient, resulting in long training times for small models. To keep the compute occupied you need to pass gradients around very fast.
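To make the trade-off concrete, here's a toy sketch (NumPy with simulated workers; the quadratic objective and all constants are made up for illustration, and this is the generic local-update idea rather than the specific algorithm from the projects linked in this thread): each worker takes H local SGD steps and only then averages parameters, so communication happens once per H steps instead of after every step.

    import numpy as np

    rng = np.random.default_rng(0)
    K, H, ROUNDS, LR = 4, 20, 50, 0.05  # workers, local steps, sync rounds, step size
    targets = rng.normal(size=(K, 10))  # each worker's private data shard (toy)
    w = np.zeros(10)                    # shared parameters

    for r in range(ROUNDS):
        local_params = []
        for k in range(K):              # each worker trains independently...
            wk = w.copy()
            for _ in range(H):
                grad = wk - targets[k]  # gradient of 0.5 * ||wk - target||^2
                wk -= LR * grad
            local_params.append(wk)
        w = np.mean(local_params, axis=0)  # ...then one cheap parameter average

    # The global optimum of the summed toy objective is the mean of the targets.
    print("distance to optimum:", np.linalg.norm(w - targets.mean(axis=0)))

The point is the communication pattern: one parameter exchange per round instead of one gradient exchange per step, which is what makes slow links tolerable.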
pk-protect-ai
7 months ago
Do you mean this one?
https://blog.lambdaclass.com/introducing-demo-decoupled-mome...
reactordev
7 months ago
This is what piqued my interest in the first place.
reactordev
7 months ago
Yes, but could you break it up into chunks of sets of gradients to compute? I know that a compute node needs the full chunk to compute a set. Again, these are things I’m exploring, but it's ultimately no different from having the full dataset on disk and just scaling out compute nodes in read-only mode.
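One hedged sketch of what that chunking could look like (this is generic top-k gradient compression with error feedback, not the specific scheme from the DeMo post; the chunk size and k are arbitrary illustration values): split the gradient into fixed-size chunks, transmit only the largest-magnitude entries of each chunk, and fold what you didn't send into a local residual so nothing is lost over time.

    import numpy as np

    def compress_chunked(grad, residual, chunk=256, k=8):
        """Return (sparse update to transmit, new residual kept locally)."""
        full = grad + residual                    # fold in previously unsent mass
        sent = np.zeros_like(full)
        for start in range(0, full.size, chunk):
            piece = full[start:start + chunk]
            top = np.argsort(np.abs(piece))[-k:]  # k biggest entries in this chunk
            sent[start + top] = piece[top]
        return sent, full - sent                  # transmit `sent`, keep the rest

    grad = np.random.default_rng(1).normal(size=1024)
    residual = np.zeros_like(grad)
    sent, residual = compress_chunked(grad, residual)
    print("fraction transmitted:", np.count_nonzero(sent) / sent.size)  # ~0.03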
SubiculumCode
7 months ago
I suppose it's exciting, but whether that is a good thing depends entirely on how much you think AI technologies pose existential threats to human survival. This may sound hyperbolic, but serious people are seriously thinking about this and are seriously afraid.
benreesman
7 months ago