DeepSeek makes the V4 Pro price discount permanent

230 points, posted 6 hours ago
by Tiberium

134 Comments

minimaxir

an hour ago

I'm more curious about the caching:

> (2) For all models, the input cache hit price has been reduced to 1/10 of the launch price. This price adjustment takes effect from 2026/4/26 12:15 UTC.

There is no end date. Currently, it's 2% of the input price for DeepSeek V4 Flash and 0.8% with this new V4 Pro pricing, which is extremely low compared to competitors to the point that it affects the unit economics a bit and I thought it would be temporary.

In the case of V4 Pro, the effective cost is ~$0.04/M input tokens given the caching (based on OpenRouter's metrics: https://openrouter.ai/deepseek/deepseek-v4-pro), which is significantly cheaper than even small models from competitors.
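A rough way to see how the cache discount feeds into the effective input price is to blend the hit and miss prices by cache hit rate. This is only a sketch: the function name, the miss price normalized to 1.0, and the 90% hit rate are illustrative assumptions, not DeepSeek's or OpenRouter's figures. Only the 0.8% hit-to-miss ratio comes from the comment above.

```python
def effective_input_price(miss_price, hit_fraction, cache_hit_rate):
    """Blend cache-miss and cache-hit prices by the share of tokens served from cache."""
    hit_price = miss_price * hit_fraction
    return (1 - cache_hit_rate) * miss_price + cache_hit_rate * hit_price


# Assumed values: miss price normalized to 1.0, 90% of input tokens hit the cache.
blended = effective_input_price(miss_price=1.0, hit_fraction=0.008, cache_hit_rate=0.9)
print(f"blended input price: {blended:.4f}x the list price")
```

With a hit price this low, the blended cost is driven almost entirely by the share of tokens that miss the cache.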

alyxya

4 hours ago

Once they have their own coding agent which they seem to be working towards, I may start predominantly using their models. They seem to be doing all the "right" things, open sourcing models, publishing research, and keeping prices low for everyone.

ammar_x

3 hours ago

You can use V4 Pro with Claude Code [1].

I tried it and it's impressive.

[1]: https://api-docs.deepseek.com/quick_start/agent_integrations...

hbarka

2 minutes ago

The npm install of Claude Code has been deprecated since Feb 2026.

maxdo

26 minutes ago

I'm curious what tasks you tested it on. I'm working on a coding agent that writes code dynamically on request for customers. I'd say the code itself is very simple, aggressively cached, and pattern-driven, e.g. we add lots of hints to the system.

The only model families that really worked were Claude and OpenAI. Surprisingly, for tasks that need more speed, GPT 5.4 is very impressive. DeepSeek was very average, landing somewhere in Gemini Flash 3.0 territory.

KronisLV

2 hours ago

I'm working on a custom launcher for hooking up Claude Code with various providers (groups env variables in profiles) cause DeepSeek doesn't have vision and sometimes I need browser use with screenshots or Opus reasoning, for other tasks it's fine: https://ccode.kronis.dev/

  # After installed (or when run portably with ./ccode)
  ccode init-config
  ccode edit-config
  
  # Run with default profile
  ccode
  # Run with named profile
  ccode --deepseek
  
  # Set default profile
  ccode set-default-profile deepseek

Also turns out that with a local proxy you can get Remote Control working and see the DeepSeek sessions in the desktop app, screenshots on the page. Other than that, I'm happy that it works pretty well and the discount is enough to make me consider going from Anthropic's Max subscription to Pro and using it only where DeepSeek is insufficient. With that proxy I eventually hope to be able to transparently switch models mid-task, if I need Opus for like 5 turns or something.

Overall though I'm not sure exactly how well Claude Code would stack up against OpenCode, since the latter overall feels a bit less hacky with 3rd party models and is even getting niche but nice features like a locally runnable web version: https://opencode.ai/docs/web/

rjh29

an hour ago

How does the cost compare using the API vs the $20/month plans with other providers?

I did some back of the envelope calculations and it seems like you would pay $5/month using DeepSeek directly or $15-20 with OpenRouter or similar. But would be interested to hear real world usage.

thisisit

3 hours ago

I am curious - Is there a way to switch between models depending on the task? Because I believe Deepseek V4 is not multimodal and it will be good to switch back to Claude if vision or other capabilities are required.

mewse-hn

an hour ago

I was looking into something similar because I wanted to test a local model for doing basic coding and smart model (deepseek) for planning.

It's basically not possible with claude code, the api endpoint is a single environment variable and whatever models are on that endpoint are what's available.

HOWEVER, if you run a proxy like LiteLLM, you can configure it to send requests to different api endpoints on the back end and expose them as different "models" on the front end, then configure claude code to switch between those virtual models.
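The idea of one front-end endpoint exposing several "virtual models" can be sketched as a plain lookup table. This is not LiteLLM's configuration format or API; every alias, URL, and model name below is a hypothetical placeholder used only to show the shape of the routing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    base_url: str        # where the proxy forwards the request
    upstream_model: str  # model name the backend actually expects


# Hypothetical routing table: alias seen by the client -> real backend.
ROUTES = {
    "planner": Backend("https://remote.example/v1", "deepseek-planner"),
    "coder": Backend("http://localhost:8000/v1", "local-coder"),
}


def resolve(alias: str) -> Backend:
    """Map a virtual model name to the backend that should serve it."""
    try:
        return ROUTES[alias]
    except KeyError:
        raise KeyError(f"unknown virtual model: {alias!r}") from None


print(resolve("planner"))
print(resolve("coder"))
```

The client only ever sees the aliases, so switching between them looks like switching models on a single endpoint.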

maxdo

23 minutes ago

I've been trying that. In reality, every time you try to save money this way it's not worth it: the cost of a mistake is so high that you can spend 2-3h on a single wrong assumption, and you lose your time and all the burned tokens.

wiradikusuma

3 hours ago

That's interesting. I thought Claude Code was not as good, so people wanted to use Claude models with other alternatives. This is the other way around.

Which begs the question, regardless of the model, which Claude Code alternative is better? (I keep saying "Claude Code alternative" because I don't know the term... LLM CLI?)

flexagoon

2 hours ago

AFAIK the two most popular open source harnesses right now are OpenCode and Pi. They take a pretty different approach, OpenCode includes a lot of features while Pi is very minimal by design and focused on extensibility, to the point where many people are just asking Pi to write a plugin for itself whenever they want it to have a new feature. I personally like Pi's philosophy more and I think its developer justified the choices really well in his blog post:

https://mariozechner.at/posts/2025-11-30-pi-coding-agent/#to... (the pi-coding-agent section)

rjh29

an hour ago

Author blocks referrals from HN, weirdly dramatic, especially considering they have 1086 karma here. I wonder what we did to them.

g023

10 minutes ago

I use DeepSeek v4 flash with Copilot and it works pretty well.

wrs

2 hours ago

The common term for a tool that wraps an LLM with a workflow is “harness”.

copperx

28 minutes ago

I love oh-my-pi, but I'm not sure if it's "better". Maybe just as good.

Scarbutt

3 hours ago

Surprised Anthropic hasn't done anything to restrict Claude Code from using other providers.

HWR_14

6 minutes ago

It went the other way, you can't use other harnesses to connect to the cheaper versions of Claude. So clearly they think their current moat is Claude Code use, not the LLM itself.

wolttam

3 hours ago

The value of Claude Code the harness isn't that great. There's a lot of other good harnesses out there.

chandureddyvari

2 hours ago

What’s your favourite harness? Are there any benchmarks for harnesses, like LLMs have with SWE-bench Verified?

wolttam

an hour ago

You can check my profile for which one I like most :) I do think there have been efforts to benchmark different harnesses.

Personally I'm not going to choose one harness or another based on +/- a few percentage points in a benchmark. I'm going to use the one that I find the most ergonomic, that isn't too bloated, etc. The models are the primary lever, not the harness.

rane

2 hours ago

I thought so, and then I tried Opencode and Codex and started to appreciate Claude Code a lot more. They've actually done great work with the small details.

koolba

3 hours ago

Good or better? Curious which would be in either bucket.

wolttam

2 hours ago

Probably a matter of taste. I prefer the harness I wrote, I don't want to go near Anthropic's bloated mess of a harness with a 10-meter pole.

cortesoft

3 hours ago

At this point in the AI wars, it is probably better to have more users of Claude code rather than restrict which LLMs it can connect to. Claude code is probably (currently at least) stickier than the LLM model itself. Getting people into the Claude code ecosystem is worth it.

Later, they can always lock it down more or add Claude LLM only features to it.

smoe

an hour ago

Earlier this week I started testing Chinese models on my codebase. I haven’t really looked at interactive coding yet, but more at issue triage, bug auto-fixing, log analytics, etc.

I used DeepSeek, Kimi, GLM, Qwen, and MiMO against GPT-5.5 high as reference, all running in Pi harness without anything installed.

So far, Kimi and MiMO look the most promising to me. I haven’t tested them rigorously enough to make a strong statement, but my first impression is that, in practice, all those models may be less behind on typical daily tasks than people think.

They are a bit “work hard, not smart”: getting to same-ish results more slowly and using more tokens, but at a fraction of the price.

comboy

22 minutes ago

I'm building a Chinese learning website and I've tested all the Chinese-first top-tier models for a range of different tasks.

Your selection seems good. I'd drop MiMO; while promising, it can't compare to DeepSeek V4 Pro, GLM 5+ or Kimi 2.6 thinking.

But I'm writing with a word of advice: I've been using OpenRouter, and for some models (especially Kimi) it can be so subtly yet deeply worse through OR than when used directly that I've cut myself twice, even when using quality-first providers, specific quantizations etc. It's not a feels-like-it result; it comes from running hundreds/thousands of queries where I have objective metrics to evaluate the results, testing OR and direct on the exact same inputs.

Plus kimi swarm available through their website is quite impressive for some researchy stuff.

maxdo

20 minutes ago

Maybe I need to give it a second chance. Surprisingly, Kimi 2.6 consistently failed to even generate a valid JSON plan, where Gemma 4 was doing really well, but slowly.

c0rruptbytes

an hour ago

I personally really like DS4 Flash - it's the largest I can run locally with decent speeds and I feel like it's good enough to maintain a codebase with less effort

LaurensBER

2 hours ago

It works very well with OpenCode. My team keeps hitting the 5h limits on other subscriptions and it's pretty good to have Deepseek as a backup. I just put 50 bucks on there and it feels like it'll never run out.

It's not good enough to fully replace any of the frontier models yet but it's definitely great to have as a backup!

lambda

4 hours ago

Why do you need them to provide a coding agent? Just use their model with any off the shelf coding agent. I happen to prefer Pi, but use whatever works for you.

alyxya

4 hours ago

I probably have an unfounded assumption that whatever coding agent they make will work really well with their models, better than external harnesses. I don't have a good sense for how all the model + harness combinations compare, nor any good way to compare them myself, but generally believe model companies train their models to work best with their own harness.

wolttam

3 hours ago

I've noticed that models have gotten less finicky with this over time. Harnesses don't need to be complex to get good coding performance from models, they just need to implement some sane primitives for code exploration and editing.

hootz

4 hours ago

Yeah, I'm using Pi with their models through an OpenCode Go subscription and it works pretty well. 10 bucks and V4-Flash is virtually infinite.

satvikpendem

3 hours ago

RL with the harness inputs and outputs of users is one of the primary improvers of model performance, a self perpetuating flywheel.

apitman

3 hours ago

What's the best way to use it with Pi, OpenRouter?

schaefer

an hour ago

> What's the best way to use it with Pi, OpenRouter?

I can't claim it's "the best"...

But the Pi.dev and OpenRouter combo is what I'm doing at home, and I love it. Setup was easy, I can use /model to switch between any of the openrouter models and whatever I'm hosting locally via VLLM.

lambda

an hour ago

I only use local models myself personally. But yeah, OpenRouter would probably be a good option.

jack_pp

17 minutes ago

i have done some amazing things for 5 dollars, using opencode. give it a shot, it is incredibly cheap

minimaxir

an hour ago

Zed's Agent natively supports a DeepSeek API key now. (do not use it through OpenRouter if you want to save the most cost)

tequila_shot

3 hours ago

You no longer need "their coding agent". You can hook up claude code to use Deepseek. Works perfectly.

potsandpans

an hour ago

Give pi a try if you haven't already. Avoid vendor harness lock-in.

raincole

2 hours ago

All the major coding agents already support DeepSeek.

zozbot234

3 hours ago

antirez's ds4-agent works quite fine. It runs on any Apple Silicon device with 96GB RAM or more.

rjh29

an hour ago

I wonder how many years it'll take for the API token cost to exceed the money spent on ram.

zozbot234

2 minutes ago

The DS4 folks are unofficially testing ways to run the model with lower performance on lower-RAM machines. Similar efforts are going on with llama.cpp. The results are a bit of a challenge, prefill time tends to explode which is a limitation if you care about agentic workflows.

cultofmetatron

3 hours ago

open code works with them today. I've been using it fulltime for 2 weeks so far.

sunaookami

3 hours ago

Using it with Pi and can only report good things so far. I'm very impressed by how good it is (also it's way slower than Claude Sonnet and GPT-5.5 and often thinks "too much" before starting).

ReptileMan

an hour ago

Pi, opencode and zed all work amazingly with deepseek.

Guillaume86

8 minutes ago

You seem to have tried a few things, if you don't mind I have a few questions as someone currently on Claude Code but would prefer to not lock myself in a commercial ecosystem (and their pricing change regarding headless usage is annoying me):

- how do/would you add the WebSearch tool to your harness? pay for a separate service or does deepseek offer something with their subscriptions?

- do pi/opencode support pasting images in prompts?

- how do you handle reading images? deepseek is not multi modal IIRC? do you pay for another model and route to it?

Any of these missing would really annoy me in day to day use...

ReptileMan

a few seconds ago

I use them for pure coding, but I think they do curls when needing something from the host machine.

wg0

4 hours ago

If you have not tried DeepSeek V4 you're missing out. The pricing makes it unbelievably good.

The chains of thought for Deepseek are very very interesting reads. Open code won't show them but do read them and you'll be surprised at how underrated the model is.

My model usage is very low but I still do pay directly to Deepseek regularly as my tribute and contribution to them open sourcing their models as my gratitude and showing support for what I deem positive for overall social good.

abyssin

3 hours ago

It’s good and cheap, but don’t talk about politics to it or it might trigger some sort of censorship rule. You can see it think, then suddenly erase everything and suggest to switch to another subject, without explaining anything. I also had it output some sort of generic message about how the news outlets are in the service of the people. Both times I was surprised because I didn’t make any sensitive requests, neither illegal nor subversive. But it was a remotely political topic and it was enough. There was something both chilling and refreshing about it, since censorship in the west is usually more subtle.

cassianoleal

33 minutes ago

I love V4 Pro for certain things but I've been quite impressed with V4 Flash for coding. It's terse, to the point, tends to make few mistakes and is pretty fast.

tequila_shot

3 hours ago

Yes - the model is REALLY good. I try Claude at work and Deepseek personally and this is the only model that works without trying to actively bankcrypt me.

seemaze

3 hours ago

Perhaps unintentional, but I find 'bankcrypt' to be a thoroughly interesting portmanteau.

I'm not sure if it's when you run out of crypto, or when your bank gets hit by ransomware.

gertlabs

3 hours ago

Even with the V4 Pro discount, the V4 Flash model gives you the best performance per unit dollar, and better performance overall for agentic, tool-heavy workloads. V4 Pro is smarter in one-shot reasoning, but at a significant speed difference. The performance, cost, and speed make V4 Flash our top flash model today by far.

Data at https://gertlabs.com/rankings

Sphax

4 hours ago

That is some insane value. I've been using GLM Coding Plan Max with GLM 5.1 for a while and i've tested DeepSeek V4 Pro maybe for 3 weeks now and I found it to be better than GLM 5.1 for complex coding tasks. I've used 65m tokens and with that price it cost me $1.5, that's really cheap.
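For reference, the blended per-token rate implied by those two numbers works out as below. The helper name is made up for illustration, and the result mixes input, cached input, and output tokens, since the comment gives no breakdown.

```python
def price_per_million(total_cost_usd, tokens):
    """Average cost per one million tokens across all token types."""
    return total_cost_usd / (tokens / 1_000_000)


# Figures from the comment above: 65M tokens for $1.5.
blended = price_per_million(1.5, 65_000_000)
print(f"${blended:.3f} per million tokens")
```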

DeathArrow

3 hours ago

I think Deepseek uses much more tokens than other models.

ReptileMan

an hour ago

But way less dollars. Which is the important metric.

Reubend

4 hours ago

Props to them. That makes DeepSeek v4 Pro extremely cheap compared to others, even in the same category. Look at these prices per million output tokens:

DeepSeek V4 Pro: $0.87

Qwen 3.7 Max: $7.50

Grok 4.3: $2.50

GLM 1.5: $3.08

Opus 4.7: $25.00

GPT-5.5: $30.00
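To put the list in relative terms, here is a small sketch that expresses each price as a multiple of the DeepSeek V4 Pro figure. The prices are copied from the list above as quoted and have not been checked against any provider's pricing page.

```python
# Output price per million tokens, in USD, as quoted in this comment.
prices = {
    "DeepSeek V4 Pro": 0.87,
    "Qwen 3.7 Max": 7.50,
    "Grok 4.3": 2.50,
    "GLM 1.5": 3.08,
    "Opus 4.7": 25.00,
    "GPT-5.5": 30.00,
}

baseline = prices["DeepSeek V4 Pro"]
multiples = {name: price / baseline for name, price in prices.items()}

# Print cheapest first, as a multiple of the baseline price.
for name, multiple in sorted(multiples.items(), key=lambda item: item[1]):
    print(f"{name}: {multiple:.1f}x")
```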

Arcuru

4 hours ago

It's actually even cheaper when you look at the cache read costs. Those costs can dominate in agent workflows and DeepSeek's cost for cache reads is insanely low comparatively. DeepSeek charges $0.003626/M tokens for cache reads, while the cheapest other thing on your list is >$0.2/M tokens. That's on the scale of 100x cheaper.
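Why cache reads dominate in agent loops can be shown with a toy model. It assumes every turn re-sends the full conversation so far, that all of the earlier context is served from cache, and that each turn adds a fixed number of new tokens. The turn count and tokens per turn are made-up values.

```python
def agent_token_split(turns, tokens_per_turn):
    """Return (cached_tokens, fresh_tokens) read as input over a whole session."""
    cached = 0
    fresh = 0
    context = 0
    for _ in range(turns):
        cached += context         # earlier context is re-sent and read from cache
        fresh += tokens_per_turn  # tokens added this turn are a cache miss
        context += tokens_per_turn
    return cached, fresh


cached, fresh = agent_token_split(turns=50, tokens_per_turn=2_000)
share = cached / (cached + fresh)
print(f"cached: {cached:,}  fresh: {fresh:,}  cached share: {share:.1%}")
```

Cached tokens grow quadratically with the number of turns while fresh tokens grow linearly, so the cache read price ends up setting most of the input bill in long sessions.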

onlyrealcuzzo

2 hours ago

And they don't make the model worse once you have a subscription!

It doesn't matter how good Opus is if 2 months into your subscription they make it worse than GPT 3 to save money.

cassianoleal

27 minutes ago

DeepSeek don't have a subscription plan.

cold_harbor

4 hours ago

their MLA architecture cuts KV cache by ~5-13x vs standard attention. that's why inference is actually cheaper to run, not just a price war to gain market share.
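As a back-of-the-envelope illustration of what a 5-13x reduction means, the sketch below sizes a conventional KV cache (keys plus values for every layer, KV head, and token) and then divides by the factors quoted above. The layer count, head count, head dimension, and context length are invented for the example and are not DeepSeek's real configuration.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Size of a conventional KV cache: one key and one value vector per layer, head, and token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape and context length, with 16-bit values.
baseline = kv_cache_bytes(layers=60, kv_heads=8, head_dim=128, seq_len=100_000)
print(f"standard attention: {baseline / 1e9:.1f} GB")

# Reduction factors taken from the comment above.
for factor in (5, 13):
    print(f"{factor}x smaller: {baseline / factor / 1e9:.1f} GB")
```

A smaller cache per request means more concurrent requests fit on the same accelerator memory, which is where the serving cost saving would come from.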

zozbot234

4 hours ago

That's also a game changer for local inference. It unlocks long contexts, batched inference and storing the KV cache to disk on ordinary consumer platforms.

vitorsr

2 hours ago

Yes. The discount was most likely a "post-market trial" of how efficiently the caching works for the new generation models.

trollbridge

an hour ago

I've "adjusted" my workflows now to use the cache. (Basically read all the files in your project very early on in your session, etc., simple stuff like that.)
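A toy illustration of why loading files early helps, assuming the provider's cache matches on the leading part of the request (prefix caching). The block names are placeholders and the helper is not part of any real API.

```python
def shared_prefix_len(previous, current):
    """Count leading blocks that are identical in two consecutive requests."""
    count = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        count += 1
    return count


stable = ["system", "file_a", "file_b"]
first = stable + ["question_1"]

# Appending to the end keeps the whole earlier request as a reusable prefix.
appended = first + ["answer_1", "question_2"]

# Inserting new material near the front breaks the prefix almost immediately.
reordered = ["system", "question_2", "file_a", "file_b"]

print(shared_prefix_len(first, appended))
print(shared_prefix_len(first, reordered))
```

Anything that changes near the start of the request invalidates everything after it, so large stable content belongs at the front and new material at the end.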

Nearly all requests are cached now. It's amazing.

doctoboggan

3 hours ago

I am more worried about accidental data leak (agent reading env file for example) with the Chinese hosted models compared to the US hosted models. Am I wrong to suspect that the Chinese government might be more likely to scan all chats and save useful information compared to the US government or company?

I hesitated to even post this comment as it sounds biased and xenophobic. I would love for someone to convince me I am wrong. Does anyone have any insight into the company behind deepseek hosting, and what their history of respecting data privacy is?

3s

3 hours ago

It's not an unreasonable concern, which is why most US companies prefer to go with AWS bedrock, or even one of the AI labs, and typically request zero data retention agreements. But leaking is a concern no matter where it's hosted, it's just the incentives that change IMO. For example, the labs do scan every chat and train on data not covered under enterprise ZDR agreements. Law enforcement can request access to all user data with a valid warrant or in an emergency context [1]

If you're interested in trying DeepSeek V4 privately, you can try Tinfoil (tinfoil.sh) where all models are hosted in an attested secure hardware enclave, making the inference end-to-end private. Full disclosure: I'm one of the cofounders.

[1] https://cdn.openai.com/trust-and-transparency/openai-law-enf...

wkcheng

3 hours ago

Just use it through something like Azure. They host the entire model and serve it from the US. I'm sure that there are other providers like this.

We use it that way and it works great.

rsanek

an hour ago

You don't get the cheap pricing this way, which is why people are so interested in the model in the first place.

opsnooperfax

2 hours ago

I would not be shocked if they do that. I would not be terribly shocked that the US-headquartered models do that for another government either. As far as data confidentiality goes, I wouldn’t hold my breath. Microsoft checks all those enterprise boxes, right? Yet, Azure still gets breached once in a while.

giwook

3 hours ago

I think there is a nonzero chance of that happening. Beijing could at any point decide that DeepSeek has become too powerful and/or is a major export and start to insert themselves (assuming they have not already).

There are widespread reports about how foreign actors (not limited to China) have infiltrated critical networks across many industries in the US en masse and are simply waiting for the right time to exploit them. Frontier models are simply another attack vector (and much more easily exploitable when you think about it).

The fact is that there is potential for this with any cloud-hosted model, whether it is intentional by the actual company building the models or a malicious actor is able to exploit a vulnerability.

jug

3 hours ago

This is a risk although then this is fortunately a model that isn't tied to Chinese hosting. But indeed something to consider if using straight DeepSeek.com.

dualvariable

2 hours ago

I'm not important enough for anyone in China to go out of their way to attack me. And DeepSeek has to maintain a sufficient level of trust so that users keep using their platform--they can't just act like a keylogger attacking everyone's crypto wallets or trust collapses.

If I was working on something that the Chinese government considered of strategic importance, then I would certainly be worried about it. But I don't do that.

I'm much more worried about techbros in this country using their LLMs to extensively profile me and produce something vastly more dystopian in this country than the real or imagined social credit scores in China. The people trying to convince you that the Chinese government are the people you should be worried about (as an individual in the United States) are probably the people you really need to be worried about.

nivekney

3 hours ago

User data integrity definitely should be a concern. It's also known that regulation is being outpaced, so the cost of being/using frontier products is a double-edged sword for sure.

jdgoesmarching

3 hours ago

More likely? US tech leaders have been fully capitulating to the surveillance state for over a decade. Why do I care what China does with my data? I don’t live in China and never plan to.

The tech bro threat model has always been pure jingoism and xenophobia. Ironically, the worst thing a Chinese company has done with my data is sell Tiktok to an American technofascist.

wolttam

3 hours ago

I was hoping they were going to do this.

I'll keep running Flash locally for the stuff I care about data privacy, but the value of Pro through their API is unreal for anything else (and I want to give them my training data as long as they keep putting out open models).

margorczynski

4 hours ago

Maybe the Chinese are playing the long game by trying to bankrupt the US competition? Because there's no way this is financially viable.

ecommerceguy

3 hours ago

Small team, cheap electricity, very efficient models. Many western companies operate at a loss to gain market share. Why can't the Chinese?

odie5533

3 hours ago

Inference is cheap. I bet the financials of these Chinese companies are much saner looking than any of the big US AI companies which are bloated by investors.

raincole

2 hours ago

DeepSeek is very likely selling tokens at a loss. There are many cloud providers that offer DeepSeek V4 Pro via API, and those services are at least twice as expensive as DeepSeek itself.

surgical_fire

2 hours ago

I see no evidence anywhere that "inference is cheap". To my knowledge this is a myth being spread to pretend ChatGPT or Claude will one day make any economic sense.

DeepSeek likely operates at a loss. How big the loss is, is anyone's guess.

Meanwhile I am happy using their model. It is really good, to a point I forget I am not using Codex or Claude.

missedthecue

3 hours ago

DeepSeek hasn't raised enough money to be actively selling tokens at a loss. They have a small team, extremely low overhead relative to other labs, operate in a place with essentially the cheapest commercial electricity rates in the world, and their architecture lends itself very well to cheap inference.

jdgoesmarching

3 hours ago

If you think heavily subsidizing AI models isn’t financially viable, I have some bad news for you about US AI companies.

Deepseek has made some incredible advancements in model efficiency, and more importantly actually publishes those advancements so everyone can benefit from them.

overfeed

an hour ago

> more importantly actually publishes those advancements so everyone can benefit from them.

I suspect American inference providers implement the efficiency gains, and pad their margins rather than pass the savings along to the consumer.

tencentshill

3 hours ago

Federal ban incoming then. They did it with cars already.

dyauspitr

8 minutes ago

They’re going to have to. It’s $0.87 vs $30

kajman

2 hours ago

Maybe not. I don't see how US inference providers can compete anyway with commoditized models. Costs are out of control here and the infrastructure is way worse.

dyauspitr

10 minutes ago

For sure. But also they’re building an electrostate with 100% electricity redundancy and dirt cheap electricity. They might actually be able to sustain this.

zozbot234

3 hours ago

US suppliers are fine and won't go bankrupt, they can just focus on serving bigger "Pro" class models from their large datacenters. In fact cheap AI makes the bigger and smarter models more useful because it's smart enough to draft a clear question to the model, which helps minimize wasted tokens.

overfeed

an hour ago

> US suppliers are fine and won't go bankrupt, they can just focus on serving...

For a while, US automakers thought the same of Japanese, then Korean car manufacturers, and Musk laughed at Chinese EV makers in an interview >12 years ago. People learn and get better at making things until they catch up with the frontier.

zozbot234

15 minutes ago

Chinese EV makers have a few interesting technologies especially wrt. batteries but they're still very far from catching up to the frontier in a general sense. From that narrow POV Musk was absolutely correct.

govg

11 minutes ago

What is the "frontier" in EVs that Chinese automakers are yet to achieve? And what automaker is at this so called frontier?

dyauspitr

5 minutes ago

What the hell are you talking about? They have batteries that charge 0-80% in 5 minutes even at -30F. More full featured EVs at half the price with similar acceleration rates and higher top speeds. Total ranges are comparable or better. What is this frontier you speak of?

I’m an American fucking patriot but facts are facts.

throwa356262

an hour ago

US providers are burning VC money because they have been selling the idea of total world domination. Even the government has bought into that. Now suddenly they are no longer dominating the field and even need Uncle Sam to protect them from foreign competitors.

When VC pulls out, some of them may go bankrupt.

zozbot234

11 minutes ago

They can still dominate wrt. the biggest and smartest models. DeepSeek does effectively nothing to change that. Of course these big models will be served at a very steep price in order to fully and completely recoup the investment, but there's no reason why that couldn't work if they really are smart enough and if the market value of smarts follows any kind of scaling law.

louiereederson

2 hours ago

I wonder if/when the US limits market entry of Deepseek and other Chinese model vendors like they have done with Huawei

mmastrac

2 hours ago

How would that be technically feasible? Would we get IP bans?

ReptileMan

36 minutes ago

When they repeal the first amendment.

bel8

4 hours ago

Great! I have been using DeepSeek 4 Flash high for everything lately.

First accessible model with useable 1 million context window for me.

onlyrealcuzzo

3 hours ago

I just canceled Claude Code and Codex today.

RIP.

Claude literally refuses to finish tasks in auto mode and just keeps saying, now is a good stopping point, when it's 1% done (and doing the EXACT OPPOSITE of what I tell it).

Codex is barely better...

May as well pay 1/20th the price for DeepSeek.

Claude seems to have something that looks at how long you've been a customer and then just massively degrades quality.

When I started my subscription, Claude had none of these problems.

2 months into subscriptions Claude is completely unusable garbage, and Codex is not much better.

dawnerd

2 hours ago

That was my experience with Claude code too. Someone will come and tell you you're doing it wrong. Hard to do it right when it'll just stop randomly, especially when it ends with something like 'let me know if you want me to continue!'.

onlyrealcuzzo

2 hours ago

Claude Code has been so unbelievably terrible this entire week that I CANNOT believe it's the same model I was using weeks ago.

I am completely convinced they just screw over their customers after so much usage or so long of a subscription thinking they have them for life.

I have NEVER been so happy to cancel a subscription.

cassianoleal

24 minutes ago

Claude Code is a harness, not a model.

eiek

2 hours ago

They’re playing games behind the scenes to massage and manage their earnings.

China is gonna win long term there’s no doubt. The fact that the American firms haven’t created immense escape velocity despite the disparity in spending is quite telling.

zozbot234

2 hours ago

The nice thing about hosting inference locally is that you can be sure you're not being rug-pulled in any way. This doesn't really help China 'win' though, it's just freeloading on them making their weights openly available.

onlyrealcuzzo

an hour ago

The good thing is, we're only 2.5 years away from a top of the line MacBook having better local inference than CC Opus does today.

That's more than good enough if you're actually getting what CC Opus is capable of.

I've never been so excited for the future.

velomash

3 hours ago

I found that DSV4 wasn't as cheap as its token price. It burns tokens at a pretty high rate

dburkland

3 hours ago

I've had a ton of success when pairing Opus 4.7 for planning w/ DeepSeek V4 Flash in opencode. Best part is DeepSeek V4 Flash is Free through opencode Zen.

belinder

4 hours ago

Anyone using deepseek through a gateway (not sure if right term) so there's no data retention? At work we're going through a few hundred million tokens a day in our app (using anthropic models), and we're looking for something significantly cheaper

wkcheng

3 hours ago

Use it through Azure! Azure hosts DeepseekV4-Pro and DeepseekV4-Flash themselves. We're using it and it works great.

You don't get the discount that Deepseek is providing, but it's still a cheap model (v4-pro is cheaper than sonnet)

bel8

4 hours ago

opencode allegedly has contractual no-data-retention policies with their providers.

I recall reading about that in an issue or in their Discord server.

But I would contact them formally to verify that.

BeetleB

2 hours ago

They claim it on their OpenCode Zen page.

What's frustrating is that they give no information on who the provider(s) are!

mlcruz

4 hours ago

I have been using deepseek via deepinfra, afaik they provide no data retention. I'm probably going to deploy the full model on their infra instead of paying credits at some point, so far the experience has been pretty good.

goobatrooba

3 hours ago

But do these prices apply if you use a third party go-between? I would expect they then charge their own prices?

dyauspitr

12 minutes ago

Oh shit that changes everything. This might be the biggest thing to happen to LLMs this year.

Havoc

5 hours ago

Neat. I like DS for secondary checks on code. Sometimes spots things other models don't

vladgur

3 hours ago

Which models do folks use for openclaw nowadays

npilk

16 minutes ago

I've been using DeepSeek Flash to replace Sonnet once the subscription stopped working. Haven't really noticed a difference, although I don't usually have it doing anything very complicated.

sourcecodeplz

3 hours ago

Honestly I haven't even tried the Pro model. Flash was just so much more than I expected I just keep working with it. Thank you deepseek team

kingjimmy

4 hours ago

is this the Huawei chip difference?

chvid

3 hours ago

That is probably why they were a few months delayed. But could be interesting to see their hosting / network / colocation setup.

guelo

3 hours ago

Even at these prices I find claude and codex subscriptions to be cheaper than per-token pricing when my usage is hovering around the session limits. I guess the subscriptions are heavily subsidized.

guelo

39 minutes ago

I guess I got downvoted because people don't believe me that it's cheaper? But I spent $5 a couple days ago in one hour with deepseek v4 in a coding agent. That's way more expensive than a $20/month claude subscription. Even if I hit claude's 5h limit in one hour I can do that many times in a month.
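The comparison comes down to a break-even point, sketched below with the two figures from this comment. It assumes the hourly API spend stays constant and ignores subscription rate limits, so it is an illustration rather than a measurement.

```python
def break_even_hours(subscription_usd_per_month, api_usd_per_hour):
    """Hours of API usage per month at which pay-per-token matches the subscription price."""
    return subscription_usd_per_month / api_usd_per_hour


# Figures from the comment above: $20/month subscription, $5 for one hour of API use.
hours = break_even_hours(20, 5)
print(f"break-even at {hours:.0f} hours of comparable usage per month")
```

Below that many hours per month the API is cheaper; above it, the flat subscription wins as long as its limits are not hit.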