Salvatore Sanfilippo argues that DeepSeek v4 Pro, though an open-weights model, comes close to top-tier closed models and could reshape both local inference and API pricing.
DeepSeek v4 arrived late, but, according to Salvatore Sanfilippo, it was worth the wait. His verdict is blunt: the Pro version is now the benchmark open-weights model, capable of going head-to-head with the best-known closed systems on reasoning and code tasks. The real news, though, is not just technical. If a model at this scale can run locally with credible performance, then the big providers’ edge is no longer just quality, but the ability to absorb the costs.
Sanfilippo presents DeepSeek v4 Pro as a clear leap over the other open-weights models he cites, from GLM 5.1 to Kimi K2.6. In his account, this is not a cosmetic difference: after two hours of use, he says the alignment and code work feel very close to top-tier closed models.
DeepSeek version 4 Pro is, at the moment, the frontier open-weights model, clearly better than GLM 5.1, I think than the latest one, and clearly better than Kimi K2.6.
0:00
There’s no comparison with open-weight models even in practical use, because I did a two-hour session and the alignment, the way it works with code, is very close to the closed models we’re used to.
0:36
His thesis is that DeepSeek v4 Pro sits, broadly speaking, in GPT 5.2 or Opus 4.5 territory, depending on the task. For Sanfilippo, the point is not to crown an absolute winner, but to recognize that, in software work, the Chinese model already looks very strong.
Behind the result, Sanfilippo says, is an enormous architecture made more manageable by sparsity. He says the model has 1.6 × 10^12 parameters, or 1,600 billion, but activates only 49 billion per token, which makes inference relatively sustainable.
The model is enormous, it’s 1.6 × 10 to the 12th parameters, that is, 1,600 billion parameters. A really large model, despite everything, thanks to the MoE sparsity, which activates only 49 billion parameters for each generated token.
2:18
Most of the weights are in 4 bits, other weights in 8 bits, other parts in the attention, and RoPE is always kept at full precision.
3:13
The model uses mixed precision, with most of the weights in 4 bits, others in 8 bits, and RoPE kept at full precision. Sanfilippo also emphasizes two attention pipelines and a routing choice that, in the early layers, gives up on chasing meaning immediately, because the semantic signal there is still weak.
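The sparsity figures can be made concrete with a little arithmetic. The sketch below uses only the two numbers Sanfilippo quotes (total and per-token active parameters); the function name is illustrative, and the result is a back-of-the-envelope ratio, not a measurement of inference cost.

```python
# Figures as quoted by Sanfilippo in the podcast.
TOTAL_PARAMS = 1.6e12   # total parameters (1,600 billion)
ACTIVE_PARAMS = 49e9    # parameters activated per generated token


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in computing one token."""
    return active / total


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"{frac:.2%} of the weights are active per token")
```

On these figures, only about 3% of the weights are involved in producing each token, which is the sense in which sparsity keeps inference manageable.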
For Sanfilippo, the real brake is no longer model quality, but operating cost. He puts DeepSeek v4 at about 3.48 dollars per million output tokens, against around 5 dollars for GPT 5.5, and notes that input tokens on DeepSeek cost considerably more when they are not cached.
DeepSeek costs 3.48 for the same amount of output tokens; for input tokens it’s like 70 dollars per million tokens if they’re not cached, otherwise it drops further.
10:11
Artificial intelligence costs a fortune, and there is no way around it, because the energy these huge models consume, and inference on autoregressive models with attention as it stands today, is expensive.
14:07
His conclusion is uncomfortable for anyone expecting a rapid drop in prices. Even if DeepSeek promises reductions when new Huawei GPUs arrive, Sanfilippo argues that a model this large cannot fall to a few cents per million tokens without changing the hardware base or the architecture.
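To see what the quoted output prices mean in practice, here is a minimal sketch that prices a workload at the two per-million rates mentioned above. The workload size is a hypothetical figure chosen for illustration, and input-token pricing is left out because it depends on caching.

```python
# Dollars per million output tokens, as quoted in the podcast summary.
PRICE_PER_MILLION_OUTPUT = {
    "DeepSeek v4": 3.48,
    "GPT 5.5": 5.00,  # "around 5 dollars" in the source
}


def output_cost(model: str, output_tokens: int) -> float:
    """Cost in dollars of generating `output_tokens` tokens with `model`."""
    return PRICE_PER_MILLION_OUTPUT[model] * output_tokens / 1_000_000


# Hypothetical workload: two million generated tokens.
tokens = 2_000_000
for model in PRICE_PER_MILLION_OUTPUT:
    print(f"{model}: ${output_cost(model, tokens):.2f}")
```

The gap is real but modest: at these rates, heavy agentic use still costs dollars rather than cents, which is the point Sanfilippo makes.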
For Sanfilippo, the model's sparse architecture makes local use plausible again. With a Mac Studio with 512 GB of RAM, he says, a user could run a near-frontier model on their own computer, with performance high enough to use it as a real coding agent rather than a toy.
If someone has a Mac Studio with 512 GB of RAM, they have a near-frontier model that can run on their own computer with native weights and, given the sparsity with 13 billion active parameters, it runs fast too.
6:20
It’s not a toy: you can actually use it as an alternative to providers.
7:38
Here his reasoning moves from benchmarking to personal economics: with models like this, the line between cloud service and private machine gets thinner. Sanfilippo says he has already run DeepSeek v4 with tools like Claude Code simply by redefining the endpoints, a sign that the tooling ecosystem matters almost as much as the model itself.
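What "redefining the endpoints" might look like is sketched below. Sanfilippo does not describe his setup, so everything here is an assumption: the environment variable names are the ones Claude Code is understood to read for a custom endpoint and should be checked against its documentation, and the URL, token, and model name are placeholders.

```python
import os


def endpoint_env(base_url: str, token: str, model: str) -> dict[str, str]:
    """Copy the current environment and point the tool at another endpoint.

    The variable names are assumptions to verify against the tool's docs.
    """
    env = dict(os.environ)
    env.update({
        "ANTHROPIC_BASE_URL": base_url,   # where requests are sent
        "ANTHROPIC_AUTH_TOKEN": token,    # credential for that endpoint
        "ANTHROPIC_MODEL": model,         # model name the endpoint expects
    })
    return env


# Placeholder values: replace with a real compatible endpoint and model.
env = endpoint_env("http://localhost:8000", "placeholder-token", "placeholder-model")

# To launch the tool with this environment (requires `claude` on PATH):
# import subprocess
# subprocess.run(["claude"], env=env)
```

The tool itself is unchanged; only the address it talks to is swapped, which is why a compatible API surface matters as much as the weights.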
Sanfilippo tries to defend the verdict with a test he created: an interpreter he wrote years ago called people, which the model has to improve while staying under 3,000 lines of code and avoiding regressions. The criterion is not to win an easy benchmark, but to speed up the software without cheating.
You have to continuously, under 3,000 lines of code as a contract, keep improving the execution speed of that interpreter in the benchmark without ever having regressions in quality.
16:08
With DeepSeek version 4, it feels like you’re dealing with the thing you normally pay for.
17:27
This is where the conversation becomes more interesting than simple enthusiasm. Sanfilippo says that with Kimi K2.6 this test breaks down immediately, while with DeepSeek v4 the behavior resembles that of the models people usually pay for. It is an imperfect check, but one closer to real use than to an abstract ranking.
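The contract behind the test can be written down as a simple acceptance check. This is a sketch of the three conditions as described (line budget, no regressions, faster benchmark), not Sanfilippo's actual harness; the `Candidate` type and its field names are hypothetical.

```python
from dataclasses import dataclass

MAX_LINES = 3000  # the "under 3,000 lines" contract from the source


@dataclass
class Candidate:
    """One version of the interpreter (hypothetical record for illustration)."""
    lines_of_code: int
    benchmark_seconds: float
    tests_passed: int
    tests_total: int


def accept(candidate: Candidate, baseline: Candidate) -> bool:
    """Accept a change only if it honors all three parts of the contract."""
    within_budget = candidate.lines_of_code < MAX_LINES
    no_regressions = candidate.tests_passed == candidate.tests_total
    faster = candidate.benchmark_seconds < baseline.benchmark_seconds
    return within_budget and no_regressions and faster


baseline = Candidate(2800, 10.0, 50, 50)
print(accept(Candidate(2900, 8.0, 50, 50), baseline))  # faster, within budget
print(accept(Candidate(3100, 5.0, 50, 50), baseline))  # faster, but too long
```

Requiring all three conditions at once is what rules out "cheating": a speedup bought with extra code or a broken test does not count.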
There are three thinking modes: none, think and think max. You basically enable think max with a system prompt.
18:13
The documentation, he says, is clear and the developer options are well done. The final picture remains cautious but positive: he only tested the Pro, expects to see Flash on more powerful hardware, and suggests that if even the lighter version holds up, the economic advantage of flat-rate plans could start to wobble.
Why does Sanfilippo like DeepSeek v4 Pro so much?
Because, he says, it behaves like a top closed model, especially in coding. After two hours of testing, he says the alignment and results feel like those of premium systems.
How big is DeepSeek v4?
Sanfilippo says the model has 1.6 × 10^12 parameters, or 1,600 billion. But it activates only 49 billion per token thanks to MoE sparsity.
Can DeepSeek v4 run locally?
Yes, according to Sanfilippo, on very strong hardware such as a Mac Studio with 512 GB of RAM. He says that in that setup it can become a truly usable coding agent, not just a demo.
How much does DeepSeek v4 cost to use?
Sanfilippo cites about 3.48 dollars per million output tokens and about 70 dollars per million input tokens if there is no cache. For him, it is still cheaper than the big providers, but not cheap enough to be almost free.
What did he test to judge the model?
He used an interpreter he wrote called people, which the model must speed up without exceeding 3,000 lines and without introducing bugs. He says DeepSeek v4 performs far better than Kimi K2.6 on that test.
AI-assisted summary of Salvatore Sanfilippo's podcast, verified against the original transcript.