
Deepseek-v3.2Speciale, built for agentic work, just released

https://hexbear.net/api/v3/image_proxy?url=https%3A%2F%2Flemmygrad.ml%2Fpictrs%2Fimage%2F2cd3797c-0c31-4eb5-8a5d-3001857940b3.png

cross-posted from: https://lemmygrad.ml/post/9929539

Just saw the news on twitter - and this is my own screenshot of their API pricing taken just now.

V3 has received an update and is moving to 3.2 - including on the web interface. They posted all these nice benchmarks, which I guess mean something? Lol tbh I'm not sure how much you can trust these benchmarks.

What does this mean? Well, I gave the twitter thread announcement to Crush (and the new deepseek-v3.2, which you are automatically upgraded to both on API and web interface) and this is what it made of the news:

Quick Summary

DeepSeek has released V3.2, replacing the experimental version. There are two main models:

  • V3.2: General-purpose, balanced performance (GPT‑5 level)
  • V3.2‑Speciale: Specialized for complex reasoning (Gemini‑3.0‑Pro level)

Both are open‑source and available via API.

What's New & Why It Matters

1. Thinking Integrated with Tool‑Use

V3.2 can now "think" while using tools (like searching the web, running code, or calling APIs). This makes AI assistants more transparent and better at multi‑step tasks. You can choose thinking mode (slower but more thorough) or non‑thinking mode (faster for simple tasks).
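As a rough sketch of what choosing a mode might look like: assuming the mode is selected by model name, as with earlier DeepSeek releases (`deepseek-reasoner` for thinking, `deepseek-chat` for non-thinking — check the current API docs before relying on these names), an OpenAI-compatible request payload could be built like this:

```python
def build_chat_request(prompt: str, thinking: bool) -> dict:
    """Build an OpenAI-compatible chat payload for the DeepSeek API.

    The model names follow the pattern of earlier DeepSeek releases
    (thinking vs. non-thinking mode); verify them against the current
    API documentation before use.
    """
    return {
        "model": "deepseek-reasoner" if thinking else "deepseek-chat",
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_chat_request("Plan a 3-step web search", thinking=True)
print(req["model"])  # deepseek-reasoner
```

The same payload shape works for both modes, so an agent can flip `thinking` per task without restructuring its requests.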

2. Two Models for Different Needs

  • V3.2: Use this for most agentic work—chatbots, coding assistants, general automation. It's cost‑effective and reliable.
  • V3.2‑Speciale: Use this only when you need top‑tier reasoning—complex math, competitive programming, advanced planning. It uses more tokens (costs more) but solves harder problems.

3. Open‑Source & API Options

  • API: Available immediately. V3.2‑Speciale has a temporary endpoint until Dec 15, 2025.
  • Self‑hosting: Download from Hugging Face and run locally for privacy or cost savings.
  • Pricing: Same for both models [crush made a mistake in pricing here, not sure why. See my screenshot for current pricing or check on https://api-docs.deepseek.com/quick_start/pricing]
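Whatever the current rates turn out to be, estimating spend from them is simple arithmetic. A minimal sketch, with placeholder per-million-token rates (read the real numbers off the pricing page linked above):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate: float, out_rate: float) -> float:
    """Estimate API spend in dollars given per-million-token rates.

    The rates passed in below are placeholders for illustration only;
    substitute the current numbers from the DeepSeek pricing page.
    """
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Hypothetical rates, not the real pricing:
print(round(estimate_cost(2_000_000, 500_000, in_rate=0.28, out_rate=0.42), 3))  # 0.77
```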

Practical Advice for Agent Builders

Which Model Should You Use?

  • Start with V3.2—it handles 90% of agentic tasks well.
  • Switch to V3.2‑Speciale only for tasks that require deep, step‑by‑step reasoning.
  • Monitor token costs—V3.2‑Speciale uses more tokens, so watch your API bill.
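The advice above amounts to a simple routing rule, which could be sketched like this (the model identifier strings are hypothetical, not confirmed API names):

```python
# Task types that warrant the more expensive reasoning model.
# This set and the model ID strings are illustrative placeholders.
REASONING_TASKS = {"math-proof", "competitive-programming", "long-horizon-planning"}

def pick_model(task_type: str) -> str:
    """Default to V3.2; escalate to Speciale only for deep reasoning."""
    if task_type in REASONING_TASKS:
        return "deepseek-v3.2-speciale"
    return "deepseek-v3.2"

print(pick_model("chatbot"))     # deepseek-v3.2
print(pick_model("math-proof"))  # deepseek-v3.2-speciale
```

Escalating only on a known-hard task list keeps the token bill dominated by the cheaper model.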

Key Improvements for Agentic Interfaces

  1. Better reasoning transparency—you can see the model's thought process when using tools.
  2. Mode flexibility—toggle thinking on/off based on task complexity.
  3. Stronger performance on benchmarks (math Olympiads, programming contests).

Timeline & Availability

  • Now: V3.2 on DeepSeek App, Web, and API
  • Until Dec 15: V3.2‑Speciale via temporary API (same price)
  • Always: Both models open‑source on Hugging Face
SouffleHuman @lemmy.ml - 2w

The title seems a bit confusing. I believe that the Speciale model specifically does not support tool-calling, while the regular V3.2 is designed for Agentic work. I see you explained that in the description though, which is nice.

Anyway, it's pretty hilarious that OpenAI just started to experiment with ads (apparently even on the paid tiers), right when they're getting absolutely hammered by everyone else. Especially with Deepseek's really cheap API that basically makes it very difficult for western companies to turn a profit.

7
☆ Yσɠƚԋσʂ ☆ - 2w

I'm just waiting until DeepSeek starts getting banned in the US. I don't think they'll be able to ban the open model downloads, but I can totally see them blocking API access to protect openai.

13
🏴حمید پیام عباسی🏴 - 2w

The cyber security insurance my company uses requires that we refuse to use Deepseek lol, so even if they can't pass the laws, the US will make it functionally impossible for most companies to use.

I run deepseek on an offline gpu heavy computer hooked to solar panels lol good luck stopping me

9
☆ Yσɠƚԋσʂ ☆ - 2w

it's really funny to me how the west is starting to turn into a hermit kingdom cause it can't compete with China now

9
gay_king_prince_charles [she/her, he/him] - 2w

Anthropic and Alibaba are the only model developers capable of coming up with good names at this point

3
gay_king_prince_charles [she/her, he/him] - 2w

Promising. Deepseek-reasoner is a bit finicky with tool calls right now (and arguably behind Qwen and Kimi by a large margin), and this should make it more feasible. Chinese models do need a big jump sometime in order to catch up to the West, as right now the pecking order is Opus 4.5 > Gemini 3 >> GPT-5.1 > Qwen 3 > Kimi K2 > Deepseek V3.1. This looks like DeepSeek might have jumped ahead of the pack to beat GPT-5.1, but it's still behind Gemini 3. I'm particularly interested in it beating out Sonnet 4.5 while being 30x cheaper. I'm excited to see what DeepMind and Anthropic will do with the new attention model and what the cost reduction will be (obviously both of those two have made major efficiency improvements with Gemini 3 and Opus 4.5's cost reductions). I'm a little surprised that they didn't use OCR for memory like they recently suggested, and I wonder if anything more will come of that.

5
☆ Yσɠƚԋσʂ ☆ - 2w

Yeah, I'm waiting for the whole OCR thing to get integrated cause that would potentially allow for huge contexts. I imagine they're still figuring stuff out around that. There are a few really promising things I've seen recently that seem like low hanging fruit and will likely make it into models before long. These ones in particular stood out for me:

It looks like you can get a huge improvement even with small models by using external context like graphs, and much better reasoning by tweaking the evaluator strategy. I think we'll likely see models small enough to run locally that blow even stuff like Opus 4.5 out of the water in the near future.

4
gay_king_prince_charles [she/her, he/him] - 2w

Yeah, cheap context is going to be huge. Opus 4.5 is super limited and the thinking budget could afford to be way bigger. DeepMind probably has some sort of secret sauce behind their 1M token context. Quadratic scaling is the single largest impediment to running inference these days, and reducing it would cause huge savings in electricity and silicon.
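The quadratic cost is easy to see concretely: the QK^T and attention-times-V matmuls in a standard attention layer each cost on the order of n²·d multiply-adds for sequence length n and head dimension d. A back-of-the-envelope sketch:

```python
def attention_flops(seq_len: int, d_model: int) -> int:
    """Rough FLOPs for one standard attention layer's two big matmuls
    (QK^T and attention-weights times V): two n*n*d products at
    ~2 FLOPs per multiply-add. Ignores projections and softmax."""
    return 2 * 2 * seq_len * seq_len * d_model

# Doubling the context quadruples the attention cost:
print(attention_flops(8192, 4096) // attention_flops(4096, 4096))  # 4
```

So going from, say, a 128k to a 1M context multiplies this term by roughly 64x, which is why sub-quadratic attention is such a big deal for cost.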

I see you've also linked the HRM paper. I'm really looking forward to seeing how those play out. LLM progress is going to level out at some point (in fact, I think the only thing that is growing faster than linearly at this time is time horizons of tasks), but HRMs seem like they solve most of the big issues with LLMs and could produce significantly better results from much less pre-training and compute.

4
☆ Yσɠƚԋσʂ ☆ - 2w

Forgot to link this one as well. With the current approach, the model gets pretrained, and its weights are frozen. It can't actually learn anything new going forward because the context simply acts as short-term memory. The idea they came up with is to have layers with update cycles that run at different speeds. They represent the model as a set of nested optimization problems, where each level has its own update frequency. Instead of just deep layers, you have levels defined by how often they learn. You might have a mid-frequency level that updates its own weights every, say, 1k tokens it processes, and a slower level that updates every 100k tokens, and so on. It can learn new facts from a long document and bake them into that mid-level memory, all while the deep, core knowledge in the slowest level stays stable. It creates a proper gradient of memory from short term to long term, allowing the model to finally learn on the fly.

https://abehrouz.github.io/files/NL.pdf
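The multi-timescale idea can be sketched with a toy loop (my own illustration of the update scheduling, not the paper's actual algorithm): a mid-frequency level updates often, a slow level rarely, and a frozen core never.

```python
def run(n_tokens: int, fast_every: int = 10, slow_every: int = 100):
    """Count how often each level would update while streaming tokens.

    Toy illustration only: real nested-learning levels would run an
    inner optimization step at each trigger, not just count.
    """
    fast_updates = slow_updates = 0
    for t in range(1, n_tokens + 1):
        if t % fast_every == 0:
            fast_updates += 1   # mid-frequency level: learns often
        if t % slow_every == 0:
            slow_updates += 1   # slow level: consolidates rarely
    return fast_updates, slow_updates  # frozen core: never updates

print(run(1000))  # (100, 10)
```

The gradient of memory falls out of the schedule: the faster a level updates, the more recent and volatile the knowledge it holds.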

I'm really excited to see what all this is going to look like in a year or so once people have the chance to integrate some of these ideas. It seems like a lot of them are also mutually reinforcing, so you could have continuously learning models, that externalize old memories into graphs or images, etc. And then all the tricks people are coming up with to improve reasoning will make the outputs a lot more reliable.

It's going to be very cool if you could run your own model locally that will start learning your habits and patterns over time and start evolving to fit exactly what you're doing.

4
BountifulEggnog [she/her] - 2w

Please give us a distill 🙏

4
normal_user @lemmygrad.ml - 2w

"Speciale"? Must be Italian. Let's hope they release the "Fragile" model next.

1