
On the occasion of ChatGPT's third birthday, its competitor DeepSeek showed up with a “birthday gift” that seems a little too competitive, as if unwilling to let the pioneer of large language models enjoy an easy celebration.
According to DeepSeek, the newly updated “regular lineup” V3.2—now available on the web, app, and via API—strikes a balance between reasoning ability and output length, making it well-suited for everyday use.
In benchmark reasoning tests, V3.2, GPT-5, and Claude 4.5 each showed strengths in different domains; only Gemini 3 Pro delivered noticeably stronger overall performance than the other three.
Source: DeepSeek official WeChat
Meanwhile, DeepSeek also stated that, compared with Kimi-K2-Thinking, recently released by the Chinese model developer Moonshot AI, DeepSeek V3.2 produces significantly shorter outputs, greatly reducing computational overhead and user wait time. In agent benchmarks, V3.2 also outperformed other open-source models such as Kimi-K2-Thinking and MiniMax M2, making it, by DeepSeek's account, the strongest open-source large model to date, with overall performance now extremely close to that of the top closed-source models.
Image from DeepSeek official WeChat
What’s even more noteworthy is V3.2’s performance in certain Q&A scenarios and general agent tasks. In a specific case involving travel advice, for example, V3.2 leveraged deep reasoning along with web crawling and search engine tools to provide highly detailed and accurate travel tips and recommendations. The latest API update for V3.2 also supports tool usage in “thinking mode” for the first time, greatly enriching the usefulness and breadth of answers users receive.
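Since DeepSeek's API follows the OpenAI-compatible chat-completions format, a tool-enabled request in thinking mode would plausibly look like the sketch below. Note that the model identifier and the `web_search` tool schema here are illustrative assumptions for the travel-advice example, not taken from DeepSeek's documentation; the payload is only constructed, not sent.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "function" schema.
web_search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",  # assumed tool name, for illustration
        "description": "Search the web and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

# Request body for a reasoning-mode completion that may call tools.
payload = {
    "model": "deepseek-reasoner",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "Plan a 3-day trip to Chengdu."}
    ],
    "tools": [web_search_tool],  # tools now usable while "thinking"
}

print(json.dumps(payload, indent=2))
```

In this flow, the model can interleave reasoning steps with `web_search` calls, which is what lets it ground travel tips in fresh search results rather than relying on parametric knowledge alone.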
In addition, DeepSeek specifically emphasized that V3.2 was not specially trained on the tools featured in these evaluation datasets.
We've observed that while benchmark scores for large models keep climbing, these models often make basic factual errors in everyday user interactions (a criticism leveled at GPT-5 in particular upon its release). Against this backdrop, DeepSeek has made a point of highlighting with each update that it avoids relying solely on correct answers as a reward signal. As a result, it has not produced a so-called "super-intelligent brain": a "low EQ" AI agent that appears clever in benchmarks yet fails at the simple tasks and questions that matter to ordinary users.
Overcoming this challenge at a fundamental level—becoming a large model with both high IQ and high EQ—is the key to developing a truly versatile, reliable, and efficient AI agent. DeepSeek also believes that V3.2 can demonstrate strong generalization capabilities in real-world application scenarios.
To strike a balance between computational efficiency, strong reasoning, and agent performance, DeepSeek has implemented optimizations across the training, integration, and application layers. According to its technical report, V3.2 introduces DSA (DeepSeek Sparse Attention), which significantly reduces computational complexity in long-context scenarios while maintaining model performance.
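To make the idea concrete, here is a minimal toy sketch of sparse attention: each query attends only to its top-k highest-scoring keys instead of all of them, which is the general family of techniques DSA belongs to. This is an illustration of the principle only, not DeepSeek's actual DSA algorithm.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k=4):
    """Toy top-k sparse attention (illustrative, not DeepSeek's DSA).

    Each query keeps only its top_k attention scores; the rest are
    masked out before the softmax, so each output row mixes only
    top_k value vectors instead of all n.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n_q, n_k)
    # Threshold = each query's k-th largest score.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k:].min(
        axis=-1, keepdims=True
    )
    masked = np.where(scores >= kth, scores, -np.inf)  # drop the rest
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)  # each of the 16 queries mixed only 4 of 16 values
```

In a real kernel, skipping the masked positions entirely (rather than computing and discarding them, as this toy version does) is what turns the quadratic long-context attention cost into something far cheaper.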
At the same time, to integrate reasoning capabilities into tool-using scenarios, DeepSeek has developed a new synthesis pipeline that enables systematic, large-scale generation of training data. This approach facilitates scalable agent post-training optimization, substantially improving generalization in complex, interactive environments as well as the model’s ability to follow instructions.
In addition, as mentioned earlier, V3.2 is also the first model from DeepSeek to incorporate reasoning into tool usage, greatly enhancing the model’s generalization capabilities.
If the focus of V3.2 is on “saying things that make sense and getting things done”—a balance-seeking approach for practical intelligent agents—then the positioning of the “Special Forces” V3.2 Speciale is to push the reasoning ability of open-source models to the limit and explore the boundaries of model capabilities through extended reasoning.
It’s worth noting that a major highlight of V3.2 Speciale is its integration of the theorem-proving capabilities from DeepSeek-Math-V2, the most powerful mathematical large model released just last week.
Math-V2 not only achieved gold-medal-level performance in the 2025 International Mathematical Olympiad and the 2024 China Mathematical Olympiad, but also outperformed Gemini 3 in the IMO-Proof Bench benchmark evaluation.
Moreover, in a similar vein to the approaches discussed above, this mathematical model also strives to move beyond correct-answer reward mechanisms and the role of a mere "test solver" by adopting a self-verification process. In doing so, it seeks to break through the current bottlenecks in AI's deep reasoning so that large models can truly understand mathematics and logical derivation, yielding more robust, reliable, and versatile theorem-proving capabilities.