ForkLog

DeepSeek Unveils a Rival to Claude, ChatGPT, and Gemini

Chinese AI startup DeepSeek has released a preview of its new line of language models. According to the company, the flagship V4-Pro surpasses Claude Opus 4.6 and GPT-5.4, making it the strongest openly available system.

Architecture and Scale 

The V4-Pro model comprises approximately 1.6 trillion parameters, of which only about 49 billion are active for any given token. The second version, V4-Flash, totals 284 billion parameters, with 13 billion active.

Both models are built on a Mixture of Experts (MoE) architecture: only the subnetworks relevant to the task are activated for each token. This approach is more cost-effective than fully dense architectures while maintaining performance.
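The routing idea can be shown in a toy sketch: a learned gate scores all experts, but only the top-k are actually evaluated per token, which is how a 1.6-trillion-parameter model can run with only ~49 billion parameters active. The dimensions below (8-dim tokens, 16 experts, k=2) are illustrative, not DeepSeek's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, experts, gate_w, k=2):
    """Route one token through only the top-k experts (sparse MoE)."""
    logits = x @ gate_w                       # one gating score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the chosen experts
    # Only k expert matrices are evaluated; the other experts stay idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

d, n_experts = 8, 16                          # illustrative sizes
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))

x = rng.standard_normal(d)                    # one token embedding
y = moe_forward(x, experts, gate_w, k=2)
print(y.shape)                                # (8,)
```

With k=2 of 16 experts, only 1/8 of the expert weights touch each token, which is the cost advantage over a dense layer of the same total size.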

Pre-training was conducted on a corpus exceeding 32 trillion tokens. The developers then fine-tuned the models in stages, with separate phases dedicated to coding, mathematics, logic, and instruction following. The final version integrates these skills through distillation.

Long Context Made Affordable

A key distinction of V4 is its optimization for long sequences. Other models also offer 1-million-token context windows, but using them typically incurs high cost and latency.

DeepSeek says the new version significantly reduces the resource demands of such operations. Compared to V3.2, V4-Pro requires about 27% of the compute and 10% of the KV-cache memory when working at maximum context. For V4-Flash, the figures are approximately 10% and 7%, respectively.

Source: Hugging Face

The team achieved this through a hybrid attention architecture: two mechanisms compress data and reduce load when handling long texts. Special hyperconnections were also used for stability, and the Muon optimizer accelerated training.
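To see why those savings matter, here is a rough sketch of how KV-cache size grows linearly with context length. The formula is the standard per-sequence estimate (keys plus values for every layer and position); the dimensions are illustrative assumptions, not DeepSeek's actual architecture.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_val=2):
    """Per-sequence KV-cache size: keys + values for every layer and position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val

# Illustrative dims (not DeepSeek's): 60 layers, 8 KV heads, head dim 128,
# fp16 values, at a 1-million-token context.
full = kv_cache_bytes(60, 8, 128, 1_000_000)
print(f"full cache: {full / 2**30:.1f} GiB")
print(f"at 10% of that: {0.10 * full / 2**30:.1f} GiB")
```

At these (assumed) dimensions a full 1M-token cache runs into the hundreds of gibibytes, so cutting it to ~10% is the difference between long context being impractical and affordable.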

Reasoning Modes and Agent Capabilities

DeepSeek V4 supports three reasoning modes:

  1. Non-think — quick responses to simple questions without additional analysis. 
  2. Think High — deep analysis for complex tasks and planning. 
  3. Think Max — maximum mode: the model outlines each step and checks all options.  
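The mode names above come from the announcement; how they are exposed to developers is not specified. The payload below is therefore a hypothetical sketch assuming an OpenAI-style chat endpoint with a made-up `reasoning_mode` field, purely to show what per-request mode selection might look like.

```python
# Hypothetical request payload. The "reasoning_mode" field and the mode
# string values are assumptions for illustration, not a documented DeepSeek API.
payload = {
    "model": "deepseek-v4-pro",
    "reasoning_mode": "think-high",   # or "non-think" / "think-max"
    "messages": [
        {"role": "user", "content": "Plan a three-step data migration."},
    ],
}
print(payload["reasoning_mode"])
```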

In agent tasks, the Max mode now retains the chain of intermediate steps within a single task. In the previous version, some of this context was lost during user interaction. 

Testing Results

According to DeepSeek, the flagship version shows results comparable to leading systems in several benchmark areas.

Source: Hugging Face. 

V4 was specifically trained on real-world scenarios: data analysis, reports, document editing, and internet searches with iterative tool use.

To assess the model’s suitability for real development, the startup conducted internal testing on tasks from its engineers. In a survey of 85 developers and researchers, 52% stated they are ready to use V4-Pro as their primary coding model, while another 39% indicated they are inclined to do so.

Back in April, OpenAI released GPT-5.5. The model is positioned as “a new level of intelligence for real work and agent management.”
