Imagine a server room dense with silicon, each chip chipping away at a mountain of text. That’s the image Huawei’s research group is selling after announcing that it trained DeepSeek V4-Pro, a 1.6 trillion-parameter model, on a cluster built around at least a thousand Ascend 910C chips.
The story sounds straightforward: domestically produced AI silicon finally handling large-scale model workloads. But the reality is layered. Huawei says the team performed full-parameter updates, meaning every weight in the model was trained rather than only a thin adapter layer added on top, and that pretraining for V4-Pro processed a corpus reportedly exceeding 32 trillion tokens. Pretraining builds the model’s core capabilities; the later fine-tuning stage shapes behavior through instruction tuning and safety alignment.
Why does that matter? Because full-parameter training is far more demanding than light-touch techniques that tweak only a small portion of a network. It requires sustained throughput, stable interconnects, and tight orchestration across chips. Historically, Chinese groups struggled to migrate heavy training workloads off Nvidia hardware without hitting bottlenecks in performance and connection stability.
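To put rough numbers on that gap, here is a back-of-the-envelope sketch in Python. It is not drawn from Huawei’s announcement: the 16-bytes-per-trained-parameter figure is a common rule of thumb for mixed-precision Adam training, the adapter size (0.1% of the weights) is a hypothetical chosen for illustration, and activations and communication buffers are left out entirely. Only the parameter count and the chip count come from the claims above.

```python
# Rule-of-thumb bytes per trainable parameter for mixed-precision Adam.
# This is an assumption for illustration, not a figure from Huawei or DeepSeek:
#   2 (bf16 weight) + 2 (bf16 gradient) + 4 (fp32 master weight)
#   + 4 (Adam first moment) + 4 (Adam second moment) = 16
BYTES_PER_TRAINED_PARAM = 16
BYTES_PER_FROZEN_PARAM = 2  # a frozen weight only needs its bf16 copy

TOTAL_PARAMS = 1_600_000_000_000  # 1.6 trillion, as reported
NUM_CHIPS = 1_000                 # "at least a thousand", as reported
ADAPTER_DIVISOR = 1_000           # hypothetical: adapters sized at 0.1% of the weights


def training_state_bytes(total_params: int, trainable_params: int) -> int:
    """Bytes of weights, gradients, and optimizer state (activations excluded)."""
    frozen_params = total_params - trainable_params
    return (trainable_params * BYTES_PER_TRAINED_PARAM
            + frozen_params * BYTES_PER_FROZEN_PARAM)


full_bytes = training_state_bytes(TOTAL_PARAMS, TOTAL_PARAMS)
adapter_bytes = training_state_bytes(TOTAL_PARAMS, TOTAL_PARAMS // ADAPTER_DIVISOR)

for label, total in (("full-parameter", full_bytes), ("adapter-only", adapter_bytes)):
    per_chip_gb = total / NUM_CHIPS / 1e9  # assumes perfectly even sharding
    print(f"{label}: {total / 1e12:.1f} TB total, {per_chip_gb:.1f} GB per chip")
```

Under these assumptions, the full-parameter case carries about 25.6 TB of training state, or roughly 25.6 GB per chip when spread evenly over a thousand chips, against about 3.2 GB per chip for the adapter-only case. The real run may have used different precision, sharding, or optimizer choices, so treat the output as an order-of-magnitude illustration only.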

Huawei points to the Ascend 910C’s dual-die architecture as a turning point. Independent tests from earlier DeepSeek experiments suggested an Ascend part could deliver roughly 60% of the inference performance of Nvidia’s H100, but that was inference, not large-scale, synchronized training. Training workloads expose different weaknesses: collective communication, memory management, and software maturity all become decisive.
Still, the claim has caveats. The researchers reported completing full-parameter training but provided no rigorous benchmarks: no wall-clock time, no throughput metrics, no head-to-head comparison with H100 clusters, and no detailed breakdown of power or efficiency. Without those numbers, the announcement is an encouraging technical milestone, not yet independent proof that Ascend clusters match or surpass established alternatives for leading-edge pretraining.
There’s precedent for caution. Earlier reports said attempts to train a different model, R2, on Huawei silicon ran into instability and slow chip interconnects. Moving from successful demonstrations in inference to reliable, large-scale pretraining is a big leap. Companies can sometimes stitch together enough engineering to complete a single run while still lacking the robustness required for routine model development at scale.
So what’s the takeaway for the wider AI ecosystem? If Huawei’s account holds up under scrutiny, it signals growing competitiveness of Chinese AI hardware and a maturing software stack capable of orchestrating thousand-chip training jobs. If it doesn’t, it underscores that hype still outpaces verifiable progress. Either way, the next step is clear: independent benchmarks and transparent runtime data.
We’ll be watching for those numbers. Independent verification will tell us whether this is a true pivot in global AI infrastructure or simply an ambitious proof-of-concept.