Huawei Atlas 350: The Chinese Chip That Challenges Nvidia… with Important Caveats

Huawei has just released the most powerful AI chip China has ever made, and the most revealing detail is not its absolute performance but which Nvidia chip it is measured against. The Atlas 350, powered by the new Ascend 950PR processor, delivers 1.56 petaflops in FP4 precision and 1 petaflop in FP8, roughly 2.8 times the performance of the Nvidia H20, the best chip the United States allows to be sold to China under current export restrictions.

The numbers grab attention. But the full story is more nuanced than the headlines suggest.

The Atlas 350 is undeniably a significant technical achievement for the Chinese semiconductor ecosystem, and not just for its raw performance. It incorporates Huawei’s own HBM memory, the HiBL 1.0, with 112 GB of capacity and 1.4 TB/s of bandwidth, giving the company full control over the memory supply chain. It also introduces CANN Next, a software stack designed to be compatible with Nvidia’s CUDA, offering familiar abstractions like thread blocks, warps, and kernel launches to ease developer migration.
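For readers unfamiliar with those abstractions, the following is a minimal Python emulation of the CUDA-style execution model: a kernel is launched over a grid of thread blocks, each logical thread derives a global index from its block and thread coordinates, and threads within a block are grouped into warps. This is neither CUDA nor CANN Next code. The function names are hypothetical, the threads run sequentially rather than in parallel, and the warp width of 32 is Nvidia’s, assumed here purely for illustration.

```python
# Illustrative emulation only: real GPU threads run in parallel on hardware.
WARP_SIZE = 32  # Nvidia's warp width; assumed here, not a stated CANN Next value


def global_thread_id(block_idx, block_dim, thread_idx):
    """Global 1-D index of a thread, as in blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def warp_of(thread_idx):
    """Index of the warp, within its block, that a thread belongs to."""
    return thread_idx // WARP_SIZE


def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a 1-D kernel launch by calling the kernel once per logical thread."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def vector_add(block_idx, block_dim, thread_idx, a, b, out):
    """Kernel body: each logical thread handles at most one element."""
    i = global_thread_id(block_idx, block_dim, thread_idx)
    if i < len(out):  # guard: the last block may extend past the data
        out[i] = a[i] + b[i]


n = 100
a = list(range(n))
b = [2 * x for x in a]
out = [0] * n

block_dim = 64
grid_dim = (n + block_dim - 1) // block_dim  # ceiling division: enough blocks to cover n

launch(vector_add, grid_dim, block_dim, a, b, out)
print(out[:5], "blocks:", grid_dim)
```

A compatibility layer that preserves this mental model lets developers keep the same decomposition of work into blocks and threads, even if the code underneath targets different hardware.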

But context is key. The H20 against which the Atlas 350 is measured is not just any chip: Nvidia deliberately limited it to comply with U.S. export regulations. With 296 TFLOPS in FP8, the H20 is far below what Nvidia sells in other markets, where the H100 delivers around 2,000 TFLOPS in FP8 and the B200 roughly doubles that. Saying the Atlas 350 “outperforms the best American chip” is true only with a crucial caveat: the best chip the United States allows to be exported to China.

The differences go beyond raw performance. The Atlas 350 consumes 600W, 50% more than the H20’s ~400W. Its memory bandwidth (1.4 TB/s) is roughly a third of the H20’s 4.0 TB/s. And most importantly, the Ascend 950PR is a chip designed primarily for inference, not training. For training frontier models, China still depends on American chips.
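These comparisons reduce to a handful of ratios. As a rough sketch, the Python below recomputes them using only the figures quoted in this article; the dictionary layout and function names are illustrative and do not come from any vendor tool. One thing the arithmetic makes visible: the FP8 ratio implied by these figures is closer to 3.4 than to 2.8, so the headline multiple presumably rests on a different basis than peak FP8 throughput alone.

```python
# Figures as quoted in this article; reported values, not independently verified.
SPECS = {
    "Atlas 350": {"fp8_tflops": 1000.0, "power_w": 600.0, "mem_bw_tbs": 1.4},
    "H20": {"fp8_tflops": 296.0, "power_w": 400.0, "mem_bw_tbs": 4.0},
}


def ratio(metric, a="Atlas 350", b="H20"):
    """How many times larger chip a's value is than chip b's for one metric."""
    return SPECS[a][metric] / SPECS[b][metric]


def tflops_per_watt(chip):
    """Peak FP8 throughput per watt of board power."""
    return SPECS[chip]["fp8_tflops"] / SPECS[chip]["power_w"]


def bytes_per_flop(chip):
    """Memory bandwidth per unit of peak FP8 compute (TB/s divided by TFLOPS)."""
    return SPECS[chip]["mem_bw_tbs"] / SPECS[chip]["fp8_tflops"]


for metric in ("fp8_tflops", "power_w", "mem_bw_tbs"):
    print(f"{metric}: Atlas 350 / H20 = {ratio(metric):.2f}")

for chip in SPECS:
    print(f"{chip}: {tflops_per_watt(chip):.2f} TFLOPS/W, "
          f"{bytes_per_flop(chip):.4f} bytes/FLOP")
```

On these numbers the Atlas 350 comes out ahead on peak FP8 throughput per watt (about 1.67 versus 0.74 TFLOPS/W) but has far less memory bandwidth per unit of compute, which matters for inference workloads that are limited by memory traffic rather than arithmetic.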

Huawei has a plan. The 950PR is the first product in a three-year roadmap that includes the 950DT (for training and inference, expected Q4 2026), the Ascend 960 (Q4 2027), and the Ascend 970 (Q4 2028). The company aims to ship 750,000 units of the Atlas 350 in 2026, and companies like ByteDance and Alibaba are already planning large orders.

Four weeks after the Atlas 350 launch, on April 24, 2026, DeepSeek released its V4 model with a notable twist: Huawei announced “day-zero” support for V4 inference on its Ascend 950PR and 950DT chips. The adaptation, demonstrated in a livestream on Bilibili and WeChat, showed that Huawei’s CANN framework — the functional equivalent of CUDA — could run the V4 model without Nvidia GPUs.

This collaboration is strategically important. It is the first time a frontier-class model has been specifically adapted for domestic Chinese accelerators. But here too there are caveats. DeepSeek V4 was not trained on Huawei hardware — DeepSeek’s official documents do not mention Huawei, and SCMP reports that “cutting-edge Chinese models still rely on advanced American chips for training.” The Ascend 950PR is an inference chip, not a training chip. Saying V4 was “built with zero dependence on Nvidia” is an overstatement; what is correct is that it was adapted to run on Ascend.

Building out the CANN ecosystem is perhaps Huawei’s most strategic move. Nvidia dominates not only because of its hardware but because of CUDA, the software ecosystem that locks developers in. CANN Next attempts to replicate that play by offering API-level compatibility, but history is not on its side: previous generations of Ascend struggled to achieve mass adoption. That ByteDance and Alibaba are planning large orders suggests this time could be different, but the verdict is not yet in.

Why it matters

The Atlas 350 represents China’s most credible advance in domestic AI hardware to date. It has competitive specifications, its own memory, a clear roadmap, and for the first time, a frontier model adapted to run on it. But the road to full independence from Nvidia remains long. The Atlas 350 competes with the H20, not the H100 or the B200. It is primarily an inference chip, not a training chip. And the CANN software ecosystem has yet to prove it can win over developers the way CUDA did.

What is clear is that the gap is closing. Not overnight, but steadily. And that, for the global semiconductor industry, is a signal no one should ignore.

Main source: SCMP — Huawei challenges Nvidia with powerful new AI accelerator card