Huawei showcases CloudMatrix 384 AI system to rival Nvidia’s flagship

China’s Huawei Technologies presented its CloudMatrix 384 AI computing system to the public for the first time last week, drawing significant attention at the World Artificial Intelligence Conference (WAIC) in Shanghai.

Initially announced in April, CloudMatrix 384 has been closely watched by industry observers, with some analysts positioning it as Huawei’s answer to Nvidia’s GB200 NVL72 – the US chipmaker’s most advanced AI computing solution currently available.

The launch underscores Huawei’s ongoing efforts to compete at the high end of the AI hardware market amid tightening US export restrictions.

Its emergence has also fueled debate over whether Huawei could begin to replace or meaningfully rival Nvidia in key AI infrastructure deployments, particularly within China.

AI hardware face-off

Huawei’s CloudMatrix 384 and Nvidia’s GB200 NVL72 can be compared at two levels, according to Fab Economics: the chip level, which pits Huawei’s Ascend 910C against Nvidia’s GB200, and the system level, where the overall performance of the full AI infrastructure is evaluated.

While Nvidia significantly outperforms Huawei at the chip level, Huawei gains an advantage at the system level by integrating five to six times more compute and HBM chips.

“The mathematics used to overcome physics at the chip level by Huawei is simple: having five times as many Ascend 910C chips more than offsets each GPU being only one-third the performance of an Nvidia Blackwell B200,” said Danish Faruqui, CEO of Fab Economics.

At the chip level, Nvidia’s Blackwell B200 GPU delivers 2,500 teraflops of performance, more than three times that of Huawei’s Ascend 910C, according to Fab Economics’ research.
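That trade-off can be checked with rough arithmetic. The Python sketch below uses the B200 figure above, the 72 GPUs and 384 chips implied by the two product names, and an assumed value of about 780 teraflops for the Ascend 910C (roughly one-third of a B200, consistent with Faruqui’s framing); the Huawei per-chip number is an assumption, not a figure from Fab Economics.

```python
# Back-of-envelope system-level throughput from per-chip figures.
B200_TFLOPS = 2500          # per-GPU figure cited by Fab Economics
ASCEND_910C_TFLOPS = 780    # assumption: roughly one-third of a B200

NVL72_GPUS = 72             # implied by the name GB200 NVL72
CLOUDMATRIX_CHIPS = 384     # implied by the name CloudMatrix 384

nvl72_pflops = NVL72_GPUS * B200_TFLOPS / 1000                      # ~180 PFLOPs
cloudmatrix_pflops = CLOUDMATRIX_CHIPS * ASCEND_910C_TFLOPS / 1000  # ~300 PFLOPs

print(f"GB200 NVL72:     {nvl72_pflops:.0f} PFLOPs")
print(f"CloudMatrix 384: {cloudmatrix_pflops:.0f} PFLOPs")
print(f"NVL72 trails by {1 - nvl72_pflops / cloudmatrix_pflops:.0%}")  # ~40%
```

Under those assumptions, the many-weak-chips approach lands the CloudMatrix 384 at roughly 300 PFLOPs, in line with the system-level figures discussed below.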

The B200 also offers 192 GB of high-bandwidth memory (HBM) per GPU, using eight HBM3E (8-high) modules. In contrast, Huawei’s 910C uses an earlier HBM generation, delivering 128 GB per GPU.

The analysis also showed that Nvidia’s HBM bandwidth reaches 8 terabytes per second per GPU, approximately 2.5 times higher than Huawei’s.

System-scale gains, energy costs

At the system level, Nvidia’s GB200 NVL72 delivers 180 PFLOPs of performance, 40% lower than Huawei’s CloudMatrix 384, Faruqui said.

“However, the downside of Huawei’s CloudMatrix 384 system lies in its power consumption,” he added. “It uses more than four times the power of Nvidia’s GB200 NVL72, and on a per-PFLOP basis, its power consumption is 2.5 times higher, making large-scale deployment difficult.”
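The per-PFLOP figure follows from those ratios. A rough check, taking the “more than four times” power figure and the roughly 300-versus-180 PFLOPs compute gap implied above as given:

```python
# Check of the per-PFLOP power claim, using only ratios cited here.
power_ratio = 4.0           # CloudMatrix 384 vs. GB200 NVL72 system power (">4x")
compute_ratio = 300 / 180   # ~1.67x more system-level PFLOPs for Huawei

power_per_pflop_ratio = power_ratio / compute_ratio
print(f"Power per PFLOP, Huawei vs. Nvidia: ~{power_per_pflop_ratio:.1f}x")  # ~2.4x, near the cited 2.5x
```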

This high power draw could limit large-scale deployments, although it may be less of a barrier in China, where electricity can cost as little as a quarter of what it does in the US.

“Two other major issues limiting the scale-up of Huawei’s CloudMatrix 384 system are the total cost of ownership across CAPEX and OPEX at the system level, and the availability of compute and HBM chips,” Faruqui added.

Challenging software dominance

While Huawei’s CloudMatrix 384 aims to rival Nvidia’s hardware performance, closing the gap on the software front is a far more complex challenge.

Nvidia’s closed CUDA ecosystem embeds significant switching costs for developers. Moving away from CUDA often means rewriting large portions of code, forfeiting access to highly optimized libraries, and giving up support from a vast and experienced developer community built around Nvidia’s tools.

However, in the post-2025 “open-source era”, the AI development landscape is shifting.

Faruqui noted that many machine learning developers no longer write code directly in CUDA. Instead, they use higher-level languages such as Python, relying on frameworks like PyTorch and JAX. These frameworks abstract much of the underlying hardware, making it easier to shift between platforms.
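In practice, that abstraction often reduces the hardware decision to a single device selection in the training script. The sketch below is illustrative only: it assumes a vendor plugin such as Huawei’s torch_npu adapter is installed and registers an “npu” device type; the model and training code are identical whichever backend is chosen.

```python
import torch
import torch.nn as nn

def pick_device() -> torch.device:
    """Select whichever accelerator backend is available."""
    if torch.cuda.is_available():  # Nvidia GPUs via CUDA
        return torch.device("cuda")
    # Assumption: Huawei's torch_npu plugin, if installed, patches torch
    # with an `npu` namespace and registers the "npu" device type.
    if hasattr(torch, "npu") and torch.npu.is_available():
        return torch.device("npu")
    return torch.device("cpu")

device = pick_device()
model = nn.Linear(1024, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# One training step; nothing here is specific to any one accelerator.
x = torch.randn(32, 1024, device=device)
y = torch.randint(0, 10, (32,), device=device)
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```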

“Huawei’s AI software stack is expanding with a growing suite of tools designed to mirror the utility of CUDA’s broader ecosystem, along with deeper integration with PyTorch, the most widely adopted machine learning framework, which by default pairs seamlessly with CUDA,” Faruqui said.

In addition, Huawei is investing in ONNX (Open Neural Network Exchange), a cross-platform standard that enables models trained on other hardware to run efficiently on Huawei chips.
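As a small, concrete illustration of that portability, a PyTorch model can be exported to the ONNX format with the framework’s built-in exporter; the resulting file is hardware-neutral, and running it on Ascend silicon would then be handled by Huawei’s ONNX-compatible tooling, which is not shown here.

```python
import torch
import torch.nn as nn

# Export a small model to ONNX. The .onnx file describes the network in a
# hardware-neutral way; any ONNX-compatible runtime can then execute it.
model = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

dummy_input = torch.randn(1, 1024)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["features"],
    output_names=["logits"],
    dynamic_axes={"features": {0: "batch"}, "logits": {0: "batch"}},
)
```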

While Nvidia’s ecosystem still presents a significant barrier, Huawei’s strategy of embracing open standards and improving compatibility with widely adopted frameworks is gradually lowering the hurdles for enterprise adoption.

Source: Network World