
Huawei Cloud has announced its AI Compute Service powered by CloudMatrix384. The Huawei CloudMatrix supernode will be upgraded from 384 cards to 8,192 cards, and the supernodes can support hyperscale clusters of 500,000 to 1 million cards, providing the robust AI compute that is an invaluable resource in the intelligent era. Huawei Cloud also announced an innovative memory-storage offering, the Elastic Memory Service (EMS), an industry first that expands video RAM (VRAM) with host memory. This drastically reduces the latency of multi-turn conversations with foundation models, greatly improving user experience.
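Huawei has not published EMS's internals, but the general technique the description points to, spilling the key-value (KV) caches of idle conversations from VRAM to host memory so a returning session avoids recomputing its history, can be sketched as follows. All class and method names here are hypothetical, for illustration only:

```python
import torch

class KVCacheOffloader:
    """Minimal sketch of EMS-style VRAM extension (not Huawei's actual
    implementation): hot KV caches stay in GPU memory, idle ones are
    spilled to host DRAM and pulled back when the conversation resumes."""

    def __init__(self, device: str = "cuda"):
        self.device = device
        self.gpu_cache: dict[str, torch.Tensor] = {}   # hot sessions, in VRAM
        self.host_cache: dict[str, torch.Tensor] = {}  # idle sessions, in DRAM

    def store(self, session_id: str, kv: torch.Tensor) -> None:
        # A turn just finished: keep its KV tensor resident in VRAM.
        self.gpu_cache[session_id] = kv

    def evict(self, session_id: str) -> None:
        # Under VRAM pressure: move an idle session's KV to pinned host memory.
        kv = self.gpu_cache.pop(session_id)
        self.host_cache[session_id] = kv.cpu().pin_memory()

    def fetch(self, session_id: str) -> torch.Tensor:
        # Next turn of a conversation: reload the cached KV from DRAM
        # instead of re-running prefill over the whole chat history.
        if session_id in self.gpu_cache:
            return self.gpu_cache[session_id]
        kv = self.host_cache.pop(session_id).to(self.device, non_blocking=True)
        self.gpu_cache[session_id] = kv
        return kv
```

The latency win comes from the fetch path: a DRAM-to-VRAM copy is far cheaper than recomputing attention over an entire multi-turn history.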
Huawei Cloud has deployed fully liquid-cooled AI data centers in Guizhou, Inner Mongolia, and Anhui in China. These AI data centers support 80 kW of heat dissipation per cabinet, reduce power usage effectiveness (PUE) to 1.1, and offer AI-enabled O&M. Enterprises therefore do not need to retrofit traditional data centers or build new ones; they need only a pair of optical fibers to connect to a data center and access efficient AI compute, as well as full-stack dedicated AI cloud services, on Huawei Cloud.
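For context, PUE is the ratio of total facility power to IT equipment power, so a PUE of 1.1 leaves only about 10% overhead for cooling and power delivery. A quick back-of-the-envelope check (the 8 kW overhead figure below is illustrative, not from Huawei):

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_load_kw

# A fully loaded 80 kW cabinet at PUE 1.1 draws about 88 kW in total,
# i.e. only ~8 kW of cooling and power-delivery overhead per cabinet.
it_load = 80.0    # kW per cabinet, from the article
overhead = 8.0    # kW, illustrative assumption
print(pue(it_load + overhead, it_load))  # 1.1
```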
Zhang Ping’an pointed out that Huawei Cloud’s AI Token Service abstracts away the underlying technical complexity and delivers the final AI computing results directly to users, letting them consume inference compute in the most efficient way possible. The CloudMatrix384 supernode fully pools compute, memory, and storage resources; decouples compute tasks, storage tasks, and AI expert systems; and converts serial tasks into distributed parallel tasks, greatly improving the system’s inference performance. Across inference workloads with different latency requirements, such as online, nearline, and offline inference, CloudMatrix384 delivers an average per-card inference performance 3 to 4 times that of the NVIDIA H20.
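Huawei has not detailed the supernode’s scheduler, but the claimed serial-to-parallel conversion follows a familiar pattern: once subtasks are decoupled from fixed resources, independent work can run concurrently instead of back-to-back. A toy sketch of that pattern, with all names hypothetical and nothing here representing Huawei’s implementation:

```python
import asyncio

async def fetch_weights(expert: str) -> str:
    await asyncio.sleep(0.2)             # simulated storage read
    return f"weights[{expert}]"

async def run_expert(expert: str, tokens: list[int]) -> str:
    weights = await fetch_weights(expert)
    await asyncio.sleep(0.3)             # simulated compute
    return f"{weights} -> logits for {len(tokens)} tokens"

async def serial(experts: list[str], tokens: list[int]) -> list[str]:
    # Serial design: total latency is the sum of all expert latencies.
    return [await run_expert(e, tokens) for e in experts]

async def parallel(experts: list[str], tokens: list[int]) -> list[str]:
    # Decoupled design: experts run concurrently, so latency is roughly
    # the slowest single expert rather than the sum of all of them.
    return await asyncio.gather(*(run_expert(e, tokens) for e in experts))

async def main() -> None:
    experts = ["E0", "E1", "E2", "E3"]
    print(await parallel(experts, tokens=[1, 2, 3]))

asyncio.run(main())
```

With four experts, the serial version takes about 2 seconds while the parallel version takes about 0.5 seconds, which is the kind of gain resource pooling and task decoupling are meant to unlock at much larger scale.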
Image: Zhang Ping’an, Huawei’s Executive Director of the Board and CEO of Huawei Cloud