- Huawei wants UB-Mesh to unify fragmented interconnect standards across massive AI clusters
- The UB-Mesh design blends a CLOS backbone with multi-dimensional rack-level meshes for scalability
- Traditional interconnects grow too expensive in large-scale deployments
Huawei has revealed plans to open source its UB-Mesh interconnect, a system aimed at unifying how processors, memory, and networking gear communicate across massive AI data centers.
The UB-Mesh design combines a CLOS-based backbone at the data hall level with multi-dimensional meshes inside each rack.
By combining these topologies, Huawei claims it can keep costs under control even as system sizes scale into tens of thousands of nodes. It also hopes to address the difficulty of scaling AI workloads, where latency and hardware failures pose obstacles.
Replacing fragmented standards with a single framework
The move is pitched as a way to replace multiple overlapping standards with a single framework, potentially reshaping how large-scale computing infrastructure is built and operated.
In simple terms, Huawei wants to replace today's mix of different connection rules with one universal system, so everything links together more easily and cheaply.
“Next month we have a conference, where we are going to announce that the UB-Mesh protocol will be published and disclosed to anybody like a free license,” said Heng Liao, chief scientist of HiSilicon, Huawei’s processor arm.
“This is a very new technology; we are seeing competing standardization efforts from different camps. […] Depending on how successful we are in deploying actual systems and demand from partners and customers, we can talk about turning it into some kind of standard.”
One of the central arguments behind UB-Mesh is that traditional interconnects grow too expensive at scale, eventually costing more than the accelerators they are meant to connect.
Huawei points to its own demonstrations, where an 8,192-node deployment was used as proof that costs do not have to rise linearly.
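To illustrate the general arithmetic behind that argument, the sketch below compares switch-port counts per node for a non-blocking fat-tree/CLOS fabric against a hybrid layout where intra-rack traffic rides direct mesh links and only a fraction of each node's bandwidth needs spine capacity. This is a generic back-of-the-envelope model, not Huawei's cost analysis; the function names and the spine_fraction parameter are purely illustrative assumptions.

```python
# Toy cost model (illustrative only, not Huawei's figures): compares switch-port
# counts per node for a non-blocking 3-tier fat-tree/CLOS fabric against a hybrid
# design where each rack is a direct mesh and only a fraction of per-node
# bandwidth crosses the rack boundary into the CLOS spine.

def fat_tree_ports_per_host(k: int) -> float:
    """Non-blocking k-ary fat-tree: k^3/4 hosts, 5k^2/4 switches of k ports each."""
    hosts = k ** 3 / 4
    switch_ports = (5 * k ** 2 / 4) * k
    return switch_ports / hosts  # works out to 5.0 switch ports per host, for any k

def hybrid_ports_per_host(k: int, spine_fraction: float) -> float:
    """Hybrid sketch: intra-rack traffic uses direct mesh links (no switch ports);
    only `spine_fraction` of each node's bandwidth needs CLOS switch capacity."""
    return fat_tree_ports_per_host(k) * spine_fraction

if __name__ == "__main__":
    k = 64  # 64-port switches, a common radix
    print(f"Full CLOS fabric:   {fat_tree_ports_per_host(k):.2f} switch ports per node")
    for frac in (0.5, 0.25):
        print(f"Hybrid ({frac:.0%} spine): {hybrid_ports_per_host(k, frac):.2f} switch ports per node")
```

The point of the sketch is simply that every bit of bandwidth kept inside a rack-level mesh is bandwidth that never has to be paid for again in switch ports, optics, and cabling at the spine.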
This is framed as essential for the future of AI systems built with millions of processors, high-speed networking devices, and massive storage arrays akin to the largest SSD systems used in cloud storage operations.
UB-Mesh is part of a broader idea Huawei calls the SuperNode. This refers to a data center-scale cluster where CPUs, GPUs, memory, SSD units, and switches can all operate as if they were inside a single machine.
Bandwidth claims of over one terabyte per second per device and sub-microsecond latency are being positioned as proof that the concept is not only possible but necessary for next-generation computing.
However, standards like PCIe, NVLink, UALink, and Ultra Ethernet already have backing from multiple companies across the semiconductor and networking industries.
The question now is whether the industry will accept a new Huawei-backed protocol or continue favoring standards already supported by a wider range of companies.
Huawei’s proposal, while ambitious, places customers in the position of adopting a protocol owned and controlled by a single supplier.
Even with open-source licensing, there are concerns about long-term interoperability, governance, and geopolitical risk.
That said, while Huawei’s technical capability sounds impressive, its move demands a degree of industry-wide trust and adoption that it has yet to secure.
Via Tom's Hardware