
The multivendor Ultra Accelerator Link (UALink) consortium has published its first specification, aimed at delivering an open standard interconnect for AI clusters and providing an alternative to Nvidia’s proprietary NVLink. The UALink 200G 1.0 Specification defines a low-latency, high-bandwidth interconnect for communication between accelerators and switches in AI computing pods. UALink asserts that it is the only open scale-up interconnect for next-generation AI workloads.
“AI models are rapidly growing, demanding higher compute, memory, and interconnect performance,” UALink Consortium wrote in a white paper describing the spec. “The cost and complexity of delivering reliable scale‐up solutions is a significant burden for the entire industry. Scale‐up solutions are critical to distribute AI models across a Pod with 100s of accelerators. There is a growing demand from the industry to establish standards‐based scale‐up network solutions for training and inference workloads. The UALink Consortium’s mission is to establish an open standard to deliver a scalable, performant, resilient and cost‐effective networking solution for scale‐up connections.”
The UALink 200G 1.0 Specification was crafted by many of the group’s 75 members, which include AMD, Broadcom, Cisco, Google, HPE, Intel, Meta, Microsoft and Synopsys. According to UALink, it lays out the technology needed to support a maximum data rate of 200 gigatransfers per second (GT/s) per channel, or lane, between accelerators and switches in a pod of up to 1,024 accelerators.
The UALink specification is based on the standard IEEE 802.3 Ethernet PHY, and UALink lanes can be grouped into Links of different widths: a single-lane Link (x1 Link), a dual-lane Link (x2 Link), or a quad-lane Link (x4 Link). A group of four lanes constitutes a Station, offering a maximum bandwidth of 800 Gbps in each of the transmit and receive directions. The number of accelerators, and the bandwidth allocated to each one, can be scaled to meet the demands of AI applications, according to UALink.
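The lane arithmetic above can be made concrete with a short sketch. It assumes a nominal 200 Gbps per lane in each direction, a figure inferred from the article's 800 Gbps per four-lane Station, and it ignores encoding and protocol overhead; the names `LANE_GBPS`, `link_bandwidth_gbps` and `station_bandwidth_gbps` are hypothetical and do not come from the specification.

```python
# Sketch: nominal UALink bandwidth by lane grouping.
# Assumption (not from the spec text): 200 Gbps per lane per direction,
# inferred from the article's 800 Gbps per four-lane Station.
# Encoding and protocol overhead are ignored.

LANE_GBPS = 200          # hypothetical nominal per-lane rate, each direction
LANES_PER_STATION = 4    # a Station is a group of four lanes

# Link widths named in the article: x1, x2 and x4
LINK_WIDTHS = {"x1": 1, "x2": 2, "x4": 4}


def link_bandwidth_gbps(width: str) -> int:
    """Nominal one-direction bandwidth of a Link of the given width."""
    return LINK_WIDTHS[width] * LANE_GBPS


def station_bandwidth_gbps() -> int:
    """Nominal one-direction bandwidth of a full four-lane Station."""
    return LANES_PER_STATION * LANE_GBPS


if __name__ == "__main__":
    for name in LINK_WIDTHS:
        print(f"{name} Link: {link_bandwidth_gbps(name)} Gbps per direction")
    print(f"Station: {station_bandwidth_gbps()} Gbps per direction")
```

Running it prints 200, 400 and 800 Gbps for x1, x2 and x4 Links, and 800 Gbps for a Station, matching the per-direction maximum the article cites. Real throughput will be lower once the overheads this sketch ignores are included.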
The UALink spec also defines security for its AI transmissions.
“The UALink security feature, referred to as UALinkSec, is intended to protect traffic on a UALink network and switches from a physical adversary; the adversary might be present at the time of the attack or may have placed a device (e.g., an interposer) to snoop or tamper with the UALink traffic,” the white paper states. “When enabled, UALinkSec provides data confidentiality and optional data integrity (including replay protection). UALinkSec supports encryption and authentication of all the UPLI protocol channels – requests, read responses, and write responses.”
For now, UALink’s primary goal is to provide an alternative to NVLink, Nvidia’s high-bandwidth, low-latency direct interconnect for CPU-to-GPU and GPU-to-GPU connectivity. In Nvidia systems, NVLink carries scale-up traffic and is typically paired with InfiniBand-based scale-out networks.
Given the spec’s Ethernet heritage, UALink is widely seen as working hand-in-hand with the Ultra Ethernet Consortium (UEC) to extend the massive Ethernet community with AI-focused technology. Many members of the UALink group are also developing UEC specifications, which aim to advance Ethernet at the physical, link, transport and software layers for AI connectivity.
“UALink is at the vanguard of innovation in the artificial intelligence and machine learning domains, providing an open ecosystem path to a dedicated accelerator interconnect leveraging the ubiquitous Ethernet ecosystem,” the UALink group wrote. “By incorporating UALink Switches, accelerators with UALink capability, can expand the scale‐up domain, creating ultra‐high bandwidth multi‐node accelerator pods. UALink also enables a simple software model by supporting load/store operations across an entire pod of up to 1024 accelerators.”
Now a market around UALink needs to develop to be a true alternative to Nvidia. Currently, only UALink member Synopsys has an actual UALink-based product – it rolled out a UALink IP controller, PHY, and verification IP package late last year. Other members are expected to follow suit.
“With the release of the UALink 200G 1.0 Specification, the UALink Consortium’s member companies are actively building an open ecosystem for scale-up accelerator connectivity,” said Peter Onufryk, UALink Consortium president, in a statement. “We are excited to witness the variety of solutions that will soon be entering the market and enabling future AI applications.”
Source: Network World