NCCL Deep Dive: Cross Data Center Communication and Network Topology Awareness

As the scale of AI training increases, a single data center (DC) can no longer deliver the required computational power. Most recent approaches to this challenge rely on multiple data centers, either co-located or geographically distributed. With a recently open-sourced feature, the NVIDIA Collective Communication Library (NCCL) can now communicate across multiple data centers…

Source: NVIDIA