Modernizing the Data Center with Accelerated Networking

Accelerated networking combines CPUs, GPUs, and DPUs (data processing units) or SuperNICs into an accelerated computing fabric designed specifically to optimize networking workloads. It uses specialized hardware to offload demanding tasks from the CPU, enhancing server capabilities. As AI and other new workloads continue to grow in complexity and scale, the need for accelerated networking becomes paramount.

Data centers are the new unit of computing, and modern workloads are beginning to challenge network infrastructure as networking services place additional strain on the CPU. An agile, automated, and programmable network framework, built with accelerators and offloads, is key to unlocking the full potential of AI technologies and driving innovation.

This post explores the benefits and implementation tactics of accelerated networking technologies in data centers, highlighting their role in enhancing performance, scalability, and efficiency.

Accelerating your network

Network acceleration requires optimizing every aspect of the network, including processors, network interface cards (NICs), switches, cables, optics, and networking acceleration software. Leveraging lossless networking, remote direct memory access (RDMA), adaptive routing, congestion control, performance isolation, and in-network computing will help organizations unlock the full potential of modern applications, including AI.

Maximum efficiency across shared networks can be achieved by properly controlling data injection rates. When dealing with large data flows, Ethernet switches that implement adaptive routing algorithms can dynamically load-balance traffic across the network, preventing congestion and reducing latency. Switch multipathing and packet spraying further enhance network efficiency, ensuring timely data arrival and minimizing bottlenecks. These techniques prevent data collisions between the switch and NICs or DPUs, while traffic flow isolation ensures timely delivery by preventing one flow from negatively impacting others.
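
To make the idea concrete, here is a minimal Python sketch (not vendor code) contrasting static ECMP-style hashing with an adaptive policy that steers each packet to the least-loaded uplink. The link names, flow sizes, and load counters are illustrative assumptions.

```python
import hashlib

# Illustrative uplinks with current load in bytes; the values are made up.
links = {"uplink0": 0, "uplink1": 0, "uplink2": 0, "uplink3": 0}

def ecmp_pick(flow_id: str) -> str:
    """Static ECMP: hash the flow ID; a few elephant flows can pile onto one link."""
    idx = int(hashlib.md5(flow_id.encode()).hexdigest(), 16) % len(links)
    return sorted(links)[idx]

def adaptive_pick() -> str:
    """Adaptive routing: send the next packet to the least-loaded link."""
    return min(links, key=links.get)

flows = [("flow-a", 9000), ("flow-b", 9000), ("flow-c", 1500), ("flow-d", 1500)]

for name, size in flows * 4:                 # replay the flows a few times
    link = adaptive_pick()                   # swap in ecmp_pick(name) to compare
    links[link] += size

print(links)  # adaptive spreading keeps the per-link load roughly even
```

Running it with adaptive_pick keeps the per-link byte counts close together, whereas the hash-based picker can land both large flows on the same link.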

Another optimization technique is to deploy SuperNICs and DPUs. A SuperNIC is a type of network accelerator for AI cloud data centers that delivers robust and seamless connectivity between GPU servers. A DPU is a rapidly emerging class of processor that enables enhanced, accelerated networking. With the help of SuperNICs and DPUs, workloads can be offloaded from the host processor to accelerate communications, enabling data centers to cope with the ever-increasing need to move data.

To implement accelerated networking, consider the following techniques.

Accelerated services 

Workloads have undergone a significant paradigm shift toward decentralization, split across containers and micro-segmented services. This has caused a dramatic increase in server-to-server (east-west) traffic within the data center network.

AI workloads are a distributed computing problem, requiring many interconnected servers or nodes working together. This places tremendous strain on the network and the CPU. Workload decentralization requires re-examining network infrastructure and adding accelerators to relieve the CPU and GPUs from processing networking, storage, and security services, freeing the CPU to focus on application workloads. Acceleration ensures high-speed, low-latency data transfers between nodes, enabling efficient workload distribution and faster model training.
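
As one common example, a framework such as PyTorch consumes this fabric through its NCCL backend: DistributedDataParallel all-reduces gradients across nodes, and an RDMA-capable network lets those exchanges largely bypass the host CPU. The sketch below is a minimal, untuned illustration; the model, sizes, and launch settings are placeholder assumptions.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL runs the gradient all-reduce; over an RDMA-capable fabric it can
    # use GPUDirect so the exchange largely bypasses the host CPU.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(32, 1024, device=local_rank)
    loss = model(x).sum()
    loss.backward()          # gradients are all-reduced across nodes here
    opt.step()

if __name__ == "__main__":
    main()   # launch with: torchrun --nproc_per_node=<gpus> this_script.py
```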

Network abstraction 

The move to highly virtualized data centers and cloud models is straining legacy networks. Traditional data center networks were not designed to support the dynamic nature of today’s virtualized workloads. Network abstraction, including network overlays, can run multiple separate, discrete virtualized network layers on top of the physical network. These are crucial in providing flexibility, scale, and acceleration. However, if not implemented properly, they can impede network flows. 
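
As a concrete illustration of an overlay, the Scapy sketch below wraps a tenant Ethernet frame in a VXLAN header carried over UDP on the physical underlay. The addresses and the VNI value are made-up, and this is only a packet-construction example, not a deployment recipe.

```python
from scapy.all import Ether, IP, UDP
from scapy.layers.vxlan import VXLAN

# Underlay: the physical network only sees a UDP packet between tunnel endpoints.
underlay = Ether() / IP(src="10.0.0.1", dst="10.0.0.2") / UDP(dport=4789)

# Overlay: the tenant frame rides inside, identified by a virtual network ID (VNI).
tenant_frame = Ether(src="02:00:00:00:00:01", dst="02:00:00:00:00:02") / \
               IP(src="192.168.1.10", dst="192.168.1.20")

packet = underlay / VXLAN(vni=5001) / tenant_frame
packet.show()   # prints the stacked headers: Ether/IP/UDP/VXLAN/Ether/IP
```

The underlay only ever forwards the outer UDP packet, which is what lets many isolated virtual networks share one physical fabric; the cost is the extra encapsulation work, which is exactly the kind of processing DPUs and SuperNICs can offload.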

Network optimization

The vast amount of data being collected and processed has moved workloads into a data-centric era. The availability of large datasets, combined with technological advances such as machine learning and generative AI, increases the need for more data to feed learning algorithms. One ramification of this data explosion is the need to move, process, retrieve, and store ever-larger datasets.

Lossless networking can guarantee accurate data transmission without any loss or corruption and is vital for moving, processing, retrieving, and storing these large datasets. RDMA technology enhances networking performance by enabling direct data transfers between memory locations without involving CPUs. The combination of lossless networking and RDMA can optimize data transfer efficiency and reduce CPU and GPU idle time, enabling the efficient movement of data to feed modern applications.
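
The following back-of-envelope Python sketch captures the structural difference under a deliberately simplified model: a kernel TCP send copies the buffer from user space into kernel memory and does per-packet work on the CPU, while an RDMA write lets the NIC DMA directly from registered application memory after a single posted work request. The counts are illustrative, not measurements.

```python
def tcp_send_path(payload_bytes: int) -> dict:
    """Kernel sockets: the CPU copies data user->kernel and handles per-packet work."""
    mtu = 1500
    packets = -(-payload_bytes // mtu)            # ceiling division
    return {"cpu_copies_bytes": payload_bytes,    # at least one user->kernel copy
            "cpu_packet_events": packets}         # segmentation/interrupt-style work

def rdma_write_path(payload_bytes: int) -> dict:
    """RDMA: the NIC DMAs directly from registered memory; the CPU posts one work request."""
    return {"cpu_copies_bytes": 0,
            "cpu_packet_events": 1}               # a single posted work request

for gib in (1, 8):
    size = gib * 2**30
    print(gib, "GiB:", tcp_send_path(size), rdma_write_path(size))
```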

End-to-end stack optimization

Modern workloads have unique network traffic patterns. Traditional workloads generate traffic patterns with many flows, small packets, and low variance. Traffic for modern applications involves large packets, fewer flows, and high variance, including elephant flows and frequent changes in traffic patterns. 

To account for this, networks must be architected with an optimized end-to-end stack that accelerates these new traffic patterns. Adaptive routing algorithms dynamically load-balance data across the network, preventing congestion and high latency, while congestion control mechanisms such as explicit congestion notification (ECN) ensure efficient data flow and minimize performance degradation.
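
The sketch below shows the basic shape of an ECN-driven rate controller, loosely in the spirit of algorithms such as DCQCN: the sender cuts its rate multiplicatively when marks arrive and recovers additively when they stop. The gains, line rate, and mark pattern are arbitrary assumptions for illustration.

```python
def adjust_rate(rate_gbps: float, ecn_marked: bool,
                line_rate: float = 400.0,
                cut: float = 0.5, step: float = 5.0) -> float:
    """One control step: multiplicative decrease on an ECN mark, additive increase otherwise."""
    if ecn_marked:
        return max(rate_gbps * (1.0 - cut), 1.0)    # back off, but keep a floor
    return min(rate_gbps + step, line_rate)          # probe back toward line rate

rate = 400.0
marks = [False, True, True, False, False, False, True, False]  # made-up feedback
for m in marks:
    rate = adjust_rate(rate, m)
    print(f"marked={m!s:5}  rate={rate:6.1f} Gb/s")
```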

In-network computing

Modern workloads process large datasets with highly parallelized algorithms that demand ultra-fast processing, and they are therefore more complex. As computing requirements grow, in-network computing offers hardware-based acceleration of collective communication operations, offloading collectives from the CPU to the network itself. This significantly improves the performance of distributed AI model training, reduces communication overhead, eliminates the need to send data multiple times between endpoints, and accelerates model convergence.
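
As a rough counting exercise (not a model of any specific product), the sketch below compares the per-node traffic and sequential steps of a host-based ring all-reduce against a switch-side reduction in which each node sends its gradient once and receives the aggregate once; headers and overlap are ignored, and the gradient size is an assumed value.

```python
def ring_allreduce(gradient_bytes: int, nodes: int):
    """Host-based ring all-reduce: 2*(N-1) sequential steps, ~2x the gradient sent per node."""
    steps = 2 * (nodes - 1)
    sent = 2 * (nodes - 1) * gradient_bytes // nodes
    return steps, sent

def in_network_reduce(gradient_bytes: int):
    """Switch-side reduction: send the gradient up once, get the aggregate back once."""
    return 2, gradient_bytes

grad = 1 * 2**30          # 1 GiB of gradients per iteration (illustrative)
for n in (8, 64, 512):
    r_steps, r_sent = ring_allreduce(grad, n)
    s_steps, s_sent = in_network_reduce(grad)
    print(f"{n:4d} nodes: ring {r_steps:4d} steps, {r_sent/2**30:.2f} GiB sent | "
          f"in-network {s_steps} steps, {s_sent/2**30:.2f} GiB sent")
```

The injected traffic roughly halves, but the larger effect is the drop in sequential communication steps and the removal of the reduction arithmetic from the CPU and GPU.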

Network acceleration reduces CPU utilization, leaving more capacity for CPUs to process application workloads. It also reduces jitter to improve data streams, and offers higher overall throughput, which enables more data to be processed faster.

Summary

Techniques for network acceleration continue to evolve and are becoming more specialized. The newest evolution addresses AI workloads, which require consistent, predictable performance, compute and power efficiency, and the ability to run in multi-tenant environments.

To learn more about building the most efficient, high-performance networks with acceleration, see the two whitepapers, NVIDIA Spectrum-X Network Platform Architecture and Networking for the Era of AI: The Network Defines the Data Center, and the ebook, Modernize Your Data Center with Accelerated Networking.

Source: NVIDIA