
AWS announces EFA update for scalability with AI/ML applications

AWS announces the launch of a new interface type that decouples the Elastic Fabric Adapter (EFA) from the Elastic Network Adapter (ENA). EFA provides the high-bandwidth, low-latency networking crucial for scaling AI/ML workloads. The new interface type, “EFA-only”, lets you create a standalone EFA device on secondary interfaces, so you can scale your compute clusters to run AI/ML applications without straining your private IPv4 address space or running into the IP routing challenges associated with Linux.

Previously, each EFA interface was coupled with an ENA device, which consumed a private IP address. This could become a scaling limit as AI/ML model training jobs grow. Linux could also introduce routing challenges when multiple interfaces with private IPs were used, such as packet drops caused by source IP mismatches and hostname mapping problems. EFA-only interfaces avoid these issues: the EFA device is not assigned an IP address because it uses the Scalable Reliable Datagram (SRD) protocol, which addresses traffic by MAC address. An EFA-only interface can only be configured as a secondary interface; the primary interface must be either EFA coupled with ENA or ENA alone, since ENA is required for TCP/IP VPC routing.
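For illustration, here is a minimal boto3 sketch of launching an EFA-supported instance with a primary ENA-coupled EFA interface and a secondary EFA-only interface. The AMI, subnet, security group IDs, and instance type are placeholders, and the “efa-only” interface type string is assumed from this announcement; check the EFA documentation for the exact values supported in your Region.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
    InstanceType="p5.48xlarge",        # placeholder EFA-supported instance type
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[
        {
            # Primary interface: EFA coupled with ENA. It carries TCP/IP VPC
            # traffic and therefore consumes a private IPv4 address.
            "DeviceIndex": 0,
            "NetworkCardIndex": 0,
            "InterfaceType": "efa",
            "SubnetId": "subnet-0123456789abcdef0",   # placeholder subnet
            "Groups": ["sg-0123456789abcdef0"],       # placeholder security group
        },
        {
            # Secondary interface: EFA-only. No ENA device and no IP address;
            # SRD traffic is addressed by MAC, so it does not consume IPv4 space.
            "DeviceIndex": 1,
            "NetworkCardIndex": 1,
            "InterfaceType": "efa-only",              # assumed value per this announcement
            "SubnetId": "subnet-0123456789abcdef0",
            "Groups": ["sg-0123456789abcdef0"],
        },
    ],
)

print(response["Instances"][0]["InstanceId"])
```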

EFA-only is available on all EFA-supported instances in all AWS Regions, including the AWS GovCloud (US) Regions and the AWS China Regions. You can enable EFA at no additional cost to run your AI/ML workloads at scale. To learn more, see the EFA documentation.
 

Source: Amazon AWS
