
As artificial intelligence rapidly transforms business operations, enterprises are racing to upgrade their network infrastructure to meet the unprecedented demands of AI workloads.
Enterprise Management Associates (EMA) surveyed 269 IT professionals involved in preparing their networks for AI applications and traffic. Nearly three-fourths (74%) of respondents had at least some AI applications in production, with their corporate AI strategy already in progress. These organizations must equip their networks to keep pace with AI’s growth, and EMA conducted the survey to better understand what steps network leaders are taking to ready their networks to succeed with AI applications and traffic.
“The goal was to get a sense of what early adopters of AI, who are developing, training, and operating AI applications to transform their businesses in some way, are doing to ready their networks to support the types of traffic that such applications generate, because that has been a big topic of conversation,” said Shamus McGillicuddy, research director for the network management practice at EMA, during a webinar discussing the survey results. “AI has very strict requirements for latency, and it can create a lot of congestion. There are a lot of challenges around it, from a performance, availability, and security perspective, as well as bandwidth requirements, all of which impact observability requirements, because you need to be able to manage all of that proactively.”
The EMA survey found that only 49% of respondents believe their data center networks are ready for AI traffic, and just 48% believe their wide area networks are fully prepared. Among the respondents, 42% have established AI centers of excellence to lead their organization’s AI strategy. Respondents also reported that by the end of 2025 they will have several AI technologies in production, including:
- Proprietary large language models (LLMs): 58%
- Machine learning: 51%
- Open source LLMs: 34%
- Agentic AI: 32%
- Retrieval-augmented generation: 18%
Enterprises must also consider how and where to distribute AI workloads. According to the survey, enterprises reported that their training workloads in 2028 will reside in:
- Private data centers: 28.3%
- Traditional public cloud: 36.1%
- GPU as a service specialists: 19.1%
- Edge compute: 16.5%
The respondents indicated that their inference workloads by 2028 will reside in:
- Private data centers: 29.5%
- Traditional public cloud: 35.4%
- GPU as a service specialists: 18.5%
- Edge compute: 16.6%
“There is little variation from training to inference, but the general pattern is workloads are concentrated a bit in traditional public cloud and then hyperscalers have significant presence in private data centers,” McGillicuddy explained. “There is emerging interest around deploying AI workloads at the corporate edge and edge compute environments as well, which allows them to have workloads residing closer to edge data in the enterprise, which helps them combat latency issues and things like that. The big key takeaway here is that the typical enterprise is going to need to make sure that its data center network is ready to support AI workloads.”
AI networking challenges
The popularity of AI doesn’t eliminate the business and technical concerns the technology raises for enterprise leaders.
According to the EMA survey, business concerns include security risk (39%), cost/budget (33%), rapid technology evolution (33%), and networking team skills gaps (29%). Respondents also indicated several concerns around both data center networking issues and WAN issues. Concerns related to data center networking included:
- Integration between AI network and legacy networks: 43%
- Bandwidth demand: 41%
- Coordinating traffic flows of synchronized AI workloads: 38%
- Latency: 36%
WAN issues respondents shared included:
- Complexity of workload distribution across sites: 42%
- Latency between workloads and data at WAN edge: 39%
- Complexity of traffic prioritization: 36%
- Network congestion: 33%
“It’s really not cheap to make your network AI ready,” McGillicuddy stated. “You might need to invest in a lot of new switches and you might need to upgrade your WAN or switch vendors. You might need to make some changes to your underlay around what kind of connectivity your AI traffic is going over.”
Enterprise leaders intend to invest in infrastructure to support their AI workloads and strategies. According to EMA, planned infrastructure investments include high-speed Ethernet (800 GbE) for 75% of respondents, hyperconverged infrastructure for 56% of those polled, and SmartNICs/DPUs for 45% of surveyed network professionals.
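To put the case for high-speed Ethernet in concrete terms, here is a minimal back-of-envelope sketch in Python (all figures are illustrative assumptions, not survey data) estimating how long a large model checkpoint takes to move at common data center link speeds:

```python
# Back-of-envelope transfer-time estimate for a model checkpoint.
# All values below are illustrative assumptions, not survey data.

CHECKPOINT_GB = 500                  # assumed checkpoint size in gigabytes
LINK_SPEEDS_GBPS = [100, 400, 800]   # common data center Ethernet speeds
EFFICIENCY = 0.85                    # assumed usable fraction of line rate

for gbps in LINK_SPEEDS_GBPS:
    effective_gbps = gbps * EFFICIENCY
    seconds = (CHECKPOINT_GB * 8) / effective_gbps  # gigabytes -> gigabits
    print(f"{gbps} GbE: ~{seconds:.0f} s to move a {CHECKPOINT_GB} GB checkpoint")
```

Under these assumptions, the same 500 GB transfer drops from roughly 47 seconds at 100 GbE to about 6 seconds at 800 GbE, which helps explain why high-speed Ethernet tops the investment list.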
Respondents also indicated they plan to adopt network protocols to support AI workloads. For instance, 67% plan to adopt Ethernet and 33% plan to invest in InfiniBand. Nearly two-thirds (64%) will invest in RoCE (RDMA over Converged Ethernet), a network protocol that enables Remote Direct Memory Access over standard Ethernet networks. And 42% said they plan to adopt NVMe over Fabrics, a protocol that extends the Non-Volatile Memory Express standard to allow hosts to access storage devices over a network fabric, such as Ethernet or Fibre Channel.
“Ethernet is definitely dominant in these data centers, but there is a lot of interest in InfiniBand and NVMe over fabric for an alternative approach to handling AI traffic in their data centers,” McGillicuddy said.
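As a practical aside (not part of the survey), RDMA devices on a Linux host, whether RoCE or native InfiniBand, appear under /sys/class/infiniband, and each port’s link_layer attribute distinguishes the two transports. A minimal Python sketch, assuming the standard Linux RDMA sysfs layout:

```python
# List RDMA devices on a Linux host and report their link layer.
# RoCE ports report "Ethernet"; native InfiniBand ports report "InfiniBand".
# Assumes the standard Linux RDMA sysfs layout on a host with RDMA NICs.
from pathlib import Path

RDMA_SYSFS = Path("/sys/class/infiniband")

if not RDMA_SYSFS.exists():
    print("No RDMA devices found (no /sys/class/infiniband).")
else:
    for dev in sorted(RDMA_SYSFS.iterdir()):
        for port in sorted((dev / "ports").iterdir()):
            link_layer = (port / "link_layer").read_text().strip()
            transport = "RoCE" if link_layer == "Ethernet" else "native InfiniBand"
            print(f"{dev.name} port {port.name}: {link_layer} ({transport})")
```

On a host with a RoCE-capable NIC, the output would show the device’s ports with link layer Ethernet; a native InfiniBand HCA would report InfiniBand instead.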
Source: Network World