The long-term impacts of AI on networking

AI is either going to spawn a golden age of profits and benefits or kill us all off, according to popular viewpoints on the subject. The latter view has been at least somewhat discredited; if you yell “AI” in a crowded theater these days, hardly anyone runs for the exits. The former is still going strong, and so network vendors have been quick to claim giant revenue growth potential from AI-driven changes. But wishing doesn’t make it so.

To almost everyone, AI really means generative AI services ranging from search augmentation to copilot tools associated with everything from writing emails to writing code. What delivers this massive intelligence injection? Networks. And so network traffic will explode from AI usage, and so will network equipment spending, right?

Of 195 enterprises who offered me comments on the impact of generative AI services on their wide-area network traffic, guess how many said it had a “material impact.” Answer: None.

Think about it based on your own experience with the most common form of generative AI: the augmentation of search. You now get an AI summary first, followed by the traditional results. Suppose the summary is good; the only likely outcome is that you’d need the other results less, and you’d be less likely to look beyond that first page of results. Traffic would then be lower, not higher. Suppose the summary isn’t useful. You’d stop reading it, and you’d simply see fifty or more extra lines taken up by something you’re ignoring, until search providers realized this and stopped paying gazillions for AI GPU hosting to generate them. Either way, no material impact.
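A rough sketch of the arithmetic shows why. Both figures below are assumptions chosen only to show the scale, not measurements of any search provider:

```python
# Back-of-the-envelope check with illustrative numbers (both figures are
# assumptions, not measurements): how much WAN traffic does an AI search
# summary actually add to a results page?

summary_chars = 1_500           # assumed length of a typical AI summary
summary_bytes = summary_chars   # roughly one byte per character of plain text
page_weight_bytes = 2_000_000   # assumed total weight of a results page
                                # (HTML, scripts, images, trackers)

added_fraction = summary_bytes / page_weight_bytes
print(f"The summary adds roughly {added_fraction:.2%} to the page weight")
# -> a fraction of a percent, and traffic can even fall if users stop
#    clicking through to further results
```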

Overall, enterprises think that the exchange of prompts/queries and results with generative AI won’t impact their networking. Of the 195, 28 said they could envision new applications, ones that involved things like AI analysis of video, that could impact this sort of traffic, but they believed that these applications would be run on their own edge facilities, near the data sources, and thus wouldn’t impact the wide-area traffic or expand their use of network services or capital equipment.

If supporting the use of generative AI is a networking dud, is the whole AI-network connection as much of a myth as the humanity-destroying AI of the doomsayers? No, because there’s still the challenge of training and running the AI model itself.

Only 21 of the enterprises who offered AI network comments were doing any AI self-hosting, but all who did and almost all of those who were seriously evaluating self-hosting said that AI hosting meant a specialized cluster of computers with GPUs, and that this cluster would have to be connected both within itself and to the main points of storage for their core business data. They all saw this as a whole new networking challenge.

Every enterprise who self-hosted AI told me the mission demanded more bandwidth for “horizontal” (east-west) traffic than their normal applications required, more than their current data center network was built to support. Ten of the group said this meant the “cluster” of AI servers would need faster Ethernet connections and higher-capacity switches. Everyone agreed that a real production deployment of on-premises AI would need new network devices, and fifteen said they bought new switches even for their large-scale trials.

The biggest data center network problem I heard from those with experience was that they had built more of an AI cluster than they needed. Running a popular LLM, they said, requires hundreds of GPUs and servers, but small language models can run on a single system, and a third of current self-hosting enterprises said they believed it was best to start small, with small models, and build up only when they had experience and could demonstrate a need. This same group also pointed out that control was needed to ensure only truly useful AI applications were run. “Applications otherwise build up, exceed, and then increase the size of the AI cluster,” said users.

Every current AI self-hosting user said that it was important to keep AI horizontal traffic off their primary data center network because of its potential congestion impact on other applications. Horizontal traffic from hosted generative AI can be enormous and unpredictable; one enterprise said that their cluster could generate as much horizontal traffic as their whole data center, but in bursts rarely lasting more than a minute. They also said that latency in this horizontal burst could hamper application value significantly, stretching out both the result delivery and the length of the burst. They said that analyzing AI cluster flows was critical in picking the right cluster network hardware, and that they found they “knew nothing” about AI network needs until they ran trials and tests.
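What that flow analysis looks like in practice varies, but the core of it is separating average load from burst load. Here is a minimal, hypothetical sketch; the synthetic samples stand in for real switch port counters or sFlow/IPFIX records:

```python
# Hypothetical sketch of AI-cluster flow analysis. Real inputs would come
# from switch port counters or flow telemetry; a synthetic bursty series is
# generated here only so the script runs on its own.
import random
import statistics

random.seed(1)

# One sample per second of aggregate east-west (horizontal) throughput, in Gbps.
samples = []
for second in range(3600):
    if second % 600 < 45:                          # a ~45-second burst every 10 minutes
        samples.append(random.uniform(300, 400))   # burst traffic
    else:
        samples.append(random.uniform(5, 20))      # background traffic

average = statistics.fmean(samples)
p99 = statistics.quantiles(samples, n=100)[98]
peak = max(samples)

print(f"average {average:.0f} Gbps, p99 {p99:.0f} Gbps, peak {peak:.0f} Gbps")
# Sizing the fabric (or any shared data center links) to the average would
# congest every burst; it's the p99/peak figures that drive the hardware choice.
```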

The data relationship between the AI cluster and enterprise core data repositories is complicated, and it’s this relationship that determines how much the AI cluster impacts the rest of the data center. The challenge here is that both the application(s) being supported and the manner of implementation have a major impact on how data moves from data center repositories to AI.

AI/ML applications of very limited scope, such as the use of AI/ML in operations analysis for IT, networking, or security, are real-time and require access to real-time data, but this is usually low-volume telemetry, and users report it has little network impact. Generative AI applications targeting business analytics need broad access to core business data, but they often need primarily historical summaries rather than full transactional detail, which means it’s often possible to keep this condensed source data as a copy within the AI cluster.
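One way to picture that “condensed copy” approach is a periodic summarization job that lands in the cluster’s own storage. The sketch below uses SQLite in memory and invented table names purely for illustration:

```python
# Illustrative sketch (table and column names are invented): build a condensed
# daily summary of transactional data once, and keep that copy in the AI
# cluster's storage, so analytics prompts don't pull raw transactions across
# the data center network.
import sqlite3

core = sqlite3.connect(":memory:")       # stands in for the core data repository
core.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
core.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2024-06-01", "east", 120.0), ("2024-06-01", "west", 80.0),
    ("2024-06-02", "east", 95.0),  ("2024-06-02", "west", 110.0),
])

# The condensed copy: one row per day/region instead of every transaction.
summary_rows = core.execute(
    "SELECT day, region, SUM(amount), COUNT(*) FROM sales GROUP BY day, region"
).fetchall()

cluster = sqlite3.connect(":memory:")    # stands in for storage inside the AI cluster
cluster.execute(
    "CREATE TABLE sales_daily (day TEXT, region TEXT, total REAL, txns INTEGER)")
cluster.executemany("INSERT INTO sales_daily VALUES (?, ?, ?, ?)", summary_rows)

print(cluster.execute("SELECT * FROM sales_daily ORDER BY day, region").fetchall())
```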

Where full transactional data is needed, real AI users recommend thinking in terms of what the AI community calls RAG, or retrieval-augmented generation. With RAG, the AI model uses traditional database queries to “flesh out” its training data with current business data at query time, which means it’s possible to design the whole process to minimize the amount of data drawn out by an AI prompt. As one user put it: “If you let your AI model boil your whole core data ocean, you’re going to create a lot of traffic and use up a lot of your data center network capacity.” The right AI application design, says that user, is more important than the network design in optimizing the network cost of AI.
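To make the idea concrete, here is a deliberately tiny RAG sketch. Real deployments use a vector or full-text index and an actual language model; keyword matching and a stub “model” stand in here, and the records are invented, so only the shape of the data flow is real:

```python
# Minimal RAG sketch (all data and the "model" are stand-ins). The point it
# illustrates: only the few retrieved records travel to the AI model, not the
# whole core data repository.

def retrieve(query, records, k=2):
    """Score records by keyword overlap with the query and return the top k.
    A production system would use a vector or full-text index instead."""
    terms = set(query.lower().split())
    return sorted(records, key=lambda r: -len(terms & set(r.lower().split())))[:k]

def generate(prompt):
    """Stand-in for a call to a hosted or self-hosted language model."""
    return f"[model answer grounded in: {prompt[:80]}...]"

records = [
    "Q2 revenue in the east region rose 4 percent on strong services sales",
    "Q2 revenue in the west region fell 2 percent on weaker hardware demand",
    "Headcount grew 3 percent in Q2, mostly in support and field engineering",
]

query = "How did east region revenue change in Q2?"
context = retrieve(query, records)          # a narrow pull, not a boiled ocean
prompt = "Context: " + " | ".join(context) + "\nQuestion: " + query
print(generate(prompt))
```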

The wider impact of AI self-hosting doesn’t come from its use, though, but from its training. All the current self-hosted AI users I’ve talked with say that training a model is almost surely more impactful on the enterprise network than running it. Here again, those with experience said it was smarter to start an AI journey with a pre-trained model, rely on RAG to integrate company data, and contain the training problem that way, rather than to attempt a full in-house training of an LLM.
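A bit of rough arithmetic shows why training dominates: data-parallel training exchanges gradients for every model parameter on every step, while inference mostly moves prompts and results. All the figures below are assumptions chosen only to show the scale:

```python
# Rough, assumption-laden arithmetic on training's horizontal traffic.
params = 7e9          # assumed model size: 7 billion parameters
bytes_per_grad = 2    # fp16 gradients
gpus = 16             # assumed data-parallel width
steps = 10_000        # assumed length of a fine-tuning run

# A ring all-reduce moves roughly 2*(G-1)/G of the gradient volume
# per GPU on every training step.
per_gpu_per_step = 2 * (gpus - 1) / gpus * params * bytes_per_grad
run_total_tb = per_gpu_per_step * gpus * steps / 1e12

print(f"~{per_gpu_per_step / 1e9:.0f} GB per GPU per step, "
      f"~{run_total_tb:,.0f} TB across the fabric over the run")
# Inference, by contrast, moves kilobytes of prompt and result per request
# (plus whatever RAG retrieves), which is why a pre-trained model plus RAG
# contains the network problem.
```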

Every enterprise with a current view on AI networking says that too much attention is paid to the “networking” part and not enough to the “AI.” Like any network mission, AI demands an understanding of both the technology and the traffic implications before you start running cables and connecting things. Those who’ve already done an in-house AI project agree that they’d have done things better, and cheaper, if they’d done more work to understand what AI really needs in both hosting and connectivity.

Source: Network World