AI inferencing generates an enormous amount of network traffic, and telcos must be able to keep up with that extreme demand. However, traditional networks are estimated to use only one-third of their capacity because they are over-provisioned for peak loads.
Nvidia and SoftBank are promising to address this conundrum: The pair this week announced the first combined AI and 5G telecom network. In a trial, their new AI radio access network (RAN) infrastructure achieved carrier-grade performance while simultaneously supporting AI inference workloads. SoftBank said it expects a return of up to 219% on every AI-RAN server it deploys.
“The bandwidth increase will allow telcos to not only monetize AI infrastructure, but also monetize additional bandwidth,” Forrester senior analyst Octavio Garcia told Network World.
The two companies have been working on this infrastructure for years, and are both founding members of the new AI-RAN Alliance launched at Mobile World Congress 2024. According to their estimates, over a period of five years, telco operators can earn roughly $5 in AI inference revenue from every $1 of capital expenditure invested in AI-RAN infrastructure.
“SoftBank’s AI-RAN pilot marks a key milestone for the industry,” Ronnie Vasishta, Nvidia’s SVP of telecom, told Network World. “It demonstrates that telecom operators worldwide can reinvent themselves as essential leaders enabling the era of AI. Through this breakthrough in network technology, telco operators are at the ground floor of a once-in-a-lifetime opportunity.”
He pointed out that there is broad ecosystem support for AI-RAN and that “great progress is being made,” so Nvidia expects other partners to make announcements in this area in 2025.
Bringing AI as close as possible to the enterprise
SoftBank performed an outdoor trial in Japan’s Kanagawa prefecture in which its AI-RAN infrastructure, built on Nvidia AI Enterprise, achieved carrier-grade 5G performance while using excess capacity to concurrently run AI inference workloads. These workloads included multimodal retrieval-augmented generation (RAG) at the edge, robotics control, and autonomous vehicle remote support. SoftBank is calling the trial ‘AITRAS.’
In inferencing, pre-trained AI models process previously unseen data to make predictions and decisions. Edge computing moves that processing closer to the data sources to speed it up.
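To make that concrete, here is a minimal Python sketch of edge inference; the model weights and threshold are hypothetical stand-ins, not anything from SoftBank’s trial. The point is that a pre-trained model scores data right where it is produced instead of shipping it to a distant cloud.

```python
# Minimal sketch of edge inference; the "model" is a hypothetical
# stand-in for a pre-trained network deployed on edge hardware.
import numpy as np

WEIGHTS = np.array([0.8, -0.5, 1.2])  # hypothetical pre-trained weights
BIAS = -0.3

def infer(sample: np.ndarray) -> bool:
    """Score previously unseen data and return a yes/no decision."""
    score = float(sample @ WEIGHTS + BIAS)
    return score > 0.0  # hypothetical decision threshold

# At the edge, each reading is scored where it is produced,
# avoiding a round trip to a faraway data center.
reading = np.array([0.9, 0.1, 0.4])
print("anomaly" if infer(reading) else "normal")
```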
Garcia pointed out that the concept of edge intelligence has emerged in the last 18 months following the launch of ChatGPT. It pulls together enterprise edge (data centers), operational edge (physical branches), engagement edge (where enterprises interact with consumers) and provider edge (where AI-RAN sits).
This new partnership represents a trend in the market of “bringing AI as close as possible to the enterprise. Enterprises rely on providers for infrastructure for not only running model training, but also inferencing,” Garcia said.
Converting from cost center to revenue-generating asset
Traditional RAN infrastructure is built on custom chips (application-specific integrated circuits) designed solely for running the RAN. By contrast, as Nvidia’s Vasishta explained, RAN and AI workloads built on Nvidia infrastructure are software-defined and can be orchestrated or provisioned according to need.
This approach can accelerate a standards-compliant 5G software stack to match, and in some cases exceed, the performance per watt of traditional RAN infrastructure, he said.
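As a rough illustration of that orchestration idea (a sketch under assumptions, not Nvidia’s actual scheduler, with hypothetical names and numbers throughout), the following Python snippet gives carrier-grade RAN traffic priority on a shared pool of accelerator capacity and hands whatever headroom remains to AI inference:

```python
# Illustrative sketch: one software-defined pool of accelerator
# capacity shared between RAN processing and AI inference.
# All units and thresholds are hypothetical.
from dataclasses import dataclass

TOTAL_CAPACITY = 100  # abstract units of accelerator capacity

@dataclass
class Allocation:
    ran: int  # capacity reserved for carrier-grade radio processing
    ai: int   # leftover capacity provisioned to inference workloads

def provision(ran_demand: int) -> Allocation:
    """RAN gets priority; AI inference soaks up the headroom."""
    ran_share = min(ran_demand, TOTAL_CAPACITY)
    return Allocation(ran=ran_share, ai=TOTAL_CAPACITY - ran_share)

print(provision(ran_demand=30))  # off-peak: most capacity free for AI
print(provision(ran_demand=85))  # peak: radio traffic reclaims hardware
```

The sketch’s only real point is that the split between radio and AI work is a software decision made at runtime, which is precisely what a fixed-function ASIC design cannot offer.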
“This is groundbreaking, because a telecom operator can convert itself from solely a 5G provider to an AI services and 5G provider, opening potentially billions in new revenue from AI,” said Vasishta. “With AI-RAN, operators can convert their network infrastructure from being a cost center into a multi-billion-dollar revenue-generating asset.”
Garcia pointed out that “there is an appetite for telcos on monetization,” and he predicted that more telcos will join these types of initiatives in the next 18 months, with new offerings hitting the market in 18 to 24 months. To match the pace of AI, that evolution “needs to be fast,” as competing providers race to build AI-ready enterprise infrastructure.
Telecom operators can start by deploying the infrastructure just for AI and then add RAN as a software update later, Vasishta noted. Flexible edge infrastructure can also run workloads not connected to telecom networks. “That is the benefit of using a single architecture for both AI and RAN,” he said.
Nvidia and its ‘Midas touch’ increasingly engaging with telcos
John Byrne, research VP for IDC’s communications service provider operations and monetization industry practice, pointed out that Nvidia has been increasingly engaged with telcos in the past few years as AI has entered the scene.
AI-RAN takes three forms, he explained. The first is AI-for-RAN: This applies AI to make the radio network more efficient and reliable, reducing costs, extracting more capacity from limited spectrum, and supporting service-level agreements (SLAs) that can help drive more revenue.
This concept is embodied in open radio access networks (open RAN), the newer approach to designing and deploying mobile networks that allows interoperability between equipment from different vendors. A “host of operators” is engaged in limited deployments of open networks where AI, and specifically GPUs from Nvidia, can play an important role.
Then there’s AI-on-RAN: This supports services that require both radio connectivity and accompanying AI inferencing. Examples include connected cameras or vehicles that use AI-powered computer vision for services such as security and fleet management.
“This holds promise to generate new revenue opportunities,” said Byrne, “but is still in the early stages.”
Finally, there’s AI-and-RAN, which combines network and computing resources in a virtualized environment so that excess network capacity can be used for AI processing.
“This is the most difficult of the three approaches to execute commercially, and also the hardest to define use cases that the telcos could use to generate new revenue,” Byrne said.
The Nvidia-SoftBank announcement is essentially a combination of all of these forms of AI-RAN and “should be regarded as proof of concept at this point,” he said.
However, he added, “Nvidia has had a Midas touch in the past few years with regard to AI.” As such, he said he expects similar forthcoming announcements from Nvidia and other partners, particularly out of Mobile World Congress in March.
What about micro-edge computing?
While these types of AI-RAN architectures promise increased capacity, Garcia wondered: “Are we really going to see that bandwidth increase?”
He pointed out that micro-edge computing brings models directly onto devices for specific use cases, such as cameras running real-time video analytics. These devices already execute inferencing models on board, rather than sending data back to cloud-native platforms, whether private, edge, or public.
“It’s kind of a friction there,” said Garcia. “The bandwidth that we all thought could be massive, sending back large amounts of data, could eventually be reduced by having micro-edges.”
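A back-of-the-envelope Python sketch shows why on-device inference can shrink backhaul traffic; the bitrates and payload sizes below are illustrative assumptions, not measurements from any deployment:

```python
# Rough comparison: streaming raw video to the cloud for inference
# versus sending only compact detection events from the device.
# All figures are illustrative assumptions.

RAW_VIDEO_MBPS = 8.0      # assumed 1080p camera stream sent upstream
EVENTS_PER_SECOND = 2     # assumed detections produced on-device
BYTES_PER_EVENT = 200     # assumed small JSON payload per detection

on_device_mbps = EVENTS_PER_SECOND * BYTES_PER_EVENT * 8 / 1_000_000

print(f"cloud inference backhaul:     {RAW_VIDEO_MBPS:.4f} Mbps")
print(f"on-device inference backhaul: {on_device_mbps:.4f} Mbps")
```

Under these assumptions the backhaul drops by more than three orders of magnitude, which is the bandwidth reduction Garcia is alluding to.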
Of course, not everyone will use micro-edge computing, he noted. “At the end of the day, AI-RAN and micro-services will balance one another.”
Source: Network World