Squeezed between AI-driven demand for ever-faster computing and shortages of the GPUs used to accelerate those workloads, hyperscalers are designing custom silicon for specific workloads to further improve performance while cutting costs.
Microsoft added two new chips-for-hire to the huge variety of hardware instances in its cloud computing catalog at its Ignite conference last week — and all eyes are on AWS to see whether it reinvents its custom chip offering at its own event next week.
Some computing tasks, such as training and running AI models, can be sped up by running them on GPUs instead of CPUs, but not all tasks can. So in addition to filling their data centers with GPUs from the likes of Nvidia and AMD, cloud service providers are also developing new silicon to accelerate the performance of specific tasks.
“GPUs, despite revolutionizing performance-intensive workloads such as modeling, simulation, training, and inferencing in AI and machine learning (ML), are power hungry, and require additional cooling,” said Mario Morales, vice president analyst at market research firm IDC.
On top of that, GPUs are in short supply, so much so that chipmakers such as Nvidia, AMD, and Intel have been increasing production capacity and unveiling plans for new processors. In October, Morgan Stanley analysts noted that Nvidia’s latest Blackwell GPUs are booked out for roughly 12 months.
According to Morales, recent advances have created alternatives for IT buyers and service providers in the form of custom accelerators.
“These accelerators are becoming increasingly important in cloud infrastructure due to their superior price-performance and price-efficiency ratios, which lead to better return on investments,” he said.
Microsoft was a little late in joining the custom chip revolution. Its rivals had introduced custom chips for AI workloads years earlier: AWS with Trainium and Inferentia, and Google with its Tensor Processing Units (TPUs). It wasn’t until last year’s Ignite conference that Microsoft unveiled its first custom chips, Maia and Cobalt, to tackle internal AI workloads and make its data centers more energy efficient.
Speeding AI dataflows with DPUs, not GPUs
At this year’s event, Microsoft introduced two more chips: the Azure Boost DPU to accelerate data processing, and the Azure Integrated HSM module to improve security.
The Azure Boost DPU is a hardware-software co-design, specific to Azure infrastructure, that runs a custom, lightweight data-flow operating system that Microsoft claims enables higher performance, lower power consumption, and enhanced efficiency compared to traditional implementations.
Microsoft is also introducing a new version of its liquid-cooling sidekick rack to support servers running AI workloads, and a new disaggregated power rack co-designed with Meta that it claims will enable 35% more AI accelerators to fit in each server rack.
“We expect future DPU-equipped servers to run cloud storage workloads at three times less power and four times the performance compared to existing servers,” the company said in a blog post.
However, said Forrester senior analyst Alvin Nguyen, Microsoft is playing catch-up in the DPU space. He compared Microsoft’s Azure Boost DPU with Google’s E2000 IPU, which Google co-developed with Intel. AWS similarly offers its Nitro system for DPU-oriented tasks, while other cloud providers use Nvidia BlueField and AMD Pensando chips for these applications.
Custom silicon for security
Another area where custom silicon can deliver better performance is security.
Microsoft said its new Azure Integrated HSM module will enable encryption and signing keys to remain within the hardware boundary without compromising performance.
Nguyen said the chip performs a very specific function that would otherwise be handled by a combination of hardware and software, an approach that carries greater exposure to attack and a greater performance hit, adding latency and making it harder to scale.
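To make the distinction concrete, here is a minimal sketch of the pattern Nguyen describes, using the vendor-neutral PKCS#11 interface via the python-pkcs11 library rather than any Azure-specific API (which Microsoft has not detailed). The module path, token label, PIN, and key label are all illustrative, with SoftHSM standing in for real hardware: the application only ever holds an opaque handle to the private key, and the signing operation runs inside the device.

```python
# Minimal PKCS#11 sketch: sign data with a key that never leaves the device.
# Assumes the python-pkcs11 package; the path, labels, and PIN are
# hypothetical, and SoftHSM stands in for actual HSM hardware.
import pkcs11
from pkcs11 import Mechanism, ObjectClass

# Load the device's PKCS#11 module (hypothetical SoftHSM path).
lib = pkcs11.lib('/usr/lib/softhsm/libsofthsm2.so')
token = lib.get_token(token_label='demo-token')

with token.open(user_pin='1234') as session:
    # Fetch a handle to a private key generated and stored on the device.
    # The host application never receives the key material itself.
    key = session.get_key(object_class=ObjectClass.PRIVATE_KEY,
                          label='signing-key')

    # Signing happens inside the hardware boundary; only the resulting
    # signature is returned to the host.
    signature = key.sign(b'payload to sign',
                         mechanism=Mechanism.SHA256_RSA_PKCS)
    print(signature.hex())
```

Performing that signing step in host software instead would mean loading the private key into server memory, which is exactly the attack exposure Nguyen points to; Microsoft’s claim is that integrating the module into the server preserves the hardware boundary without compromising performance.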
Other cloud service providers also offer security-focused chips, though their implementations differ: AWS in the form of Nitro and Google in the form of Titan.
“While Nitro provides the critical security function of ensuring that the main system CPUs cannot update firmware in bare metal mode, Titan provides a hardware-based root of trust that establishes the strong identity of a machine, with which we can make important security decisions and validate the health of the system,” Nguyen explained.
Analysts expect hyperscalers to introduce more custom silicon, either to tackle other workloads or to chase greater efficiency. “Once you have the expensive capability to do custom chips, it’s logical to look at where you are sending the most margin to vendors, where that’s growing fastest, and apply it,” said Alexander Harrowell, principal analyst at Omdia.
Source: Network World