
OpenAI has begun testing Google’s Tensor Processing Units (TPUs), a move that, while not signaling an imminent switch, has raised eyebrows among industry analysts concerned about the escalating cost of AI inference.
A spokesperson for OpenAI told Reuters that the company is in early testing with some of Google’s TPUs but has no plans to deploy them at scale right now.
The tests come as OpenAI continues to scale its models and services, prompting scrutiny over how it plans to manage the growing financial and computational demands of running large language models (LLMs).
The scrutiny is all the sharper at a time when demand for Nvidia GPUs, along with their scarcity and cost, is at a peak.
At the heart of this conundrum, analysts say, is the rise in inference costs.
“The accelerating pivot from training to inference or finetuning workloads where cost-per-query dominates operational economics is catalyzing mass adoption of alternative AI chips, other than Nvidia GPUs,” said Charlie Dai, principal analyst at Forrester.
OpenAI’s tests hint at LLM providers increasingly exploring specialized hardware to rein in spiraling inference costs and improve efficiency as model usage surges, Dai added.
Barclays forecasts that chip-related capital expenditure for consumer AI inference alone will approach $120 billion in 2026 and exceed $1.1 trillion by 2028.
Barclays also noted that LLM providers such as OpenAI are being forced to look at custom chips, mainly ASICs, instead of GPUs to reduce the cost of inference and move toward profitability.
The case for Google TPUs
Inference consumes over 50% of OpenAI’s compute budget, and TPUs, particularly older generations, offer a significantly lower cost per inference than Nvidia GPUs, Dai said, explaining why TPUs matter for OpenAI.
“While older TPUs lack the peak performance of newer Nvidia chips, their dedicated architecture minimizes energy waste and idle resources, making them more cost-effective at scale,” Dai added.
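To see why cost per query, rather than peak performance, dominates these decisions, consider a back-of-the-envelope sketch in Python. Every number below is a hypothetical placeholder for illustration, not a published OpenAI, Google, or Nvidia figure:

```python
# Back-of-the-envelope cost-per-query comparison. All rates and
# throughputs here are hypothetical, chosen only to illustrate the math.

def cost_per_million_queries(hourly_rate_usd: float, queries_per_second: float) -> float:
    """Cost to serve one million queries on an accelerator billed hourly."""
    queries_per_hour = queries_per_second * 3600
    return hourly_rate_usd / queries_per_hour * 1_000_000

# Hypothetical: a GPU instance at $4.00/hr serving 100 queries/s,
# versus an older TPU instance at $1.20/hr serving 60 queries/s.
gpu_cost = cost_per_million_queries(4.00, 100)  # ~ $11.11
tpu_cost = cost_per_million_queries(1.20, 60)   # ~ $5.56

print(f"GPU: ${gpu_cost:.2f} per 1M queries")
print(f"TPU: ${tpu_cost:.2f} per 1M queries")
```

Under these assumed numbers, the slower but cheaper chip wins on cost per query, which is the metric that matters once a model is serving traffic at scale.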
Omdia principal analyst Alexander Harrowell agreed with Dai.
“…a lot of AI practitioners will tell you they get (from TPUs) a better ratio of floating-point operations per second (FLOPS) — a unit of measuring computational performance — utilized to theoretical maximum performance than they do with anything else,” Harrowell said.
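The ratio Harrowell describes is often called model FLOPs utilization (MFU): sustained floating-point throughput divided by the chip’s theoretical peak. A minimal sketch, with illustrative figures rather than measured ones:

```python
# Model FLOPs utilization (MFU): achieved FLOP/s as a fraction of the
# accelerator's theoretical peak. The figures below are illustrative only.

def mfu(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """Fraction of theoretical peak throughput actually sustained."""
    return achieved_flops_per_s / peak_flops_per_s

# Hypothetical: a chip with a 275 TFLOP/s peak that sustains 130 TFLOP/s
# on an inference workload is running at roughly 47% utilization.
print(f"MFU: {mfu(130e12, 275e12):.0%}")
```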
Harrowell also pointed out that, contrary to popular belief, AI chips in general tend to stay on the market longer than expected, despite the rapid pace of hardware evolution.
“A100s, A10s, and even T4s still sell. Google itself still offers TPU v2 and presumably has customers, and that’s older than the original Transformer paper,” Harrowell said.
Five generations of TPUs are currently for sale via Google Cloud Platform: v2, v3, v4, v5, and v6, aka Trillium. Within those, v5 has two sub-variants, v5p for performance and v5e for efficiency, while Google’s documentation lists only an efficiency variant, v6e, for Trillium. The company also built an efficiency variant of the v4, called v4i, which was never offered outside Google.
In April, Google previewed Ironwood, its next-generation TPU, which analysts say offers even better price-performance than its predecessor, Trillium, as well as rival chips from Nvidia, AMD, AWS, and Microsoft.
While OpenAI may have tested an Ironwood unit or two, bulk orders of the chip are unlikely to be available at the moment, Harrowell said.
Diversifying suppliers with custom silicon from hyperscalers
Forrester’s Dai also pointed out that if and when OpenAI adopts Google’s TPUs, it will diversify its suppliers, avoiding bottlenecks such as GPU shortages and gaining leverage in pricing negotiations.
OpenAI’s current list of compute suppliers includes the likes of Microsoft, Oracle, and CoreWeave.
It also has the option of adopting custom silicon such as AWS Trainium and Microsoft Maia, both of which are targeted at inference workloads or AI acceleration in general.
Contrary to other analysts, independent expert Thomas Dinsmore believes that OpenAI may be negotiating a special deal with Google to use TPUs for internal purposes, such as testing, employee training, and cached applications.
Source: Network World