
Google has unveiled a new chip, Ironwood, that could help enterprises accelerate generative AI workloads, especially inferencing, the process a large language model (LLM) uses to generate responses to a user request.
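At its simplest, inferencing is an autoregressive loop: the model predicts one token at a time and feeds each prediction back in as input. A minimal Python sketch, where `model` and `tokenizer` are hypothetical stand-ins rather than any real library API:

```python
# Minimal autoregressive decoding loop; `model` and `tokenizer` are
# hypothetical stand-ins, not a real API.
def generate(model, tokenizer, prompt, max_new_tokens=64):
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        logits = model(tokens)            # one forward pass per generated token
        next_token = logits[-1].argmax()  # greedy decoding for simplicity
        if next_token == tokenizer.eos_id:
            break                         # stop token ends the response
        tokens.append(next_token)
    return tokenizer.decode(tokens)
```

Each loop iteration reads the model's full set of weights, which is why inference hardware is judged on memory capacity and bandwidth as much as on raw compute.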
The inferencing-only semiconductor market, according to IDC, is projected to reach $191 billion by 2030, with AI ASICs for inferencing accounting for $27 billion, or 14.3%, of that market.
ASICs, or application-specific integrated circuits, are processors designed for a single task; unlike FPGAs, or field-programmable gate arrays, they cannot be reprogrammed for other circuits or functions.
Unveiled at Google’s annual Cloud Next conference, Ironwood is the company’s seventh-generation Tensor Processing Unit (TPU), which it claims is designed for reasoning models that “proactively” generate insights and interpretations.
This makes Ironwood better suited to agentic AI workloads, in which AI agents are expected to operate autonomously.
To support inferencing for frontier models, the new TPU is designed to minimize on-chip data movement and latency while processing massive amounts of data, the company said in a statement.
More performance per watt than Trillium
According to Google, Ironwood offers twice the per-watt performance compared to its predecessor — Trillium — which was launched in April last year.
The gain in performance at the same power draw also makes the chip more cost-effective, as it delivers more capacity per watt, Amin Vahdat, VP of machine learning, systems, and Cloud AI, wrote in a blog post.
An Ironwood pod can also scale up to 9,216 chips, compared with its predecessors TPU v5p and TPU v4, which top out at 8,960 and 4,096 chips respectively.
Ironwood also offers six times the High Bandwidth Memory (HBM) capacity of Trillium, at 192GB per chip; TPU v5p and TPU v4 offer 95GB and 32GB respectively.
The higher HBM capacity is crucial for processing larger models and datasets, reducing the need for frequent data transfers and improving performance, Vahdat wrote.
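Some rough arithmetic shows why the capacity jump matters. In the hedged Python sketch below, the model size is a hypothetical example; the HBM and pod figures are the ones reported above:

```python
# Back-of-the-envelope sketch: chips needed just to hold model weights in HBM.
# Model size is a hypothetical example; HBM capacities are from the article.
HBM_PER_CHIP_GB = {"Ironwood": 192, "TPU v5p": 95, "Trillium/TPU v4": 32}

params_billions = 400                            # assumed 400B-parameter model
bytes_per_param = 2                              # bf16/fp16 weights
weights_gb = params_billions * bytes_per_param   # 800 GB of weights

for chip, hbm_gb in HBM_PER_CHIP_GB.items():
    chips_needed = -(-weights_gb // hbm_gb)      # ceiling division
    print(f"{chip}: at least {chips_needed} chips for {weights_gb} GB of weights")

# Pod-level capacity using the article's figures: 9,216 chips x 192GB each.
pod_hbm_pb = 9_216 * HBM_PER_CHIP_GB["Ironwood"] / 1e6
print(f"Full Ironwood pod: ~{pod_hbm_pb:.2f} PB of HBM")
```

On these assumptions, a model that fits on five Ironwood chips would need 25 Trillium-class chips, which is the reduction in cross-chip data transfer Vahdat describes.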
Google has also upgraded Ironwood’s HBM bandwidth and Inter-Chip Interconnect (ICI) bandwidth to 4.5x and 1.5x of Trillium respectively.
While the improved HBM bandwidth will help run more intensive AI workloads, the faster communication between chips will enable workloads to be distributed efficiently when training LLMs or running inference, Vahdat said.
Ironwood also includes SparseCore and Pathways. SparseCore is a specialized accelerator for processing “ultra-large” embeddings, while Pathways is Google’s proprietary machine learning runtime, which enables efficient distributed computing across multiple TPU chips, including across pods.
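Developers typically reach this kind of multi-chip execution through frameworks such as JAX rather than by programming the runtime directly. A minimal, illustrative JAX sketch of sharding one computation across however many chips the runtime exposes; the mesh layout, shapes, and the tanh layer are arbitrary choices for the example:

```python
# Illustrative only: shard a batch across all visible accelerator chips.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis spanning every chip the runtime exposes.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("data",))

# Place a batch so each chip holds a slice of the leading axis.
batch = jnp.ones((8 * jax.device_count(), 1024))
batch = jax.device_put(batch, NamedSharding(mesh, P("data", None)))

@jax.jit                      # one compiled program, run on all chips at once
def forward(x):
    return jnp.tanh(x @ jnp.ones((1024, 1024)))

out = forward(batch)          # result stays sharded across the mesh
print(out.sharding)
```

The same program runs unchanged on one chip or a full pod; the runtime, not the application code, handles placement and the inter-chip communication carried over links like ICI.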
Better than Nvidia and AMD
According to analysts, Ironwood offers more performance per dollar than the competition: Google’s seventh-generation TPU is a cut above rival custom ASICs and comparable to GPUs from Nvidia and AMD.
“Google’s Ironwood TPU demonstrates significant improvements over AWS Trainium and Microsoft Maia 100,” said Dylan Patel, chief analyst at SemiAnalysis.
Omdia principal analyst Alexander Harrowell said that Maia is similar in scale to Google’s TPU v4.
“Until Ironwood, you could see a lot in common between the AWS, Microsoft, and Google ASICs – all three companies had gone with big clusters of relatively small ASICs. But Ironwood is much bigger than previous TPUs,” Harrowell added.
Harrowell pointed out that Ironwood’s HBM and ICI bandwidth match the performance of Nvidia and AMD GPUs.
Who will use Ironwood?
Analysts say that Ironwood is likely to see adoption from enterprises focused on developing, tuning, and running large-scale custom inference workloads — especially those with well-defined use cases that can’t be served by off-the-shelf solutions.
However, IDC research director Brandon Hoff pointed out that software developers and data scientists who are already locked into CUDA are likely to stay with Nvidia GPUs offered by Google and other cloud service providers.
“For software developers who have flexibility in which software stacks they develop, they can take advantage of Ironwood,” Hoff said, adding that if and when enterprises move to Ironwood they may face some amount of vendor lock-in.
Meanwhile, SemiAnalysis’ Patel pointed out that Ironwood’s biggest adopter could be Google itself, with the cloud services provider using the TPU to run internal workloads, including Search, Ads, YouTube, and Workspace AI features.
“The performance and efficiency benefits are attractive, which means adoption will be targeted rather than broad-based,” Patel said. However, Alletto warned that Ironwood’s success will depend on how quickly and broadly Google makes it available throughout its ecosystem, as service availability sometimes runs behind demand, especially for larger-scale compute offerings.
Source: Network World