
The next generation of AI chips won’t just be faster — they will consume unprecedented amounts of power and force fundamental changes in data center infrastructure.
A new report by the TeraByte Interconnection and Package Laboratory (TeraLab) at Korea Advanced Institute of Science and Technology (KAIST) shows that GPU-HBM (high-bandwidth memory) modules could demand up to 15,360 watts per unit by 2035, pushing existing power and cooling technologies to the brink.
The report outlines the scaling trajectory of high-bandwidth memory from HBM4 in 2026 to HBM8 by 2038. Each step promises higher performance, but also exponentially higher energy requirements and thermal output.
“GPU power alone is forecasted to climb from 800W in 2026 to 1,200W by 2035,” the report said. “When paired with up to 32 HBM stacks, each drawing 180W, the total power envelope of a module can reach 15,360W.”
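The per-component figures only sum to the quoted 15,360W if a module packages several GPU dies alongside the memory. A minimal back-of-the-envelope sketch follows, assuming eight 1,200W GPU dies and 32 HBM stacks at 180W each; the eight-die count is an inference from the arithmetic, not a figure stated in the article.

```python
# Back-of-the-envelope module power budget using the figures quoted above.
# GPU_DIES = 8 is an assumption inferred from the arithmetic
# (8 x 1,200 W + 32 x 180 W = 15,360 W); the article only states the
# per-GPU, per-stack, and total numbers.
GPU_POWER_W = 1_200      # projected per-GPU power by 2035
HBM_STACK_POWER_W = 180  # projected per-HBM-stack power
GPU_DIES = 8             # assumed dies per module (inferred, not stated)
HBM_STACKS = 32          # "up to 32 HBM stacks" per module

module_power_w = GPU_DIES * GPU_POWER_W + HBM_STACKS * HBM_STACK_POWER_W
print(f"Module power envelope: {module_power_w:,} W")  # -> 15,360 W
```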
Why AI acceleration is rewriting power budgets
AI workloads, particularly large language models (LLMs) and foundation models, require vast memory bandwidth and processing power. The roadmap projects that HBM8 will deliver up to 64 terabytes per second of memory bandwidth using 16,384 I/O interfaces. Each stack could house up to 240GB of memory.
“Each HBM8 stack could house up to 240GB of memory and consume 180W,” the report said. “Multiple stacks integrated into a single AI processor multiply both compute throughput and heat output.”
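Those headline numbers imply a steep per-pin signalling rate. The sketch below works the arithmetic in decimal units using only the figures quoted above; the derived per-pin and per-watt values are rough estimates, not numbers taken from the report.

```python
# Rough per-pin rate and bandwidth-per-watt implied by the HBM8 figures
# above (decimal units; an estimate, not a figure from the report).
bandwidth_tb_s = 64      # projected per-stack bandwidth, TB/s
io_count = 16_384        # projected I/O interfaces per stack
stack_power_w = 180      # projected per-stack power

bytes_per_s = bandwidth_tb_s * 1e12
per_pin_gbps = bytes_per_s * 8 / io_count / 1e9
gb_per_s_per_watt = bytes_per_s / 1e9 / stack_power_w

print(f"~{per_pin_gbps:.0f} Gb/s per I/O pin")     # ~31 Gb/s
print(f"~{gb_per_s_per_watt:.0f} GB/s per watt")   # ~356 GB/s per watt
```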
That scaling comes with significant implications. Today’s server-class GPUs typically operate within a 300W to 600W range. Moving toward 1,200W GPUs bundled with dense memory layers redefines rack-level power design and triggers architectural changes far beyond the chip itself.
“Energy consumption is the biggest bottleneck for the advancing AI growth,” said Neil Shah, VP for research at Counterpoint Research. “As we move from Generative to Agentic to Physics AI, the compute requirement is growing exponentially.”
The cooling challenge
As chip power grows, thermal management becomes a critical engineering problem. The report makes it clear: conventional air-cooling is no longer viable.
“Only the molding compound part can be customized for reducing thermal coupling effects,” the report explained. The heat generated from densely stacked memory modules exacerbates the already high thermal output of next-gen GPUs.
To address this, TeraLab proposes several advanced cooling innovations. These include direct-to-chip liquid cooling, immersion systems, and thermal transmission lines integrated directly into packaging. Fluidic through-silicon vias (F-TSVs) will play a key role in removing heat from within stacked dies.
The report also mentions Cu-Cu bump-less bonding, thermal sensors embedded in base dies, and intelligent control mechanisms that allow chips to respond to thermal conditions in real time.
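The report does not spell out the control logic, but the idea of embedded sensors driving a real-time response can be illustrated with a toy throttling loop. The temperature thresholds, peak clock, and the read_die_temp_c() sensor hook below are all illustrative assumptions, not mechanisms described in the roadmap.

```python
# Toy illustration of sensor-driven thermal control: read die temperature,
# then scale the clock linearly once a throttle threshold is crossed.
# All thresholds and the read_die_temp_c() hook are hypothetical.
import random
import time

T_TARGET_C = 85.0    # assumed throttle-onset temperature
T_MAX_C = 105.0      # assumed hard limit
F_MAX_MHZ = 2_000.0  # assumed peak clock

def read_die_temp_c() -> float:
    """Stand-in for a thermal sensor embedded in the base die."""
    return random.uniform(70.0, 110.0)

def target_clock_mhz(temp_c: float) -> float:
    if temp_c <= T_TARGET_C:
        return F_MAX_MHZ        # below threshold: run at full clock
    if temp_c >= T_MAX_C:
        return 0.0              # at the hard limit: halt
    # Linear throttle between the two thresholds.
    frac = (T_MAX_C - temp_c) / (T_MAX_C - T_TARGET_C)
    return F_MAX_MHZ * frac

if __name__ == "__main__":
    for _ in range(5):
        t = read_die_temp_c()
        print(f"{t:6.1f} C -> {target_clock_mhz(t):7.1f} MHz")
        time.sleep(0.1)
```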
Rethinking chip architecture from the ground up
Rising power density is not only forcing new cooling strategies but also reshaping how chips themselves are designed. The roadmap introduces the concept of HBM-centric computing, where processors, controllers, and accelerators are co-packaged within the HBM substrate.

“Through design customization between memory and processor companies, HBM7 is expected to be integrated directly above the GPU,” the report noted.
This approach enables full 3D integration using vertically stacked dies and double-sided interposers. It improves performance but concentrates heat generation in an even smaller footprint, further complicating thermal design.
The bigger picture: infrastructure, grid, and geography
The implications extend well beyond chip packaging. A 15 kW module significantly alters rack-level power distribution, cooling loop planning, and facility-wide thermal zoning. The U.S. Department of Energy estimates that cooling already accounts for nearly 40% of data center energy use. These next-gen AI chips will drive that figure even higher.
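To put a 15 kW module in facility terms, a short sketch follows; the eight-modules-per-rack density and the PUE of 1.4 are assumptions chosen for illustration, not figures from the report.

```python
# Illustrative rack- and facility-level impact of ~15 kW modules.
# Modules-per-rack and PUE are assumptions made for illustration.
MODULE_POWER_KW = 15.36   # per-module envelope from the KAIST roadmap
MODULES_PER_RACK = 8      # assumed rack density
PUE = 1.4                 # assumed power usage effectiveness

it_load_kw = MODULE_POWER_KW * MODULES_PER_RACK
facility_kw = it_load_kw * PUE
overhead_kw = facility_kw - it_load_kw   # cooling, power delivery, etc.

print(f"IT load per rack:          {it_load_kw:6.1f} kW")   # ~122.9 kW
print(f"Facility draw per rack:    {facility_kw:6.1f} kW")  # ~172.0 kW
print(f"Overhead (mostly cooling): {overhead_kw:6.1f} kW")  # ~49.2 kW
```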
“The power requirements outlined in the KAIST roadmap signal not just a thermal or architectural challenge, but an impending crisis of coordination between compute timelines and utility readiness,” said Sanchit Vir Gogia, CEO at Greyhound Research. “These electrical densities simply cannot be supported by existing grid infrastructure in most regions.”
He added that while hyperscalers are reserving gigawatt-class electricity allotments up to a decade in advance, regional utilities struggle to upgrade transmission, often requiring 7 to 15 years for execution. “Speed-to-power is now eclipsing speed-to-market as the defining metric of digital competitiveness.”
“Dublin imposed a 2023 moratorium on new data centers, Frankfurt has no new capacity expected before 2030, and Singapore has just 7.2 MW available,” said Kasthuri Jagadeesan, Research Director at Everest Group, highlighting the dire situation.
Electricity: the new bottleneck in AI ROI
As AI modules push infrastructure to its limits, electricity is becoming a critical driver of return on investment. “Electricity has shifted from a line item in operational overhead to the defining factor in AI project feasibility,” Gogia noted. “Electricity costs now constitute between 40–60% of total Opex in modern AI infrastructure, both cloud and on-prem.”
Enterprises are now forced to rethink deployment strategies—balancing control, compliance, and location-specific power rates. Cloud hyperscalers may gain further advantage due to better PUE, renewable access, and energy procurement models.
“A single 15,000-watt module running continuously can cost up to $20,000 annually in electricity alone, excluding cooling,” said Manish Rawat, analyst at TechInsights. “That cost structure forces enterprises to evaluate location, usage models, and platform efficiency like never before.”
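Rawat's figure is easy to reproduce. The sketch below assumes 24/7 operation and an electricity price of roughly $0.15 per kWh; the price is an illustrative assumption, not stated in the article.

```python
# Annual electricity cost of a continuously running 15 kW module,
# excluding cooling. The $/kWh rate is an assumed illustrative price.
MODULE_POWER_KW = 15.0
HOURS_PER_YEAR = 24 * 365    # 8,760 hours
PRICE_PER_KWH = 0.15         # assumed price, $/kWh

annual_kwh = MODULE_POWER_KW * HOURS_PER_YEAR     # 131,400 kWh
annual_cost = annual_kwh * PRICE_PER_KWH          # ~$19,710
print(f"{annual_kwh:,.0f} kWh/year -> ~${annual_cost:,.0f}/year")
```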
The silicon arms race meets the power ceiling
AI chip innovation is hitting new milestones, but the cost of that performance is no longer just measured in dollars or FLOPS — it’s in kilowatts. The KAIST TeraLab roadmap demonstrates that power and heat are becoming dominant factors in compute system design.
The geography of AI, as several experts warn, is shifting. Power-abundant regions such as the Nordics, the Midwest US, and the Gulf states are becoming magnets for data center investments. Regions with limited grid capacity face a growing risk of becoming “AI deserts.”
“Electricity will become a first-class constraint for AI scale-up. Success at scale will depend not just on compute capacity, but on where and how efficiently it’s powered,” said Kalyani Devrukhkar, senior analyst at Everest Group.
For enterprises building AI infrastructure, the message is clear: future-ready means power-aware. Planning for AI performance must now go hand-in-hand with power budgeting, energy sourcing, emissions visibility, and grid proximity. In the AI era, energy is the defining constraint, and those who ignore it may be forced to throttle their ambitions.
Source: Network World