Liquid cooling becoming essential as AI servers proliferate

Average power densities—the power drawn by the servers in a data center rack—have more than doubled over the past two years, from 8 kilowatts to 17 kilowatts per rack. And they’re expected to rise to as much as 30 kilowatts per rack by 2027 as AI workloads increase.

That’s just the average. Individual racks can go much higher. Servers used to train AI models can consume more than 80 kilowatts per rack, and racks built around Nvidia’s latest GB200 chips can require densities of up to 120 kilowatts, according to data from McKinsey.

Most data center operators say that once rack power density passes 20 kilowatts, air alone is no longer sufficient to cool the equipment. As of early 2024, 22% of data center operators were already using direct liquid cooling technology, the Uptime Institute’s cooling system survey shows.

In most cases, liquid cooling is deployed in a hybrid environment. In data centers with liquid cooling, typically only 10% of racks or fewer are using it.

But as AI workloads are deployed at a runaway pace, liquid cooling is becoming increasingly popular. In fact, Dell’Oro Group just raised its market forecasts for liquid cooling because of higher-than-expected adoption rates. The firm now predicts that the data center physical infrastructure market will grow 14% annually to $61 billion in 2029, up from its previous estimate of 13% growth, partly as a result of growth in the liquid cooling segment of the market.

“AI workloads will require 60 to 120 kilowatts per rack to support accelerated servers in close proximity,” said Tam Dell’Oro, founder of Dell’Oro Group. “While this jump in rack power density will trigger innovation and product development on the power distribution side, a bigger change is unfolding in thermal management — the transition from air to liquid cooling.”

The vast majority of data centers are still air cooled, says Scott Tease, vice president and general manager for AI and HPC at Lenovo. “It’s easy, people know it, it’s familiar.” Less than 3% of the servers in the world today are liquid cooled, he estimates.

But liquid cooling adoption is growing two- and threefold, “and it continues to accelerate,” Tease says. “It’s one of the fastest growing — if not the fastest growing — portion of the data center market today.”

According to research from real estate and investment management firm JLL, air cooling is only efficient up to about 20 kilowatts. When racks exceed 20 kilowatts, the most effective cooling is a type of liquid cooling called active rear-door heat exchangers, where air is still used to move heat away from the GPU to a liquid-cooled door at the rear of the rack. At around 100 kilowatts, direct-to-chip liquid cooling is most effective. Above 175 kilowatts, immersion cooling takes over.
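
As a rough illustration of those JLL thresholds, here is a minimal Python sketch that maps rack power density to the cooling approach described above (the exact cutoffs and the function name are illustrative assumptions, not JLL’s):

```python
def suggest_cooling(rack_kw: float) -> str:
    """Map rack power density (kW) to the cooling approach JLL's
    research suggests is most effective at that level (illustrative cutoffs)."""
    if rack_kw <= 20:
        return "air cooling"
    elif rack_kw <= 100:
        return "active rear-door heat exchanger"
    elif rack_kw <= 175:
        return "direct-to-chip liquid cooling"
    return "immersion cooling"

print(suggest_cooling(17))   # today's average rack -> "air cooling"
print(suggest_cooling(120))  # GB200-class rack     -> "direct-to-chip liquid cooling"
```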

Liquid cooling infrastructure is now the default when it comes to new data center construction, the firm reports. And while there’s a lot of new construction, it’s still not enough. McKinsey says that global demand for data center capacity is growing from 60 gigawatts today to at least 171 gigawatts — and as much as 300 gigawatts — by 2030.

Enterprises are already finding it difficult to get space in new, liquid-cooling-ready data centers. At the end of 2024, colocation vacancy in North America declined to a new all-time low of 2.6% despite several years of record construction levels. Any second-generation space that becomes available is leased again within weeks, JLL reports.

One solution is to try to squeeze AI servers into existing data centers. That can be a challenge, since it can be difficult to retrofit an existing air-cooled data center for liquid cooling. But it’s better than waiting for new space to open up — and it might be better for the environment.

“A lot of the carbon emissions of the data center happen in the build of it, in laying down the slab,” says Josh Claman, CEO at Accelsius, a liquid cooling company. “I hope that companies won’t just throw all that away and start over.”

In addition to the environmental benefits, upgrading an air-cooled data center into a hybrid, liquid and air system has other advantages, says Herb Hogue, CTO at Myriad360, a global systems integrator.

Liquid cooling is more effective than air alone, he says, and when used in combination with air cooling, the temperature of the air cooling systems can be increased slightly without impacting performance. “This reduces overall energy consumption and lowers utility bills,” he says.

Liquid cooling also allows for not just lower but also more consistent operating temperatures, Hogue says. That leads to less wear on IT equipment and, without fans, fewer moving parts per server. The downsides, however, include the cost of installing the hybrid system and the need for specialized operations and maintenance skills. There might also be space constraints and other challenges. Still, it can be a smart approach for handling high-density server setups, he says.

And there’s one more potential benefit, says Gerald Kleyn, vice president of customer solutions for HPC and AI at Hewlett Packard Enterprise. The hot liquid coming out of a data center can be used to heat other buildings or facilities.

“If you add up all the data centers in the world and the energy they’re consuming already, they’re bigger than many countries,” Kleyn says. “It behooves all of us to maximize the efficiency with which we’re using the energy and resources we have today.”

Data center operators have several different options for adding liquid cooling systems to legacy facilities.

Liquid cooling in a dry data center

Enterprises can take advantage of liquid cooling even in a facility that has no water anywhere near the racks. The trick is to replace a server’s internal fan with a liquid loop that goes from a cold plate sitting on top of the GPU to the door of the rack. Then the heat is dissipated into the air.

“Liquid to air is the term the industry has settled on,” says Accelsius’ Claman, whose company is launching a product like this in a couple of months.

Keeping the liquid entirely inside the server doesn’t offer the same energy savings as transferring the heat to a second, external liquid loop, but it’s a step in the right direction.

“There are many data centers, including on-prem, where it may not make sense to make an investment in facility liquid cooling,” says Bernie Malouin, founder at JetCool Technologies, a liquid cooling company. “You might not want to bring pipes into the white space room or the IT part of your facility, or you might be in a colocation that doesn’t have pipelines in the floor.”

There are self-contained liquid cooling options already available on the market — JetCool has one available through Dell, says Malouin.

“There are no piping or plumbing needs,” he says. “It picks up heat from the GPU and transfers it into the air using a liquid-assisted cooling system. Is that as high-performing as facility liquid cooling? No. Is it as efficient as the facility one? No to that, too. But it doesn’t require any infrastructure and allows you to deploy it as a bridge until you make a bigger investment in liquid cooling in a dedicated facility.”

And even this option saves on power — 15% in the case of the self-contained solution from Dell, says Malouin, because it eliminates the need for fans. “By spending less electricity on your fans, that frees up power, power that you can deploy on your AI rack.”
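
As a back-of-the-envelope example of what that 15% figure means in practice (the rack size below is a hypothetical assumption, not a figure from Malouin):

```python
# Hypothetical 40 kW rack; 15% is Malouin's quoted savings from removing server fans.
rack_power_kw = 40
fan_savings = 0.15
freed_kw = rack_power_kw * fan_savings
print(f"Power freed for compute: {freed_kw:.1f} kW")  # 6.0 kW per rack
```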

Self-contained liquid cooling systems can generally handle power densities of up to 50 kilowatts, he says.

Lenovo also has a self-contained liquid cooling solution, says Tease. “Neptune Air is a self-contained water loop,” he says. “You don’t even have to have any plumbing.”

Liquid cooling for an air-cooled server

Is there anything a company can do if it already has the AI servers, and they’re air cooled, but it wants to get the benefits of liquid cooling?

Yes. If a data center has some water connections already — for example, for its HVAC system — and new pipes can be added to take liquid to the server, then a rear-door heat exchanger can take the hot air coming out of the server and cool it down with water.

So the server itself doesn’t need to be replaced, which is good, because it can take time to get a new, liquid-ready AI server delivered. But it does require water pipes to the rack — which can be costly to add.

At colocation provider Flexential, all new data centers are pre-piped with liquid cooling capabilities, says Ryan Mallory, the company’s COO. Of course, these are all leased 12 to 24 months in advance. “Companies looking for capacity don’t necessarily have the opportunity to participate in those sites,” Mallory admits.

So companies are looking at using legacy data center space and upgrading it to support liquid cooling, he says.

“We put heat rejection units outside, on the roof, and put pipelines into the floor,” he says. “We get coolant distribution units, and then build a customized model for a group of racks — or even a single rack, if you want.”

That doesn’t come cheap, though — there’s an upfront cost of between $2 million and $7 million per megawatt, he says. And good luck trying to find a spot where someone’s already paid for the piping and is looking to vacate it. “We’re at historic lows in vacancy rates across North America,” says Mallory.
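
To put that range in perspective, here is a quick sketch of what a retrofit might cost at a hypothetical scale (the 3 MW capacity is an assumption for illustration only):

```python
# Mallory's quoted retrofit range: $2M to $7M of upfront cost per megawatt.
capacity_mw = 3                    # hypothetical retrofit capacity
cost_low, cost_high = 2e6, 7e6     # dollars per megawatt
print(f"Estimated retrofit cost: ${capacity_mw * cost_low / 1e6:.0f}M to "
      f"${capacity_mw * cost_high / 1e6:.0f}M")
# Estimated retrofit cost: $6M to $21M
```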

And it will take a few months to get it all put in place, he adds. “It takes time to do the right design, so make sure you engage early,” he says. “If you need liquid cooling in a month — that’s not a good position to put yourself in.”

Still, it’s faster than waiting for space to open up that already has all the pipes.

“People are selling capacity into 2026, 2027, and 2028 right now,” Mallory says. “If you’re an enterprise trying to live within a 12-month budget and you’re not willing to look at retrofitted capacity — well, you might have a hard time finding that.”

Liquid to the chip

Even better than getting liquid to the back door of a rack is getting liquid right into the server, next to the hot GPU itself. This combines the advantages of a self-contained liquid cooling system, where the liquid is circulating inside the server, with the advantages of a system that brings facility water to the rear door of the rack.

Now both halves of the system are liquid cooled, and the energy efficiencies are maximized. The two loops meet in a heat exchanger, where the heat passes from the internal system to the external one — without the liquids ever mixing.

“Facility water loops sometimes have good water quality, sometimes bad,” says My Truong, CTO at ZutaCore, a liquid cooling company. “Sometimes you have organics you don’t want to have inside the technical loop.”

So there’s one set of pipes that goes around the data center, collecting the heat from the server racks, and another set of smaller pipes that lives inside individual racks or servers. “That inner loop is some sort of technical fluid, and the two loops exchange heat across a heat exchanger,” says Truong.
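
For a sense of the numbers involved in that outer facility loop, here is a minimal sketch of the standard sensible-heat sizing calculation (the rack load and temperature rise are assumed values, not figures from ZutaCore):

```python
# How much facility water must flow to carry away one rack's heat?
# Sensible-heat balance: Q = m_dot * c_p * delta_T
rack_heat_kw = 100      # assumed rack heat load, kW
c_p_water = 4.186       # specific heat of water, kJ/(kg*K)
delta_t = 10            # assumed supply-to-return temperature rise, K

m_dot = rack_heat_kw / (c_p_water * delta_t)   # mass flow, kg/s
print(f"Required water flow: {m_dot:.2f} kg/s (roughly {m_dot:.1f} L/s)")
# About 2.4 liters per second for a 100 kW rack at a 10 K rise
```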

The most common approach today, he says, is to use a single-phase liquid — one that stays in liquid form and never evaporates into a gas — such as water or propylene glycol. But it’s not the most efficient option.

Evaporation is a great way to dissipate heat; it’s what our bodies do when we sweat. When water goes from a liquid to a gas, that’s called a phase change, and it absorbs energy, making everything around it slightly cooler.

Of course, few servers run hot enough to boil water — but they can boil other liquids. “Two-phase is the most efficient cooling technology,” says Xianming (Simon) Dai, a professor at the University of Texas at Dallas.
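
A rough worked comparison shows why: per kilogram of coolant, boiling absorbs far more heat than simply warming the liquid. The numbers below use water purely for illustration; real two-phase systems use engineered dielectric fluids with much lower boiling points and different thermal properties.

```python
# Heat absorbed per kilogram of water, two ways (water used only for illustration).
c_p = 4.186          # specific heat of liquid water, kJ/(kg*K)
delta_t = 10         # assumed temperature rise in a single-phase loop, K
latent_heat = 2257   # latent heat of vaporization of water at 100 C, kJ/kg

sensible = c_p * delta_t    # ~42 kJ/kg absorbed by warming the liquid
print(f"Phase change absorbs about {latent_heat / sensible:.0f}x more heat per kg")
# About 54x in this illustrative case
```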

And it might be here sooner than you think. In a keynote address in March at Nvidia GTC, Nvidia CEO Jensen Huang unveiled the Rubin Ultra NVL576, due in the second half of 2027 — with 600 kilowatts per rack.

“With the 600 kilowatt racks that Nvidia is announcing, the industry will have to shift very soon from single-phase approaches to two-phase,” says ZutaCore’s Truong.

Another highly efficient cooling approach is immersion cooling.

According to a Castrol survey released in March, 90% of the 600 data center industry leaders surveyed say they are considering switching to immersion cooling by 2030.

But immersion cooling does have its downsides. According to the survey, 38% of respondents are concerned about the potential for leaks, 31% say that it’s too time-consuming to implement, and 31% worry about maintenance challenges.

Colocation provider Equinix, which has more than 260 data centers in 72 markets, has liquid cooling already available at 100 of them — and continues to build new liquid-cooled facilities and retrofit older ones.

But it hasn’t seen much demand for immersion cooling yet, says Phil Read, the company’s senior director of data center services. “It’s definitely an area we’re monitoring,” he says. “It could be interesting in the future.”

Today, however, the technology still has some issues. For one thing, there’s the weight, he says. “You’re basically putting the weight of a small car on a data center floor.”

And then there’s the issue of the kinds of liquids that are used for immersion cooling. “Equinix has a strong sustainability posture and PFAS chemicals are very problematic for us,” he says. “But we see those used in immersion cooling — and in two-phase cooling as well.”

It’s not clear yet exactly how the industry will manage the power and cooling requirements of next-generation AI servers. Maybe AI will come up with some suggestions.

Source: Network World