Liquid cooling technologies: reducing data center environmental impact

Large data centers are necessary for driving next-gen technologies including large language models (LLMs) and generative and agentic AI. However, the environmental challenges they pose are increasingly concerning.

“Strong action is necessary,” Microsoft researchers wrote in a new paper published in Nature. “However, without understanding the overall environmental implications of new technologies, organizations find it difficult to chart a path to their reduction goals.”

Liquid cooling, they wrote, is showing increasing promise. They found that advanced cooling methods can reduce greenhouse gas emissions, energy demand, and water consumption by anywhere from 15% to 82%.

Analyzing cold plate and immersion cooling

According to the Science Based Targets initiative (SBTi), greenhouse gas emissions must fall 42% by 2030 to stay on a net-zero path and limit global warming to 1.5 degrees Celsius (2.7 degrees Fahrenheit).

Data centers consume anywhere from 10 to 50 times more energy per square foot than traditional office buildings, and, according to the US Department of Energy, they accounted for roughly 1.5% (about 300 terawatt-hours) of global electricity demand in 2020, a share that is only expected to grow. Further, cooling technologies can consume up to 40% of a data center's total energy demand.
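Taken together, those figures imply a sizable absolute number. A rough back-of-the-envelope sketch, using only the 2020 values cited above (the 40% figure is an upper bound for a single facility, applied here purely for illustration):

```python
# Rough arithmetic sketch based on the figures cited above (2020 numbers).
# Applying the per-facility 40% cooling share to the global total is an
# illustrative simplification, not a figure from the paper.
GLOBAL_DATA_CENTER_DEMAND_TWH = 300  # ~1.5% of global electricity demand in 2020
COOLING_SHARE_MAX = 0.40             # cooling can consume up to 40% of a facility's energy

cooling_twh_upper_bound = GLOBAL_DATA_CENTER_DEMAND_TWH * COOLING_SHARE_MAX
print(f"Upper-bound cooling energy: {cooling_twh_upper_bound:.0f} TWh/yr")
```

Even as a crude ceiling, that is on the order of the annual electricity consumption of a mid-sized country, which is why cooling efficiency is such a large lever.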

The Microsoft researchers used life cycle assessment (LCA) to determine the potential end-to-end environmental impact of cooling technologies in cloud data centers. It examines the impact of everything from the building through the equipment housed in it and the resources used during operation, all the way to data center end of life. This analysis could lead to sustainability by design, they noted.

Specifically, the researchers analyzed cold plate (direct-to-chip), one-phase immersion, and two-phase immersion cooling technologies.

Cold plate cooling places a metal plate next to the heat generator (the chip), with a coolant loop flowing through it to directly transfer heat from the server.

By contrast, immersion cooling fully submerges servers in tanks of dielectric fluid that absorbs 100% of the generated heat. One-phase immersion transfers heat from the hardware to the fluid via convection: pumps push fluid through the servers and out to be cooled before it returns. Two-phase immersion removes heat with a fluid that has a low boiling point (30 to 50 degrees Celsius) by vaporizing it; the vapor is then sent to a condenser coil, where it recondenses into liquid and returns to the tank.
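The reason boiling is such an effective heat-removal mechanism is latent heat: each kilogram of fluid that vaporizes carries away a fixed quantity of energy. A minimal sketch, assuming a hypothetical 100 kW tank load and an illustrative latent heat of vaporization of 90 kJ/kg (a ballpark typical of engineered dielectric fluids, not a figure from the paper):

```python
# Illustrative sketch of two-phase immersion heat removal via latent heat.
# Both values below are assumptions for illustration, not data from the study.
SERVER_HEAT_KW = 100.0        # hypothetical heat load of one immersion tank
LATENT_HEAT_KJ_PER_KG = 90.0  # assumed h_fg for a low-boiling-point dielectric fluid

# Mass of fluid that must vaporize per second to carry away the heat: m = Q / h_fg
# (kW divided by kJ/kg yields kg/s)
mass_flow_kg_s = SERVER_HEAT_KW / LATENT_HEAT_KJ_PER_KG
print(f"Vaporization rate needed: {mass_flow_kg_s:.2f} kg/s")
```

About a kilogram of fluid per second absorbs the entire load, with no pumps forcing coolant through individual servers; the condenser coil then returns that mass to the tank as liquid.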

In their experiments, the researchers found that, compared to traditional air cooling:

  • Cold plate reduced greenhouse gases by 15% to 16%, energy use by 15% and water use by 31% to 50%;
  • One-phase decreased greenhouse gases by 13% to 16%, energy by 15% and water use by 45% to 80%;
  • Two-phase reduced greenhouse gases by 20% to 21%, energy by 20% and water use by 48% to 82%.

The researchers noted that water savings were even more significant (anywhere from an additional 13% to 48%) when liquid cooling was paired with 100% renewable energy due to renewable sources’ lower water use.
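The reported ranges can be tabulated for a quick side-by-side comparison. A small sketch (values transcribed from the bullets above; the midpoints are just a convenience for comparison, not a statistic from the paper):

```python
# Reduction ranges vs. traditional air cooling, in percent, as reported above.
reductions = {
    "cold plate":          {"ghg": (15, 16), "energy": (15, 15), "water": (31, 50)},
    "one-phase immersion": {"ghg": (13, 16), "energy": (15, 15), "water": (45, 80)},
    "two-phase immersion": {"ghg": (20, 21), "energy": (20, 20), "water": (48, 82)},
}

def midpoint(lo_hi):
    """Midpoint of a (low, high) range, for rough comparison only."""
    lo, hi = lo_hi
    return (lo + hi) / 2

for tech, metrics in reductions.items():
    summary = ", ".join(f"{m}: ~{midpoint(r):.1f}%" for m, r in metrics.items())
    print(f"{tech:>20} -> {summary}")
```

Laid out this way, two-phase immersion leads on every metric, while cold plate and one-phase immersion differ mainly in water savings.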

Benefits and drawbacks

There are, of course, pros and cons to each cooling method. Microsoft’s researchers found that cold plate systems are the easiest to retrofit and offer strong chip-level cooling without requiring full infrastructure redesign.

However, “air cooling uses by far the most electricity,” they noted. “And the copper components of a cold-plate cooling network must be replaced for each server during each IT cycle.”

Meanwhile, one-phase immersion shows strong environmental performance at lower complexity and cost than two-phase systems. At the same time, though, it requires the use of flammable hydrocarbon oils that may require new safety protocols.

Two-phase immersion, for its part, provides the highest energy and water efficiency. Still, it relies on refrigerants based on per- and polyfluoroalkyl substances (PFAS), synthetic chemicals that do not break down easily and pose health concerns; as a result, they face increasing regulatory scrutiny in the US and the EU.

Further, two-phase immersion can be more complex to deploy, making it better suited for new, high-density builds, the researchers said. By contrast, cold plate is the best scenario for retrofits, while one-phase immersion is ideal for mid-density new builds, they said.

All told, they noted, liquid cooling cuts cooling overhead and increases server density, allowing for more compute per square foot and reducing total infrastructure cost per workload. Liquid cooling also reduces failure rates, supports higher uptime/availability and longer equipment life, reducing maintenance and replacement costs.

Notably, these cooling technologies are ready for the power-hungry, high-density infrastructure that AI workloads require, the researchers pointed out. They allow for overclocking (increasing a CPU’s clock rate so it processes more instructions per second than its factory settings permit) and tighter rack packing, thus increasing data center capacity.

“Highly optimized cold-plate or one-phase immersion cooling technologies can perform on par with two-phase immersion, making all three liquid-cooling technologies desirable options,” the researchers wrote.

Factors to consider

There are numerous factors to consider when adopting liquid cooling technologies, according to Microsoft’s researchers. First, they advise performing a full environmental, health, and safety analysis, and end-to-end life cycle impact analysis.

“Analyzing the full data center ecosystem to include systems interactions across software, chip, server, rack, tank, and cooling fluids allows decision makers to understand where savings in environmental impacts can be made,” they wrote.

It is also important to engage with fluid vendors and regulators early, to understand chemical composition, disposal methods, and compliance risks. And associated socioeconomic, community, and business impacts are equally critical to assess.

More specific environmental considerations include ozone depletion and global warming potential; the researchers emphasized that operators should use only fluids with low to zero ozone depletion potential (ODP) values, avoiding hydrofluorocarbons and carbon dioxide. It is also critical to analyze a fluid’s viscosity (its thickness or stickiness), flammability, and overall volatility. And operators should use only fluids with minimal bioaccumulation (the buildup of chemicals in organisms, typically fish) and low terrestrial and aquatic toxicity.

Finally, once up and running, data center operators should monitor server lifespan and failure rates, tracking performance uptime and adjusting IT refresh rates accordingly.

The researchers noted: “Additional considerations for decision-making on cooling technologies are time-to-market, regulatory and supply chain landscapes, technology complexities, and implementation costs.”

Source: Network World