Could GPU power levels break the data center ecosystem?

  • Chip power levels have rapidly increased from just a couple hundred watts to more than 1,000 watts

  • These power levels and increasing rack densities are creating cooling and, more importantly, power supply issues for data centers

  • Chipmakers and hyperscalers think they can handle the problem before it comes to a head

In the grand scheme of things, a GPU is just an itty bitty piece of silicon. But these tiny pieces of technology are increasingly dictating the big changes being made by data center operators looking to deploy them. So, we wondered, is there an upper limit to what GPU specs the ecosystem is willing – or able – to support?

The subject isn’t necessarily one certain folks in the industry – looking at you, Nvidia, Microsoft and Google Cloud – want to discuss. But data center executives have repeatedly highlighted the power issues in conversations with Fierce Network, most recently DataBank CEO Raul Martynek.

“It’s a golden age right now to be in the data center sector. Whenever there’s a golden age someone asks what could go wrong. Ultimately, there are a number of things that concern us. Number one, you look at some of the estimates around what Nvidia wants to ship in terms of GPUs, what AMD wants to ship, what Qualcomm wants to ship and you do the math and you’re like ‘wait a minute, that’s a lot of power,’” he said.

For instance, AMD CEO Lisa Su said in December 2023 the company believes the data center AI chip market could be worth $400 billion by 2027. (For what it’s worth, AI chips tend to be among the higher-power options on the market. See also: Nvidia’s new 1,000-watt Blackwell offering.)

According to Martynek, this trend of shipping more power-sucking chips just isn’t sustainable.

“Power is the ultimate constraint here,” he said. “There’s still not enough awareness around that dynamic. What it means is it’s going to become a scarcer and scarcer resource and it’ll have implications for our sector.”

Martynek isn’t alone in this belief. Henri Richard, president of Rapidus in the Americas, made a similar comment to our sister site Fierce Electronics in an interview. Rapidus is building a huge chip fab in northern Japan with financial assistance from the Japanese government and large private investors including Sony and Toyota. Richard said the chip industry’s rising power trend is “not sustainable” and argued “something has to change.”

Dell’Oro Group Research Director Lucas Beran said theoretically there is “an upper limit to chip/processor densities that the data center ecosystem can support.” While the industry isn’t close to that limit yet, Beran said higher power levels are certainly still presenting “challenges for the data center industry.”

What’s the problem?

The issue boils down to rising GPU power levels, or the amount of energy required to run each chip. A decade ago, power levels hovered in the low hundreds of watts, but chipmakers are now starting to release GPUs that require 1,000 watts or more. That’s creating problems in terms of power supply and cooling.

As Beran noted, the issues crop up in three separate areas. On the individual processor level, thermal management is the primary limiting factor – that is, folks deploying these chips need to find ways to effectively cool them to ensure they can function at peak performance levels.

Zooming out a tad, you then run into rack power density challenges, Beran said. While a single high-power chip is technically more energy efficient than several lower-power chips, high-power chips are hotter, much hotter. Obviously, shoving several of these chips in a single rack not only makes the whole thing one giant furnace but also creates a power-gulping monstrosity.

Direct and immersion liquid cooling solutions are emerging to address the cooling issue. Unless you’ve been living under a rock, you’ve probably seen our coverage of those. But the power question is a bit trickier.

Beran explained: “This adds a new challenge of 120+ kW of power distribution to the rack. DCPI vendors don’t have off the shelf products to support this level of power delivery, so custom and semi-custom products need to be used. If we look at rack power distribution units (rPDUs), you may also need to increase the quantity in the rack from 1 to 2 rPDUs to 2 to 4 to power all the IT appliances in the rack.”
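Beran’s 120+ kW figure is easy to sanity-check with back-of-the-envelope math. The configuration below (GPUs per server, servers per rack, overhead factor) is purely illustrative, not any vendor’s actual spec:

```python
# Rough rack-power estimate for a dense AI rack.
# All figures are illustrative assumptions, not vendor specs.
GPU_WATTS = 1_000        # a modern 1,000-watt-class accelerator
GPUS_PER_SERVER = 8      # a common accelerator-server configuration
SERVERS_PER_RACK = 10    # hypothetical dense rack
OVERHEAD_FACTOR = 1.5    # CPUs, memory, NICs, fans, power-conversion losses

rack_watts = GPU_WATTS * GPUS_PER_SERVER * SERVERS_PER_RACK * OVERHEAD_FACTOR
print(f"Estimated rack draw: {rack_watts / 1_000:.0f} kW")  # prints "Estimated rack draw: 120 kW"
```

Even with conservative assumptions, a rack full of 1,000-watt chips lands an order of magnitude above the roughly 10–15 kW racks much of the installed power-distribution gear was designed around.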

And zooming out even further, you run into grid-level power availability issues as more of these uber racks are deployed in data centers.

“Data centers are developing so rapidly that utilities building infrastructure to provide power to the data centers can’t keep up,” Beran said.

In many places the power grid is already tapped out. Take Northern Virginia in the U.S., for instance, a market that is the largest in the world in terms of data center capacity (so big, in fact, that according to Dominion Energy, it is larger than the next five largest U.S. data center markets or the next four largest international data center markets combined). The market has notoriously been power constrained for the past few years and it doesn’t look to be improving much anytime soon.

Utilities both there and in Georgia have revealed that projections for peak power demand through 2030, calculated as recently as 2022, are now wildly off.

Dominion Energy in Virginia, for instance, predicted in 2022 that peak summer power consumption would come in around 25,000 megawatts in 2030. Its revised forecast now calls for that number to come in around 34,000 megawatts.

Georgia Power, meanwhile, predicted in 2022 that its winter peak demand would come in around 15,800 megawatts in 2030. It now expects peak demand to hit around 21,000 megawatts that year.
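To put those revisions in perspective, each works out to roughly a third more 2030 demand than the same utility forecast just two years earlier:

```python
# Percent increases in the utilities' revised 2030 peak-demand forecasts
# (old vs. new figures in megawatts, from the article).
forecasts = {
    "Dominion Energy (summer peak)": (25_000, 34_000),
    "Georgia Power (winter peak)": (15_800, 21_000),
}
for utility, (old, new) in forecasts.items():
    pct = (new - old) / old * 100
    print(f"{utility}: {old:,} MW -> {new:,} MW (+{pct:.0f}%)")
# prints:
# Dominion Energy (summer peak): 25,000 MW -> 34,000 MW (+36%)
# Georgia Power (winter peak): 15,800 MW -> 21,000 MW (+33%)
```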

Data center operators are skirting this problem with a series of temporary workarounds: expanding into secondary markets and (wildly enough) eyeing the deployment of technologies like micro-nuclear reactors to provide their own on-campus power generation.

From the horse’s mouth

What do the chip makers and hyperscalers have to say to all this? Matt Hamblen, senior editor of our sister site Fierce Electronics, asked industry leaders.

AMD told Fierce that last year’s introduction of its MI300X accelerator is putting the company closer to its goal of a 30x improvement in energy efficiency by 2025, based on a 2020 baseline. But AMD’s Sam Naffziger, SVP and product technology architect, warned last year that the industry can’t rely on smaller transistors alone and will need a holistic design perspective that includes packaging, architecture, memory, software and more.

Intel, meanwhile, has made a push for a neuromorphic system to enable sustainable AI with a system code-named Hala Point that relies on the Loihi 2 processor.

“Intel is looking at future computing technologies as a solution for AI workloads, namely neuromorphic, that promise to deliver greater computing performance at much lower power consumption,” Jennifer Huffstetler, Intel’s chief product sustainability officer, told Hamblen via email.

And among the big three U.S. hyperscalers, Amazon Web Services was the only one willing to go on record. A spokesperson told Fierce Network “there are of course deep engineering challenges in cooling and power delivery of data centers, but AWS is well set to handle it given we own the design of our data centers.”

“The big challenge for the data center industry is not solving one of these problems, but the complexity of solving all of these challenges, while maintaining the mission-critical reliability that data centers operate with,” Dell’Oro’s Beran concluded.

“These challenges aren’t going away anytime soon, and will probably only grow, but the data center ecosystem is innovating to try and maintain the rapid growth trajectory the industry is currently on.”


This article is a joint effort between Fierce Network and Fierce Electronics. Read Hamblen’s full deep dive into the chip side of the equation here on Fierce Electronics.