UPS for AI data centers: everything you need to know
- giorgio sbriglia
- Apr 12
- 7 min read
Updated: Apr 19
Intro: UPS for Artificial Intelligence workloads - what's different from traditional data centers
The UPS has historically been the heart of the datacenter’s electrical infrastructure. It enables uninterrupted operations at a stable voltage and allows servers to survive a power outage while the generators come online.
Since the redundancy system has historically been 2N, multiple generators would operate on one power path, requiring the gensets to synchronize with one another in datacenters larger than 2.5 MW (a typical generator and transformer size). The size of the battery pack would increase with the number of generators that need to synchronize.
With the introduction of 3+1 busbar architecture in AI datacenters, this requirement is significantly reduced. If each busbar is fed by its own generator and transformer, the battery backup time only needs to be 2–3 minutes at most, as the startup time for a generator is around 30 seconds.
This is good news, because AI datacenters can place significant stress on batteries, accelerating the degradation of their capabilities.

Indeed, traditional datacenters host several computers that operate independently from one another, whereas in AI datacenters the most common workload today is model training, where servers operate as a single supercomputer. During training, the workload is split across all servers in brief iterations. At the end of each compute iteration, computation is paused and data, typically model weights, is shared again across all servers before another round of computing begins. This behavior leads to rapid changes in power consumption.
Such oscillations have the following drawbacks:
If left unmanaged, they can affect both the grid and the generators upstream.
If managed with the batteries, acting as a reserve, they will rapidly reduce battery life.
Neither of these outcomes is ideal.
Thankfully, brief oscillations can be smoothed out by the UPS inverter, output inductors/capacitors, and DC-link capacitors. This is essential to protect the batteries when they are acting as a power reserve to protect the grid and generators upstream. Each UPS has different capabilities, which is why we performed a dedicated AI test to verify the ability of Riello’s UPS to absorb the power swings caused by the GPUs.
Riello then developed a custom firmware that not only allows transition between the two modes, but also enables the batteries to be used only when needed, while allowing certain oscillations to propagate upstream.
The idea behind this is that both the generator and the grid can tolerate a certain level of perturbation. Therefore, the batteries should intervene only before the tolerance level of the generators or the grid is exceeded.
In summary, the strategy we adopted in Terakraft’s datacenter is the following:
Use the UPS to absorb part of the swing.
Allow residual and acceptable oscillations to be transferred to the grid.
Intervene with the batteries when swings exceed acceptable levels.
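The three steps above can be sketched as a simple dispatch rule. This is only an illustration of the idea, not Riello's firmware logic: the function name is hypothetical, the 25% UPS absorption default is borrowed from the test results reported in this article, and the 10% upstream tolerance is an assumed placeholder that would in practice depend on the generator and grid requirements.

```python
def dispatch_swing(swing_pct, ups_absorb_pct=25.0, upstream_tolerance_pct=10.0):
    """Split a load swing (in % of rated power) across the three tiers.

    Returns a tuple (ups, upstream, battery), each in % of rated power.
    Both thresholds are illustrative assumptions, not firmware settings.
    """
    if swing_pct < 0:
        raise ValueError("swing_pct must be non-negative")
    # Step 1: the UPS capacitors/inductors absorb what they can.
    ups = min(swing_pct, ups_absorb_pct)
    remaining = swing_pct - ups
    # Step 2: a tolerable residual is allowed to propagate upstream.
    upstream = min(remaining, upstream_tolerance_pct)
    # Step 3: the batteries cover only what exceeds the tolerance.
    battery = remaining - upstream
    return ups, upstream, battery


for swing in (20, 30, 100):
    print(swing, dispatch_swing(swing))
```

With these assumed thresholds, a 20% swing never reaches the batteries, while a 100% swing leaves 65% of rated power for the batteries to cover.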
The UPS AI Datacenter Test
What we tested: Riello M2S UPS
To anchor the Terakraft project, engineers deployed the Riello M2S 1250, a modular UPS designed to bridge the gap between massive capacity and granular control. During the Factory Acceptance Test (FAT), the unit was evaluated in both 19-module and 20-module configurations to verify load sharing and N+1 redundancy under extreme conditions.
Feature | Specification |
Model | M2S 1250 CT1 F VE 2197 |
Rated Power | 1250 kVA / 1250 kW |
Configuration | 19 to 20 Power Modules (PM) |
Control Units | SCU (System Control Unit) & RCU (Redundant Control Unit) |
Firmware Versions | SCU 120 / RCU 120 / PM 118 |
This modularity is strategic. By using 19 to 20 individual Power Modules, the system distributes the thermal and electrical stress of AI surges across a large, redundant array, ensuring no single point of failure during a GPU peak.
How we Tested


Simulating a GPU workload at full scale would have required actually deploying GPUs at scale, and standard testing protocols are insufficient for AI. At the Riello test bay in Cormano, engineers realized that standard 2.0 MW resistive load banks couldn't switch fast enough to mimic a GPU. To solve this, they applied the superposition principle ("Superposition of the Effects"): the setup combined standard resistors with 1.6 MW of electronic loads (modified UPS systems) capable of simulating high-frequency transients.
The Low-Frequency Pulse: A macro cycle swinging from 0% to almost 100% load with a period of roughly 2 seconds (about one second at each load level).
The High-Frequency Swing:
A micro oscillation between 75% and 100% load with a 120 ms switching period.
A micro oscillation between 100% and 130% load with a 120 ms switching period.
While the traditional performance tests demonstrated that the UPS can handle a sustained, static 125% overload for 10 minutes and 18 seconds, the AI load simulations pushed the system's transient capabilities even further. To properly emulate the violent current surges generated by synchronized NVIDIA GPUs, engineers introduced high-frequency pulsing loads that deliberately exceeded the unit's nominal capacity. Notably, the high-frequency tests pushed past 125%, subjecting the UPS to aggressive 130% peak overloads during Test Cases 3 and 4. Operating on a 120-millisecond switching period, the load was forced to repeatedly spike to 130% capacity for just 40 milliseconds before dropping back down to 100% for 80 milliseconds. Conducted over continuous 20-minute intervals using both Standard Firmware and the specialized "Battery AI Shield" Firmware, this stress test is intended to verify that the UPS can deliver the extreme, high-frequency current compensation required to stabilize the erratic micro-bursts of modern AI workloads.
The danger here is the velocity of the demand. The UPS must respond to these 120ms surges almost instantaneously.
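To put the 130% pulse in perspective, the sketch below computes the time-weighted average of the 40 ms / 80 ms profile described above. The function names are hypothetical, and the assumption that each 120 ms period starts with the peak is ours; the test report does not state the phase.

```python
def pulse_average(high_pct, high_ms, low_pct, low_ms):
    """Time-weighted average load (% of rated power) of a two-level pulse."""
    period_ms = high_ms + low_ms
    return (high_pct * high_ms + low_pct * low_ms) / period_ms


def load_at(t_ms, high_pct=130.0, high_ms=40, low_pct=100.0, low_ms=80):
    """Load level at time t_ms, assuming each period starts with the peak."""
    return high_pct if (t_ms % (high_ms + low_ms)) < high_ms else low_pct


RATED_KW = 1250.0  # rated power of the unit under test

avg_pct = pulse_average(130.0, 40, 100.0, 80)
print(f"average load: {avg_pct:.1f}% ({RATED_KW * avg_pct / 100:.0f} kW)")
print(f"peak load:    130.0% ({RATED_KW * 130.0 / 100:.0f} kW)")
```

Under these assumptions the average load is 110% of rated power, below the 125% static overload level. This is consistent with the point that the difficulty lies in how fast the demand changes rather than in the mean overload.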

FAT Test Results
For completeness, before showing the results on the UPS AI Test, we share the outcome of the more traditional FAT test. The data reveals a system built for the high-efficiency, high-stress demands of AI:
Precision Efficiency: The M2S achieved an AC/AC efficiency of 97.93% at 25% load and maintained 96.68% at 100% load (230 V). Efficiency was lower than the datasheet value because additional extraction fans were installed on top of the cabinet.
Short Circuit Resilience: The system survived a short circuit current of 5150 A for the first 100 ms, nearly 3x its nominal rating.
Overload Stamina: During stress testing, the UPS sustained a 125% overload for 10 minutes and 18 seconds, exceeding the 10-minute requirement by 18 seconds.
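The "nearly 3x" short circuit figure can be checked with a quick calculation. This is a sketch that assumes a three-phase output at 400 V line-to-line (corresponding to the 230 V phase voltage quoted above); the actual nominal current should be taken from the datasheet.

```python
import math

RATED_KVA = 1250.0
LINE_VOLTAGE_V = 400.0    # assumption: 230 V phase-to-neutral, about 400 V line-to-line
SHORT_CIRCUIT_A = 5150.0  # measured during the FAT

# Nominal three-phase current: I = S / (sqrt(3) * V_line)
nominal_a = RATED_KVA * 1000 / (math.sqrt(3) * LINE_VOLTAGE_V)
ratio = SHORT_CIRCUIT_A / nominal_a

print(f"nominal current: {nominal_a:.0f} A")
print(f"short circuit / nominal: {ratio:.2f}x")
```

Under this assumption the nominal current is roughly 1800 A, which puts the 5150 A short circuit current at a little under three times nominal.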
Normal Mode:


The normal-mode test with swings from 0 to 100% (flat lines at 0%, swinging lines at 100%) revealed that the UPS and batteries can fully absorb the GPU power swings (clean harmonics, with the green line showing the power drawn from the batteries). This is great news: the system demonstrated that it can protect generators and grid, should that be required, even with extreme 100% swings.
In the second test, we verified the capability of the UPS to deliver the required oscillating power without relying on the batteries. Output conditions show a successful harmonic distribution with no battery usage. However, the upstream harmonics were distorted in order to supply the required output power.

Finally, Riello iteratively investigated how much of the power swing the UPS can smooth out without propagating it upstream and without using the batteries.
The results showed that power swings of up to 25% can be absorbed by the UPS capacitors, inductors and DC link without involving the battery banks and without affecting upstream harmonics.

Conclusions

AI data centers have specific functional requirements when it comes to UPS and battery systems. Backup-power requirements, such as diesel-generator support and ride-through time during power loss, are discussed separately and largely determine the battery autonomy needed in an outage. A second, equally important requirement is the ability to cope with the highly synchronized operation of large AI training clusters. In practice, these clusters behave less like a collection of independent servers and more like a tightly coordinated parallel supercomputer, with many GPUs moving through compute and communication phases in lockstep, a behavior typical of supercomputers.
This implies wide swings in power consumption, an issue partially addressed by the migration from NVIDIA's GB200 to GB300 servers, which include electrolytic-capacitor energy storage. However, as long as training is involved, power swings will inevitably occur between compute-heavy and communication-heavy phases across many synchronized GPUs (unless asynchronous training is adopted, which is still premature today).
For this reason, it is essential to develop a mitigation strategy. xAI's Colossus I learned this in the field and had to deploy additional battery banks from Tesla. NVIDIA has acknowledged the issue by adding further capacitors to the PSUs installed in GB300 systems and by adding custom parameters that reduce the power swings, at the cost of lower training efficiency and energy efficiency.
The AI test proved that Riello's UPS can:
maintain clean harmonics downstream of the UPS under all circumstances
absorb a good part of the power swings without relying on battery packs
offer flexible firmware capabilities to set the threshold at which the batteries intervene to limit harmonic distortion upstream (for generator and grid compliance).
This is key, as it significantly extends battery life. Firmware solutions are effective as long as part of the load can be managed by the UPS itself.
For AI workloads, the first fast-response energy buffer is usually the UPS DC-link capacitors and output-filter network, while the inductive elements help control how sharply those transient currents are drawn from the grid or generators.
For this reason, "high capacitance / high inductance" UPS should be favoured for AI applications due to their better internal energy buffering and better fast-transient filtering.
Unfortunately, this data is not publicly available. This is why we are sharing our successful experience with Riello, so that AI data center operators have prior knowledge of supplier capabilities when it comes to AI workloads.
We strongly recommend including the AI test in all contracts when purchasing UPS systems: regardless of the firmware installed, the capacitors and inductors in the UPS represent the first power reserve for managing these ultra-fast transient and cyclical loads.
