
GTC 2026: Inference Consumption Takes Off, the AI Data Center Reference Design Overhaul, and the Physical Limits of the $1 Trillion NVIDIA Order Backlog in the Vera Rubin Era

  • Writer: giorgio sbriglia
  • Mar 29
  • 10 min read
NVIDIA DGX SuperPOD with DGX Vera Rubin NVL72

 

From Annual Hype to Grueling Scaling

The atmosphere at the latest NVIDIA GTC signaled a fundamental maturation of the generative AI era. The industry has shifted from the sky-high expectations of 2025, when every keynote promised a world-altering breakthrough on the promise of inference, to the sobering reality of "grueling scaling" as that promise comes true. We have moved from a one-year hype cycle to a more traditional, capital-intensive two-year technology cycle.

 

The central tension is no longer the theoretical limit of a Large Language Model (LLM), but the physical constraints of the global power grid. With NVIDIA’s backlog ballooning to a staggering $1 trillion—representing a capital commitment for roughly 250,000 to 330,000 high-end "AI Factory" racks—the industry faces a massive challenge. How does a global infrastructure, already straining under legacy loads, accommodate the 20 to 40 GW of new compute capacity required to clear this backlog?


Stacked bar chart of Open Claw OpenRouter usage over 30 days showing a rising trend.
Open Claw driving inference adoption

 

The AI Inference Explosion: The Paradox of Efficiency

Despite the "Vera Rubin" architecture delivering a 10x improvement in inference throughput per watt over the Blackwell generation, absolute power consumption is surging. This "Inference Paradox" is driven by the shift toward Agentic AI, where autonomous systems operate in continuous loops rather than single-shot prompts.

 

This demand is catalyzed by several drivers, all centered on the Agentic AI surge:

1. MCP Servers Have Supercharged Agentic AI: Claude Code, Claude Cowork, and OpenAI Codex have driven token consumption per query dramatically upward, a shift made possible by several agentic techniques, chief among them MCP servers.

1. The "Open Claw" Phenomenon: Described as the "Linux moment" of AI, Open Claw has achieved the highest adoption rate of any open project in history within weeks. The cultural frenzy is palpable—exhibit halls were filled with developers literally "dressing up with lobsters and claws". The technical reality is even more significant: daily utilization is doubling, a growth rate described as the highest ever seen in the industry.

3. The Code Rabbit Multiplier: Historically, AI-generated code was bottlenecked by human review. The emergence of the "AI-checking-AI" loop, exemplified by Code Rabbit, removes this human friction. By allowing one model to debug another, the industry has introduced a 4x multiplier in token consumption per developer hour (a toy sketch of this loop follows below).
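For illustration, here is a toy version of such a loop. `generate` and `critique` are hypothetical stand-ins for two model calls (not Code Rabbit's API); the point is that every extra round consumes tokens with no human in the loop:

```python
# Toy sketch of an "AI-checking-AI" review loop. `generate` and
# `critique` are hypothetical stand-ins for calls to two models.

def ai_review_loop(task: str, generate, critique, max_rounds: int = 4):
    tokens_used = 0
    code = generate(task)                     # first draft
    tokens_used += len(code.split())
    for _ in range(max_rounds - 1):
        feedback = critique(code)             # a second model reviews the first
        tokens_used += len(feedback.split())
        if "LGTM" in feedback:                # reviewer satisfied, stop looping
            break
        code = generate(f"{task}\nFix these issues:\n{feedback}")
        tokens_used += len(code.split())
    return code, tokens_used

# Demo with trivial stand-ins: each additional round multiplies the
# tokens burned per unit of developer time.
code, used = ai_review_loop(
    "write fizzbuzz",
    generate=lambda prompt: "def fizzbuzz(n): ...",
    critique=lambda draft: "LGTM",
)
print(used)
```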

 

Schematic showing job specialization across racks: prefill on GPUs, decode on LPUs.

Inference has become so prominent that NVIDIA acquired Groq to create a dedicated inference platform that decouples racks for prefill and decode, adopting LPUs for decode while retaining Vera Rubin for prefill. We noted the growing role of inference and emerging players such as Groq in our GTC piece from 2025:


"In this evolving landscape, companies specializing in inference optimization like Groq are gaining attention. Previously somewhat overlooked in favor of training-focused companies, Groq and similar inference-centered businesses could see significant growth as the industry reorients toward efficient model deployment and operation rather than just model creation."


So much so that Groq was acquired by NVIDIA and its chips integrated into the package. This confirms another prediction from 2025: NVIDIA is positioning itself more and more as an ecosystem provider rather than a GPU vendor, which means the most likely exit option for competitors is to be bought by NVIDIA.

From now until GTC 2027, we expect players such as Baseten, Together AI, and Fireworks AI to gain major market traction and support from NVIDIA, as they represent a key go-to-market channel for NVIDIA, much as GPU clouds were the main alternative go-to-market to hyperscalers in 2024 and 2025.


 

GPU Clouds Win the Balance Sheet Bet, the NVIDIA "Ferrari" Strategy, and the Legacy AI Accelerator Boom from Massive Demand


 

The AI inference boom has made real the vision of the many GPU clouds that planned to break even on three-year GPU lease contracts and profit from the last two to three years of hardware life. The market was skeptical of this bet, seeing GPU hardware as a fast-depreciating asset: NVIDIA's rapid innovation pushes older models into technological obsolescence.


The market was wary that the frequent rollout of new GPUs would excessively depreciate the assets of all NVIDIA clients, most evidently GPU clouds, whose balance sheets are dominated by GPU cards. If the GPUs were worthless at the end of three years, GPU clouds were not profitable as a business and not very creditworthy for GPU financing. This also raised the question of whether GPUs were sufficient collateral, given their very fast depreciation. Every month, GPUs would rapidly lose value, commanding a significant discount when priced as collateral: once they are used and new models have been rolled out, how much can you really sell them for?


Yet if today you want to lease a decent cluster of H100s, H200s, or even A100s, you will struggle to find availability. And the price? It has actually risen 25% since December 2025.


H100 GPU rental price trend index from Bloomberg (source https://x.com/matthew_sigel/status/2037598820054224948) shows soaring prices for allegedly depreciating assets in March 2026.


This market data point is particularly important for 3 reasons:

  • It dispels AI financial bubble doubts: The most fragile part of the AI ecosystem is the GPU clouds. AI labs (OpenAI, Anthropic, etc.) are primarily funded by equity and will not face insolvency in the short term. GPU clouds collateralize their GPU assets against three-year offtake contracts; if demand contracts, GPU clouds will be the first to run out of liquidity. Instead, they currently cannot keep up with demand.

  • It proves the GPU cloud business model: A new industry is being invented, and new business models are being tested and consolidated. GPU clouds went from crypto startups to hyperscale colossi. They are here to stay.

  • It proves NVIDIA's business model and the sustainability of its revenues: NVIDIA's clients are primarily hyperscalers and Neo Cloud Providers (NCPs / GPU clouds). The value chain needs to be sustainable for everyone in the vertical for NVIDIA to keep charging the premium prices it currently commands for its accelerators. The higher the sale price, the longer AI accelerators need to last to break even.



So if legacy models are already scarce, what about the next generation of hardware? The extreme demand for NVIDIA's accelerators has inevitably handed the company a market position akin to a luxury marque like Ferrari. Access to the bleeding-edge "Vera Rubin" systems is no longer a matter of mere capital; it is a gated privilege. Customers are reportedly unable to enter the queue for Rubin systems unless they have already maintained high-volume purchases of GB300 systems.


 

NVIDIA DSX Reference Architecture for AI Factories / AI Data Centers.


The image shows the traditional data center architecture at the top, gradually transitioning to direct current architectures.

NVIDIA has finally clarified the requirements for a large-scale AI data center, in particular for AI training. We spelled this out back in 2024, but it is helpful that NVIDIA now makes an official statement. To meet a $1 trillion backlog, the standard Tier 3 data center (defined by diesel generators, transformers, UPS with 5-10 minutes of batteries, and busbars) is being rendered obsolete. The goal is to let AI data centers scale faster, accommodate more GPUs, and unlock sales. Sales are bottlenecked by data center capacity, which is constrained by power availability, manufacturing capability, and capital.

For this reason, NVIDIA went deep into co-designing the data center together with its chips, with the following outcomes:

  • Removal of the Tier 3 requirement for training. More here on the reasons why.

  • Official increase of DLC operating temperatures to 45 °C, improving PUE and increasing GPU utilization of installed capacity.

The other trend is the increase in rack compute density; the reasons for this are also explained here. With liquid-cooled Vera Rubin racks climbing from 270 kW toward 600-800 kW, there is no physical space left in the rack for anything other than GPUs.


Power shelves that convert alternating current to direct current need to be moved outside the rack to leave space for additional compute shelves. In the GB300 there are 8 power shelves of 33 kW and 8 compute shelves that together consume 130 kW of TDP and 190 kW of EDP. By removing the power shelves, it is possible to dramatically increase the compute capacity in the rack, doubling to 16 compute shelves.
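A back-of-the-envelope check on those figures; our assumptions are that the 130 kW TDP and 190 kW EDP are totals for the 8 compute shelves, and that per-shelf power stays constant when the shelf count doubles:

```python
# Rack power math using the GB300 figures quoted above. Treating
# 130 kW TDP / 190 kW EDP as totals for 8 compute shelves is our
# assumption, as is constant per-shelf power after densification.

POWER_SHELVES = 8
POWER_SHELF_KW = 33           # AC->DC conversion capacity per power shelf
COMPUTE_SHELVES = 8
RACK_TDP_KW = 130             # sustained thermal design power
RACK_EDP_KW = 190             # electrical design power (peak draw)

supply_kw = POWER_SHELVES * POWER_SHELF_KW     # 264 kW of in-rack AC/DC supply
headroom_kw = supply_kw - RACK_EDP_KW          # ~74 kW of margin over peaks

# Evict the power shelves and refill the freed space with compute:
dense_shelves = 2 * COMPUTE_SHELVES                              # 16 shelves
dense_tdp_kw = RACK_TDP_KW / COMPUTE_SHELVES * dense_shelves     # ~260 kW
dense_edp_kw = RACK_EDP_KW / COMPUTE_SHELVES * dense_shelves     # ~380 kW

print(f"today: {supply_kw} kW supply in-rack, {headroom_kw} kW headroom")
print(f"densified: {dense_tdp_kw:.0f} kW TDP / {dense_edp_kw:.0f} kW EDP, "
      "all of which must now arrive as DC from outside the rack")
```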

Beyond that, NVIDIA plans to change its rack form factor from horizontal to vertical shelves, further increasing the number of GPUs per rack (244) and pushing power density up to 600-800 kW per rack.

This entails that power will need to travel through the data center as direct current, changing the entire power distribution from the traditional 400/440 V AC to 800 V DC.
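A rough current comparison shows why the voltage jump matters. Our illustrative assumptions: a 600 kW rack, a 415 V line-to-line three-phase AC feed with a 0.95 power factor as the "traditional" baseline, and lossless conversion:

```python
# Current needed to feed one 600 kW rack, AC vs DC. The 415 V / 0.95 pf
# baseline and the lossless-conversion simplification are our assumptions.
from math import sqrt

P_W = 600e3                            # one densified rack

AC_V_LL, PF = 415, 0.95
i_ac = P_W / (sqrt(3) * AC_V_LL * PF)  # ~879 A per phase

DC_V = 800
i_dc = P_W / DC_V                      # ~750 A on the DC bus

print(f"three-phase AC feed: ~{i_ac:.0f} A per phase at {AC_V_LL} V")
print(f"800 V DC feed:       ~{i_dc:.0f} A, with no AC/DC stage at the rack")
```

Less current per watt delivered means thinner busbars and fewer conversion stages, which is exactly what a 600-800 kW rack needs.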


Traditional architecture for a datacenter.

NVIDIA has also been clarifying the level of redundancy required for the AI factory, significantly reducing the cost per MW of a data center. As reported in our 2024 blog post, large-scale training facilities do not require Tier 3 redundancy. GPU racks do not need to sit behind UPS + genset, with the exception of the data racks used for checkpointing.

The solution adopted by NVIDIA is BBUs with one minute of autonomy, enough to allow the switch from one grid supply to another. For reference, this removes all the gensets and dramatically reduces the battery requirements typically specified with a UPS.
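A quick sizing sketch shows what one minute of ride-through means in battery terms; the 600 kW rack and the five-minute traditional UPS comparison point are our illustrative assumptions:

```python
# Battery energy for grid-switchover autonomy. The 600 kW rack load and
# the 5-minute UPS baseline are illustrative assumptions.

RACK_KW = 600

def battery_kwh(load_kw: float, minutes: float) -> float:
    """Energy needed to carry `load_kw` for `minutes` of ride-through."""
    return load_kw * minutes / 60

bbu_kwh = battery_kwh(RACK_KW, 1)   # 10 kWh/rack: enough to switch grid feeds
ups_kwh = battery_kwh(RACK_KW, 5)   # 50 kWh/rack under a Tier 3-style spec

print(f"1-min BBU: {bbu_kwh:.0f} kWh/rack vs 5-min UPS: {ups_kwh:.0f} kWh/rack "
      f"({ups_kwh / bbu_kwh:.0f}x less battery, and no genset at all)")
```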


PSUs are now designed to smooth out GPU power peaks.
Improvements in power smoothing over three generations of GPUs. While the GB200 had several harmonic issues, the GB300 and VR200 have been addressing them.

Offsetting the CAPEX savings from the lower reliability requirements is the major technology shift to a full 800 V data center infrastructure. At 800 V DC, it is possible to distribute power throughout the facility in DC: voltage is stepped down from 22 kV to 800 V in a Solid State Transformer (SST), which also converts the power from alternating to direct current. However, this poses several technological challenges given the infancy of the technology, from DC breakers (still expensive) to manufacturing capability. No manufacturer currently produces these power electronics at scale, and SSTs remain a niche application and an emerging technology.


Several side racks were presented to host PSUs and batteries (BBUs) for power electronics.

We believe adoption of the 800 V DC infrastructure is at least two years away. Intermediate steps are possible with side racks that contain power shelves and Mobile Battery Units (MBUs): power is mostly distributed in AC (still without UPS for GPU racks) and converted to DC only next to the rack, where the batteries are also installed.

NVIDIA’s DSX architecture prioritizes density and "unity PUE" (Power Usage Effectiveness) over long-duration autonomy.

 

| Requirement | Traditional Data Center | NVIDIA's DSX Architecture |
|---|---|---|
| Redundancy | Tier 3 dual-redundancy | Eliminated to maximize GPU density |
| UPS Systems | Centralized, massive UPS | Replaced by on-rack batteries for compute racks |
| Power Autonomy | Long-duration backup | 1-minute "peak shaving" survival |
| Power Distribution | In-rack power shelves | 110 kW power shelves in sidecars |
| Current Conversion | Standard transformers | Solid State Transformers (SST) |
| Voltage/Current | 440 V AC | 800 V DC by 2028 |

 

This transition will culminate in 2028 with the "Kyber" form factor, a radical departure from MGX that moves power delivery entirely into sidecar shelves to accommodate up to 244 GPUs per rack.

 

Quantum Computing: From Hype to Hybrid Integration

The narrative surrounding quantum computing has transitioned from "standalone general-purpose" dreams to a pragmatic "Hybrid Systems" model. The focus is now on the Quantum Processing Unit (QPU) as a specialized accelerator, a trend we predicted last year.


Physical AI: The 2027 Reality Check

The exuberant predictions for physical AI in 2026 have met a "reality check." The sector remains hamstrung by two critical engineering bottlenecks: hand dexterity (the shift from visual perception to tactility) and the power-to-compute tradeoff. Onboard reasoning computers currently consume too much power to allow for meaningful autonomy without sacrificing battery life.

 

Consequently, physical AI is now viewed as a 2027 story. This delay is expected to trigger intense M&A activity. Capital is pivoting away from "general-purpose" moonshots toward "best-of-breed" specialization. We anticipate a wave of consolidation as leaders acquire the specific limb-control, vision-processing, and tactile-sensing specialists required to build a commercially viable humanoid.

 

The Rise of the Sovereign Cloud

Geopolitical necessity is fragmenting the global compute map. The emergence of "Sovereign Clouds" is no longer theoretical; in Sweden, four companies have aligned to form Spheric AI, ensuring that compute and data stay within national borders to satisfy European regulatory mandates.

 

This regionalization has led to extreme scarcity. In Norway, for instance, data center capacity is entirely sold out through 2028, with no uncontracted capacity coming online for the next several years. For European governments, the choice is increasingly "Sovereign or Nothing," as they cannot legally utilize non-European clouds for sensitive workloads.

This remains an incredibly interesting market, especially considering the Middle East demand that, due to the conflict, is being relocated to neighbouring cloud zones (Europe and APAC).


The Open Source Power Play: Nemotron vs. The Frontier

Code Rabbit uses two models in series - Nemotron is used to create the context for frontier models such as Claude and GPT.

We are witnessing a decoupling of model capability at the frontier. As the link between massive training runs and incremental capability weakens, NVIDIA has made a strategic move to "commoditize the complement" by funding open-source projects like Nemotron 3.


Artificial Analysis Leaderboard - Nemotron positioning itself as a heavy lifter rather than a premium model.

 

Nemotron 3 reportedly performs many of the tasks of a frontier model while being significantly lighter and more efficient. By releasing such models, NVIDIA ensures that the software layer remains open, affordable, and accessible, thereby driving adoption and innovation.

NVIDIA is currently the only player that can systematically push an open-source agenda. Training LLMs has become a multi-billion-dollar game. Even large players such as Meta no longer have the appetite for such large-scale "donations" to the Hugging Face community, let alone smaller players such as Mistral, which no longer have the financial capacity for such endeavours. We believe NVIDIA will continue to invest in releasing high-performance LLMs co-designed with its chips and perfectly synchronized with its stack.

In recent years, open-source models have lost significant numbers of users, both because frontier models were dramatically more performant and because there was no certainty of frequent maintenance and updates from the publisher. Now, with NVIDIA releasing quality LLMs, we expect wide adoption of NVIDIA's Nemotron family as a precursor to frontier models. The prominent example is Code Rabbit, which adopted Nemotron to produce the prompt for a frontier model: 95% of the tokens are consumed by Nemotron before the right information is fed to a frontier model for feedback.
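As an illustration, a minimal two-stage pipeline in this spirit could look like the sketch below. The endpoint, model names, and prompts are our assumptions, not Code Rabbit's actual implementation; any OpenAI-compatible chat endpoint would work:

```python
# Sketch of a "light open model builds context, frontier model gives
# feedback" pipeline. Endpoint and model names are illustrative.
import requests

API = "http://localhost:8000/v1/chat/completions"  # any OpenAI-compatible server

def chat(model: str, prompt: str) -> str:
    resp = requests.post(API, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def review_diff(diff: str) -> str:
    # Stage 1: the light open model burns most of the tokens digesting
    # the raw diff and extracting only the suspicious hunks.
    context = chat("nemotron-3", f"Summarize this diff and flag risky hunks:\n{diff}")
    # Stage 2: the frontier model sees a distilled prompt, not the raw diff.
    return chat("frontier-model", f"Review these findings and give feedback:\n{context}")
```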

 

The 40-Gigawatt Question

The AI revolution has moved from the ethereal world of code to the visceral world of heavy engineering. To fulfill the current backlog, the industry needs at least 20-40 GW of capacity, yet construction is plagued by bottlenecks in power generation, with grid-connection queues stretching beyond 2030 and order books at turbine and generator manufacturers full through 2029.

 

This raises the natural question of whether the industry can accommodate NVIDIA's massive backlog in terms of power capacity.

To convert the $1 trillion backlog into megawatts, we need to make some rough assumptions.


A standard AI rack now retails for $3 million to $4 million, and a single rack requires approximately 130 kW of power (different products carry different prices and power densities). This gives approximately 40 GW of power, which equates to Argentina's entire power generation capacity. The chip was the first bottleneck; power will be the last.
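A minimal sketch of that conversion, using only the figures quoted above:

```python
# Rough conversion of the $1T backlog into racks and gigawatts, using
# the post's own assumptions: $3-4M per rack and ~130 kW per rack.

BACKLOG_USD = 1e12
RACK_KW = 130

for rack_price_usd in (3e6, 4e6):
    racks = BACKLOG_USD / rack_price_usd
    gw = racks * RACK_KW * 1e3 / 1e9          # kW -> W -> GW
    print(f"${rack_price_usd / 1e6:.0f}M/rack -> {racks / 1e3:.0f}k racks, ~{gw:.0f} GW")

# ~250k-333k racks and ~33-43 GW: the same order of magnitude as the
# 20-40 GW range cited earlier in the piece.
```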

 
 
 
