By Giorgio Sbriglia

Data center design requirements for AI workloads: A comprehensive guide.



NVIDIA GB200 NVL72 Superpod Rack. Typical installation in future AI data centers.

INTRODUCTION


AI's Growing Impact on the Data Center Industry: Insights from 2024 Conferences


Over the past year, the rise of artificial intelligence (AI) has prompted data center operators to reconsider the future of the industry. In 2024, I had the opportunity to attend nearly 10 major data center conferences, where one topic repeatedly dominated the discussions: AI's specific requirements for data centers. These events were highly attended, and I realized how easily I had taken for granted key concepts I was already familiar with, such as AI inference, training, machine learning, liquid cooling, and the distinctions between Tier-3 facilities and high-performance computing (HPC).

A key observation was that two industries—cloud computing and HPC—while sharing certain similarities, often seem unaware of each other’s nuances. AI, in many ways, bridges this gap, depending on the specific use cases. This is why I believe sharing the insights I've gathered on AI's growing demands in data centers over the past year could greatly benefit operators and stakeholders alike.


Understanding AI Infrastructure Requirements


The Shift from Data to Compute


One of the first things to clarify is that AI infrastructure relies heavily on GPUs, which consume significantly more power than the CPUs traditionally used in data centers. GPUs, which are central to high-performance computing (HPC), are why AI requirements align more closely with supercomputing than with typical data centers.

In the GPU industry, the term "compute" is commonly used—largely influenced by NVIDIA's CEO, Jensen Huang, and his public speeches. Historically, data centers have focused on "data" storage and management rather than "compute" power, with rack power densities in the order of 10 kW/rack. This year, the data center industry was caught off-guard when it was asked to shift focus toward compute-heavy applications, a concept familiar in the HPC space, with power requirements from 40 to 140 kW/rack. However, while HPC sites are optimized for raw computational power and such power density requirements are easily managed, they may not be suitable for AI inference workloads, where availability is critical.

This growing divergence between data centers and AI demands raises the need for new terminology to define these specialized facilities. Jensen Huang introduced the term "AI factory" to signify the dawn of the AI industrial revolution. Personally, I believe "Compute Center" could also be a fitting term, as it recalls the traditional "data center" but shifts the emphasis toward computational power rather than data storage.


Training vs. Inference: How AI Workloads Shape Data Center Requirements


Another critical distinction to understand is the varying data center requirements for AI training versus inference workloads. While both processes distribute tasks across multiple GPUs, they operate at dramatically different scales.


In AI training, GPUs work in parallel to optimize a model using techniques like gradient backpropagation. This process often involves thousands of GPUs working synchronously for extended periods—sometimes weeks. For instance, Meta used 48,000 H100 GPUs in parallel to train LLAMA 3.1. Each H100 GPU has up to 94 GB of VRAM, leading to a combined capacity of approximately 4.5 petabytes (PB). Training such large models demands immense computational and energy resources.


On the other hand, inference requires far fewer resources. The number of GPUs needed depends on the size of the model relative to the memory available in each GPU. For example, if a model requires 48 GB of memory and each GPU has 24 GB, two GPUs would be necessary to run the model. Techniques like quantization, which reduces the precision of weights to decrease memory usage (albeit at a potential loss of accuracy), can help reduce model size. However, as AI models grow larger, this becomes increasingly challenging. For example, Meta’s LLAMA 3 model ranges from a 4.8 GB memory requirement with 4-bit weights to 38.4 GB in full 32-bit precision.
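To make the sizing arithmetic concrete, here is a minimal Python sketch (not taken from any particular serving framework) of the logic described above: weight memory scales linearly with parameter count and bits per weight, and the minimum GPU count is the ceiling of the model footprint over per-GPU VRAM. The 9.6-billion-parameter figure is only an illustrative assumption chosen to reproduce the 4.8 GB / 38.4 GB range quoted above; real deployments also need headroom for activations and the KV cache.

```python
import math

def model_memory_gb(num_params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory needed just to hold the weights.
    Ignores activations, KV cache and framework overhead."""
    return num_params_billion * 1e9 * (bits_per_weight / 8) / 1e9

def gpus_needed(model_gb: float, vram_per_gpu_gb: float) -> int:
    """Minimum number of GPUs whose combined VRAM fits the weights."""
    return math.ceil(model_gb / vram_per_gpu_gb)

# Illustrative: a ~9.6B-parameter model served on hypothetical 24 GB GPUs.
for bits in (4, 8, 16, 32):
    size = model_memory_gb(9.6, bits)
    print(f"{bits:>2}-bit weights: {size:5.1f} GB -> {gpus_needed(size, 24)} GPU(s)")
```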


Some companies, like Groq, take a different approach, spreading inference across more chips than strictly necessary to drastically reduce inference time. By distributing the model over additional accelerators, which Groq calls LPUs (Language Processing Units), they achieve record-breaking token throughput, with LLAMA 3.1 running at up to 736 tokens per second.

In summary, AI training requires significantly more GPUs working in parallel compared to inference, which typically involves far fewer GPUs. Another crucial distinction is that inference often serves live client requests, making uptime and efficient data transfer in and out of the data center critical factors. In contrast, training is more resource-intensive but doesn't have the same real-time availability requirements. As a result, data centers designed for inference need to prioritize low-latency networking and high availability, while those optimized for training focus on raw computational power and energy efficiency.


Despite the significantly lower resource demands of inference compared to training, industry estimates predict that, as AI adoption skyrockets, the majority of hardware and energy consumption will come from inference rather than training. This shift underscores the growing importance of optimizing infrastructure for scalable, efficient AI inference workloads in the years to come.


Having understood the general functional requirements of AI workloads, we will go through how they affect data center design in terms of Network, Power Densities, Cooling, Reliability and Location.


NETWORK


East-West and North-South Traffic for Inference and Training

We will start by understanding data and network requirements, as they essentially drive the requirements for power densities, cooling and location. Despite being the most critical part of the design, and a major contributor to the progress in AI capabilities in recent years, AI network requirements received the least coverage at the data center conferences I attended this year.

Indeed, AI workloads have profoundly impacted data center network designs, especially when considering traffic patterns. Traditional data centers were primarily optimized for north-south traffic, where data flows between external clients and internal servers (from user to data center, then back). However, with AI, especially in training and inference workloads, east-west traffic (server-to-server communication within the data center) has increased dramatically, reshaping network architectures.

This section will focus on how AI inference and training workloads affect data center network traffic, comparing east-west and north-south traffic demands, and how these differ from traditional data centers.



Traditional Data Centers: Traffic Patterns


Before the rise of AI, traditional data center traffic primarily followed a north-south pattern:

  • North-South Traffic: Data moves between the user (external client) and the server inside the data center. Web browsing, API requests, and database queries follow this path.

  • East-West Traffic: Internal data center traffic where servers communicate with each other to process and distribute data. This pattern is crucial for services like distributed databases and cloud services but generally smaller in volume than north-south traffic.

Traditional applications often prioritized north-south traffic, with servers responding to client requests over relatively low-bandwidth internal networks, focusing more on external throughput.


AI Workloads and Their Traffic Patterns



With AI, and deep learning models in particular, east-west traffic now dominates, especially during the training phase. Inference workloads, however, may vary in their traffic demands depending on how many GPUs are used to host a single LLM. Here's a breakdown of how AI affects north-south and east-west traffic in training and inference, compared to traditional workloads.


AI Training: East-West Traffic Dominance

AI model training, especially for large models such as transformers, places unprecedented demands on east-west traffic. Distributed training often involves multiple GPUs or TPUs that must frequently exchange intermediate data and gradients to synchronize model updates. This intra-server and server-to-server communication dominates traffic patterns during training.


  1. East-West Traffic in Training:

    • GPU-to-GPU Communication: Parallel processing techniques (e.g., data parallelism, model parallelism) require frequent exchanges between GPUs to synchronize weights and gradients. This generates a huge amount of east-west traffic.

    • Bandwidth Demand: High bandwidth is essential as GPUs exchange large chunks of data at regular intervals. Each GPU may generate hundreds of gigabytes of inter-GPU traffic per second. The need for ultra-high throughput (e.g., 400G/800G optical connections) between servers has become essential to prevent network bottlenecks. A rough estimate of this per-GPU traffic is sketched after this list.

    • Latency Sensitivity: While latency is important, the focus for east-west traffic in training is primarily on bandwidth. AI training workloads generate massive volumes of data that need to be transmitted with minimal bottlenecks across GPU clusters to minimize the training time.


  2. North-South Traffic in Training:

    • Minimal compared to east-west traffic. During the training phase, external interactions (north-south) are limited to logging results or fetching datasets, which are usually done at the start. Most data movement happens internally.
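As a back-of-the-envelope illustration of why east-west bandwidth dominates, the sketch below estimates per-GPU synchronization traffic for plain data parallelism with a ring all-reduce, where each GPU sends roughly 2*(N-1)/N times the gradient payload per step. All the numbers (model size, gradient precision, GPU count, step rate) are illustrative assumptions, not figures from this article.

```python
def ring_allreduce_gb_per_gpu(payload_gb: float, num_gpus: int) -> float:
    """Data each GPU sends in one ring all-reduce: ~2 * (N - 1) / N * payload."""
    return 2 * (num_gpus - 1) / num_gpus * payload_gb

# Illustrative assumptions: a 70B-parameter model with 16-bit gradients
# (~140 GB of gradients) synchronized across 8 GPUs, once per training step.
gradients_gb = 70e9 * 2 / 1e9                  # ~140 GB
per_step = ring_allreduce_gb_per_gpu(gradients_gb, 8)
print(f"~{per_step:.0f} GB sent per GPU per synchronization step")
# Even at around one step per second this is hundreds of GB/s of east-west
# traffic per GPU, which is why NVLink-class links and 400G/800G fabrics
# are needed to keep the GPUs from idling.
```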


Comparison to Traditional Workloads:

  • In traditional workloads, east-west traffic was often secondary. Services like storage or replication required some server-to-server communication, but nowhere near the scale seen in AI training. For example, database replication or file system mirroring could be bandwidth-heavy but did not demand the same low-latency, high-bandwidth connectivity seen in AI GPU clusters.

  • North-south traffic dominated in traditional data centers because most applications focused on serving requests from external users.

  • Training clusters have a huge amount of network cabling behind the racks compared to traditional designs.



Back of an NVIDIA GB200 rack showing network cables

AI Inference: North-South Focus with Controlled East-West Needs

Inference, the process where a trained model makes predictions, has a different set of demands. AI inference can vary significantly based on whether it’s for simple tasks (e.g., image classification) or complex, generative AI tasks (e.g., text generation).

  1. North-South Traffic in Inference:

    • For AI inference tasks, north-south traffic is still significant because most inference tasks involve sending data to the AI model (for example, a user’s voice command or a query to a chatbot), then receiving a result back.

    • Generative AI tasks like LLM inference (e.g., ChatGPT or GPT-based models) are typically less latency-sensitive for north-south traffic. The time it takes to generate a response (which can take seconds or even minutes) is far greater than the network latency. Thus, low-latency north-south traffic is less critical. However, bandwidth becomes a key factor, especially when handling multiple requests from users at scale.

    • For simpler tasks (e.g., image recognition, object detection), latency might be more critical, but even here, completion times far outweigh the impact of minor network latencies.

  2. East-West Traffic in Inference:

    1. GPU-to-GPU Communication: For large, generative models, inference tasks may still involve some east-west traffic between GPUs, especially in cases where a model is split across multiple nodes due to memory constraints. However, this is far less than in training.

    2. Latency Sensitivity: Unlike training, where east-west traffic must be optimized for high throughput, inference east-west traffic needs minimal latency but significantly less bandwidth, as it usually involves the exchange of smaller data packets.

GB200 NVIDIA Superpod network inner topology

Comparison to Traditional Workloads:

  • Traditional inference-like workloads in data centers, such as web requests or API calls, were dominated by north-south traffic. While AI inference also relies on north-south traffic, the latency requirements are completely different, as GPU inference times are normally an order of magnitude greater than traditional response times.

  • The additional complexity for AI inference comes from occasional east-west traffic when models span multiple GPUs, but this is minimal compared to the extensive east-west traffic of AI training.


Traffic Comparisons and Evolving Data Center Networks

To summarize the traffic patterns and network demands for AI training and inference:

North-South Traffic
  • Traditional workloads: Primary traffic focus; client-server interactions.
  • AI training: Minimal, limited to fetching datasets and logging.
  • AI inference: Critical for data input/output, but generative AI can tolerate higher latencies due to longer processing times.

East-West Traffic
  • Traditional workloads: Secondary; low to moderate volume, mainly for server replication, storage, etc.
  • AI training: Dominant traffic; high bandwidth required between GPUs.
  • AI inference: Minimal, but can occur in complex models that span multiple GPUs.

Latency Sensitivity
  • Traditional workloads: Important for user-facing services.
  • AI training: Secondary to bandwidth; network bandwidth is more important to minimize training time.
  • AI inference: For complex tasks, latency is less critical compared to response generation time.

Network Architecture Needs
  • Traditional workloads: Standard spine-leaf or tree-like topologies with limited internal bandwidth.
  • AI training: Fat-tree or advanced non-blocking architectures to accommodate high east-west traffic.
  • AI inference: More edge-focused networks, but capable of handling both high north-south and occasional east-west traffic.

Data centers serving AI workloads must handle traffic patterns that differ drastically from traditional setups. AI training requires high-bandwidth, low-latency east-west communication to enable efficient distributed training across multiple GPUs, while AI inference focuses more on north-south traffic, particularly for user queries and data handling, though it can tolerate some latency in generative tasks.

To accommodate these new demands, data centers have been shifting towards fat-tree architectures and even to multi-rail-optimized InfiniBand leaf-spine architectures, employing 400G/800G optical interconnects and adopting RDMA-enabled technologies like InfiniBand and RoCEv2 to handle east-west congestion. Additionally, as AI continues to evolve, so too must data center networks, emphasizing flexibility, high throughput, and minimal latency where necessary.


Redundancy in pathways is essential for maintaining a high utilization factor of the GPU servers during training. For this reason, large clusters tend to adopt an 8-rail InfiniBand spine-leaf architecture: each GPU server has access to 8 different leaf switches, allowing for greater redundancy in case of failure and better de-bottlenecking in case of network congestion. This is what Microsoft is adopting in its next-generation data centers for OpenAI's compute demands.


Also worth noting is the emergence of optical circuit switches (OCS) to avoid congestion in east-west traffic, keeping data in the optical domain rather than converting it to electrical signals at every hop. Google is a prominent adopter of this technology, and broader adoption is likely.



RACK POWER DENSITIES

Inference vs Training

In AI-driven data centers, power and cooling requirements are highly dependent on the type of workload—whether AI training or inference. As we've discussed, these workloads differ not only in their data and network demands but also in their significant impact on power density. Traditional computing racks operate at power densities between 5kW and 15kW, but AI workloads, driven by the use of GPUs, push these requirements much higher—often reaching 40kW to 140kW per rack. This increase is primarily due to the immense power and cooling needed to keep GPUs, the core of AI computations, operating at peak performance.


The principle behind these elevated power densities is linked to GPU interconnectivity. The more GPUs required to work in parallel, the more tightly they must be integrated, both computationally and network-wise. In AI training, where thousands of GPUs collaborate to process massive datasets and execute synchronized computations, low-latency, high-throughput networks are crucial. This calls for dense configurations of GPUs packed into racks, interconnected by advanced technologies like NVIDIA’s NVLink and InfiniBand, which provide the high-bandwidth, low-latency communication needed to optimize GPU performance.


This tight collaboration among GPUs results in highly dense hardware setups. For AI training, where GPUs must function as a unified system, the integration demands significantly boost power consumption and cooling requirements. High-performance infrastructure like NVIDIA's DGX SuperPOD, with configurations such as the GB200 NVL72, can draw as much as 130kW to 140kW per rack.


Conversely, AI inference workloads are less power-intensive. Inference tasks typically involve smaller, more independent computations, meaning GPUs don’t need the same level of tightly synchronized parallelism as in training. This allows for more distributed architectures, where each GPU handles specific tasks with lower interdependence. As a result, inference systems require less power per rack, with densities typically ranging from 40kW to 50kW. These setups don’t demand the same high-performance networking or power-hungry configurations as training environments.
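A quick way to see where these density bands come from is to multiply per-server power by servers per rack. The sketch below does exactly that; the 14.3 kW DGX B200 figure is quoted later in this article, while the servers-per-rack counts and the in-rack networking overhead are illustrative assumptions.

```python
def rack_density_kw(servers_per_rack: int, server_kw: float, overhead_kw: float = 0.0) -> float:
    """Rack IT load: servers plus in-rack switches and other overhead."""
    return servers_per_rack * server_kw + overhead_kw

# Inference-style rack: e.g. 3 DGX B200 systems at 14.3 kW each plus ~2 kW of
# networking (assumed) lands in the 40-50 kW band.
print(f"{rack_density_kw(3, 14.3, 2.0):.1f} kW/rack")   # ~44.9 kW

# Training-style rack: a GB200 NVL72 is a single tightly integrated,
# liquid-cooled rack quoted in this article at roughly 130-140 kW.
print(f"{rack_density_kw(1, 135.0):.1f} kW/rack")        # ~135 kW
```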


COOLING


Cooling Solutions for AI Data Centers: Navigating Power Densities and Network Complexities


As AI workloads push the boundaries of what data centers can handle, cooling has emerged as one of the most critical challenges. Cooling solutions for AI data centers are inevitably influenced by two key factors: network architecture and rack power densities. High-density racks, filled with power-hungry GPUs, generate far more heat than traditional computing racks, and as power density rises, air cooling becomes increasingly inefficient. To address these challenges, the industry has turned to advanced cooling technologies such as RDX (Rear Door Cooling), DLC (Direct Liquid Cooling), and Immersion Cooling.


The Relationship Between Rack Power Density and Cooling


Generally speaking, as rack power density increases, the feasibility of using air as a cooling medium diminishes. When racks consume 40-50kW or more, airflow alone can no longer remove enough heat to keep hardware operating within safe thermal limits. This is where advanced cooling methods come into play (a simple mapping from rack density to cooling approach is sketched after this list):

  • RDX: Rear door cooling uses heat exchangers mounted on the back of racks to cool air before it recirculates. This solution is often deployed in data centers with moderate power densities (40-50kW per rack), commonly seen in AI inference workloads.

  • DLC: Direct liquid cooling involves circulating coolant directly over heat-generating components, such as GPUs. DLC is widely regarded as the go-to solution for AI training workloads, where rack densities can reach up to 140kW.

  • Immersion Cooling: This more radical technology submerges hardware in non-conductive fluids, offering potentially superior cooling, although it faces compatibility and market maturity issues.
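The density bands above can be summarized as a simple selector. The thresholds below come straight from this section and are indicative rather than absolute; the function itself is just a hypothetical helper for illustration.

```python
def suggest_cooling(rack_kw: float) -> str:
    """Map rack power density to the cooling approaches discussed in this section.
    Thresholds are indicative bands, not hard limits."""
    if rack_kw <= 15:
        return "conventional air cooling (traditional racks)"
    if rack_kw <= 50:
        return "RDX rear-door heat exchangers (typical inference racks)"
    if rack_kw <= 140:
        return "DLC, often combined with RDX for residual heat (training racks)"
    return "immersion or other emerging liquid-cooling approaches"

for kw in (10, 45, 130, 200):
    print(f"{kw:>3} kW/rack -> {suggest_cooling(kw)}")
```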


Network Complexity and Cooling Challenges


In addition to power density, network architecture plays a pivotal role in the effectiveness of cooling solutions. AI training, which relies heavily on east-west traffic, involves significant inter-server and intra-rack communication. This requires more network cables to support the high-bandwidth, low-latency connections that GPUs demand during distributed training. As the number of cables increases, it becomes more difficult for air to flow freely through the racks, which further complicates air-based cooling systems.

  1. Inference Cooling (RDX):

    • Rack Size: For inference workloads, where network traffic is less demanding and fewer cables are needed, 600mm-wide racks are typically sufficient. Power densities here range from 40kW to 50kW per rack, making RDX an effective solution. The rear door heat exchangers allow for air cooling without interference from excess cabling, making it a cost-effective choice for inference setups.

  2. Training Cooling (DLC):

    • Rack Size and Cable Management: AI training racks require more network cables to support the heavy east-west data traffic. To manage airflow more effectively, 800mm-wide racks are often used, allowing for better cable management and airflow around the components. However, even with wider racks, air cooling is insufficient at these high power densities, which is why DLC is the preferred cooling method. By circulating liquid directly over heat sources, DLC eliminates much of the heat generated by dense GPU clusters, without being obstructed by cables.

    • Hybrid Solutions: Although DLC removes most of the heat, it doesn’t eliminate it entirely. In many cases, DLC is combined with RDX to further improve cooling efficiency. This hybrid approach ensures that any residual heat not captured by liquid cooling is effectively handled by rear-door heat exchangers.


Immersion Cooling: The Future, But Not Yet Mainstream


Immersion cooling is a promising technology with great potential to handle the extreme heat generated by high-density AI workloads. It involves submerging hardware in a dielectric fluid, which absorbs heat more efficiently than air or even liquid flowing through pipes. However, immersion cooling is still an emerging technology and faces several hurdles:

  1. Fluid Compatibility: One of the main barriers to widespread adoption is the lack of NVIDIA certification for immersion fluids. AI training systems rely on expensive and highly sensitive hardware like NVIDIA GPUs, and the company has yet to approve any fluids for use in their equipment. Network components are particularly susceptible to damage from some immersion fluids, which adds to the hesitation.

  2. Hybrid Cooling for Compatibility: To address fluid compatibility concerns, a hybrid cooling approach is being developed. This involves using immersion cooling in parts of the system that can safely tolerate the fluid while employing DLC in more sensitive areas. This method is still in its infancy but represents a potential future path for overcoming immersion cooling’s current limitations.

  3. Market Availability and Performance: Despite its potential, immersion cooling solutions are still relatively scarce, with only a few vendors like Gigabyte offering off-the-shelf immersion cooling systems. These systems, while promising, do not yet match the cooling densities achievable by DLC. For example, Gigabyte’s immersion cooling tank, while offering a cooling capacity of 80kW, can only house 21U of equipment in a footprint of 1.16 x 0.9 meters, which is less efficient in terms of density compared to traditional DLC racks.

    As immersion cooling tanks become taller and more efficient, they are expected to eventually provide higher cooling densities, but for now, DLC remains the most effective solution for high-density AI workloads.


Current and Future Cooling Solutions for AI Workloads


Cooling solutions for AI data centers have evolved significantly, driven by the increasing power densities of AI hardware and the complexity of network interconnects, especially in training environments. RDX remains a popular choice for inference workloads with moderate power densities and fewer network demands, while DLC is the preferred method for AI training, where dense GPU configurations and extensive cabling necessitate more robust cooling.

Immersion cooling, while promising, is still in its early stages and faces challenges related to compatibility and market maturity. However, as technology advances and suppliers develop more efficient and compatible immersion cooling solutions, it may become the leading cooling method for AI data centers in the future. For now, though, the combination of DLC and RDX provides the most effective cooling for AI workloads, ensuring that data centers can handle the immense computational demands of modern AI without overheating.

In summary, RDX is ideal for inference, DLC leads in training environments, and immersion cooling—while not fully realized—holds the promise of the future.





RELIABILITY


Debunking the Myth of AI Reliability Requirements


A frequently misunderstood aspect of AI infrastructure is the reliability requirements. Many in the industry assume that AI workloads, particularly for training, must adhere to the same Tier 3 availability standards as traditional cloud-based services. This misconception stems from how AI inference is similar to cloud services, where reliability and availability are paramount for serving end-users via APIs in commercial applications. However, AI training is an entirely different scenario, one that doesn't fit neatly into the traditional data center reliability models.


The Tier 3 Misconception


In traditional cloud environments, Tier 3 standards aim to ensure continuous service availability, which makes sense for AI inference workloads. AI inference, like other cloud services, is often deployed as a remote service with real-time user interactions, meaning there’s a level of responsibility in ensuring continuous, uninterrupted availability.

However, AI training functions very differently. Training large-scale AI models happens in HPC clusters and supercomputers, not in standard cloud environments. In AI training, checkpointing is used as a backup method, typically every 30 minutes. This means that, in the event of a power loss or system failure, at most, 30 minutes of computational work is lost. This fundamentally changes the economics and requirements for system reliability during training because the risk and cost of downtime are mitigated by this frequent checkpointing process.
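A rough sketch of why checkpointing caps the cost of a failure: on average a job rolls back half a checkpoint interval, plus whatever time it takes to reload and resume. The cluster size and restart overhead below are illustrative assumptions, not figures from this article.

```python
def lost_gpu_hours(num_gpus: int, checkpoint_interval_min: float,
                   restart_overhead_min: float = 0.0) -> float:
    """Expected GPU-hours lost per unplanned interruption: on average the job
    loses half a checkpoint interval of work, plus the time to restart."""
    lost_minutes = checkpoint_interval_min / 2 + restart_overhead_min
    return num_gpus * lost_minutes / 60

# Illustrative: a 4,096-GPU training job checkpointing every 30 minutes and
# taking ~10 minutes to resume loses ~1,700 GPU-hours per failure -- a bounded,
# predictable cost, unlike an outage in a live user-facing inference service.
print(f"{lost_gpu_hours(4096, 30, 10):.0f} GPU-hours lost per failure")
```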


The Economics of AI Training Redundancy


  • DGX price: $515,000
  • DGX power consumption: 14.3 kW
  • DGX normalized CAPEX: ~$36,013,986 per MW
  • DGX amortization time: 5 years
  • DGX amortization cost per year: ~$7,202,797 per MW per year
  • DGX amortization cost per hour: ~$822 per MW per hour


When discussing redundancy for AI training, the focus isn’t on uptime for end-users, but on the capital expenditure (CAPEX) amortization of expensive NVIDIA GPUs. To better understand the economics of redundancy for training, let's break down some key numbers (the same arithmetic is written out as a short calculation after this list):

  • The DGX B200 system, a popular NVIDIA solution, retails for $515,000 and consumes 14.3 kW of power.

  • If we normalize these numbers over 1 MW of power capacity, the total CAPEX of 1 MW of DGX systems amounts to approximately $36 million.

  • Given the pace of technological advancement, hardware like GPUs typically becomes obsolete within five years, meaning the annual amortization cost for 1 MW of DGX capacity is $7.2 million per year.

  • This translates to a cost of $822 per MW per hour of GPU operation. Therefore, any improvements in reliability and uptime must be weighed against this cost.
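The same arithmetic as the bullet points above, written out as a short script; the price and power draw are the figures quoted here, and an 8,760-hour year with straight-line five-year amortization is assumed.

```python
dgx_price_usd = 515_000      # DGX B200 list price quoted above
dgx_power_kw = 14.3          # per-system power draw
amortization_years = 5       # assumed useful life before obsolescence
hours_per_year = 8_760

capex_per_mw = dgx_price_usd / dgx_power_kw * 1_000       # ~$36.0M per MW of IT load
capex_per_mw_year = capex_per_mw / amortization_years     # ~$7.2M per MW per year
capex_per_mw_hour = capex_per_mw_year / hours_per_year    # ~$822 per MW per hour

print(f"${capex_per_mw:,.0f}/MW  ${capex_per_mw_year:,.0f}/MW/yr  ${capex_per_mw_hour:,.0f}/MW/h")
```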

The table below, which breaks down the availability requirements for different Tiers, shows the expected DGX CAPEX loss during downtime for each tier level and the cost differential of upgrading between tiers.


  • 99.67% availability (Tier I): 28.8 downtime hours/year; ~$23,697/MW/year of lost DGX amortization.
  • 99.74% availability (Tier II): 22.7 downtime hours/year; ~$18,655/MW/year of lost DGX amortization; differential breakeven cost for the tier upgrade: ~$5,042/MW/year.
  • 99.98% availability (Tier III): 1.6 downtime hours/year; ~$1,297/MW/year of lost DGX amortization; differential breakeven cost for the tier upgrade: ~$17,359/MW/year.
As you can see, the costs associated with leveling up between Tiers offer diminishing returns. The conclusion? There is no financial advantage in moving up to higher Tiers for AI training environments, which is why most HPC clusters run their expensive GPU clusters without Tier classification at all.
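For completeness, the tier comparison above can be reproduced with a few lines: downtime hours follow from the availability percentage, the lost amortization from the ~$822/MW/hour figure, and the differential is the most one could rationally spend per MW per year to move up a tier. The availability values are the standard Uptime Institute tier figures; small rounding differences with the table are expected.

```python
hourly_amortization_loss = 822.0      # $/MW/hour of idle GPU CAPEX, from above
hours_per_year = 8_760

tiers = {                             # Uptime Institute availability targets
    "Tier I   (99.671%)": 0.99671,
    "Tier II  (99.741%)": 0.99741,
    "Tier III (99.982%)": 0.99982,
}

previous_loss = None
for name, availability in tiers.items():
    downtime_h = (1 - availability) * hours_per_year
    loss = downtime_h * hourly_amortization_loss          # $/MW/year lost to downtime
    note = "" if previous_loss is None else f", upgrade breakeven ~${previous_loss - loss:,.0f}/MW/yr"
    print(f"{name}: {downtime_h:4.1f} h/yr downtime, ~${loss:,.0f}/MW/yr lost{note}")
    previous_loss = loss
```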

This concept is often hard for Tier 3 professionals to digest, but the math—and the best practices from industry leaders—support this approach.


Availability Considerations in AI Training


While availability during short downtimes (like those managed by checkpointing) isn't a significant concern in AI training, there are other aspects of availability that matter, especially for longer-term shutdowns. For example:

  • Dual redundant fiber is often recommended to prevent network failures, especially if it serves several megawatts (MW) of capacity. The cost of a redundant fiber connection is typically less than the cost of recovering from a network outage in these large systems.


Additionally, one must consider that each GPU in a training cluster contributes to the overall performance. If a single GPU or a cluster fails, it could stop the entire training process, which can be catastrophic, particularly for large jobs involving thousands of GPUs. This is why HPC facilities for AI training incorporate significant redundancy at the GPU level, especially in power supply, using dual busbars and dual power supply units (PSUs) to ensure that a failure doesn’t halt the entire training operation.


Cooling Redundancy: RDX vs. DLC


Cooling redundancy is another area that must be carefully considered in AI training environments. Two of the most common cooling methods are RDX (Rear Door Cooling) and DLC (Direct Liquid Cooling), each with its own advantages and challenges.

  1. RDX: In RDX, air is still the primary cooling medium, with the water in the rear door heat exchanger cooling the air as it exits the rack. The advantage of this system is that redundancy isn’t dependent on the water supply. If one RDX unit fails, the nearby racks can compensate by cooling the air that flows through the failed rack. This level of redundancy makes RDX a highly reliable solution, especially for inference workloads.

  2. DLC: DLC offers a much more direct and efficient cooling method, particularly for high-density training workloads. However, it also introduces more potential points of failure. If the liquid supply is compromised (e.g., a leak or fitting failure), the entire system may be at risk of overheating. Unlike RDX, where neighboring racks can help mitigate cooling loss, DLC’s dependency on liquid makes a failure potentially more critical if redundancy isn’t properly planned.


LOCATION


Location Considerations for AI Data Centers: Beyond Proximity to Metropolises


As AI data centers become critical infrastructure for both inference and training workloads, many people often ask whether these centers need to be located near large metropolitan areas or the well-known FLAP markets (Frankfurt, London, Amsterdam, Paris). The simple answer is no—AI data centers don't need to be near big cities or high-traffic internet hubs. The location criteria for AI data centers differ significantly from traditional cloud or enterprise data centers due to the specific nature of AI workloads, which have different latency, bandwidth, and regulatory requirements.


Do AI Data Centers Need to Be Close to Metropolises and FLAP?

No, AI data centers do not need to be close to major metropolitan areas or established internet exchange points. The primary reason for this is that latency for north-south traffic (the communication between external users and the data center) is not a critical factor for AI, especially for inference and training workloads.

  • Inference: While inference workloads might seem to require low-latency networks, the reality is that the time it takes for a large language model (LLM) to produce an output is orders of magnitude longer than the latency introduced by geographic distance. For example, a generative AI model like GPT takes from 0.34 to 32 seconds to produce its first response, far longer than the milliseconds of delay caused by data transmission over long distances. Latency from our data center in Norway to Milan (Italy) is approximately 0.03 seconds. Thus, the location doesn't have to be near end-users as long as bandwidth requirements are met.

  • Training: AI training workloads, such as training large models across distributed GPUs, are entirely bandwidth-dependent. External (north-south) latency is not a factor for training, as the GPUs need to share massive amounts of data over high-bandwidth connections within the cluster rather than exchange data with end users in real time. This means an AI data center focused on training could be located almost anywhere, provided it has access to the necessary network infrastructure for handling high-throughput east-west traffic between GPUs.


Can an AI Data Center Be Located Anywhere?


While AI data centers don’t have strict geographic latency constraints, there are still limitations regarding location, largely driven by regulatory requirements and infrastructure availability.

  1. Regulations: According to the European Data Act, AI workloads need to be run within the GDPR region (primarily Europe). This is because GDPR-compliant regions ensure that AI systems adhere to stringent privacy and data protection laws. As a result, much of the demand for AI data centers in Europe is driven by these regulatory factors.

  2. Cross-Border Workloads: While AI workloads can be run within the broader GDPR region, there are exceptions for sensitive data (such as government or security-related data). In such cases, data sovereignty may dictate that the workloads must be kept within national borders to safeguard critical information.


Criteria for Identifying Suitable Locations for AI Data Centers


Given that AI data centers can be more geographically flexible, the question becomes: what criteria should be used to select an optimal location for an AI data center? While many considerations overlap with traditional data centers, the specific needs of AI infrastructure require additional attention.

  1. Power Availability:

    • AI workloads, particularly training, are extremely power-hungry, with some racks consuming as much as 130-140kW. Therefore, the availability of reliable power is a critical consideration. In regions with energy constraints or unreliable grids, AI data centers may face challenges in maintaining optimal performance.

    • Renewable energy sources: As sustainability becomes a growing concern, locating data centers near renewable energy sources—such as wind, solar, or hydropower—can help reduce the environmental impact of AI infrastructure.

  2. Fiber Connectivity:

    • While latency is not a priority for AI workloads, bandwidth is. AI data centers, especially those used for training, require high-speed, high-capacity fiber connections to transfer large datasets and facilitate fast communication between GPUs in distributed training environments.

    • Having a redundant fiber connection is also critical to ensure that downtime due to fiber outages does not interrupt training processes, which could have costly implications.

  3. Cooling Solutions:

    • AI data centers tend to have high power densities, especially for training tasks, which necessitates advanced cooling solutions. Smart cooling options, such as proximity to cold environments (Nordic countries), lake or seawater cooling, or efficient heat reuse systems, can help reduce operational costs and increase sustainability.

    • Given the amount of heat generated by AI workloads, being in regions where ambient temperatures are naturally lower can be a huge advantage, as it reduces the strain on cooling infrastructure.

  4. Electricity Prices:

    • The cost of running an AI data center is closely tied to electricity prices. Locating a data center in regions where electricity is more affordable is crucial to managing long-term operational costs. AI workloads, especially model training, run for days or even weeks, so power costs are a major OPEX consideration.

  5. Environmental Considerations:

    • Beyond the technical and financial aspects, it's important to consider the environmental impact of an AI data center. With the rising global emphasis on sustainability, many organizations now prefer to build data centers that can use a high percentage of renewable energy. This is also becoming an important factor in gaining public trust and adhering to corporate sustainability goals.

  6. Regulatory Environment:

    • As mentioned earlier, regulations like the European Data Act play a significant role in determining where AI workloads can be processed. Ensuring compliance with local laws, especially when it comes to data privacy, sovereignty, and cross-border data flows, is crucial for AI data centers operating in regions like Europe.


The Future of AI Data Center Locations


As the AI industry continues to grow, it's likely that non-traditional data center locations will see increased demand. AI data centers are less bound by the same proximity requirements as traditional cloud services, opening up opportunities to build these facilities in regions that offer cheaper power, better cooling options, or more favorable regulations.

For example, countries in the Nordic region, with their abundant renewable energy and cold climates, are becoming increasingly attractive for AI data centers. At the same time, Eastern Europe and non-FLAP regions in Western Europe are also seeing growth, driven by lower electricity costs and GDPR-compliant infrastructures.

In conclusion, while AI data centers don't need to be located near metropolises like traditional cloud centers, there are several key factors to consider when selecting a site, including power availability, fiber connectivity, cooling solutions, and regulatory compliance. As the demand for AI grows, so too will the diversity of locations where these specialized data centers are built.

