AI Workloads Are Changing Network Design: East-West Traffic, 400G/800G, and the New Bottlenecks

Overview

AI workloads, both training and inference, have a traffic profile that differs meaningfully from the traditional enterprise mix of web traffic, SaaS, email, and database replication. The differences are not subtle, and they are starting to matter at the network design layer for organizations that are not hyperscalers. Inference workloads, especially agentic AI workloads that chain multiple model calls together with tool use in between, generate more east-west traffic, more long-lived flows, more latency sensitivity, and more pressure on upstream links and the data center fabric than the traditional enterprise traffic mix.

The trend matters for working network admins because the design implications are not limited to the data centers that host the AI workloads. Inference traffic that originates from an application in one data center and terminates at a GPU cluster in another data center crosses the WAN; inference traffic that originates from a user device and terminates at an edge inference node crosses the campus network; training traffic that crosses a multi-tenant data center crosses the spine-leaf fabric. The traffic patterns that used to be the exclusive concern of the hyperscalers are starting to land in enterprise networks, and the design questions that follow are starting to land in the day-to-day work of network architects and operators.

The most important framing is that this is a trend, not a discontinuity. The traditional enterprise traffic mix is not going away; web, SaaS, email, and database traffic will still dominate the bytes in most enterprise networks for the foreseeable future. What is changing is that the new traffic is shaped differently, and the network design has to accommodate both the old traffic mix and the new traffic patterns. The right response is to identify where the new traffic patterns are landing in the network and to make targeted design changes in those places, not to redesign the entire network around the new workload.

How it works

The traditional enterprise traffic pattern is dominated by north-south traffic: client to server, user to application, branch to data center, user to cloud. The flows are typically short-lived (a web request, a database query, a SaaS API call), the bandwidth per flow is modest, and the latency sensitivity is bounded by the user's tolerance for page load or query response time. The traffic is bursty and well-served by traditional TCP congestion control, traditional QoS, and traditional oversubscription ratios in the access and aggregation layers.

The AI workload traffic pattern is meaningfully different. Inference traffic between an application server and a GPU inference node is east-west within the data center, often between adjacent racks or adjacent pods. The flows are often long-lived (a multi-step agent invocation that holds an inference session open for seconds to minutes), and the bandwidth per flow can be high (especially for multimodal inference that carries images, audio, or video between the application and the model). The latency sensitivity is set by the response-time budget of the application: streaming chat commonly targets tens of milliseconds between tokens, and many agentic workflows target sub-second steps, so the network's share of that budget is small. The traffic is less bursty than web traffic, which means sustained flows get little benefit from statistical multiplexing and are poorly served by traditional oversubscription ratios in the access and aggregation layers.

Training traffic is even more extreme. A training job that runs across multiple GPUs in a cluster generates collective communication among the GPUs: in data-parallel training, every GPU synchronizes gradients with all the others at every training step. The communication pattern is bandwidth-intensive and latency-sensitive (the training step cannot complete until the slowest gradient exchange completes). It is also sensitive to packet loss: the job still completes, but loss that triggers retransmission at scale stalls the step and leaves GPUs idle. This traffic pattern is what has driven the design of specialized AI fabrics with high-bandwidth, low-latency interconnects (NVLink inside a node or rack, InfiniBand or RoCE between nodes), and the trend in 2026 is to extend the same design considerations to the Ethernet fabric that connects the GPU clusters to the rest of the data center and to the wider network.
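A back-of-the-envelope sketch makes the bandwidth pressure concrete. It assumes a ring all-reduce, one common way to synchronize gradients, in which each GPU sends roughly 2(N-1)/N times the gradient size per step. The function name, the 10 GB gradient size, and the 64-GPU job are illustrative assumptions, and the model counts only the bandwidth term: it ignores per-hop latency, protocol overhead, and any overlap with compute.

```python
def ring_allreduce_seconds(gradient_bytes: float, num_gpus: int, link_gbps: float) -> float:
    """Bandwidth-only lower bound on the time for one ring all-reduce."""
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to exchange
    # In a ring all-reduce each GPU sends 2 * (N - 1) / N times the gradient size.
    bytes_per_gpu = 2 * (num_gpus - 1) / num_gpus * gradient_bytes
    link_bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_gpu / link_bytes_per_second


# Hypothetical job: 10 GB of gradients synchronized across 64 GPUs.
for gbps in (100, 400, 800):
    seconds = ring_allreduce_seconds(10e9, 64, gbps)
    print(f"{gbps}G per-GPU link: {seconds:.2f} s per gradient exchange")
```

Under these assumptions the exchange takes about 1.6 seconds at 100G and about 0.2 seconds at 800G, which is why link speed shows up directly in training step time.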

The move to 400G and 800G in the data center fabric is the most visible response to the AI workload traffic pattern. 400G and 800G switching is not new in the sense that the standards have been defined for several years; what is new is the deployment trajectory in 2026. The hyperscalers have been deploying 400G and 800G for some time; the enterprise data center is now following, driven by the AI workload deployment in enterprise data centers rather than by the bandwidth requirements of the traditional enterprise workload. The optics (400G FR4, 400G DR4, 800G PSM8, 800G 2xFR4) and the switch ASICs that support the higher speeds have become more available and more cost-effective, which has accelerated the deployment.

In practice

For most enterprise network admins, the AI workload trend is going to land as a series of specific design questions rather than as a wholesale redesign. The first question is where the inference workloads are going to run. If the inference workloads are running in a hyperscaler-provided service (Azure OpenAI, Amazon Bedrock, Google Vertex AI, Anthropic models on Bedrock or Vertex AI), the network impact is at the WAN edge: more bandwidth to the hyperscaler, more careful latency engineering for the workloads that are latency-sensitive, and more attention to the routing and traffic-engineering policies that govern the path from the user to the inference endpoint.

The second question is whether the inference workloads are running in an on-premises or colocation data center. If so, the network impact is at the data center fabric: more bandwidth between the application servers and the GPU cluster, lower oversubscription ratios in the spine-leaf fabric that carries the east-west traffic, and consideration of whether the existing fabric is 25G/100G (likely undersized for AI workloads) or 100G/400G (likely adequate for moderate-scale inference) or 400G/800G (likely adequate for heavy inference or training workloads). The right answer depends on the specific AI workload mix and the scale at which the organization is deploying inference.
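The oversubscription check for a single leaf switch is simple arithmetic: total server-facing capacity divided by total spine-facing capacity. The sketch below assumes hypothetical port counts and speeds; the function name and the example leaves are illustrative and do not describe any particular switch model.

```python
def oversubscription_ratio(downlinks: int, downlink_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Server-facing capacity divided by spine-facing capacity for one leaf."""
    if uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("a leaf needs at least one uplink with positive speed")
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)


# Hypothetical general-purpose leaf: 48 x 25G server ports, 6 x 100G uplinks.
general_leaf = oversubscription_ratio(48, 25, 6, 100)
# Hypothetical GPU-facing leaf: 32 x 400G server ports, 32 x 400G uplinks.
gpu_leaf = oversubscription_ratio(32, 400, 32, 400)

print(f"general-purpose leaf: {general_leaf:.1f}:1")
print(f"GPU-facing leaf:      {gpu_leaf:.1f}:1")
```

A 2:1 leaf can be comfortable for bursty north-south traffic, while sustained east-west inference or training traffic generally pushes the design toward 1:1 on the leaves that face the GPU cluster.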

The third question is whether the AI workload deployment is going to drive an upgrade of the campus network or the WAN. For most enterprise deployments in 2026, the answer is no: the campus network and the WAN are sized for the traditional workload mix, and the AI workload traffic is a small enough fraction of the total to be carried by the existing capacity. The exceptions are organizations that are deploying edge inference (inference at the campus or branch level for latency-sensitive applications) or organizations that are deploying large-scale training workloads on-premises. For those organizations, the campus and the WAN become part of the AI workload design, and the same considerations (bandwidth, latency, jitter) apply at the campus and the WAN as at the data center fabric.

Common mistakes

The first mistake is assuming that AI workloads will be carried by the existing network without design changes. The traditional enterprise network is sized for the traditional workload mix, and the AI workload traffic pattern is meaningfully different. A network that carries traditional web and SaaS traffic comfortably may be undersized for AI workload traffic even at the same total bytes per second, because the AI workload traffic has different characteristics (east-west, long-lived, latency-sensitive) that the existing design does not optimize for.

The second is over-designing the network for the AI workload. The right response to a changing workload mix is to identify the specific places where the AI workload traffic lands and to make targeted design changes there, not to redesign the entire network around the new workload. A campus network that has been sized for traditional enterprise traffic does not need to be rebuilt around 400G switching just because the data center is being upgraded for AI workloads.

The third is conflating training traffic with inference traffic. The two have different network requirements. Training traffic is bandwidth-intensive, latency-sensitive, and runs in tightly-coupled GPU clusters that usually sit on a dedicated back-end fabric (NVLink inside a node or rack, InfiniBand or RoCE over lossless Ethernet between nodes) rather than on the general-purpose Ethernet fabric. Inference traffic is more modest in bandwidth per flow but more distributed, and it runs on the general-purpose Ethernet fabric at the data center, campus, and WAN level. Designing the network for training traffic when the workload is inference traffic is over-investment; designing for inference traffic when the workload is training traffic is under-investment.

The fourth is ignoring the latency and jitter requirements of agentic AI workloads. A traditional web request can tolerate hundreds of milliseconds of latency and tens of milliseconds of jitter without the user noticing. An agentic AI workflow chains multiple inference calls together with tool use in between, and the calls run sequentially, so the network delay on each call adds up across the chain: a per-call delay that is invisible on a single web request becomes a sluggish workflow once it is paid ten or twenty times. The latency and jitter requirements of the new workload are therefore tighter than those of the old workload, and the network design has to accommodate the tighter requirements in the places where the new workload lands.
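The accumulation is easy to see with a small calculation. This sketch assumes the calls in a workflow run strictly one after another and that each call pays one network round trip; the function name, the call count, and the round-trip and jitter figures are illustrative assumptions, not measurements.

```python
def chained_network_delay_ms(calls: int, rtt_ms: float, jitter_ms: float) -> tuple[float, float]:
    """Return (typical, worst_case) network delay for sequential calls.

    Typical assumes every call sees the base round-trip time; worst case
    assumes every call also sees the full jitter on top of it.
    """
    typical = calls * rtt_ms
    worst_case = calls * (rtt_ms + jitter_ms)
    return typical, worst_case


# Hypothetical path: 40 ms round trip with up to 20 ms of jitter.
web_request = chained_network_delay_ms(1, 40, 20)
agent_workflow = chained_network_delay_ms(12, 40, 20)  # 12 model and tool calls

print(f"single web request: {web_request[0]:.0f} ms typical, {web_request[1]:.0f} ms worst case")
print(f"12-call workflow:   {agent_workflow[0]:.0f} ms typical, {agent_workflow[1]:.0f} ms worst case")
```

On the same path, the single request pays 40 to 60 ms of network delay while the 12-call workflow pays 480 to 720 ms, before any model or tool execution time is counted.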

Defensive guidance

Inventory the AI workloads that are running or planned in your environment, and identify the network segments that carry the traffic for those workloads. The inventory is the input to the targeted design changes that the new workload requires. The right level of detail is per-workload: what model, what input/output sizes, what latency tolerance, what bandwidth per flow, what traffic pattern (east-west, north-south, all-to-all). The inventory should also identify which AI workloads are running in hyperscaler services, which are running in the on-premises data center, and which are running at the edge.
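One way to keep such an inventory is as structured records rather than free text, so it can be queried per network segment. The sketch below is a minimal illustration: the class names, field names, segment labels, and example workloads are all hypothetical and would need to be adapted to the environment.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    HYPERSCALER = "hyperscaler"
    ON_PREM = "on-premises"
    EDGE = "edge"


class Pattern(Enum):
    EAST_WEST = "east-west"
    NORTH_SOUTH = "north-south"
    ALL_TO_ALL = "all-to-all"


@dataclass(frozen=True)
class AIWorkload:
    name: str
    model: str
    placement: Placement
    pattern: Pattern
    request_kb: float          # typical input size per call
    response_kb: float         # typical output size per call
    latency_budget_ms: float   # end-to-end tolerance for one call
    peak_mbps_per_flow: float
    segments: tuple[str, ...]  # network segments the traffic crosses


def workloads_by_segment(inventory: list[AIWorkload]) -> dict[str, list[str]]:
    """Group workload names by the network segments that carry their traffic."""
    result: dict[str, list[str]] = {}
    for workload in inventory:
        for segment in workload.segments:
            result.setdefault(segment, []).append(workload.name)
    return result


# Hypothetical entries for illustration only.
inventory = [
    AIWorkload("support-agent", "hypothetical-llm", Placement.ON_PREM,
               Pattern.EAST_WEST, 8, 4, 300, 50, ("dc-fabric",)),
    AIWorkload("doc-summarizer", "hosted-llm", Placement.HYPERSCALER,
               Pattern.NORTH_SOUTH, 200, 6, 2000, 20, ("wan-edge", "internet-edge")),
    AIWorkload("vision-qc", "hypothetical-vision", Placement.EDGE,
               Pattern.NORTH_SOUTH, 1500, 1, 100, 400, ("campus", "dc-fabric")),
]

for segment, names in workloads_by_segment(inventory).items():
    print(f"{segment}: {', '.join(names)}")
```

Grouping by segment turns the inventory into a list of places to look at, which is the input the targeted design changes need.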

Make targeted design changes in the network segments that carry AI workload traffic. The changes are different per segment: at the data center fabric, the changes are about bandwidth and oversubscription; at the WAN, the changes are about bandwidth and routing policy; at the campus, the changes are about bandwidth and QoS for the edge inference workloads. The targeted changes are smaller and cheaper than a wholesale redesign, and they deliver the operational benefit that the AI workload requires.

Treat AI workload traffic as a first-class citizen in the network monitoring and capacity planning process. The traffic pattern is different from the traditional workload mix, and the monitoring and capacity planning that was tuned for the traditional mix may not catch the AI workload's emerging capacity or performance problems. The right operational discipline is to add the AI workload traffic to the dashboards and the capacity planning models, and to alert on the specific symptoms that the AI workload generates (long-lived flows at high bandwidth, latency-sensitive flows exceeding their latency budget, GPU cluster interconnect utilization approaching capacity).
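The first two symptoms can be checked against flow records. The sketch below assumes a simplified, hypothetical flow record (real NetFlow or IPFIX exports carry different fields, and latency usually comes from a separate probe or application metric), and the thresholds are placeholders to be tuned per workload.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    src: str
    dst: str
    duration_s: float
    bytes_total: int
    p95_latency_ms: float  # assumed to be joined in from a latency probe


def flow_alerts(flow: FlowRecord,
                min_duration_s: float = 30.0,
                min_mbps: float = 500.0,
                latency_budget_ms: float = 50.0) -> list[str]:
    """Return the alert labels that apply to one flow record."""
    alerts = []
    # Average rate over the life of the flow, in megabits per second.
    avg_mbps = flow.bytes_total * 8 / flow.duration_s / 1e6 if flow.duration_s > 0 else 0.0
    if flow.duration_s >= min_duration_s and avg_mbps >= min_mbps:
        alerts.append("long-lived high-bandwidth flow")
    if flow.p95_latency_ms > latency_budget_ms:
        alerts.append("latency budget exceeded")
    return alerts


# Hypothetical records: an inference session and an ordinary web request.
inference = FlowRecord("app-01", "gpu-07", 120.0, 15_000_000_000, 80.0)
web = FlowRecord("user-42", "web-03", 2.0, 1_000_000, 10.0)

print(flow_alerts(inference))
print(flow_alerts(web))
```

With these placeholder thresholds the inference session raises both alerts and the web request raises none, which is the separation the dashboards need to make visible.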

Plan for the workload mix to keep shifting. The AI workload mix is changing rapidly, and the network design that is adequate for the current AI workload mix may not be adequate for the next iteration. The right operational discipline is to keep the AI workload inventory current, to keep the network monitoring and capacity planning current with the workload, and to plan the network upgrades in line with the workload projections rather than waiting for the workload to outgrow the network.
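Planning ahead of the workload can start from a simple projection of how long a link has before it crosses its planning threshold. The sketch below assumes steady compound monthly growth, which real AI traffic may not follow; the function name, the 70 percent threshold, and the example figures are illustrative assumptions.

```python
import math
from typing import Optional


def months_until_threshold(current_util: float, monthly_growth: float,
                           threshold: float = 0.7) -> Optional[int]:
    """Months until utilization reaches the threshold under compound growth.

    Utilization and threshold are fractions of link capacity (0.35 = 35%).
    Returns 0 if the link is already at or over the threshold, and None if
    traffic is not growing.
    """
    if current_util <= 0:
        raise ValueError("current_util must be positive")
    if current_util >= threshold:
        return 0
    if monthly_growth <= 0:
        return None
    return math.ceil(math.log(threshold / current_util) / math.log(1 + monthly_growth))


# Hypothetical link: 35% utilized today, traffic growing 6% per month.
print(months_until_threshold(0.35, 0.06))
```

In this hypothetical the link reaches the threshold in about a year, which can be compared against procurement and deployment lead times to decide when the upgrade has to start.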

For organizations that are not deploying AI workloads on-premises, the network impact is bounded but real. The bandwidth to the hyperscaler AI services goes up, the latency requirements get tighter, and the routing and traffic-engineering policies that govern the path to the AI services get more important. The right operational answer is to size the WAN and the internet edge for the new workload mix, to engineer the routing and traffic-engineering policies for the latency-sensitive traffic, and to monitor the AI workload traffic at the WAN and internet edge as a first-class category. The hyperscaler-provided AI services do not insulate the enterprise network from the AI workload trend; they shift the impact from the data center fabric to the WAN and internet edge.
