2026-04-13
By Esther Wong
For the past decade, the AI infrastructure conversation has been dominated by compute — GPUs, FLOPS, chip architecture. That framing is now obsolete. The binding constraint in generative AI is memory: how much you have, how fast it moves, and whether you can get any at all. I say this not as a market thesis but as a direct observation from conversations with the people who make and sell memory at scale.
The Demand Signal Nobody Expected
When I spoke with Sumit Sadana, Chief Business Officer at Micron, one data point stopped me: customers are placing memory orders five years in advance. The standard horizon has historically been one year. This is not a procurement quirk — it is a structural signal that the largest AI operators have concluded that memory supply will be scarce for a very long time, and they are locking in capacity accordingly.
The supply picture today is stark. Micron can currently meet only half to two-thirds of demand from its largest customers. In response, the company has committed to a capex budget exceeding $25 billion for this year alone — up from an initial estimate of $20 billion just three months prior. Construction-related capex is expected to increase by more than $10 billion in the following year. This is a company accelerating at full throttle while still running behind.
The broader market is confirming the same thesis. Bank of America has characterized 2026 as a “memory supercycle similar to the boom of the 1990s” [1], with global DRAM revenue forecast to surge 51% year-over-year. SK Hynix projects DRAM average selling prices rising 243% year-over-year [2], with operating margins climbing above 70%. Meta’s VP of Engineering has stated publicly: “We’re absolutely worried about HBM supply.” When hyperscalers say that out loud, it reflects a reality already baked into their roadmaps.
Part of what is driving this tightness is structural, not just demand-driven. HBM manufacturing consumes fab capacity at roughly a 3:1 ratio compared to DDR5 [3]. As the industry pivots toward HBM to serve AI workloads, it simultaneously compresses available capacity for conventional DRAM — a tightening loop that will not resolve quickly.
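To see why that ratio bites, here is a minimal back-of-envelope sketch in Python. The 3:1 wafer-consumption figure is from the paragraph above; the wafer counts and per-wafer bit yields are illustrative assumptions of mine, not vendor data.

```python
# Back-of-envelope: how shifting fab output toward HBM squeezes conventional DRAM.
# Assumption (from the text): a bit of HBM consumes ~3x the wafer capacity of a
# bit of DDR5. All other numbers below are illustrative, not vendor data.

TOTAL_WAFERS = 100_000          # hypothetical monthly wafer starts
BITS_PER_WAFER_DDR5 = 1.0       # normalized bit output per wafer on DDR5
BITS_PER_WAFER_HBM = 1.0 / 3    # ~3:1 capacity ratio => one third the bits per wafer

def bit_output(hbm_share: float) -> tuple[float, float]:
    """Return (ddr5_bits, hbm_bits) for a given share of wafers diverted to HBM."""
    hbm_wafers = TOTAL_WAFERS * hbm_share
    ddr5_wafers = TOTAL_WAFERS - hbm_wafers
    return ddr5_wafers * BITS_PER_WAFER_DDR5, hbm_wafers * BITS_PER_WAFER_HBM

for share in (0.0, 0.2, 0.4):
    ddr5_bits, hbm_bits = bit_output(share)
    print(f"HBM share {share:.0%}: DDR5 bits {ddr5_bits:,.0f}, "
          f"HBM bits {hbm_bits:,.0f}, total {ddr5_bits + hbm_bits:,.0f}")
```

In this toy scenario, diverting 40% of wafer starts to HBM removes 40% of conventional DRAM bit supply while adding back only about a third of that in HBM bits, so total bit output falls by roughly a quarter. That is the tightening loop described above.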
Why Memory Became the Performance Lever
The shift in memory’s strategic importance tracks directly to how large language models actually run. Research has established that LLM inference is memory-bound, not compute-bound [6] — even at large batch sizes, DRAM bandwidth saturation is the primary bottleneck. That is a counterintuitive finding for anyone who spent the last several years thinking about AI infrastructure in terms of GPU utilization.
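To make the memory-bound claim concrete, here is a rough roofline-style calculation. The model size, batch size, and accelerator figures are illustrative assumptions of mine, not numbers from the cited paper.

```python
# Roofline-style sketch: why autoregressive decode is memory-bandwidth-bound.
# All figures are illustrative assumptions for a generic 70B-parameter model
# served on an HBM-class accelerator; they are not from the cited paper.

PARAMS = 70e9                 # model parameters
BYTES_PER_PARAM = 2           # fp16/bf16 weights
BATCH = 32                    # concurrent sequences in one decode step
PEAK_FLOPS = 1.0e15           # ~1 PFLOP/s dense fp16 (assumed accelerator)
HBM_BANDWIDTH = 3.0e12        # ~3 TB/s HBM bandwidth (assumed accelerator)

# Per decode step the weights are streamed from HBM once and reused across the
# batch; each parameter contributes roughly 2 FLOPs (multiply + add) per sequence.
flops_per_step = 2 * PARAMS * BATCH
bytes_per_step = PARAMS * BYTES_PER_PARAM            # ignoring KV-cache traffic

arithmetic_intensity = flops_per_step / bytes_per_step   # FLOPs per byte moved
machine_balance = PEAK_FLOPS / HBM_BANDWIDTH             # FLOPs the chip can do per byte

print(f"arithmetic intensity: {arithmetic_intensity:.0f} FLOPs/byte")
print(f"machine balance:      {machine_balance:.0f} FLOPs/byte")
print("memory-bound" if arithmetic_intensity < machine_balance else "compute-bound")
```

Even at a batch of 32, the arithmetic intensity sits an order of magnitude below the machine balance, so the accelerator spends most of each decode step waiting on HBM rather than computing. Adding KV-cache traffic for long contexts pushes it further in the same direction.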
This has a concrete implication: adding memory capacity is one of the highest-leverage interventions available to an inference operator. Increasing memory to support longer context windows can drive a 2–3x improvement in inference performance. In a market where inference cost is a real competitive variable, a 2–3x performance multiplier from a memory configuration change is the kind of number that reshapes economics.
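The capacity lever works through the KV cache: weights are a fixed cost, and whatever HBM remains determines how many long-context sequences can stay resident at once. The model dimensions and capacities below are illustrative assumptions, sketched for a generic 70B-class dense transformer on a single device.

```python
# Sketch: how HBM capacity bounds concurrent long-context sequences via the KV cache.
# Model dimensions are illustrative (roughly a 70B-class dense transformer with GQA).

LAYERS = 80
KV_HEADS = 8                  # grouped-query attention
HEAD_DIM = 128
BYTES_PER_VALUE = 2           # fp16 cache
WEIGHT_BYTES = 140e9          # 70B params at 2 bytes each

def kv_cache_bytes(context_tokens: int) -> float:
    """KV cache for one sequence: K and V, per layer, per KV head, per token."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * context_tokens

def max_batch(hbm_bytes: float, context_tokens: int) -> int:
    """Sequences that fit after weights are resident (single-device simplification)."""
    return int((hbm_bytes - WEIGHT_BYTES) // kv_cache_bytes(context_tokens))

for hbm_gb in (192, 384):
    for ctx in (32_000, 128_000):
        print(f"{hbm_gb} GB HBM, {ctx:>7,} ctx -> max batch {max_batch(hbm_gb * 1e9, ctx)}")
```

In this sketch, doubling HBM from 192 GB to 384 GB lifts the feasible batch at a 32k context from 4 to 23 concurrent sequences, because the fixed weight footprint stops dominating. Throughput does not scale one-for-one with batch size, but multiples of this order are how a memory configuration change produces the 2–3x gains described above.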
AMD’s CTO reinforced this framing directly in my conversation with him: memory has transitioned from a supporting component to a critical performance lever. That is not positioning language. It reflects the technical reality that the compute-to-memory ratio in current AI architectures has created a genuine bottleneck that more GPUs alone cannot solve.

AMD CTO Mark Papermaster speaking at HumanX
The HBM product roadmap reflects this urgency. HBM4 is expected in mass production by mid-2026, featuring 2,000 pins at 8–10 Gbps per pin [7]. HBM4E, expected in 2027, is projected to boost bandwidth another 50%. The roadmap is aggressive because the demand is real and the use case is not going away.
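Those pin and per-pin figures translate directly into per-stack bandwidth; the arithmetic is below, using only the numbers quoted in the roadmap.

```python
# Per-stack bandwidth implied by the HBM4 figures quoted above:
# 2,000 pins at 8-10 Gbps per pin.
PINS = 2_000

for gbps_per_pin in (8, 10):
    gb_per_s = PINS * gbps_per_pin / 8        # Gbit/s across all pins -> GB/s
    print(f"{gbps_per_pin} Gbps/pin -> ~{gb_per_s / 1_000:.1f} TB/s per stack")
```

That works out to roughly 2–2.5 TB/s per stack, and the projected 50% uplift for HBM4E would put a single stack in the 3–3.75 TB/s range.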
Agentic AI Changes the Math Entirely
Everything above describes the current state of inference for standard chatbot-style queries. Agentic AI changes the math in ways that make the memory constraint significantly more acute.
A single agentic query requires 2x more HBM bandwidth and generates 4x more tokens compared to a standard query. These are not incremental differences — they represent a step change in per-query resource consumption. As agentic workloads become a larger share of total AI traffic, the aggregate demand on memory infrastructure compounds accordingly.
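A simple blended-demand calculation shows how fast the aggregate requirement compounds as the traffic mix shifts. The 2x bandwidth and 4x token multipliers are the figures above; the traffic-mix scenarios are assumptions of mine.

```python
# Blended resource demand as agentic queries become a larger share of traffic.
# Multipliers (2x HBM bandwidth, 4x tokens per query) are from the text;
# the traffic-mix scenarios are illustrative assumptions.

BANDWIDTH_MULT = 2.0
TOKEN_MULT = 4.0

def blended(agentic_share: float) -> tuple[float, float]:
    """Aggregate demand per query, relative to an all-standard baseline of 1.0."""
    std = 1.0 - agentic_share
    bandwidth = std * 1.0 + agentic_share * BANDWIDTH_MULT
    tokens = std * 1.0 + agentic_share * TOKEN_MULT
    return bandwidth, tokens

for share in (0.1, 0.3, 0.5):
    bw, tok = blended(share)
    print(f"agentic share {share:.0%}: bandwidth x{bw:.1f}, tokens x{tok:.1f}")
```

At a 50% agentic mix, average bandwidth demand per query is already 1.5x and token volume 2.5x the all-standard baseline, before any growth in the total number of queries.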
Agentic workflows also introduce a secondary dynamic that is underappreciated: a steep increase in CPU demand alongside GPUs. Many agentic flows run heavily on CPUs — orchestration logic, tool calls, context management — which means the infrastructure profile of an agentic stack looks meaningfully different from a pure GPU inference cluster. This has implications for how data centers are configured and for which silicon vendors find themselves in unexpected demand.
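To make the CPU-versus-GPU split concrete, here is a minimal sketch of one agentic flow. Every function in it is a hypothetical placeholder rather than a real framework's API; the point is only which stages land on the CPU and which hit the GPU (and its HBM).

```python
# Minimal sketch of one agentic flow, annotated with where the work lands.
# Every function here is a hypothetical placeholder, not a real framework's API.
import json

def generate(prompt: str) -> str:
    """GPU: the model call itself -- the HBM-bandwidth-bound part."""
    return json.dumps({"action": "finish", "answer": "stub answer"})

def call_tool(action: dict) -> str:
    """CPU: tool execution (search, code run, DB query)."""
    return "tool result"

def compact_context(context: list[str]) -> list[str]:
    """CPU: context management -- truncation, dedup, summarization bookkeeping."""
    return context[-20:]

def run_agent(task: str, max_steps: int = 8) -> str:
    context = [task]
    for _ in range(max_steps):
        # CPU: orchestration -- assemble the prompt, then dispatch to the GPU.
        action = json.loads(generate("\n".join(compact_context(context))))
        if action["action"] == "finish":
            return action["answer"]
        context.append(call_tool(action))        # tool step stays on the CPU
    return "ran out of steps"

print(run_agent("summarize quarterly memory pricing"))
```

Only the model call touches the accelerator; prompt assembly, tool execution, and context bookkeeping run on general-purpose cores, and they multiply with the number of steps per task. That is why agentic stacks pull CPU demand up alongside GPUs.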
The HBM market is being repriced in real time to reflect this trajectory. From approximately $35 billion in 2025 [3], the market is projected to reach roughly $100 billion by 2028. Micron, which captured 21% of the global HBM market in 2025 [5] and became a preferred vendor for NVIDIA GB200 systems, is one clear beneficiary. But the more important observation is systemic: HBM capacity is now a first-order constraint on the rate at which agentic AI can scale.
What This Means for the Stack
Several structural repricing events are now in motion.
Memory suppliers — Micron, SK Hynix, and Samsung — are moving from commodity cyclicality to strategic criticality. The five-year forward order dynamic is the clearest evidence that the largest customers have already internalized this. These are not speculative bets; they are capacity reservations made by operators who cannot afford to be constrained. The margin profiles being projected by SK Hynix (operating margins above 70%) suggest this is not a temporary spike [2].

With Youngchun Cho, Senior Director of Global Business Development at SK hynix US
Samsung is perhaps the most telling signal of all. In a recent conversation with the head of Samsung US, the framing was direct: the company is now flooded with cash and is exploring, for the first time, structural investment in its ecosystem in the United States. That is a remarkable statement from a company that has historically been a pure supplier. When the world’s largest memory manufacturer begins thinking about deploying capital into the downstream ecosystem it serves, it tells you something important: the people closest to the hardware believe this cycle is long, deep, and worth owning.
For AI infrastructure investors, the memory layer deserves the same analytical rigor that has been applied to compute. The GPU stack has been extensively mapped — valuations, competitive dynamics, supply chains. The memory stack has not received equivalent attention, which means pricing inefficiencies likely remain. Companies building on top of HBM-dense architectures, and those enabling software-defined optimization of memory allocation in inference pipelines, are worth examining closely.
The CPU story is also worth watching. Agentic workflows are driving CPU demand in ways that were not widely anticipated twelve months ago. This creates a more interesting setup for x86 and ARM compute than the pure GPU narrative would suggest.
Finally, the data center itself is being restructured around memory constraints. Operators who locked in HBM supply early — and who are building infrastructure optimized for memory bandwidth rather than raw compute — have a durable advantage. The constraint is not going to ease on a short time horizon, and the operators who treat memory as a strategic asset rather than a procurement line item will be better positioned.
The GPU era of AI infrastructure is not over, but the singular focus on compute has obscured a more pressing bottleneck. The executives who build and sell the components at the base of the AI stack are telling a consistent story: memory is where scarcity lives, memory is where performance is determined, and memory is where the next several years of infrastructure investment will be won and lost. That story deserves to be taken seriously.

Evening drinks with AMD CTO Mark Papermaster and Rebecca Bellan from TechCrunch to recap the day
Sources
1. Bank of America / iGenUltra — Agentic AI & Memory Supercycle — https://www.igenultra.com/blog/the-agentic-ai-revolution-is-here-agentic-ai-multi-agent-systems-1775761483
2. SK Hynix 2026 Market Outlook (via iGenUltra) — https://www.igenultra.com/blog/the-agentic-ai-revolution-is-here-agentic-ai-multi-agent-systems-1775761483
3. Investing Fox — Micron and the RAM crisis, December 2025 — https://investingfox.com/en/micron-and-the-ram-crisis-at-the-end-of-2025-how-ai-is-turning-memory-into-a-key-component
4. Blocks & Files — Micron HBM record quarter, December 2025 — https://www.blocksandfiles.com/ai-ml/2025/12/18/micron-rides-hbm-surge-to-record-quarter/1722141
5. Financial Content / MarketMinute — Micron HBM boom 2025 — https://markets.financialcontent.com/stocks/article/marketminute-2025-12-25-the-memory-wall-crumbles-how-microns-hbm-boom-redefined-the-ai-landscape-in-2025
6. arXiv — GPU Bottlenecks in Large-Batch LLM Inference — https://arxiv.org/html/2503.08311v2
7. LinkedIn / Sharada Yeluri — HBM4 roadmap — https://www.linkedin.com/posts/sharada-yeluri_hbm-activity-7403459527876206592-cNlk
8. Samsung / Future Memory Storage — Agentic AI & Memory-Centric Computing — https://files.futurememorystorage.com/proceedings/2025/20250807_DRAM-304-1_SO.pdf