2026-04-13
By Esther Wong
For the past decade, the AI infrastructure conversation has been dominated by compute — GPUs, FLOPS, chip architecture. That framing is now obsolete. The binding constraint in generative AI is memory: how much you have, how fast it moves, and whether you can get any at all. I say this not as a market thesis but as a direct observation from conversations with the people who make and sell memory at scale.
The Demand Signal Nobody Expected
When I spoke with Sumit Sadana, Chief Business Officer at Micron, one data point stopped me: customers are placing memory orders five years in advance. The standard horizon has historically been one year. This is not a procurement quirk — it is a structural signal that the largest AI operators have concluded that memory supply will be scarce for a very long time, and they are locking in capacity accordingly.
The supply picture today is stark. Micron can currently meet only half to two-thirds of demand from its largest customers. In response, the company has committed to a capex budget exceeding $25 billion for this year alone — up from an initial estimate of $20 billion just three months prior. Construction-related capex is expected to increase by more than $10 billion in the following year. This is a company accelerating at full throttle while still running behind.
The broader market is confirming the same thesis. Bank of America has characterized 2026 as a “memory supercycle similar to the boom of the 1990s” [1], with global DRAM revenue forecast to surge 51% year-over-year. SK Hynix projects DRAM average selling prices rising 243% year-over-year [2], with operating margins climbing above 70%. Meta’s VP of Engineering has stated publicly: “We’re absolutely worried about HBM supply.” When hyperscalers say that out loud, it reflects a reality already baked into their roadmaps.
Part of what is driving this tightness is structural, not just demand-driven. HBM manufacturing consumes fab capacity at roughly a 3:1 ratio compared to DDR5 [3]. As the industry pivots toward HBM to serve AI workloads, it simultaneously compresses available capacity for conventional DRAM — a tightening loop that will not resolve quickly.
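To see why that ratio bites, here is a minimal back-of-envelope sketch in Python. The 3:1 wafer-consumption figure is from the paragraph above; the wafer counts and per-wafer bit yields are illustrative assumptions of mine, not vendor data.

```python
# Back-of-envelope: how shifting fab output toward HBM squeezes conventional DRAM.
# Assumption (from the text): a bit of HBM consumes ~3x the wafer capacity of a
# bit of DDR5. All other numbers below are illustrative, not vendor data.

TOTAL_WAFERS = 100_000          # hypothetical monthly wafer starts
BITS_PER_WAFER_DDR5 = 1.0       # normalized bit output per wafer on DDR5
BITS_PER_WAFER_HBM = 1.0 / 3    # ~3:1 capacity ratio => one third the bits per wafer

def bit_output(hbm_share: float) -> tuple[float, float]:
    """Return (ddr5_bits, hbm_bits) for a given share of wafers diverted to HBM."""
    hbm_wafers = TOTAL_WAFERS * hbm_share
    ddr5_wafers = TOTAL_WAFERS - hbm_wafers
    return ddr5_wafers * BITS_PER_WAFER_DDR5, hbm_wafers * BITS_PER_WAFER_HBM

for share in (0.0, 0.2, 0.4):
    ddr5_bits, hbm_bits = bit_output(share)
    print(f"HBM share {share:.0%}: DDR5 bits {ddr5_bits:,.0f}, "
          f"HBM bits {hbm_bits:,.0f}, total {ddr5_bits + hbm_bits:,.0f}")
```

In this toy scenario, diverting 40% of wafer starts to HBM removes 40% of conventional DRAM bit supply while adding back only about a third of that in HBM bits, so total bit output falls by roughly a quarter. That is the tightening loop described above.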
Why Memory Became the Performance Lever
The shift in memory’s strategic importance tracks directly to how large language models actually run. Research has established that LLM inference is memory-bound, not compute-bound [6] — even at large batch sizes, DRAM bandwidth saturation is the primary bottleneck. That is a counterintuitive finding for anyone who spent the last several years thinking about AI infrastructure in terms of GPU utilization.
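To make the memory-bound claim concrete, here is a rough roofline-style calculation. The model size, batch size, and accelerator figures are illustrative assumptions of mine, not numbers from the cited paper.

```python
# Roofline-style sketch: why autoregressive decode is memory-bandwidth-bound.
# All figures are illustrative assumptions for a generic 70B-parameter model
# served on an HBM-class accelerator; they are not from the cited paper.

PARAMS = 70e9                 # model parameters
BYTES_PER_PARAM = 2           # fp16/bf16 weights
BATCH = 32                    # concurrent sequences in one decode step
PEAK_FLOPS = 1.0e15           # ~1 PFLOP/s dense fp16 (assumed accelerator)
HBM_BANDWIDTH = 3.0e12        # ~3 TB/s HBM bandwidth (assumed accelerator)

# Per decode step the weights are streamed from HBM once and reused across the
# batch; each parameter contributes roughly 2 FLOPs (multiply + add) per sequence.
flops_per_step = 2 * PARAMS * BATCH
bytes_per_step = PARAMS * BYTES_PER_PARAM            # ignoring KV-cache traffic

arithmetic_intensity = flops_per_step / bytes_per_step   # FLOPs per byte moved
machine_balance = PEAK_FLOPS / HBM_BANDWIDTH             # FLOPs the chip can do per byte

print(f"arithmetic intensity: {arithmetic_intensity:.0f} FLOPs/byte")
print(f"machine balance:      {machine_balance:.0f} FLOPs/byte")
print("memory-bound" if arithmetic_intensity < machine_balance else "compute-bound")
```

Even at a batch of 32, the arithmetic intensity sits an order of magnitude below the machine balance, so the accelerator spends most of each decode step waiting on HBM rather than computing. Adding KV-cache traffic for long contexts pushes it further in the same direction.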
This has a concrete implication: adding memory capacity is one of the highest-leverage interventions available to an inference operator. Increasing memory to support longer context windows can drive a 2–3x improvement in inference performance. In a market where inference cost is a real competitive variable, a 2–3x performance multiplier from a memory configuration change is the kind of number that reshapes economics.
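The capacity lever works through the KV cache: weights are a fixed cost, and whatever HBM remains determines how many long-context sequences can stay resident at once. The model dimensions and capacities below are illustrative assumptions, sketched for a generic 70B-class dense transformer on a single device.

```python
# Sketch: how HBM capacity bounds concurrent long-context sequences via the KV cache.
# Model dimensions are illustrative (roughly a 70B-class dense transformer with GQA).

LAYERS = 80
KV_HEADS = 8                  # grouped-query attention
HEAD_DIM = 128
BYTES_PER_VALUE = 2           # fp16 cache
WEIGHT_BYTES = 140e9          # 70B params at 2 bytes each

def kv_cache_bytes(context_tokens: int) -> float:
    """KV cache for one sequence: K and V, per layer, per KV head, per token."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * context_tokens

def max_batch(hbm_bytes: float, context_tokens: int) -> int:
    """Sequences that fit after weights are resident (single-device simplification)."""
    return int((hbm_bytes - WEIGHT_BYTES) // kv_cache_bytes(context_tokens))

for hbm_gb in (192, 384):
    for ctx in (32_000, 128_000):
        print(f"{hbm_gb} GB HBM, {ctx:>7,} ctx -> max batch {max_batch(hbm_gb * 1e9, ctx)}")
```

In this sketch, doubling HBM from 192 GB to 384 GB lifts the feasible batch at a 32k context from 4 to 23 concurrent sequences, because the fixed weight footprint stops dominating. Throughput does not scale one-for-one with batch size, but multiples of this order are how a memory configuration change produces the 2–3x gains described above.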
AMD’s CTO reinforced this framing directly in my conversation with him: memory has transitioned from a supporting component to a critical performance lever. That is not positioning language. It reflects the technical reality that the compute-to-memory ratio in current AI architectures has created a genuine bottleneck that more GPUs alone cannot solve.

AMD CTO Mark Papermaster speaking at HumanX
The HBM product roadmap reflects this urgency. HBM4 is expected in mass production by mid-2026, featuring 2,000 pins at 8–10 Gbps per pin [7]. HBM4E, expected in 2027, is projected to boost bandwidth another 50%. The roadmap is aggressive because the demand is real and the use case is not going away.
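Those pin and per-pin figures translate directly into per-stack bandwidth; the arithmetic is below, using only the numbers quoted in the roadmap.

```python
# Per-stack bandwidth implied by the HBM4 figures quoted above:
# 2,000 pins at 8-10 Gbps per pin.
PINS = 2_000

for gbps_per_pin in (8, 10):
    gb_per_s = PINS * gbps_per_pin / 8        # Gbit/s across all pins -> GB/s
    print(f"{gbps_per_pin} Gbps/pin -> ~{gb_per_s / 1_000:.1f} TB/s per stack")
```

That works out to roughly 2–2.5 TB/s per stack, and the projected 50% uplift for HBM4E would put a single stack in the 3–3.75 TB/s range.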
Agentic AI Changes the Math Entirely
Everything above describes the current state of inference for standard chatbot-style queries. Agentic AI changes the math in ways that make the memory constraint significantly more acute.
A single agentic query requires 2x more HBM bandwidth and generates 4x more tokens compared to a standard query. These are not incremental differences — they represent a step change in per-query resource consumption. As agentic workloads become a larger share of total AI traffic, the aggregate demand on memory infrastructure compounds accordingly.
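A simple blended-demand calculation shows how fast the aggregate requirement compounds as the traffic mix shifts. The 2x bandwidth and 4x token multipliers are the figures above; the traffic-mix scenarios are assumptions of mine.

```python
# Blended resource demand as agentic queries become a larger share of traffic.
# Multipliers (2x HBM bandwidth, 4x tokens per query) are from the text;
# the traffic-mix scenarios are illustrative assumptions.

BANDWIDTH_MULT = 2.0
TOKEN_MULT = 4.0

def blended(agentic_share: float) -> tuple[float, float]:
    """Aggregate demand per query, relative to an all-standard baseline of 1.0."""
    std = 1.0 - agentic_share
    bandwidth = std * 1.0 + agentic_share * BANDWIDTH_MULT
    tokens = std * 1.0 + agentic_share * TOKEN_MULT
    return bandwidth, tokens

for share in (0.1, 0.3, 0.5):
    bw, tok = blended(share)
    print(f"agentic share {share:.0%}: bandwidth x{bw:.1f}, tokens x{tok:.1f}")
```

At a 50% agentic mix, average bandwidth demand per query is already 1.5x and token volume 2.5x the all-standard baseline, before any growth in the total number of queries.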
Agentic workflows also introduce a secondary dynamic that is underappreciated: a steep increase in CPU demand alongside GPUs. Many agentic flows run heavily on CPUs — orchestration logic, tool calls, context management — which means the infrastructure profile of an agentic stack looks meaningfully different from a pure GPU inference cluster. This has implications for how data centers are configured and for which silicon vendors find themselves in unexpected demand.
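To make the CPU-versus-GPU split concrete, here is a minimal sketch of one agentic flow. Every function in it is a hypothetical placeholder rather than a real framework's API; the point is only which stages land on the CPU and which hit the GPU (and its HBM).

```python
# Minimal sketch of one agentic flow, annotated with where the work lands.
# Every function here is a hypothetical placeholder, not a real framework's API.
import json

def generate(prompt: str) -> str:
    """GPU: the model call itself -- the HBM-bandwidth-bound part."""
    return json.dumps({"action": "finish", "answer": "stub answer"})

def call_tool(action: dict) -> str:
    """CPU: tool execution (search, code run, DB query)."""
    return "tool result"

def compact_context(context: list[str]) -> list[str]:
    """CPU: context management -- truncation, dedup, summarization bookkeeping."""
    return context[-20:]

def run_agent(task: str, max_steps: int = 8) -> str:
    context = [task]
    for _ in range(max_steps):
        # CPU: orchestration -- assemble the prompt, then dispatch to the GPU.
        action = json.loads(generate("\n".join(compact_context(context))))
        if action["action"] == "finish":
            return action["answer"]
        context.append(call_tool(action))        # tool step stays on the CPU
    return "ran out of steps"

print(run_agent("summarize quarterly memory pricing"))
```

Only the model call touches the accelerator; prompt assembly, tool execution, and context bookkeeping run on general-purpose cores, and they multiply with the number of steps per task. That is why agentic stacks pull CPU demand up alongside GPUs.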
The HBM market is being repriced in real time to reflect this trajectory. From approximately $35 billion in 2025 [3], the market is projected to reach roughly $100 billion by 2028. Micron, which captured 21% of the global HBM market in 2025 [5] and became a preferred vendor for NVIDIA GB200 systems, is one clear beneficiary. But the more important observation is systemic: HBM capacity is now a first-order constraint on the rate at which agentic AI can scale.
What This Means for the Stack
Several structural repricing events are now in motion.
Memory suppliers — Micron, SK Hynix, and Samsung — are moving from commodity cyclicality to strategic criticality. The five-year forward order dynamic is the clearest evidence that the largest customers have already internalized this. These are not speculative bets; they are capacity reservations made by operators who cannot afford to be constrained. The margin profiles being projected by SK Hynix (operating margins above 70%) suggest this is not a temporary spike [2].

With Youngchun Cho, Senior Director of Global Business Development at SK hynix US
Samsung is perhaps the most telling signal of all. In a recent conversation with the head of Samsung US, the framing was direct: the company is now flooded with cash and is exploring, for the first time, structural investment in its ecosystem in the United States. That is a remarkable statement from a company that has historically been a pure supplier. When the world’s largest memory manufacturer begins thinking about deploying capital into the downstream ecosystem it serves, it tells you something important: the people closest to the hardware believe this cycle is long, deep, and worth owning.
For AI infrastructure investors, the memory layer deserves the same analytical rigor that has been applied to compute. The GPU stack has been extensively mapped — valuations, competitive dynamics, supply chains. The memory stack has not received equivalent attention, which means pricing inefficiencies likely remain. Companies building on top of HBM-dense architectures, and those enabling software-defined optimization of memory allocation in inference pipelines, are worth examining closely.
The CPU story is also worth watching. Agentic workflows are driving CPU demand in ways that were not widely anticipated twelve months ago. This creates a more interesting setup for x86 and ARM compute than the pure GPU narrative would suggest.
Finally, the data center itself is being restructured around memory constraints. Operators who locked in HBM supply early — and who are building infrastructure optimized for memory bandwidth rather than raw compute — have a durable advantage. The constraint is not going to ease on a short time horizon, and the operators who treat memory as a strategic asset rather than a procurement line item will be better positioned.
The GPU era of AI infrastructure is not over, but the singular focus on compute has obscured a more pressing bottleneck. The executives who build and sell the components at the base of the AI stack are telling a consistent story: memory is where scarcity lives, memory is where performance is determined, and memory is where the next several years of infrastructure investment will be won and lost. That story deserves to be taken seriously.

Evening drinks with AMD CTO Mark Papermaster and Rebecca Bellan from TechCrunch to recap the day
Sources
1. Bank of America / iGenUltra — Agentic AI & Memory Supercycle — https://www.igenultra.com/blog/the-agentic-ai-revolution-is-here-agentic-ai-multi-agent-systems-1775761483
2. SK Hynix 2026 Market Outlook (via iGenUltra) — https://www.igenultra.com/blog/the-agentic-ai-revolution-is-here-agentic-ai-multi-agent-systems-1775761483
3. Investing Fox — Micron and the RAM crisis, December 2025 — https://investingfox.com/en/micron-and-the-ram-crisis-at-the-end-of-2025-how-ai-is-turning-memory-into-a-key-component
4. Blocks & Files — Micron HBM record quarter, December 2025 — https://www.blocksandfiles.com/ai-ml/2025/12/18/micron-rides-hbm-surge-to-record-quarter/1722141
5. Financial Content / MarketMinute — Micron HBM boom 2025 — https://markets.financialcontent.com/stocks/article/marketminute-2025-12-25-the-memory-wall-crumbles-how-microns-hbm-boom-redefined-the-ai-landscape-in-2025
6. arXiv — GPU Bottlenecks in Large-Batch LLM Inference — https://arxiv.org/html/2503.08311v2
7. LinkedIn / Sharada Yeluri — HBM4 roadmap — https://www.linkedin.com/posts/sharada-yeluri_hbm-activity-7403459527876206592-cNlk
8. Samsung / Future Memory Storage — Agentic AI & Memory-Centric Computing — https://files.futurememorystorage.com/proceedings/2025/20250807_DRAM-304-1_SO.pdf