DeepSeek's $0.27 Token Economics — What a 20× Pricing Gap Means for the Memory Supercycle

If frontier-quality output tokens cost $1.10/M from a Chinese open-weight model and $25–30/M from US frontier, the memory cycle inflection date is Q1 2027, not 2028.

Companion piece to our pillar research The China AI Disruption Thesis — Why The Sell-Side Consensus Is Six Months Late. This piece zooms in on Vector 1 (token commoditization) and connects it to the memory cycle (Morgan Stanley / BofA framework) to show why the inflection date is materially earlier than consensus prices.

The Pricing Table That Re-Rates the AI Stack

Model	Input ($/M tokens)	Output ($/M tokens)	Output Price Multiple (vs. DeepSeek V4 Pro)
DeepSeek V4 Pro	$0.27	$1.10	1.0× (baseline)
DeepSeek V4 Flash	$0.14	$0.28	0.25×
Qwen 3.5 Max (Alibaba)	$0.40	$1.20	1.1×
Gemini 2.5 Pro	$3.50	$10.50	9.5×
Claude Sonnet 4.6	$3.00	$15.00	13.6×
Claude Opus 4.7	$5.00	$25.00	22.7×
GPT-5.5	~$5.00	$30.00	27.3×

Source: vendor API pricing pages, May 2026.

On output tokens — where the margin is concentrated and where the bulk of enterprise spend resides — the multiple between DeepSeek V4 Pro and US frontier sits between 14× (Claude Sonnet) and 27× (GPT-5.5). The mid-tier US production model (Gemini 2.5 Pro) is at 9.5× the price for benchmark-equivalent intelligence.
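The multiples in the table can be reproduced directly from the listed output prices. The sketch below is illustrative only: the dictionary and function names are ours, and the prices are copied from the table above rather than pulled from any live API.

```python
# Output prices in $ per million tokens, copied from the pricing table above.
OUTPUT_PRICE_PER_M = {
    "DeepSeek V4 Pro": 1.10,
    "DeepSeek V4 Flash": 0.28,
    "Qwen 3.5 Max": 1.20,
    "Gemini 2.5 Pro": 10.50,
    "Claude Sonnet 4.6": 15.00,
    "Claude Opus 4.7": 25.00,
    "GPT-5.5": 30.00,
}


def output_multiples(prices, baseline="DeepSeek V4 Pro"):
    """Return each model's output price as a multiple of the baseline model's price."""
    base = prices[baseline]
    return {model: price / base for model, price in prices.items()}


multiples = output_multiples(OUTPUT_PRICE_PER_M)
for model, multiple in multiples.items():
    print(f"{model:<20} {multiple:5.2f}x")
```

Changing the `baseline` argument re-expresses the same table against any other model, which is useful when the comparison of interest is, say, Gemini against Opus rather than DeepSeek against everything.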

Is This a Subsidy? The Three Independent Tests

The natural sell-side rebuttal is: DeepSeek is selling tokens below cost, the Chinese government is subsidizing, and the spread will normalize once the subsidy is exhausted. We have run three independent tests of this hypothesis and conclude the pricing reflects structural cost advantage, not subsidy.

Test 1: Is the Architecture Cheaper?

DeepSeek V4 is a mixture-of-experts (MoE) model with a reported active parameter count of approximately 37B per token against a 671B total parameter count. The implied compute per token is materially lower than for dense US frontier models (where active = total parameters). Published training-cost figures place DeepSeek V4 at $5–8M total training cost, versus reported (and disputed) US frontier training budgets in the $50–500M range.

Even if the published training costs are aggressive (and they likely are — they exclude infrastructure depreciation, salaries beyond the core team, data acquisition, and prior-model R&D), the architectural efficiency gap is real. Active-parameter inference compute is 10–20× cheaper than dense-frontier inference compute. The pricing reflects this.
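A back-of-envelope check on the 10–20× range, under two assumptions that are ours rather than the vendors': forward-pass compute scales at roughly 2 FLOPs per active parameter per token (a common rule of thumb), and the dense comparator is a hypothetical model with the same 671B total parameter count. US frontier parameter counts are not public, so this is a sketch of the mechanism, not a measurement.

```python
ACTIVE_PARAMS_B = 37   # reported active parameters per token, in billions
TOTAL_PARAMS_B = 671   # reported total parameters, in billions


def forward_flops_per_token(active_params_billions):
    """Approximate forward-pass FLOPs per token (rule of thumb: 2 per active parameter)."""
    return 2 * active_params_billions * 1e9


moe_flops = forward_flops_per_token(ACTIVE_PARAMS_B)
# Hypothetical dense model of equal total size: every parameter is active.
dense_flops = forward_flops_per_token(TOTAL_PARAMS_B)

compute_ratio = dense_flops / moe_flops
print(f"Dense / MoE compute per token: {compute_ratio:.1f}x")
```

The ratio depends only on total versus active parameters, so it lands inside the 10–20× range quoted above. It ignores memory bandwidth, routing overhead, and batching effects, all of which move real-world serving cost.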

Test 2: Are the Competing Chinese Vendors Pricing Coherently?

If DeepSeek were running a loss-leader campaign to capture market share, we would expect Alibaba (Qwen), Tencent (Hunyuan), Baidu (Ernie), and ByteDance (Doubao) to price either much higher (waiting for the loss leader to exhaust) or much lower (chasing share). Instead, all five major Chinese inference vendors are within a 30% pricing range of each other on equivalent intelligence tiers.

This indicates a competitive equilibrium at the prevailing cost structure — not a subsidy from a single actor.
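The coherence test can be written as a one-line spread check. The sketch below assumes "within a 30% range" means (highest − lowest) / lowest ≤ 0.30, which is our reading of the phrase. It uses only the two Chinese output prices that appear in the pricing table; Hunyuan, Ernie, and Doubao prices are not listed in this piece and would need to be added by the reader.

```python
# Output prices in $ per million tokens for the two Chinese models in the pricing table.
# Hunyuan, Ernie, and Doubao are not listed in this piece; add them here to extend the check.
CHINESE_OUTPUT_PRICES = {
    "DeepSeek V4 Pro": 1.10,
    "Qwen 3.5 Max": 1.20,
}

MAX_SPREAD = 0.30  # the 30% band cited above


def price_spread(prices):
    """Spread between the most and least expensive vendor, relative to the cheapest."""
    lowest = min(prices.values())
    highest = max(prices.values())
    return (highest - lowest) / lowest


spread = price_spread(CHINESE_OUTPUT_PRICES)
within_band = spread <= MAX_SPREAD
print(f"spread = {spread:.1%}, within 30% band: {within_band}")
```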

Test 3: Does Inference Deploy on Chinese Hardware Cost-Effectively?

The Huawei Ascend 910C and Atlas 800 architecture support DeepSeek V4 inference at a reported cost per token that is roughly consistent with the API pricing plus a reasonable margin. The "subsidy" thesis would require Huawei to be selling Atlas 800 systems at a loss, which contradicts Huawei's own financial disclosures (the company posted RMB 153B net income in 2025).

All three tests point to the same conclusion: the pricing reflects structural cost advantage rooted in (a) architectural efficiency, (b) Chinese-priced electricity inputs, and (c) Huawei-supplied inference silicon at materially lower hardware cost than NVIDIA equivalents. The gap is not closing through subsidy exhaustion.

The S3 Analogue — 80% Compression Over Six Years

The historical analogue we draw is AWS S3 storage pricing (2007–2013). The product launched at approximately $0.15 per GB-month. By 2013 it had fallen to $0.03 per GB-month, an 80% compression over six years. The mechanism was commodity competition (Google Cloud, Azure, Backblaze, smaller regional providers) arriving at the same operational cost frontier and forcing prices down.

AI inference is following the same curve on a far shorter clock. A frontier-quality output token cost approximately $30/M in mid-2023 (GPT-4 at launch) and is now available at $1.10/M from DeepSeek V4 Pro. That is a 96% compression in roughly three years, against six years for S3's 80%.
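To compare the two curves on a like-for-like basis, it helps to convert each total compression into an annualized rate of decline. The sketch below uses the start and end prices quoted above. The elapsed time for tokens is passed in as a parameter; the three-year value used here is our rounding of the mid-2023 to May 2026 window and should be treated as an assumption.

```python
def total_compression(start_price, end_price):
    """Fractional price decline from start to end (0.80 means an 80% drop)."""
    return 1 - end_price / start_price


def annualized_decline(start_price, end_price, years):
    """Constant annual rate of decline that takes start_price to end_price over `years`."""
    return 1 - (end_price / start_price) ** (1 / years)


# AWS S3: $0.15 -> $0.03 per GB-month over six years (figures from the text).
s3_total = total_compression(0.15, 0.03)
s3_annual = annualized_decline(0.15, 0.03, years=6)

# Frontier output tokens: $30 -> $1.10 per million; the 3-year window is an assumption.
token_total = total_compression(30.0, 1.10)
token_annual = annualized_decline(30.0, 1.10, years=3)

print(f"S3:     {s3_total:.0%} total, {s3_annual:.1%} per year")
print(f"Tokens: {token_total:.0%} total, {token_annual:.1%} per year")
```

Shortening the `years` argument for tokens makes the annualized decline steeper, so the conclusion that token prices are falling much faster than S3 did is not sensitive to the exact window chosen.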

For the curve to flatten, the structural cost drivers (compute per token, electricity per inference, silicon per dollar) would need to stop compressing. None of those are likely to stop. The Chinese open-weight architecture cycle is accelerating, not decelerating.

How This Re-Rates the Memory Supercycle

Morgan Stanley raised its Micron price target to $250 in May 2026 citing a "memory supercycle similar to 2017" with "structural HBM tightness through 2027." Bank of America projects DRAM revenue +51% YoY in 2026 and the HBM market at $54.6B. Both frameworks assume:

(1) AI compute demand is a one-way function of model scaling.
(2) CXMT (Chinese DRAM) remains a ~5% global share player throughout the cycle.
(3) Hyperscaler capex sustains memory demand growth through 2028.

The token-economics data refutes assumption (1). If output token prices compress 5–10× over 18 months, token volume must grow 5–10× just to hold AI revenue flat. That volume growth does materialize (more agents, more inference workloads), but hyperscaler margin per token compresses. The capex justification shifts from "we need more GPUs because revenue per workload is high" to "we need more GPUs because volume scales but margin compresses," which is a fundamentally different capex commitment psychology.
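The arithmetic behind the flat-revenue condition is simply revenue = price × volume. The sketch below makes it explicit: the function names are ours, and the volume-growth scenarios in the loop are hypothetical inputs chosen for illustration, not forecasts.

```python
def required_volume_growth(price_compression):
    """Volume multiple needed to hold revenue flat when price falls by `price_compression`x."""
    return price_compression


def revenue_multiple(price_compression, volume_growth):
    """Revenue relative to today, given a price compression and a volume growth multiple."""
    return volume_growth / price_compression


# Price compression range from the text; volume growth values are hypothetical.
for compression in (5, 10):
    for volume_growth in (5, 10):
        revenue = revenue_multiple(compression, volume_growth)
        print(f"price /{compression}, volume x{volume_growth}: revenue x{revenue:.1f}")
```

Only the cells where volume growth matches or exceeds the price compression keep revenue at or above today's level, and none of them restore the margin per token.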

The CXMT data refutes assumption (2). CXMT has:

Shifted 60,000 wafers/month (20% of capacity) to HBM-only production.
Demonstrated DDR5-8000 and LPDDR5X-10667 production capability.
Announced LPDDR6 first-mover status for H2 2026, ahead of Samsung and Micron.
Refiled for STAR Market listing with a ~$5B war chest for two additional fabs.

If CXMT reaches 8–10% global DRAM share by year-end 2026 (which the capacity additions support), that is sufficient to materially erode pricing power at the commodity margin. The memory cycle inflection moves from 2028 (sell-side consensus) to Q1 2027 (our base case).

Combining (1) and (2): the memory supercycle is still real for H1 2026 (we agree directionally), but its duration is shorter than consensus. The Morgan Stanley framework that assumes "structural HBM tightness through 2027" is too optimistic on the back half.

The Micron-Specific Implication

Micron is the cleanest US-listed expression of the memory cycle and therefore the cleanest vehicle to test the thesis. The bull and bear case decompose as follows:

Case	Q1 2027 Outcome	MU Implied Move
MS Bull	HBM tight through 2027, MU FY27 EPS $25+, multiple expansion	+30–45% from spot
Our Base	Memory inflection Q1 2027, CXMT at 8–10% share, FY27 EPS $14–17, multiple compression	−25–35% from spot
Tail Bear	Token economics force hyperscaler capex pause, MU FY27 EPS $9–12	−45–55% from spot

The MU listed options surface prices a roughly +15% / −20% one-year range as the implied move. Our base case is outside that range to the downside. Our tail bear is materially outside it. This is the asymmetry the dispersion structure captures via long single-name vega. See Dispersion Trading the China AI Thesis for the trade construction.
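The comparison between the scenario table and the options-implied range can be checked mechanically. The sketch below treats the +15% / −20% implied move as a hard band, which is a simplification we are making for illustration (an options surface prices a distribution, not a boundary). The scenario ranges are copied from the table above.

```python
# One-year implied move quoted in the text, as (downside, upside) fractions of spot.
IMPLIED_RANGE = (-0.20, 0.15)

# Scenario ranges from the table above, as (low, high) fractions of spot.
SCENARIOS = {
    "MS Bull": (0.30, 0.45),
    "Our Base": (-0.35, -0.25),
    "Tail Bear": (-0.55, -0.45),
}


def outside_implied_range(move_range, implied=IMPLIED_RANGE):
    """True if the whole scenario range sits beyond the implied band on either side."""
    low, high = move_range
    implied_low, implied_high = implied
    return high < implied_low or low > implied_high


outside = {name: outside_implied_range(rng) for name, rng in SCENARIOS.items()}
for name, is_outside in outside.items():
    print(f"{name:<10} outside implied range: {is_outside}")
```

On these inputs every scenario in the table, including the bull case on the upside, sits entirely beyond the implied band.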

Monitoring — Token + Memory Joint Signals

Seven series we track:

DeepSeek API pricing tape — any further cut materially compresses US frontier margin.
Qwen / Hunyuan / Doubao / Ernie API pricing — Chinese inference competitive landscape.
HBM3E contract price tape (TrendForce monthly) — leading indicator of cycle inflection.
DRAM contract price tape — commodity layer.
CXMT capacity disclosures + STAR Market filing data — supply ramp signal.
Hyperscaler quarterly capex revisions — the demand-side feedback loop.
Token-volume disclosures from US vendors (Anthropic, OpenAI, Google) — volume-side compensation for margin compression.

Conclusion

DeepSeek V4 Pro at $1.10/M output tokens is not a subsidy. It is the new structural cost frontier of frontier-quality AI inference, validated by competitive equilibrium with four other major Chinese vendors and by the Huawei Ascend hardware economics. The 20× output multiple between Chinese open-weight and US frontier is not closing on a subsidy-exhaustion timeline — it is widening on the architectural-efficiency curve.

For the memory supercycle thesis, this translates directly: the per-token revenue compression rewrites the hyperscaler capex equation in a way that pulls forward the cycle inflection from 2028 to Q1 2027. The Morgan Stanley / BofA framework is directionally right for H1 2026 and structurally wrong for H2 2026 onward.

For Micron specifically, the implied options pricing under-models the asymmetry of the base case versus consensus. The cleanest expression of the gap is long single-name vega via the dispersion structure described in Dispersion Trading the China AI Thesis.

For the full thesis: The China AI Disruption Thesis. For the financial-architecture transmission mechanism: The Hyperscaler Bond Wall. For the physical-infrastructure constraint: PJM at $329/MW-day.

Disclaimer: This document is for informational purposes only and does not constitute investment advice, an offer, or a solicitation. CrossVol Research does not make trade recommendations. The opinions expressed are those of the authors at the time of writing and may change without notice. CrossVol Research and its principals may hold positions, directly or indirectly, in entities mentioned herein. Past performance is not indicative of future returns. Non-MiFID promotional communication in the European Union.
