On its surface, the U.S. chip sanctions regime appears to have locked in an American victory in the AI race. As of late 2025, the best U.S. AI chips were roughly five times more powerful than China’s leading chips; according to one analysis, that gap is projected to widen to 17 times by the second half of 2027. Yet this single-axis framing sits in striking tension with the assessment offered by U.S. industry leaders themselves. Testifying before the U.S. Senate Commerce Committee in May 2025, AMD CEO Lisa Su stated explicitly that maintaining the U.S. competitive edge in AI innovation “actually requires excellence at every layer of the stack.” AI competitiveness, in other words, is a multi-layered system spanning silicon, software, models, energy, and ecosystems – and a chokepoint at any single layer is insufficient to secure the whole.
China’s response operates precisely on this multi-layered logic: rather than confront the American chip fortress head-on, it circumvents it – replicating the strategy of “encircling the cities from the countryside” that it has already deployed successfully in solar panels and consumer electronics, among other sectors. The logic is straightforward: forgo a frontal assault on the high-end market, and instead penetrate the global mid-to-low-end application market through algorithmic efficiency, energy advantage, and aggressive pricing, until scale dynamics begin to compress the high-end fortress in reverse.
The Three Layers of Chinese Advantage
The central terrain of this contest is what Jensen Huang, Nvidia’s CEO, termed token factory economics – a metric cluster anchored on tokens per watt and complemented by cost per token. Speaking at NVIDIA GTC 2026, Huang framed AI factories as fundamentally power-constrained systems: capacity does not scale with demand, so efficiency becomes decisive, and tokens per watt, token speed, and cost per token emerge as the core metrics. On both tokens per watt and cost per token, China is now constructing a structural advantage.
At the algorithmic level, Chinese companies can glean more tokens from fewer chips. DeepSeek reportedly trained its V3 model for $6 million – compared to roughly $100 million for OpenAI’s GPT-4 – using approximately one-tenth of the compute consumed by Meta’s comparable LLaMA 3.1 model. The Mixture-of-Experts (MoE) architecture allows Chinese developers to compensate for their generational silicon disadvantage with structural efficiency.
At the hardware level, Chinese domestic chips are now rapidly closing the gap with the H20, Nvidia’s China-specific export variant. According to research by Guosen Securities, Baidu’s third-generation Kunlun P800 chip reaches roughly 345 TFLOPS at FP16, on par with Nvidia’s A100, with interconnect bandwidth approaching that of the H20. In September 2025, Alibaba T-Head’s Parallel Processing Unit (PPU) accelerator was demonstrated on Chinese state television as performing on par with the H20; China Unicom has since deployed over 16,000 PPUs at its Qinghai data center. Crucially, on the cost dimension, the PPU’s domestic 7nm process and 2.5D packaging make a single card 40 percent cheaper than the imported H20. Together, these developments are reshaping the competitive landscape on both tokens per watt and cost per token simultaneously.
China is also driving down cost per token through its energy strategy. By the end of 2025, China’s installed power generation capacity reached 3.89 billion kilowatts, with wind and solar contributing 1.84 billion kW – 47.3 percent of the total. Chinese electricity costs run 30–50 percent below those in the United States. Changjiang Securities has gone so far as to characterize tokens as a “power derivative,” noting that electricity accounts for 60–70 percent of large-model operating costs. Tokens, in effect, allow China to export the economic value of its domestic electricity globally – without exporting a single kilowatt.
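To make the “power derivative” claim concrete, the short Python sketch below combines the two ranges quoted above into an implied saving on total operating cost. It is a back-of-envelope illustration, not a figure from Changjiang Securities: it assumes that the 60–70 percent electricity share is measured at U.S. electricity prices and that every non-electricity cost is identical in both countries. The function name `operating_cost_saving` is hypothetical.

```python
# Figures quoted in the text above.
ELECTRICITY_SHARE = (0.60, 0.70)  # electricity as a share of large-model operating cost
PRICE_DISCOUNT = (0.30, 0.50)     # how far Chinese electricity costs run below U.S. costs


def operating_cost_saving(electricity_share: float, price_discount: float) -> float:
    """Fractional saving in total operating cost.

    Assumption (not from the source): the share is measured at the higher
    electricity price, and all non-electricity costs are the same.
    """
    return electricity_share * price_discount


# Pair the low ends and the high ends of the two quoted ranges.
low = operating_cost_saving(ELECTRICITY_SHARE[0], PRICE_DISCOUNT[0])
high = operating_cost_saving(ELECTRICITY_SHARE[1], PRICE_DISCOUNT[1])

print(f"Implied operating-cost saving: {low:.0%} to {high:.0%}")
```

Under these assumptions the electricity gap alone would be worth roughly 18 to 35 percent of operating cost, before any algorithmic or hardware effect is counted.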
At the market level, pricing itself becomes the weapon. MiniMax M2.5 and Zhipu GLM-5 charge $0.30 per million input tokens on OpenRouter, compared with $5 for Anthropic’s Claude Opus 4.6 – roughly one-seventeenth the price. The true force of this differential, however, lies in the fact that it does not come at the expense of performance.
On SWE-Bench Verified – the industry’s gold-standard coding benchmark – MiniMax M2.5 scores 80.2 percent, trailing Claude Opus 4.6’s 80.8 percent by a mere 0.6 percentage points. Both models complete benchmark tasks in nearly identical time (22.8 minutes for M2.5 versus 22.9 minutes for Opus 4.6), yet the per-task cost differs by a factor of 20: roughly $0.15 for M2.5 against $3.00 for Opus 4.6. For a mid-sized engineering team, this translates into monthly costs of $225 versus $4,500 for substantively equivalent output.
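The arithmetic behind these figures can be checked in a few lines of Python. The sketch below uses only the numbers quoted above; the monthly task volume is not stated in the text and is inferred here by dividing each monthly cost by its per-task cost, so it should be read as an implied assumption rather than a reported figure.

```python
# Figures quoted in the text above.
M25 = {"swe_bench": 80.2, "cost_per_task": 0.15, "monthly_cost": 225.0}
OPUS = {"swe_bench": 80.8, "cost_per_task": 3.00, "monthly_cost": 4500.0}

# Benchmark gap in percentage points, and the per-task cost multiple.
score_gap = round(OPUS["swe_bench"] - M25["swe_bench"], 1)
cost_ratio = round(OPUS["cost_per_task"] / M25["cost_per_task"], 2)

# Relative capability (by benchmark score) and relative price (per task).
capability_share = M25["swe_bench"] / OPUS["swe_bench"]
price_share = M25["cost_per_task"] / OPUS["cost_per_task"]

# Monthly task volume implied by the quoted monthly costs (inferred, not reported).
tasks_m25 = round(M25["monthly_cost"] / M25["cost_per_task"])
tasks_opus = round(OPUS["monthly_cost"] / OPUS["cost_per_task"])

print(f"Score gap: {score_gap} points; cost multiple: {cost_ratio:.0f}x")
print(f"Capability share: {capability_share:.1%}; price share: {price_share:.0%}")
print(f"Implied tasks per month: {tasks_m25} (M2.5), {tasks_opus} (Opus 4.6)")
```

Both monthly figures imply the same workload of about 1,500 tasks, which confirms that the $225 and $4,500 totals compare like with like.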
To be analytically honest, this near-parity is concentrated in coding and agentic tool use; on pure mathematical reasoning (AIME) and abstract reasoning (ARC-AGI), the flagship models from OpenAI and Google retain clear leads. Yet the dimensions where China has reached parity – coding, document processing, office automation, customer-service agents – are precisely the most commercially salient enterprise workloads. Chinese tokens, in other words, are penetrating the global enterprise AI market by providing 99 percent of the capability at 5 percent of the price.
This is the essence of the “encircling the cities from the countryside” playbook: victory does not require producing the most advanced commodity, but producing a good-enough commodity at structurally lower cost until the opponent’s premium pricing model loses its market sustainability.
The strategic effect is already visible. According to OpenRouter, the world’s largest LLM API aggregation platform, Chinese models surpassed U.S. models in weekly token call volume for the first time during the week of February 9-15, 2026, reaching 4.12 trillion tokens against 2.94 trillion for U.S. models; the following week extended that lead to 5.16 trillion – a 127 percent increase in just three weeks. During the week of February 16-22, four of the top five most-used models on the platform were Chinese – MiniMax M2.5, Moonshot’s Kimi K2.5, Zhipu’s GLM-5, and DeepSeek V3.2 – collectively accounting for 85.7 percent of total top-five call volume. By February 24, Chinese models had captured 61 percent of OpenRouter’s total token consumption, with MiniMax M2.5 alone consuming 2.45 trillion tokens in a single week – a 197 percent week-over-week jump.
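For readers who want to see what these growth figures imply, the Python sketch below backs out the earlier volumes from the quoted percentages. It is an illustrative check, not OpenRouter data: it assumes that the 127 percent figure describes Chinese-model weekly volume relative to a week three weeks earlier, and that the 197 percent figure is measured against MiniMax M2.5’s immediately preceding week. The helper `implied_baseline` is hypothetical.

```python
# Weekly token volumes quoted in the text above, in trillions of tokens.
CHINESE_FEB_9_15 = 4.12
US_FEB_9_15 = 2.94
CHINESE_FEB_16_22 = 5.16
MINIMAX_WEEK = 2.45

THREE_WEEK_GROWTH = 1.27  # "a 127 percent increase in just three weeks"
MINIMAX_WOW_GROWTH = 1.97  # "a 197 percent week-over-week jump"


def implied_baseline(current: float, growth: float) -> float:
    """Earlier volume implied by a current volume and a fractional increase."""
    return current / (1.0 + growth)


# Chinese-model volume three weeks before the 5.16 trillion week (inferred).
chinese_baseline = implied_baseline(CHINESE_FEB_16_22, THREE_WEEK_GROWTH)
# MiniMax M2.5 volume in the week before its 2.45 trillion week (inferred).
minimax_previous = implied_baseline(MINIMAX_WEEK, MINIMAX_WOW_GROWTH)
# Week-over-week growth between the two quoted Chinese totals.
chinese_wow = CHINESE_FEB_16_22 / CHINESE_FEB_9_15 - 1.0
# Chinese volume as a multiple of U.S. volume in the crossover week.
crossover_multiple = CHINESE_FEB_9_15 / US_FEB_9_15

print(f"Implied Chinese baseline: {chinese_baseline:.2f} trillion")
print(f"Implied MiniMax prior week: {minimax_previous:.2f} trillion")
print(f"Chinese week-over-week growth: {chinese_wow:.1%}")
print(f"Chinese/U.S. multiple in crossover week: {crossover_multiple:.2f}x")
```

On these assumptions, Chinese-model volume would have stood near 2.27 trillion tokens three weeks earlier, below the 2.94 trillion that U.S. models recorded in the crossover week.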
Most strategically revealing, however, is the testimony of the platform’s own leadership: OpenRouter COO Chris Clark observed that Chinese open-weight models have captured significant market share precisely because they are “disproportionately heavy in agentic flows run by U.S. firms.” The cities, in other words, have begun fielding the countryside’s army for their most commercially salient agentic tasks. Just as Mao Zedong’s original military doctrine instructed forces to avoid decisive battle where the enemy was strongest and instead accumulate momentum from the countryside until the cities themselves could be surrounded, China’s token economy is now executing the same logic at digital scale – encircling the American high-end AI fortress from the global periphery upward.
The Sanctions Regime Meets the Token Economy
The American export-control regime rests on an unstated premise: that compute scarcity at the training stage would translate into capability scarcity at the deployment stage. The token economy severs this transitive chain. Once the binding constraint shifts from training FLOPs to inference watts, the operative question is no longer “who owns the most powerful chips” but “who can deliver the cheapest token at the moment of use.”
Sanctions constrain the input of the last war while China competes on the output of the next one. Read in this light, AMD CEO Lisa Su’s testimony that competitiveness “actually requires excellence at every layer of the stack” is revealing. The United States has fortified a single layer – silicon – while leaving four others comparatively exposed: energy, models, inference infrastructure, and ecosystem integration.
Chris Clark’s observation that the heaviest consumers of Chinese tokens in agentic workflows are American firms themselves warrants its own strategic treatment. When U.S. SaaS companies route agentic calls through MiniMax or GLM-5, they are not merely procuring a cheaper input; they are embedding Chinese inference into the productivity layer of the American economy. This is structurally distinct from, and more intimate than, solar panel dependency. Tokens carry cognitive function. They get fine-tuned against enterprise workflows, accumulate institutional muscle memory, and reshape the surrounding software stack around their own quirks and capabilities. The switching cost is not the price of a new supplier but the cost of retraining an entire operational architecture. Critically, because most leading Chinese models are open-weight, the dependency cannot be cleanly severed through API-level restrictions: U.S. firms can self-host the same models on domestic hardware, preserving the cognitive dependency while erasing the regulatory handle. The countryside has not merely surrounded the cities; the cities have invited it inside the gates.
If China’s offensive operates across three layers – algorithm, energy, market – then a coherent American response must answer at each.
The single highest-leverage move available to Washington is grid and generation buildout. Nuclear restart, transmission-permitting reform, and behind-the-meter generation for data centers no longer belong to climate policy or energy policy as separate categories – they are AI policy. The political contests surrounding the Inflation Reduction Act and federal permitting reform should be reframed in precisely this light. Without cheap electricity, no chip advantage survives the inference era.
American policy and capital allocation remain disproportionately oriented toward frontier model training. The competitive battlefield, however, has already shifted to inference throughput: tokens per watt, not parameters per model. Federal R&D priorities, Department of Energy compute allocations, and DARPA programs require rebalancing toward inference-stack efficiency – better serving frameworks, sparser architectures, and specialized inference silicon. China’s algorithmic-efficiency lead is not insurmountable, but only if Washington first acknowledges that this is where the current contest is being fought.
Walling American firms off from Chinese tokens would raise U.S. productivity costs without bending the trajectory of the global market. A more durable response is a positive one: coordinate with the European Union, Japan, India, and Gulf states on a “trusted token” tier – interoperability standards, provenance verification, and procurement preferences for democratic-aligned inference providers in regulated sectors such as finance, healthcare, defense, and government. This converts the contest from one of price – which the United States will lose – into one of trust and standards, where it remains positioned to win.
Conclusion
The chip war was not a strategic error. When the United States homed in on semiconductors as its strategic advantage, AI was dominated by training, capability was defined by frontier models, and speed was decided by H100 cluster scale. The architects behind the export control policy made the judgment any serious analyst of that period would have made.
The problem now is that the AI landscape has shifted while the American strategic map has not. The unit of competition has moved from parameters to tokens, the binding constraint has changed from FLOPs to watts, and market share is now decided by deployment economics rather than frontier capability. Yet the United States is still allocating capital and political will based on how the industry looked in 2022.
This is the deepest strategic lesson the token economy offers: in eras of technological transition, the gravest danger is not the adversary’s advantage but one’s own outdated assumption about where the contest is being fought. In the next decade of AI competition, victory will belong not to the country with the most advanced chips, but to the country that first recognizes how the industry itself is evolving.

