Analysis for informational purposes only. Capital at risk.
The Long Tail Wins: By delivering 80–90% of Western flagship capabilities at just 20% of the cost, Chinese AI models are commoditizing the API spot market. On leading aggregator platforms, Chinese models surged from a 10% token volume share in January to 36% by late April 2026, capturing the price-sensitive compute of developers, SMEs, and startups.
The Enterprise Walls: Despite spot market success, geopolitical and compliance friction prevent Chinese LLMs from penetrating Western blue-chip enterprises. Additionally, Chinese providers currently lack the “harness” depth—the sophisticated workflow orchestration, safety guardrails, and tool integration—required to rival products like Gemini and Claude Code.
The Energy Edge: As AI workloads shift from GPU-heavy training toward energy-intensive inference, power supply is becoming the new compute bottleneck. China added roughly 8x the generation capacity of the US in 2025, providing a structural scale advantage as inference demand accelerates.
The AI Market: Three Tiers, One Battleground
The global AI market divides into three distribution tiers.
Direct enterprise APIs and cloud wrappers, such as Azure, AWS, and Google, serve large organisations.
Product subscriptions such as ChatGPT Plus, Claude Code, and Gemini NotebookLM build loyalty through the harness: the interface, workflows, and tool integrations layered around the underlying model.
US providers hold commanding positions in both tiers.
The aggregator tier is structurally different.
Platforms like OpenRouter allow developers to route queries dynamically across hundreds of models in real time. No contracts. No lock-in.
This segment represents less than an estimated 10% of global token volume. But because it is the only segment that operates as a genuine spot market, it is where competitive pressure surfaces first.
The Spot Market: 10% to 36% Share in Four Months
According to OpenRouter, a leading aggregator, major Chinese LLMs, including DeepSeek, Alibaba (Qwen), Moonshot (Kimi), and MiniMax, collectively captured 36% of token volume on the platform in late April 2026, up from roughly 10% in early January.
Conversely, the combined share of major US hyperscalers dropped from 70% to 40% over the same period.


Weekly token volume on OpenRouter has surged more than 3x in three months. Developers are running production workloads through Chinese models at scale.

The mechanism is straightforward. Aggregators attract cost-sensitive developers and SMEs who use dynamic routing: simple, routine tasks go to the cheapest capable model; complex queries are reserved for premium US models.
Chinese LLMs have captured the bulk of that routine compute by delivering 80–90% of Western flagship performance at approximately 20% of the cost.
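The routing logic behind this shift can be sketched in a few lines. This is a minimal illustration of cost-aware routing, not OpenRouter's actual routing code; the model names, prices, and quality scores below are hypothetical placeholders chosen only to show the cost gap.

```python
# Illustrative cost-aware router: send routine queries to the cheapest
# capable model, reserve premium models for complex tasks.
# Prices and quality scores are hypothetical placeholders, not live rates.
MODELS = {
    "deepseek/deepseek-chat":  {"usd_per_mtok": 0.50, "quality": 0.85},
    "anthropic/claude-sonnet": {"usd_per_mtok": 3.00, "quality": 1.00},
}

def route(required_quality: float) -> str:
    """Pick the cheapest model whose quality meets the task's bar."""
    capable = [(name, spec) for name, spec in MODELS.items()
               if spec["quality"] >= required_quality]
    return min(capable, key=lambda kv: kv[1]["usd_per_mtok"])[0]

# Routine task: 80-90% of flagship quality is enough, so the cheap model wins.
print(route(0.80))  # deepseek/deepseek-chat
# Frontier task: only the premium model clears the bar.
print(route(0.95))  # anthropic/claude-sonnet
```

Under these assumed prices, the routine query costs roughly a sixth as much as the premium one, which is the arbitrage driving volume toward the cheaper models.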


The EV Playbook — and Its Second Chapter
This pricing strategy has a clear precedent.
Chinese electric vehicle manufacturers did not enter global markets by competing with BMW at the premium end. They commoditised good-enough transport: adequate range, functional design, accessible price. Volume followed. Then came the upmarket move — BYD now competes directly with European peers on performance, not just price.
Chinese AI is running the same sequence. Phase one is commoditisation of routine inference: capture the long tail of developer workloads that do not require frontier model capability. That phase is underway.
Phase two, moving upmarket into enterprise and application layers, is where the constraints currently exist.
Geopolitics and the Harness Gap
Above the aggregator tier, Chinese LLMs face two barriers in international markets.
Compliance Wall: Large Western enterprises hesitate to deploy Chinese LLMs over compliance and data-residency concerns. Chinese providers mitigate this by releasing open weights, letting Western corporations host the models behind their own firewalls. Enterprises in emerging markets such as the Middle East and Southeast Asia may also face less friction than their counterparts in Europe and the US.
The Harness Gap: Success at the application and subscription tier depends not on model quality alone but on the surrounding architecture: the UI, safety guardrails, tool integrations, agent orchestration, and memory systems layered around the model.
For example, Claude Code outperforms a standard OpenClaw setup, even if OpenClaw uses the same underlying Claude model, due to the sophistication of Anthropic’s harness.
The same logic applies to NotebookLM, GitHub Copilot, and ChatGPT’s ecosystem. Western providers have years of harness investment embedded in products that Chinese developers have not yet matched.
QClaw is Tencent’s attempt to narrow that gap. Built on the OpenClaw framework, it adds enterprise security features, one-click installation, and pre-configured agent workflows for marketing automation, trip planning, and tax filing. Tencent is currently piloting it for overseas deployment. It is not yet close to Claude Code or NotebookLM, but it signals that the upmarket move has begun.
The Blueprint Leak: Anthropic’s Harness Moat Got Mapped
In late March 2026, Anthropic accidentally published the entire 512,000-line source code of Claude Code. The exposure included proprietary multi-agent orchestration protocols, memory consolidation architecture, and tool-calling subsystems: the key success factors behind its flagship product.
Competitors can now study the design behind the most commercially successful agentic harness in the market, potentially levelling the playing field.
The Energy Edge
Until recently, the binding AI supply constraint was training: GPU-heavy model development consumes massive compute. That constraint disadvantages Chinese developers because of export controls on advanced Nvidia chips.
The workload is now shifting. As autonomous agent deployments scale, inference dominates total AI compute. Inference is a different problem, relying more on memory bandwidth, network latency, and power supply. That shift moves the battleground toward AI infrastructure capacity.

China’s power infrastructure is built for this. In 2025, China added an estimated 540 GW of new generation capacity, approximately 8.5 times total US additions. That surplus translates directly into continuous, low-cost power for inference data centres at scale.
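As a sanity check on those figures (both are the article's estimates, not independent data), roughly 540 GW at approximately 8.5 times the US total implies US additions of about 64 GW:

```python
# Back-of-envelope check on the capacity figures cited above.
# Both inputs are the article's estimates, not independent data.
china_additions_gw = 540     # estimated Chinese generation additions, 2025
china_to_us_ratio = 8.5      # "approximately 8.5 times total US additions"
implied_us_additions_gw = china_additions_gw / china_to_us_ratio
print(round(implied_us_additions_gw))  # ~64 GW
```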
US and European hyperscalers face the opposite dynamic: protracted grid interconnection queues, transformer shortages, and planning constraints that create meaningful lag between capital commitment and operational capacity.
This article is a “periodical publication” for information only and is not investment advice or a solicitation to buy or sell securities. This article does not constitute a “personal recommendation” or “investment advice” under UK FCA regulations. Investing in equities involves significant risk. The author holds NO position in the securities mentioned. There is no warranty as to completeness or correctness. Please do your own due diligence or consult a licensed financial adviser. Please read the Full Disclaimer before acting on any information. Images created with the assistance of Gemini AI.
