NVIDIA 2026-06-25

Vendor Strategy | Impact: Major | Confidence: 85%

NVIDIA Unveils Vera CPU for AI Agents, Shifting Control from x86 to Proprietary Silicon

Summary

At NVIDIA's annual meeting, Huang announced the Vera CPU for AI agents, paired with the Rubin GPU; claimed that Blackwell delivers 30x the token throughput of the next-best platform; and reiterated that CUDA is a moat. The move aims to shift control of AI compute from general-purpose CPUs to NVIDIA's proprietary architecture.

Key Takeaways

Huang's keynote at NVIDIA's FY2026 annual meeting made three main points:
First, Blackwell is positioned as the 'king of inference', with a claimed 30x token throughput over the next-best platform. The test conditions and the comparison target (likely AMD MI300X or Intel Gaudi) were not disclosed. If verified, the claim could accelerate AI inference deployments, but it has not been independently benchmarked.
Second, the Vera Rubin platform is billed as 'the most important product launch'. The Vera CPU is purpose-built for AI agents and ultra-low latency, while the Rubin GPU handles reasoning. Huang stated that 'all previous CPUs were designed for humans', implying that traditional x86 CPUs (Intel/AMD) are inadequate for agent workloads and thereby carving out a new CPU market.
Third, the CUDA X library ecosystem is called the 'crown jewel'. It supports 7,000+ applications and is presented as an insurmountable moat against competitors, reinforcing NVIDIA's strategy of using software to lock users into its hardware.
Additionally, an export license for H200 sales to China has been granted but has not yet produced revenue, and physical AI was named as the next growth phase.

Why It Matters

Huang's speech reads as a manifesto for a control plane shift: moving control of AI compute from x86 CPUs (Intel/AMD) to NVIDIA's Vera CPU and CUDA ecosystem.

Who is being encircled? Intel's and AMD's server CPU businesses are attacked directly, and the open-standard challengers (AMD with HIP, Intel with oneAPI) are boxed in as well. By framing Vera as the 'CPU for AI', NVIDIA aims to convince enterprises that only Vera delivers the ultra-low latency agents require, which would lock in CPU procurement.
What assets are locked? The CUDA ecosystem is the chain. Once on Vera+Rubin, users are effectively committed to the full NVIDIA software stack (CUDA X, NVIDIA AI Enterprise), which makes later migration costly. The Vera CPU likely uses proprietary interconnects (e.g., NVLink-C2C), binding the data center further.
What physical limitations are hidden? The 30x token throughput claim likely rests on specific models (e.g., GPT-3 175B) run with FP8/INT4 quantization and compared against Hopper or competitors at the same precision. Real-world tail latency, power density (Blackwell TDP of 1000W+), and liquid cooling costs are downplayed. The immaturity of the Vera CPU's ecosystem (OS, compilers, libraries) also implies high software migration costs and supply chain lock-in.
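To make the point about undisclosed test conditions concrete, here is a minimal sketch of a like-for-like comparison. It assumes a throughput multiple is only meaningful when both runs share the same model, precision, and batch size, and it also normalizes by power draw. The class names, field names, and all numbers are hypothetical placeholders, not measurements or vendor data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Conditions a throughput number was measured under (hypothetical fields)."""
    model: str
    precision: str
    batch_size: int


@dataclass(frozen=True)
class RunResult:
    """One measured run on one platform (hypothetical structure)."""
    platform: str
    config: RunConfig
    tokens_per_second: float
    board_power_watts: float


def compare(candidate: RunResult, baseline: RunResult) -> dict:
    """Return raw and per-watt speedup, refusing mismatched configurations."""
    if candidate.config != baseline.config:
        raise ValueError("configs differ; the ratio would not be like-for-like")
    raw = candidate.tokens_per_second / baseline.tokens_per_second
    # Scale the raw ratio by the inverse power ratio to get tokens/s per watt.
    per_watt = raw * (baseline.board_power_watts / candidate.board_power_watts)
    return {"raw_speedup": raw, "per_watt_speedup": per_watt}


# Placeholder numbers purely for illustration, not measurements.
cfg = RunConfig(model="example-model", precision="fp8", batch_size=32)
a = RunResult("platform-a", cfg, tokens_per_second=3000.0, board_power_watts=1000.0)
b = RunResult("platform-b", cfg, tokens_per_second=1000.0, board_power_watts=500.0)
print(compare(a, b))  # {'raw_speedup': 3.0, 'per_watt_speedup': 1.5}
```

In this illustrative case a 3x raw advantage shrinks to 1.5x once power is taken into account, which is the kind of gap a headline multiple can hide.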

PRO Decision

[Vendors (Competitors)] AMD and Intel should quickly launch low-latency CPU+GPU combinations for AI agents, e.g., AMD's MI400 paired with Zen 5 over Infinity Fabric for unified memory, and should open up ROCm with CUDA migration tools. They should also jointly push OAM (OCP Accelerator Module) standards within OCP to break NVIDIA's interconnect monopoly.
[Enterprises (CIOs/Architects)] Run independent benchmarks of Blackwell's 30x claim immediately, demand the exact test configuration (model, precision, batch size), and compare tail latency against AMD MI300X and Intel Gaudi 3. Before adopting the Vera CPU, assess software porting costs: if existing CUDA code relies on cuDNN/TensorRT, migration may take months. Reserve at least 20% of the heterogeneous compute budget for non-NVIDIA platforms to preserve bargaining power and supply chain flexibility.
[Investors] The 'AI factory' narrative may inflate the stock price, while gross margins could suffer from the liquid cooling costs of high-power Blackwell systems. Monitor Vera CPU adoption: if enterprises delay over lock-in fears, growth may slow. Watch AMD's and Intel's AI CPU roadmaps (e.g., Intel Granite Rapids with AMX), which may win inference workloads at lower TCO.
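For the tail-latency comparison recommended to enterprises above, a minimal sketch of how p50 and p99 could be reported from per-request latency samples follows. It assumes the nearest-rank percentile definition, and the function names and synthetic samples are hypothetical; a real benchmark harness would collect the samples from actual inference requests on each platform under the same test configuration.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of data at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_report(latencies_ms: list[float]) -> dict:
    """Summarize request latencies with median, tail, and worst case."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
        "max_ms": max(latencies_ms),
    }


# Synthetic latencies (not measurements): 98 fast requests and two slow outliers.
samples = [20.0] * 98 + [400.0, 900.0]
print(latency_report(samples))  # {'p50_ms': 20.0, 'p99_ms': 400.0, 'max_ms': 900.0}
```

The median here looks excellent while the p99 is twenty times worse, which is why tail figures, not averages, are the ones to request from each vendor.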

Source: National Business Daily (每日经济新闻)
