Reports
AI-generated structured vendor updates
Huawei Pushes Token-Based Billing at MWC Shanghai 2026: Shifting Carrier Monetization from Bytes to AI Inference Value
At MWC Shanghai 2026, Huawei urged carriers to shift from byte-based to token-based billing for AI workloads, showcasing a 372% token throughput improvement in long-sequence inference via its AI Inference Acceleration Solution. It also highlighted the Upper-6 GHz band as critical for AI wearables requiring 20 Mbps uplink, aiming to reposition 5G-A networks as AI compute delivery infrastructure.
Huawei Unveils AI-Centric Network with Token Monetization, UCM Caching Breaks Long-Context Barriers
At MWC Shanghai 2026, Huawei unveiled an AI-native network architecture integrating service, network, and compute, shifting from traffic-centric to intelligence-centric operations. The Unified Cache Manager (UCM) extends KV cache to petabyte-scale external storage, achieving 372% token throughput gains on GLM-5.1 at 128K sequence lengths. Token monetization frameworks and agentic operations enable carriers to charge for AI inference capacity and personalize services.
Qualcomm Dragonfly: 250-core CPU, HBC memory, UALink interconnects target AI inference TCO
Qualcomm unveils full data center portfolio: Dragonfly C1000 250-core Oryon CPU (>5GHz, PCIe Gen7, CXL), HBC near-memory compute (133TB/s Gen1, 18x-54x effective BW), AI300 inference accelerator (UALink/ESUN scale-up), and 800G/1.6T connectivity. Multi-year Meta CPU deal. Commercial sampling 2027-2028. Targets inference TCO with tokens-per-watt leadership.
NVIDIA and AWS Default GPU Vector Search with cuVS, G7 Instances Deliver 4.6x Inference
NVIDIA and AWS collaborate to embed cuVS as default GPU-accelerated vector search in OpenSearch Serverless, delivering 10x faster indexing at 1/4 cost. New EC2 G7 instances with RTX PRO 4500 Blackwell GPUs achieve up to 4.6x inference performance. AWS achieves GB300 Exemplar Cloud status for training.
NVIDIA Unveils 45°C Liquid Cooling for Rubin Chips, Slashes Water Use 100%
NVIDIA announces a liquid cooling system for its Rubin GPUs running 45°C coolant (hotter than a hot tub), using dry coolers in a closed loop to cut electricity and eliminate water evaporation (100% reduction). However, chillers may still be needed in hot climates, and chip longevity impacts remain unaddressed.
Micron-Anthropic Deal Locks AI Memory Demand, But Stock Price Already Priced In
Micron signed a long-term supply contract with Anthropic covering HBM, DRAM, and SSDs, with joint analysis of memory subsystems for AI workloads. Micron also participated in Anthropic's Series H. This aims to transform memory from a commodity to an AI infrastructure asset, but the stock has already run up, requiring proof of sustained scarcity premium.
AMD MI430X GPU Delivers >200 TFLOPS Native FP64, Reshaping HPC-AI Convergence Baseline
AMD powers 4 of top 10 TOP500 supercomputers and previews MI430X GPU with >200 TFLOPS native FP64. This targets AI-for-science workloads, making double-precision compute a key metric for converged HPC-AI infrastructure, directly challenging NVIDIA and Intel.
NVIDIA's AI Agents and Digital Twins Reshape Telecom Network Control Plane
At DTW Ignite 2026, NVIDIA showcases its AI agent platform integrating NeMo synthetic data, NemoClaw secure runtime, OpenShell sandbox, and RTX PRO 6000-accelerated digital twins, aiming for autonomous telecom operations. Partners include SoftBank, Amdocs, NTT DATA, etc., moving from task automation to full autonomy.
AWS Lambda MicroVMs: Stateful Isolated Sandboxes via Firecracker Snapshots
AWS launches Lambda MicroVMs, leveraging Firecracker for VM-level isolation, near-instant launch/resume, and stateful execution. Users build images from Dockerfiles in S3, launch from pre-initialized snapshots, and suspend/resume automatically, enabling multi-tenant AI code sandboxes and interactive analytics.
Arm servers capture >45% data center revenue, x86 ecosystem under AI-driven assault
IDC reports Q1 2026 global server revenue hit a record $122.6B, with Arm-based servers capturing >45% share (x86 at 52%). Accelerated servers (GPU/ASIC/FPGA) generated >70% revenue. Nvidia's Grace CPU (NVL72) and hyperscaler custom Arm chips drive the shift; x86 still leads in unit volume but faces supply constraints.
Nvidia Vera Rubin CPU: 10-Wide Core Redefines CPU for Agentic Computing
At GTC Taipei 2026, Nvidia unveiled the Vera Rubin CPU with a custom 10-wide fetch/decode/execute pipeline, claiming world-leading IPC and bandwidth. Designed for agentic computing, it complements Nvidia GPUs. Nvidia also announced a partnership with Microsoft to reinvent the PC as a Personal AI and committed to returning 50% of free cash flow to shareholders.
Dell PowerEdge XE8812: Liquid-Cooled Density Trap with NVIDIA Vera Rubin NVL4
Dell launches PowerEdge XE8812 with NVIDIA Vera Rubin NVL4, delivering 144 GPUs per rack, 300kW+ power, and 100% direct liquid cooling. It offers a generational leap in memory and compute density for HPC and AI, but deeply locks users into Dell's PowerRack, iDRAC, and ORv3 ecosystem from chip to rack.
NVIDIA Rubin 100% Liquid Cooling at 45°C Slashes Cooling Energy 40%
NVIDIA Rubin generation achieves 100% liquid cooling with coolant up to 45°C, eliminating fans and cold aisles. The DSX reference design uses closed-loop dry coolers, reducing cooling energy ~40% and water consumption to near zero. Rack density triples, marking a fundamental shift in AI factory cooling.
AWS Seizes Agent Control Plane with MCP Gateway and AgentCore
AWS launches managed web search for Bedrock AgentCore, autonomous agents in Amazon Quick, subagent MicroVM orchestration with LangChain, and MCP Gateway, shifting enterprise AI agents from prototypes to governed infrastructure with cloud-native control planes and execution isolation.
Cisco Leverages NVIDIA Spectrum Silicon and Nexus One to Reshape AI Network Control Plane
Cisco launches N9100 switches with NVIDIA Spectrum-6/4 silicon, delivering 102.4T throughput. It also introduces Nexus One unified management plane spanning NX-OS and SONiC, and extends Hybrid Mesh Firewall to BlueField DPUs for AI workload security offload, aiming for a turnkey AI fabric control plane.
Google AI Studio Starter Tier: Pre-wired Serverless Stack Trades Control for Zero-Friction Deployment
Google introduces Starter Tier for AI Studio, a pre-wired stack of Cloud Run, Firestore, Cloud SQL for PostgreSQL, and Firebase Authentication, deployable without a payment method. It locks users to a single region, limited APIs, and shared quotas, but offers zero-downtime upgrade to full GCP, aiming to lower AI deployment barriers while deepening ecosystem lock-in.
AMD MEXT Acquisition Turns NAND Flash into DRAM-Class Memory, Halving AI Inference Cost
AMD acquires MEXT, whose technology makes cheap NAND flash behave like expensive DRAM, doubling to quadrupling usable memory capacity while halving costs. This targets inference and agentic AI memory bottlenecks. AMD also signs a 30MW AI compute deployment deal with Rackspace, rolling out from 2026 to 2028.
Tesco's £100M Lawsuit Exposes VMware Lock-In, Accelerates Enterprise Virtualization Exodus
Tesco sues Broadcom over a 237% price hike after VMware's perpetual license termination, covering ~40,000 workloads. The case undermines enterprise trust in software licensing and may trigger a mass migration to Nutanix, Red Hat OpenShift Virtualization, and Proxmox, reshaping the virtualization ecosystem.
NVIDIA & Coherent Expand 6-Inch InP Fab, Locking AI Optical Interconnect Supply Chain
Coherent breaks ground on the world's first 6-inch indium phosphide fab in Texas, backed by $2B from NVIDIA and multi-billion purchase commitments. The facility produces lasers, transceivers, and pluggable optics for silicon photonics interconnects, enabling NVIDIA's Vera Rubin Ultra NVL576 576-GPU clusters and signaling a mass shift from copper to optical backbones in AI data centers.
Huawei's LogicFolding: 3D Stacking Rewrites AI Chip Rules
Huawei's Tau Scaling Law and LogicFolding architecture boost transistor density by 55% and power efficiency by 41% via vertical logic stacking, targeting 1.4nm-class by 2031. Ascend 920/910C chips are now used for DeepSeek V4-Pro post-training, signaling real-world AI workload deployment and challenging Nvidia's dominance in China.