Reports
AI-generated structured vendor updates
Intel at Computex 2026: CPU as Agentic AI Orchestrator, x86 Reclaims Inference Control
At Computex 2026, Intel unveiled the 288-core Xeon 6+ (Intel 18A) and 3rd-gen Core Ultra, claiming that agentic AI shifts the CPU-to-GPU ratio from 1:8 to 1:1. Partnering with SambaNova and Foxconn on rack-scale inference systems, Intel is repositioning the CPU as the orchestrator for multi-step AI reasoning, aiming to reclaim control from GPU-centric architectures.
Qualcomm Dragonfly: 250-core CPU, HBC memory, UALink interconnects target AI inference TCO
Qualcomm unveils a full data center portfolio: the Dragonfly C1000, a 250-core Oryon CPU (>5GHz, PCIe Gen7, CXL); HBC near-memory compute (133TB/s in Gen1, 18x-54x effective bandwidth); the AI300 inference accelerator (UALink/ESUN scale-up); and 800G/1.6T connectivity. The company also announced a multi-year CPU deal with Meta. Commercial sampling is slated for 2027-2028. The portfolio targets inference TCO, with a claim of tokens-per-watt leadership.
Cisco Launches AI Troubleshooting Agent for Industrial Networks, Shifting Control Plane
Cisco launches AI Troubleshooting for Industrial Networks, an ambient agent on Cisco Cloud Control. It monitors switch syslogs, uses deterministic logic to diagnose physical and network faults, and provides OT technicians with actionable fix steps, aiming to reduce MTTD and MTTR by minimizing escalations to network experts.
OpenAI and Broadcom Unveil Jalapeno Inference ASIC, Reshaping AI Hardware Landscape
OpenAI, in collaboration with Broadcom, has developed Jalapeno, a custom LLM inference accelerator. The chip uses a multi-chip module with HBM3E memory and achieved tape-out in just nine months. Designed for OpenAI's model stack, it aims to reduce inference costs and dependency on NVIDIA GPUs, with initial deployment planned for late 2026.
TSMC Hikes Advanced Node Prices 5-10%, Squeezing AI Chip Margins
TSMC has informed clients of 5-10% price hikes across all advanced nodes (7nm and below), which account for 74% of its wafer revenue. Apple, NVIDIA, AMD, and others face higher costs, potentially raising AI infrastructure prices.
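As a rough sense of scale, the sketch below computes the blended wafer-revenue uplift implied by the two figures above. It assumes (this is not stated in the report) that volumes and product mix stay constant and that the hike applies uniformly to the whole advanced-node share; the function name is illustrative.

```python
def blended_uplift(advanced_share: float, hike: float) -> float:
    """Overall wafer-revenue increase if only the advanced-node share is repriced.

    Assumes constant volume and mix, which the report does not claim.
    """
    return advanced_share * hike


ADVANCED_SHARE = 0.74  # advanced-node share of wafer revenue, per the report

low = blended_uplift(ADVANCED_SHARE, 0.05)   # 5% hike
high = blended_uplift(ADVANCED_SHARE, 0.10)  # 10% hike
print(f"Blended uplift: {low:.1%} to {high:.1%}")
```

Under those assumptions the company-wide effect is smaller than the headline range, because roughly a quarter of wafer revenue is not repriced.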
SK hynix Files for $29B Nasdaq IPO to Fund AI Memory Fabs and EUV Tools
SK hynix files for a $29.4B ADR listing on Nasdaq, with proceeds earmarked for its Yongin fab, Cheongju HBM packaging plant, and ASML EUV scanners. New capacity won't arrive until 2027, but the move solidifies its 57% HBM market share and locks in critical EUV supply.
Huawei and Hubei Mobile Validate AI Inference Acceleration: External KV Cache Boosts Throughput up to 372%
Huawei and Hubei Mobile completed the first telecom-operator AI inference acceleration trial, using OceanStor A800 storage and an Ascend A3 supernode with UCM to externalize the KV cache to petabyte-scale storage. The setup achieved up to a 372% TPS improvement for long-context inference on the GLM-5.1 and MiniMax M2.5 models.
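The general idea behind externalizing a KV cache can be sketched as a two-tier store: entries evicted from a small hot tier are spilled to a larger cold tier rather than discarded, so a returning prefix is reloaded instead of recomputed. The toy below is only an illustration of that idea. It is not UCM's API or Huawei's design; the class name, `get` method, and `fake_prefill` helper are hypothetical, and plain dictionaries stand in for accelerator memory and external storage.

```python
from collections import OrderedDict
from typing import Callable, Dict, Tuple

Prefix = Tuple[int, ...]  # a token-id prefix used as the cache key


class TieredKVCache:
    """Toy two-tier cache: a small hot tier backed by a large cold tier."""

    def __init__(self, hot_capacity: int) -> None:
        self.hot: "OrderedDict[Prefix, bytes]" = OrderedDict()  # stand-in for device memory
        self.cold: Dict[Prefix, bytes] = {}                     # stand-in for external storage
        self.hot_capacity = hot_capacity
        self.prefills = 0  # how many times KV data had to be recomputed

    def _put_hot(self, key: Prefix, value: bytes) -> None:
        self.hot[key] = value
        self.hot.move_to_end(key)
        # Spill least-recently-used entries to the cold tier instead of dropping them.
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value

    def get(self, prefix: Prefix, prefill: Callable[[Prefix], bytes]) -> bytes:
        if prefix in self.hot:
            self.hot.move_to_end(prefix)
            return self.hot[prefix]
        if prefix in self.cold:
            value = self.cold.pop(prefix)  # reload from the cold tier, no recompute
        else:
            self.prefills += 1
            value = prefill(prefix)        # expensive path: compute KV from scratch
        self._put_hot(prefix, value)
        return value


def fake_prefill(prefix: Prefix) -> bytes:
    """Hypothetical stand-in for the model's prefill step."""
    return bytes(len(prefix))


cache = TieredKVCache(hot_capacity=1)
a, b = (1, 2, 3), (4, 5)
for prefix in (a, b, a):
    cache.get(prefix, fake_prefill)
print(cache.prefills)
```

With a hot tier of one entry, the second request for `a` is served from the cold tier, so only two prefills run for three requests. A real system would also have to weigh storage read latency against recompute cost, which this sketch ignores.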
Microsoft Embeds AI Agents into Cloud Ops: Azure Copilot and AKS on Bare Metal Reshape Control Plane
At Build 2026, Microsoft announced general availability of Azure Copilot Observability Agent, alongside AKS on Bare Metal, Managed System Node Pools, and Fleet Manager. Together they fold observability, orchestration, and capacity planning into an agent-driven closed-loop system, backed by 2GW of dedicated energy for AI scaling.
NVIDIA and AWS Make cuVS the Default for GPU Vector Search, G7 Instances Deliver up to 4.6x Inference
NVIDIA and AWS collaborate to embed cuVS as default GPU-accelerated vector search in OpenSearch Serverless, delivering 10x faster indexing at 1/4 cost. New EC2 G7 instances with RTX PRO 4500 Blackwell GPUs achieve up to 4.6x inference performance. AWS achieves GB300 Exemplar Cloud status for training.
China's LineShine Tops TOP500: CPU-Only 2.2 ExaFLOPS with ARMv9 and HBM Memory
LineShine supercomputer achieves 2.198 ExaFLOPS FP64 sustained using 13.79 million ARMv9 cores across 20,480 nodes, making it the first system to exceed 2 ExaFLOPS without GPUs. Each node has dual LX2 CPUs (304 cores) with 32GB HBM, demonstrating a CPU+HBM architecture breakthrough for HPC.
Microsoft Launches Azure Copilot Observability Agent to Lock Ops Control Plane
Microsoft announces GA of Azure Copilot Observability Agent, built on Azure Monitor. It correlates signals across agents, apps, infrastructure, and services to provide unified operational context. This move aims to lock AI-driven incident diagnosis and remediation workflows deeply within the Azure ecosystem.
NVIDIA Unveils 45°C Liquid Cooling for Rubin Chips, Eliminates Evaporative Water Use
NVIDIA announces a liquid cooling system for its Rubin GPUs that runs on 45°C coolant (hotter than a hot tub), using dry coolers in a closed loop to cut electricity use and eliminate water evaporation (a 100% reduction). However, chillers may still be needed in hot climates, and the impact on chip longevity remains unaddressed.
NVIDIA Launches Agent Toolkit: Nemotron Models, OpenShell Runtime for Specialized AI Agents
NVIDIA unveils Agent Toolkit, an open modular foundation with Nemotron models, NemoClaw blueprints, and OpenShell runtime, enabling enterprises to build secure, specialized AI agents. It targets life sciences, cybersecurity, and industrial workflows, aiming to turn frontier models into domain-specific digital coworkers.
NVIDIA Vera Rubin NVL4: CPU-GPU Fusion Locks Supercomputing Architecture
NVIDIA announces the Vera Rubin NVL4 supercomputing platform, integrating the Rubin GPU and Vera CPU via NVLink and InfiniBand for end-to-end acceleration, delivering over 7 exaflops of AI compute. The Arm-based Vera CPU marks a strategic deepening of NVIDIA's position in data center CPUs, with availability expected in Q4 2026.
Arm Server Share Hits 45%: NVIDIA's Bundling Strategy Reshapes AI Infrastructure
IDC data shows Arm-based servers now hold over 45% of the global server market, driven by NVIDIA's bundling of its Arm-based Vera CPU with GPU systems like NVL72 and Rubin. x86 share shrinks to 52%, while accelerated systems contribute over 70% of revenue. ODM direct sales account for 50.2%, with Dell revenue growing 244.1% YoY.
TSMC Bets on CoPoS and Glass Substrates: Packaging Paradigm Shifts from Wafer-Level to Panel-Level, AI Chip TCO Inflection
TSMC is replacing CoWoS with CoPoS (panel-level packaging), using 750x620mm rectangular panels and glass core substrates to achieve a 20-30% reduction in cost per unit area. Volume production targets 2028, with AMD Zen 7 as the first key customer. The shift fundamentally alters AI chip packaging economics and capacity scaling.
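To put the panel size in perspective, the sketch below compares the gross area of one 750x620mm panel with that of a round wafer. The 300mm wafer baseline is an assumption added here, not a figure from the report, and the comparison ignores edge exclusion and how well rectangular packages tile each shape.

```python
import math

PANEL_W_MM, PANEL_H_MM = 750, 620  # panel dimensions from the report
WAFER_DIAMETER_MM = 300            # assumption: standard 300mm wafer as the baseline

panel_area = PANEL_W_MM * PANEL_H_MM                  # rectangle: width * height
wafer_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2   # circle: pi * r^2
ratio = panel_area / wafer_area

print(f"Panel: {panel_area} mm^2, wafer: {wafer_area:.0f} mm^2, ratio: {ratio:.1f}x")
```

On gross area alone a single panel covers roughly six and a half wafers' worth of surface. The usable gain depends on package size and yield, which the report does not quantify.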
MediaTek Lands Exclusive Google TPU v9 Inference Upgrade Triggerfish with 2-3x SRAM
Google plans a TPU v9 inference upgrade, Triggerfish, awarded exclusively to MediaTek. It features 2-3x the on-chip SRAM, HBM4E DRAM, and a simulation die for local management. Production starts in late 2027, with a lifetime volume of 1-2M units and a unit price about 30% higher than Humufish.
Google TPU v9 Switches to MediaTek, Breaking Broadcom's AI ASIC Monopoly
Google moves its TPU v9 Humufish design and integration contract from Broadcom to MediaTek, which handles I/O chip design and packaging. Combined with a split-foundry strategy (TSMC N2 compute, Samsung 2nm I/O), this marks a systematic effort to build a multi-vendor, multi-node supply chain, directly dismantling Broadcom's dominance in custom AI ASICs.
OpenAI GPT-5.6: 1.5M Context Window, Digital Employee Push, Price War on Anthropic
OpenAI is launching GPT-5.6 with a 1.5M token context window, 10-15% token efficiency improvement, and pricing at 1/3 of Claude Fable 5. The model pivots to digital employee roles via agentic workflows, code generation, and Playwright automation, directly targeting Anthropic's stalled Fable 5 user base.
Intel AI Box Ultra Hits the Road: PC-class Compute Enters Car, Locks Down Edge AI Ecosystem
Intel and Changan Auto launch the AI Box Ultra solution based on the Core Ultra platform, bringing PC-class compute and an Android app ecosystem to the cockpit. It emphasizes on-device AI inference, privacy, and offline capability. The move targets Qualcomm and NVIDIA but glosses over x86 power and thermal drawbacks.