Reports
AI-generated structured vendor updates
NVIDIA Acquires Groq LPU: Inference Architecture Shift from HBM to On-Chip SRAM
NVIDIA signs a ~$20B licensing deal with Groq for its LPU technology, which features 230MB of on-chip SRAM at 80TB/s bandwidth. The design targets the decode phase of Transformer inference, replacing the HBM bottleneck with ultra-low-latency on-chip storage, and could reshape the AI inference chip landscape.
NVIDIA Tops Data Center Ethernet Market: GPU Compute Dictates Network Architecture
IDC reports that NVIDIA captured 21.5% of the data center Ethernet switch market in Q1 2026, with $2.1B in revenue. The milestone, driven by the Spectrum-X platform using RoCE and NVLink, marks a shift in control: GPU compute now dictates network architecture, directly challenging Cisco and Arista.
SK Hynix HBM4E Samples: 3nm Logic, 384GB/GPU, Igniting AI Memory Bandwidth Arms Race
SK Hynix has sampled its 12-layer HBM4E, featuring a TSMC 3nm logic die and enhanced per-pin bandwidth, targeting NVIDIA Rubin Ultra with 384GB per GPU. This marks the start of a sprint against Samsung in next-gen AI memory, where HBM BOM share has surged to 65-70%.
NVIDIA Absorbs Groq LPU: Feynman GPU to Integrate SRAM Inference Tile, Hybrid Architecture by 2028
NVIDIA secures Groq's LPU inference technology via a non-exclusive license and key hires, and plans to integrate large SRAM tiles into its 2028 Feynman GPU using TSMC SoIC hybrid bonding. The integration enables deterministic scheduling and 80TB/s on-chip bandwidth, shifting NVIDIA from a pure GPU vendor to a hybrid inference/training platform.
Z.ai GLM-5.2 Open-Source: 744B MoE, 1M Context, MIT License as Geopolitical Shield
Z.ai releases GLM-5.2: a 744B MoE with 40B activated parameters, 1M input and 131K output context, under the MIT license. Released one day after Anthropic Fable 5's government takedown, it offers a downloadable, unbannable alternative with Anthropic API compatibility for zero-code migration, giving enterprises a sovereign AI option.
SGLang 0.5.13: Two-Stage MoE Routing Prefetch & Sparse KV Cache Deliver 25x Inference Speedup
SGLang 0.5.13 introduces MoE-specific two-stage routing prefetch (a lightweight proxy network preloads the top-k expert weights) and a sparse KV cache (grouped by activation path), achieving a 25x inference speedup on NVIDIA GB300 NVL72. On A100, throughput rises 65%, latency falls 40%, memory use falls 10%, and routing overhead falls 62%, outperforming vLLM.
Google TurboQuant: 6x KV Cache Compression Sends Memory Stocks Lower, Signaling an Inflection Point in AI Inference Efficiency
...