Inference - AI Infrastructure Intelligence Search

NVIDIA Other

NVIDIA Acquires Groq LPU: Inference Architecture Shift from HBM to On-Chip SRAM

NVIDIA signs a ~$20B licensing deal with Groq for its LPU technology, which features 230MB of on-chip SRAM at 80TB/s bandwidth. The design targets the decode phase of Transformer inference, replacing the HBM bandwidth bottleneck with ultra-low-latency on-chip storage, and could reshape the AI inference chip landscape.
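A rough back-of-envelope sketch of why on-chip bandwidth matters for decode: it computes how long it takes to stream a block of data once at a given bandwidth. The 230MB and 80TB/s figures come from the summary above; the 8TB/s HBM figure is purely an illustrative assumption for comparison and does not appear in the report.

```python
# Back-of-envelope: time to stream a block of data once at a given bandwidth.
# 230 MB and 80 TB/s are taken from the summary; the HBM bandwidth below is
# an ASSUMED, illustrative comparison point, not a figure from the report.

MB = 10**6
TB = 10**12


def stream_time_us(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Microseconds needed to read num_bytes at the given bandwidth."""
    return num_bytes / bandwidth_bytes_per_s * 1e6


sram_bytes = 230 * MB
sram_bw = 80 * TB
assumed_hbm_bw = 8 * TB  # hypothetical value, for comparison only

sram_us = stream_time_us(sram_bytes, sram_bw)
hbm_us = stream_time_us(sram_bytes, assumed_hbm_bw)

print(f"on-chip SRAM: {sram_us:.3f} us per full pass")
print(f"assumed HBM:  {hbm_us:.3f} us per full pass")
```

Under these assumptions the full 230MB can be read in under 3 microseconds, which is the kind of margin that matters when each decoded token requires re-reading the resident weights.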

NVIDIA Other

NVIDIA Tops Data Center Ethernet Market: GPU Compute Dictates Network Architecture

IDC reports that NVIDIA captured 21.5% of the data center Ethernet switch market in Q1 2026, with $2.1B in revenue. The milestone, driven by the Spectrum-X platform using RoCE and NVLink, marks a shift in control in which GPU compute dictates network architecture, directly challenging Cisco and Arista.

Samsung Electronics Other

SK Hynix HBM4E Samples: 3nm Logic, 384GB/GPU, Igniting AI Memory Bandwidth Arms Race

SK Hynix has sampled its 12-layer HBM4E, featuring a TSMC 3nm logic die and higher per-pin bandwidth, targeting NVIDIA Rubin Ultra with 384GB per GPU. This marks the start of a sprint with Samsung in next-gen AI memory, where HBM BOM share has surged to 65-70%.

NVIDIA Other

NVIDIA Absorbs Groq LPU: Feynman GPU to Integrate SRAM Inference Tile, Hybrid Architecture by 2028

NVIDIA secures Groq's LPU inference technology via a non-exclusive license and key hires, planning to integrate large SRAM tiles into its 2028 Feynman GPU using TSMC SoIC hybrid bonding. This enables deterministic scheduling and 80TB/s on-chip bandwidth, shifting NVIDIA from a pure GPU vendor to a hybrid inference/training platform.

Research Other

Z.ai GLM-5.2 Open-Source: 744B MoE, 1M Context, MIT License as Geopolitical Shield

Z.ai releases GLM-5.2: a 744B MoE model with 40B activated parameters, 1M input and 131K output context, under the MIT license. Released one day after Anthropic Fable 5's government takedown, it offers a downloadable, unbannable alternative with Anthropic API compatibility for zero-code migration, giving enterprises a sovereign AI option.

NVIDIA Other

SGLang 0.5.13: Two-Stage MoE Routing Prefetch & Sparse KV Cache Deliver 25x Inference Speedup

SGLang 0.5.13 introduces MoE-specific two-stage routing prefetch (a lightweight proxy network preloads the predicted top-k expert weights) and a sparse KV cache (grouped by activation path), achieving a 25x inference speedup on NVIDIA GB300 NVL72. On A100, it reports throughput up 65%, latency down 40%, memory use down 10%, and routing overhead down 62%, outperforming vLLM.
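The two-stage routing prefetch idea described above can be sketched in a few lines. This is a minimal illustration of the general technique, not SGLang's implementation or API: `ExpertCache`, `route_with_prefetch`, and the score lists are hypothetical names, and the prefetch here is synchronous, whereas a real system would presumably overlap it with other work.

```python
import heapq


def top_k(scores, k):
    """Indices of the k largest scores, highest first."""
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])


class ExpertCache:
    """Hypothetical cache of expert weights resident in fast memory."""

    def __init__(self, load_fn):
        self.load_fn = load_fn  # loads one expert's weights by index
        self.resident = {}
        self.loads = 0

    def get(self, expert_id):
        # Load on demand if the expert is not already resident.
        if expert_id not in self.resident:
            self.resident[expert_id] = self.load_fn(expert_id)
            self.loads += 1
        return self.resident[expert_id]

    def prefetch(self, expert_ids):
        for expert_id in expert_ids:
            self.get(expert_id)


def route_with_prefetch(proxy_scores, router_scores, k, cache):
    # Stage 1: a cheap proxy predicts the top-k experts, which are preloaded.
    predicted = top_k(proxy_scores, k)
    cache.prefetch(predicted)
    # Stage 2: the real router makes the final choice; mispredicted experts
    # are recorded as misses and loaded on demand.
    chosen = top_k(router_scores, k)
    misses = [i for i in chosen if i not in cache.resident]
    weights = [cache.get(i) for i in chosen]
    return chosen, misses, weights


cache = ExpertCache(load_fn=lambda i: f"weights-{i}")
chosen, misses, weights = route_with_prefetch(
    proxy_scores=[0.1, 0.9, 0.8, 0.2],
    router_scores=[0.2, 0.7, 0.1, 0.9],
    k=2,
    cache=cache,
)
print(chosen, misses, cache.loads)
```

In this toy run the proxy correctly predicts one of the two experts the router finally selects, so only one load falls on the critical path. The benefit of the technique depends on how often the proxy's prediction matches the real router.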

Google Other

Google TurboQuant: 6x KV Cache Compression, Memory Stocks Fall in Response, Signaling an AI Inference Efficiency Inflection Point

...
