Inference Optimization - AI Infrastructure Intelligence Search

OpenAI Other 2026-06-26

OpenAI and Broadcom Tape Out First Inference ASIC Jalapeño in 9 Months, Targeting NVIDIA Dominance

OpenAI and Broadcom unveil Jalapeño, their first custom inference ASIC, fabricated on TSMC 3nm and optimized for Transformer models. Targeting a 50% inference cost reduction, it taped out in 9 months and is slated for deployment in gigawatt-scale data centers by late 2026, marking OpenAI's strategic pivot to full-stack AI infrastructure and a direct challenge to NVIDIA's inference hegemony.

MediaTek Other 2026-06-23

MediaTek Lands Exclusive Google TPU v9 Inference Upgrade Triggerfish with 2x SRAM

Google plans a TPU v9 inference upgrade, Triggerfish, exclusively fabbed by MediaTek. It features 2-3x on-chip SRAM, HBM4E DRAM, and a simulation die for local management. Production starts in late 2027, with a lifecycle volume of 1-2M units and a unit price about 30% higher than Humufish.

Microsoft Azure Other 2026-06-22

Google unveils 8th-gen TPU: 3x training speed, 3x SRAM for inference, redefines AI compute TCO

At Cloud Next 2026, Google launched its 8th-gen TPU in two variants: TPU 8t for training (9600 chips per pod, 2PB shared memory) and TPU 8i for inference (1152 chips per pod, 3x on-chip SRAM). It also announced the Gemini Enterprise Agent Platform, N4 Axion ARM instances (2x price-performance vs x86), and AI-driven security with Wiz.

Google Other 2026-06-17

Google Rolls Out Multiple AI Features for Android 17 in Phases

...

NVIDIA Other 2026-05-25

NVIDIA Vera CPU Threatens x86: 1.5x Performance, 4x Density, Full-Stack AI Lock-In

Rumors indicate NVIDIA will unveil its first general-purpose CPU, Vera, at Computex 2026, claiming 1.5x x86 performance, 2x throughput, and 4x rack density. Shipment targets are 1.2M units in FY2027 and 4.2M in FY2028. Vera targets the shift in AI inference from a 1:8 to a 1:1 CPU-to-GPU ratio, complementing Grace to create a full GPU+CPU stack.

Meta Other High Signal 2026-03-11

Meta Accelerates Custom AI Chip Roadmap with Focus on Inference Optimization

Meta plans to launch four generations of MTIA AI chips in two years, adopting an 'inference-first' design strategy optimized for generative AI tasks. Built on PyTorch and open standards, the chips enable seamless data center deployment, targeting improved compute efficiency and cost control.

AMD Other Medium Signal 2026-03-02

AMD Launches ROCm AI Developer Hub to Strengthen Software Ecosystem

AMD introduces the ROCm AI Developer Hub, offering centralized software tools and resources for AI model training and inference optimization on AMD GPUs. The platform streamlines development through documentation, tools, and best practices, enhancing efficiency from development to deployment.
