Inference Chips - AI Infrastructure Intelligence Search

TSMC Other 2026-06-17

TSMC Publicly Discloses CoWoS Glass Substrate Development Plan for the First Time

...

Google Cloud Other 2026-06-15

Google TPU 8th Gen Splits Training and Inference Chips, Inflection Point in AI Infra TCO

Google Cloud unveils its 8th-gen TPU with separate training (TPU8t) and inference (TPU8i) chips, delivering 3x training pod performance and an 80% improvement in inference performance per dollar. Vertex AI evolves into the Gemini Enterprise Agent Platform, while the Smals sovereign cloud contract validates public sector AI adoption under strict compliance requirements.

Qualcomm Other 2026-06-14

Qualcomm AI200 on AWS: Inference Chip Ecosystem Shifts from Nvidia Dominance to Multiple Alliances

Qualcomm's AI200 inference chip (768GB memory) is slated for broad AWS deployment by 2026, aiming to reduce cloud AI inference costs. This marks Qualcomm's strategic pivot from mobile to cloud, leveraging AWS's custom silicon initiative to challenge Nvidia's inference monopoly and restructure the cloud inference chip ecosystem.

Microsoft Azure Product Launch 2026-06-03

Microsoft Maia 200 Mass-Produced, Cobalt 200 Previewed: AI Inference Control Shifts to Azure

At Build 2026, Microsoft announced mass production of its Maia 200 AI inference chips, a preview of its Cobalt 200 ARM processors, and the MAI-Thinking-1 reasoning model (35B params). This signals full-stack vertical integration intended to reduce NVIDIA dependency and lock in Azure AI workloads.

Amazon Other High Signal 2026-02-28

AWS Launches Inferentia2 Chip for Generative AI Infrastructure Optimization

AWS launched its second-gen Inferentia2 AI inference chip, designed for Transformer models, with a 4x performance boost and support for 175B-parameter models. It is integrated into EC2 Inf2 instances with the UltraClusters architecture for large-scale deployment, offering 40% better cost-performance and 50% lower power consumption than GPU instances.

NVIDIA Other (date unavailable)

NVIDIA Licenses Groq LPU Technology: Inference Architecture Shift from HBM to On-Chip SRAM

NVIDIA signs a ~$20B licensing deal with Groq for its LPU technology, which features 230MB of on-chip SRAM at 80TB/s bandwidth. The design targets the decode phase of Transformer inference, replacing the HBM bottleneck with ultra-low-latency on-chip storage, and could reshape the AI inference chip landscape.
