AI Training - AI Infrastructure Intelligence Search

Google Cloud Other 2026-06-21

Google Trillium TPU: 4.7x Training Boost Masks Vendor Lock-in and Ecosystem Risks

Google Cloud unveils its 6th-gen TPU, Trillium, built on a 3nm process and delivering 4.7x training and 2.5x inference performance gains, with 2x the energy efficiency of NVIDIA H100. However, Trillium is exclusive to Google Cloud TPU v6p instances and deeply integrated into the AI Hypercomputer architecture, creating full-stack lock-in from silicon to networking.

Microsoft Azure Other 2026-06-21

Microsoft Azure Debuts Blackwell Ultra AI Supercomputer, Training-as-a-Service Reshapes Ecosystem

Microsoft Azure launched an AI supercomputer cluster powered by NVIDIA Blackwell Ultra GPUs, delivering over 200 exaflops of AI compute. It introduced AI Training as a Service for on-demand model training and partnered with OpenAI to deploy GPT-6 training clusters by 2027. Liquid cooling achieves a PUE of 1.08, positioning Azure as the premier cloud for trillion-parameter models.

Meta Other 2026-06-17

Meta Faces Risk of Heavy EU Fines as WhatsApp AI Features Hit Obstacles

...

TSMC Other 2026-06-17

TSMC Publicly Discloses CoWoS Glass Substrate Development Plan for the First Time

...

AMD Other 2026-06-16

AMD Critical RCE Vulnerability Disclosed After 124 Days, Sparks AI Infrastructure Security Crisis

Security researcher mr.bruh publicly disclosed a critical remote code execution (RCE) vulnerability in AMD processors after 124 days without a fix, with AMD refusing a $10,000 bounty. The flaw affects AI servers running AMD EPYC and Instinct, likened to a Log4j moment for AI infrastructure, forcing enterprises to reassess chip-level security response and supply chain risk.

NVIDIA Other 2026-05-16

NVIDIA CUDA Heap Overflow Exposes GPU Cloud Isolation Flaw: Driver-Level Security Must Move to Hardware

At Pwn2Own Berlin 2026, a heap overflow in NVIDIA CUDA Toolkit's NVVM compiler (CVE-2026-12839) enabled GPU cloud cross-tenant escape. The attack chain from malicious PTX to driver compromise to host kernel breaks current driver-level isolation, forcing a fundamental security architecture re-evaluation for shared GPU AI infrastructure.

Amazon Partnership High Signal 2026-04-15

AWS Signs $38B AI Cloud Partnership with OpenAI

OpenAI signs a 7-year, $38B deal with AWS, deploying thousands of NVIDIA GB200/GB300 GPUs. The deal is OpenAI's first major infrastructure diversification beyond Azure.

OpenAI Other High Signal 2026-03-31

OpenAI Secures $122B Funding for Global AI Infrastructure Expansion

OpenAI has raised $122 billion to expand frontier AI capabilities globally, invest in next-generation compute infrastructure, and meet growing demand for ChatGPT, Codex and enterprise AI solutions. This record funding will significantly scale up its AI training clusters and inference infrastructure.

AMD Other Medium Signal 2026-03-19

AMD and Celestica Launch Rack-Scale AI Platform Helios

AMD partners with Celestica to launch the Helios rack-scale AI platform, integrating Instinct accelerators and EPYC processors for chip-to-rack optimization. The platform targets AI training and inference workloads with performance and efficiency enhancements for data center and cloud providers.

NVIDIA Other High Signal 2026-03-18

NVIDIA Advances AI Robotics from Simulation to Production

NVIDIA demonstrates a new paradigm for robotics development by unifying simulation and production environments, accelerating industrial automation. The solution integrates AI training frameworks with edge computing architecture, delivering end-to-end development platforms for manufacturing and agriculture.

Hewlett Packard Enterprise Other High Signal 2026-03-16

HPE Alletra MP X10000 Becomes First NVIDIA-Certified Object Storage Platform for Enterprise AI

HPE announces its Alletra Storage MP X10000 is the first object-based platform certified by NVIDIA for enterprise AI. This signifies the extension of AI performance certification standards from the compute layer to the data layer, aiming to address data access bottlenecks in large-scale AI training, fine-tuning, and inference.

Meta Other High Signal 2026-03-11

Meta Accelerates Custom AI Chip Roadmap with Focus on Inference Optimization

Meta plans to launch four generations of MTIA AI chips in two years, adopting an 'inference-first' design strategy optimized for generative AI tasks. Built on PyTorch and open standards, the chips enable seamless data center deployment, targeting improved compute efficiency and cost control.

NVIDIA Other Medium Signal 2026-03-10

NVIDIA Launches RTX PRO Server Virtualization for Game Development AI Infrastructure

NVIDIA introduces RTX PRO Server, a centralized virtualized GPU platform built on the RTX PRO 6000 GPU and vGPU software. It leverages MIG technology to partition a single GPU into up to 48 user instances, enhancing resource utilization and team collaboration. The solution integrates AI training with graphics workflows for dynamic resource allocation and unified cross-region development.

NVIDIA Other High Signal 2026-03-09

ABB and NVIDIA Integrate Omniverse for High-Fidelity Industrial Robot Simulation

ABB Robotics integrates NVIDIA Omniverse into RobotStudio to launch RobotStudio HyperReality. The product achieves 99% simulation accuracy via USD export and virtual controllers, enabling synthetic data for AI training. It reduces deployment costs by 40% and accelerates time-to-market by 50%.

Huawei Other Medium Signal 2026-03-05

Huawei Releases AI-Native Data Center Networking Solution Galaxy AI Fabric 2.0

Huawei launched the Galaxy AI Fabric 2.0 data center networking solution, which uses an AI-native architecture for autonomous networking. It includes switches based on the self-developed Solar 5.0 chip, the iLossless 3.0 algorithm, and an intelligent management platform, and supports 10,000-card AI clusters.

TSMC Other High Signal 2026-03-05

TSMC Advances AI Hardware Innovation with Advanced Process and 3D Packaging

TSMC reveals AI technology research progress, focusing on N3/N2 advanced nodes and 3D Fabric heterogeneous integration. It enhances AI chip performance and efficiency through optimized transistor architecture and packaging, targeting memory bandwidth bottlenecks for cloud-to-edge AI applications.

Huawei Other High Signal 2026-03-04

Huawei Launches AI Data Platform with Compute-Storage Separation

Huawei launched an AI data platform featuring a compute-storage separation architecture for efficient data flow. It integrates a high-performance file system supporting EB-level data and accelerates AI training data preparation by 30%. The platform provides unified data management and integrates with major AI frameworks and Ascend hardware.

AMD Other Medium Signal 2026-03-02

AMD Launches ROCm AI Developer Hub to Strengthen Software Ecosystem

AMD introduces the ROCm AI Developer Hub, a comprehensive resource platform for AI development, with content ranging from guides to performance optimization. It showcases RL training with Slime on AMD GPUs, demonstrating ecosystem capabilities in complex AI scenarios.

AMD Other Medium Signal 2026-02-28

AMD Partners with TCS to Deploy Helios AI Rack Architecture in India

AMD partners with Tata Consultancy Services to introduce the Helios rack-scale AI architecture in India, built on Instinct MI300 accelerators for large-scale AI training and inference workloads. The solution is delivered as complete racks, scalable to thousands of nodes, optimized for generative AI and HPC. The collaboration leverages TCS's integration services in cloud, AI, and cybersecurity for end-to-end AI solutions.