算力 - AI Infrastructure Intelligence Search

Anthropic Other 2026-07-12

Anthropic Locks 3.5GW TPU Compute with Broadcom, Signaling Shift to Custom AI ASICs

Broadcom's Q2 FY2026 filing reveals a 3.5GW TPU compute deal with Anthropic starting 2027. This marks a strategic shift from general-purpose GPUs to custom ASICs for AI workloads, with OpenAI and Meta making similar multi-GW commitments, signaling a fundamental change in AI infrastructure.

Apple Other 2026-07-10

PrismML's 1-bit Compression: 27B Qwen Model Runs Fully on iPhone 17 Pro in 4GB

PrismML compressed a 27B-parameter dense LLM (Qwen 3.6) to 4GB, running fully on iPhone 17 Pro. Using native 1-bit quantization (weights as {-1, +1}), it achieves >92% compression, 8x faster inference, and 75-80% energy reduction. This challenges Apple's sparse architecture, potentially shifting edge AI from cloud-reliant to device-native.
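As a rough sanity check on the figures above, the sketch below computes raw weight storage for a 27B-parameter model. The FP16 (16 bits per weight) baseline is an assumption, since the summary does not say what the >92% compression is measured against; the helper name `weight_footprint_gb` is hypothetical.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes), ignoring any metadata."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 27e9          # 27B parameters, from the summary
REPORTED_GB = 4.0      # compressed size, from the summary

# Assumption: the uncompressed baseline stores weights in FP16.
fp16_gb = weight_footprint_gb(PARAMS, 16)

# Pure 1-bit weights ({-1, +1}) need one bit per parameter.
one_bit_gb = weight_footprint_gb(PARAMS, 1)

# Compression implied by the reported 4GB against the assumed FP16 baseline.
compression_vs_fp16 = 1 - REPORTED_GB / fp16_gb

print(f"FP16 baseline:  {fp16_gb:.3f} GB")
print(f"1-bit weights:  {one_bit_gb:.3f} GB")
print(f"Compression:    {compression_vs_fp16:.1%}")
```

Under that assumption the baseline is 54 GB and pure 1-bit weights take about 3.4 GB, so a 4GB package is consistent with the >92% figure. The summary does not say what accounts for the difference between 3.4 GB and 4GB.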

Huawei Other 2026-07-10

Huawei Ascend 10K-Card Cluster Goes Live, UnifiedBus Protocol Pools All Resources

Huawei launched an Ascend 10,000-card AI cluster in Shaoguan, Guangdong, and showcased the Atlas 950 SuperPoD with its proprietary UnifiedBus interconnect supporting 8,192 NPUs at 16.3 PB/s. Huawei Cloud also entered the Gartner 2026 Cloud AI Infrastructure Leaders quadrant, reinforcing its push for a self-contained AI ecosystem.

Samsung Electronics Other 2026-07-10

Samsung GAIA AI PC Chip Samples with Memory-Centric NPU, Targeting 50 TOPS

Samsung launches the GAIA AI PC processor, built on a 4nm process with a memory-centric NPU. It integrates the LPDDR5X controller with the NPU for near-memory computing, achieving a 40% energy efficiency improvement and 50 TOPS. The chip is certified for Microsoft Copilot+ PC, and Lenovo is set to adopt it in Q4 2026.

Amazon Other 2026-07-10

AWS Sells Trainium 3 Externally, Challenging NVIDIA's AI Training Chip Dominance

AWS begins external sales of its Trainium 3 AI training chip, fabricated on TSMC's 3nm process and delivering 2.52 PFLOPS per chip. Early customers include Anthropic and Uber. This move directly challenges NVIDIA's dominance and marks AWS's strategic shift from cloud provider to chip vendor.

NVIDIA Other 2026-07-08

NVIDIA Rigel Core: Single-Threaded CPU as the New Control Plane for Agentic AI

NVIDIA unveils the Rosa CPU architecture with a custom Rigel core (Armv9.2), targeting single-threaded performance for agentic AI workloads and paired with the Feynman GPU (1.6nm, 50 PFLOPS) in 2028. This shifts CPU design from core-count scaling to serial-latency optimization, directly challenging AMD EPYC and Intel Xeon dominance.

Anthropic Other 2026-07-07

Anthropic Overtakes OpenAI in Enterprise AI Adoption for the First Time, 30 Billion Annualized Revenue Run Rate Confirmed

...

MediaTek Other 2026-07-07

MediaTek and Alibaba Cloud Deploy Tongyi Qianwen LLM on Dimensity Chips

MediaTek partners with Alibaba Cloud to deploy a small version of the Tongyi Qianwen LLM on Dimensity 9300/8300 mobile platforms, enabling offline multi-turn conversations. This move aims to capture edge AI inference control via NPU optimization and SDK integration, directly challenging Qualcomm.

Meta Other 2026-07-07

Meta Cuts 1,395 Reality Labs Jobs, Pivots to AI Cloud to Challenge AWS and Azure

Meta plans to lay off 1,395 employees in July 2026, primarily from Reality Labs, while raising capex to $125-145B to focus on AI infrastructure. It is building a cloud business to sell AI compute externally, signaling a strategic pivot from AR/VR to AI cloud services.

NVIDIA Other 2026-07-07

NVIDIA Denies Kyber NVL144 Delay, But 78-Layer PCB Bottleneck Exposes AI Hardware Physics Limit

NVIDIA officially denies reports of Kyber NVL144 rack delay to 2028, but SemiAnalysis revelations about a 78-layer ultra-high-density PCB midplane bottleneck and Rubin Ultra cancellation expose hard physical limits in signal integrity and manufacturing, opening a strategic window for AMD and Google.

Amazon Other 2026-07-06

AWS boosts Trainium 3 shipments, accelerating ASIC substitution for NVIDIA GPUs

Supply chain sources indicate AWS has instructed vendors to increase Trainium 3 shipments for Q3 2026 by 20-30%. This signals strong confidence in its custom ASIC strategy to reduce dependence on NVIDIA GPUs, leveraging superior cost and power efficiency for cloud AI training.

Anthropic Other 2026-07-06

Anthropic Starts Custom AI Chip Development, Talks Samsung 2nm, Aims for Compute Independence

Anthropic has initiated its own AI chip development and is in talks with Samsung for 2nm foundry services. The move aims to reduce reliance on NVIDIA GPUs, optimize inference costs, and strengthen its technology moat ahead of a potential IPO. It joins OpenAI, Google, and others in the custom ASIC race, signaling a shift from software to hardware competition.

AMD Other 2026-07-06

AMD Unveils Zen 6/7 CPU and MI400/500 GPU Roadmap, Targets NVIDIA Rubin with HBM4 and 2nm

AMD unveiled its Zen 6/7 CPU and MI400/500 GPU roadmap at its 2026 Financial Analyst Day, featuring TSMC's 2nm process and HBM4 memory. The MI400 series offers 432GB of memory, 19.6TB/s of bandwidth, and 40 PFLOPS of FP4 performance, directly targeting NVIDIA's Vera Rubin architecture with an annual cadence to disrupt the AI hardware monopoly.

Amazon Other 2026-07-06

AWS Trainium 3 Shipments Surge 20-30%, Shifting AI Compute Control from NVIDIA to Custom Silicon

Supply chain sources indicate AWS has raised Q3 Trainium 3 server shipments by 20-30%, driven by Anthropic. Trainium 2 is sold out, Trainium 3 nearly fully booked, with customers already queuing for Trainium 4 and development of Trainium 5 underway. This signals AWS's aggressive push to own the AI compute stack via custom silicon.

Anthropic Other 2026-07-06

Anthropic's $15B Australia Bet: AI Infra Shifts to Energy Arbitrage

Anthropic plans to invest $15B to secure 1.4GW of data center capacity in Australia, aiming to activate 1GW by next year. This move bypasses US grid bottlenecks caused by local opposition and litigation, building a hybrid model of self-build, partnerships, and cloud leasing. It signals a shift in AI infra deployment toward energy and regulatory arbitrage.

Anthropic Other 2026-07-05

Anthropic Launches Custom AI Chip: Vertical Integration to Control Inference Cost and Supply

Anthropic launched Claude Sonnet 5 and revealed a custom AI chip initiative that uses Samsung's foundry. This move aims to reduce dependency on NVIDIA and control long-term inference costs, and it marks Anthropic's shift from a pure software company to a vertically integrated infrastructure firm.

OpenAI Other 2026-07-05

OpenAI Winds Down Fine-Tuning API: A Strategic Shift in AI Customization Landscape

OpenAI plans to phase out its fine-tuning API by 2027, stopping new task creation but allowing inference on existing models. This forces startups relying on fine-tuning for differentiation to migrate to open-source models or RAG, reshaping the AI customization ecosystem.

Meta Other 2026-07-03

Meta Admits AI Agent Stagnation, Plans to Sell Compute to Challenge Cloud Triopoly

Meta CEO Zuckerberg admits AI agent development is behind schedule, pushing the ROI timeline to 3-6 months. Concurrently, Meta plans to sell AI compute and model access externally, directly challenging the cloud oligopoly of AWS, Azure, and GCP and signaling a pivot from internal AI infrastructure to commercial cloud services.

OpenAI Other 2026-07-03

OpenAI Slashes Inference Costs 50%, Runs ChatGPT on Hundreds of GPUs via System-Level Optimization

OpenAI reduces AI inference costs by over 50% through system-level optimizations: model quantization (FP16 to INT4/INT8), KV-cache optimization, dynamic batching, and speculative decoding. It now serves ChatGPT's logged-out traffic with only hundreds of NVIDIA GPUs, and inference gross margin jumps from 38% to 65%, nearing breakeven.

NVIDIA Other 2026-07-02

NVIDIA AI Compute Partnership: Revenue Share and Credit Backstop to Lock Cloud Providers into DSX AI Factories

NVIDIA launches AI Compute Partnership with revenue sharing and credit backstop, shifting from hardware sales to recurring service revenue. Initial projects include 40K GB300 chips for Sharon AI and 170K GPUs for Firmus, totaling 200K+ high-end chips. NVIDIA is becoming the 'central bank' of AI compute, squeezing cloud brokers.
