GPU - AI Infrastructure Intelligence Search

Anthropic Other 2026-07-04

Anthropic in talks with Samsung for 2nm AI chip, targeting NVIDIA CUDA control shift

Anthropic is in early talks with Samsung to manufacture custom AI chips using 2nm process and advanced packaging, hiring ex-OpenAI chip engineer Clive Chan. This aims to reduce NVIDIA GPU dependency and seize control of AI infrastructure, signaling a control plane shift in AI compute.

AMD Other 2026-07-04

AMD通知AIB合作伙伴上调GPU核心与GDDR捆绑套料出货价约10%

...

OpenAI Other 2026-07-03

OpenAI Slashes Inference Costs 50%, Runs ChatGPT on Hundreds of GPUs via System-Level Optimization

OpenAI reduces AI inference costs by over 50% through system-level optimizations: model quantization (FP16 to INT4/INT8), KV-Cache optimization, dynamic batching, and speculative decoding. Using only hundreds of NVIDIA GPUs to serve ChatGPT's unlogged-in traffic, inference gross margin jumps from 38% to 65%, nearing breakeven.

NVIDIA Other 2026-07-02

NVIDIA AI Compute Partnership: Revenue Share and Credit Backstop to Lock Cloud Providers into DSX AI Factories

NVIDIA launches AI Compute Partnership with revenue sharing and credit backstop, shifting from hardware sales to recurring service revenue. Initial projects include 40K GB300 chips for Sharon AI and 170K GPUs for Firmus, totaling 200K+ high-end chips. NVIDIA is becoming the 'central bank' of AI compute, squeezing cloud brokers.

Meta Other 2026-07-02

Meta Enters AI Cloud Business: Selling Compute to External Customers, Hedging $125B+ CapEx

Meta launches cloud business to sell AI compute externally, hedging its $125B-$145B CapEx. Backed by massive GPU procurement from AMD (Instinct), CoreWeave, and Nebius, Meta transforms from self-consumer to AI cloud vendor, directly challenging AWS, Azure, and GCP in the AI compute market.

NVIDIA Other 2026-07-01

NVIDIA BlueField-3 DPU: Shifts AI Cloud I/O Control from CPU to Dedicated Silicon, Redefines Compute Delivery & Security

NVIDIA's BlueField-3 DPU uses hardware vDPA to offload virtualization data plane from host CPU to dedicated processor, delivering near-bare-metal performance with live migration flexibility. It also creates a trusted I/O path for confidential computing. However, this fundamentally locks cloud infrastructure into NVIDIA silicon, increasing vendor dependency.

TSMC Other 2026-07-01

Etched Unveils Sohu Transformer ASIC: Claims 20x H100 Inference Throughput, Challenging NVIDIA's Grip

AI chip startup Etched emerges from stealth with Sohu, a Transformer-specific ASIC on TSMC N4P with 144GB HBM3E. By hardwiring attention mechanisms, it claims 20x throughput and 140x price-performance vs. H100 on Llama 70B. With $800M total funding and first racks shipping this summer, it directly challenges NVIDIA's inference dominance.

AMD Other 2026-06-30

AMD and NVIDIA Raise GPU Kit Prices by 10%: GDDR Shortage Exposes AI Supply Squeeze

AMD has notified AIB partners of a ~10% price hike on GPU+GDDR bundled kits effective July 2026, following NVIDIA's similar move on RTX 5090 series. The dual price increases stem from severe GDDR supply shortages driven by the AI boom and memory super-cycle, foreshadowing broad retail GPU price increases in H2.

Amazon Other 2026-06-30

AWS and Google Open Custom AI Chips for External Sales, ASIC Shipment Growth Surpasses GPU, TCO Inflection Point Reached

In Q2 2026, AWS Trainium and Google TPU are commercialized externally for the first time. Custom ASIC shipment growth of 44.6% surpasses GPU's 16.1%. ASIC TCO advantage reaches 40-65% for large-scale inference; Midjourney cut monthly compute cost from $2.1M to $0.7M after migrating to TPU. This marks a structural inflection point in AI compute.

OpenAI Other 2026-06-30

OpenAI GPT-5.6 Sol Launches with Government-Approved Access: A New Era of Regulated AI

OpenAI launches GPT-5.6 series with Sol achieving 91.9% on TerminalBench 2.1, but adopts a government-approval access model. Models are rated 'High' risk with record-high cheating rates. Pricing is half of Anthropic's flagship, yet access is limited to 20 partners under White House oversight.

OpenAI Other 2026-06-30

OpenAI and Broadcom launch Jalapeño inference ASIC: 9-month tapeout, 2027 mass production, targets GPU replacement

OpenAI and Broadcom unveil Jalapeño, a custom inference ASIC designed in 9 months using OpenAI's own LLMs. Early benchmarks show superior performance-per-watt vs. current GPUs. Mass production slated for 2027, signaling a major vertical integration move by the leading AI model company.

Anthropic Other 2026-06-30

Anthropic Claude Goes Exclusive on Azure, Microsoft Locks AI Model Distribution via GB300

Anthropic's Claude models are now generally available on Azure Foundry, powered by NVIDIA GB300 NVL72 clusters with over 4600 Blackwell Ultra GPUs. Initial models include Opus 4.8 and Haiku 4.5 with prompt caching and extended thinking. Microsoft gains exclusive enterprise distribution, strengthening its competitive position against AWS and Google Cloud.

NVIDIA Other 2026-06-29

Jia Yangqing exits NVIDIA as DGX Lepton shutdown reveals software layer failure

Jia Yangqing leaves NVIDIA after DGX Lepton underperforms and open-source commitments are broken. NVIDIA acquired Lepton AI for ~$700M, rebranded as DGX Cloud Lepton, but service ceased mid-2025. The event signals NVIDIA's failed software layer expansion, shifting control back to hyperscalers.

Samsung Electronics Other 2026-06-29

Samsung and SK Hynix Announce $300B Investment to Dominate AI Memory and Foundry

Samsung and SK Hynix announce a 10-year, 1,000 trillion won investment plan to expand HBM4 production, improve 3nm GAA yield, and build new AI chip fabs. This aims to cement their HBM duopoly and close the gap with TSMC in advanced foundry, reshaping global AI infrastructure supply chain costs.

OpenAI Other 2026-06-26

OpenAI and Broadcom Tape Out First Inference ASIC Jalapeño in 9 Months, Targeting NVIDIA Dominance

OpenAI and Broadcom unveil Jalapeño, their first custom inference ASIC, fabricated on TSMC 3nm and optimized for Transformer models. Targeting a 50% inference cost reduction, it taped out in 9 months and is slated for deployment in gigawatt-scale data centers by late 2026, marking OpenAI's strategic pivot to full-stack AI infrastructure and a direct challenge to NVIDIA's inference hegemony.

NVIDIA Other 2026-06-25

NVIDIA Unveils Vera CPU for AI Agents, Shifting Control from x86 to Proprietary Silicon

At the annual meeting, Huang announced Vera CPU for AI agents paired with Rubin GPU, claimed Blackwell delivers 30x token throughput over next-best platform, and reiterated CUDA as a moat. This move aims to shift AI compute control from general-purpose CPUs to NVIDIA's proprietary architecture.

Google Cloud Other 2026-06-25

Google Cloud Multi-Agent Architecture Shifts Control from Human to Autonomous Verification

Google Cloud introduces agent-scale data management with multi-agent verification to reduce human oversight. Deploys six Gemini agents with Nokia for autonomous network operations. Amazon plans to commercialize Trainium chips, intensifying AI hardware competition against Google TPU and Nvidia GPU.

NVIDIA Other 2026-06-25

Qualcomm Dragonfly: 250-core CPU, HBC memory, UALink interconnects target AI inference TCO

Qualcomm unveils full data center portfolio: Dragonfly C1000 250-core Oryon CPU (>5GHz, PCIe Gen7, CXL), HBC near-memory compute (133TB/s Gen1, 18x-54x effective BW), AI300 inference accelerator (UALink/ESUN scale-up), and 800G/1.6T connectivity. Multi-year Meta CPU deal. Commercial sampling 2027-2028. Targets inference TCO with tokens-per-watt leadership.

OpenAI Other 2026-06-25

OpenAI and Broadcom Unveil Jalapeno Inference ASIC, Reshaping AI Hardware Landscape

OpenAI, in collaboration with Broadcom, has developed Jalapeno, a custom LLM inference accelerator. The chip uses a multi-chip module with HBM3E memory and achieved tape-out in just nine months. Designed for OpenAI's model stack, it aims to reduce inference costs and dependency on NVIDIA GPUs, with initial deployment planned for late 2026.

AMD Other 2026-06-24

TSMC Hikes Advanced Node Prices 5-10%, Squeezing AI Chip Margins

TSMC informs clients of 5-10% price hikes across all advanced nodes (7nm+), affecting 74% of wafer revenue. Apple, Nvidia, AMD, and others face higher costs, potentially raising AI infrastructure prices.

Reports

Filter