Filter

×
Active Filters Clear All
Keyword: TPU ×
79 Total Reports
1/4 Page
NVIDIA Other 2026-06-13

NVIDIA AgentPerf Benchmark: Blackwell Ultra Delivers 20x More Agents per Megawatt vs Hopper

NVIDIA and Artificial Analysis unveil AgentPerf, the first benchmark for agentic AI workloads. Results show the GB300 NVL72 platform delivers up to 20x more concurrent agents per megawatt than the HGX H200 when running DeepSeek V4 Pro, using real coding agent trajectories to measure throughput and responsiveness.

Intel Other 2026-06-12

Google Awards 3M+ TPU Packaging Orders to Intel Foundry, Breaking TSMC's CoWoS Monopoly

Google has awarded Intel Foundry over 3 million units of next-gen TPU advanced packaging orders, leveraging Intel's EMIB technology with production starting in 2028. This marks Intel Foundry's largest external customer win and a pivotal shift in AI chip packaging away from TSMC's CoWoS monopoly.

Microsoft Other 2026-06-11

Microsoft & NVIDIA RTX Spark Brings 1 Petaflop AI to Windows, Reshaping Local Inference

At Computex 2026, Microsoft unveiled RTX Spark, an Arm-based AI superchip co-developed with NVIDIA and MediaTek, delivering up to 1 petaflop AI performance and 128GB unified memory for local 120B parameter models. Intel Arc G3 and Qualcomm Snapdragon X2 series also launched, accelerating the Windows AI PC ecosystem.

NVIDIA Other 2026-06-11

NVIDIA Locks Local AI Inference Control with DiffusionGemma Parallel Generation

NVIDIA optimizes Google DeepMind's DiffusionGemma open model, which generates 256 tokens in parallel for 4x speedup over autoregressive models. Achieves 1000 tokens/sec on H100, 150 tokens/sec on DGX Spark, running fully locally with no cloud cost. This reinforces NVIDIA GPU's centrality in compute-bound local AI inference.

Amazon Other 2026-06-10

Anthropic Claude Fable 5 on AWS: Data Retention Policy Breaches Cloud Security Boundary, Erodes Enterprise Data Sovereignty

AWS and Anthropic launch Claude Fable 5 with long-running async execution, advanced vision, and proactive self-verification. Access requires 30-day data retention and sharing with Anthropic, moving inference data outside AWS security boundary. Harmful prompts fall back to Opus 4.8, introducing complex pricing and governance risks.

Google Other 2026-06-09

GKE Inference Gateway Prefix Caching: 92% Faster AI Inference with Hidden Lock-in

Google Cloud launches GKE Inference Gateway with prefix caching and model-aware routing, achieving 92.8% lower TTFT and 15.7% higher throughput on Llama 3.1 8B. Snap reports 75-80% cache hit rates. However, deep integration with GKE Gateway API risks lock-in, limiting multi-cloud portability.

NVIDIA Other 2026-06-09

NVIDIA NVFP4: Native 4-Bit Training Boosts Throughput 1.73x, Locks Blackwell Ecosystem

NVIDIA introduces NVFP4, a native 4-bit format on Blackwell, enabling lossless mixed-precision pretraining in JAX/MaxText. Achieves 1.73x throughput gain over FP8 on Llama 3.1 405B (GB300). Techniques like micro-block scaling and Random Hadamard Transform boost performance but lock users into NVIDIA hardware.

Amazon Other 2026-06-06

AWS Bedrock New Console Embraces OpenAI/Anthropic APIs, Shifting Control to Inference Layer

AWS launches a new Bedrock console powered by the bedrock-mantle endpoint, natively supporting OpenAI and Anthropic API protocols. Users can seamlessly switch between GPT, Claude, and open-weight models. This move standardizes model access, aiming to lock users into AWS's unified inference plane while weakening individual model provider API lock-in.

NVIDIA Other 2026-06-04

NVIDIA Nemotron 3 Ultra: A MoE-Based Control Plane for Cost-Efficient AI Agent Orchestration

NVIDIA launches Nemotron 3 Ultra, a 550B-parameter MoE model (55B active) purpose-built for AI agent orchestration. Featuring Multi-Teacher On-Policy Distillation (MOPD) and a Hybrid Mamba-Transformer architecture, it achieves 5x throughput and 30% cost savings on tasks like SWE-bench, signaling a shift of reasoning control to a layered agent system.

Amazon Other 2026-06-02

AWS Hosts OpenAI GPT-5.5 & Codex: Control Shifts from Model to Cloud

AWS launches OpenAI GPT-5.5, GPT-5.4, and Codex on Bedrock via the Responses API. This integrates frontier models into AWS infrastructure for data residency and capacity management, but locks users into Bedrock's ecosystem.

Google Other 2026-06-01

Google AlloyDB Remote MCP Server GA: Standardizing AI Agent Data Access with Open Protocol

Google Cloud announces GA of AlloyDB Remote MCP Server, enabling AI agents to securely access operational data via HTTP endpoints. Built on open MCP protocol, it offers IAM fine-grained authorization, Model Armor protection, and audit logging, integrated with AlloyDB’s ScaNN vector index (10B+ vectors, 6x speed) and AI functions, positioning AlloyDB as the single source of truth for enterprise agentic workloads.

NVIDIA Other 2026-06-01

NVIDIA Alpamayo: Closed-Loop RL Post-Training Bridges AV Sim-to-Real Gap

NVIDIA's Alpamayo platform introduces AlpaGym, an open-source, high-throughput closed-loop RL post-training framework. It integrates AlpaSim simulator, Cosmos-RL distributed training, and Physical AI datasets, enabling AV models to learn from the consequences of their own actions in simulation, significantly reducing the gap between training and deployment.

NVIDIA Other 2026-06-01

NVIDIA Cosmos 3: Open-Source Physical AI Model with MoT for Ecosystem Lock-in

NVIDIA releases Cosmos 3, a unified physical AI foundation model with Mixture-of-Transformers architecture combining reasoning, world generation, and action generation. Open-sourced with training scripts and six synthetic datasets, but deployment optimized for NVIDIA NIM and GPUs, signaling an ecosystem lock-in strategy.

NVIDIA Other 2026-06-01

NVIDIA Vera CPU: Custom Olympus Core and LPDDR5X Redefine CPU for Agentic AI Factories

NVIDIA unveils Vera CPU with 88 custom Olympus cores, 1.2TB/s LPDDR5X bandwidth, and SCF fabric, targeting CPU execution bottlenecks in agentic AI and reinforcement learning. Claiming 1.8x performance over x86 and memory power under 30W, it shifts AI factory metrics from cores-per-dollar to tokens-per-dollar.

Google Product Launch 2026-05-22

Google I/O 2026 Pivots to Agentic AI: Antigravity 2.0 and TPU 8t/8i Reshape Control Plane

At I/O 2026, Google unveiled Gemini 3.5 Flash (4x output speed), Antigravity 2.0 multi-agent orchestration, TPU 8t/8i (3x training, 2x inference perf/W), and Gemini Spark, signaling a full pivot to Agentic AI infrastructure. By integrating platform and silicon, Google shifts control from model APIs to orchestration and hardware lock-in.

Google Other 2026-05-21

Google Antigravity Control Plane Redefines AI Development, Locks Agent Orchestration

At I/O 2026, Google launched Antigravity 2.0 desktop app and CLI/SDK as a unified agent control plane, alongside Gemini 3.5 Flash/Omni models, Managed Agents API, and native Android support in AI Studio. This aims to streamline AI development from prototype to production, but effectively locks developers into Google's ecosystem and cloud services.

Anthropic Other 2026-05-19

KPMG Embeds Claude for 276k Staff, Reshaping Professional Services AI

KPMG announces a global alliance with Anthropic, embedding Claude into its core Digital Gateway platform and making it available to all 276,000+ employees. This integration, starting with tax and legal services and expanding to cybersecurity and private equity, signifies a fundamental shift from AI-assisted work to an AI-native service delivery model, positioning Claude as the default intelligence layer for professional services.

Google Other 2026-05-19

Google TPU 8t/8i Enables Cross-Datacenter Training, Gemini 3.5 Flash 4x Faster

Google unveils TPU 8t (training) and TPU 8i (inference) with 3x raw compute and 2x perf-per-watt. JAX/Pathways enable distributed training across 1M+ TPUs across sites. Gemini 3.5 Flash delivers 4x output tokens per second vs frontier models. SynthID adopted by OpenAI, Nvidia, Kakao, Eleven Labs.

Google Other 2026-05-18

Google Cloud Managed MCP Server Shifts AI Data Layer Control from SQL to Standardized Protocol

Google Cloud introduces Managed MCP Tools, standardizing AI-to-data interaction via the Model Context Protocol. The blog outlines five scenarios from static APIs to MCP agents, highlighting MCP as an open standard that decouples reasoning from data access, though the managed implementation tightly couples to BigQuery.

Cloudflare Other 2026-05-18

Cloudflare Tests Anthropic Mythos: AI-Driven Exploit Chain Construction and Proof Generation

Cloudflare's Project Glasswing tested Anthropic's Mythos Preview, revealing its ability to automatically chain multiple low-severity bugs into exploitable PoCs with runnable code. They built a multi-stage harness to manage noise and context limits, achieving a significant leap in vulnerability discovery quality.