Reports
AI-generated structured vendor updates
NVIDIA AgentPerf Benchmark: Blackwell Ultra Delivers 20x More Agents per Megawatt vs Hopper
NVIDIA and Artificial Analysis unveil AgentPerf, the first benchmark for agentic AI workloads. Results show the GB300 NVL72 platform delivers up to 20x more concurrent agents per megawatt than the HGX H200 when running DeepSeek V4 Pro, using real coding agent trajectories to measure throughput and responsiveness.
Google Awards 3M+ TPU Packaging Orders to Intel Foundry, Breaking TSMC's CoWoS Monopoly
Google has awarded Intel Foundry over 3 million units of next-gen TPU advanced packaging orders, leveraging Intel's EMIB technology with production starting in 2028. This marks Intel Foundry's largest external customer win and a pivotal shift in AI chip packaging away from TSMC's CoWoS monopoly.
Cisco AI Defense Policy Studio: Meta-Prompting Unwritten Policy into Auditable Guardrails
Cisco introduces AI Defense Policy Studio, an AI assistant that guides policy owners through authoring custom guardrails via a chat-and-review UI. It uses meta-prompting to translate informal guidance into human- and model-readable policy documents, directly deployable to Cisco AI Defense for runtime enforcement across models and applications.
NVIDIA Halos OS: A Certified Safety OS That Seizes Control of Autonomous Driving
NVIDIA introduces Halos OS, a full-stack safety system comprising ASIL D certified Halos Core, standardized Halos SDK, AI guardrails in Halos Applications, and cloud-based Safety Evaluation Framework. Built on DRIVE Hyperion, it aims to embed safety into L4 robotaxis from the ground up.
NVIDIA Locks Local AI Inference Control with DiffusionGemma Parallel Generation
NVIDIA optimizes Google DeepMind's DiffusionGemma open model, which generates 256 tokens in parallel for 4x speedup over autoregressive models. Achieves 1000 tokens/sec on H100, 150 tokens/sec on DGX Spark, running fully locally with no cloud cost. This reinforces NVIDIA GPU's centrality in compute-bound local AI inference.
Google Lightning Engine: 4.9x Spark Performance with Ecosystem Lock-in Risks
Google Cloud launches Lightning Engine GA for Apache Spark, delivering up to 4.9x faster performance via vectorized native execution on Gluten/Velox. Optimized Cloud Storage and BigQuery connectors boost throughput, but the premium tier and deep integration create vendor lock-in risks.
GKE Inference Gateway Prefix Caching: 92% Faster AI Inference with Hidden Lock-in
Google Cloud launches GKE Inference Gateway with prefix caching and model-aware routing, achieving 92.8% lower TTFT and 15.7% higher throughput on Llama 3.1 8B. Snap reports 75-80% cache hit rates. However, deep integration with GKE Gateway API risks lock-in, limiting multi-cloud portability.
Cloudflare AI Gateway Adds Identity-Driven Budgets, Seizing AI Traffic Control
Cloudflare launches spend limits and identity-driven budgets (closed beta) in AI Gateway, integrating with Cloudflare Access. It enables per-user, per-team dollar budgets with fallback routing, shifting AI cost governance from model providers to the gateway control plane.
NVIDIA Nemotron 3 Ultra: A MoE-Based Control Plane for Cost-Efficient AI Agent Orchestration
NVIDIA launches Nemotron 3 Ultra, a 550B-parameter MoE model (55B active) purpose-built for AI agent orchestration. Featuring Multi-Teacher On-Policy Distillation (MOPD) and a Hybrid Mamba-Transformer architecture, it achieves 5x throughput and 30% cost savings on tasks like SWE-bench, signaling a shift of reasoning control to a layered agent system.
Google's gcs-analytics-core Library Boosts Iceberg and Spark Performance on GCS
Google Cloud announces gcs-analytics-core, an open-source Java library integrated into Iceberg 1.11.0+ GCSFileIO. It uses vectored I/O and smart Parquet prefetching to reduce scan latency. TPC-DS benchmarks show 18%-71% scan time improvement, but execution time gains are modest for large datasets (1.58% at 10TB).
Intel at Computex 2026: 18A, Rackscale, and the Shift to CPU-Centric AI Orchestration
Intel unveils Core Ultra Series 3 on 18A, Xeon 6+ with 288 e-cores, a hybrid local inference orchestrator with Perplexity, rackscale AI infrastructure with Foxconn, and disaggregated inference cloud with SambaNova. The keynote positions the CPU as the central orchestrator for agentic AI, signaling a control plane shift from GPU to x86.
Cisco Live 2026: AI Defense Upgrades with Policy Studio, Adaptive Red Teaming, Agent Supply Chain Security
At Cisco Live 2026, Cisco unveiled AI Defense upgrades: adaptive red teaming, Policy Studio for natural language policy, and agent supply chain security with CI/CD integration. It also launched AgenticOps autonomous network operations and native integrations with Amazon Bedrock, Google ADK, LangChain, aiming to secure multi-framework agent environments.
Intel and SambaNova Rackscale AI: CPU Regains Inference Control Plane
At Computex 2026, Intel unveiled rack-scale AI infrastructure combining Xeon 6+ with SambaNova SN-50 RDUs, plus a fully disaggregated inference cloud (prefill on NVIDIA Blackwell, decode on RDUs) by Vector Core Compute. This aims to reposition the CPU as the central orchestrator for inference, challenging GPU dominance.
Arm and NVIDIA RTX Spark: Unified Memory PC Architecture Targets Agentic AI, Encircles x86
Arm and NVIDIA unveil RTX Spark, an Arm-based Grace CPU + Blackwell RTX GPU platform with unified memory, targeting Windows on Arm for agentic AI inference. It delivers 1 Petaflop, reduces token cost, and signals a PC paradigm shift from app-driven to agent-driven, backed by Microsoft.
Cisco AI Defense Update: Agent Supply Chain Security as Platform Lock-In
Cisco updates AI Defense for agent security with adaptive red teaming, Policy Studio, and automated agent dependency graph scanning. It claims platform-agnostic protection across AWS Bedrock, Google ADK, LangChain, but deeply ties into Cisco Secure AI Factory with NVIDIA, raising concerns about lock-in and runtime overhead.
Google AlloyDB Remote MCP Server GA: Standardizing AI Agent Data Access with Open Protocol
Google Cloud announces GA of AlloyDB Remote MCP Server, enabling AI agents to securely access operational data via HTTP endpoints. Built on open MCP protocol, it offers IAM fine-grained authorization, Model Armor protection, and audit logging, integrated with AlloyDB’s ScaNN vector index (10B+ vectors, 6x speed) and AI functions, positioning AlloyDB as the single source of truth for enterprise agentic workloads.
Intel Reclaims AI Control Plane: Xeon 6+ and E835 Target Agentic Orchestration
Intel launches Xeon 6+ (288 E-cores on 18A), E835 200GbE controllers, and Crescent Island GPU. The strategy repositions the CPU as the control plane for agentic AI orchestration and data movement, while using E835 Ethernet to standardize AI data center networking.
Google Launches A2UI: Open Protocol for Agent-Driven UI in Gemini Enterprise
Google introduces A2UI, an open protocol enabling AI agents to return JSON payloads describing interactive UI components (date pickers, maps) for native rendering in Gemini Enterprise. It integrates with A2A and Flutter, solving the text-only limitation while preventing HTML injection.
NVIDIA's Triple Play: Vera CPU, N1X Laptop Chip, and $6.5B Silicon Photonics Reshape AI Infra Control
NVIDIA delivers first agent-specific Vera CPU (88 Arm v9.2 cores, 1.2TB/s memory bandwidth), teases consumer N1X laptop chip, and invests $6.5B in silicon photonics. This shifts AI orchestration control from x86 to NVIDIA's Arm ecosystem, while CPO addresses memory wall, but volume production remains challenging until post-2028.
NVIDIA Vera CPU Benchmark Crushes x86: Memory Bandwidth Hegemony for Agentic AI
Phoronix benchmarks show NVIDIA Vera CPU with 88 custom Olympus cores (Armv9.2), 1.2TB/s LPDDR5X bandwidth, and 450W TDP outperforming Intel/AMD x86 across agentic AI workloads. It achieves 1.5x overall performance vs 128-core x86, 90% STREAM TRIAD efficiency, and 20-second Linux kernel compilation.