Reports
AI-generated structured vendor updates
Subquadratic Claims Breakthrough on Quadratic Attention Cost: Independent Benchmarks Report 52x Speedup at 1M Tokens
Miami startup Subquadratic releases independent benchmarks for its SubQ model, claiming a 52x speedup over FlashAttention at 1M tokens and up to a 1000x compute reduction via Subquadratic Sparse Attention (SSA). Skeptics question whether it is a fine-tune of an existing model; the full architecture remains unpublished.
Microsoft Shifts Copilot Cowork to Usage-Based Pricing, Eyes DeepSeek for Cost-Efficiency
Microsoft transitions Copilot Cowork to usage-based billing (Copilot Credits) and considers integrating fine-tuned DeepSeek V4 or open-source models as low-cost alternatives, hosted on Azure. This move addresses high costs from intensive usage and signals a multi-model strategy.
NVIDIA GB300 NVL72 Dominates AgentPerf: 20x More Agents per Megawatt Reshapes AI Inference
NVIDIA's GB300 NVL72 tops the first agentic-AI benchmark AgentPerf, running up to 20x more agents per megawatt than an H200 on DeepSeek V4 Pro. The test stresses multi-step tool-calling workloads, highlighting a new efficiency metric that will drive data center hardware decisions toward agent-optimized architectures.
US Export Controls Halt Anthropic's Fable/Mythos: AI Geopolitical Precedent Set
The US Commerce Department suspends access to Anthropic's Fable 5 and Mythos 5 models for all foreign nationals, including Anthropic's own foreign employees, citing national security. Models are taken offline immediately. Anthropic dispatches executives to Washington for negotiations, marking a potential turning point for AI export controls.
Huawei's LogicFolding: 3D Stacking Rewrites AI Chip Rules
Huawei's Tau Scaling Law and LogicFolding architecture boost transistor density by 55% and power efficiency by 41% via vertical logic stacking, targeting 1.4nm-class by 2031. Ascend 920/910C chips are now used for DeepSeek V4-Pro post-training, signaling real-world AI workload deployment and challenging NVIDIA's dominance in China.
NVIDIA Blackwell Sweeps MLPerf: NVLink and NVFP4 Redefine AI Training Economics
NVIDIA Blackwell dominates MLPerf Training 6.0, submitting across all seven benchmarks including MoE workloads. GB300 NVL72 delivers up to 1.6x faster training than GB200, with fifth-gen NVLink unifying 72 GPUs as one giant GPU. NVFP4 low-precision training and massive scale (8,192 GPUs) set new industry standards.
AMD Acquires MEXT: AI-Predicted Flash Nears DRAM Performance to Cut AI Memory TCO
AMD acquires MEXT, an AI-driven memory optimization startup. MEXT's predictive technology makes NAND Flash behave like DRAM, expanding effective memory capacity for AI workloads and lowering TCO. The tech will be integrated across AMD's data center portfolio (EPYC, Instinct) to address memory bottlenecks in large models.
AMD Open-Sources AI Software Stack on Vultr, Taking on NVIDIA CUDA Ecosystem
AMD launches a suite of open-source, modular enterprise AI software components on Vultr Marketplace, including AMD Inference Microservices (AIMs), AI Workbench, Resource Manager, and Solution Blueprints. This aims to provide production-grade AI infrastructure without vendor lock-in, directly challenging NVIDIA's CUDA ecosystem.
Z.ai GLM-5.2 Ships Usable 1M-Token Context, No Benchmarks, Two Thinking Levels
Z.ai releases GLM-5.2 with a claim of usable 1M-token context and two thinking-effort levels. No standard benchmarks are provided, raising concerns about real-world performance. The model targets replacing chunking-based RAG with native long-context reasoning.
NVIDIA GB300 NVL72 Delivers 20x Agentic Coding Efficiency, Setting New Inference Benchmark
NVIDIA's GB300 NVL72 achieves 20x more concurrent coding agents per megawatt than H200 on the new AA-AgentPerf benchmark, leveraging 72-GPU NVLink fabric, MXFP4 kernels, and MoE optimizations. This first standardized agentic inference benchmark redefines data center capacity planning for AI agents.
NVIDIA AgentPerf Benchmark: Blackwell Ultra Delivers 20x More Agents per Megawatt vs Hopper
NVIDIA and Artificial Analysis unveil AgentPerf, the first benchmark for agentic AI workloads. Results show the GB300 NVL72 platform delivers up to 20x more concurrent agents per megawatt than the HGX H200 when running DeepSeek V4 Pro, using real coding agent trajectories to measure throughput and responsiveness.
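The agents-per-megawatt metric described above can be sketched as a simple normalization. This is a minimal illustration, not the AgentPerf methodology: the function names are hypothetical, and the agent counts and power figures are made-up inputs chosen only so the ratio comes out to 20x.

```python
def agents_per_megawatt(concurrent_agents: float, power_kw: float) -> float:
    """Normalize a concurrent-agent count to one megawatt of power draw."""
    if power_kw <= 0:
        raise ValueError("power_kw must be positive")
    # 1 MW = 1000 kW, so scale the agent count by 1000 / power_kw.
    return concurrent_agents * 1000.0 / power_kw


def efficiency_ratio(new_system: float, baseline: float) -> float:
    """How many times more agents per megawatt the new system sustains."""
    return new_system / baseline


# Hypothetical inputs, NOT published benchmark data.
baseline = agents_per_megawatt(concurrent_agents=100, power_kw=50)
candidate = agents_per_megawatt(concurrent_agents=5600, power_kw=140)
ratio = efficiency_ratio(candidate, baseline)
print(f"baseline={baseline:.0f}/MW candidate={candidate:.0f}/MW ratio={ratio:.1f}x")
```

Normalizing by power rather than by GPU count is what lets a rack-scale system and a single server be compared on the same axis.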
AMD, Dell, Cambridge Launch UK Sovereign AI Lab to Challenge NVIDIA's CUDA Dominance with Open ROCm
AMD, Dell, and the University of Cambridge launch the Sovereign AI Innovation Lab (SAIL) in the UK, deploying the Zenith supercomputer with 5th Gen EPYC and Instinct MI355X GPUs, plus the Sunrise fusion AI system. The lab promotes open, interoperable AI infrastructure based on AMD ROCm, challenging NVIDIA's CUDA lock-in and offering long-term technology choice for national AI initiatives.
NVIDIA Blackwell Ultra GB300 NVL72: 1.44 EFLOPS FP4, 50x AI Factory Boost
NVIDIA launches the Blackwell Ultra GB300 NVL72 rack system with 72 Blackwell Ultra GPUs and 36 Grace CPUs, delivering 1,440 PFLOPS of sparse FP4 compute, 20TB of HBM3e, and 130TB/s of NVLink bandwidth. NVIDIA claims 50x the AI factory output of Hopper. Available now.
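As a quick sanity check on the rack-level figures above, they can be divided across the 72 GPUs. This is back-of-the-envelope arithmetic only: it assumes the totals are spread evenly, and because the quoted 20TB is a rounded figure, the per-GPU memory number is approximate.

```python
# Rack-level figures as quoted in the item above.
GPUS_PER_RACK = 72
RACK_FP4_SPARSE_PFLOPS = 1440  # 1.44 EFLOPS
RACK_HBM_TB = 20               # rounded total, so per-GPU result is approximate
RACK_NVLINK_TBPS = 130


def per_gpu(rack_total: float, gpus: int = GPUS_PER_RACK) -> float:
    """Evenly divide a rack-level total across its GPUs."""
    return rack_total / gpus


fp4_per_gpu_pflops = per_gpu(RACK_FP4_SPARSE_PFLOPS)
hbm_per_gpu_gb = per_gpu(RACK_HBM_TB) * 1000  # TB -> GB (decimal)
nvlink_per_gpu_tbps = per_gpu(RACK_NVLINK_TBPS)

print(f"FP4 sparse per GPU: {fp4_per_gpu_pflops:.1f} PFLOPS")
print(f"HBM3e per GPU:      {hbm_per_gpu_gb:.0f} GB (approx.)")
print(f"NVLink per GPU:     {nvlink_per_gpu_tbps:.2f} TB/s")
```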
Cisco Open Sources Model Provenance Kit, Targeting AI Supply Chain Security Governance
Cisco released the open-source Model Provenance Kit, which uses a tiered strategy to analyze model metadata, tokenizer structure, and weight-level signals to generate unique fingerprints and verify the lineage and integrity of AI models. This aims to address risks of tampering, forgery, and compliance in the AI model supply chain.
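The tiered idea (metadata, tokenizer, weights) can be illustrated with plain hashing. This is a minimal sketch and not the Model Provenance Kit's API or algorithm, which the item does not describe: `fingerprint` and `changed_tiers` are hypothetical names, and SHA-256 over raw bytes only detects exact changes, whereas a real lineage tool would need signals that survive fine-tuning or format conversion.

```python
import hashlib
import json


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fingerprint(metadata: dict, vocab: list, weights: dict) -> dict:
    """Hash each tier separately, then derive one combined fingerprint.

    metadata: model config as a JSON-serializable dict
    vocab:    tokenizer vocabulary as a list of strings
    weights:  mapping of tensor name -> raw bytes (hypothetical layout)
    """
    tiers = {
        # sort_keys makes the hash independent of dict ordering.
        "metadata": _sha256(json.dumps(metadata, sort_keys=True).encode("utf-8")),
        "tokenizer": _sha256(json.dumps(vocab).encode("utf-8")),
        "weights": _sha256(
            b"".join(name.encode("utf-8") + blob for name, blob in sorted(weights.items()))
        ),
    }
    order = ("metadata", "tokenizer", "weights")
    tiers["combined"] = _sha256("".join(tiers[t] for t in order).encode("ascii"))
    return tiers


def changed_tiers(a: dict, b: dict) -> list:
    """Return the tiers whose hashes differ between two fingerprints."""
    return [t for t in ("metadata", "tokenizer", "weights") if a[t] != b[t]]


original = fingerprint({"arch": "demo"}, ["a", "b"], {"layer0": b"\x00\x01"})
tampered = fingerprint({"arch": "demo"}, ["a", "b"], {"layer0": b"\x00\x02"})
print(changed_tiers(original, tampered))
```

Keeping one hash per tier means a mismatch can be localized (for example, identical tokenizer but altered weights) instead of only reporting that something changed.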
NVIDIA Shifts AI Infrastructure Metric from FLOPS to Cost Per Token
NVIDIA advocates for "cost per token" as the primary economic metric for AI infrastructure, replacing "FLOPS per dollar." This shift moves the focus from computational inputs to business outputs, requiring full-stack optimization across hardware, software, and networking to lower enterprise AI inference TCO.
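The difference between the two metrics can be made concrete with a small calculation. This is a sketch under stated assumptions: the function names are hypothetical, and the hourly costs and throughputs are invented inputs, not vendor figures. Cost per token divides what a system costs to run by the tokens it actually delivers, so it captures software and networking gains that a peak-FLOPS figure does not.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Output-side metric: dollars spent per one million generated tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def flops_per_dollar(peak_flops: float, hourly_cost_usd: float) -> float:
    """Input-side metric: peak floating-point operations bought per dollar."""
    if hourly_cost_usd <= 0:
        raise ValueError("hourly_cost_usd must be positive")
    return peak_flops * 3600 / hourly_cost_usd


# Hypothetical systems: B costs more per hour but delivers more tokens.
system_a = cost_per_million_tokens(hourly_cost_usd=100, tokens_per_second=50_000)
system_b = cost_per_million_tokens(hourly_cost_usd=150, tokens_per_second=120_000)
print(f"A: ${system_a:.4f} per 1M tokens")
print(f"B: ${system_b:.4f} per 1M tokens")
```

In this made-up case the more expensive system is cheaper per token, which is the kind of outcome an input-side metric can hide.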
Microsoft Foundry Integrates Fireworks AI for Enhanced Open Model Inference Platform
Microsoft integrates the Fireworks AI inference service into Microsoft Foundry, offering high-performance open model access with pay-per-token and provisioned throughput unit billing, and supporting bring-your-own-weights to streamline enterprise deployment and operations.
Cisco Reveals Enterprise AI Tool Usage Patterns and Security Risks via DNS Telemetry
Cisco analyzed generative AI tool usage via secure access and DNS telemetry, revealing ChatGPT dominance and malicious domain impersonation risks. The approach demonstrates network traffic monitoring for AI tool assessment, providing actionable methodology for security teams.
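The general approach of assessing AI tool usage from DNS logs can be sketched as follows. This is not Cisco's method: the domain list is a small illustrative sample, the query log is invented, and string similarity via `difflib` is a stand-in for whatever impersonation detection the analysis actually used.

```python
from collections import Counter
from difflib import SequenceMatcher

# Illustrative sample only; a real deployment would maintain a fuller list.
KNOWN_AI_DOMAINS = {
    "chatgpt.com": "ChatGPT",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
}


def match_known(qname: str):
    """Return the known domain that qname equals or is a subdomain of, else None."""
    q = qname.lower().rstrip(".")
    for domain in KNOWN_AI_DOMAINS:
        if q == domain or q.endswith("." + domain):
            return domain
    return None


def analyze(queries, threshold: float = 0.8):
    """Count queries per known AI tool and flag near-miss lookalike names."""
    usage = Counter()
    suspicious = set()
    for qname in queries:
        domain = match_known(qname)
        if domain is not None:
            usage[KNOWN_AI_DOMAINS[domain]] += 1
            continue
        name = qname.lower().rstrip(".")
        # Similar to a known domain but not a match: possible impersonation.
        if any(SequenceMatcher(None, name, known).ratio() >= threshold
               for known in KNOWN_AI_DOMAINS):
            suspicious.add(name)
    return usage, sorted(suspicious)


# Invented query log for illustration.
log = ["chatgpt.com", "www.chatgpt.com", "claude.ai", "chatqpt.com", "example.org"]
usage, suspicious = analyze(log)
print(usage.most_common())
print(suspicious)
```

The similarity threshold trades false positives against missed lookalikes, so any flagged name would still need review before blocking.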
NVIDIA Extends CUDA Tile Programming Model to Julia Language
NVIDIA introduces its CUDA Tile high-level GPU programming model to the Julia ecosystem via the cuTile.jl package. This move aims to lower the barrier to high-performance GPU kernel development by abstracting low-level thread and memory management with a tile-based data model, while maintaining close syntax and performance parity with the Python version.
Huawei Ascend 910C Trains 1.6T-Parameter MoE Model: First Full Pipeline on Domestic AI Chips
Huawei, in collaboration with research institutes, completed full-parameter post-training of DeepSeek-V4-Pro (1.6 trillion parameters, MoE) on an Ascend 910C cluster. Key metrics: a stable run of 1,500 steps on 1,000 cards, 30% compute utilization, a 14% operator efficiency gain, and zero reliance on foreign GPUs. This marks the first end-to-end trillion-parameter training loop on domestic chips.