DeepSeek - AI Infrastructure Intelligence Search

OpenAI Other 2026-06-20

Subquadratic Claims Quadratic Attention Breakthrough: Independent Benchmarks Confirm 52x Speedup at 1M Tokens

Miami startup Subquadratic releases independent benchmarks for its SubQ model, claiming a 52x speedup over FlashAttention at 1M tokens and up to a 1000x compute reduction via Subquadratic Sparse Attention (SSA). Skeptics question whether SubQ is a fine-tune of an existing model; the full architecture remains unpublished.

Microsoft Other 2026-06-18

Microsoft Shifts Copilot Cowork to Usage-Based Pricing, Eyes DeepSeek for Cost-Efficiency

Microsoft transitions Copilot Cowork to usage-based billing (Copilot Credits) and considers integrating fine-tuned DeepSeek V4 or open-source models as low-cost alternatives, hosted on Azure. This move addresses high costs from intensive usage and signals a multi-model strategy.

Anthropic Other 2026-06-18

NVIDIA GB300 NVL72 Dominates AgentPerf: 20x More Agents per Megawatt Reshapes AI Inference

NVIDIA's GB300 NVL72 tops the first agentic-AI benchmark AgentPerf, running up to 20x more agents per megawatt than an H200 on DeepSeek V4 Pro. The test stresses multi-step tool-calling workloads, highlighting a new efficiency metric that will drive data center hardware decisions toward agent-optimized architectures.

Anthropic Other 2026-06-17

US Export Controls Halt Anthropic's Fable/Mythos: AI Geopolitical Precedent Set

The US Commerce Department suspends access to Anthropic's Fable 5 and Mythos 5 models for all foreign nationals, including Anthropic's own foreign employees, citing national security. Models are taken offline immediately. Anthropic dispatches executives to Washington for negotiations, marking a potential turning point for AI export controls.

MediaTek Other 2026-06-17

Huawei's LogicFolding: 3D Stacking Rewrites AI Chip Rules

Huawei's Tau Scaling Law and LogicFolding architecture boost transistor density by 55% and power efficiency by 41% via vertical logic stacking, targeting 1.4nm-class by 2031. Ascend 920/910C chips are now used for DeepSeek V4-Pro post-training, signaling real-world AI workload deployment and challenging Nvidia's dominance in China.

NVIDIA Other 2026-06-16

NVIDIA Blackwell Sweeps MLPerf: NVLink and NVFP4 Redefine AI Training Economics

NVIDIA Blackwell dominates MLPerf Training 6.0, submitting across all seven benchmarks including MoE workloads. GB300 NVL72 delivers up to 1.6x faster training than GB200, with fifth-gen NVLink unifying 72 GPUs as one giant GPU. NVFP4 low-precision training and massive scale (8,192 GPUs) set new industry standards.

AMD Other 2026-06-15

AMD Acquires MEXT: AI-Predicted Flash Nears DRAM Performance to Cut AI Memory TCO

AMD acquires MEXT, an AI-driven memory optimization startup. MEXT's predictive technology makes NAND Flash behave like DRAM, expanding effective memory capacity for AI workloads and lowering TCO. The tech will be integrated across AMD's data center portfolio (EPYC, Instinct) to address memory bottlenecks in large models.

AMD Other 2026-06-15

AMD Open-Sources AI Software Stack on Vultr, Taking on NVIDIA CUDA Ecosystem

AMD launches a suite of open-source, modular enterprise AI software components on Vultr Marketplace, including AMD Inference Microservices (AIMs), AI Workbench, Resource Manager, and Solution Blueprints. This aims to provide production-grade AI infrastructure without vendor lock-in, directly challenging NVIDIA's CUDA ecosystem.

Research Other 2026-06-15

Z.ai GLM-5.2 Ships Usable 1M-Token Context, No Benchmarks, Two Thinking Levels

Z.ai releases GLM-5.2, claiming a usable 1M-token context and offering two thinking-effort levels. No standard benchmarks are provided, raising concerns about real-world performance. The model aims to replace chunking-based RAG with native long-context reasoning.

NVIDIA Other 2026-06-13

NVIDIA GB300 NVL72 Delivers 20x Agentic Coding Efficiency, Setting New Inference Benchmark

NVIDIA's GB300 NVL72 achieves 20x more concurrent coding agents per megawatt than H200 on the new AA-AgentPerf benchmark, leveraging 72-GPU NVLink fabric, MXFP4 kernels, and MoE optimizations. This first standardized agentic inference benchmark redefines data center capacity planning for AI agents.

NVIDIA Other 2026-06-13

NVIDIA AgentPerf Benchmark: Blackwell Ultra Delivers 20x More Agents per Megawatt vs Hopper

NVIDIA and Artificial Analysis unveil AgentPerf, the first benchmark for agentic AI workloads. Results show the GB300 NVL72 platform delivers up to 20x more concurrent agents per megawatt than the HGX H200 when running DeepSeek V4 Pro, using real coding agent trajectories to measure throughput and responsiveness.

AMD Other 2026-06-11

AMD, Dell, Cambridge Launch UK Sovereign AI Lab to Challenge NVIDIA's CUDA Dominance with Open ROCm

AMD, Dell, and the University of Cambridge launch the Sovereign AI Innovation Lab (SAIL) in the UK, deploying the Zenith supercomputer with 5th Gen EPYC CPUs and Instinct MI355X GPUs, plus the Sunrise fusion AI system. The lab promotes open, interoperable AI infrastructure based on AMD ROCm, challenging NVIDIA's CUDA lock-in and offering long-term technology choice for national AI initiatives.

NVIDIA Product Launch 2026-05-29

NVIDIA Blackwell Ultra GB300 NVL72: 1.44 EFLOPS FP4, 50x AI Factory Boost

NVIDIA launches the Blackwell Ultra GB300 NVL72 rack system with 72 Blackwell Ultra GPUs and 36 Grace CPUs, delivering 1,440 PFLOPS of sparse FP4 compute, 20TB of HBM3e, and 130TB/s of NVLink bandwidth. NVIDIA claims 50x the AI factory output of Hopper. Available now.

Cisco Other High Signal 2026-04-30

Cisco Open Sources Model Provenance Kit, Targeting AI Supply Chain Security Governance

Cisco released the open-source Model Provenance Kit, which uses a tiered strategy to analyze model metadata, tokenizer structure, and weight-level signals to generate unique fingerprints and verify the lineage and integrity of AI models. This aims to address risks of tampering, forgery, and compliance in the AI model supply chain.
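
As an illustration of what a tiered fingerprint could look like, here is a minimal sketch using exact-match SHA-256 hashing per tier. It is not the Model Provenance Kit's API or method: the function names, the tier names, and the dictionary-based inputs are all hypothetical, and exact hashes only detect that something changed. Tracing lineage across fine-tunes would need similarity-based signals, which this sketch does not attempt.

```python
import hashlib
import json


def _digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def fingerprint(metadata: dict, tokenizer_vocab: dict, weights: dict) -> dict:
    """Hypothetical three-tier fingerprint: metadata, tokenizer, weights.

    `weights` maps a tensor name to a flat list of floats (an assumption
    made to keep the sketch dependency-free).
    """
    tiers = {
        "metadata": _digest(metadata),
        "tokenizer": _digest(tokenizer_vocab),
        # Round so that insignificant float noise does not change the hash.
        "weights": _digest(
            {name: [round(v, 6) for v in values] for name, values in weights.items()}
        ),
    }
    # The combined value is a hash over the three per-tier digests.
    tiers["combined"] = _digest(tiers)
    return tiers


def first_mismatch(expected: dict, actual: dict):
    """Return the cheapest tier that differs, or None if all tiers match."""
    for tier in ("metadata", "tokenizer", "weights"):
        if expected[tier] != actual[tier]:
            return tier
    return None


reference = fingerprint({"arch": "demo"}, {"hello": 0, "world": 1}, {"layer0": [0.1, 0.2]})
candidate = fingerprint({"arch": "demo"}, {"hello": 0, "world": 1}, {"layer0": [0.1, 0.3]})
print(first_mismatch(reference, candidate))
```

Checking tiers in order of cost means a mismatch in metadata or tokenizer structure can be reported without touching the much larger weight data.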

NVIDIA Other High Signal 2026-04-15

NVIDIA Shifts AI Infrastructure Metric from FLOPS to Cost Per Token

NVIDIA advocates for "cost per token" as the primary economic metric for AI infrastructure, replacing "FLOPS per dollar." This shift moves the focus from computational inputs to business outputs, requiring full-stack optimization across hardware, software, and networking to lower enterprise AI inference TCO.
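
The difference between the two metrics can be sketched with simple arithmetic. The example below is illustrative only: the function names, the two systems, and every number in it are hypothetical and do not come from NVIDIA or from any benchmark.

```python
def flops_per_dollar(peak_flops: float, hourly_cost_usd: float) -> float:
    """Input-side metric: peak compute bought per dollar of hourly cost."""
    if hourly_cost_usd <= 0:
        raise ValueError("hourly_cost_usd must be positive")
    return peak_flops / hourly_cost_usd


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Output-side metric: USD spent per one million tokens served."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Two hypothetical systems: (peak FLOPS, hourly cost in USD, delivered tokens/s).
# Delivered throughput depends on the whole stack, not only on peak compute.
systems = {
    "system_a": (1.0e15, 10.0, 5_000.0),
    "system_b": (2.0e15, 30.0, 30_000.0),
}

for name, (peak, cost, tps) in systems.items():
    print(
        name,
        f"FLOPS/$ = {flops_per_dollar(peak, cost):.3e}",
        f"$/1M tokens = {cost_per_million_tokens(cost, tps):.4f}",
    )
```

In this made-up comparison, system_a looks better on FLOPS per dollar while system_b is cheaper per token, which is the kind of reversal the output-side metric is meant to expose.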

Microsoft Other High Signal 2026-03-13

Microsoft Foundry Integrates Fireworks AI for Enhanced Open Model Inference Platform

Microsoft integrates the Fireworks AI inference service into Microsoft Foundry, offering high-performance open model access with pay-per-token and provisioned throughput unit billing, along with bring-your-own-weights support to streamline enterprise deployment and operations.

Cisco Other Medium Signal 2026-03-09

Cisco Reveals Enterprise AI Tool Usage Patterns and Security Risks via DNS Telemetry

Cisco analyzed generative AI tool usage via secure access and DNS telemetry, revealing ChatGPT's dominance and the risk of malicious domain impersonation. The approach shows how network traffic monitoring can be used to assess AI tool usage, giving security teams an actionable methodology.

NVIDIA Other Medium Signal 2026-03-04

NVIDIA Extends CUDA Tile Programming Model to Julia Language

NVIDIA brings its CUDA Tile high-level GPU programming model to the Julia ecosystem via the cuTile.jl package. The move aims to lower the barrier to high-performance GPU kernel development by abstracting low-level thread and memory management behind a tile-based data model, while keeping syntax and performance close to parity with the Python version.

Huawei Other

Huawei Ascend 910C Trains 1.6T-Parameter MoE Model: First Full Pipeline on Domestic AI Chips

Huawei, in collaboration with research institutes, completed full-parameter post-training of DeepSeek-V4-Pro (1.6 trillion parameters, MoE) on an Ascend 910C cluster. Key metrics: 1,500 stable training steps on 1,000 cards, 30% compute utilization, a 14% operator efficiency gain, and zero reliance on foreign GPUs. This marks the first end-to-end trillion-parameter training loop on domestic chips.
