MoE - AI Infrastructure Intelligence Search

Other Other 2026-07-20

Moonshot AI Launches Kimi K3: 2.8T Parameter Open-Source MoE Model at $3/$15

Moonshot AI unveils Kimi K3, a 2.8T parameter open-source MoE model with 896 experts (16 active), native vision, and 1M context. API pricing at $3/$15 input/output per million tokens undercuts rivals. Open weights release July 27. GPU capacity exhausted within 2 days. Arena score 1679 tops Fable 5.

Other Other 2026-07-19

Alibaba Launches 2.4T Parameter Qwen3.8-Max MoE Model with 0.2x Pricing

Alibaba released Qwen3.8-Max-Preview, a 2.4 trillion parameter MoE multimodal model with 1M context window. It launched Qoder platform and Token Plan with aggressive discounts up to 0.2x, significantly reducing inference cost. The company claims it is second only to Anthropic Fable 5, marking China's AI entry into dual-track of parameter arms race and open-source competition.

AMD Other 2026-07-16

AMD与OpenAI达成6GW算力供应历史性协议 1.6亿认股权证可获10%股权股价盘前涨35%

...

NVIDIA Other 2026-06-16

NVIDIA Blackwell Sweeps MLPerf: NVLink and NVFP4 Redefine AI Training Economics

NVIDIA Blackwell dominates MLPerf Training 6.0, submitting across all seven benchmarks including MoE workloads. GB300 NVL72 delivers up to 1.6x faster training than GB200, with fifth-gen NVLink unifying 72 GPUs as one giant GPU. NVFP4 low-precision training and massive scale (8,192 GPUs) set new industry standards.

NVIDIA Other 2026-06-15

NVIDIA Bets on World-Action Models: Control Shifts from VLM to Video Backbones

NVIDIA's blog introduces World-Action Models (WAMs) as a paradigm shift from VLM-based VLAs. WAMs leverage pretrained video/world-model backbones to jointly predict future states and robot actions, aiming to bridge the language-to-action grounding gap. This could redefine robot foundation model training but raises concerns about inference cost and latency.

NVIDIA Other 2026-06-13

NVIDIA GB300 NVL72 Delivers 20x Agentic Coding Efficiency, Setting New Inference Benchmark

NVIDIA's GB300 NVL72 achieves 20x more concurrent coding agents per megawatt than H200 on the new AA-AgentPerf benchmark, leveraging 72-GPU NVLink fabric, MXFP4 kernels, and MoE optimizations. This first standardized agentic inference benchmark redefines data center capacity planning for AI agents.

NVIDIA Other 2026-06-13

NVIDIA AgentPerf Benchmark: Blackwell Ultra Delivers 20x More Agents per Megawatt vs Hopper

NVIDIA and Artificial Analysis unveil AgentPerf, the first benchmark for agentic AI workloads. Results show the GB300 NVL72 platform delivers up to 20x more concurrent agents per megawatt than the HGX H200 when running DeepSeek V4 Pro, using real coding agent trajectories to measure throughput and responsiveness.

NVIDIA Other 2026-06-12

NVIDIA and SK Hynix Lock Down HBM4/5 Roadmap, Cementing Vera Rubin Supply Chain

NVIDIA and SK Hynix sign a multi-year agreement to co-define HBM4 production and HBM5 pre-research for Vera Rubin GPUs. Samsung also enters HBM4 supply as a second source. The deal elevates SK Hynix from vendor to co-developer, potentially creating a de facto memory standard barrier that marginalizes Micron and others.

NVIDIA Other 2026-06-11

NVIDIA Optimizes Google's DiffusionGemma for 1,000 tok/s Parallel Text Generation

NVIDIA optimizes Google DeepMind's DiffusionGemma, a diffusion-based text model generating 256 tokens per step in parallel. On a single H100, it achieves 1,000 tok/s, with deployment via NIM and NeMo. This breaks the sequential token bottleneck, slashing serving costs and latency for real-time AI.

NVIDIA Other 2026-06-08

NVIDIA's UK Sovereign AI Play: From Chip Vendor to National Infrastructure Controller

NVIDIA partners with the UK government to deploy sovereign AI infrastructure via Isambard-AI (5,400 GH200 superchips) and the Sovereign AI Fund, backing local startups. This move establishes a national AI control plane, locking compute into NVIDIA's ecosystem and bypassing traditional hyperscalers like AWS and Azure.

NVIDIA Other 2026-06-04

NVIDIA Nemotron 3 Ultra: A MoE-Based Control Plane for Cost-Efficient AI Agent Orchestration

NVIDIA launches Nemotron 3 Ultra, a 550B-parameter MoE model (55B active) purpose-built for AI agent orchestration. Featuring Multi-Teacher On-Policy Distillation (MOPD) and a Hybrid Mamba-Transformer architecture, it achieves 5x throughput and 30% cost savings on tasks like SWE-bench, signaling a shift of reasoning control to a layered agent system.

NVIDIA Other 2026-06-02

NVIDIA DGX Spark Update: One-Click Local AI Agents, Multi-Node Cluster for 400B Models

At Computex 2026, NVIDIA updates DGX Spark with NemoClaw for one-click local AI agent setup, 2.6x throughput boost for Qwen3.6-35B via vLLM optimizations, and Sync cluster assistant to connect 2-4 nodes over ConnectX-7 200Gbps RoCE, enabling local deployment of large models and multi-agent pipelines.

AMD Other Medium Signal 2026-05-07

AMD Backs SPEC CPU 2026 Benchmark, Emphasizing Open, Trusted Performance Measurement

AMD published a blog endorsing the upcoming SPEC CPU 2026 industry benchmark, emphasizing the critical role of open, reproducible CPU performance standards for customer infrastructure decisions in the AI era. The new benchmark updates its application suite and strengthens support for bare-metal cloud environments and parallel computing.

Google Other High Signal 2026-05-06

Google Launches Gemma 4 Open Models, Accelerating Local AI Agent Deployment

Google released the Gemma 4 open model family under Apache 2.0 license, introducing MoE architecture for the first time. It aims to deliver high-performance AI agent capabilities directly to mobile and edge hardware, reducing reliance on cloud clusters and enabling new local, private AI applications.

AMD Other High Signal 2026-05-06

AMD and OpenAI Contribute MRC Protocol to OCP for Scalable AI Networking

AMD, in collaboration with OpenAI, Microsoft, and others, contributed the MRC (Multipath Reliable Connection) protocol, designed for large-scale AI training, to the Open Compute Project (OCP). AMD co-authored the specification and has already deployed MRC on its programmable Pensando DPU/NIC products, positioning its networking technology as a key enabler for resilient and adaptive AI infrastructure.

AMD Other High Signal 2026-05-06

AMD and OpenAI Introduce MRC, a Next-Gen Transport Protocol for AI Training

AMD, in collaboration with OpenAI, Microsoft, and other industry leaders, has released the specification for the Multipath Reliable Connection (MRC) protocol. MRC addresses performance bottlenecks of RoCEv2 in hyperscale AI training clusters through intelligent packet spraying, selective retransmission, and network-signaled congestion control, aiming to improve bandwidth utilization and job resilience.

NVIDIA Other 2026-05-05

NVIDIA Extreme Co-Design: Vera Rubin Platform Targets Agentic Inference TCO Inflection

NVIDIA unveils an extreme co-design stack for agentic systems, featuring Vera Rubin NVL72, NVLink 6, ConnectX-9, BlueField-4, and Spectrum-X. By disaggregating inference, optimizing KV cache management, and deploying low-latency fabrics, it aims to break the throughput-interactivity tradeoff, making high-context token processing economically viable.

AMD Other Medium Signal 2026-05-04

AMD Showcases Heterogeneous Computing Strategy for Enterprise AI with Dell

At Dell Technologies World, AMD highlighted its heterogeneous computing portfolio, aiming to match the right compute engine to specific enterprise AI workloads, while emphasizing hardware-based security and manageability. This signals a shift in AI infrastructure from generic solutions to fine-tuned, scenario-specific deployments.

AMD Other High Signal 2026-04-30

AMD Proposes New AI Infrastructure Networking Paradigm: From Lossless Fabrics to Intelligent Endpoints

AMD published a blog outlining seven key questions for building large-scale AI infrastructure, arguing that traditional lossless Ethernet or InfiniBand architectures face cost and complexity bottlenecks. It advocates shifting network intelligence and reliability functions from expensive, specialized switches to intelligent NICs, enabling reliable transport over standard (potentially lossy) Ethernet to reduce TCO and simplify operations.

NVIDIA Other High Signal 2026-04-29

NVIDIA Launches Nemotron 3 Nano Omni, Targeting AI Agent Perception Layer

NVIDIA released the open-source multimodal model Nemotron 3 Nano Omni, featuring a 30B-A3B hybrid MoE architecture. It unifies vision, audio, and language processing into a single model, designed to act as the 'eyes and ears' for AI agents. It claims to eliminate latency and context fragmentation from multi-model collaboration, achieving up to 9x higher throughput while maintaining interactivity, thereby reducing AI agent deployment and inference costs.

Reports

Filter

Moonshot AI Launches Kimi K3: 2.8T Parameter Open-Source MoE Model at $3/$15

Alibaba Launches 2.4T Parameter Qwen3.8-Max MoE Model with 0.2x Pricing

AMD与OpenAI达成6GW算力供应历史性协议 1.6亿认股权证可获10%股权股价盘前涨35%

NVIDIA Blackwell Sweeps MLPerf: NVLink and NVFP4 Redefine AI Training Economics

NVIDIA Bets on World-Action Models: Control Shifts from VLM to Video Backbones

NVIDIA GB300 NVL72 Delivers 20x Agentic Coding Efficiency, Setting New Inference Benchmark

NVIDIA AgentPerf Benchmark: Blackwell Ultra Delivers 20x More Agents per Megawatt vs Hopper

NVIDIA and SK Hynix Lock Down HBM4/5 Roadmap, Cementing Vera Rubin Supply Chain

NVIDIA Optimizes Google's DiffusionGemma for 1,000 tok/s Parallel Text Generation

NVIDIA's UK Sovereign AI Play: From Chip Vendor to National Infrastructure Controller

NVIDIA Nemotron 3 Ultra: A MoE-Based Control Plane for Cost-Efficient AI Agent Orchestration

NVIDIA DGX Spark Update: One-Click Local AI Agents, Multi-Node Cluster for 400B Models

AMD Backs SPEC CPU 2026 Benchmark, Emphasizing Open, Trusted Performance Measurement

Google Launches Gemma 4 Open Models, Accelerating Local AI Agent Deployment

AMD and OpenAI Contribute MRC Protocol to OCP for Scalable AI Networking

AMD and OpenAI Introduce MRC, a Next-Gen Transport Protocol for AI Training

NVIDIA Extreme Co-Design: Vera Rubin Platform Targets Agentic Inference TCO Inflection

AMD Showcases Heterogeneous Computing Strategy for Enterprise AI with Dell

AMD Proposes New AI Infrastructure Networking Paradigm: From Lossless Fabrics to Intelligent Endpoints

NVIDIA Launches Nemotron 3 Nano Omni, Targeting AI Agent Perception Layer

Reports

Filter

Moonshot AI Launches Kimi K3: 2.8T Parameter Open-Source MoE Model at $3/$15

Alibaba Launches 2.4T Parameter Qwen3.8-Max MoE Model with 0.2x Pricing

AMD与OpenAI达成6GW算力供应历史性协议 1.6亿认股权证可获10%股权 股价盘前涨35%

NVIDIA Blackwell Sweeps MLPerf: NVLink and NVFP4 Redefine AI Training Economics

NVIDIA Bets on World-Action Models: Control Shifts from VLM to Video Backbones

NVIDIA GB300 NVL72 Delivers 20x Agentic Coding Efficiency, Setting New Inference Benchmark

NVIDIA AgentPerf Benchmark: Blackwell Ultra Delivers 20x More Agents per Megawatt vs Hopper

NVIDIA and SK Hynix Lock Down HBM4/5 Roadmap, Cementing Vera Rubin Supply Chain

NVIDIA Optimizes Google's DiffusionGemma for 1,000 tok/s Parallel Text Generation

NVIDIA's UK Sovereign AI Play: From Chip Vendor to National Infrastructure Controller

NVIDIA Nemotron 3 Ultra: A MoE-Based Control Plane for Cost-Efficient AI Agent Orchestration

NVIDIA DGX Spark Update: One-Click Local AI Agents, Multi-Node Cluster for 400B Models

AMD Backs SPEC CPU 2026 Benchmark, Emphasizing Open, Trusted Performance Measurement

Google Launches Gemma 4 Open Models, Accelerating Local AI Agent Deployment

AMD and OpenAI Contribute MRC Protocol to OCP for Scalable AI Networking

AMD and OpenAI Introduce MRC, a Next-Gen Transport Protocol for AI Training

NVIDIA Extreme Co-Design: Vera Rubin Platform Targets Agentic Inference TCO Inflection

AMD Showcases Heterogeneous Computing Strategy for Enterprise AI with Dell

AMD Proposes New AI Infrastructure Networking Paradigm: From Lossless Fabrics to Intelligent Endpoints

NVIDIA Launches Nemotron 3 Nano Omni, Targeting AI Agent Perception Layer

AMD与OpenAI达成6GW算力供应历史性协议 1.6亿认股权证可获10%股权股价盘前涨35%