MoE - AI Infrastructure Intelligence Search

Apple Partnership High Signal 2026-04-27

Apple-Google Multi-Year Partnership Confirmed: Gemini to Power New Siri

Apple and Google have confirmed a multi-year partnership with Google Cloud as the preferred provider. Google is building a custom 1.2-trillion-parameter Gemini model for Apple, 8x the size of Apple's current cloud model. Siri will gain Gemini capabilities in 2026 with iOS 27. The privacy architecture is unchanged: Gemini runs on Apple-controlled servers with data-protection guarantees. Device compatibility requirements will exclude hundreds of millions of users on older iPhones.

NVIDIA Other High Signal 2026-04-15

NVIDIA Shifts AI Infrastructure Metric from FLOPS to Cost Per Token

NVIDIA advocates "cost per token" as the primary economic metric for AI infrastructure, replacing "FLOPS per dollar." The shift moves the focus from computational inputs to business outputs and calls for full-stack optimization across hardware, software, and networking to lower the TCO of enterprise AI inference.
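
To make the difference between the two metrics concrete, here is a small sketch comparing two hypothetical serving nodes on both. Every price, peak-TFLOPS figure and throughput is invented for illustration, and `ServingNode` with its methods is a hypothetical helper, not part of any NVIDIA tool. The point it shows is that the node with more peak compute per dollar can still be the more expensive one per token if it sustains lower throughput.

```python
from dataclasses import dataclass


@dataclass
class ServingNode:
    name: str
    hourly_cost_usd: float    # all-in cost per hour (hypothetical value)
    peak_tflops: float        # datasheet peak compute (hypothetical value)
    tokens_per_second: float  # sustained end-to-end throughput (hypothetical value)

    def tflops_per_dollar(self) -> float:
        # Input-side metric: peak compute bought per dollar-hour.
        return self.peak_tflops / self.hourly_cost_usd

    def usd_per_million_tokens(self) -> float:
        # Output-side metric: what one million generated tokens actually cost.
        tokens_per_hour = self.tokens_per_second * 3600
        return self.hourly_cost_usd / tokens_per_hour * 1_000_000


# Two made-up nodes: A is cheaper per FLOP, B sustains far higher throughput.
nodes = [
    ServingNode("A", hourly_cost_usd=10.0, peak_tflops=1000.0, tokens_per_second=2000.0),
    ServingNode("B", hourly_cost_usd=30.0, peak_tflops=2000.0, tokens_per_second=10000.0),
]

for n in nodes:
    print(f"{n.name}: {n.tflops_per_dollar():.1f} TFLOPS per dollar-hour, "
          f"${n.usd_per_million_tokens():.2f} per 1M tokens")
```

With these made-up inputs, node A buys more peak compute per dollar, while node B produces each token at 60% of A's cost. That gap is what the cost-per-token framing is meant to expose.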

Google Other High Signal 2026-04-03

Google Launches Gemma 4 Open Models, Targeting Edge Inference and AI Agent Architecture

Google introduces the Gemma 4 open model family in four sizes, from 2B to 31B parameters, released under the Apache 2.0 license. The family emphasizes intelligence per parameter and native support for agentic workflows, multimodality, long context windows, and 140+ languages. The small models are engineered for edge devices, aiming to bring frontier-level reasoning to mobile and IoT scenarios.

NVIDIA Other High Signal 2026-03-12

NVIDIA Launches Nemotron 3 Super for Agentic AI Inference Optimization

NVIDIA releases Nemotron 3 Super, a 120B-parameter model with a hybrid Mamba-Transformer mixture-of-experts (MoE) architecture that delivers a 5x throughput improvement. It is designed for multi-agent workflows, with a 1M-token context window intended to prevent task drift. Open weights and cloud deployment lower the barriers to enterprise adoption.
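
The routing step that makes an MoE layer sparse can be sketched in a few lines. This is a generic top-k gating example in pure Python, not Nemotron 3 Super's actual router: the expert count, `k=2`, the toy experts, the fixed router logits and the function names are all hypothetical, and the Mamba and attention layers of the hybrid design are not modeled.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[List[float]], List[float]]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in chosen)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_layer(x: List[float], router_logits: Sequence[float],
              experts: Sequence[Expert], k: int = 2) -> List[float]:
    """Weighted sum of the outputs of only the k routed experts."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)  # the unselected experts are never evaluated
        for j, v in enumerate(y):
            out[j] += weight * v
    return out


# Toy experts: each simply scales its input (hypothetical stand-ins for FFN blocks).
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
# In a real model these logits come from a learned router applied to x.
router_logits = [0.5, 2.0, -1.0, 1.0]

route = top_k_route(router_logits, k=2)  # selects experts 1 and 3
print(route)
print(moe_layer([1.0, 1.0], router_logits, experts, k=2))
```

Only k of the N experts run for each token, which is why an MoE model's per-token compute tracks its activated parameters rather than its total parameter count.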

NVIDIA Other 2025-06-01

NVIDIA RTX Spark and Nemotron-3 Ultra: AI Control Shifts from Cloud to Personal Edge

NVIDIA launched the RTX Spark personal AI supercomputer (co-developed with MediaTek) and the Nemotron-3 Ultra open-source model at GTC Taipei 2026. The N1X chip delivers 1 PFLOPS of local AI compute, bringing LLM inference to PCs. This marks NVIDIA's pivot from cloud GPU vendor to edge AI infrastructure monopolist and redefines the PC as an AI-native device.

Huawei Other

Huawei Ascend 910C Trains 1.6T-Parameter MoE Model: First Full Pipeline on Domestic AI Chips

Huawei, in collaboration with research institutes, completed full-parameter post-training of DeepSeek-V4-Pro (1.6 trillion parameters, MoE) on an Ascend 910C cluster. Key metrics: 1,500 stable training steps on 1,000 cards, 30% compute utilization, a 14% gain in operator efficiency, and zero reliance on foreign GPUs. This marks the first end-to-end trillion-parameter training loop on domestic chips.
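
The report does not define "compute utilization". Assuming it means model FLOPs utilization (MFU), it can be estimated with the common rule of thumb of about 6 FLOPs per activated parameter per training token, which ignores attention-score FLOPs. Apart from the 1,000-card count, every input below (activated parameters, tokens per step, step time, per-chip peak) is a hypothetical placeholder chosen so that the result lands near the reported 30%. They are not the configuration Huawei used, and `training_mfu` is a hypothetical helper.

```python
def training_mfu(active_params: float, tokens_per_step: float, step_time_s: float,
                 num_chips: int, peak_flops_per_chip: float) -> float:
    """Model FLOPs utilization for one training step.

    Uses the ~6 FLOPs per activated parameter per token rule of thumb
    (about 2 for the forward pass, 4 for the backward pass). For an MoE
    model only the activated parameters count, not the total.
    """
    achieved_flops_per_s = 6 * active_params * tokens_per_step / step_time_s
    peak_flops_per_s = num_chips * peak_flops_per_chip
    return achieved_flops_per_s / peak_flops_per_s


# Hypothetical inputs, for illustration only.
mfu = training_mfu(
    active_params=50e9,          # activated parameters per token (assumed)
    tokens_per_step=4_000_000,   # global batch size in tokens (assumed)
    step_time_s=10.0,            # wall-clock time per optimizer step (assumed)
    num_chips=1000,              # the 1,000-card cluster from the report
    peak_flops_per_chip=400e12,  # dense peak FLOPS per chip (assumed)
)
print(f"MFU = {mfu:.1%}")
```

Because only activated parameters enter the formula, the same 30% figure would imply very different absolute throughput depending on how many of the 1.6T parameters each token actually touches.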

Research Other

Z.ai GLM-5.2 Open-Source: 744B MoE, 1M Context, MIT License as Geopolitical Shield

Z.ai releases GLM-5.2, a 744B-parameter MoE model with 40B activated parameters, a 1M-token input context, and up to 131K output tokens, under the MIT license. Released one day after the government takedown of Anthropic Fable 5, it offers a downloadable, unbannable alternative with Anthropic API compatibility for zero-code migration, giving enterprises a sovereign AI option.
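
The gap between total and activated parameters is what makes a 744B model practical to serve. The arithmetic below uses only the 744B and 40B figures from the announcement plus the standard rough estimate of about 2 forward-pass FLOPs per activated parameter per generated token. The byte-per-parameter precisions are assumptions, and attention and KV-cache costs are ignored.

```python
TOTAL_PARAMS = 744e9  # total parameters (from the announcement)
ACTIVE_PARAMS = 40e9  # parameters activated per token (from the announcement)

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rough forward-pass cost: ~2 FLOPs per parameter touched per token.
moe_flops_per_token = 2 * ACTIVE_PARAMS
dense_flops_per_token = 2 * TOTAL_PARAMS  # a hypothetical dense model of the same size

print(f"active fraction: {active_fraction:.1%}")
print(f"GFLOPs per token: MoE {moe_flops_per_token / 1e9:.0f} "
      f"vs dense {dense_flops_per_token / 1e9:.0f}")
print(f"compute reduction: {dense_flops_per_token / moe_flops_per_token:.1f}x")

# Weight memory still scales with TOTAL parameters: every expert must be stored.
for label, bytes_per_param in (("BF16", 2), ("FP8", 1)):  # assumed precisions
    print(f"{label} weights: ~{TOTAL_PARAMS * bytes_per_param / 1e9:.0f} GB")
```

Per-token compute therefore resembles that of a roughly 40B dense model, while downloading and hosting the weights still means handling all 744B parameters.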

NVIDIA Other

SGLang 0.5.13 Delivers 25x MoE Inference Speedup via Predictive Routing and Sparse KV Cache

SGLang 0.5.13 introduces two-stage MoE routing prediction and a sparse KV cache, achieving a 25x inference speedup on NVIDIA GB300 NVL72. Benchmarks on A100 show a 65% throughput gain, 40% lower latency, and 62% lower routing overhead. The optimization directly targets the core bottleneck of MoE inference and could reshape AI inference economics.
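
The announcement does not describe how SGLang implements its sparse KV cache. One common way to make decode-time attention sparse is to score the cached keys against the current query but aggregate values from only the top-k positions. The pure-Python sketch below shows that generic idea under this assumption. `sparse_decode_attention` is a hypothetical name, not an SGLang API, and the toy vectors are made up.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sparse_decode_attention(query: Vector, keys: Sequence[Vector],
                            values: Sequence[Vector], k: int) -> Vector:
    """Single-query attention over a KV cache, restricted to the top-k keys.

    With k >= len(keys) this reduces to ordinary dense softmax attention.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    # Keep only the k cached positions with the highest scores.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)  # subtract max for numerical stability
    weights = [math.exp(scores[i] - m) for i in top]
    z = sum(weights)
    out = [0.0] * len(values[0])
    for i, w in zip(top, weights):
        for j, v in enumerate(values[i]):
            out[j] += (w / z) * v
    return out


# Toy cache of 4 positions with 2-dimensional heads (hypothetical data).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [5.0, 5.0]]
query = [2.0, 1.0]

print(sparse_decode_attention(query, keys, values, k=2))          # sparse
print(sparse_decode_attention(query, keys, values, k=len(keys)))  # dense
```

This toy version still reads every key, so it only saves work on the value side. The savings in a real system depend on how candidate positions are selected, which the announcement does not specify.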
