OpenAI
2026-06-20
Technology Integration | Impact: Major | Conf: 75%

Subquadratic Claims Break from Quadratic Attention: Independent Benchmarks Report 52x Speedup at 1M Tokens

Summary

Miami startup Subquadratic has released independent benchmarks for its SubQ model, claiming it runs 52x faster than FlashAttention at 1M tokens and cuts compute by up to 1000x through Subquadratic Sparse Attention (SSA). Skeptics question whether SubQ is a fine-tune of an existing model; the full architecture remains unpublished.

Key Takeaways

Subquadratic emerged from stealth claiming the first LLM without quadratic attention, with a 12M token context window. Its core innovation, Subquadratic Sparse Attention (SSA), dynamically selects only the relevant subset of tokens for each query and performs exact attention on that sparse set. The selection mechanism is itself said to be sublinear, unlike the quadratic indexer in DeepSeek Sparse Attention.
Independent benchmarks show SubQ running 52x faster than FlashAttention at 1M tokens, with up to 1000x compute reduction at 12M tokens. Reported scores: RULER 128K 95%; MRCR v2 at 1M 65.9%; SWE-Bench Verified 81.8%. The API currently offers a 1M token window; the 12M window is research-only.
Key skepticism: AI engineer Will Depue suggests SubQ is a sparse fine-tune of Kimi or DeepSeek, which would mean base training costs remain quadratic. There is no formal rebuttal, no peer-reviewed paper, and no MMLU or GPQA benchmark.
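To make the select-then-attend idea concrete, here is a minimal Python sketch of generic top-k sparse attention for a single query. It is not Subquadratic's implementation, which is unpublished; the function names, the dot-product selector, and the budget k are assumptions made for illustration. This toy selector scores every key, so it is linear per query, whereas SSA's selector is claimed to be sublinear.

```python
import math


def dense_attention(q, keys, values):
    """Exact softmax attention for one query over the given keys and values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def select_top_k(q, keys, k):
    """Hypothetical selector: brute-force top-k by dot product.

    Assumption for illustration only. It touches every key, so it is
    linear per query and does NOT reproduce a sublinear selector.
    """
    ranked = sorted(range(len(keys)),
                    key=lambda i: sum(a * b for a, b in zip(q, keys[i])),
                    reverse=True)
    return sorted(ranked[:k])  # keep original token order


def sparse_attention(q, keys, values, k):
    """Exact attention restricted to the k selected tokens."""
    idx = select_top_k(q, keys, k)
    return dense_attention(q, [keys[i] for i in idx], [values[i] for i in idx])


# Tiny example: four tokens, two-dimensional keys, one-dimensional values.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 0.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
full = dense_attention(q, keys, values)
approx = sparse_attention(q, keys, values, k=2)
```

When k equals the number of tokens the sparse result matches dense attention; with a smaller k, the output is exact attention over the selected subset only, so quality depends entirely on how well the selector picks tokens.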

Why It Matters

Hidden control shift: SSA's selection mechanism may degrade from sublinear to near-linear at extreme context lengths, moving the bottleneck from the attention matrix to indexing, a subtle transfer of the control point.
Encirclement of GPU vendors: Sparse attention breaks alignment with NVIDIA's Tensor Core optimizations, pushing inference toward general-purpose CPUs or custom sparse accelerators.
Concealed training cost: If SubQ is a fine-tune, training remains quadratic; enterprises that adopt it on the basis of inference cost may face lock-in without full-stack efficiency. The lack of a paper means there is no independent verification of the training phase, a critical transparency gap.
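The bottleneck shift can be illustrated with a back-of-envelope cost model. The Python sketch below is an illustration under stated assumptions, not a measurement of SubQ: the head dimension d, the per-query selection budget k, and the selector cost are hypothetical, and "near-linear" is read here as selector cost per query growing in proportion to context length.

```python
def dense_cost(n, d):
    """Approximate multiply-adds for dense attention: scores plus value mixing."""
    return 2 * n * n * d


def sparse_cost(n, k, d, selector_cost_per_query):
    """Exact attention over k selected tokens per query, plus selection cost."""
    return n * (2 * k * d + selector_cost_per_query)


def reduction(n, k, d, selector_cost_per_query=0):
    """How many times cheaper the sparse scheme is than dense attention."""
    return dense_cost(n, d) / sparse_cost(n, k, d, selector_cost_per_query)


n = 12_000_000  # context length from the article
k = 12_000      # hypothetical number of attended tokens per query
d = 128         # hypothetical head dimension

# Idealized case: selection is free, so the reduction is simply n / k.
free_selector = reduction(n, k, d)

# Degraded case: the selector does one operation per context token per query.
linear_selector = reduction(n, k, d, selector_cost_per_query=n)
```

With a free selector the reduction is n / k, so a 1000x figure at 12M tokens would correspond to roughly 12,000 attended tokens per query under these assumptions. If the selector instead costs one operation per context token per query, the same settings give a reduction of about 204x, which is the kind of erosion the indexing concern describes.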

PRO Decision

【Vendors (competitors)】: NVIDIA should invest in native sparse attention hardware (e.g., Hopper Next sparse Tensor Cores) and push the FlashAttention team to reproduce the SSA benchmarks. Anthropic and OpenAI must integrate similar sparse attention into next-gen models (Claude 4, GPT-6) or acquire Subquadratic to neutralize the threat.
【Enterprises】: CIOs and architects must demand the full training compute graph (pre-training FLOPs) and independent MMLU/GPQA benchmarks from Subquadratic. Avoid procurement based on a single benchmark; prefer open-source alternatives (Mamba, linear attention) for multi-cloud portability.
【Investors】: Look past the PR: Subquadratic's $500M valuation hinges on unverified training efficiency. Wait for an arXiv paper or independent replication. Long term, bet on sparse hardware startups (Groq, Cerebras) and linear attention architectures (RWKV, Mamba) for transparent technology paths.

Source: WWWhatsNew / WSJ