What is the impact level of this intelligence?

This intelligence is assessed as having Minor impact on enterprise technology decisions.

NVIDIA 2026-06-16

Product Launch | Impact: Minor | Confidence: 85%

Lexar Offloads AI Models to SSD: DRAM Cut 40%, Latency Remains Hurdle


Summary

Lexar unveils the AI Storage Core SSD, which pairs a custom DRAM-less SPU controller with a software stack that offloads LLMs to NAND flash. In Lexar's internal tests it runs Qwen 3.5 122B on 32GB DRAM at 15.6 tokens/s, 3x the 5.2 tokens/s of a traditional 128GB setup, but a TTFM (time to first token) of 2-8 seconds hinders real-time use.

Key Takeaways

Lexar's AI Storage Core SSD uses a proprietary DRAM-less SPU (Storage Processing Unit) controller and software stack to offload LLMs to NAND flash, sharply reducing DRAM requirements. In Lexar's internal tests, Qwen 3.5 122B reaches 15.6 tokens/s on 32GB DRAM, versus 5.2 tokens/s on 128GB DRAM with a traditional setup. For 122B on 32GB, llama.cpp crashes while Lexar delivers 4.4 tokens/s. With 64GB DRAM and a 256K context, only the Lexar setup runs, at 19.3 tokens/s.

However, TTFM (time to first token, more commonly abbreviated TTFT) is 2 seconds at 2K context and 6-8 seconds at 4K. Lexar claims support for models of up to 400B parameters, though at very low throughput. The SSD uses PCIe Gen5 in a hot-swappable M.2 form factor, but latency remains a fundamental limit of NAND relative to DRAM.
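To make the latency trade-off concrete, here is a rough back-of-the-envelope sketch in Python. The function names are hypothetical, and pairing the 2-second and 8-second TTFM figures with the 15.6 tokens/s throughput is an assumption for illustration: the source does not say these numbers come from the same test configuration.

```python
def speedup(new_tokens_per_s: float, old_tokens_per_s: float) -> float:
    """Throughput ratio between two setups."""
    return new_tokens_per_s / old_tokens_per_s


def response_time_s(ttfm_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Estimated wall-clock time for a full reply.

    The first token arrives after `ttfm_s`; the remaining tokens are
    assumed to stream at a steady `tokens_per_s` (a simplification).
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    return ttfm_s + (output_tokens - 1) / tokens_per_s


if __name__ == "__main__":
    # Figures quoted in the article: 15.6 tokens/s vs 5.2 tokens/s.
    print(f"{speedup(15.6, 5.2):.1f}x")                  # 3.0x
    # Assumed pairing of TTFM and throughput, 157-token reply.
    print(f"{response_time_s(2.0, 15.6, 157):.1f} s")    # 12.0 s
    print(f"{response_time_s(8.0, 15.6, 157):.1f} s")    # 18.0 s
```

Under these assumptions the user waits 2 to 8 seconds before anything appears, which is the part that hurts interactive use; the streaming time after the first token depends only on throughput.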

Why It Matters

Lexar's move is a defensive play against DRAM incumbents (Samsung, SK Hynix) and a flanking maneuver against NVIDIA's GPU memory dependency. The control point shifts from DRAM capacity to the SPU controller and software stack, creating storage-ecosystem lock-in: users must stay with Lexar's proprietary hardware and optimizations.

The TTFM of 2-8 seconds is downplayed but fatal for real-time AI such as chat and other interactive inference. PCIe Gen5 bandwidth limits and the unknown tail latency of the SPU controller add further risk. The announcement offers no comparison with NVMe over Fabrics or other offloading methods. The solution trades latency for cost, which makes it suitable only for batch or offline workloads.

PRO Decision

Vendors: Competitors (Samsung, WD, Micron) should develop standardized AI offloading on NVMe ZNS or Open-Channel SSDs with native framework compatibility, avoiding proprietary SPU lock-in, and invest in low-latency NAND (SLC cache, ZC-Roller) to outperform Lexar.

Enterprises: CIOs must conduct a zero-trust audit, measuring TTFM and tail latency under real workloads. Beware of SPU controller depreciation and demand cross-platform portability. Prefer DRAM expansion or GPU memory offloading unless the workload is batch-tolerant.

Investors: Lexar's technology is a survival move for storage vendors, not a breakthrough. Watch for OEM deals and patents, but note the long-term risks: a latency ceiling and standardized counterattacks from rivals. If TTFM is still above 1 second two years from now, the product becomes niche.
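The audit recommended to enterprises above (measure TTFM and tail latency under real workloads) could be sketched roughly as follows. This is a minimal Python illustration, not a Lexar tool: `make_stream` is a hypothetical callable that issues one request and returns an iterator of tokens, and `fake_stream` stands in for a real inference client.

```python
import math
import time
from typing import Callable, Dict, Iterable, Iterator, List


def time_to_first_token(
    make_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Seconds from issuing the request until the first token arrives."""
    start = clock()
    for _ in make_stream():  # request setup is included in the timing
        return clock() - start
    raise ValueError("stream produced no tokens")


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def audit(make_stream: Callable[[], Iterable[str]], runs: int = 50) -> Dict[str, float]:
    """Repeat the request and summarize median and tail TTFM."""
    samples = [time_to_first_token(make_stream) for _ in range(runs)]
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streaming inference call."""
    time.sleep(0.01)  # pretend prefill delay
    yield "hello"
    yield "world"


if __name__ == "__main__":
    for name, seconds in audit(fake_stream, runs=20).items():
        print(f"{name}: {seconds * 1000:.1f} ms")
```

Reporting p95 and p99 alongside the median matters here because the article flags the SPU controller's tail latency as unknown; an average alone would hide occasional multi-second stalls. For a meaningful result, the requests should use production-like prompts and context lengths.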

Source: TechPowerUp
