Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse

Cloudflare 2026-05-14

Cloudflare's Trio of Patches Breaks ClickHouse Partition Bloat Lock Contention

Summary

Cloudflare's billing pipeline slowed after a ClickHouse partitioning change from day to (namespace, day) made part counts explode, which caused heavy lock contention during query planning. Three patches (a shared lock, a deferred vector copy, and a binary search over sorted parts) cut query latency by more than 50% and decoupled performance from part count.

Key Takeaways

Cloudflare stores over 100 PB in ClickHouse. Its Ready-Analytics system uses a single massive table that was originally partitioned by day, but per-namespace retention requirements forced a change to (namespace, day) partitioning in January 2025. No impact was expected, because queries already filter by namespace, yet billing jobs slowed in March 2025.

Investigation showed that the total part count had grown to 160k per replica, causing severe contention on the MergeTreeData mutex: every query planner thread acquired an exclusive lock and copied the entire parts list. CPU flame graphs attributed 45% of CPU time to filterPartsByPartition, but real-time traces showed that more than 50% of query duration was spent waiting for that mutex.

Three patches fixed this: 1) switching to std::shared_lock so that reads can proceed concurrently; 2) deferring the vector copy by using a shared cache; 3) using binary search over parts sorted by namespace to skip irrelevant parts. Query latency dropped by more than 50% and no longer scales with part count. The patches were merged upstream (PR #85535, v25.11).
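The three ideas above can be illustrated with a small, self-contained C++17 sketch. This is not ClickHouse code and not the content of the upstream patches: `Part`, `PartRegistry`, and `partsForNamespace` are hypothetical names invented for illustration, and the "shared cache" is approximated here by an immutable snapshot behind a `std::shared_ptr`, which is an assumption about one way to avoid copying the parts list on every read.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data part; not ClickHouse's real type.
struct Part {
    std::string ns;  // namespace component of the partition key
    int day;         // day component of the partition key
};

using PartsPtr = std::shared_ptr<const std::vector<Part>>;

// Hypothetical registry of parts, kept sorted by (ns, day).
class PartRegistry {
public:
    // Writer path: exclusive lock, then publish a new immutable snapshot.
    // Copying on write is a simplification chosen for this sketch.
    void addPart(Part part) {
        std::unique_lock lock(mutex_);
        auto next = std::make_shared<std::vector<Part>>(*snapshot_);
        auto pos = std::upper_bound(next->begin(), next->end(), part, less);
        next->insert(pos, std::move(part));
        snapshot_ = std::move(next);
    }

    // Reader path: a shared lock lets many readers enter at once (idea 1),
    // and only a pointer is copied, never the whole vector (idea 2).
    PartsPtr snapshot() const {
        std::shared_lock lock(mutex_);
        return snapshot_;
    }

private:
    static bool less(const Part& a, const Part& b) {
        return std::tie(a.ns, a.day) < std::tie(b.ns, b.day);
    }

    mutable std::shared_mutex mutex_;
    PartsPtr snapshot_{std::make_shared<std::vector<Part>>()};
};

// Idea 3: because parts are sorted by namespace, the parts of one namespace
// form a contiguous range that two binary searches can locate in O(log n),
// instead of scanning every part.
std::vector<Part> partsForNamespace(const std::vector<Part>& sorted,
                                    const std::string& ns) {
    auto first = std::lower_bound(
        sorted.begin(), sorted.end(), ns,
        [](const Part& p, const std::string& key) { return p.ns < key; });
    auto last = std::upper_bound(
        first, sorted.end(), ns,
        [](const std::string& key, const Part& p) { return key < p.ns; });
    return std::vector<Part>(first, last);
}
```

A reader would call `registry.snapshot()` once and then `partsForNamespace(*snap, "some-namespace")` on the result, so the lock is held only long enough to copy a pointer. The cost of a lookup then depends on the logarithm of the total part count plus the number of matching parts, which is the kind of decoupling from part count the summary describes.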

Source: blog
