Cloudflare
2026-05-14
Technology Integration | Impact: Important | Strength: Medium | Conf: 95%

Cloudflare Optimizes ClickHouse Partitioning, Reveals Hidden Bottlenecks in Massive-Scale Data Architecture

Summary

Cloudflare resolved a critical performance degradation in its billing pipeline that was caused by a partitioning change in its petabyte-scale ClickHouse analytics platform. Deep performance profiling traced the slowdown to lock contention and vector copying in the query planner. The company contributed three optimization patches upstream, significantly improving query performance in high-concurrency, high-partition-count scenarios.

Key Takeaways

Cloudflare's 'Ready-Analytics' platform serves hundreds of internal applications from a single massive table holding over 2 PiB of data. To enable per-namespace data retention policies, the team changed the partition key from `(day)` to `(namespace, day)`, which sharply increased the number of data parts in the table.
The problem first surfaced as slower billing aggregation jobs, while standard metrics (I/O, memory) looked normal. Flame graphs built from ClickHouse's built-in `trace_log` showed 45% of CPU time spent filtering partition IDs. 'Real' (wall-clock) flame graphs then showed that over half the query duration was spent waiting on a mutex in `MergeTreeData` that protects the list of all parts.
Cloudflare implemented three progressive optimizations: 1) Changing the query planner's exclusive lock to a shared lock to eliminate contention; 2) Avoiding copying the entire parts vector by implementing a shared cache; 3) Introducing binary search for namespace filtering, leveraging the sorted partition key. These patches were contributed upstream to ClickHouse (PR #85535).

Why It Matters

The incident exposes unexpected performance pitfalls where data partitioning strategy meets database kernel internals in ultra-large-scale, multi-tenant analytics platforms. Cloudflare's fixes offer a reference for handling lock contention and planner efficiency in high-concurrency, high-partition-count scenarios, and the upstream patches help mature open-source OLAP databases for heavy enterprise workloads.
Source: blog