Cloudflare
2026-05-28
Architecture Shift | Impact: Major | Strength: High | Conf: 85%

Cloudflare Unveils Unified Data Platform and AI Agent Architecture, Demonstrating a Closed-Loop, Cloud-Native Data Stack

Summary

Cloudflare detailed how it built Town Lake, its internal unified data platform, and Skipper, an AI data agent. Town Lake combines Apache Trino, R2 (with Apache Iceberg tables), and DataHub to provide unified SQL access to disparate data sources. Skipper answers natural-language queries and is deeply integrated with Cloudflare's own products (Workers AI, R2, and others).

Key Takeaways

To address internal data sprawl, reliance on sampled data, external dependencies, and data discovery challenges, Cloudflare built the unified data platform Town Lake. At its core, the Apache Trino query engine runs federated queries across Postgres, ClickHouse, and Apache Iceberg tables stored on R2. R2 Data Catalog (Iceberg) manages the data lifecycle, DataHub serves as the metadata catalog, and Lifeguard provides dynamic access control backed by D1.
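To make "federated queries" concrete: Trino addresses every table as catalog.schema.table, so a single statement can join tables that live in different systems. The minimal Python sketch below only builds such a query string. The catalog names (pg, ch, lake) and every schema, table, and column name are hypothetical, since the source does not describe Town Lake's actual catalogs.

```python
# Hypothetical Trino catalog names for each backing store (not from the source).
CATALOGS = {
    "postgres": "pg",
    "clickhouse": "ch",
    "iceberg": "lake",  # Iceberg tables stored on R2
}


def qualified(source: str, schema: str, table: str) -> str:
    """Return a Trino-style catalog.schema.table identifier."""
    return f"{CATALOGS[source]}.{schema}.{table}"


def federated_join_sql() -> str:
    """Build one query that joins tables held in three different systems."""
    accounts = qualified("postgres", "billing", "accounts")
    usage = qualified("clickhouse", "analytics", "daily_requests")
    signups = qualified("iceberg", "product", "signups")
    return (
        "SELECT a.account_id, a.plan, s.signup_date, u.day, u.requests\n"
        f"FROM {accounts} AS a\n"
        f"JOIN {usage} AS u ON u.account_id = a.account_id\n"
        f"JOIN {signups} AS s ON s.account_id = a.account_id"
    )


if __name__ == "__main__":
    print(federated_join_sql())
```

Executing the string would require a Trino connection, for example through the open-source `trino` Python client's DB-API interface (`trino.dbapi.connect(...)`, then `cursor.execute(sql)`); connection details are omitted here.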
The platform adopts a 'default-closed' governance model: every new table must be scanned by Skimmer (a PII detector built on Workers AI) and manually reviewed before it can be queried, and PII columns are redacted by default. Separately, Transformer is an ELT (extract, load, transform) engine built on Workflows.
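The default-closed rule can be modeled as a small state machine: a table stays unqueryable until it has been scanned and then approved, and any column the scan flags as PII is masked unless the caller holds an explicit grant. This is a minimal Python sketch under those assumptions. TableEntry, visible_columns, and the '[REDACTED]' placeholder are hypothetical names, and neither Skimmer's actual detection nor Lifeguard's access checks are modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class TableState(Enum):
    NEW = "new"            # registered, not yet scanned for PII
    SCANNED = "scanned"    # scan finished, waiting for human review
    APPROVED = "approved"  # reviewed; only now queryable


@dataclass
class TableEntry:
    name: str
    state: TableState = TableState.NEW
    pii_columns: set[str] = field(default_factory=set)

    def record_scan(self, detected_pii: set[str]) -> None:
        """Store the PII columns reported by a scanner (detection itself is out of scope)."""
        self.pii_columns = set(detected_pii)
        self.state = TableState.SCANNED

    def approve(self) -> None:
        """Human review step; only allowed after a scan."""
        if self.state is not TableState.SCANNED:
            raise ValueError(f"{self.name}: cannot approve before a scan")
        self.state = TableState.APPROVED


def visible_columns(table: TableEntry, requested: list[str], pii_grants: set[str]) -> list[str]:
    """Return the SELECT list a caller may run; unapproved tables are denied outright."""
    if table.state is not TableState.APPROVED:
        raise PermissionError(f"{table.name} is not approved for querying")
    projection = []
    for col in requested:
        if col in table.pii_columns and col not in pii_grants:
            projection.append(f"'[REDACTED]' AS {col}")  # masked by default
        else:
            projection.append(col)
    return projection
```

Keeping the state check inside visible_columns, rather than in each caller, is what makes the model closed by default: forgetting to approve a table denies access instead of leaking data.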
Skipper, the AI agent built on top of Town Lake, accepts natural-language questions. It draws on several layers of context (DataHub metadata, human annotations, code-derived knowledge, curated data models, and live Trino introspection) to generate and execute SQL, then returns the results as charts or dashboards. Skipper's tools are exposed through an MCP server or through 'Code Mode', which executes JavaScript in a Worker isolate.
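One plausible way an agent like Skipper could combine its context layers is to collect snippets per table and emit them in priority order until a size budget is spent. The Python sketch below assumes that approach. The layer identifiers mirror the list above, but the priority order, the character budget, and the Snippet/build_context names are hypothetical and not taken from Cloudflare's description.

```python
from dataclasses import dataclass

# Layer names follow the article; this priority order is a hypothetical choice.
LAYER_ORDER = [
    "curated_models",
    "human_annotations",
    "datahub_metadata",
    "code_knowledge",
    "trino_introspection",
]


@dataclass
class Snippet:
    layer: str
    table: str
    text: str


def build_context(snippets: list[Snippet], table: str, budget_chars: int = 2000) -> str:
    """Assemble context for one table, highest-priority layers first, within a size budget."""
    rank = {name: i for i, name in enumerate(LAYER_ORDER)}
    # Keep only snippets for this table that belong to a known layer.
    relevant = [s for s in snippets if s.table == table and s.layer in rank]
    relevant.sort(key=lambda s: rank[s.layer])  # stable: input order kept within a layer
    parts, used = [], 0
    for s in relevant:
        line = f"[{s.layer}] {s.text}"
        if used + len(line) + 1 > budget_chars:  # +1 for the joining newline
            break  # lower-priority layers are dropped once the budget is spent
        parts.append(line)
        used += len(line) + 1
    return "\n".join(parts)
```

Putting curated and human-written context first means that, when the budget is tight, automatically extracted context is the first to be dropped.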

Why It Matters

This signals a systematic shift in who controls the data platform. Control is moving away from dispersed DBAs, analysts, and their specialized tooling and knowledge (SQL, pipeline internals) toward a unified platform layer and an AI interaction layer built from the cloud vendor's native services (R2, Workers AI). Value shifts from managing complexity and specialized skills to providing a secure, easy-to-use, intelligent interface for consuming data. By 'eating its own dog food,' Cloudflare demonstrates that its product stack can support enterprise-grade data and AI platforms. This is more than an internal efficiency gain: it validates the depth and completeness of Cloudflare's cloud service strategy and positions the company to help define the future enterprise intelligent data plane.

PRO Decision

[Vendors] Other cloud vendors (AWS, Azure, GCP) and data platform players (Snowflake, Databricks) need to evaluate how deeply and natively their AI agents integrate with their underlying data services. Cloudflare has set a benchmark with a closed-loop example driven by internal needs and built entirely on its own stack, raising the competitive bar.
[Enterprises] Enterprise technology decision-makers should re-evaluate their data architecture roadmaps and watch cloud vendors' 'integrated data platform with AI agent' offerings. These may simplify operations but deepen vendor lock-in, so teams must trade off efficiency against control.
[Investors] Investors should focus on vendors that can deeply and natively integrate AI agent capabilities into their core product stacks. This 'self-contained loop' capability could be a key differentiator that helps cloud providers expand their TAM and increase customer stickiness.

Source: blog