The Iceberg ecosystem today (Anders Swanson)
What can data teams realistically expect when attempting to run on top of Iceberg in production?
The data industry is moving toward open standards. The shift is happening rapidly across the ecosystem, even as AI and agents suck most of the oxygen out of the room.
The dbt Labs data team is moving to an all-Iceberg lake with a mix of compute engines to power transformation, analytics, and agentic experiences. The team has been able to move quickly toward this architecture because the entire ecosystem has been laying the groundwork for years, and all of it is now coming together to make this new open world a reality, fast.
On this episode, Tristan discusses the reality on the ground for data practitioners. Where’s the Iceberg ecosystem today? What can practitioners realistically expect when attempting to run on top of Iceberg in production?
Tristan is joined by Anders Swanson, a developer experience advocate at dbt Labs. Anders has spent a lot of time over the years navigating open-source data ecosystems and tracking their progress.
They unpack the open standards shift, define the core building blocks (query engines, object stores, catalogs), and dig into why external catalogs have become a fourth namespace tier across platforms. Anders outlines a pragmatic, phased adoption model for Iceberg integrations, explains why metadata performance and resiliency are hard requirements, and clarifies why vended credentials exist and what they solve.
Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.
The call for papers is open for dbt Summit 2026. We invite data practitioners, platform leaders, and executives to share real stories of how data gets done at the world’s largest gathering of dbt community members. If you’ve shipped fast, reduced costs, improved trust, or brought governed AI to life, the dbt community wants to hear from you.
Coalesce is now dbt Summit. Join the world’s largest gathering of dbt users, where data leaders and practitioners come together to shape the future of data analytics and AI.
Key takeaways
Tristan Handy: I wanted to have you on because of work you’ve been doing internally to summarize the state of the Iceberg ecosystem. We’ve talked about Iceberg a bunch lately with folks deep in specific parts. Your work is more of an overview: where we’re at with platform integrations, what’s easier now than a year ago, and what’s still hard. Before we dive in, I want to define a few terms. When you say “query engine,” what do you mean?
Anders Swanson: It’s the thing that does your work. When you issue a CREATE TABLE or a SELECT statement, it’s what returns data or stores it somewhere for later.
Object store.
It’s the cloud service where you can store an object. An object can be anything: a blob of bytes.
Catalog.
In this context, a catalog knows what tables and views exist and where they are, and how you can fetch or write to them.
Let’s talk internal versus external catalogs.
An internal catalog is what you get by default in a system like Snowflake or SQL Server. An external catalog is more like another directory, often managed by a different system. As you connect more disparate platforms, you can’t assume one system controls everything.
The complexity comes from duplication. How do you make namespaces unique? Can you plug in many external catalogs?
Abstraction matters. A common pattern emerging is a one-to-one mapping of an external catalog into a database. That pushes a move to a four-part namespace: catalog, database, schema, identifier. Spark moved toward this; Databricks Unity Catalog and Snowflake-style catalog-linked databases are in this family.
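The four-part namespace idea above can be sketched in a few lines. This is a hypothetical illustration: the default names and left-padding rule are made up for the example, not any engine's actual resolution rules.

```python
# Hypothetical sketch of four-part name resolution:
# catalog.database.schema.identifier. Defaults are illustrative only.

from dataclasses import dataclass

@dataclass(frozen=True)
class QualifiedName:
    catalog: str
    database: str
    schema: str
    identifier: str

def resolve(name: str, *, default_catalog: str = "internal",
            default_database: str = "main",
            default_schema: str = "public") -> QualifiedName:
    """Fill in missing leading parts of a dotted identifier."""
    parts = name.split(".")
    if len(parts) > 4:
        raise ValueError(f"too many name parts: {name!r}")
    # Pad from the left: a bare table name inherits all defaults.
    defaults = [default_catalog, default_database, default_schema]
    padded = defaults[: 4 - len(parts)] + parts
    return QualifiedName(*padded)

# An external catalog mapped one-to-one into the "catalog" slot still
# resolves through the same four-part scheme as native tables.
print(resolve("orders"))
print(resolve("lake.analytics.public.orders"))
```

The point of the mapping is that a fully qualified external table and a bare internal table name flow through one resolution path.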
So the downside?
The devil is in the details, especially metadata performance and resiliency. Take information schema listing, for example: users expect listing tables to be fast and reliable. In a federated world, if listing tables takes five seconds, users blame the vendor they’re using, even if the external system is the slow one. DuckDB draws a line by not mixing external catalog tables into information schema listing today. Snowflake’s catalog-linked databases appear to cache or mirror metadata so it feels as performant as native tables.
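The caching/mirroring trade-off described above can be sketched as a TTL cache in front of a slow external listing. Everything here is a toy: the class, the TTL value, and the stand-in "remote" call are all invented for illustration.

```python
# Toy sketch of metadata mirroring: cache an external catalog's table
# listing with a TTL so information-schema-style queries stay fast even
# when the remote catalog is slow. Entirely illustrative.

import time

class CachedCatalogListing:
    def __init__(self, fetch_tables, ttl_seconds=60.0):
        self._fetch = fetch_tables      # the slow external-catalog call
        self._ttl = ttl_seconds
        self._cached = None
        self._fetched_at = 0.0

    def list_tables(self):
        now = time.monotonic()
        if self._cached is None or now - self._fetched_at > self._ttl:
            self._cached = self._fetch()    # refresh the local mirror
            self._fetched_at = now
        return self._cached

calls = []
def slow_remote_listing():
    calls.append(1)                     # stands in for a five-second call
    return ["lake.analytics.orders", "lake.analytics.customers"]

listing = CachedCatalogListing(slow_remote_listing, ttl_seconds=60.0)
listing.list_tables()
listing.list_tables()                   # served from cache, no remote call
print(len(calls))  # 1
```

The cache keeps listing fast, but it also introduces the staleness questions the transcript hints at: a mirrored listing can lag the external catalog by up to the TTL.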
With catalog-linked databases, Snowflake is doing mirroring.
Yes. Mirroring exists in different flavors across platforms. Delta is sometimes seen as “simpler” because metadata can live in the object store, but as soon as you want multiple engines writing, you still need a real catalog.
Sharing across multiple platforms adds another layer. What’s the state of platforms reading and writing to the same Iceberg catalog?
There are phases of integration.
Phase one is the naive approach: you have Parquet and JSON in object storage, and an engine reads it. Reading is easier than writing. You can get a toy example working.
Then you run into versioning and “what’s latest.” The next phase is connecting to an Iceberg REST catalog so engines can ask for the latest table version without users thinking about paths.
Phase three is schema‑scale: it’s never just one table. You need discovery of new tables, keeping schemas up to date, and eventually things like multi‑table transactions.
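The jump from phase one to phase two can be sketched with a minimal in-memory stand-in for a catalog. The shape loosely mirrors the Iceberg REST catalog idea (commit a new metadata version, ask for the latest), but every name and path here is made up for illustration.

```python
# Minimal stand-in for "phase two": engines ask a catalog for a table's
# latest metadata instead of hardcoding object-store paths. All names
# and locations are hypothetical.

class ToyCatalog:
    def __init__(self):
        self._tables = {}   # table name -> list of metadata locations

    def commit(self, name, metadata_location):
        """A writer registers a new table version with the catalog."""
        self._tables.setdefault(name, []).append(metadata_location)

    def load_table(self, name):
        """Answer 'what's latest?' so readers never guess at paths."""
        return self._tables[name][-1]

catalog = ToyCatalog()
catalog.commit("analytics.orders", "s3://lake/orders/metadata/v1.json")
catalog.commit("analytics.orders", "s3://lake/orders/metadata/v2.json")

# Any engine asking the catalog gets the current version automatically.
print(catalog.load_table("analytics.orders"))
```

In phase one, each reader would have had to know (and update) the `v2.json` path itself; routing every read through the catalog is what removes versioning from the user's head.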
This maps to dbt Mesh and cross‑platform mesh. Producer vs consumer.
A consumer‑led model requires the downstream team to create pointers (DDL) to external tables. It’s operationally messy. Producer‑led is cleaner: the producer writes to the catalog and it’s just there, immediately queryable downstream.
Are platforms there yet?
Some support writing directly to external catalogs. When it works, it’s great, but there are still kinks. We’re retrofitting race cars designed for isolation to be interoperable without losing performance.
Identity is one of the hairiest issues. Vended credentials.
Vended credentials solve the “two keys” problem. You authenticate to the catalog, and the catalog tells you where the data lives, but then you need separate object store credentials to read the files. With vended credentials, the catalog issues short-lived credentials scoped to that object store location, so you can read the data without managing a separate set of keys.
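The two-keys-collapsed-to-one flow can be sketched as follows. This is a toy model, not any real cloud or catalog API: the token format, scope rule, and 15-minute expiry are all invented for the example.

```python
# Toy model of credential vending: the client authenticates to the
# catalog once; the catalog returns both the table's location and a
# short-lived, location-scoped object store credential. Simulated only.

import time
import secrets
from dataclasses import dataclass

@dataclass
class VendedCredential:
    token: str
    scope: str          # object-store prefix this token may access
    expires_at: float   # short-lived by design

class VendingCatalog:
    def load_table(self, name, catalog_token):
        if catalog_token != "valid-catalog-token":   # the single auth step
            raise PermissionError("not authenticated to the catalog")
        location = f"s3://lake/{name.replace('.', '/')}/"
        cred = VendedCredential(
            token=secrets.token_hex(8),
            scope=location,                  # scoped to just this table
            expires_at=time.time() + 900,    # e.g. 15 minutes
        )
        return location, cred

location, cred = VendingCatalog().load_table(
    "analytics.orders", "valid-catalog-token"
)
# The engine reads files under `location` using `cred`,
# without ever holding long-lived object store keys.
print(cred.scope)
```

The key properties are the ones the transcript names: one authentication (to the catalog), credentials scoped to the data location, and a short lifetime so nothing durable has to be distributed.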
That doesn’t solve user identity and grants.
Correct. Vended credentials aren’t global authorization. Identity and access across platforms is still hard. Ideally you grant access once and it works everywhere, but enterprises have different identity providers and platforms have different permission models. Today, admins often have to configure grants separately in each platform.
Is this mission creep?
The goal is to reduce how many people have to think about storage details. Big tech had whole data platform teams solving reliability problems in Hive‑era lakes. Iceberg reduces that toil dramatically, but the long tail is still auth, mirroring, and cross‑platform governance.
How does this reshape data teams?
Analytics engineering abstracted a lot of work. Data engineering has also been simplified by replication/orchestration vendors. What remains is the open ecosystem complexity: identity, object store policies, and cross‑platform connections. Many enterprises already have teams with these skills (infra as code, Terraform, Snowflake management), but others will need to grow into them.
Are vendors embracing Iceberg in good faith?
The goodwill and collaboration in the past 18 months feels unprecedented. We’re getting “more problems” because we solved prior ones. The industry aligning on standards feels like F1 teams standardizing components so they can innovate elsewhere.
In your internal writeup about Iceberg, you quoted Wolf Hall: “The making of a treaty is the treaty. It doesn’t matter what the terms are, just that there are terms, it’s the goodwill that matters. When that runs out, the treaty is broken, whatever the terms say.” Explain the relevance here.
When I joined dbt, it was taboo to mention one partner to another. Now vendors openly acknowledge mutual customers and invest in interoperability. On the Iceberg repo you see competitors collaborating on proposals. The goodwill is the standard.
Wrap us up with three things you’re excited for next year.
Push‑based catalog updates so platforms can subscribe to changes rather than repeatedly listing and polling. Progress on the small files problem so Iceberg works better for smaller data too. And more platforms supporting writing directly to external catalogs, unlocking producer‑led sharing and cross‑platform mesh.
Chapters
00:00:00 — Intro: why open standards are accelerating
00:01:20 — What practitioners can expect from Iceberg in production
00:05:00 — Lightning round: query engine, object store, catalog
00:06:20 — Internal vs external catalogs
00:09:30 — The “four-part namespace” and catalog-link style abstractions
00:11:30 — The downside: metadata performance, resiliency, and caching
00:17:10 — Sharing across multiple platforms: reality and tradeoffs
00:19:10 — Iceberg integration phases (1: naive table, 2: REST catalog, 3: schema-scale)
00:24:10 — Producer vs consumer model and cross-platform mesh
00:29:10 — Identity and “vended credentials”: what it is and what it isn’t
00:33:30 — The hard unsolved part: grants and global identity across platforms
00:37:00 — Is this mission creep? What Iceberg is optimizing for
00:39:50 — How roles on data teams evolve in an open ecosystem
00:43:40 — Are vendors genuinely aligned? Why Anders is optimistic
00:46:50 — “The making of a treaty is the treaty”: goodwill as the standard
00:51:50 — Three things Anders is excited for next year
This newsletter is sponsored by dbt Labs. Discover why more than 80,000 data teams use dbt to accelerate their data development.


