<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Analytics Engineering Roundup]]></title><description><![CDATA[The internet's most useful articles on analytics engineering and its adjacent ecosystem. Curated with ❤️ by Tristan Handy.]]></description><link>https://roundup.getdbt.com</link><image><url>https://substackcdn.com/image/fetch/$s_!9uGH!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b4e3170-43ea-4f13-8662-f4b4e18cfe12_256x256.png</url><title>The Analytics Engineering Roundup</title><link>https://roundup.getdbt.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 09 Mar 2026 17:13:21 GMT</lastBuildDate><atom:link href="https://roundup.getdbt.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[dbt Labs Inc.]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[analyticsengineeringroundup@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[analyticsengineeringroundup@substack.com]]></itunes:email><itunes:name><![CDATA[Tristan Handy]]></itunes:name></itunes:owner><itunes:author><![CDATA[Tristan Handy]]></itunes:author><googleplay:owner><![CDATA[analyticsengineeringroundup@substack.com]]></googleplay:owner><googleplay:email><![CDATA[analyticsengineeringroundup@substack.com]]></googleplay:email><googleplay:author><![CDATA[Tristan Handy]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Iceberg ecosystem today (Anders Swanson)]]></title><description><![CDATA[What can data teams realistically expect when attempting to run on top of Iceberg in 
production?]]></description><link>https://roundup.getdbt.com/p/the-iceberg-ecosystem-today-anders</link><guid isPermaLink="false">https://roundup.getdbt.com/p/the-iceberg-ecosystem-today-anders</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Sun, 08 Mar 2026 13:02:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/K7PvwU5ulrA" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The data industry is moving towards open standards. That migration is happening rapidly across the data ecosystem, even as the rapid progress of AI and agents sucks much of the oxygen out of the room. </p><p>The dbt Labs data team is moving to an all&#8209;Iceberg lake with a mix of compute engines to power transformation, analytics, and agentic experiences. The team has been able to move quickly towards this architecture because the entire ecosystem has been laying the groundwork for years, and all of it is now coming together to make this new open world a reality. </p><p>On this episode, Tristan discusses the reality on the ground for data practitioners. Where&#8217;s the Iceberg ecosystem today? What can practitioners realistically expect when attempting to run on top of Iceberg in production?</p><p>Tristan is joined by Anders Swanson, a developer experience advocate at dbt Labs. Anders has spent a lot of time over the years navigating open-source data ecosystems and tracking their progress. </p><p>They unpack the open standards shift, define the core building blocks (query engines, object stores, catalogs), and dig into why external catalogs have become a fourth namespace tier across platforms. 
Anders outlines a pragmatic, phased adoption model for Iceberg integrations, explains why metadata performance and resiliency are hard requirements, and clarifies why vended credentials exist and what they solve.</p><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><div><hr></div><p><strong>The call for papers is open for dbt Summit 2026.</strong> We invite data practitioners, platform leaders, and executives to share real stories of how data gets done at the world&#8217;s largest gathering of dbt community members. If you ship fast, reduce costs, improve trust, or bring governed AI to life, the dbt community wants to hear from you.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/dbt-summit/?utm_medium=social&amp;utm_source=substack&amp;utm_campaign=q1-2027_dbt-summit-2026_aw&amp;utm_content=dbt-summit____&amp;utm_term=all_na__&quot;,&quot;text&quot;:&quot;Submit a talk&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.getdbt.com/dbt-summit/?utm_medium=social&amp;utm_source=substack&amp;utm_campaign=q1-2027_dbt-summit-2026_aw&amp;utm_content=dbt-summit____&amp;utm_term=all_na__"><span>Submit a talk</span></a></p><p>Coalesce is now dbt Summit. Join the world&#8217;s largest gathering of dbt users, where data leaders and practitioners come together to shape the future of data analytics and AI. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.getdbt.com/dbt-summit/?utm_medium=social&amp;utm_source=substack&amp;utm_campaign=q1-2027_dbt-summit-2026_aw&amp;utm_content=dbt-summit____&amp;utm_term=all_na__" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!shpb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97fe3b0-9606-4a9d-8a5f-6e7970f032c1_3840x2160.jpeg 424w, https://substackcdn.com/image/fetch/$s_!shpb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97fe3b0-9606-4a9d-8a5f-6e7970f032c1_3840x2160.jpeg 848w, https://substackcdn.com/image/fetch/$s_!shpb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97fe3b0-9606-4a9d-8a5f-6e7970f032c1_3840x2160.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!shpb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97fe3b0-9606-4a9d-8a5f-6e7970f032c1_3840x2160.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!shpb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97fe3b0-9606-4a9d-8a5f-6e7970f032c1_3840x2160.jpeg" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d97fe3b0-9606-4a9d-8a5f-6e7970f032c1_3840x2160.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:906976,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.getdbt.com/dbt-summit/?utm_medium=social&amp;utm_source=substack&amp;utm_campaign=q1-2027_dbt-summit-2026_aw&amp;utm_content=dbt-summit____&amp;utm_term=all_na__&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/190147989?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97fe3b0-9606-4a9d-8a5f-6e7970f032c1_3840x2160.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!shpb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97fe3b0-9606-4a9d-8a5f-6e7970f032c1_3840x2160.jpeg 424w, https://substackcdn.com/image/fetch/$s_!shpb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97fe3b0-9606-4a9d-8a5f-6e7970f032c1_3840x2160.jpeg 848w, https://substackcdn.com/image/fetch/$s_!shpb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97fe3b0-9606-4a9d-8a5f-6e7970f032c1_3840x2160.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!shpb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd97fe3b0-9606-4a9d-8a5f-6e7970f032c1_3840x2160.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"></div></div></a></figure></div><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:true,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" 
allowfullscreen="true" allow="encrypted-media" loading="lazy" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://www.youtube.com/playlist?list=PL0QYlrC86xQm83Q9deiy4euEnbw8ceu3I">Youtube</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><div id="youtube2-K7PvwU5ulrA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;K7PvwU5ulrA&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/K7PvwU5ulrA?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Key takeaways</h2><h3>Tristan Handy: I wanted to have you on because of work you&#8217;ve been doing internally to summarize the state of the Iceberg ecosystem. We&#8217;ve talked about Iceberg a bunch lately with folks deep in specific parts. Your work is more of an overview: where we&#8217;re at with platform integrations, what&#8217;s easier now than a year ago, and what&#8217;s still hard. Before we dive in, I want to define a few terms. When you say &#8220;query engine,&#8221; what do you mean?</h3><p><strong>Anders Swanson:</strong> It&#8217;s the thing that does your work. 
When you issue a CREATE TABLE or a SELECT statement, it&#8217;s what returns data or stores it somewhere for later.</p><h3>Object store.</h3><p>It&#8217;s the cloud service where you can store an object. An object is anything: a blob.</p><h3>Catalog.</h3><p>In this context, a catalog knows what tables and views exist and where they are, and how you can fetch or write to them.</p><h3>Let&#8217;s talk internal versus external catalogs.</h3><p>An internal catalog is what you get by default in a system like Snowflake or SQL Server. An external catalog is more like another directory, often managed by a different system. As you connect more disparate platforms, you can&#8217;t assume one system controls everything.</p><h3>The complexity comes from duplication. How do you make namespaces unique? Can you plug in many external catalogs?</h3><p>Abstraction matters. A common pattern emerging is one&#8209;to&#8209;one mapping of an external catalog into a database. That pushes a move to a four&#8209;part namespace: catalog, database, schema, identifier. Spark moved toward this; Databricks Unity Catalog and Snowflake&#8209;style catalog link approaches are in this family.</p><h3>So the downside?</h3><p>The devil is in the details, especially metadata performance and resiliency. For example, information schema listing. Users expect listing tables to be fast and reliable. In a federated world, if listing tables takes five seconds, users blame the vendor they&#8217;re using&#8212;even if the external system is slow. DuckDB draws a line by not mixing external catalog tables into information schema listing today. Snowflake&#8217;s catalog link databases appear to cache or mirror metadata so it feels as performant as native tables.</p><h3>With catalog link databases, Snowflake is doing mirroring.</h3><p>Yes. Mirroring exists in different flavors across platforms. 
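</p><p><em>To make the namespace discussion above concrete: in the four&#8209;part pattern, a table in an external catalog is addressed by catalog, then database, then schema, then identifier. A sketch in generic SQL (all names here are hypothetical, and exact syntax varies by engine):</em></p><pre><code>-- Four-part namespace: catalog.database.schema.identifier
-- "sales_catalog" stands in for an external Iceberg catalog
-- plugged into the platform; the rest are ordinary namespaces.
SELECT order_id, amount
FROM sales_catalog.analytics.finance.orders;</code></pre><p>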
Delta is sometimes seen as &#8220;simpler&#8221; because metadata can live in object store, but as soon as you want multiple engines writing, you still need a real catalog.</p><h3>Sharing across multiple platforms adds another layer. What&#8217;s the state of platforms reading and writing to the same Iceberg catalog?</h3><p>There are phases of integration.</p><p>Phase one is the naive approach: you have Parquet and JSON in object storage, and an engine reads it. Reading is easier than writing. You can get a toy example working.</p><p>Then you run into versioning and &#8220;what&#8217;s latest.&#8221; The next phase is connecting to an Iceberg REST catalog so engines can ask for the latest table version without users thinking about paths.</p><p>Phase three is schema&#8209;scale: it&#8217;s never just one table. You need discovery of new tables, keeping schemas up to date, and eventually things like multi&#8209;table transactions.</p><h3>This maps to dbt Mesh and cross&#8209;platform mesh. Producer vs consumer.</h3><p>A consumer&#8209;led model requires the downstream team to create pointers (DDL) to external tables. It&#8217;s operationally messy. Producer&#8209;led is cleaner: the producer writes to the catalog and it&#8217;s just there, immediately queryable downstream.</p><h3>Are platforms there yet?</h3><p>Some support writing directly to external catalogs. When it works, it&#8217;s great, but there are still kinks. We&#8217;re retrofitting race cars designed for isolation to be interoperable without losing performance.</p><h3>Identity is one of the hairiest issues. Vended credentials.</h3><p>Vended credentials solve the &#8220;two keys&#8221; problem. You authenticate to the catalog, the catalog tells you where data lives, but then you need separate object store credentials to read files. 
Vended credentials means the catalog vends short&#8209;lived credentials so you can access the object store location without managing separate keys.</p><h3>That doesn&#8217;t solve user identity and grants.</h3><p>Correct. Vended credentials isn&#8217;t global authorization. Identity and access across platforms is still hard. Ideally you grant access once and it works everywhere, but enterprises have different identity providers and platforms have different permission models. Today, admins often have to configure grants separately in each platform.</p><h3>Is this mission creep?</h3><p>The goal is to reduce how many people have to think about storage details. Big tech had whole data platform teams solving reliability problems in Hive&#8209;era lakes. Iceberg reduces that toil dramatically, but the long tail is still auth, mirroring, and cross&#8209;platform governance.</p><h3>How does this reshape data teams?</h3><p>Analytics engineering abstracted a lot of work. Data engineering has also been simplified by replication/orchestration vendors. What remains is the open ecosystem complexity: identity, object store policies, and cross&#8209;platform connections. Many enterprises already have teams with these skills (infra as code, Terraform, Snowflake management), but others will need to grow into them.</p><h3>Are vendors embracing Iceberg in good faith?</h3><p>The goodwill and collaboration in the past 18 months feels unprecedented. We&#8217;re getting &#8220;more problems&#8221; because we solved prior ones. The industry aligning on standards feels like F1 teams standardizing components so they can innovate elsewhere.</p><h3>In your internal writeup about Iceberg, you quoted Wolf Hall: &#8220;The making of a treaty is the treaty. It doesn&#8217;t matter what the terms are, just that there are terms, it&#8217;s the goodwill that matters. When that runs out, the treaty is broken, whatever the terms say.&#8221; Explain the relevance here. 
</h3><p>When I joined dbt, it was taboo to mention one partner to another. Now vendors openly acknowledge mutual customers and invest in interoperability. On the Iceberg repo you see competitors collaborating on proposals. The goodwill is the standard.</p><h3>Wrap us up with three things you&#8217;re excited for next year.</h3><p>Push&#8209;based catalog updates so platforms can subscribe to changes rather than repeatedly listing and polling. Progress on the small files problem so Iceberg works better for smaller data too. And more platforms supporting writing directly to external catalogs, unlocking producer&#8209;led sharing and cross&#8209;platform mesh.</p><h2>Chapters</h2><p>00:00:00 &#8212; Intro: why open standards are accelerating</p><p>00:01:20 &#8212; What practitioners can expect from Iceberg in production</p><p>00:05:00 &#8212; Lightning round: query engine, object store, catalog</p><p>00:06:20 &#8212; Internal vs external catalogs</p><p>00:09:30 &#8212; The &#8220;four-part namespace&#8221; and catalog-link style abstractions</p><p>00:11:30 &#8212; The downside: metadata performance, resiliency, and caching</p><p>00:17:10 &#8212; Sharing across multiple platforms: reality and tradeoffs</p><p>00:19:10 &#8212; Iceberg integration phases (1: naive table, 2: REST catalog, 3: schema-scale)</p><p>00:24:10 &#8212; Producer vs consumer model and cross-platform mesh</p><p>00:29:10 &#8212; Identity and &#8220;vended credentials&#8221;: what it is and what it isn&#8217;t</p><p>00:33:30 &#8212; The hard unsolved part: grants and global identity across platforms</p><p>00:37:00 &#8212; Is this mission creep? What Iceberg is optimizing for</p><p>00:39:50 &#8212; How roles on data teams evolve in an open ecosystem</p><p>00:43:40 &#8212; Are vendors genuinely aligned? 
Why Anders is optimistic</p><p>00:46:50 &#8212; &#8220;The making of a treaty is the treaty&#8221;: goodwill as the standard</p><p>00:51:50 &#8212; Three things Anders is excited for next year</p><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. Discover why more than 80,000 data teams use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/webinars/dbt-cloud-demos-with-experts?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Demo on-demand&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.getdbt.com/resources/webinars/dbt-cloud-demos-with-experts?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Demo on-demand</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Apache Iceberg and the catalog layer (w/ Russell Spitzer)]]></title><description><![CDATA[Everything you ever wanted to know about open table formats with a member of Apache Iceberg and Apache Polaris]]></description><link>https://roundup.getdbt.com/p/apache-iceberg-and-the-catalog-layer</link><guid isPermaLink="false">https://roundup.getdbt.com/p/apache-iceberg-and-the-catalog-layer</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Sun, 25 Jan 2026 13:59:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/wLH-vADSwaw" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this episode of The Analytics Engineering Podcast, Tristan talks with Russell Spitzer, a PMC member of Apache Iceberg and Apache Polaris and principal engineer at Snowflake. They discuss the evolution of open table formats and the catalog layer. They dig into how the Apache Software Foundation operates. And they explore where Iceberg and Polaris are headed. If you want to go deep on the tech behind open table formats, this is the conversation for you.</p><div><hr></div><p>A lot has changed in how data teams work over the past year. We&#8217;re collecting input for the <a href="https://forms.gle/KBU9smukSfiK1g4W7">2026 State of Analytics Engineering Report</a> to better understand what&#8217;s working, what&#8217;s hard, and what&#8217;s changing. 
If you&#8217;re in the middle of this work, your perspective would be valuable.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://forms.gle/Jc54NuP96qekHU9j7&quot;,&quot;text&quot;:&quot;Take the survey&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://forms.gle/Jc54NuP96qekHU9j7"><span>Take the survey</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://forms.gle/DPtgXva549hevZeH7" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3Xzm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 424w, https://substackcdn.com/image/fetch/$s_!3Xzm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 848w, https://substackcdn.com/image/fetch/$s_!3Xzm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!3Xzm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3Xzm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png" width="728" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:1080,&quot;width&quot;:1080,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://forms.gle/DPtgXva549hevZeH7&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3Xzm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 424w, https://substackcdn.com/image/fetch/$s_!3Xzm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 848w, https://substackcdn.com/image/fetch/$s_!3Xzm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!3Xzm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:false,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a 
href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://www.youtube.com/playlist?list=PL0QYlrC86xQm83Q9deiy4euEnbw8ceu3I">Youtube</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><div id="youtube2-wLH-vADSwaw" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;wLH-vADSwaw&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/wLH-vADSwaw?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Key takeaways</h2><h3>Tristan Handy: You spend a lot of your time thinking about Iceberg and Polaris. Give the audience background on how you found yourself in this niche of high&#8209;volume analytic data file formats.</h3><p><strong>Russell Spitzer:</strong> It&#8217;s a bit random. I started at DataStax on Apache Cassandra as a test engineer and quickly got drawn into analytics. I saw big compute clusters and wanted to be involved. A coworker, Piotr, noticed Spark 0.9 and began a Spark&#8211;Cassandra connector. That got me into Spark. Over six to seven years I focused on moving data between Cassandra and Spark and into other systems. The interoperability problem across distributed compute frameworks was compelling.</p><p>This was pre&#8209;Apache Arrow and pre&#8209;table formats. We were just putting Parquet files everywhere and no one quite knew what they were doing. 
Pre&#8209;Spark, people explored DSLs like Apache Pig. Eventually the industry converged on SQL for end&#8209;user interfaces.</p><p>I later applied to Apple for the Spark team.</p><h3>Helping build Apple&#8217;s Spark infra, or working directly on Spark?</h3><p>Apple has an open-source Spark team and a Spark&#8209;as&#8209;infra team. I was trying to join the open source team, pushing Apple&#8217;s priorities into the project and supporting Spark as a service. During interviews, Anton&#8212;another Iceberg PMC&#8212;convinced the hiring manager I should join the data tables team, essentially Apple&#8217;s Apache Iceberg team.</p><p>They ambitiously planned to replace lots of internal systems with Iceberg. Iceberg existed but was early (Netflix started it around 2018/2019; I joined Apple in 2020). At Apple it was Iceberg all the time; convincing teams to move off older stacks, adopting open&#8209;source&#8209;as&#8209;a&#8209;service to save money, and getting onto ACID&#8209;capable foundations. We were successful.</p><h3>Migrations are hard. How did you make it accessible?</h3><p>We replaced complicated bespoke reliability fixes with Iceberg. In Hive/HDFS, small&#8209;file problems lead teams to write custom compaction and locking. Removing that toil is a big win. For big orgs, migration is a long&#8209;term investment with ongoing engineering cost. For smaller companies, the key is offloading runtime responsibilities&#8212;ideally to SaaS&#8212;so engineers aren&#8217;t in the loop. Open source limits lock&#8209;in so you can move between systems. Most companies are paid to deliver business value, not to build data infra. dbt is a great example of avoiding hand&#8209;rolled pipeline code. Same logic applies to table/file formats.</p><h3>Let&#8217;s talk Apache governance. What&#8217;s a PMC? How do projects run?</h3><p>Apache projects aren&#8217;t owned by one company. Influence is earned by contributing to the community. 
The PMC governs merges, releases, membership. People move companies; the project stays with them. The goal is to make the project broadly useful. There&#8217;s no CEO dictating roadmap and no company can change the license.</p><p>Most big projects&#8212;Spark, Kafka, Iceberg, Flink&#8212;are maintained by employees of companies with vested interests, but governance is consensus&#8209;driven. Vetoes are for technical issues (security, future&#8209;limiting design), not ideology.</p><h3>Is Iceberg for the top 20 tech companies or for everyone?</h3><p>Not everyone needs Iceberg. OLTP belongs elsewhere. But for analytics, we should move past raw Parquet partition trees with folder&#8209;name partitioning. In the Hadoop era, lakes were dumping grounds; schema evolution was painful. Many are still moving from CSV to Parquet. Over time, better encodings and table formats become default.</p><p>Decoupling compute and storage changes everything versus co&#8209;located HDFS. Defaults tuned for HDFS (like 128MB Parquet files) don&#8217;t always hold for S3. We want elastic storage and compute; no one wants to pay for compute because storage grew.</p><h3>Walk us through Iceberg versions.</h3><p>v1: transactional analytics&#8212;ACID commits instead of fragile Hive/HDFS patterns. v2: row&#8209;level operations&#8212;logical deletes via delete files so you don&#8217;t rewrite 10M&#8209;row data files to remove one row; later compaction physically purges (key for GDPR). v3: expanded types&#8212;geospatial and variant for semi&#8209;structured data; Variant was standardized across vendors and Parquet so everyone can write/read consistently.</p><p>v4: two thrusts&#8212;streaming and AI. Reduce commit latency, make retries faster under contention. Historically writes took 10&#8211;20 minutes, so commit latency didn&#8217;t matter. For streaming (writes every minute/five), it does. 
We&#8217;re evolving commit and REST catalog protocols so clients can specify intent (add these files, ensure these exist, then delete those) and let the catalog resolve conflicts server&#8209;side.</p><p>On AI: Iceberg doesn&#8217;t yet serve some vector/image&#8209;heavy patterns well. We&#8217;re exploring changes in Iceberg, Parquet, or both, without breaking existing tables.</p><h3>Talk about Polaris and the catalog layer.</h3><p>Polaris is an Apache incubator project (PPMC). Incubation proves we operate like an Apache project (community&#8209;driven, trademarks donated). Iceberg defines the REST catalog spec/client; Polaris implements a catalog that speaks that spec. Many of us work across projects (Parquet, Iceberg, Polaris), which helps align boundaries.</p><h3>Horizon, Polaris, external catalogs&#8212;what&#8217;s the story?</h3><p>We&#8217;re simplifying: Snowflake can act as an Iceberg REST catalog, or you can use an external REST catalog. External can be Polaris (managed by Snowflake or self&#8209;hosted) or another REST implementation. Interoperability means everything talks the same REST.</p><h3>What is Polaris trying to be best at?</h3><p>A broad, interoperable lakehouse catalog. It can act as a generic Spark catalog (HMS replacement) and aims to support multiple table/file formats. Architectural choices differ (KV vs. relational storage, where transactions live, policy enforcement vs. recording, identity integration). Polaris aims for base implementations that are pluggable&#8212;e.g., AWS/GCP/Microsoft identity.</p><h3>Identity and scope&#8212;where does the catalog stop?</h3><p>There&#8217;s a &#8220;business catalog&#8221; for discovery/listing versus a &#8220;system catalog&#8221; that must know table layout to govern access. Polaris can vend short&#8209;lived credentials for the exact directory of a table&#8217;s files for a load operation; that requires understanding layout. 
Purely relational metadata often needs to delegate that decision.</p><h3>Will identity/grants slow broad adoption?</h3><p>Possibly. But many once&#8209;complex things become default&#8212;compressed files, columnar formats, soon encryption. With collaboration (like Variant), we&#8217;ll land broadly accepted patterns.</p><h2>Chapters</h2><p>00:01:30 &#8212; Guest welcome and interview start</p><p>00:02:00 &#8212; Russell&#8217;s path: DataStax Cassandra, Spark connector, interoperability</p><p>00:05:20 &#8212; Joining Apple&#8217;s Iceberg team and early Iceberg momentum</p><p>00:06:20 &#8212; Why migrations resonated: replacing bespoke Hive/HDFS compaction/locking</p><p>00:09:10 &#8212; Apache governance 101: PMCs, consensus, and corporate influence</p><p>00:15:40 &#8212; How decisions land without votes; when vetoes apply</p><p>00:18:30 &#8212; Who needs Iceberg and where it fits</p><p>00:22:20 &#8212; Lake &#8594; lakehouse and warehouse &#8594; lakehouse in the cloud era</p><p>00:25:20 &#8212; Iceberg versions: v1 transactions, v2 row&#8209;level ops (GDPR), v3 types</p><p>00:28:10 &#8212; Standardizing Variant across vendors and Parquet</p><p>00:31:10 &#8212; Iceberg v4 goals: streaming commit/retry improvements and AI use cases</p><p>00:33:40 &#8212; Commit latency and server&#8209;side conflict resolution</p><p>00:37:20 &#8212; Polaris as an Apache incubating project (PPMC)</p><p>00:39:30 &#8212; Iceberg REST catalog spec and Polaris implementation</p><p>00:42:30 &#8212; Clarifying Snowflake Horizon, Polaris, and external REST catalogs</p><p>00:45:10 &#8212; What Polaris aims to be best at; pluggable identity providers</p><p>00:48:00 &#8212; Identity scope: business vs. system catalogs and credential vending</p><p>00:51:00 &#8212; Will identity/grants slow mass adoption?</p><p>00:52:50 &#8212; Wrap&#8209;up</p><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. 
Discover why more than 60,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/webinars/dbt-cloud-demos-with-experts?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Demo on-demand&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.getdbt.com/resources/webinars/dbt-cloud-demos-with-experts?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Demo on-demand</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[AI agents and the data lake (w/ Lauren Anderson)]]></title><description><![CDATA[The head of Okta's enterprise data platform on why central governance and the semantic layer are so essential]]></description><link>https://roundup.getdbt.com/p/ai-agents-and-the-data-lake-w-lauren</link><guid isPermaLink="false">https://roundup.getdbt.com/p/ai-agents-and-the-data-lake-w-lauren</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Sun, 11 Jan 2026 14:03:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/sa-BJkM75TQ" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>One of the interesting commonalities of AI and the data lake is that they both require new thinking around how we manage identity. For AI, the big question is how do agents interact with underlying data? For the data lake, the big question is how do we make open data stored outside the purview of any given data platform act like you&#8217;d expect?</p><p>In this episode of The Analytics Engineering Podcast, Tristan talks with Lauren Anderson, who leads the enterprise data platform at identity company Okta. Lauren discusses how identity sits at the center of two seismic shifts in data&#8212;AI agents and the open data lake&#8212;and why central governance and a shared semantic layer are critical. She lays out how analytics engineers and data engineers should divide responsibilities as agents begin to write a growing share of analytical queries. </p><div><hr></div><p>A lot has changed in how data teams work over the past year. 
We&#8217;re collecting input for the <a href="https://forms.gle/KBU9smukSfiK1g4W7">2026 State of Analytics Engineering Report</a> to better understand what&#8217;s working, what&#8217;s hard, and what&#8217;s changing. If you&#8217;re in the middle of this work, your perspective would be valuable.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://forms.gle/Jc54NuP96qekHU9j7&quot;,&quot;text&quot;:&quot;Take the survey&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://forms.gle/Jc54NuP96qekHU9j7"><span>Take the survey</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://forms.gle/DPtgXva549hevZeH7" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3Xzm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 424w, https://substackcdn.com/image/fetch/$s_!3Xzm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 848w, https://substackcdn.com/image/fetch/$s_!3Xzm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!3Xzm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3Xzm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png" width="728" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:1080,&quot;width&quot;:1080,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://forms.gle/DPtgXva549hevZeH7&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3Xzm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 424w, https://substackcdn.com/image/fetch/$s_!3Xzm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 848w, https://substackcdn.com/image/fetch/$s_!3Xzm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!3Xzm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13cbe4-3bd5-4cc8-97ad-51fe6497ede0_1080x1080.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 
pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:true,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" 
frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" loading="lazy" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://www.youtube.com/playlist?list=PL0QYlrC86xQm83Q9deiy4euEnbw8ceu3I">Youtube</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><div id="youtube2-sa-BJkM75TQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;sa-BJkM75TQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/sa-BJkM75TQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Key takeaways</h2><h3>Tristan Handy: Before we dive into the current day, can you share a little bit about your background and how you came to the role that you&#8217;re in today?</h3><p><strong>Lauren Anderson:</strong> I&#8217;ve had a 20&#8209;something year career at this point. I have basically spent my entire career in analytics in some way, but my first data job was at a big bank. I won&#8217;t name it. There&#8217;s only a few big banks, so you could probably guess. I worked for the finance org and I did compensation planning and administration, with a side of sales tracking and analytics. 
I was part database analyst, part customer support for people that made a lot more money than I did.</p><p>I was there for seven, seven and a half, eight years. Towards the end of it, I became the owner and creator and almost business architect for our brand&#8209;new sales tracking data warehouse. At a very young age, I got to think about how relational databases should come together for the outcome of both analytics and reporting&#8212;dashboards and whatnot&#8212;but also operations, which was paying compensation every month. It got me super excited about this world of data and being able to architect pipelines and the end&#8209;to&#8209;end flow for real&#8209;world outcomes.</p><h3>What do you think allowed you to be successful in that era? I often think the things that enabled success then aren&#8217;t the same as what make data folks successful today.</h3><p>When I took it over, we ran compensation out of an Access database. I was new, the person who designed it had left, and there wasn&#8217;t much documentation. It worked the first month, then broke the second&#8212;right before a payroll deadline. I rebuilt it as a long series of SQL queries with inline comments and step&#8209;by&#8209;step checks that produced a clean file. That willingness to throw away the brittle thing and rebuild with clarity and documentation gave me early success. The meta&#8209;skills&#8212;the ability to learn, take chances, and figure out the best path&#8212;still apply, but the technology is completely different now.</p><h3>You&#8217;ve split time at Okta into two stints. How would you characterize the work?</h3><p>Okta was my first truly B2B company. I realized quickly that B2B data is my sweet spot. I love thinking about customers as businesses and how business users interact with our products and features. Okta data is complex&#8212;many products, features, and highly configurable use cases&#8212;especially with large customers. That variety is exciting. 
In simpler retail flows you see a lot of the same patterns; in B2B, the variety is the appeal.</p><h3>What&#8217;s your current role?</h3><p>I lead our enterprise data platform, engineering, and architecture function. For enterprise data used to make business decisions, we own ingestion into the warehouse, transformations, and delivery&#8212;dashboards, reverse ETL to third&#8209;party applications, other data stores, and internal apps.</p><h3>How big is the central function and how do you engage with the business?</h3><p>We&#8217;re about 50 people across data engineering and analytics/data science in a company south of 7,000 employees. We support every business unit. Engagement spans a maturity curve. One end is platform self&#8209;service: teams land data via approved connectors, build transformations in dbt on our implementation, and build dashboards in Tableau we administer. Governance and roles are defined centrally, and teams assign people to those roles. The other end is a white&#8209;glove model where we partner through the full lifecycle&#8212;question, discover existing assets, requirements, data work, build, interpretation, validation, and end&#8209;of&#8209;life of the data product. Our sweet spot is the middle: we own enterprise &#8220;gold&#8221; pipelines for company&#8209;level metrics&#8212;monitored and governed&#8212;while domains build and later graduate via a path&#8209;to&#8209;production under stronger governance.</p><h3>Okta is known for identity and security. How does security&#8209;first actually work in practice?</h3><p>Reinventing controls every time slows you down. We invest in repeatable frameworks. Any new source goes through third&#8209;party risk review, classification, and decisions on masking or exclusions. We help teams through that; after a couple times, they can engage directly with risk while we stay in the loop and monitor. As our classifications and expectations got clearer, review cycles shrank from weeks to days. 
It&#8217;s not all roses&#8212;it takes time&#8212;but we all operate as security practitioners. That shared mindset builds trust and reduces corner&#8209;cutting.</p><h3>How much do users need to know?</h3><p>We don&#8217;t expect everyone to know everything. We provide dbt frameworks and minimum testing standards, plus SMEs to guide teams. The culture is to ask when unsure.</p><h3>Will agents write more analytical queries than humans in the next 12&#8211;24 months?</h3><p>Macro, yes. For us, more like 24&#8211;36 months because we&#8217;re careful. The key is safe, ethical AI consistent with being a security company.</p><h3>How are you thinking about agent access?</h3><p>Central governance. Ideally, agents query centralized, agent&#8209;ready stores. Run governance once: policies, roles for users and for data, tracking and logging on a central plane. The semantic layer is essential. Creating semantic views must get easier and more automated, and semantics should inform policy application.</p><h3>Why are agents different from humans in access patterns?</h3><p>Row&#8209;level security to the extreme. Conversational intelligence data should be limited to what the requesting user can access. Aggregations could be broadly accessible with anonymization, but detailed content should remain constrained. You might also limit allowed functions on large unstructured objects. Identity for agents matters&#8212;Okta Secures AI looks at distinct identity patterns to secure agents across applications.</p><h3>Where are you with MCP and agent building?</h3><p>Early, building support and insight use cases. Progress is fast, but nothing broad in production yet.</p><h3>How should analytics engineers and data engineers participate?</h3><p>Analytics engineers should own semantics&#8212;tooling, vendor choices, onboarding use cases, and the shared business language. 
Data engineers should optimize for consistency and scale, notice overlap across agents, and provide a platform others can build on with confidence in governance and security.</p><h3>Will you standardize an agent development platform?</h3><p>Yes, in partnership with engineering and shared services. Our current pull skews to the business, so we&#8217;re leaning toward accessible, governed platforms that serve both business and engineering with central governance.</p><h3>Any assumptions you&#8217;re rethinking?</h3><p>Treating everything like a relational model. Many initial agent questions are intentionally simple, where speed and reasonable accuracy trump perfect sophistication. The important thing is to start, observe, and mature.</p><h2>Chapters</h2><p>00:02:28 &#8212; From bank analytics to owning a sales DW</p><p>00:05:00 &#8212; Rebuilding brittle Access &#8594; SQL with documented checks</p><p>00:08:30 &#8212; Ops accountability then vs. optimization today</p><p>00:11:00 &#8212; TripIt, marketing analytics, and moving into tech</p><p>00:13:14 &#8212; Why B2B data became Lauren&#8217;s sweet spot</p><p>00:16:00 &#8212; Current role: ingestion &#8594; transform &#8594; delivery at Okta</p><p>00:18:10 &#8212; Operating models across business units and the path to production</p><p>00:22:20 &#8212; Security-first in practice: repeatable frameworks over friction</p><p>00:24:23 &#8212; Third&#8209;party risk, classification, and shrinking review cycles</p><p>00:28:00 &#8212; Policies, masking, and the need for a central governance plane</p><p>00:30:20 &#8212; Frameworks for dbt, testing, and SME guidance</p><p>00:32:11 &#8212; Will agents outwrite humans? 
Macro yes; Okta timeline nuance</p><p>00:33:48 &#8212; Central governance and agent access patterns</p><p>00:37:19 &#8212; Semantic layer as bridge and policy carrier</p><p>00:41:00 &#8212; Function limits on unstructured data and Okta Secures AI</p><p>00:42:35 &#8212; Early MCP experimentation and support use cases</p><p>00:43:03 &#8212; Roles: analytics engineers (semantics) and data engineers (scale)</p><p>00:46:10 &#8212; Enabling an org-wide agent platform with shared governance</p><p>00:47:43 &#8212; Solve governance once, serve business and engineering</p><p>00:49:30 &#8212; Simpler questions first; rethinking relational assumptions</p><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. Discover why more than 60,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/webinars/dbt-cloud-demos-with-experts?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Demo on-demand&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.getdbt.com/resources/webinars/dbt-cloud-demos-with-experts?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Demo on-demand</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Inside Snowflake’s AI roadmap (w/ Chris Child)]]></title><description><![CDATA[Snowflake's VP of Product Management on the vision for open table formats, governed agents, and the future of the data engineer]]></description><link>https://roundup.getdbt.com/p/inside-snowflakes-ai-roadmap-w-chris</link><guid isPermaLink="false">https://roundup.getdbt.com/p/inside-snowflakes-ai-roadmap-w-chris</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Sun, 14 Dec 2025 14:06:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/5Yo0chBWt2c" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This season of The Analytics Engineering Podcast is focused on how the current data landscape is impacting the developer experience. Snowflake plays a major role in what that developer experience looks like. </p><p>In this episode, Snowflake VP of Product Management Chris Child joins Tristan to unpack Snowflake&#8217;s AI roadmap and what it means for data teams. 
They discuss the evolution from Snowpark to <a href="https://docs.getdbt.com/blog/semantic-layer-cortex">Cortex</a> and <a href="https://www.getdbt.com/blog/what-is-snowflake-intelligence-anyway">Snowflake Intelligence</a>, how to <a href="https://www.getdbt.com/blog/bring-structured-context-to-agentic-data-development-with-dbt">govern agents</a> with row- and column-level controls, and why Snowflake is investing in <a href="https://www.getdbt.com/blog/iceberg-give-it-a-rest">Apache Iceberg</a> and the <a href="https://www.snowflake.com/en/blog/open-semantic-interchange-ai-standard/">Open Semantic Interchange initiative</a>. dbt Labs recently open sourced <a href="https://www.getdbt.com/blog/open-source-metricflow-governed-metrics">MetricFlow</a>, the technology that powers the dbt Semantic Layer, to align with the goals of OSI. </p><p>Chris also shares a vision for the next five years of data engineering: fewer bespoke pipelines, more standardization and semantics, and a bigger focus on business context and data products.</p><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://docs.getdbt.com/docs/install-dbt-extension&quot;,&quot;text&quot;:&quot;Check out the dbt VS Code extension&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://docs.getdbt.com/docs/install-dbt-extension"><span>Check out the dbt VS Code extension</span></a></p><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, 
Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:false,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://www.youtube.com/playlist?list=PL0QYlrC86xQm83Q9deiy4euEnbw8ceu3I">Youtube</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><div id="youtube2-5Yo0chBWt2c" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;5Yo0chBWt2c&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/5Yo0chBWt2c?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Key takeaways</h2><h3>Tristan Handy: Where have you spent your time professionally?</h3><p><strong>Chris Child:</strong> I didn&#8217;t end up in data on purpose. I found myself here through a series of hops. I was working at Redpoint Ventures and got excited by a company we invested in, RelateIQ. I left to join RelateIQ, building an intelligent CRM. 
We captured emails and meetings and built profiles of everyone you interacted with. We were acquired by Salesforce. Looking at what sales teams needed, I realized they also needed product usage data, marketing data, and campaign data, with a platform to pull it all together. That led me to Segment. I joined when it was about 50 people. Segment was mostly analytics.js then, loading different JavaScript on your webpage for tracking. We had just built the first warehouse connector to Redshift and got huge usage sending click and user data to Redshift.</p><h3>The original Redshift connector was a nightmare to work with.</h3><p>Like many startup things, one engineer built it in a week. Suddenly a ton of people used it, and enterprise customers depended on it. We had to rebuild it several times. You could see the future there. Folks I worked with went on to start companies like Census and Hightouch, thinking the CDP should be built on top of the warehouse, which Segment evolved toward. We also built a Snowflake connector because customers demanded it in addition to Redshift.</p><h3>It&#8217;s funny to think back a decade to how small Snowflake was.</h3><p>A couple customers demanded it; we built it, and we were sending a ton of data. That led to the realization that a customer data platform is one instance of a data warehouse, and there are others you need. Seeing how fast Snowflake was growing, I wanted to build the next layer of infrastructure. </p><p>I joined Snowflake seven and a half years ago. I&#8217;ve had three key roles. First, I built areas of the product: the UI, billing, product-led growth engines and free trial infrastructure, and application capabilities for connecting into and building on Snowflake. After Sridhar became CEO, he asked me to reconnect product and sales by leading solutions engineering, reporting to the CRO. Leading a global technical seller org was very different for a product person, but it helped align teams at scale. 
</p><p>About eight months ago, I returned to lead data engineering: how people bring data into Snowflake, how they transform it&#8212;spending a lot of time with dbt&#8212;and the work around Iceberg and interoperability for worlds where not all data sits in Snowflake.</p><h3>I didn&#8217;t realize the path started in investing. Are you a finance person way back?</h3><p>My undergrad is in computer science. I started programming in fifth grade on an Apple IIe, learned C before high school, and followed that thread. In college I noticed business folks often made the decisions. I wanted to learn that side. After college I joined a consulting firm, then private equity, then an MBA. I realized I didn&#8217;t want to be a finance person. I moved to venture as a bridge to building products, but I wanted to build, so I jumped into operating roles.</p><h3>Tell the story of Snowflake and AI. In the 2010s there was huge demand for easier, scalable, cloud-oriented data solutions. Then 2022 happened, ChatGPT launched, and the world changed. How did Snowflake respond, and where are you today?</h3><p>Even pre&#8209;2022 we saw customers putting their most important business data into Snowflake, then pulling data out for things they couldn&#8217;t do inside: training ML models and other analyses that SQL wasn&#8217;t a great fit for. Customers told us they didn&#8217;t like losing governance and lineage when data left. We invested in ways to bring more of that work to Snowflake. </p><p>Snowpark was the first big step: a runtime for non&#8209;SQL code (Python, Java, Scala) with APIs inspired by Spark, plus capabilities like forecasting. It&#8217;s great for some workloads, but most customers don&#8217;t train most ML models inside Snowflake yet. We also acquired Applica for document extraction using early LLM techniques, and Neeva for web search based on LLM approaches. </p><p>When ChatGPT arrived, we saw two major influences. 
First, people wanted to chat with data they&#8217;d brought into Snowflake and transformed with dbt. That&#8217;s hard because LLMs are great with unstructured data and less great at turning business questions into correct SQL. Second, LLMs are very good at writing code, including Python and even dbt code. They&#8217;re not perfect for data engineering code yet, but they help. </p><p>Our goal is to help customers activate important enterprise data safely in AI models, deploy agents at scale under existing governance, and keep up with exploding data volumes without 10x headcount.</p><h3>What are the key product pieces&#8212;Cortex, Snowflake Intelligence, etc.&#8212;in the Snowflake AI stack?</h3><p>First, you need a great data foundation. That isn&#8217;t new: get the data in one place, apply good governance and permissions, know your data, tag PII, and raise the standard of care. </p><p>AI raises the bar because agents can expose sensitive data faster than dashboards. OSI (Open Semantic Interchange) work is part of this; LLMs need explicit semantics and cataloging they can consume, not tacit knowledge hidden in downstream tools. </p><p>Companies with strong hygiene move faster with AI. Roles matter; if a product manager role has access to certain rows and columns, an agent acting within that role can safely answer questions. Agents can run inside or outside Snowflake, but should assume appropriate roles when querying.</p><p>On the AI stack, after the data foundation, Cortex provides higher&#8209;level APIs for unstructured processing, RAG, and structured processing. You can choose models (OpenAI, Anthropic, Mistral, Gemini, Llama, etc.), but most folks don&#8217;t want to manage prompts and GPUs. Cortex AI SQL lets you express intent like sentiment filters or fuzzy joins. It&#8217;s powerful for exploration but non&#8209;deterministic, so you need care in production. 
Costs map to tokens at higher abstractions, with budgets and guardrails similar to variable compute in the cloud.</p><p>At the top, Snowflake Intelligence is a UI and agent framework. You define agents with access to specific datasets and semantic models, plus gold queries and usage guidance. It looks like a chat interface over your governed data. Inside Snowflake, we&#8217;ve deployed a GTM assistant that blends product usage, Salesforce, notes, docs, and content&#8212;structured and unstructured&#8212;respecting row&#8209;level security for every seller while giving leaders broader access.</p><h3>Let&#8217;s talk open formats and Iceberg. Why lean in when it opens up the data?</h3><p>Our aim isn&#8217;t to lock up data, it&#8217;s to help customers get value. Snowflake began as a reaction to Hadoop&#8212;betting on SQL at cloud scale with our own formats and catalog because they didn&#8217;t exist then. Those proprietary pieces let us evolve quickly. Iceberg is now almost as good, and we&#8217;re contributing to make it better. </p><p>Openness is a win for customers and expands the universe of data Snowflake can query, run Cortex on, and power Intelligence with. The tradeoff is standards move slower. Variant type support is a good example&#8212;we contributed our approach and shepherded it into the v3 spec. </p><p>Next up, the community is wrestling with fine&#8209;grained access control beyond table&#8209;level policies. It&#8217;s hard and will take time, but the outcome should be better for everyone.</p><h3>Give us your view on the future of data engineering.</h3><p>Data volume is exploding, including unstructured data that&#8217;s now usable. You can&#8217;t hand&#8209;build every pipeline. Demand is also exploding as agents query more things in more ways. Teams must operate at a higher level: automate, standardize, and reduce bespoke pipelines. </p><p>Expect more shared semantic models across consumers and packaged semantics coming from systems like SAP. 
You&#8217;ll also build data&#8209;engineering agents to do work and monitor pipelines. The role looks more like architect and manager, allocating budgets, deduplicating work, and&#8212;most importantly&#8212;deeply understanding the business. The best data engineers shift from code output to data products, with clear semantics and context.</p><h3>Talk more about context.</h3><p>The day&#8209;to&#8209;day activity shifts, but the output is still data products. Great data products come with instructions, definitions, lineage, quality expectations, and how to get correct answers to common questions. </p><p>We need that context captured where work happens&#8212;models, visualization, quality systems&#8212;and made available everywhere: catalogs, agents, and UIs. As you build, you should also document, and those semantics should flow consistently into tools like Snowflake Intelligence so agents can reason correctly. </p><p>A big part of the challenge is selecting just&#8209;enough context per question.</p><h2>Chapters</h2><ul><li><p>00:01:50 &#8212; Chris&#8217;s path: RelateIQ, Segment, Snowflake</p></li><li><p>00:05:40 &#8212; Roles at Snowflake: product, solutions engineering, data engineering</p></li><li><p>00:09:00 &#8212; Snowflake and AI: foundations before ChatGPT</p></li><li><p>00:11:40 &#8212; Why keep ML and non-SQL work closer to governed data</p></li><li><p>00:13:40 &#8212; Applica and Neeva acquisitions, enterprise search context</p></li><li><p>00:14:50 &#8212; Two big AI influences: chat with data and code generation</p></li><li><p>00:16:50 &#8212; Scaling agents while preserving governance and cost controls</p></li><li><p>00:18:40 &#8212; Why governance must live at the data layer (roles, rows, columns)</p></li><li><p>00:22:00 &#8212; Inside vs. 
outside Snowflake: how agents assume roles</p></li><li><p>00:23:02 &#8212; Cortex: higher-level APIs over many LLMs</p></li><li><p>00:24:06 &#8212; AI SQL: joins/where by intent and the non-determinism tradeoff</p></li><li><p>00:27:40 &#8212; Cost models, tokens, and guardrails</p></li><li><p>00:29:10 &#8212; Snowflake Intelligence: agents over a governed foundation</p></li><li><p>00:32:10 &#8212; Open formats and Iceberg: Why Snowflake leaned in</p></li><li><p>00:36:00 &#8212; Standards tradeoffs: variant type and community progress</p></li><li><p>00:38:40 &#8212; Fine-grained access control for Iceberg: thorny but necessary</p></li><li><p>00:40:40 &#8212; The future of data engineering: scale, unstructured data, agents</p></li><li><p>00:43:20 &#8212; No more bespoke pipelines; standardized models, and semantics</p></li><li><p>00:44:50 &#8212; Data engineers as architects and business partners</p></li><li><p>00:50:00 &#8212; Code vs. context: data products and shared semantics</p></li><li><p>00:53:10 &#8212; Capturing context where work happens (models, viz, quality)</p></li><li><p>00:55:00 &#8212; Selecting just enough context for agent reasoning</p></li><li><p>00:56:30 &#8212; Closing</p></li></ul><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. 
Discover why more than 60,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Book a demo&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Book a demo</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Building a multimodal lakehouse for AI (w/ Chang She)]]></title><description><![CDATA[The CEO of LanceDB and Tristan go deep into the bridge between analytics and AI engineering]]></description><link>https://roundup.getdbt.com/p/building-a-multimodal-lakehouse-for</link><guid isPermaLink="false">https://roundup.getdbt.com/p/building-a-multimodal-lakehouse-for</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Sun, 23 Nov 2025 14:03:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/R5RW3LZIAO8" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome back to The Analytics Engineering Podcast! Last season, we explored a host of topics on the developer experience (<a href="https://www.youtube.com/watch?v=WidQLYon2_I&amp;t=5s">something the dbt Labs crew has been pretty vocal on recently</a>). This season, we&#8217;re expanding that theme to look at how the current data landscape is impacting the developer experience. 
<a href="https://www.getdbt.com/blog/what-is-open-data-infrastructure">Open data infrastructure</a> is on the rise; AI is pushing teams to rethink how data is modeled, governed, and scaled; and the developer experience is evolving.</p><p>In this episode, Tristan Handy sits down with Chang She&#8212;a co-creator of Pandas and now CEO of LanceDB&#8212;to explore the convergence of analytics and AI engineering.</p><p>The team at LanceDB is rebuilding the data lake from the ground up with AI as a first principle, starting with a new AI-native file format called Lance and building upward from there.</p><p>Tristan traces Chang&#8217;s journey from one of the original contributors to the pandas library to building a new infrastructure layer for AI-native data. Learn why vector databases alone aren&#8217;t enough, why agents require new architecture, and how LanceDB is building an AI lakehouse for the future.</p><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://docs.getdbt.com/docs/install-dbt-extension&quot;,&quot;text&quot;:&quot;Check out the dbt VS Code extension&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://docs.getdbt.com/docs/install-dbt-extension"><span>Check out the dbt VS Code extension</span></a></p><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:false,&quot;noScroll&quot;:false}" 
src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://www.youtube.com/playlist?list=PL0QYlrC86xQm83Q9deiy4euEnbw8ceu3I">Youtube</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><div id="youtube2-R5RW3LZIAO8" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;R5RW3LZIAO8&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/R5RW3LZIAO8?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Key takeaways</h2><h3>Tristan Handy: You&#8217;re the founder and creator of the Lance file format and LanceDB. Before diving into vector search and vector databases, tell us about your background. </h3><p><strong>Chang She:</strong> I love talking to analytics engineers because that&#8217;s my background. I started about 20 years ago in quantitative finance. As a junior analyst, you do a lot of data engineering and analytics, which got me into open-source Python. 
I became one of the co-authors of the pandas library&#8212;initially to solve my own problem of not wanting to do analytics engineering in Java or VBScript.</p><h3>You worked for a hedge fund?</h3><p>Yes, AQR.</p><h3>Did they know you were contributing to pandas? Hedge funds aren&#8217;t known for open source.</h3><p>My roommate and colleague at the time was Wes McKinney. He showed me a proprietary Python library he was working on. It was life-changing. I started using it and contributing. He spent about six months convincing the fund to open-source it. This was around 2010, and they were ahead of the industry in that respect.</p><h3>I didn&#8217;t know pandas started at AQR. That&#8217;s fascinating. So much of your circa-2010 analytics work was done in early pandas?</h3><p>Exactly. We went through several iterations, even debated the name. Because it was a hedge fund, there was a lot of econometrics and &#8220;panel data,&#8221; so Wes named it &#8220;pandas&#8221; for panel data analysis.</p><h3>That origin story isn&#8217;t widely known. You then founded two companies, sold one to Cloudera, and were there during an interesting time.</h3><p>Wes and I created DataPad&#8212;cloud BI before cloud BI really took off&#8212;and sold it to Cloudera. I spent about four and a half years in the Hadoop &#8220;big data&#8221; world, where I met my co-founder. He worked on HDFS at Cloudera, and several ex-Cloudera folks are at LanceDB today. After that I moved into machine learning at Tubi TV, working on recommender systems, ML serving, and experimentation/AB testing. That exposed me to embeddings. We dealt with videos, poster art images, and synopses&#8212;data that doesn&#8217;t fit neatly into pandas or even Spark data frames. That inspired me to build better infrastructure for these data types&#8212;what we now call &#8220;classical&#8221; machine learning&#8212;which led to LanceDB.</p><h3>So that&#8217;s our bridge to vectors. 
You experienced these problems at Tubi, then founded the company. And Tubi used dbt?</h3><p>Heavily. Thank you for creating it&#8212;it was critical to our stack.</p><h3>Give us a non-technical intro: what are vectors used for?</h3><p>Many people focus on the latest models and techniques. My perspective: everyone has access to similar models&#8212;your differentiation comes from your data and how effectively you connect data to AI. Vectors are a way to represent any kind of data in a form models understand: high-dimensional arrays of floating-point numbers&#8212;1,500, 3,000 dimensions, etc. Early statistical models might have a few interpretable dimensions; now you can have thousands where individual dimensions aren&#8217;t necessarily interpretable, but the space captures semantics.</p><p>Beyond RAG, vectors power internal model representations, recommender systems, and personalization&#8212;the original mainstream use case.</p><h3>Search is also a good use case. How is vector search different from full-text search or Command-F?</h3><p>Full-text search (e.g., Elasticsearch) returns documents containing the exact terms you searched. If you search for &#8220;customer,&#8221; it finds &#8220;customer/customers,&#8221; but might miss &#8220;user,&#8221; &#8220;adopter,&#8221; &#8220;organization,&#8221; etc. Vector search uses dense representations where semantically similar words and documents live near each other in high-dimensional space. Search for &#8220;customer,&#8221; and you get results that include semantically related terms.</p><h3>Would you combine vector and full-text search?</h3><p>Yes&#8212;hybrid search. Early RAG demos often used pure vector search for speed. Now enterprises need production-grade relevance. Many combine keyword and vector search with a re-ranking step to reach higher precision/recall.</p><h3>Early RAG pipelines often chunk text, embed, and call it done. 
But more thoughtful pipelines do something closer to feature engineering, right?</h3><p>Absolutely. Thought goes into what you feed the embedding model. For example: add a document- or section-level summary alongside each chunk before embedding; include multimodal features&#8212;artistic descriptions, literal captions, tags; create multiple embedding columns (e.g., different prompts/modalities) and search across them with re-ranking. High-quality retrieval requires feature-engineering-like decisions before embedding.</p><h3>Let&#8217;s talk vector file formats (Lance) and vector databases (LanceDB). My crude belief: a vector database is a standard database with additional indexes. True?</h3><p>Not wrong, but my hot take: with Lance and LanceDB, we&#8217;re building a lakehouse for multimodal data that includes vectors. Many &#8220;vector databases&#8221; are optimized only for vectors and struggle with other data types and workloads. The category needs to evolve&#8212;either toward new-generation search engines or new-generation lakehouses. We set out from day one to build the broader lakehouse, not just a vector index.</p><h3>Outline your AI-enabled data lake vision. I&#8217;m familiar with Snowflake and Databricks&#8217; lakehouse. How do you see the world differently?</h3><p>We assumed everyone would use Parquet and tried for months to support AI workloads&#8212;search, training, preprocessing&#8212;on it. We couldn&#8217;t make it work well. Talking to computer-vision and ML practitioners, no one had something effective. That gave us confidence to build a new format.</p><p>In AI you manage vectors, long documents, images, and videos. The first problem is storage. With Parquet, mixing wide blob columns with narrow metadata columns leads to out-of-memory issues due to row-group design. If you shrink row groups to fit blobs, read performance tanks.</p><p>Even once data is in Parquet, AI needs random access and secondary indexes. 
Parquet doesn&#8217;t support efficient random row access: retrieving scattered rows forces reading entire row groups. With media, that&#8217;s prohibitively expensive&#8212;both for search and for training (e.g., global shuffle). Data evolution is also hard: with table formats like Iceberg, backfills often mean copying entire datasets. Copying petabytes of media is a non-starter. These issues motivated Lance.</p><h3>I have a good mental model of Parquet with structured data. With images or video, do you put them in blob columns?</h3><p>Yes. We use Apache Arrow types. Images/audio/video are large binary columns. Vectors are fixed-width list columns (e.g., 1,536-dimensional). But Parquet&#8217;s row-group mechanics and lack of random access make these workloads painful.</p><h3>So Lance was the first thing you built. It has solid traction on GitHub. Who uses a file format&#8212;users or vendors?</h3><p>Both. Frontier labs use Lance to store training data&#8212;e.g., for image/video generation&#8212;replacing stacks like TFRecords, WebDataset, Parquet, and BigQuery. Large tech companies and vendors also build on Lance: Databricks, Tencent, Alibaba, Netflix, NVIDIA, Uber, among others.</p><h3>Databricks uses Lance?</h3><p>For parts of their AI-specific offerings.</p><h3>You&#8217;ve raised several rounds&#8212;the format is Apache-2 licensed. How do you commercialize?</h3><p>Our commercial offering is a data platform for large-scale AI production: vector search, data preprocessing, training/serving cache, and an analytics engine for curation and exploration. It supports ML training workflows and AI application development, solving the hard distributed-systems problems along the path. We partner closely with big vendors; we&#8217;re generally not competitive because goals and customer bases differ. 
Cloud providers seek platform consumption; we focus on an AI-optimized data platform for specific workloads and users.</p><h3>The commercial product is called LanceDB, but you prefer to position it not just as a database.</h3><p>Right&#8212;we&#8217;re an AI-native data platform/lakehouse for multimodal data, with Lance as the common format.</p><h3>How does this space play out over the next two to three years?</h3><p>Two big predictions. First, multimodal will be 100&#215; bigger&#8212;more usage and more data. Audio is exploding; video generation is resurging; robotics is next. Second, our data infrastructure isn&#8217;t ready for agents driving search and retrieval.</p><h3>Let&#8217;s unpack both. On multimodal: unlike structured analytics, where every company needs it, multimodal workloads seem concentrated. Do all enterprises really need this?</h3><p>I think every enterprise becomes multimodal. Take insurance: tons of documents to digitize, extract, search, and analyze; drones capturing images/video to assess risk and improvements over time. Existing businesses become more efficient; AI-native entrants gain structural advantages. Multimodal data underpins both.</p><h3>It&#8217;s a heavy lift. Will every Fortune 500 insurer build these capabilities in-house, or will vendors package them?</h3><p>Likely both&#8212;just like analytics engineering emerged as a role, with adjacent talent re-skilling. We see the same with AI engineering.</p><h3>What titles are hands-on with your product?</h3><p>AI researchers and AI engineers. Many app developers building AI features now carry the &#8220;AI engineer&#8221; title.</p><h3>On agents: how do their access patterns change platform requirements?</h3><p>RAG was one-shot: ask, retrieve, answer. Agents iterate: they decompose problems into sub-questions, refine queries and results, and run many steps in parallel. Load skyrockets&#8212;humans type slowly; agents can issue hundreds of queries simultaneously. 
Queries are more varied and selective, and agents are creative in combining modalities and sources: schemas, SQL over structured data, prior analyses and charts, document stores, image/video metadata, etc.</p><p>Traditional vector databases aren&#8217;t designed for this breadth and scale. If you bolt together multiple specialized systems, your &#8220;agent stack&#8221; balloons into a maintenance nightmare. Our approach: put all data in one place with a single system that supports vector search, keyword search, filters, key-value lookups, re-ranking, analytics, and efficient random access&#8212;on top of an AI-native file format (Lance).</p><h3>For listeners whose curiosity is piqued, any resources you recommend?</h3><p><strong>Chang She:</strong> Yes&#8212;our blog series by Weston Pace, the tech lead for Lance format. It dives into encodings, I/O, and has great reads for analytics engineers: <a href="http://lancedb.com/blog">lancedb.com/blog</a> .</p><h2>Chapters</h2><ul><li><p>00:00 &#8211; Intro: Analytics meets AI</p></li><li><p>03:20 &#8211; Chang&#8217;s background and how Pandas began</p></li><li><p>06:40 &#8211; Lessons from Cloudera and metadata</p></li><li><p>08:30 &#8211; Multimodal data and LanceDB&#8217;s origin story</p></li><li><p>10:00 &#8211; Why vector search matters (beyond RAG)</p></li><li><p>12:00 &#8211; What are vectors and why do we use them?</p></li><li><p>15:00 &#8211; Full-text vs vector search</p></li><li><p>18:00 &#8211; Feature engineering in AI use cases</p></li><li><p>21:15 &#8211; Lance format</p></li><li><p>28:00 &#8211; Storage, scale, and the problem with Parquet</p></li><li><p>35:30 &#8211; Building a business on open source</p></li><li><p>41:00 &#8211; Two big bets: multimodal data and agents</p></li><li><p>46:00 &#8211; Every company will become multimodal</p></li><li><p>50:00 &#8211; Agent access patterns will redefine data</p></li><li><p>54:00 &#8211; Why dbt-style workflows matter now more than 
ever</p></li></ul><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. Discover why more than 60,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Book a demo&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Book a demo</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Agentic coding in analytics engineering (w/ Mikkel Dengsøe)]]></title><description><![CDATA[The cofounder of SYNQ discusses his tests (and tips) with agentic coding tools]]></description><link>https://roundup.getdbt.com/p/agentic-coding-in-analytics-engineering</link><guid isPermaLink="false">https://roundup.getdbt.com/p/agentic-coding-in-analytics-engineering</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Sun, 07 Sep 2025 12:01:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/555761fd-daa8-47e7-a907-79541a9e3860_1680x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://coalesce.getdbt.com/event/21662b38-2c17-4c10-9dd7-964fd652ab44/summary/?utm_medium=social&amp;utm_source=substack&amp;utm_campaign=q3-2026_coalesce-2025_aw&amp;utm_content=coalesce____&amp;utm_term=all_all__" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C5Cy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 424w, https://substackcdn.com/image/fetch/$s_!C5Cy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 848w, 
https://substackcdn.com/image/fetch/$s_!C5Cy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 1272w, https://substackcdn.com/image/fetch/$s_!C5Cy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C5Cy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png" width="1456" height="364" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:364,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1203274,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://coalesce.getdbt.com/event/21662b38-2c17-4c10-9dd7-964fd652ab44/summary/?utm_medium=social&amp;utm_source=substack&amp;utm_campaign=q3-2026_coalesce-2025_aw&amp;utm_content=coalesce____&amp;utm_term=all_all__&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/171688472?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C5Cy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 
424w, https://substackcdn.com/image/fetch/$s_!C5Cy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 848w, https://substackcdn.com/image/fetch/$s_!C5Cy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 1272w, https://substackcdn.com/image/fetch/$s_!C5Cy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>What does agentic coding look like in analytics engineering? Mikkel Dengs&#248;e, co-founder at SYNQ, recently <a href="https://medium.com/@mikldd/using-ai-for-data-modeling-in-dbt-975838054cb1">wrote</a> a <a href="https://medium.com/@mikldd/using-ai-to-build-a-robust-testing-framework-4e034dfd014f">series</a> of <a href="https://medium.com/@mikldd/using-omnis-ai-assistant-on-the-semantic-layer-0572f997451d">posts</a> on his experiences as an analytics engineer with agentic coding tools. In this episode of The Analytics Engineering Podcast, he walks through a hands-on project using Cursor, the <a href="https://www.getdbt.com/product/fusion">dbt Fusion engine</a>, the <a href="https://www.getdbt.com/blog/mcp">dbt MCP server</a>, Omni&#8217;s AI assistant, and Snowflake.</p><p>Tristan and Mikkel cover where agents shine (staging, unit tests, lineage-aware checks), where they&#8217;re risky (BI chat for non-experts), and how observability is shifting from dashboards to root-cause explanations delivered to the right person at the right time. 
Along the way: practical prompts, why &#8220;one model at a time&#8221; keeps you in control, and a testing philosophy that avoids alert fatigue while catching what matters.</p><p><strong><a href="https://coalesce.getdbt.com/event/21662b38-2c17-4c10-9dd7-964fd652ab44/summary/?utm_medium=social&amp;utm_source=substack&amp;utm_campaign=q3-2026_coalesce-2025_aw&amp;utm_content=coalesce____&amp;utm_term=all_all__">To see real-world use cases of agentic coding and to learn directly from data and AI leaders, join us at Coalesce 2025 in Las Vegas, Oct. 13-16</a></strong>.</p><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:false,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><h2>Key takeaways</h2><h3>Can 
you talk a little bit about your background?</h3><p><strong>Mikkel Dengs&#248;e:</strong> Yeah, so I can start from the beginning. I've been in data for, I think it's coming up to 15 years now, and started my career in data at a Danish shipping company, which was very much zero to one. When I came in, there was no data warehouse, and the only way we could know how many containers were shipped was by an IT guy pulling that out of the system every six months. I then spent two years there building up their data warehouse on SQL Server, which was super fun. After that, I spent five years at Google, which was a very different gear.</p><h3>That's a natural transition. Just global shipping company straight to Google.</h3><p>Exactly. And that was very much the already-at-a-hundred end of the spectrum, where, in my case, I worked with the ads data and you get a perfectly curated data table that you can work with and everything kind of works. Then after that I joined a company called Monzo. For those who are not familiar, it's a scaling fintech out of the UK and that was very much the one to a hundred. When I joined we were 30 data people, and we scaled to a hundred over two years. We had 10,000 dbt models and we built every internal tool under the sun for dbt. Super interesting. And then three and a half years ago I went on to found SYNQ alongside Peter and Steve, which is a data observability platform.</p><h3>Tell us a little bit more about SYNQ.</h3><p>We are a data observability platform that primarily works with companies that already use tools like dbt but struggle to go from important data to business-critical data. That might be customer-facing dashboards, machine learning models, or something else. They want better monitoring&#8212;we often deploy anomaly monitors&#8212;and they also want workflows such as incident management for when things go wrong. We were founded in 2022; we started out working with scale-ups and startups, and we're now also onboarding enterprises and larger companies. 
It's been a fun journey.</p><h3>In your series of blog posts, you went through the modern data stack and said, &#8220;What's the most current version of this tool and how effectively can I AI-ify that?&#8221; Whether that's using Cursor to build dbt models or using the agent experience inside of Omni&#8212;what made you decide to get into this and write about it?</h3><p>The first part of it is just: it's super fun to tinker with these tools and try them out. It's magic. And we were also building an MCP server at SYNQ, so I had a lot of interest in seeing how it works with others and what we can learn. It was also driven by wanting better conversations with our customers: when they ask about it, I can speak from the point of view of having actually tried this and seen what works and what doesn't.</p><h3>The early days of using Redshift were such a visceral experience relative to what came before. If I hadn't interacted with it directly, I wouldn't have understood how big a step change cloud data was. This feels like another one of those moments: if you don't have hands-on experience, you're not going to really get it. Fair?</h3><p>Spot on. And I think pretty much every data team should be doing this unless they have a very good reason not to. The risk and the stakes can be pretty low if you use it for internal workflows like data modeling and writing tests. You're still in control. I recommend everybody do it.</p><h3>What tasks did you try to accomplish?</h3><p>It's three different blog posts: the data modeling part, the testing part, and then exposing it in Omni's AI agent where people can ask questions about the data. There's a fourth post: once the data is live, how can you use the SYNQ MCP to do things like root-cause analysis and planning changes. I started with data modeling. 
I had raw data from different JSON sources, some XMLs, some profiles&#8212;extracted and put into Snowflake&#8212;and then did the data model.</p><h3>So the data was already loaded into Snowflake?</h3><p>Yeah, exactly. For the data modeling, I started from the sources and then worked through staging, marts, and finally metrics using the semantic layer. Each step looks a little different when you use AI tools because the tools behave differently at each layer. In terms of tooling, I used Cursor with the dbt-MCP plugged in. If you're not familiar, dbt-MCP lets you, via prompt, interact with dbt tools&#8212;execute <code>dbt build</code>, get models, or get everything upstream of a given model&#8212;so you can chain steps together without running each one yourself.</p><h3>Cursor + dbt-MCP. What model did you use?</h3><p>I just used the default in Cursor, which I believe is Claude. There's an important distinction: Cursor is really good at writing code, but it can't execute queries on your behalf. If you want to extract raw data and query Snowflake to get rows out, you have to do that in Claude Desktop. That became key. Early on, as I built models, the first thing I did was get a snapshot of sample data from Snowflake&#8212;10,000 rows of a source. I fed that into Cursor and said, &#8220;These are examples of what this data looks like.&#8221; Using that data, Cursor could model in a clever way. For example, given a column called <code>quarter</code> with values like &#8220;2025 Q1&#8221;, Cursor understood it should translate that into a datetime and do the transformations.</p><h3>I've used the dbt MCP server a decent amount&#8212;less in Cursor, more in Claude Desktop. Your stack was Cursor + Claude models + Claude Desktop. And Cursor cannot directly execute queries in Snowflake, but Claude Desktop can. Is that because there&#8217;s tool use Claude has that Cursor doesn't?</h3><p>I believe so. In Claude Desktop, if you write queries against dbt-MCP, Claude can visualize a graph, show outputs of a SQL statement, etc. 
Cursor, as far as I know, couldn't. My middle ground was to take sample data out of Snowflake, put it into a CSV, and feed that back into Cursor so it could look at raw data.</p><h3>As part of its own context window?</h3><p>Exactly. That was key for my workflow. Then when I wanted to write unit tests, I could use real data examples from the sample. Or when automatically documenting the data, I asked Cursor to specify examples in the docs based on the most common occurrences within a column. Letting Cursor peek at raw data was a core pillar.</p><h3>It's a little hacky, right? Cursor should really be able to interact directly with Snowflake or Databricks to investigate the shape of the data. Agents should be empowered to do that.</h3><p>I would say so. There might be a way I didn&#8217;t know about, but I patched the gaps by uploading samples into the context window.</p><h3>So that's the state of the art today.</h3><p>Seems so. To be clear, I think the limitation is IDE differences&#8212;Cursor vs. Claude Desktop&#8212;rather than dbt-MCP itself.</p><h3>Once you had sample data in context, did you have to suggest conversions, or did it naturally do them?</h3><p>It got the defaults pretty right, but I guided it on what I wanted from the source data. I wanted control over everything, so I asked it to do one model at a time rather than auto-generate a whole stack. That way I could review each step and stay in control.</p><h3>Your prompt workflow was &#8220;Build me a model with this name that stages the data from this table,&#8221; basically?</h3><p>Yeah. When it proposed code I didn't like, the fixes upstream were usually simple (regex to parse dates, etc.). Downstream, in marts and metrics, I started describing my ideal data product: user jobs-to-be-done and the final output. 
That&#8217;s when Cursor got creative and invented metrics I hadn&#8217;t anticipated&#8212;like &#8220;apartment price relative to time on market.&#8221; I pruned ones I didn&#8217;t want, but some were good surprises.</p><h3>Which layer did it help most?</h3><p>Testing. Modeling was good&#8212;especially staging&#8212;but testing accelerated significantly. SQL is a bit like English; for simple datasets you can express intent easily. Testing can be much harder and more verbose.</p><h3>Roughly how much more effective did you feel?</h3><p>Modeling: multiples faster. It nailed the tedious parts&#8212;regex, casting, pass-throughs&#8212;so staging/intermediate layers flew. In marts/semantic metrics, the benefit was brainstorming. It helped me think of metrics I wouldn't have.</p><h3>Did the dbt Fusion engine help?</h3><p>Yes. Fusion shows lineage and whether a column is pass-through. For example, if a column is pass-through with no transforms, don't add another <code>not_null</code> or <code>unique</code> if there's one upstream. I checked this in the IDE and codified it as a testing strategy. That's already top-10% testing hygiene.</p><h3>Did any MCP feature requests surface?</h3><p>The more context and tools the agent has, the more it can do. In the fourth post, for root cause analysis, we used the SYNQ MCP. We collect all your Git commits and have history, so the agent could correlate recent code changes with incidents. Feature requests depend on the job at hand.</p><h3>Let's move to testing&#8212;why was it the most additive?</h3><p>Testing is hard; many teams don't know how to do it and alert fatigue is common. A huge share of tests we see are <code>not_null</code>/<code>unique</code>, which doesn't reflect real data risks. The first thing I did in Cursor for testing was provide our internal testing philosophy as guidelines: test heavily at the source, don't retest pass-through columns, focus on business and metric anomalies in marts. That worked really well. 
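</p><p>As a concrete sketch, a dbt unit test for the kind of &#8220;2025 Q1&#8221; quarter parsing described earlier takes roughly this shape (the model and column names here are illustrative, not taken from the actual project):</p>

```yaml
# Hypothetical example: assumes a staging model `stg_listings` that parses a
# raw `quarter` string (e.g. "2025 Q1") into a `quarter_start_date` column.
unit_tests:
  - name: quarter_string_parses_to_date
    model: stg_listings
    given:
      - input: source('raw', 'listings')
        rows:
          - {listing_id: 1, quarter: "2025 Q1"}
          - {listing_id: 2, quarter: "2025 Q3"}
    expect:
      rows:
        - {listing_id: 1, quarter_start_date: "2025-01-01"}
        - {listing_id: 2, quarter_start_date: "2025-07-01"}
```

<p>Because the expected rows are pinned down explicitly, a test like this catches regressions in the parsing logic without ever touching production data.</p><p>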
For sources and staging, it generated relevant tests. Then for marts, I asked for unit tests and gave it a thousand sample rows from Snowflake. It wrote very relevant unit tests I&#8217;d otherwise spend a lot of time on.</p><h3>Examples?</h3><p>Simple ones like: when you pass a string value in the date column, does it transform correctly to datetime and match the expected format? These just worked. Then at the metric level, it looked at raw data and proposed assumptions&#8212;like square-meter price should be between X and Y&#8212;sometimes segmenting by postcode. Very thoughtful, though I'd replace static thresholds with anomaly monitors so they don't go stale as prices move.</p><h3>So at least 5&#215; on testing?</h3><p>At least. Apart from swapping static thresholds for anomaly detection, it nailed testing and did so in a lineage-aware, layer-appropriate way.</p><h3>Tell me about the BI layer.</h3><p>Many teams start at the BI layer with a chat interface. I think that's risky because it's used by business users and you only get so many chances before trust drops. I moved into Omni. You create a &#8220;topic&#8221; (a data model you can join with others) and then specify an AI context: instructions for how the LLM should behave. For example: if a user asks about price, always return square-meter price; never make up fields not present in the mart; if asked about provenance, mention the source. Writing AI context is a new skill for our industry.</p><h3>Were you using Omni&#8217;s AI assistant to create assets faster, or to let users self-serve?</h3><p>The latter&#8212;so users could ask questions instead of going to a dashboard. It could have been any BI tool with similar functionality; we just use Omni internally.</p><h3>And how was the experience as a consumer?</h3><p>Amazing when it works, but I'd hesitate to give my VP of Marketing access. It gets things wrong maybe one in five times, and it's not obvious why if you're not a data person. 
For analysts doing exploratory work, it's great&#8212;they can inspect and dig in. I wouldn't replace company-wide dashboards with a chat bot yet. Omni does log freeform queries and feedback, so there's a path to iterate the AI context over time.</p><h3>The last thing you did was use AI plus SYNQ to monitor production infrastructure. What does observability look like in the future? Historically it's looked like dashboards&#8212;Datadog for data pipelines. Is it just more effective monitors, or fundamentally different?</h3><p>Fundamentally different. We&#8217;re heading to a place where observability tools can tell you what's wrong at the right time, with just the right context, delivered to the right person&#8212;inside or outside the data team. Done well, there may be few dashboards; instead you get an LLM-summarized root cause delivered from a monitor that might be auto-created. Less &#8220;active tool you poke at,&#8221; more &#8220;proactive explanation.&#8221;</p><h3>Still technical observability (pipelines/data issues), or business observability?</h3><p>More the former. Teams at the edges&#8212;Sales Ops managing Salesforce, engineering teams creating web events&#8212;often need to be notified about data issues. Business KPI movements require a different experience for marketers, etc.</p><h3>Automated remediation?</h3><p>Gradual. You can imagine an issue occurs without a dedicated test; the system proposes a new test. But 80% of issues come from source systems elsewhere (someone typing in Salesforce), and closing that loop is still hard. In the article&#8217;s fourth part, we had a data issue and I asked the SYNQ MCP through Claude Desktop to do root cause analysis. It walked the same steps a data person would (inspect the model, check errors, examine lineage and upstreams, review recent commits) and documented each step on the way to the root cause. That works now.</p><h3>At the beginning you said there&#8217;s no good reason not to use these tools today. 
What reasons do you hear for not trying?</h3><p>People are busy. But if you look at a risk curve, lowest risk is modeling and testing&#8212;you're in the driver's seat and it boosts productivity. Higher risk is replacing your BI tool with a chat bot; higher still is customer-facing experiences. The first two are hard to argue against.</p><h3>Enterprise IT approvals might be one blocker&#8212;approved models, data access, etc.</h3><p>True. For example, our MCP can query raw data to detect if an issue happens in a segment, and enterprises might hesitate there. Also, &#8220;MCP&#8221; as a term can be confusing. But it's actually simple and explainable, not a black box. Setting up dbt-MCP can still feel hacky in enterprises; if it lived natively in cloud environments, it&#8217;d be easier to adopt.</p><h3>You can set it up locally&#8212;no permissions/procurement&#8212;and just play. We also shipped the MCP server as a remote MCP in cloud, though that introduces auth/permissions considerations.</h3><p>If I had to pick a persona, it's the analyst. Analysts have had a tough decade: more tools, harder workflows, less time to tinker. MCPs and AI workflows are a turning point. At Monzo, we had a philosophy that you should be able to have an idea on your commute and have it implemented by midday. As we grew to 10,000 dbt models and long CI checks, that faded. I can see a world where this returns. MCPs can help. I'm excited.</p><h3>I love that. Analytics engineers think &#8220;infrastructure, correctness.&#8221; Analysts think &#8220;idea to validation fast.&#8221; Excel was always the analyst&#8217;s best friend because it's fast and flexible. MCPs make it easy to plug tools together and get answers quickly again.</h3><p>One company we work with&#8212;Voi, a scooter company out of Sweden&#8212;has a strong data leader, Magnus, who is very bought into metrics. Their data team doesn't produce dashboards; they produce metrics. 
In an AI world with MCPs and agentic workflows, that looks like the right call.</p><h3>I believe there's no such thing as the right BI tool&#8212;different tools have different trade-offs. Probably true for models/IDEs too: Claude Desktop vs. Claude Code vs. Cursor&#8212;no single &#8220;right answer&#8221; as long as the underlying context and metric definitions are shared.</h3><p>Agreed. What really matters across workflows: consistent metric definitions, documentation for columns and fields, and high-quality data. Those foundations matter even more when an LLM is in the loop; you may not have a human sanity-checking every result.</p><h2>Chapters</h2><ul><li><p><strong>00:00</strong> &#8212; Tristan&#8217;s intro</p></li><li><p><strong>01:10</strong> &#8212; Mikkel&#8217;s background: shipping &#8594; Google &#8594; Monzo &#8594; SYNQ</p></li><li><p><strong>03:08</strong> &#8212; What SYNQ does (data observability for business-critical data)</p></li><li><p><strong>04:15</strong> &#8212; Running the experiment</p></li><li><p><strong>06:23</strong> &#8212; Scope: modeling, testing, BI agent, observability</p></li><li><p><strong>07:17</strong> &#8212; Tooling: Cursor + dbt MCP server + Snowflake + Omni</p></li><li><p><strong>09:38</strong> &#8212; Sampling real data into the agent&#8217;s context</p></li><li><p><strong>13:14</strong> &#8212; Modeling workflow: one model at a time</p></li><li><p><strong>15:14</strong> &#8212; Where agents help most: testing &gt; modeling</p></li><li><p><strong>18:10</strong> &#8212; dbt Fusion engine: lineage-aware checks, fewer redundant tests</p></li><li><p><strong>19:50</strong> &#8212; Feature requests and root-cause via commit history</p></li><li><p><strong>20:57</strong> &#8212; Testing philosophy: source-heavy, pass-through aware, metric-level</p></li><li><p><strong>22:49</strong> &#8212; Unit tests from samples; thresholds vs anomaly monitors</p></li><li><p><strong>25:10</strong> &#8212; BI agents: great for analysts, risky for broad 
rollout</p></li><li><p><strong>31:54</strong> &#8212; The future of observability: explain first, dashboards second</p></li><li><p><strong>36:10</strong> &#8212; Adoption curve: safe places to start</p></li><li><p><strong>40:49</strong> &#8212; Analyst superpowers return</p></li><li><p><strong>42:04</strong> &#8212; Metrics over dashboards</p></li></ul><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. Discover why more than 60,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Book a demo&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Book a demo</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Under the hood of Apache Iceberg (w/ Christian Thiel)]]></title><description><![CDATA[The cofounder of Lakekeeper walks Tristan through the state of the Iceberg ecosystem]]></description><link>https://roundup.getdbt.com/p/under-the-hood-of-apache-iceberg</link><guid isPermaLink="false">https://roundup.getdbt.com/p/under-the-hood-of-apache-iceberg</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Sun, 24 Aug 2025 13:03:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/431e88b6-61e9-4287-8806-61a6027eb357_1680x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://coalesce.getdbt.com/event/21662b38-2c17-4c10-9dd7-964fd652ab44/summary/?utm_medium=social&amp;utm_source=substack&amp;utm_campaign=q3-2026_coalesce-2025_aw&amp;utm_content=coalesce____&amp;utm_term=all_all__" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C5Cy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 424w, https://substackcdn.com/image/fetch/$s_!C5Cy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 848w, 
https://substackcdn.com/image/fetch/$s_!C5Cy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 1272w, https://substackcdn.com/image/fetch/$s_!C5Cy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C5Cy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png" width="1456" height="364" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:364,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1203274,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://coalesce.getdbt.com/event/21662b38-2c17-4c10-9dd7-964fd652ab44/summary/?utm_medium=social&amp;utm_source=substack&amp;utm_campaign=q3-2026_coalesce-2025_aw&amp;utm_content=coalesce____&amp;utm_term=all_all__&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/171688472?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C5Cy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 
424w, https://substackcdn.com/image/fetch/$s_!C5Cy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 848w, https://substackcdn.com/image/fetch/$s_!C5Cy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 1272w, https://substackcdn.com/image/fetch/$s_!C5Cy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4300a1a6-b64a-467e-b1fd-453de570692d_3168x792.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>If you're a data practitioner, you likely understand Iceberg as a user, why it's important, and how it's changing the way that we build data systems. But you may not know a lot about what's going on beneath the surface.</p><p>There are multiple ways to interface with Iceberg catalogs, and multiple versions of the Iceberg REST spec. There are several leading catalogs that implement that spec. All of this sits in an ecosystem that includes companies of all sizes, in proprietary and open-source code, and in academic and commercial contexts.</p><p>In a few years, all this ambiguity will be behind us, but right now it's very much evolving in real time. To get an update on the status of the Iceberg ecosystem and to walk through all the developments, Tristan talks with Christian Thiel. 
Christian is one of the lead architects of Lakekeeper, one of the most widely used Iceberg catalogs.</p><p><strong><a href="https://coalesce.getdbt.com/event/21662b38-2c17-4c10-9dd7-964fd652ab44/summary/?utm_medium=social&amp;utm_source=substack&amp;utm_campaign=q3-2026_coalesce-2025_aw&amp;utm_content=coalesce____&amp;utm_term=all_all__">To learn more from some of the leaders in the Iceberg ecosystem, join us at Coalesce 2025 in Las Vegas, Oct. 13-16</a></strong>.</p><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:false,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><h2>Key takeaways</h2><h2>Walk us through your background</h2><p><strong>Christian Thiel:</strong> I started in natural language 
processing, then moved into machine learning applications in manufacturing. Like many people, I realized that the biggest barrier wasn&#8217;t the algorithms but the data&#8212;its availability, quality, and accessibility. That led me deeper into data architecture and engineering, eventually to building Lakekeeper.</p><h2>What is Lakekeeper, and what are you building now?</h2><p>Lakekeeper is an Iceberg catalog implementation&#8212;a technical requirement for building distributed, composable analytic systems based on Apache Iceberg. But our vision goes beyond that. We see the future in data collaboration and reliable sharing of data, supported by clear contracts.</p><h2>For listeners new to Iceberg, what makes it so important?</h2><p>Iceberg allows organizations to store data once, in an open format, and then use the compute engine best suited for each workload. It&#8217;s a foundation for building modern, composable data platforms while avoiding vendor lock-in. If there&#8217;s one thing that should be open, it&#8217;s the data at the center of your platform.</p><h2>Some folks might say this sounds like Hadoop all over again&#8212;lots of open standards that are hard to integrate. Why is this time different?</h2><p>The ecosystem has matured. Even big vendors like Snowflake and Databricks are embracing Iceberg, which shows there&#8217;s a strong shift toward openness. Plus, the tooling and infrastructure are much easier to deploy today. A modern Iceberg setup is far less complex than a Hadoop environment used to be.</p><h2>Let&#8217;s talk about what&#8217;s happening under the hood. How does Iceberg work?</h2><p>Iceberg organizes data using a metadata hierarchy. At the top, there&#8217;s a JSON file that stores high-level table information: snapshots, schema, and locations. Below that are manifests and other layers that keep track of files. 
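</p><p>Sketched out, those layers look roughly like this (the paths and file names here are illustrative, not real ones):</p>

```yaml
# Illustrative shape of an Iceberg table's metadata tree (example paths only)
table_metadata: metadata/v42.metadata.json   # schema, partition spec, snapshot log
current_snapshot:
  manifest_list: metadata/snap-42.avro       # one entry per manifest in this snapshot
  manifests:
    - metadata/manifest-0.avro               # lists data files with per-column stats
data_files:
  - data/part-00000.parquet                  # the actual rows
```

<p>Each commit writes a new top-level metadata file rather than mutating files in place, which is how readers keep seeing a consistent snapshot while writers work.</p><p>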
This hierarchy is what makes things like time travel, atomic transactions, and schema evolution possible.</p><h2>What about ongoing maintenance?</h2><p>There are two key tasks. First, expiring old snapshots so you don&#8217;t accumulate unnecessary files. Second, compaction&#8212;combining many small files into larger ones.</p><h2>Catalogs are another critical piece. What role do they play?</h2><p>Catalogs manage the top layer of metadata and coordinate transactions. They make atomic updates possible, allow multiple writers, and handle governance&#8212;things like access control and multi-table transactions.</p><h2>How enterprise-ready is Iceberg today?</h2><p>Very ready. A year ago, there were still gaps, but today, performance and feature parity with native tables on platforms like Snowflake and BigQuery are strong. Governance and authorization models are still evolving, and different catalogs implement them differently, but the core functionality is there.</p><h2>Speaking of catalogs, how should someone pick between options like Lakekeeper, Polaris, Unity, AWS Glue, or Gravitino?</h2><p><strong>Christian Thiel:</strong> It depends on priorities. Lakekeeper focuses on performance, extensibility, and ease of use. Polaris is developer-focused but less user-friendly. Unity is tightly integrated into Databricks. Glue now supports the Iceberg REST spec, which makes it more interoperable than before. Gravitino is another option aimed at enterprise-scale environments.</p><h2>Recently, DuckDB announced DuckLake. What&#8217;s your take on that?</h2><p>It&#8217;s interesting, but there are two concerns. First, it uses a database schema directly for the catalog, which creates interoperability issues&#8212;similar to the early JDBC catalog in Iceberg that the community eventually moved away from.
Second, it was built without community involvement, and openness without adoption isn&#8217;t really openness.</p><p>That said, for heavy DuckDB users, it could offer optimizations that make queries extremely fast, and if the broader ecosystem adopts it, it could become a viable open format.</p><h2>What&#8217;s next for Lakekeeper?</h2><p>We&#8217;re continuing to invest in table optimization, enterprise features, and data collaboration tools. Our vision is what we call the &#8220;unbreakable lakehouse,&#8221; where contracts and collaboration guardrails make shared data more reliable. Long-term, we see Lakekeeper as enabling truly collaborative, open data ecosystems.</p><h2>Chapters</h2><ul><li><p><strong>00:00 &#8211; Introduction</strong></p><p>Tristan Handy introduces the episode and the focus on Apache Iceberg.</p></li><li><p><strong>01:40 &#8211; Christian Thiel&#8217;s background</strong></p><p>From natural language processing to data engineering.</p></li><li><p><strong>04:30 &#8211; Introduction to Lakekeeper</strong></p><p>What Lakekeeper is and its role in the Iceberg ecosystem.</p></li><li><p><strong>06:00 &#8211; Why Iceberg matters</strong></p><p>How open table formats enable flexibility and reduce vendor lock-in.</p></li><li><p><strong>11:40 &#8211; How Iceberg works under the hood</strong></p><p>Metadata hierarchy, catalogs, and how state is managed.</p></li><li><p><strong>21:30 &#8211; Maintenance and optimization</strong></p><p>Snapshot expiration, compaction, and keeping tables performant.</p></li><li><p><strong>24:20 &#8211; Catalogs and governance</strong></p><p>Access control, multi-table transactions, and security.</p></li><li><p><strong>31:40 &#8211; Enterprise readiness</strong></p><p>How Iceberg is evolving for production use in large organizations.</p></li><li><p><strong>42:10 &#8211; Choosing the right catalog</strong></p><p>Overview of Lakekeeper, Polaris, Unity, Glue, and Gravitino.</p></li><li><p><strong>47:20 &#8211; DuckLake
discussion</strong></p><p>Pros, cons, and ecosystem adoption challenges.</p></li><li><p><strong>52:00 &#8211; The future of Lakekeeper</strong></p><p>Data contracts, collaboration, and building the &#8220;unbreakable lakehouse.&#8221;</p></li></ul><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. Discover why more than 60,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Book a demo&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Book a demo</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The pragmatic guide to AI agents in the enterprise (w/ Sean Falconer) ]]></title><description><![CDATA[Demystifying AI agents with Confluent's senior director of AI strategy]]></description><link>https://roundup.getdbt.com/p/the-pragmatic-guide-to-ai-agents</link><guid isPermaLink="false">https://roundup.getdbt.com/p/the-pragmatic-guide-to-ai-agents</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Sun, 03 Aug 2025 13:02:53 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/16b80e19-0489-465c-8dba-64088edba31f_1680x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>What does it mean to be agentic? Is there a spectrum of agency? </p><p>In this episode of The Analytics Engineering Podcast, Tristan Handy talks to Sean Falconer, senior director of AI strategy at Confluent, about AI agents. They discuss what truly makes software "agentic," where agents are successfully being deployed, and how to conceptualize and build agents within enterprise infrastructure. </p><p>Sean shares practical ideas about the changing trends in AI, the role of basic models, and why agents may be better for businesses than for consumers. 
This episode will give you a clear, practical idea of how AI agents can change businesses, instead of being a vague marketing buzzword.</p><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:false,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><h2>Key takeaways</h2><h3><strong>Sean, can you give us the TLDR on your career and what you're working on today?</strong></h3><p><strong>Sean Falconer: </strong>I've always worked at the intersection of data, engineering, and AI. From academia studying computer science, into industry as a founder, then to Google, I worked on conversational systems and privacy/security in AI. 
Currently, at Confluent, I'm leading our AI product strategy, balancing both technical and go-to-market roles.</p><h3><strong>You moved from being deeply technical into marketing and sales. What drove that transition?</strong></h3><p>I was forced into it as a founder. Initially uncomfortable, but it taught me huge respect for marketing and sales. I had to learn by making many mistakes, eventually building out entire marketing and sales functions. I realized how challenging and critical these roles are.</p><h3><strong>You were at Google before ChatGPT launched. Did you foresee the transformative nature of these technologies?</strong></h3><p>Honestly, no. Having seen earlier disappointments in conversational AI (like Microsoft's Alice), I was skeptical initially, even as ChatGPT emerged. It wasn&#8217;t obvious we'd soon experience this revolution.</p><h3><strong>You&#8217;ve written about three waves of AI. Can you describe these?</strong></h3><p>Yes. Wave one was predictive AI, traditional ML models trained for specific tasks like fraud or spam detection&#8212;effective but rigid. Wave two introduced generative AI, or foundation models, trained on vast general datasets, flexible but lacking specific business context. The third wave, agentic AI, involves AI systems that can reason, dynamically choose tasks, gather information, and perform actions as a more complete software system.</p><h3><strong>Do foundation models replace traditional ML methods?</strong></h3><p>Sometimes they can, but it doesn&#8217;t always make sense. An LLM might do sentiment analysis well enough, but a traditional model may be more efficient and cheaper. Think of using an LLM as cutting steak with a chainsaw&#8212;possible, but unnecessary.</p><h3><strong>Let's clarify "agents." What makes software truly agentic?</strong></h3><p>It&#8217;s software that can dynamically decide its own control flow: choosing tasks, workflows, and gathering context as needed. 
Realistically, current enterprise agents have limited agency to ensure reliability. They're mostly workflow automations rather than fully autonomous systems.</p><h3><strong>You mentioned a spectrum of agency. Is this similar to autonomy in self-driving cars?</strong></h3><p>Exactly. Highly autonomous agents are appealing but not practical yet. Most enterprise success stories involve structured workflows with clearly defined boundaries.</p><h3><strong>Why have agents taken off more in enterprises than consumer apps?</strong></h3><p>Enterprises have many well-defined, high-value tasks perfect for automation. Consumer scenarios demanding high agency&#8212;like planning complex trips&#8212;are still too unreliable. Enterprises can benefit significantly even from limited agentic capability.</p><h3><strong>Is an agent just a microservice?</strong></h3><p>In many ways, yes. An agent functions like a microservice with extra capabilities (using LLMs for decisions). Deployment considerations like state management and long-running tasks differ slightly, but fundamentally it&#8217;s similar.</p><h3><strong>What tools and frameworks help build effective agents?</strong></h3><p>Start with frontier models like GPT-4 or Claude. Frameworks include LangChain, Microsoft Autogen, and CrewAI. But for real-world deployment, treat it as rigorous software engineering with observability, scalability, and robustness in mind.</p><h3><strong>Are organizational barriers bigger than technical challenges?</strong></h3><p>Yes. AI efforts are often mistakenly tasked to data science teams rather than cross-functional software teams. Successful companies create dedicated teams blending software engineering skills and data expertise to build reliable agentic systems.</p><h3><strong>What pitfalls should teams avoid?</strong></h3><p>Avoid building monolithic agents. Break systems into smaller, well-defined units in a multi-agent architecture. 
Use event-driven frameworks to avoid rigid, hard-to-maintain dependencies.</p><h2>Chapters</h2><ul><li><p>[00:00] Introduction: What's all the hype about agents?</p></li><li><p>[01:10] Meet Sean Falconer: A journey from engineer to AI strategist</p></li><li><p>[04:10] Learning marketing as an engineer-founder</p></li><li><p>[05:50] Inside Google's AI efforts before ChatGPT</p></li><li><p>[09:00] What does it mean to run AI strategy?</p></li><li><p>[10:45] Three waves of AI: Predictive, Generative, and Agentic</p></li><li><p>[16:30] Will foundation models replace traditional ML?</p></li><li><p>[18:30] Defining agents clearly: Beyond the buzzword</p></li><li><p>[22:00] The spectrum of agency: From controlled workflows to open-ended tasks</p></li><li><p>[25:30] Why agents fit better in enterprises than consumer apps</p></li><li><p>[28:00] Agents as microservices: A practical view</p></li><li><p>[35:00] What tech stack is needed to build effective agents?</p></li><li><p>[37:50] Organizational challenges in adopting agents</p></li><li><p>[39:30] Models that are favorites for developers</p></li><li><p>[43:30] Why software engineers are best placed to build agents</p></li><li><p>[46:00] The technical stumbling blocks in building agents</p></li><li><p>[48:00] Concluding thoughts: Beyond POCs to production agents</p></li></ul><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. 
Discover why more than 60,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Book a demo&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Book a demo</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[How Amazon S3 works (w/ Andy Warfield)]]></title><description><![CDATA[Go under the hood of Amazon S3 with AWS engineering leader Andy Warfield&#8212;from virtualization to Iceberg]]></description><link>https://roundup.getdbt.com/p/how-amazon-s3-works-w-andy-warfield</link><guid isPermaLink="false">https://roundup.getdbt.com/p/how-amazon-s3-works-w-andy-warfield</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Sun, 20 Jul 2025 12:02:56 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/92b37acf-08ac-4dac-b59f-123b21df7011_1680x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this season of the Analytics Engineering podcast, Tristan is deep into the world of developer tools and databases. If you're following us here, you've almost definitely used Amazon S3 and its Blob Storage siblings at Microsoft and Google. They form the foundation for nearly all data work in the cloud. In many ways, it was the innovations that happened inside of S3 that unlocked all of the progress in cloud data over the last decade. </p><p>In this episode, Tristan talks with Andy Warfield, VP and senior principal engineer at AWS, where he focuses primarily on storage. They go deep on S3, how it works, and what it unlocks.
They close out talking about Iceberg, S3 table buckets, and what this all suggests about the outlines of the S3 product roadmap moving forward.</p><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:false,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><h2>Key takeaways</h2><h3>Operating systems, garage sales, and Xen</h3><p><strong>Tristan Handy: You&#8217;ve done a lot over the last 20 years. Before we get into specifics, can you just share a little about your journey as a software engineer?</strong></p><p><strong>Andy Warfield:</strong> I just like playing with computers.  I studied computer science in Ontario for undergrad, then moved to Vancouver for grad school, then to the UK for a PhD. 
I worked on operating systems, low-level stuff. I got to work on a hypervisor called Xen, which ended up being used by a lot of cloud providers, including Amazon.</p><p>After that, I did a couple of startups, one around Xen. Then I became a professor at UBC, teaching operating systems, networking, and security. Later, I did another startup in storage, and eventually I joined Amazon.</p><p>Now I have this highfalutin role&#8212;VP and engineer&#8212;working across S3, other storage services, and now a bunch of analytics services too. I get to cause trouble in lots of different parts of the cloud.</p><p><strong>VP slash distinguished engineer&#8212;does that mean you just get to march around telling people how to improve their stuff?</strong></p><p>People love that! I&#8217;d say about half the time I&#8217;m causing trouble&#8212;starting things and encouraging new ideas&#8212;and the other half I&#8217;m helping teams dig out from those ideas. Sometimes I take over a team if we&#8217;re doing something especially interesting or innovative, just so I can be closer to the action.</p><p><strong>That sounds like a pretty good gig if you can get it.</strong></p><p>It&#8217;s amazing. I&#8217;ve been here nearly eight years, and I still love this job.</p><div><hr></div><h3>The rise of virtualization and the origin of Xen</h3><p><strong>I want to talk about Xen. You said you were always interested in operating systems, which is kind of a niche fascination. What drew you in?</strong></p><p>When I was a kid, we didn&#8217;t have much money, so I built computers from garage sale parts in Ottawa. In high school, I found this federal government warehouse that sold off old equipment. I started a little business buying pallets of hardware for cheap, fixing them up, and reselling.</p><p>It was chaotic&#8212;but I learned a lot. I dealt with machines like IBM DisplayWriters with 8-inch floppy disks and massive dot-matrix printers. 
Getting them working meant diving into their software and systems.</p><p>Eventually I played with Linux, hacked on the kernel, and that all led me into OS research and development.</p><p><strong>Tristan: So what is a hypervisor, and why did virtualization become so important in the 2000s?</strong></p><p><strong>Andy:</strong> There were two big drivers: server utilization and isolation.</p><p>Companies had racks full of 1U servers, most of which sat idle most of the time. But they couldn&#8217;t share workloads because apps weren&#8217;t isolated well&#8212;config conflicts, shared resources, etc.</p><p>Virtualization allowed multiple operating systems to run on the same hardware, with isolation. It also let you consolidate servers, which had big cost and efficiency benefits.</p><p>There was also a technical challenge: x86 processors weren&#8217;t designed to be virtualized. That made it a really interesting research problem. We wanted to see if it could even be done&#8212;and done efficiently.</p><p><strong>Tristan: And Intel eventually started building virtualization support into the hardware?</strong></p><p><strong>Andy:</strong> Exactly. Our work on Xen and similar projects showed it was possible. That pushed Intel and AMD to add features like VT-x, which made it easier and more performant to run hypervisors.</p><p><strong>Tristan: How did AWS end up using Xen?</strong></p><p><strong>Andy:</strong> I wasn&#8217;t part of those internal conversations, but the story goes that a small startup in Cape Town, South Africa, was building a control plane for Xen. That team got picked up by AWS and became the basis for EC2.</p><div><hr></div><h3>Understanding Amazon S3</h3><p><strong>Tristan: Let&#8217;s switch to S3. I think a common mental model is that S3 is just a big pool of SSDs. But that&#8217;s clearly not the whole story. 
How do you explain what S3 actually is?</strong></p><p><strong>Andy:</strong> That&#8217;s one of my favorite questions.</p><p>Early on, S3 was like a storage locker. You&#8217;d rent space to stash things you didn&#8217;t need right away&#8212;backups, static files, CDN origins. Latency wasn&#8217;t great, but durability and availability were.</p><p>Things really changed when the Hadoop community built S3A&#8212;an adapter to let Hadoop use S3 instead of HDFS. Suddenly, we had people doing real analytics on S3. The system had enough drives to support massive parallel reads.</p><p>Today, workloads are way more demanding. Performance, consistency, and latency matter. We&#8217;ve been evolving the system constantly to meet those needs.</p><p><strong>Tristan: Are we talking about billions of hard drives?</strong></p><p><strong>Andy:</strong> I can&#8217;t share exact numbers, but yes&#8212;it's a lot of hard drives. Some of our largest customers have data spread across <em>millions</em> of drives. And most drives are shared across multiple customers.</p><p><strong>Tristan: And these aren&#8217;t SSDs?</strong></p><p><strong>Andy:</strong> Mostly spinning disks, actually. Hard drives are terrible at latency, but they&#8217;re cheap and good for bursty workloads. Spreading your data across many disks lets you take advantage of parallelism.</p><div><hr></div><h3>S3&#8217;s durability, performance, and scale</h3><p><strong>Tristan: Let&#8217;s talk about S3&#8217;s durability promise: 11 nines. How do you achieve that?</strong></p><p><strong>Andy:</strong> We use erasure coding&#8212;a form of RAID-like redundancy that lets you split data into parts and parity blocks. Then we store those shards across different availability zones.</p><p>We constantly monitor for failures. Disks die all the time, so we have fleets of processes repairing and maintaining durability. It&#8217;s not static. 
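</p><p><em>As a toy illustration of the idea (not S3&#8217;s actual scheme, which uses wider erasure codes and cross-availability-zone placement): split the data into shards, store an XOR parity shard alongside them, and any single lost shard can be rebuilt from the survivors.</em></p>

```python
from functools import reduce

# Toy erasure coding: k data shards plus one XOR parity shard.
# Illustrative only; production systems use wider Reed-Solomon-style codes.

def make_shards(data: bytes, k: int):
    """Split data into k equal data shards plus one parity shard."""
    if len(data) % k:
        data += b"\x00" * (k - len(data) % k)  # pad to a multiple of k
    size = len(data) // k
    shards = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*shards))
    return shards + [parity]

def rebuild(shards, lost):
    """Reconstruct one missing shard by XOR-ing all surviving shards."""
    survivors = [s for i, s in enumerate(shards) if i != lost]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))

shards = make_shards(b"hello iceberg!", k=3)
assert rebuild(shards, lost=1) == shards[1]  # a dead "disk" is recoverable
```

<p><em>Conceptually, the repair fleets are processes running this kind of reconstruction continuously as disks fail, so redundancy is restored faster than it is lost.</em></p><p>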
It&#8217;s a living system.</p><p><strong>Tristan: You must have incredibly precise failure models.</strong></p><p><strong>Andy:</strong> We do. We track failure rates, temperature sensitivity, vendor behavior&#8212;everything. That allows us to be proactive and surgical in how we manage risk.</p><div><hr></div><h3>From Parquet to Iceberg to S3 table buckets</h3><p><strong>Tristan: I want to talk about table formats. Parquet is everywhere now. And then we got Hive Metastore, then Iceberg. Why did S3 launch table buckets?</strong></p><p>Parquet is great, but it&#8217;s just files. Customers kept asking for more structured semantics: schema evolution, upserts, ACID transactions.</p><p>We saw Iceberg adoption grow rapidly&#8212;especially among our largest analytics customers. But they were struggling with operational complexity: too many small files, custom compactors, brittle catalogs.</p><p>So we launched S3 table buckets to bring native Iceberg support to S3. That includes:</p><ul><li><p>Automatic compaction</p></li><li><p>A REST catalog</p></li><li><p>High-performance access</p></li></ul><p>We wanted to make it easier to treat Iceberg as a storage primitive, not just an analytics backend.</p><p><strong>So this is a shift in philosophy&#8212;S3 isn&#8217;t just object storage, it&#8217;s now table-aware?</strong></p><p>Exactly. Historically, S3 was just where you stored objects. Now, we&#8217;re thinking more about what those objects <em>mean</em>.</p><p>We also launched S3 object metadata tables&#8212;a way to semantically describe and query your object store, especially useful for AI workloads using retrieval-augmented generation (RAG).</p><div><hr></div><h3>The future of open data and S3</h3><p><strong>What does the future of S3 look like? Where&#8217;s this going?</strong></p><p>We&#8217;re headed toward more structure, more semantics, and more performance.</p><p>Inference workloads are scaling fast. 
AI models are hitting S3 hundreds of thousands of times per second to do vector lookups. That&#8217;s changing how we think about indexing, metadata, and latency.</p><p>We want to make S3 the best place to do open, flexible, high-scale data work&#8212;from tables to training data to retrieval.</p><h2>Chapters</h2><p><strong>[01:42] Meet Andy Warfield</strong></p><p>Andy shares his background, including startups, professorship, and his current role as VP &amp; Senior Principal Engineer at AWS.</p><p><strong>[05:10] From garage sales to hypervisors</strong></p><p>Andy describes his early passion for hardware, OS development, and the origin story behind the Xen hypervisor.</p><p><strong>[08:50] Why virtualization took off in the 2000s</strong></p><p>Exploring why isolation, utilization, and technical curiosity fueled the rise of hypervisors.</p><p><strong>[14:30] Xen vs. VMware and the road to AWS</strong></p><p>How Xen became the default for EC2 and the technical differences between virtualization approaches.</p><p><strong>[17:35] The origin of EC2 and S3</strong></p><p>How a team from Cape Town helped launch AWS compute&#8212;and the early days of cloud services.</p><p><strong>[20:00] What is S3, really?</strong></p><p>Andy breaks down the mental model behind S3: not just object storage, but a scalable data platform.</p><p><strong>[22:49] How many drives? 
More than you think</strong></p><p>Why S3 storage spans millions of drives&#8212;and how AWS uses scale to deliver performance.</p><p><strong>[28:10] The 11 nines durability model</strong></p><p>Inside S3&#8217;s approach to reliability, failure tolerance, and background repairs using erasure coding.</p><p><strong>[32:00] Tail latency and engineering for bursty workloads</strong></p><p>Why slow requests matter, and how S3 teams optimize for streaming, AI, and analytics use cases.</p><p><strong>[35:20] Iceberg, metadata, and table buckets</strong></p><p>The emergence of Apache Iceberg as a table format&#8212;and AWS&#8217;s new structured storage approach.</p><p><strong>[38:00] Why S3 added a REST catalog and compaction</strong></p><p>How AWS is simplifying the operational burden of working with Iceberg at scale.</p><p><strong>[40:00] A new mental model for object storage</strong></p><p>S3 is no longer just about storing files&#8212;it&#8217;s about managing semantics, lineage, and trust.</p><p><strong>[44:00] Looking ahead: S3, RAG, and semantic metadata</strong></p><p>How S3 is preparing for the next wave of AI, inference, and context-aware applications.</p><p><strong>[47:20] Is Iceberg ready for enterprise?</strong></p><p>Andy shares thoughts on enterprise readiness, performance tradeoffs, and real-world adoption of table formats.</p><p><strong>[49:05] Wrap-up and reflections</strong></p><p>Tristan and Andy reflect on the conversation and where data infrastructure is headed next.</p><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. 
Discover why more than 50,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Book a demo&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Book a demo</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[It is time to take agentic workflows for data work seriously]]></title><description><![CDATA[Mission: 0 to semantic layer in two hours]]></description><link>https://roundup.getdbt.com/p/it-is-time-to-take-agentic-workflows</link><guid isPermaLink="false">https://roundup.getdbt.com/p/it-is-time-to-take-agentic-workflows</guid><dc:creator><![CDATA[Jason Ganz]]></dc:creator><pubDate>Sun, 29 Jun 2025 12:53:17 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/43df1d45-822b-4677-acfe-e16a64853ca1_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last week, I cleared two hours on my calendar to do a deep dive into the current state of agentic development for data work.</p><p>Specifically, I gave myself a challenge - could I go from a never-before-seen dataset to a production-ready Semantic Layer using a combination of tools:</p><ul><li><p>An agentic coding CLI (I used <a href="https://www.anthropic.com/claude-code">Claude Code</a> for this experiment)</p></li><li><p>The <a href="https://github.com/dbt-labs/dbt-mcp">dbt MCP server</a></p></li><li><p>A terminal interface (in this case <a href="https://www.warp.dev/">Warp</a>)</p></li></ul><p>Before we go any further, if this is at all interesting to you, I suggest that instead of reading my findings here that you sit down and try this yourself. I'm quite confident you'll find it both illuminating and worth your time.</p><p>We'll get to my findings in a bit. 
Long story short - it was successful enough that it shifted my thinking about the near-term trajectory of data work. </p><p>But first, let's talk about why experiments like this matter so much right now.</p><p><strong>Sensemaking in the age of AI</strong></p><p>You've probably been hearing some variant of these takes multiple times a day:</p><p><em>"An agent is just an LLM run in a loop"</em></p><p><em>"AI agents are coming to replace white-collar work"</em></p><p><em>"I don't even know what an AI agent is, this is just marketing hype"</em></p><p>And about a billion more. All of these represent our collective attempts at sensemaking in this unique technological moment. But honestly, the noise can be so overwhelming that it's tempting to just tune it all out and wait for the dust to settle.</p><p>I don't think that's an option for data practitioners. Instead, we need to develop our own internal compass for sensemaking - and that means getting our hands dirty.</p><p>To do great data work is to be a great sensemaker. My theory of sensemaking requires holding two paradoxical skills in tension:</p><ul><li><p>Build strong mental models about the world and use them to take decisive action</p></li><li><p>Constantly scan for misalignments between your models and reality, then adjust accordingly</p></li></ul><p>Organizations and institutions need time to metabolize change and adjust their mental models. There's a physics to it. And <a href="https://roundup.getdbt.com/p/a-new-kind-of-weird">that physics takes time</a>.</p><p>But when the underlying reality is changing rapidly, the best thing you can do is go make direct contact with that reality. 
Don't wait for the consensus to form - go see for yourself.</p><p>Because things are not the same as they were even 6 months ago:</p><ul><li><p>We've gotten the first wave of models optimized for agentic work (OpenAI&#8217;s o3, Claude 4, and Gemini 2.5)</p></li><li><p>We've started building real infrastructure to connect these models to our systems (MCP and other emerging protocols)</p></li><li><p>LLM-based coding has shifted from autocomplete to actual agents (something <a href="https://roundup.getdbt.com/p/should-we-even-care-about-using-llms">longtime Roundup readers saw coming</a>)</p></li></ul><p>That&#8217;s a bunch of big changes! It can sometimes feel like keeping up with everything here is a full-time job. And with my last couple months being pretty tied up with <a href="https://docs.getdbt.com/blog/dbt-fusion-engine">other things</a>, I felt like I owed it to myself to set aside some time and go deep here.</p><p><strong>The experiment: Two hours from zero to Semantic Layer</strong></p><p>I chose the <a href="https://app.snowflake.com/marketplace/providers/GZSOZ1LLBU/Weather%20Source%2C%20LLC">Weather Source</a> dataset on the Snowflake marketplace precisely because it was both interesting and completely unfamiliar to me. I booted up Warp (I already had the dbt MCP server configured - setting that up might add additional time) and got started. </p><p>In two hours, I went from raw data to a <a href="https://github.com/dbt-labs/weather-climate-dbt/tree/main">working dbt project</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> with:</p><ul><li><p>Documented source definitions</p></li><li><p>Tested data models</p></li><li><p>A functional Semantic Layer with queryable metrics</p></li></ul><p>It felt incredible. A bit unbelievable. 
Of course this was just a simple project and nothing in here would be particularly difficult for an experienced analytics engineer - but it would have taken a whole lot of time and effort.</p><p>Some observations from the process:</p><p><strong>The experience was exhilarating.</strong> Watching an abstract goal decompose into concrete tasks, then seeing those tasks execute in real-time feels like witnessing something totally new. It was also addictive - this interface has the &#8220;just one more level&#8221; feeling of a great video game.</p><p><strong>The cognitive load is different.</strong> It was cognitively demanding but not in the same way that coding is cognitively demanding - I have a sense that I&#8217;d be able to sustain longer blocks of &#8220;pairing&#8221; with Claude Code before getting mentally depleted than I can with normal coding.</p><p><strong>The tools aren't optimized for data work yet.</strong> </p><ul><li><p>It first attempted to build out a bunch of models that depended on each other, but didn&#8217;t check if the first model actually <em>ran</em>. Then there was an error partway through its dependency chain and we had to do a bunch of unthreading.</p></li><li><p>It&#8217;s competent at writing SQL (and dbt-style SQL). I don&#8217;t expect this to be the bottleneck for AI-augmented development.</p></li><li><p>It is not very good at understanding what columns or models it has access to at a given time - I expect this to be an area where the models will be most useful when assisted by deterministic tooling.</p></li></ul><p><strong>What this proved (and didn't prove)</strong></p><p>This experiment convinced me that agentic workflows have moved beyond &#8220;pure speculation&#8221; and into &#8220;definitely worth exploring and net useful for many teams today&#8221;. It feels pretty similar to the early days of coding assistants like Copilot. 
Not yet for every team, but definitely for some, and on a steep acceleration curve.</p><p>This was just a simple experiment and I walked away thinking just as much about what I don&#8217;t know as what I learned.</p><ul><li><p>I still don't know if my models are logically sound (validation would take as long as building)</p></li><li><p>Enterprise-scale datasets might break this approach entirely</p></li><li><p>The actual utility of what I built remains untested</p></li><li><p>And even with all of this, there are just as many organizational bottlenecks facing data teams as technical ones. What implications does this have there (if any)?</p></li></ul><p>But here's the thing: in two hours, I accomplished what would have taken me at least a full day manually - not just the modeling, but documentation, testing, and more. That is worth paying attention to.</p><p><strong>Your move</strong></p><p>When facing a question as vast as "How will AI reshape data work?", it's easy to get paralyzed. But the answer isn't in think pieces or Twitter debates - it's in running experiments.</p><p>My mental model shifted because I made contact with reality. Right now, data teams not using agentic workflows are doing just fine. But things are moving fast. It&#8217;s worth it, at the very least, to get a sense of what the state of the world is here and to think about how you might adapt to it. </p><p>So here's my challenge: Block two hours next week. Pick a dataset you don't know. Try to build something real with these tools. Report back and let me know.</p><p>The future of data work is being written right now, in thousands of small experiments by practitioners who refuse to wait for the dust to settle. 
If you're reading this, you have the expertise to contribute to our collective sensemaking.</p><p>What will you discover when you stop reading about AI and start building with it?</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>There&#8217;s a lot that I&#8217;d improve here for a production project - I&#8217;m making this public to show a checkpoint of where I got in a timeboxed experiment.</p></div></div>]]></content:encoded></item><item><title><![CDATA[From Docker to Dagger (w/ Solomon Hykes)]]></title><description><![CDATA[The creator of Docker on how containers changed everything]]></description><link>https://roundup.getdbt.com/p/from-docker-to-dagger-w-solomon-hykes</link><guid isPermaLink="false">https://roundup.getdbt.com/p/from-docker-to-dagger-w-solomon-hykes</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Sun, 22 Jun 2025 13:00:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/18dacb78-748c-463d-9553-ed6186da36e1_1680x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this season of the Analytics Engineering podcast, Tristan is digging deep into the world of developer tools and databases. There are few more widely used developer tools than Docker. From its launch back in 2013, Docker has completely changed how developers ship applications. </p><p>In this episode, Tristan talks to Solomon Hykes, the founder and creator of <a href="https://www.docker.com/">Docker</a>. They trace Docker&#8217;s rise from startup obscurity to becoming foundational infrastructure in modern software development. Solomon explains the technical underpinnings of containerization, the pivotal shift from platform-as-a-service to open-source engine, and why Docker&#8217;s developer experience was so revolutionary. 
</p><p>The conversation also dives into his next venture <a href="https://dagger.io/">Dagger</a>, and how it aims to solve the messy, overlooked workflows of software delivery. Bonus: Solomon shares how AI agents are reshaping how CI/CD gets done and why the next revolution in DevOps might already be here.</p><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:false,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><h2>Key takeaways</h2><p><strong>Tristan Handy: I want to get you to give a little background on yourself, where you've been, what you've been up to for the last couple decades. 
I think many people will know you as the person who kicked off an avalanche that changed how we interact with compute environments by inventing Docker.</strong></p><p><strong>Solomon Hykes: </strong>Docker is the thing I'm known for. Pre-Docker, I grew up in France. I studied programming in a French school called Epitech. It was a brand-new, unconventional school where you learned through nonstop programming, which I loved.</p><p>Eventually, I got exposed to startups, despite being a complete outsider. I met someone who told me about them, and it stuck in my mind. Still in France at the time, I moved into my mom's house in the suburbs of Paris and worked out of the basement.</p><p>By complete luck, I got into an early version of Y Combinator in 2010. That got us on the path to what would become Docker three years later. In 2013, we pivoted to Docker from our previous company, dotCloud.</p><p><strong>Tristan Handy: The original thing was called dotCloud, right?</strong></p><p><strong>Solomon Hykes: </strong>Yep. It was about container technology and its potential, but we didn't quite know how to take it to market. DotCloud was about deploying and hosting people's apps&#8212;platform as a service&#8212;competing with Heroku and many clones.</p><p><strong>Tristan Handy: When did Heroku become a thing?</strong></p><p><strong>Solomon Hykes: </strong>I became aware of it in 2009, just as I was struggling in France with container tech. When we joined YC in 2010, we packaged that tech into dotCloud, our hosting platform. Our differentiator was using containers under the hood when others didn&#8217;t. That let us support many language stacks and even run databases in containers&#8212;which was unheard of at the time.</p><p>Platform as a service was a tough business. Most startups went out of business or got acquired early. 
Eventually, we pivoted from selling the car to building an ecosystem around the engine&#8212;that became Docker.</p><p><strong>Tristan Handy: Did you pivot because selling the car wasn't working? Or because people kept pointing at the engine saying, &#8220;Give me that&#8221;?</strong></p><p><strong>Solomon Hykes: </strong>Both. It was hard to market platforms. Developers expected free hosting, and hosting costs money. Margins were tight because of AWS. It always felt like pushing a boulder uphill. Meanwhile, people wanted to run things locally. There was no good ecosystem for that. Docker provided transparency, flexibility, and portability.</p><p><strong>Tristan Handy: Can you define Docker and containerization, and how it differs from virtualization?</strong></p><p><strong>Solomon Hykes: </strong>Sure. Virtualization splits a physical machine into virtual ones using VMs&#8212;each with its own memory, compute, and storage. It gives flexibility, but with overhead.</p><p>Containerization does something similar but at the operating system level. Instead of virtualizing the machine, you split the OS itself. It&#8217;s mostly done with Linux, which can subdivide itself into isolated units. Containers are more lightweight, letting you run hundreds or thousands, unlike VMs where you might manage a handful before hitting limits.</p><p>Docker didn&#8217;t invent this, but we solved new problems with it.</p><p><strong>Tristan Handy: I remember creating my first Docker container around 2015. I expected a slow boot-up like a VM, but it was instantaneous. Where is the OS in that setup?</strong></p><p><strong>Solomon Hykes: </strong>Great question. Docker relies on Linux. When you're on a Mac, it runs Linux behind the scenes&#8212;today via virtualization. Back then, we used lots of early, rough tools and kernel patches to make Linux containers work. 
Docker put all the pieces together in a coherent way.</p><p><strong>Tristan Handy: So containerization wasn&#8217;t new, but Docker made it accessible?</strong></p><p><strong>Solomon Hykes:</strong><br>Exactly. The Linux kernel had features like namespaces and cgroups&#8212;building blocks for containers. But they weren&#8217;t user-friendly. We made a developer-centric abstraction on top of those tools.</p><p>And Linux provided a massive compatibility layer. Unlike Java, which required writing your app in Java, Docker containers could wrap apps written in any language, as long as they ran on Linux.</p><p><strong>Tristan Handy: So Docker is like infrastructure as code&#8212;a primitive that enables the whole concept?</strong></p><p><strong>Solomon Hykes: </strong>Yes! And because we wanted ubiquity, we avoided pushing too many opinions. We let developers build on top of it in many different ways. That&#8217;s what helped Docker become a de facto standard.</p><p><strong>Tristan Handy: How fragmented is the Linux world under the hood? Did you have to do much abstraction work?</strong></p><p><strong>Solomon Hykes: </strong>We were lucky. The Linux kernel is extremely stable and consistent. But everything above it&#8212;distros, package managers, tooling&#8212;was chaotic. That chaos created the opportunity for Docker to provide a consistent experience.</p><p><strong>Tristan Handy: Were there any drawbacks? Like &#8220;Docker sprawl&#8221; the way VMware saw VM sprawl?</strong></p><p><strong>Solomon Hykes: </strong>Definitely. With power comes chaos. Teams would run dozens of Docker containers, each configured differently. Docker doesn&#8217;t enforce opinions&#8212;by design.</p><p><strong>Tristan Handy: And what happened when you left Docker in 2018?</strong></p><p><strong>Solomon Hykes: </strong>I took time off, became a full-time dad. But I also realized how many unsolved problems remained. 
Especially around CI/CD pipelines and software delivery&#8212;what we now call the software factory.</p><p>That led me to start Dagger.</p><p><strong>Tristan Handy: So Dagger is like &#8220;containers for pipelines&#8221;?</strong></p><p><strong>Solomon Hykes: </strong>Yes. Just as Docker standardized app deployment, Dagger aims to standardize and containerize software delivery. CI/CD pipelines today are often duct-taped together with YAML and bash scripts. We&#8217;re bringing consistency and modularity to that space.</p><p><strong>Tristan Handy: Will there be a &#8220;Daggerfile&#8221; like there&#8217;s a Dockerfile?</strong></p><p><strong>Solomon Hykes: </strong>Sort of. But this time, we&#8217;re opinionated. Dagger is narrowly focused on CI/CD. That lets us provide APIs, SDKs, and a deeper abstraction stack. We give platform engineers a DAG-based system to define repeatable, containerized steps.</p><p><strong>Tristan Handy: And what&#8217;s the role of AI and agents in all this?</strong></p><p><strong>Solomon Hykes: </strong>Great question. We didn&#8217;t plan for it, but our community showed us the way. People started building AI agents that run in Dagger pipelines&#8212;automating things like writing tests, submitting PRs, and optimizing builds.</p><p>That blew our minds. Agents blur the line between development and delivery. They need programmable environments. Dagger is becoming an ideal platform for that.</p><h2>Chapters</h2><p><strong>01:30 &#8211; Early Days: From France to dotCloud</strong></p><p>Solomon shares how his early programming experience and startup journey led to the creation of dotCloud.</p><p><strong>04:00 &#8211; The PaaS Struggle and Birth of Docker</strong></p><p>The team pivots from platform-as-a-service to focusing on the container engine itself&#8212;what would become Docker.</p><p><strong>07:00 &#8211; What Is a Container, Really?</strong></p><p>Solomon explains containerization vs. 
virtualization in plain terms and why it changed the game for developers.</p><p><strong>11:00 &#8211; The Developer Experience That Won the World</strong></p><p>The magic of fast, lightweight Docker containers&#8212;and how that first &#8220;wow&#8221; moment felt.</p><p><strong>14:00 &#8211; Building a Ubiquitous Standard</strong></p><p>Why Docker stayed narrow by design, resisting feature bloat to maximize compatibility.</p><p><strong>18:00 &#8211; DevOps Before DevOps</strong></p><p>How Docker avoided language tribalism and achieved mass developer adoption by choosing Go and CLI-first tooling.</p><p><strong>21:00 &#8211; Complexity and Container Sprawl</strong></p><p>Docker made infrastructure easy&#8212;but created new operational challenges at scale.</p><p><strong>24:30 &#8211; Why CI/CD Pipelines Are Still Broken</strong></p><p>Solomon outlines the gap Docker never got to fix: modern software delivery remains brittle and ad hoc.</p><p><strong>27:00 &#8211; Enter Dagger: DevOps for the Modern Age</strong></p><p>How Solomon&#8217;s new company is treating pipelines as composable software, not brittle scripts.</p><p><strong>30:00 &#8211; Building an OS for the Software Factory</strong></p><p>Dagger helps platform teams manage the complexity of software delivery with reusable, testable components.</p><p><strong>33:00 &#8211; Agent-Native Workflows: A Surprise Use Case</strong></p><p>AI agents begin using Dagger to reason about pipelines, generate code, and submit pull requests autonomously.</p><p><strong>37:00 &#8211; Reimagining the Dev Loop with AI</strong></p><p>Why the boundary between development and CI/CD is collapsing&#8212;and how Dagger fits the agent-powered future.</p><p><strong>41:00 &#8211; Scaling Trust in Delivery</strong></p><p>Tristan and Solomon reflect on how developer tooling evolves and what a stable, fast delivery layer enables.</p><p><strong>45:00 &#8211; Final Thoughts: What&#8217;s Next for DevOps</strong></p><p>The conversation closes 
with predictions on intelligent automation, composability, and the future of platform engineering.</p><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. Discover why more than 50,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Book a demo&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Book a demo</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The history and future of the data ecosystem (w/ Lonne Jaffe)]]></title><description><![CDATA[Mainframes, relational databases, ETL, Hadoop, the cloud, and all of it]]></description><link>https://roundup.getdbt.com/p/the-history-and-future-of-the-data</link><guid isPermaLink="false">https://roundup.getdbt.com/p/the-history-and-future-of-the-data</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Sun, 08 Jun 2025 13:02:46 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2a174d40-0d03-4fc7-a541-830573130b6e_1680x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this decades-spanning episode, Tristan talks with Lonne Jaffe, Managing Director at Insight Partners and former CEO of Syncsort (now Precisely), to trace the history of the data ecosystem&#8212;from its mainframe origins to its AI-infused future.</p><p>Lonne reflects on the evolution of ETL, the unexpected staying power of legacy tech, and why AI may finally erode the switching costs that have long protected incumbents. The future of the AI and standards era is bright. 
</p><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:false,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><h2>Episode chapters</h2><p><strong>00:46 &#8211; Meet Lonne Jaffe: background &amp; career journey</strong></p><p>Lonne shares his career highlights from Insight Partners, Syncsort/Precisely, and IBM, including major acquisitions and tech focus areas.</p><p><strong>04:20 &#8211; The origins of Syncsort &amp; sorting in mainframes</strong></p><p>Discussion on why sorting was a critical early problem in hierarchical databases and how early systems like IMS worked.</p><p><strong>07:00 &#8211; M&amp;A as innovation strategy</strong></p><p>How Syncsort used inorganic growth to modernize its 
platform, including an early example of migrating data from IMS to DB2 without rewriting apps.</p><p><strong>09:35 &#8211; Technical vs. strategic experience</strong></p><p>Tristan probes Lonne&#8217;s technical depth despite his business titles; Lonne shares his background in programming and a fun fact about juggling.</p><p><strong>11:55 &#8211; Why this history matters</strong></p><p>Tristan sets up the key question: what lessons from 1970s-2000s ETL tooling still shape the modern data stack?</p><p><strong>13:00 &#8211; Proto-ETL: The real OGs</strong></p><p>Lonne traces the origins of ETL to 1970s CDC, JCL, and early IBM tools. Prism Solutions in 1988 gets credit as the first real ETL startup.</p><p><strong>15:40 &#8211; Rise of the ETL market (1990s)</strong></p><p>From Prism to Informatica and DataStage&#8212;early 90s vendors brought visual development to what was once COBOL-heavy backend work.</p><p><strong>18:00 &#8211; Why people offloaded Teradata to Hadoop</strong></p><p>Exploring how cost, contention, and capacity drove ETL out of the warehouse and into Hadoop in the 2000s.</p><p><strong>20:00 &#8211; Performance vs. 
price: Jevons Paradox in ETL</strong></p><p>Why lower compute and storage costs led to <em>more</em> ETL, not less&#8212;and how parallelization changed the game.</p><p><strong>22:30 &#8211; Evolution of data management suites</strong></p><p>How ETL expanded into app-to-app integration, catalogs, metadata management, and why these bundles got bloated.</p><p><strong>25:00 &#8211; Rise of data prep &amp; self-service analytics</strong></p><p>Tools like Kettle, Pentaho, and Tableau mirrored ETL for business users&#8212;spawning a whole &#8220;data prep&#8221; category.</p><p><strong>27:30 &#8211; Clickstream, logs &amp; big data chaos</strong></p><p>How clickstream and log data changed the ETL landscape, and the hope (and letdown) of zero-copy analytics.</p><p><strong>29:10 &#8211; Why is old software so sticky?</strong></p><p>Tristan and Lonne explore the economics of switching costs, the illusion of freedom, and whether GenAI could break the lock-in.</p><p><strong>33:30 &#8211; Are old tools actually&#8230; good?</strong></p><p>Defending mainframes and 30-year-old databases like Cache. Sometimes the mature option is better&#8212;just not sexy.</p><p><strong>36:00 &#8211; The new vs. the durable</strong></p><p>Modern tools must prove themselves against decades of reliability and robustness in finance, healthcare, and compliance.</p><p><strong>38:20 &#8211; GenAI in data: The early movers</strong></p><p>Lonne highlights why companies like Atlan and dbt Labs are in the best position to win&#8212;distribution, trust, and product maturity.</p><p><strong>41:00 &#8211; TAM and the Jevons Paradox, again</strong></p><p>Revisiting how price drops expand TAM. 
Some categories vanish, others explode&#8212;depending on elasticity of demand.</p><p><strong>43:15 &#8211; Unlocking new personas with LLMs</strong></p><p>Structured data access for non-technical users is finally viable, but &#8220;it has to be right&#8221;&#8212;trust and quality remain the barrier.</p><p><strong>46:00 &#8211; Real-world examples: dbt&#8217;s MCP server win</strong></p><p>Tristan shares how dbt&#8217;s Metadata API became a catalog replacement for a traditional financial institution&#8212;an unplanned AI GTM success.</p><p><strong>48:30 &#8211; Agents, not interfaces</strong></p><p>New pattern: LLMs as agents interacting directly with infrastructure via APIs. Tool use is becoming table stakes for AI integration.</p><p><strong>50:30 &#8211; Are LLMs birthright tools yet?</strong></p><p>Discussion around adoption of ChatGPT Enterprise, Claude, etc. Lonne suggests adoption is accelerating fast&#8212;and the usage model matters.</p><p><strong>52:00 &#8211; Looking ahead</strong></p><p>The conversation ends with a reflection on GenAI&#8217;s near future in data workflows, TAM expansion, and what the next episode might tackle.</p><div><hr></div><h2>Key takeaways from this episode</h2><p><strong>Tristan Handy: You've had a long career in tech. Maybe start by giving us the 30,000-foot view of what you've been up to over the last couple decades?</strong></p><p><strong>Lonne Jaffe:</strong> I&#8217;ve been at Insight Partners for about eight years now, working mostly on deep tech investments&#8212;AI infrastructure companies like Run AI and <a href="http://Deci.ai">deci.ai</a>, both acquired by Nvidia. I&#8217;ve also done work with data infrastructure companies like SingleStore. Before Insight, I was CEO of a portfolio company called Syncsort, now Precisely. It was founded in 1968.</p><p>Prior to that, I was at IBM for 13 years, working in middleware and mainframe technologies. 
Products like WebSphere, CICS, and TPF&#8212;foundational systems for enterprise computing.</p><p><strong>Tristan Handy: And Syncsort's origin was in sorting, right? Literally sorting files?</strong></p><p><strong>Lonne Jaffe:</strong> Exactly. In the early days of computing, sorting was a huge part of what you did. Much of the data was hierarchical&#8212;stored in IMS&#8212;and had to be flattened into files to process. The algorithms were optimized to run in extremely resource-constrained environments.</p><p><strong>Tristan Handy: Fascinating. And I assume as compute and storage improved, the data integration landscape evolved?</strong></p><p><strong>Lonne Jaffe:</strong> Yes. We saw a move from hierarchical to relational databases, then toward ETL tools in the 80s and 90s. The first real ETL startup was probably Prism Solutions in 1988. Informatica and DataStage showed up in the early 90s, followed by Talend and others.</p><p><strong>Tristan Handy: It seems like we got a whole bundle of tools over time&#8212;ETL, CDC, app integration, metadata, and so on.</strong></p><p><strong>Lonne Jaffe:</strong> Yes, often bundled together, even though data prep and app integration were treated separately. That persisted for longer than you'd expect. At Syncsort, we acquired a company with a "transparency" solution that allowed IMS applications to use data stored in DB2 without rewriting code&#8212;a clever way to manage switching costs.</p><p><strong>Tristan Handy: Speaking of switching costs&#8212;why are these legacy tools so sticky?</strong></p><p><strong>Lonne Jaffe:</strong> Great question. In many cases, no customer loves the product. They&#8217;d switch in a heartbeat&#8212;if it were easy. But rewriting jobs and ensuring reliability is a heavy lift. The best outcome is a new system that replicates old functionality. 
And for many organizations, that&#8217;s not worth the risk.</p><p><strong>Tristan Handy: But if generative AI could reduce those switching costs?</strong></p><p><strong>Lonne Jaffe:</strong> That&#8217;s the potential. Code generation, agents that explore and iterate&#8212;those could erode the moat that&#8217;s protected these incumbents for decades. Not tomorrow, but it&#8217;s a real possibility.</p><p><strong>Tristan Handy: It also seems like some of these systems are more robust than people give them credit for.</strong></p><p><strong>Lonne Jaffe:</strong> Absolutely. Mainframes are IO supercomputers. Products like InterSystems Cache, used by Epic, are incredibly performant. But new systems must match or exceed those capabilities in reliability and scale, which is a high bar.</p><p><strong>Tristan Handy: As you look at the evolution of the modern data stack, how do you think about its impact on the market?</strong></p><p><strong>Lonne Jaffe:</strong> In the 2010s, we saw disaggregation&#8212;tools like Fivetran, dbt, and Snowflake each tackled a slice of the old enterprise bundle. But the TAM isn&#8217;t infinite. Some categories may compress or vanish entirely if price drops aren&#8217;t offset by new demand.</p><p><strong>Tristan Handy: Do you think AI expands or compresses the data stack?</strong></p><p><strong>Lonne Jaffe:</strong> It depends. High elasticity of demand&#8212;like with dashboards or analytics&#8212;can drive massive TAM expansion. But some categories, like logo redesign or simple data movement, might get commoditized. For more complex workflows, AI agents accessing platforms like dbt or Atlan could dramatically increase value by automating common tasks and enabling new personas.</p><p><strong>Tristan Handy: We&#8217;ve seen an example already&#8212;a customer replaced their data catalog with our dbt Cloud metadata server and AI interface.</strong></p><p><strong>Lonne Jaffe:</strong> That&#8217;s telling. 
If AI interfaces can connect to tools like dbt and generate value&#8212;self-service, documentation, lineage&#8212;it changes the game. Especially for organizations already standardized on those platforms.</p><p><strong>Tristan Handy: What&#8217;s your view on how these AI interfaces get distributed?</strong></p><p><strong>Lonne Jaffe:</strong> ChatGPT Enterprise, Claude, and others are spreading fast. Eventually, you&#8217;ll want those tools to search files, access internal metadata, and interact with your data stack&#8212;not just answer questions from the open web.</p><p><strong>Tristan Handy: It makes a lot of sense. If AI is going to serve enterprise users, it needs access to the real data. Otherwise, it&#8217;s just a toy.</strong></p><p><strong>Lonne Jaffe:</strong> Exactly. A model that can&#8217;t query or verify against your actual environment won&#8217;t be reliable. And data quality and observability&#8212;something dbt Cloud is already good at&#8212;become foundational.</p><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. 
Discover why more than 50,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Book a demo&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Book a demo</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Everything terminals (w/ Zach Lloyd)]]></title><description><![CDATA[The universal integration layer...the command line? 
Tristan talks terminals with Zach Lloyd, the founder of Warp]]></description><link>https://roundup.getdbt.com/p/everything-terminals-w-zach-lloyd</link><guid isPermaLink="false">https://roundup.getdbt.com/p/everything-terminals-w-zach-lloyd</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Sun, 25 May 2025 13:01:47 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d276a840-287d-4ae3-882b-42115f46cfc5_1680x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this episode, Tristan talks with Zach Lloyd, founder of <a href="https://www.warp.dev/">Warp</a>&#8212;a terminal built for the modern era, including for AI agents. They explore the history of terminals, differences between terminals and shells, and what the future might look like. In a world driven by generative AI, the terminal could once again be the control center of computer usage.</p><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><div><hr></div><p>Join Tristan May 28 at the <strong><a href="https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase/?utm_medium=event&amp;utm_source=podcast&amp;utm_campaign=q2-2026_dbt-launch-showcase-2025_aw&amp;utm_content=____&amp;utm_term=all___">2025 dbt Launch Showcase</a></strong> for the latest features landing in dbt to empower the next era of analytics. 
We'll see you there.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase/?utm_medium=event&amp;utm_source=podcast&amp;utm_campaign=q2-2026_dbt-launch-showcase-2025_aw&amp;utm_content=____&amp;utm_term=all___" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fIuL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 424w, https://substackcdn.com/image/fetch/$s_!fIuL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 848w, https://substackcdn.com/image/fetch/$s_!fIuL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 1272w, https://substackcdn.com/image/fetch/$s_!fIuL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fIuL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png" width="1456" height="364" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:364,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1929918,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase/?utm_medium=event&amp;utm_source=podcast&amp;utm_campaign=q2-2026_dbt-launch-showcase-2025_aw&amp;utm_content=____&amp;utm_term=all___&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/163234704?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fIuL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 424w, https://substackcdn.com/image/fetch/$s_!fIuL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 848w, https://substackcdn.com/image/fetch/$s_!fIuL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 1272w, https://substackcdn.com/image/fetch/$s_!fIuL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 1456w" sizes="100vw" 
fetchpriority="high"></picture><div></div></div></a></figure></div><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:false,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><h3>Chapters</h3><ul><li><p><strong>01:00 &#8211; Introducing Warp and Zach Lloyd</strong></p><ul><li><p>Zach Lloyd explains Warp's origin, mission, and initial vision.</p></li></ul></li><li><p><strong>02:40 &#8211; Why redesign the terminal?</strong></p><ul><li><p>Zach describes why traditional terminal UX was ripe for reinvention.</p></li></ul></li><li><p><strong>04:43 &#8211; Enter LLMs: A new direction for Warp</strong></p><ul><li><p>Warp evolves into a natural language interface for developer workflows.</p></li></ul></li><li><p><strong>06:34 &#8211; What is a shell?</strong></p><ul><li><p>Zach defines shells, how they process 
text, and their role in the CLI ecosystem.</p></li></ul></li><li><p><strong>07:58 &#8211; Shells vs programs vs built-ins</strong></p><ul><li><p>Distinguishing between shell commands and standalone programs.</p></li></ul></li><li><p><strong>10:00 &#8211; Why do developers debate shells?</strong></p><ul><li><p>Features, syntax, and licensing behind the Bash vs Z Shell discussion.</p></li></ul></li><li><p><strong>12:17 &#8211; Why terminals still matter</strong></p><ul><li><p>The enduring power of text-based computing and scripting.</p></li></ul></li><li><p><strong>16:40 &#8211; What is a terminal, really?</strong></p><ul><li><p>Clarifying the difference between terminal hardware, emulators, and modern terminal apps.</p></li></ul></li><li><p><strong>20:13 &#8211; The Warp interface</strong></p><ul><li><p>Zach breaks down Warp&#8217;s UI: input editor, output blocks, and mouse support.</p></li></ul></li><li><p><strong>22:48 &#8211; Will Warp replace your IDE?</strong></p><ul><li><p>The vision of AI-driven development and the convergence of terminal, editor, and chat.</p></li></ul></li><li><p><strong>27:20 &#8211; Rethinking development interfaces</strong></p><ul><li><p>Finding the ideal hub for AI-native software development.</p></li></ul></li><li><p><strong>35:00 &#8211; Why the terminal has an edge</strong></p><ul><li><p>Advantages of the terminal for cross-project, full-lifecycle developer tasks.</p></li></ul></li><li><p><strong>37:10 &#8211; Bottom-up adoption strategy</strong></p><ul><li><p>How Warp approaches growth: focus on individual developers, not top-down mandates.</p></li></ul></li><li><p><strong>39:50 &#8211; Is Warp redefining the terminal?</strong></p><ul><li><p>The challenges of innovating in a legacy-dominated space and creating a new category.</p></li></ul></li><li><p><strong>42:45 &#8211; Developer control &amp; context in Warp</strong></p><ul><li><p>Customization, context-awareness, and MCP integration in Warp&#8217;s AI 
tooling.</p></li></ul></li><li><p><strong>46:32 &#8211; Closing reflections</strong></p><ul><li><p>Zach and Tristan wrap up their thoughts on the future of terminals, AI, and developer tools.</p></li></ul></li></ul><h2>Key takeaways from this episode</h2><p><strong>Tristan Handy: Can you tell us about Warp, where the idea came from, and where you&#8217;re at today?</strong></p><p><strong>Zach Lloyd:</strong> Warp reimagines the command line to make it more approachable, powerful, and useful for developers. I've been a software engineer for over 20 years and always used the terminal, but never understood why it worked the way it did. I used to learn the minimum I needed and rely on team members when I ran into issues.</p><p>After my last startup, I looked at tools I used frequently that could have a big impact if improved. The terminal stood out. I realized better UX&#8212;like being able to use a mouse to position the cursor or select output for copy-paste&#8212;could unlock a lot of productivity. That was the initial idea about five years ago.</p><p>We spent the first couple of years redesigning the interface. Today, Warp is more than a terminal&#8212;it's a natural language interface to the command line, powered by large language models (LLMs). You can use it to set up projects, write code, debug production, and more.</p><p><strong>Tristan: I want to dig into fundamentals. Can you define what a shell is?</strong></p><p><strong>Zach:</strong> A shell is a program that parses text input, runs commands, and returns text output. You can run it interactively or through scripts. Terminals, by contrast, are the graphical layer that displays text and captures keyboard input. Shells like Bash, Z Shell, and Fish offer different features, syntaxes, and configurations. 
Some commands, like <code>cd</code>, are shell built-ins and don&#8217;t require forking new processes; others, like <code>cp</code>, are standalone programs.</p><p><strong>Tristan: Why do terminals persist in a GUI-dominated world?</strong></p><p><strong>Zach:</strong> A few reasons. First, it&#8217;s easier to write command-line apps than GUI apps. Second, the interface is infinitely flexible&#8212;you can pass endless flags and parameters. Third, command-line programs interoperate cleanly via text streams. And lastly, they&#8217;re scriptable. Developers can automate repetitive workflows easily, which is powerful.</p><p><strong>Tristan: So a terminal just runs a shell. But I never think of terminals as having features. What makes a terminal more than a simple interface?</strong></p><p><strong>Zach:</strong> Terminals emulate old hardware&#8212;keyboards and text displays. Today&#8217;s terminal apps are GUI shells that simulate this behavior. Most are "dumb terminals," just rendering characters. But they can support features like theming, control characters for advanced UI (e.g., in Vim), and even bitmap rendering.</p><p><strong>Tristan: Warp looks very different. Can you describe it?</strong></p><p><strong>Zach:</strong> Warp looks more like a chat or notebook interface. Each command's output is grouped in a logical block instead of being dumped in a scroll. The input area behaves more like a code editor, with syntax highlighting and first-class mouse support. We're aiming for modern UX.</p><p><strong>Tristan: So you're blending terminal, editor, and chat. Will people eventually write all their code in Warp?</strong></p><p><strong>Zach:</strong> My vision is that developers will increasingly describe what they want in natural language, and agents will do the work. Developers supervise the results. That interface needs to support managing many tasks at once. That&#8217;s what we&#8217;re building towards. 
It won&#8217;t even be called a terminal&#8212;it&#8217;s a new category of software.</p><p><strong>Tristan: The boundaries between these tools are blurring. And maybe the best interface for AI-assisted development isn't an IDE or chat app&#8212;it could be the terminal.</strong></p><p><strong>Zach:</strong> The terminal spans all phases of development&#8212;from setup to deployment and debugging. It also supports cross-project work, which IDEs don&#8217;t. That&#8217;s a huge strength.</p><p><strong>Tristan: But terminals are a personal choice. How do you think about adoption and your business model?</strong></p><p><strong>Zach:</strong> Like editors, terminals are developer-choice tools. We don&#8217;t go top-down. Our motion is bottoms-up: get individuals to love Warp, then expand into teams and enterprises for security, privacy, and data controls.</p><p><strong>Tristan: Are you trying to reset the baseline for what a terminal is?</strong></p><p><strong>Zach:</strong> We're not open source, though we&#8217;ve considered it. It&#8217;s risky. But our focus isn&#8217;t on redefining "the terminal." It&#8217;s on building the best tool for developers to ship software. That might require a new category name.</p><p><strong>Tristan: What&#8217;s the dev experience in Warp like? Is it customizable?</strong></p><p><strong>Zach:</strong> We support theming and shortcuts. But the most important part is AI context. Warp can use any CLI tool to gather context&#8212;GitHub CLI, GCloud, etc. We&#8217;re also implementing the Model Context Protocol (MCP) and plan to better support custom/internal tools as well.</p><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. 
Discover why more than 50,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Book a demo&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Book a demo</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Why compilers matter (w/ Lukas Schulte)]]></title><description><![CDATA[We continue our season on developer experience by looking at compilers with the SDF Labs cofounder.]]></description><link>https://roundup.getdbt.com/p/why-compilers-matter-w-lukas-schulte</link><guid isPermaLink="false">https://roundup.getdbt.com/p/why-compilers-matter-w-lukas-schulte</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Mon, 12 May 2025 12:02:26 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fab3c2ea-0b19-4b35-a887-c779cff0e8d3_1680x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Tristan Handy dives deep into the world of compilers in this episode of The Analytics Engineering Podcast with Lukas Schulte, cofounder of SDF Labs (not to be confused with <a href="https://roundup.getdbt.com/p/the-evolution-of-databases-w-wolfram">last episode&#8217;s guest&#8212;Lukas&#8217; dad and fellow SDF cofounder Wolfram Schulte</a>). Tristan and Lukas discuss what compilers are, how they work, and what they mean for the data ecosystem. 
SDF, which was <a href="https://www.getdbt.com/blog/dbt-labs-acquires-sdf-labs">recently acquired by dbt Labs</a>, builds a world-class SQL compiler aimed at abstracting away the complexity of warehouse-specific SQL.</p><p>The conversation covers the evolution of compiler technology, what software engineering has gotten right over the past several decades, and <a href="https://www.getdbt.com/blog/how-ai-will-disrupt-data-engineering">why the data ecosystem is poised for similar transformation</a>. Lukas and Tristan explore why SQL has lagged behind other programming ecosystems, and how new compiler infrastructure could lead to package management, interoperability, and greater innovation across data platforms. It&#8217;s a fascinating (and timely) episode: <a href="https://www.getdbt.com/blog/how-to-get-ready-for-the-new-dbt-engine">Get ready for the new dbt engine</a>.</p><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><div><hr></div><p>Join Tristan May 28 at the <strong>2025 dbt Launch Showcase</strong> for the latest features landing in dbt to empower the next era of analytics. 
We'll see you there.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase/?utm_medium=event&amp;utm_source=podcast&amp;utm_campaign=q2-2026_dbt-launch-showcase-2025_aw&amp;utm_content=____&amp;utm_term=all___" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fIuL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 424w, https://substackcdn.com/image/fetch/$s_!fIuL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 848w, https://substackcdn.com/image/fetch/$s_!fIuL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 1272w, https://substackcdn.com/image/fetch/$s_!fIuL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fIuL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png" width="1456" height="364" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:364,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1929918,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase/?utm_medium=event&amp;utm_source=podcast&amp;utm_campaign=q2-2026_dbt-launch-showcase-2025_aw&amp;utm_content=____&amp;utm_term=all___&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/163234704?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fIuL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 424w, https://substackcdn.com/image/fetch/$s_!fIuL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 848w, https://substackcdn.com/image/fetch/$s_!fIuL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 1272w, https://substackcdn.com/image/fetch/$s_!fIuL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da12175-c97f-4e70-abf7-6f1c3d887f40_2400x600.png 1456w" sizes="100vw" 
fetchpriority="high"></picture><div></div></div></a></figure></div><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:false,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><h3>Chapters</h3><ul><li><p>02:40 The vision behind SDF Labs</p></li><li><p>04:00 What is a compiler?</p></li><li><p>05:00 Components of a compiler: frontend, IR, backend</p></li><li><p>08:00 Syntax vs. semantics and the role of parsing</p></li><li><p>10:00 Logical vs. 
physical plans in SQL compilers</p></li><li><p>13:00 Historical context: mainframes to LLVM</p></li><li><p>16:00 Cross-architecture portability in Rust &amp; other compilers</p></li><li><p>18:00 What is LLVM and why it matters</p></li><li><p>20:00 Bootstrapping and the self-recursive nature of compilers</p></li><li><p>21:00 Compilers in Java, TypeScript, and dbt</p></li><li><p>23:00 Why compilers are foundational to software ecosystems</p></li><li><p>26:00 The SQL dialect problem in data warehouses</p></li><li><p>29:00 Can SQL get its own LLVM?</p></li><li><p>31:00 How Substrate and DataFusion aim to standardize SQL</p></li><li><p>35:00 Package management and the path toward SQL abstractions</p></li><li><p>38:00 The future of the data ecosystem with a common SQL compiler</p></li></ul><h2>Key takeaways from this episode</h2><h3>What is a compiler?</h3><p><strong>Tristan Handy:</strong> What is a compiler?</p><p><strong>Lukas Schulte:</strong> It's something that takes higher-level human-readable code and translates, compiles, rewrites it into lower-level machine code that is much harder for humans to understand and much easier for machines to understand.</p><p>Compilers typically have phases. They have a frontend that deals with the language you're working with, a middle component&#8212;usually called an IR or intermediate representation&#8212;and a backend that takes that IR and compiles it into machine code.</p><h3>Compiler phases: frontend, IR, backend</h3><p><strong>Tristan Handy:</strong> How does it all come together?</p><p><strong>Lukas Schulte:</strong> There&#8217;s a preprocessor that handles macros, removes comments, and prepares the text. Then a lexer converts it into tokens. These tokens get assembled into a tree that the compiler can understand. That&#8217;s where syntax validation and semantic analysis happen.</p><p>From there, we build a logical representation of the operations we want to perform. 
That transitions to a physical plan, which starts considering the hardware: how many cores, how much memory, which files we&#8217;re accessing. After that, optimizations are applied and it compiles to actual machine code using a toolchain like LLVM.</p><h3>Syntax vs. semantics</h3><p><strong>Lukas Schulte:</strong> Let&#8217;s break down syntax vs. semantics.</p><p>Imagine the code<code> x = x + 1</code>. That has valid syntax. Its meaning&#8212;its semantics&#8212;is that we&#8217;re incrementing <code>x</code> by 1.</p><p>Now, you could also write <code>x += 1</code>. Different syntax, same semantics. So syntax defines structure, and semantics define meaning. That distinction is important when you&#8217;re analyzing or transforming code.</p><h3>LLVM and portability</h3><p><strong>Tristan Handy:</strong> Have we been building abstraction layers like this for decades?</p><p><strong>Lukas Schulte:</strong> Absolutely. That&#8217;s what LLVM does. It provides a consistent intermediate representation that compilers can use to target multiple backends&#8212;Intel, ARM, different OSes. Apple invested early in LLVM to support custom chips.</p><p>With Rust, for example, LLVM is what lets us build binaries that behave the same on macOS, Windows, and Linux with relatively little effort.</p><h3>Bootstrapping compilers</h3><p><strong>Tristan Handy:</strong> So there&#8217;s this recursive loop&#8212;compilers being built with other compilers?</p><p><strong>Lukas Schulte:</strong> Exactly. Rust wasn&#8217;t always written in Rust&#8212;it started in C++. Eventually, the compiler was rewritten in Rust itself. Now, Rust compiles Rust. It&#8217;s fully self-hosted. That&#8217;s common with mature languages&#8212;it shows the compiler ecosystem is stable and powerful enough to sustain itself.</p><h3>Why compilers matter</h3><p><strong>Tristan Handy:</strong> You said once that compilers are the foundation of every software ecosystem. 
What did you mean?</p><p><strong>Lukas Schulte:</strong> There are two big drivers in software: abstractions and standards. You want one way to interface with a USB device&#8212;not ten. Same for software. You want one standard way to express a Python program, a JavaScript app, etc.</p><p>Compilers enforce those standards and make sure the same code works across platforms. That consistency powers things like package managers, shared libraries, and open ecosystems.</p><h3>SQL dialects and fragmentation</h3><p><strong>Tristan Handy:</strong> Are there ecosystems that are doing worse than others?</p><p><strong>Lukas Schulte:</strong> SQL does a particularly bad job. Anyone who's used more than one data warehouse knows you can't take the same SQL statement and expect it to work the same way. Casting, case sensitivity, functions&#8212;every engine handles these things differently.</p><h3>Toward a universal SQL compiler</h3><p><strong>Tristan Handy:</strong> Can you convince me this problem is solvable?</p><p><strong>Lukas Schulte:</strong> Yes. That's what we're working on with SDF&#8212;creating a shared intermediate representation for SQL. If we can express SQL logic in a unified form, we can compile it to any dialect&#8212;BigQuery, Snowflake, Redshift, and so on.</p><p>That allows developers to build reusable libraries, just like in other languages. It also makes governance, validation, and testing easier.</p><h3>Future of data ecosystems</h3><p><strong>Tristan Handy:</strong> What would that future look like for practitioners?</p><p><strong>Lukas Schulte:</strong> One major change would be the emergence of robust SQL libraries. Today, there&#8217;s no <code>import</code> system for SQL. 
Everyone writes similar logic over and over.</p><p>A shared compiler abstraction would let us reuse components, collaborate across companies, and build an ecosystem of packages for transformations, metrics, and validations&#8212;similar to how we use NPM or PyPI.</p><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. Discover why more than 50,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Book a demo&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Book a demo</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The evolution of databases (w/ Wolfram Schulte)]]></title><description><![CDATA[In the first episode of our season on developer experience, the cofounder and CTO of SDF Labs, now a part of dbt Labs, discusses databases, compilers, and dev tools.]]></description><link>https://roundup.getdbt.com/p/the-evolution-of-databases-w-wolfram</link><guid isPermaLink="false">https://roundup.getdbt.com/p/the-evolution-of-databases-w-wolfram</guid><dc:creator><![CDATA[Dan Poppy]]></dc:creator><pubDate>Mon, 28 Apr 2025 12:02:43 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e4839270-d17c-40d0-94d3-06ac3a969b0f_1680x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Summary</h3><p>Welcome to our new season of The Analytics Engineering Podcast. This season, we&#8217;re focusing on developer experience. We&#8217;ll explore the developer experience by tracing the lineage of foundational software tools, platforms, and frameworks. From compilers to modern cloud infrastructure and data systems, we&#8217;ll unpack how each layer of the stack shapes the way developers build, collaborate, and innovate today. It&#8217;s a theme that lends itself to a lot of great conversations on where we&#8217;ve come from and where we&#8217;re headed.</p><p>In our first episode of the season, Tristan talks with Wolfram Schulte. Wolfram is a distinguished engineer at dbt Labs. 
He joined the company via the <a href="https://www.getdbt.com/blog/dbt-labs-acquires-sdf-labs">acquisition of SDF Labs</a>, where he was <a href="https://www.getdbt.com/blog/building-the-next-gen-dbt-engine">co-founder and CTO</a>. He spent close to two decades at Microsoft Research and several years at Meta building their data platform.</p><p>One of the amazing things about Wolfram is his love of teaching others the things that he's passionate about. In this episode, he discusses the internal workings of data systems. He and Tristan talk about <a href="https://docs.getdbt.com/blog/the-levels-of-sql-comprehension">SQL parsers</a>, <a href="https://roundup.getdbt.com/p/the-power-of-a-plan-how-logical-plans">compilers</a>, <a href="https://docs.getdbt.com/blog/sql-comprehension-technologies">execution engines</a>, <a href="https://www.getdbt.com/resources/guides/the-analytics-development-lifecycle">composability</a>, and the world of heterogeneous compute that we're all headed towards. While some of this might seem a little sci-fi, it&#8217;s likely right around the corner. And Wolfram is inventing some of the tech that's going to get us there.</p><div><hr></div><p>Join Tristan May 28 at the <strong><a href="https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase">2025 dbt Launch Showcase</a></strong> for the latest features landing in dbt to empower the next era of analytics. 
We'll see you there.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase&quot;,&quot;text&quot;:&quot;Register now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase"><span>Register now</span></a></p><p><em>Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.</em> </p><div><hr></div><p><strong>Listen &amp; subscribe from:</strong></p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a2f8724bfe318715a7c00c406&quot;,&quot;title&quot;:&quot;The Analytics Engineering Podcast&quot;,&quot;subtitle&quot;:&quot;dbt Labs, Inc.&quot;,&quot;description&quot;:&quot;Podcast&quot;,&quot;url&quot;:&quot;https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE&quot;,&quot;belowTheFold&quot;:true,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/show/4BKMMeVXk4jJnAQSqGSJvE" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" loading="lazy" data-component-name="Spotify2ToDOM"></iframe><ul><li><p><a href="https://open.spotify.com/show/4BKMMeVXk4jJnAQSqGSJvE">Spotify</a></p></li><li><p><a href="https://podcasts.apple.com/us/podcast/the-analytics-engineering-podcast/id1574755368">Apple Podcasts</a></p></li><li><p><a href="https://music.amazon.com/podcasts/333fe811-1b14-499c-b609-9bfb8f06d1ae/the-analytics-engineering-podcast">Amazon Music</a></p></li><li><p><a href="https://tunein.com/podcasts/Technology-Podcasts/The-Analytics-Engineering-Podcast-p1466362/">TuneIn</a></p></li><li><p><a href="https://analyticsengineeringroundup.libsyn.com/rss">RSS feed</a></p></li></ul><h3>Chapters</h3><ul><li><p>01:35 Introduction to dbt Labs and SDF Labs collaboration </p></li><li><p>04:42 Wolfram's journey from monastery to tech 
innovator </p></li><li><p>07:55 The role of compilers in database technology </p></li><li><p>11:05 Building efficient engineering systems at Microsoft </p></li><li><p>14:13 Navigating data complexity at Facebook </p></li><li><p>18:51 Understanding database components and their importance </p></li><li><p>24:44 The shift from row-based to column-based storage </p></li><li><p>27:40 Emergence of modular databases </p></li><li><p>28:44 The rise of multimodal databases </p></li><li><p>30:45 The role of standards in data management </p></li><li><p>35:04 Balancing optimization and interoperability </p></li><li><p>36:38 Conceptual buckets for database engines </p></li><li><p>38:46 DataFusion compared to DuckDB</p></li><li><p>40:44 ClickHouse </p></li><li><p>44:20 Bridging the gap between SQL and new technologies </p></li><li><p>50:55 The future of developer experience</p></li></ul><h2>Key takeaways from this episode</h2><h3>From monastery to Microsoft: Wolfram&#8217;s journey</h3><p><strong>Tristan Handy: Can you walk us through the Wolfram Schulte origin story?</strong></p><p><strong>Wolfram Schulte: </strong>I was born in rural Germany&#8212;Sauerland&#8212;and ended up in a monastery boarding school after my father passed away. Their goal was to train monks and priests, but that didn&#8217;t stick for me.</p><p>Later I went to Berlin&#8212;back then you had to cross East Germany to get there&#8212;and began studying physics. But I realized everyone else understood physics better than I did! One day I walked past a lecture on data structures and algorithms, and I was hooked. 
I hadn&#8217;t written a line of code at that point, but I switched to computer science immediately.</p><p>After my PhD in compiler construction, I joined a startup, then landed at Microsoft Research in 1999 thanks to a chance encounter with the logician Yuri Gurevich.</p><h3>Inside Microsoft Research and Cloud Build</h3><p>At Microsoft Research, we were like Switzerland&#8212;neutral across teams like Office, Windows, and Bing. We&#8217;d invent tools and ideas, but often the business units didn&#8217;t trust them. That changed when I was asked to build an engineering org.</p><p>We created <strong>Cloud Build</strong>, a distributed build system like Google&#8217;s Bazel. It reduced build times from hours to minutes and had a huge impact on iteration speed, productivity, and even morale. People stayed in flow. Builds were faster, cheaper, and smarter&#8212;running mostly on spare capacity.</p><h3>Janitorial work at Meta: cleaning up big data</h3><p><strong>You later joined Facebook (Meta). What was that like?</strong></p><p>A different world. No titles for engineers. Egalitarian, fast-moving. I joined to clean up the data warehouse&#8212;what they called &#8220;janitorial work.&#8221; At Meta, each type of workload had its own engine: time-series, batch, streaming, etc. This made understanding lineage and dependencies across systems extremely hard.</p><p>We responded by building UPM, a SQL pre-processor that stitched metadata across engines. It became part of Meta&#8217;s privacy infrastructure and compliance tooling, especially after the fallout from Cambridge Analytica.</p><h3>Databases as compilers</h3><p><strong>Let&#8217;s shift gears. Can you walk us through how analytical databases actually work&#8212;like a professor at a whiteboard?</strong></p><p>Sure. Think of a database like a compiler:</p><ol><li><p><strong>Parsing &amp; analysis:</strong> Is the SQL valid? 
Are the types correct?</p></li><li><p><strong>Optimization:</strong> SQL is declarative, so you can reorder joins, push down filters&#8212;based on algebraic laws like associativity.</p></li><li><p><strong>Execution:</strong> Often done in parallel, especially in modern warehouses.</p></li><li><p><strong>Storage:</strong> Columnar vs. row-based; optimized formats like Parquet or ClickHouse&#8217;s custom format.</p></li></ol><p>Historically, storage and compute were bundled. Now they&#8217;re decoupled. But when the engine understands the format deeply, performance is much better.</p><h3>The rise of modular and composable data platforms</h3><p><strong>How did we get from monolithic systems to the composable database architectures we have today?</strong></p><p>It started with the rise of big data&#8212;Hadoop, HDFS, MapReduce. That decoupled compute from storage. Columnar formats like Parquet enabled analytical workloads. Then came Iceberg, Delta Lake, and similar standards that enabled multiple engines to share data.</p><p>Modern databases are modular. For example, Postgres is transactional, but you can bolt on an OLAP engine for analytical queries. You can mix and match based on your workload. The result is a data ecosystem that&#8217;s far more flexible&#8212;but also more complex.</p><h3>Engine families: Snowflake, DuckDB, ClickHouse</h3><p><strong>Can you help us bucket the different kinds of engines out there?</strong></p><p>Totally. Here are three buckets:</p><ul><li><p><strong>Cloud-native engines:</strong> Snowflake, BigQuery. They&#8217;re optimized for massive scale, often with their own proprietary storage.</p></li><li><p><strong>Embedded/single-node engines:</strong> DuckDB, DataFusion. Great for local dev or embedded analytics. DuckDB is for users; DataFusion is for database builders.</p></li><li><p><strong>Real-time/high-throughput engines:</strong> ClickHouse, Druid. 
Tuned for streaming and extremely fast aggregations.</p></li></ul><p>Each has its trade-offs. Increasingly, projects are combining these. For example, you can plug DuckDB or DataFusion into Spark to speed up leaf-node execution. The whole engine space is getting more composable&#8212;and more interchangeable.</p><h3>The role of SDF in dbt&#8217;s future</h3><p><strong>If you think about the future where SDF is fully integrated into dbt Cloud, what does that enable?</strong></p><p>Initially, it might feel the same&#8212;but faster, smarter. Longer-term, we can give developers superpowers.</p><p>Imagine your dev environment proactively surfaces:</p><ul><li><p>&#8220;This data looks different than yesterday&#8212;want to investigate?&#8221;</p></li><li><p>&#8220;You&#8217;re missing a metric that&#8217;s often used alongside this one.&#8221;</p></li><li><p>&#8220;This join will behave differently on engine X&#8212;here&#8217;s what to change.&#8221;</p></li></ul><p>That&#8217;s the kind of intelligent, predictive developer experience we&#8217;re building. We&#8217;re catching SQL up to what IDEs have done for code. And if we can make logical plans portable across engines, dbt becomes the consistent interface across heterogeneous compute.</p><div><hr></div><p><em>This newsletter is sponsored by dbt Labs. 
Discover why more than 50,000 companies use dbt to accelerate their data development.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___&quot;,&quot;text&quot;:&quot;Book a demo&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.getdbt.com/resources/dbt-cloud-demos-with-experts/?utm_medium=email&amp;utm_source=hs-email&amp;utm_campaign=__&amp;utm_content=biweekly-demos____&amp;utm_term=___"><span>Book a demo</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://roundup.getdbt.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Analytics Engineering Roundup! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[A New Kind of Weird]]></title><description><![CDATA[Reflections on Data Council 2025]]></description><link>https://roundup.getdbt.com/p/a-new-kind-of-weird</link><guid isPermaLink="false">https://roundup.getdbt.com/p/a-new-kind-of-weird</guid><dc:creator><![CDATA[Jason Ganz]]></dc:creator><pubDate>Sun, 27 Apr 2025 11:52:04 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7b3c7b00-7933-40bf-8f72-d7af7c575fd0_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I did something wrong.</p><p>I try really hard to go into every conference with an open mind about what I&#8217;m going to learn. <em>Tabula rasa. Blank Slate. Beginner&#8217;s Mind.</em> This is actually a really important part of being able to continually grow and develop your analysis of the industry rather than getting stuck in familiar mental grooves.</p><p>But for this year&#8217;s Data Council, I have to admit I went in with a preconceived take on the newsletter I wanted to be sending out today.</p><p><em>&#8220;I&#8217;ve been to a whole lot of data conferences that talk about the intersection of data and generative AI&#8221;</em>, I&#8217;d write triumphantly, <em>&#8220;but this was the first one I&#8217;ve been to where data and AI felt <strong>truly</strong> integrated, where the worlds <strong>finally</strong> converged</em>&#8221;.</p><p>And you know what? It was true. 
You couldn&#8217;t throw a stone in the convention hall without hitting a booth for AI-assisted data development or for using your data in agent systems.</p><p>GenAI applications, after all, aren&#8217;t just running on models trained on massive datasets built and maintained with many of the tools and open source libraries created by the people and organizations at Data Council. Their usage and utility also depend on strong infrastructure, as Martin has told us.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aKSl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd09d0df-ebbe-44f4-9d94-68853542a21c_2064x1126.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aKSl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd09d0df-ebbe-44f4-9d94-68853542a21c_2064x1126.png 424w, https://substackcdn.com/image/fetch/$s_!aKSl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd09d0df-ebbe-44f4-9d94-68853542a21c_2064x1126.png 848w, https://substackcdn.com/image/fetch/$s_!aKSl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd09d0df-ebbe-44f4-9d94-68853542a21c_2064x1126.png 1272w, https://substackcdn.com/image/fetch/$s_!aKSl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd09d0df-ebbe-44f4-9d94-68853542a21c_2064x1126.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!aKSl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd09d0df-ebbe-44f4-9d94-68853542a21c_2064x1126.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd09d0df-ebbe-44f4-9d94-68853542a21c_2064x1126.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:322377,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/162200569?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd09d0df-ebbe-44f4-9d94-68853542a21c_2064x1126.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aKSl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd09d0df-ebbe-44f4-9d94-68853542a21c_2064x1126.png 424w, https://substackcdn.com/image/fetch/$s_!aKSl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd09d0df-ebbe-44f4-9d94-68853542a21c_2064x1126.png 848w, https://substackcdn.com/image/fetch/$s_!aKSl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd09d0df-ebbe-44f4-9d94-68853542a21c_2064x1126.png 1272w, https://substackcdn.com/image/fetch/$s_!aKSl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd09d0df-ebbe-44f4-9d94-68853542a21c_2064x1126.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>We saw a lot of very cool data + AI infrastructure at Data Council!</p><ul><li><p><a href="https://www.bauplanlabs.com/">Bauplan</a>, fresh off their recent fundraise, walked us through the minimum viable data platform</p></li><li><p>The Snowflake booth showed how <a href="https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents">Cortex Agents</a> can sit in your database and perform useful work</p></li><li><p>Lloyd Tabb gave a great walkthrough of <a href="https://www.malloydata.dev/">Malloy</a> and repeatedly emphasized the 
benefits of writing LLM-based analytics queries with a Semantic Layer as opposed to going straight to SQL</p></li><li><p>Jacob ran a session on <a href="https://x.com/matsonj/status/1898504109193613667">vibe-coding your data engineering workflows</a></p></li><li><p>MCP was the talk of the town, with notable MCP servers being discussed by <a href="https://clickhouse.com/blog/agenthouse-demo-clickhouse-llm-mcp">ClickHouse</a>, <a href="https://github.com/motherduckdb/mcp-server-motherduck">MotherDuck</a> and <a href="https://docs.getdbt.com/blog/introducing-dbt-mcp-server">yours truly</a>.</p></li></ul><p>And then of course we had <a href="https://www.linkedin.com/in/eliasdefaria/">Elias</a> discussing SDF + dbt and walking through a new bit of data infrastructure that I believe is going to play a significant role in the story of how data + Gen AI fit together: the <a href="https://www.getdbt.com/blog/building-the-next-gen-dbt-engine">new dbt engine</a>, which is Rust-based, type-aware, and ready to validate that your SQL queries are dialect-accurate and governed, whether they are written by a human or a machine.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SHnt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccdd3-c8d7-42af-87d3-e2d857c5e80b_3024x1816.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SHnt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccdd3-c8d7-42af-87d3-e2d857c5e80b_3024x1816.png 424w, 
https://substackcdn.com/image/fetch/$s_!SHnt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccdd3-c8d7-42af-87d3-e2d857c5e80b_3024x1816.png 848w, https://substackcdn.com/image/fetch/$s_!SHnt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccdd3-c8d7-42af-87d3-e2d857c5e80b_3024x1816.png 1272w, https://substackcdn.com/image/fetch/$s_!SHnt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccdd3-c8d7-42af-87d3-e2d857c5e80b_3024x1816.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SHnt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccdd3-c8d7-42af-87d3-e2d857c5e80b_3024x1816.png" width="3024" height="1816" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/833ccdd3-c8d7-42af-87d3-e2d857c5e80b_3024x1816.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1816,&quot;width&quot;:3024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8022998,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/162200569?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a6e56ae-0e62-4e20-b69e-9cf799f6733f_3024x4032.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!SHnt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccdd3-c8d7-42af-87d3-e2d857c5e80b_3024x1816.png 424w, https://substackcdn.com/image/fetch/$s_!SHnt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccdd3-c8d7-42af-87d3-e2d857c5e80b_3024x1816.png 848w, https://substackcdn.com/image/fetch/$s_!SHnt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccdd3-c8d7-42af-87d3-e2d857c5e80b_3024x1816.png 1272w, https://substackcdn.com/image/fetch/$s_!SHnt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccdd3-c8d7-42af-87d3-e2d857c5e80b_3024x1816.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>So in a certain sense, I <em>am</em> walking away from this Data Council feeling like the worlds of generative AI and traditional data infra are closer together than ever.</p><p>But in another, deeper sense, I&#8217;m not.</p><h3>A familiar kind of weird and a new kind of weird</h3><p>Three years ago, in his reflections on Data Council, Drew had one request: &#8220;<a href="https://roundup.getdbt.com/p/keep-data-council-weird">Keep Data Council Weird</a>&#8221;. At the time, we were wondering if the ecosystem was becoming too vendor+VC driven and hoping that we&#8217;d still maintain our spunky outsider energy.</p><p>Well, I have to be honest with you, this Data Council felt pretty darn weird.</p><p>Partly, it felt weird in a familiar way. I asked Drew if this year felt weird and here&#8217;s what he told me:</p><blockquote><p>The venue - a masonic temple - was gorgeous and unlike any conference venue I&#8217;ve been to before. My legs hurt from walking up and down 4 flights of carpeted stairs. I watched Elias&#8217;s talk from a parapet (is that even the word?)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> in a column adorned theater. I think I saw a crucifix. The bathrooms had couches in them. Scott B and I talked about our skincare routines. I saw a lot of old friends and former coworkers. I befriended [redacted]. My beef with [redacted] grew even deeper. I had a <a href="https://trueburgeroakland.com/">top 3 all-time cheeseburger</a> and a bottom 3 all-time dessert (Mango Piggy). 
Pete and the whole Data Council team put on one hell of an event this year!</p></blockquote><p>If you&#8217;ve been around the block enough times, this is a familiar kind of weirdness. Comforting.</p><p>It also felt weird in a different way though:</p><p>Because fundamentally, even though data infra + AI are moving ever closer together, there are <em>big</em> differences in how each side moves and progresses.</p><p>The reason boils down to this:</p><p><strong>Data Infra is heavily engineered, based on building well-understood systems and standards</strong>.</p><p>It <em>moves</em> at the speed of ecosystems and standards. Three years ago at Data Council I&#8217;m sure there were people talking about Apache Iceberg and wondering whether it would be adopted across the industry. We&#8217;re big believers in Iceberg at dbt Labs and I expect to see strong and meaningful adoption of Iceberg over the next three years. I think an 80th percentile good outcome for Iceberg adoption looks like a world where organizations are not meaningfully constrained by their choice of data platform and are able to use Iceberg to avoid vendor lock-in and have true cross-platform control of how they operate on their data.</p><p><strong>Generative AI is built differently, and it moves at a different speed.</strong></p><p>The folks at Anthropic like to say that LLMs are <a href="https://www.youtube.com/watch?v=TxhhMTOTMDg">grown, not built.</a> Three years ago when Drew said that we should keep Data Council Weird, we were about 9 months out from the release of ChatGPT, and a year away from GPT-4. </p><p>Since then, the price of a query to GPT-4 has fallen by somewhere around 100x. OpenAI is <a href="https://www.theinformation.com/articles/openai-forecasts-revenue-topping-125-billion-2029-agents-new-products-gain">projecting $125 billion in revenue by 2029</a>. The latest paradigm shift, reasoning models, is only around six months old. 
</p><p>I don&#8217;t know what an 80th percentile &#8220;good&#8221; (meaning fast) outcome looks like here, but there are people a lot closer to this than me who are saying we&#8217;re going to be <a href="https://ai-2027.com/">deploying bio-engineered algae nanobots to fuel the data centers</a> doing recursively self-improving AI by the time we hit Data Council three years from now.</p><p>That, to me, is pretty weird. </p><p>The weirdness of two worlds, closer than ever before but apparently moving at blindingly different speeds. </p><p>The weirdness of sitting in a talk and getting legitimately excited by the idea that we as an ecosystem can robustly adopt the nearly-decade-old <a href="https://arrow.apache.org/">Apache Arrow</a> and then going into the hall to talk to someone who had just walked out of a talk on <a href="https://x.com/BEBischof">Bryan&#8217;s</a> Foundation Models track and was wondering to what extent two-year-old LLM-based coding workflows are going to change whether any of these questions are still relevant.</p><p>So what do we do with this?</p><p>Look, maybe one day soon, we&#8217;ll pinch ourselves, bolt awake and think &#8220;man that whole AI thing was crazy&#8221;. I&#8217;ll look back on this newsletter, cringe a bit about my prognostication and sheepishly admit that maybe I got carried away by drawing out lines on a curve. God knows it&#8217;s happened before.</p><p>But &#8230; maybe not. And in that world, what relevance does data infra have?</p><p>I think it means that all of this matters a lot - even more so in this world. It means that pretty soon, the data systems and data infrastructure we build are going to be powering a whole lot of systems that interface more directly with the world than we are used to.</p><p>Because my prewritten take about data systems and AI workflows becoming increasingly intertwined and dependent on each other <em>was right</em>. 
And now we need to figure out how to make engineered data infrastructure that moves at human speed support LLMs that look like they are moving much faster and are still <a href="https://www.darioamodei.com/post/the-urgency-of-interpretability">fundamentally mysterious to us</a>.</p><p>The real world, and the data we represent it with, have a lot of complexity. And if we&#8217;re about to have AI systems that are 100x cheaper and 100x more powerful than what we have today operating on the tools, systems and standards we build, then they&#8217;d better be really good.</p><p>I don&#8217;t have an exact answer to how we should approach this. I don&#8217;t think anyone does.</p><p>I do know that I&#8217;m looking forward to next year&#8217;s Data Council and the one after that and the one after that too. I&#8217;m hoping that alongside the new weirdness, we keep the familiar weirdness and that we all continue to share our knowledge, our expertise and perhaps most importantly our mango piggies.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Lk6_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe152151f-ad89-425c-a36e-1933b90c7d1b_1250x614.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Lk6_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe152151f-ad89-425c-a36e-1933b90c7d1b_1250x614.png 424w, https://substackcdn.com/image/fetch/$s_!Lk6_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe152151f-ad89-425c-a36e-1933b90c7d1b_1250x614.png 848w, 
https://substackcdn.com/image/fetch/$s_!Lk6_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe152151f-ad89-425c-a36e-1933b90c7d1b_1250x614.png 1272w, https://substackcdn.com/image/fetch/$s_!Lk6_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe152151f-ad89-425c-a36e-1933b90c7d1b_1250x614.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Lk6_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe152151f-ad89-425c-a36e-1933b90c7d1b_1250x614.png" width="1250" height="614" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e152151f-ad89-425c-a36e-1933b90c7d1b_1250x614.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:614,&quot;width&quot;:1250,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1482330,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/162200569?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe152151f-ad89-425c-a36e-1933b90c7d1b_1250x614.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Lk6_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe152151f-ad89-425c-a36e-1933b90c7d1b_1250x614.png 424w, 
https://substackcdn.com/image/fetch/$s_!Lk6_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe152151f-ad89-425c-a36e-1933b90c7d1b_1250x614.png 848w, https://substackcdn.com/image/fetch/$s_!Lk6_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe152151f-ad89-425c-a36e-1933b90c7d1b_1250x614.png 1272w, https://substackcdn.com/image/fetch/$s_!Lk6_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe152151f-ad89-425c-a36e-1933b90c7d1b_1250x614.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" 
x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Anders eating a well deserved Mango Piggy</figcaption></figure></div><h3>Appendix</h3><p>As I was writing this, the ever thoughtful Benn Stancil released a post touching heavily on MCP and the dbt MCP.</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:162134831,&quot;url&quot;:&quot;https://benn.substack.com/p/a-new-invisible-hand&quot;,&quot;publication_id&quot;:23588,&quot;publication_name&quot;:&quot;benn.substack&quot;,&quot;publication_logo_url&quot;:null,&quot;title&quot;:&quot;A new invisible hand&quot;,&quot;truncated_body_text&quot;:&quot;&quot;,&quot;date&quot;:&quot;2025-04-25T16:26:33.799Z&quot;,&quot;like_count&quot;:16,&quot;comment_count&quot;:4,&quot;bylines&quot;:[{&quot;id&quot;:5667744,&quot;name&quot;:&quot;Benn Stancil&quot;,&quot;handle&quot;:&quot;benn&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/a317e60a-9bd1-4c75-bb54-66d517f735dc_1100x1100.jpeg&quot;,&quot;bio&quot;:&quot;Working at benn.company. Tweeting at benn.chat. Posting pictures at benn.photos. Networking with professionals at benn.work.&quot;,&quot;profile_set_up_at&quot;:&quot;2021-04-27T23:00:23.729Z&quot;,&quot;reader_installed_at&quot;:&quot;2022-10-21T19:27:33.368Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:254785,&quot;user_id&quot;:5667744,&quot;publication_id&quot;:23588,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:23588,&quot;name&quot;:&quot;benn.substack&quot;,&quot;subdomain&quot;:&quot;benn&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;A weekly Substack on data and technology, with some occasional conversations about culture, sports, and politics. 
&quot;,&quot;logo_url&quot;:null,&quot;author_id&quot;:5667744,&quot;primary_user_id&quot;:5667744,&quot;theme_var_background_pop&quot;:&quot;#FF6B00&quot;,&quot;created_at&quot;:&quot;2019-12-15T21:00:48.339Z&quot;,&quot;email_from_name&quot;:&quot;Benn Stancil&quot;,&quot;copyright&quot;:&quot;Benn Stancil&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;newspaper&quot;,&quot;is_personal_mode&quot;:false}}],&quot;twitter_screen_name&quot;:&quot;bennstancil&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://benn.substack.com/p/a-new-invisible-hand?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><span></span><span class="embedded-post-publication-name">benn.substack</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">A new invisible hand</div></div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">10 months ago &#183; 16 likes &#183; 4 comments &#183; Benn Stancil</div></a></div><p>As with basically everything Benn writes - it&#8217;s worth your time. 
The post probably deserves a full response, so I&#8217;ll save commentary for another day, but I recommend you check it out.</p><p><em>The analytics engineering roundup is sponsored by dbt Labs.</em></p><p><em>If you want to see what the big kerfuffle about dbt + SDF is all about, plus a whole lot more, join Elias and the dbt team for our Cloud Launch Showcase on 5/28 (parapet not included).</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase&quot;,&quot;text&quot;:&quot;Sign up for the Cloud Launch Showcase&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase"><span>Sign up for the Cloud Launch Showcase</span></a></p><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Editor&#8217;s note: That is not the word</p></div></div>]]></content:encoded></item><item><title><![CDATA[How AI will Disrupt BI As We Know It]]></title><description><![CDATA[A continuation-in-spirit from my recent post &#8220;How AI will Disrupt Data Engineering As We Know It.&#8221;]]></description><link>https://roundup.getdbt.com/p/how-ai-will-disrupt-bi-as-we-know</link><guid isPermaLink="false">https://roundup.getdbt.com/p/how-ai-will-disrupt-bi-as-we-know</guid><dc:creator><![CDATA[Tristan Handy]]></dc:creator><pubDate>Sun, 06 Apr 2025 11:02:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0coS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 
is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0coS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0coS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!0coS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!0coS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!0coS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0coS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp" width="728" height="416" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:832,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:598210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/160597435?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0coS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!0coS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!0coS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!0coS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26fee9e7-b4fc-4367-a8cb-ceea08c7f970_1792x1024.webp 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Credit: DALL-E</figcaption></figure></div><p>Business intelligence is on a collision course with AI.</p><p>The collision itself hasn&#8217;t happened yet, but it&#8217;s clearly coming. The inevitability of this has been clear roughly since the launch of ChatGPT, but no one knew exactly what shape that would take.</p><p>Today I want to propose how that collision is going to happen and what will happen in its aftermath.</p><p>I think it will be a very good thing for data practitioners of all stripes&#8212;those who officially have the word &#8216;data&#8217; in their title but also everyone else who simply uses data in the service of their larger job. 
So: I&#8217;m all for it.</p><p>Before getting into the AI part of the story, I need to introduce two specific mental models.</p><p>Let&#8217;s go.</p><h2>BI is a Portfolio of Stuff</h2><p>We all use the term &#8220;BI&#8221; but have become inured to what an Orwellian term it is. &#8220;Business intelligence&#8221; isn&#8217;t descriptive; it is industry-speak for a bunch of stuff glued together in order to achieve a desired user outcome: know facts about a business using tabular data.</p><p>For a long time, BI included a bunch of stuff that it no longer does. Like: data processing. Pre-cloud, BI tools processed data locally and often had proprietary processing engines. They competed on being fast.</p><p>With the cloud, that evaporated. Local data processing was anathema. BI tools got easier to build but gave up a part of their value proposition.</p><p>In today&#8217;s post-cloud world, I would suggest that BI tools have three jobs:</p><ol><li><p><strong>Modeling</strong>: Define the semantic concepts behind your structured data: metrics, dimensions, joins, etc. Think: LookML.</p></li><li><p><strong>Exploratory data analysis (EDA)</strong>: The process of exploring data in search of useful insights. Highly iterative, flow-state, and unpredictable. Think: Looker explore window.</p></li><li><p><strong>Presentation</strong>: The aggregation of multiple data artifacts together to present a single cohesive narrative that can be shared out to potentially many others within an organization, all governed by a permission model. 
Think: Looker dashboard.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xC29!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c7dda7-38ab-4619-88f0-7019877a7f6a_1558x587.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xC29!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c7dda7-38ab-4619-88f0-7019877a7f6a_1558x587.png 424w, https://substackcdn.com/image/fetch/$s_!xC29!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c7dda7-38ab-4619-88f0-7019877a7f6a_1558x587.png 848w, https://substackcdn.com/image/fetch/$s_!xC29!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c7dda7-38ab-4619-88f0-7019877a7f6a_1558x587.png 1272w, https://substackcdn.com/image/fetch/$s_!xC29!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c7dda7-38ab-4619-88f0-7019877a7f6a_1558x587.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xC29!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c7dda7-38ab-4619-88f0-7019877a7f6a_1558x587.png" width="562" height="211.90796703296704" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36c7dda7-38ab-4619-88f0-7019877a7f6a_1558x587.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:549,&quot;width&quot;:1456,&quot;resizeWidth&quot;:562,&quot;bytes&quot;:63228,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/160597435?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c7dda7-38ab-4619-88f0-7019877a7f6a_1558x587.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xC29!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c7dda7-38ab-4619-88f0-7019877a7f6a_1558x587.png 424w, https://substackcdn.com/image/fetch/$s_!xC29!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c7dda7-38ab-4619-88f0-7019877a7f6a_1558x587.png 848w, https://substackcdn.com/image/fetch/$s_!xC29!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c7dda7-38ab-4619-88f0-7019877a7f6a_1558x587.png 1272w, https://substackcdn.com/image/fetch/$s_!xC29!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c7dda7-38ab-4619-88f0-7019877a7f6a_1558x587.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Some tools skip modeling and just allow users to do EDA without a model. EDA and presentation are the most core jobs of any BI tool and every BI tool I&#8217;m familiar with does both. 
And it is the fact that BI tools facilitate the EDA process that enables them to govern and share the presentation of that analysis.</p><h2>Scaling the Criticality of an Analysis</h2><p><em>All credit to my collaborator Dave Connors for this mental model &#129438;</em></p><p>Generally speaking, artifacts pass through a few lifecycle stages as they mature into data products supporting production use cases. Think about these stages as the &#8216;production line&#8217; of BI.</p><h3>Phase 0: Exploratory Analysis</h3><p>The first thing data practitioners do when faced with a business question is to start developing low-fidelity sketches to try to answer it.</p><p>The vast majority of the work generated here will be thrown away, so there are low expectations for code quality and governance. The primary goals of the best EDA experiences are iteration speed, flow state, and flexibility.</p><h3>Phase 1: Personal Reporting</h3><p>At a certain point, some exploratory analysis will cross over into a true insight; your question is answered, your curiosity sated. The question is important enough that you want to make sure you can return to it later. But it is not yet &#8220;ready for prime time&#8221;&#8212;you&#8217;re not ready to share it with others and have it be a part of someone&#8217;s operating cadence.</p><p>Some BI tools have a separate section for your &#8220;personal space&#8221;&#8212;think about your personal folder in Looker.</p><h3>Phase 2: Shared Reporting</h3><p>The moment that a report gets shared with another person, the required governance characteristics of a data artifact increase significantly. When you create a report you understand its context; when someone else starts using it they just expect it to be correct.</p><p>In phases 0 and 1, there may not be any governance applied&#8212;all governance may be applied at the compute layer with grants. But once you share an artifact, it is the governance at the BI layer that determines who gets to see what. 
This is simply because <em>most data consumers don&#8217;t have accounts within the data platform</em> and so the BI tool takes over as the arbiter.</p><p>In phases 0 and 1, there is also no auditability requirement. Auditability, change tracking, and general data ops best practices are introduced when artifacts are shared with others in Phase 2.</p><h3>Phase 3: Production Artifact</h3><p>When shared reporting reaches a very high level of criticality (frequent access by a large number of end users, agreed-upon SLAs, support for a critical business process, dynamic features), it&#8217;s officially &#8220;in production&#8221; and needs to be owned and operated like any other production data asset.</p><p>===</p><p>If you think about these stages as the &#8216;production line&#8217; of BI, the most important job of a BI tool is to be the conveyor belt through all of these stages. Start with raw materials, end with a production data product. At each phase of maturity, it&#8217;s easy to extend the product to support the next set of capabilities: governance, dynamic filters, SSO, etc. You never think about those things during Phase 0, but as your work progresses, the BI tool makes it straightforward to progressively add those capabilities.</p><p>But for this all to work, you gotta start the process inside the BI tool all the way back from Phase 0. You can&#8217;t do your EDA in Jupyter &amp; Pandas and expect to ship it to users in Tableau&#8230;that&#8217;s not how that works.</p><p>So: you gotta do your EDA in a BI tool to take advantage of the &#8220;production line&#8221;. But&#8230;are BI tools typically the best way to do EDA? We&#8217;ll return to that later.</p><h2>MCP and AI-as-Aggregator</h2><p>The final thing we need to understand is the impact of a <em>context protocol</em>. 
I wrote about this <a href="https://roundup.getdbt.com/p/how-ai-will-disrupt-data-engineering">a few weeks ago</a>:</p><blockquote><p>The easiest thing to do for any technology vendor at the very onset of the AI era was to take all of the domain-specific context that you had and surface it to users in a chat interface. And we did the same thing. It was (and is) quite good&#8212;it does a great job of allowing users to ask business questions and answering them with semantic-layer-governed responses.</p><p>The problem with this approach is that users don&#8217;t actually want to interact with dozens of chat interfaces. They don&#8217;t want to remember to go to a given tool to get one type of answer and another tool for another type of answer. There will not be 30 chat experiences all with different context. There will be one&#8230;or maybe just a few. But likely a single dominant one.</p><p>This is how <a href="https://stratechery.com/aggregation-theory/">aggregators</a> work. You likely don&#8217;t use a bunch of different search engines&#8212;you probably just use one, and it is probably Google. This is how chat will go as well.</p><p>The problem is, Google could scrape the web and respond to all queries based on that knowledge. But ChatGPT cannot know all of the information you want to ask it questions about (at least, yet). That lack of business context is the problem.</p><p>That&#8217;s where a <em>context protocol</em> comes in. A context protocol&#8212;a somewhat new topic in the public AI conversation&#8212;is a standardized way for services to provide additional context to models via an open protocol. 
The most promising one today is called <a href="https://modelcontextprotocol.io/introduction">MCP</a>, but whether or not MCP wins, the awareness/excitement/support for this idea has developed a ton of momentum and I am fairly convinced that <em>something like this</em> will become real and widely supported.</p><p>There will be a large number of context providers (every source of valuable enterprise context) and a large number of context consumers (different products with AI capabilities). There is no way to create point-to-point integrations to facilitate this. A protocol will be needed if we are going to see the right type of advancements, and I think it will happen.</p><p>Imagine that your license to ChatGPT Enterprise or Claude Desktop or whatever <em>already came with</em> a connection to all of the metadata about every piece of structured data you had access to. What was there, how trustworthy it was, how suitable it was for the analysis you were describing, etc.</p></blockquote><p>Well, in the intervening weeks since I wrote this, a couple of things have happened. 
First, this:</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WIrJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4155a5-353e-4231-ac09-9a0259256bfb_846x696.png"><img src="https://substackcdn.com/image/fetch/$s_!WIrJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4155a5-353e-4231-ac09-9a0259256bfb_846x696.png" width="846" height="696" alt="" loading="lazy"></a><figcaption class="image-caption"><a href="https://x.com/sama/status/1904957253456941061">https://x.com/sama/status/1904957253456941061</a></figcaption></figure></div><p>&#8230;then this:</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ToFw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa40f29e8-6bef-42b8-a608-1f77c938ffee_854x508.png"><img src="https://substackcdn.com/image/fetch/$s_!ToFw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa40f29e8-6bef-42b8-a608-1f77c938ffee_854x508.png" width="854" height="508" alt="" loading="lazy"></a><figcaption class="image-caption"><a href="https://x.com/sundarpichai/status/1906484930957193255">https://x.com/sundarpichai/status/1906484930957193255</a></figcaption></figure></div><p>Clearly this thing is going somewhere.</p><p>Just as momentously, I have gotten access to an internal-only dbt/MCP-powered experience in Claude Desktop. In it, I can ask every type of metadata question I might want (powered by our Metadata API) and I can also ask questions about all of our business metrics (powered by our Semantic Layer API).</p><p>It is incredible. I don&#8217;t want to share too much right now, but &#8230; having your data and metadata available in the context of a modern reasoning model is incredible.</p><h2>BI in an AI-First World</h2><p>Ok, we now understand: the jobs of a BI tool, BI conveyor belt, and how to get structured data context into your AI-of-choice.
We&#8217;re finally in position to tackle the coming collision.</p><p>Here it is, plain and simple:</p><ol><li><p>AI is going to be meaningfully better at exploratory data analysis than any BI tool.</p></li><li><p>If you take away EDA from BI, the &#8216;conveyor belt&#8217; model breaks down. And the conveyor belt model is the primary reason you use your current BI tool.</p></li><li><p>It is not yet clear how the BI ecosystem will adapt to this new reality.</p></li></ol><p>That&#8217;s it. That&#8217;s my entire argument. Let&#8217;s see if it holds up.</p><h3>Artificial Intelligence will Far Outstrip Business Intelligence for Exploratory Data Analysis</h3><p>There are a lot of data tasks that AI is good at. I&#8217;ve talked about a lot of these in the context of data engineering <a href="https://roundup.getdbt.com/p/how-ai-will-disrupt-data-engineering">here</a>. But the area of data analysis that will benefit <em>most</em> from AI is EDA.</p><p>I am confident about that for two reasons. First, I have empirically validated this first-hand. The dbt + MCP + Claude 3.7 combo that I outlined earlier is just dramatically better at EDA than anything I&#8217;ve experienced in my life, and it&#8217;s getting better fast. But I am not ready to show you that (it&#8217;s single-digit weeks away from a public demo!), so you may not believe me. Fair.</p><p>The second reason I&#8217;m confident about this is that most time spent in EDA is spent writing code (whether done by hand or via a GUI). And we now know how good leading-edge models are at writing code when supplied with the right context.
Whether you want to reference <a href="https://www.techsistence.com/p/up-to-90-of-my-code-is-now-generated">individual developer testimonials</a> or <a href="https://www.cnbc.com/2025/03/15/y-combinator-startups-are-fastest-growing-in-fund-history-because-of-ai.html?utm_source=tldrnewsletter">the head of YC</a> or <a href="https://x.com/karpathy/status/1886192184808149383">Andrej Karpathy</a> or <a href="https://www.forbes.com/sites/jackkelly/2024/11/01/ai-code-and-the-future-of-software-engineers/">Google</a>, it all lines up. And it just so happens that the two software engineers whose opinions I trust most in the world&#8212;my cofounders Drew and Connor&#8212;have gone all in on Cursor over the last 3 months and are not-quite-but-almost religious about the experience.</p><p>If you find yourself skeptical of this, here are a few things to keep in mind.</p><ol><li><p>You don&#8217;t need the LLM to answer &#8216;why&#8217; questions, or generate hypotheses, for it to be far superior to your current workflow. Rather&#8212;it just makes you <em>a lot faster</em> because it can write EDA code a whole lot faster than you can (whether you&#8217;re writing Excel formulas or dataframe operations).</p></li><li><p>Accuracy is a non-issue as long as you ask a question that can be governed by a semantic layer. The code written tends to be: get data from the SL, manipulate it in Python, generate a chart using some JavaScript charting library. If you can&#8217;t get a dataset governed by an SL query, text-to-SQL continues to improve with sufficient context.</p></li></ol><p>Just imagine: an interface that simply gets your questions answered far faster. You remain the objective function and the creative drive behind the process; AI is simply better and faster than you are at writing analytical code.</p><p>IMO that shouldn&#8217;t feel threatening, <strong>that should feel empowering</strong>.
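To make that loop concrete: the glue code the model writes is usually tiny. Here is a runnable toy sketch of the pattern (get governed data from the SL, manipulate it in Python, hand the result to a chart). Note that <code>fetch_metric</code> is a hypothetical stand-in stub that fabricates a small result set, not dbt&#8217;s actual Semantic Layer API:

```python
import pandas as pd

def fetch_metric(metric: str, group_by: str) -> pd.DataFrame:
    """Hypothetical stand-in for a semantic-layer query.

    A real SL client would return governed, pre-modeled metric data;
    here we fabricate a tiny frame so the sketch runs on its own."""
    return pd.DataFrame(
        {
            "region": ["NA", "NA", "EMEA", "EMEA", "APAC"],
            "revenue": [120.0, 80.0, 95.0, 45.0, 60.0],
        }
    )

# 1. get data from the SL (stubbed above)
df = fetch_metric("revenue", group_by="region")

# 2. manipulate it in Python
rollup = (
    df.groupby("region", as_index=False)["revenue"]
    .sum()
    .sort_values("revenue", ascending=False)
    .reset_index(drop=True)
)

# 3. pass `rollup` to whatever charting layer the interface uses
print(rollup)
```

In the real workflow the LLM writes this code for you; the point is how little of it there is once the metric definitions live in the semantic layer.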
I seriously lost it the first time I interacted with our internal data in this type of experience. The primary value prop of a data analyst shouldn&#8217;t be writing code; it should be analytical problem solving and generating action.</p><h3>The Conveyor Belt Model Breaks Down</h3><p>You don&#8217;t use your BI tool because it is the fastest or most delightful EDA experience. You use it because, when you have something to publish to your coworkers, you know exactly how to do that.</p><p>But what if another tool were <em>so much better</em> at EDA that <em>you would be handicapping yourself if you didn&#8217;t use it?</em> What would you do?</p><p>There are likely three answers.</p><p>First, you could go back to publishing one-off assets. If you ask any AI experience to &#8220;give me that in an Excel file,&#8221; most of them have no problem doing that. So maybe you just go back to shipping attachments. But that doesn&#8217;t feel like progress.</p><p>Second, having iterated and found the insight you were looking for, you now have to reconstitute that analysis inside of your BI tool of choice. In practice this will likely only happen rarely; it is not a stable equilibrium because every human hates double work.</p><p>Third, and hopefully preferable, is that we find some way to pull the results of an exploration back into the governed framework of the BI tool. Imagine asking &#8220;make a Power BI worksheet out of this analysis.&#8221; We will need to get deeper into the MCP era to see exactly how this will play out, but I&#8217;m optimistic that it will be possible.</p><p>The third option still sees the BI tool as an important governance and presentation layer, but pulls the most strategic responsibility (EDA) out of its portfolio.</p><h3>A Very Different BI Tool</h3><p>BI tools used to ship with compute engines.
Today they do not.</p><p>What if BI tools were no longer the primary way EDA was done?</p><p>What if their primary job was to render data artifacts in a governed, interactive environment?</p><p>That is still an incredibly valuable thing, and needed for as long as humans continue to interact with structured data (IMO: a long time). But it&#8217;s not what BI tools look like today.</p><p>Most BI tool vendors want to pull this new EDA experience <em>inside their chrome</em>&#8212;exposing AI-powered interfaces inside their products. I don&#8217;t believe this will be how most users do EDA, for three reasons:</p><ol><li><p><strong>User behavior</strong> <br>Aggregation theory will dominate: every knowledge worker inside a company needs access to this functionality, and they&#8217;re not all going to think to go to a specific tool first. They&#8217;re going to prefer to simply ask data questions in the same place they ask all of their other questions: Claude, ChatGPT Enterprise, whatever.</p></li><li><p><strong>Tool combinations</strong> <br>MCP is not powerful only because it lets you use a single tool; it is powerful because it is a pluggable framework for pulling in all kinds of tools for the model to use. You&#8217;ll be able to ask a BI question (&#8220;Show me our most important renewals for the coming quarter&#8221;) and then immediately act on it in another tool (&#8220;Email the main point of contact on the account to set up a check-in meeting&#8221;). Having all of these tools interact together inside of a single interface is combinatorially powerful. There is already a large ecosystem of tooling available and community-driven innovation is happening <em>fast</em>.</p></li><li><p><strong>Tech</strong> <br>Except for MSFT, current BI vendors are not AI research labs.
They are just not going to create better models or be the primary destination for all AI interactions within a company.</p></li></ol><h2>My Predictions</h2><p>I think that the BI workflow that has dominated for the past ~15 years is going to change significantly over the next two. EDA will migrate substantially to AI interfaces, enabled by MCP.</p><p>I think this will be incredibly positive for all knowledge workers throughout a company. It will enable more users to create sophisticated analytics and will enable existing data practitioners to move significantly faster.</p><p>I think this will be a headwind for many current BI vendors. BI is extremely sticky, and this change isn&#8217;t going to happen overnight, but it will be a headwind.</p><p>I think there is likely space for new players to innovate: to be the best place to aggregate and govern all of the artifacts built in this new workflow.</p><p>I&#8217;ll return to this post in six months and see how my predictions are faring!</p>]]></content:encoded></item><item><title><![CDATA[Iceberg?? Give it a REST!]]></title><description><![CDATA[The new abstraction that changes nothing... and everything]]></description><link>https://roundup.getdbt.com/p/iceberg-give-it-a-rest</link><guid isPermaLink="false">https://roundup.getdbt.com/p/iceberg-give-it-a-rest</guid><dc:creator><![CDATA[Anders Swanson]]></dc:creator><pubDate>Sun, 30 Mar 2025 11:01:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!TBhF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9219149c-3e08-4f73-a68a-4ca508a025a1_500x560.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The analytics engineering landscape is shifting beneath our feet as our familiar data warehouse coalesces into the data engineer&#8217;s lakehouse&#8212;all thanks to a powerful new abstraction.
For us SQL lovers, the future paradoxically resembles both the present and the past, yet the opportunity ahead is simply too compelling to ignore.</p><p>Today, I&#8217;m going to sketch out for you:</p><ol><li><p>what exactly this abstraction of abstractions at the heart of this sea change is</p></li><li><p>the lay of the land today: how far things have come, what&#8217;s still holding us back, and open questions</p></li><li><p>[EXTRA CREDIT]<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> technical weeds: table format convergence, S3 tables, vended credentials, and more</p></li></ol><p>To not bury the lede any further, I&#8217;ll be talking about <a href="https://iceberg.apache.org/">Apache Iceberg&#8482;&#65039;</a>, and a further abstraction: the <a href="https://iceberg.apache.org/terms/#catalog">Iceberg REST Catalog Specification</a> (IRC).</p><p>The current state of Iceberg isn&#8217;t easy to navigate. Despite all the buzz, the technology is still young. The ecosystem changes quickly&#8212;each day brings something new, from proposals to private previews to updates in <code>pyiceberg</code>.</p><p>So what&#8217;s really going on here? What matters most? Why should you care? And if even Iceberg&#8217;s creator says we shouldn&#8217;t have to think about it (more below), why is everyone talking about it?</p><p>Over the past year, I&#8217;ve been working with many data teams to learn about and implement Iceberg in production. I&#8217;m convinced of Iceberg&#8217;s potential to impact many more analytics engineering teams. True Iceberg adoption will happen once it is robustly integrated with all major data platforms, but even where it has been integrated, a last-mile user experience is still missing, and that&#8217;s dampening the adoption curve.
But it&#8217;s improving every day!</p><p>So let&#8217;s get into it!</p><h1>Iceberg: A tough nut to crack</h1><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TBhF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9219149c-3e08-4f73-a68a-4ca508a025a1_500x560.jpeg"><img src="https://substackcdn.com/image/fetch/$s_!TBhF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9219149c-3e08-4f73-a68a-4ca508a025a1_500x560.jpeg" width="500" height="560" alt="r/dataengineering - Apache Iceberg: SQL and ACID semantics in the front, scalable object storage in the back" loading="lazy"></a><figcaption class="image-caption">all credit to the great Brian Olsen for this one</figcaption></figure></div><p>Understanding Apache Iceberg is a &#8220;tough nut to crack&#8221; because it&#8217;s easy to get lost in the technical weeds and miss the big picture<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. Ironically, Iceberg exists so that most people don&#8217;t need to think about it at all!
In the first five minutes of his talk at Data Council last year, Ryan Blue, a creator of Apache Iceberg, says almost exactly that:</p><blockquote><p><em>Iceberg should be invisible [in that it should aim to]:</em></p><ul><li><p><em>avoid unpleasant surprises</em></p></li><li><p><em>don&#8217;t steal attention and reduce context switching</em></p></li></ul><p>Ryan Blue, <a href="https://www.youtube.com/watch?v=_GW3GYZK66U">"Why You Shouldn't Care about Iceberg"</a></p></blockquote><p>This sounds a lot to me like a powerful abstraction that lets you focus on the task at hand without getting bogged down in details.</p><p>&#8220;Bogged down in details&#8221; is an apt description for data engineering until recently. MapReduce, Hadoop, Hive, and Spark were all powerful tools that got the job done, but no one will claim that they were easy to use. You could never just write SQL &#8212; a portion of your brain was always reserved for reasoning about where and how the data was written, and for avoiding unpleasant surprises and edge cases. Your resulting pipeline could process petabytes of data, and you had the sweat to show for it.</p><p>&#8220;Bigger data &#8594; more work&#8221; is a reasonable heuristic, but the impetus for Iceberg was an attempt to minimize that cognitive burden with a new abstraction: a table that just works like a database&#8217;s table (e.g. Postgres or SQL Server)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>.
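One way to see why this abstraction holds up is a deliberately simplified toy model of the core idea: a table is just a pointer to the latest immutable snapshot, and a commit is a single atomic pointer swap. This is a sketch only; real Iceberg persists snapshots as metadata and manifest files on object storage and delegates the atomic swap to the catalog:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: which data files belong to it."""
    snapshot_id: int
    data_files: tuple

@dataclass
class ToyIcebergTable:
    """Toy model of an Iceberg table (not the actual spec).

    Readers pin a snapshot; writers build a new snapshot and then
    'commit' it with one pointer update, so readers never observe a
    half-finished write."""
    snapshots: list = field(default_factory=list)

    @property
    def current(self) -> Snapshot:
        return self.snapshots[-1] if self.snapshots else Snapshot(0, ())

    def commit(self, new_files) -> Snapshot:
        # writers derive a brand-new snapshot from the current one...
        snap = Snapshot(self.current.snapshot_id + 1,
                        self.current.data_files + tuple(new_files))
        # ...and publishing it is a single pointer update
        self.snapshots.append(snap)
        return snap

t = ToyIcebergTable()
t.commit(["orders-00.parquet"])
reader_view = t.current            # a reader pins this snapshot
t.commit(["orders-01.parquet"])    # a concurrent write lands
# the pinned reader still sees one file; new readers see two
```

The same pointer-swap idea is also what makes features like time travel cheap: old snapshots are still sitting there, immutable.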
Iceberg isn&#8217;t a silver bullet that solves all problems with large analytic data, but it&#8217;s a stronger, empowering abstraction.</p><h2>The IRC (no not that IRC)</h2><h3>The Summer of <s>Love</s> Iceberg Catalogs</h3><p>Ten months ago now, in June 2024, during what we colloquially refer to as &#8220;Summit Season&#8221;, two hallmark announcements were made within 24 hours of each other.</p><p>&#8220;Iceberg steals the Summits&#8217; spotlight&#8221; &amp; &#8220;Iceberg wins the table format war!&#8221; comprise the gist of many folks&#8217; reactions. I largely agree, with a small tweak: the real winner was the Iceberg REST Catalog.</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OTnF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8369ec18-6c29-4d64-b5b5-a11e1189b431_1342x448.png"><img src="https://substackcdn.com/image/fetch/$s_!OTnF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8369ec18-6c29-4d64-b5b5-a11e1189b431_1342x448.png" width="1342" height="448" alt="" loading="lazy"></a></figure></div><h3>What does an IRC do for me?</h3><p>If you want to know what an IRC is and does, I&#8217;ve put that section at the bottom of this post in hopes of avoiding the technical weeds and staying high-level.</p><p>The IRC reminds me of when I first started using Dropbox in college. Countless times, I&#8217;d stay up all night before a deadline writing in Microsoft Word on my MacBook Pro. In the morning I&#8217;d run across campus to a desktop PC at the computer lab.
In a minute, I could pull my paper off the internet and open it in Word again so I could print it.</p><p>It&#8217;s easy now to take for granted the power of the abstraction that Dropbox represented, even at a time when the Internet already existed. There was complexity behind the scenes, but the core UX was magic in its simplicity: it was a folder of files that was where you wanted it to be, and it just worked like a folder should. This is what I feel the IRC represents for us in data.</p><p>Coming back to data warehousing, we can replace a &#8220;folder of a dozen files&#8221; with a &#8220;schema of a dozen tables&#8221;.</p><p>So imagine that you have this schema of tables and access to many query engines: Databricks, Snowflake, Redshift, DuckDB, Trino, and Spark.</p><p>How powerful would it be if you could connect any of these engines to that schema to read those tables, modify their data, and make new tables &#8212; and others using other query engines could see those changes?</p><p>On top of that, this system involves no expensive copying of data, no FTP servers, no Google Drive, and no direct interaction with Azure Blob storage. You just connect your SQL engine to an Iceberg catalog to read and write your data. This is the promise of the IRC in conjunction with your data platform.</p><p>How might your data team operate differently in this world? This is why <a href="https://www.getdbt.com/blog/introducing-cross-platform-dbt-mesh">we&#8217;re launching cross-platform Mesh</a> to support these exact multi-engine scenarios that more than 50% of our Cloud customers already find themselves in today.</p><h1>A Year of Progress</h1><p>So where are we today with Iceberg, and where are we going as we enter summit season 2025 and beyond? The threads I want to pull on are end-user adoption, data platform vendor integrations, and open source catalogs. I don&#8217;t have a crystal ball, but I&#8217;ll prognosticate a smidge.</p><h3>curiosity?
high! adoption? lukewarm (but growing!)</h3><p>At my Iceberg breakout session at Coalesce in Las Vegas last October, I asked the analytics engineers in attendance to raise their hand if they&#8217;d heard of Iceberg &#8212; all of the hands went up. When I asked those with their hands up to keep them up if they felt they could explain Iceberg to the person sitting next to them, nearly all of the hands went down.</p><p>Tellingly, this included the folks who said their teams were already using Iceberg in production. This isn&#8217;t a problem: it&#8217;s Ryan Blue&#8217;s vision in action! More so, this is the opportunity of Iceberg via IRCs: understanding the technology isn&#8217;t necessarily a prerequisite for adoption. Maybe one person on the team sets it up. For everyone else, it&#8217;s business (analytics) as usual.</p><h3>Data Platforms are showing up in a big way for the IRC</h3><p>So what have the data platforms and other independent software vendors (ISVs) been up to in the past year?</p><p>HOLY COW &#8212; SO MUCH!</p><p>It&#8217;s remarkable to see the entire ecosystem embrace an open-source Apache project as the foundation for their products. The vendors that have integrated deserve a huge round of applause. Yes, they&#8217;re just responding to customer demand, and yes, a real reason to invest in Iceberg is that you can reallocate engineers away from maintaining proprietary table formats and toward work that drives more revenue.</p><p>Still, the industry&#8217;s investment deserves praise, especially since taking a more self-interested and cynical approach would have been easier, at least in the short-term.</p><p>Six months ago, we predicted internally that most vendors would support the IRC spec within 6&#8211;12 months.</p><p>Today, after evaluating more private previews than I could possibly count, what progress can we observe?</p><p>If we can interpret &#8220;Iceberg support&#8221; as being compliant with the spec as of six months ago, then our prediction is looking good.
The only major outstanding work is something known as &#8220;external writes&#8221;.</p><p>However, as I&#8217;ve mentioned above, Iceberg itself is still evolving, so our prediction was poorly framed in the first place.</p><p>Maybe the right question to ask is:</p><blockquote><p>When will IRCs be a stable abstraction such that:</p><ul><li><p>end users have a stable, fully-featured interface</p></li><li><p>the Iceberg spec can continue to evolve under-the-hood without heavily burdening data teams using Iceberg?</p></li></ul></blockquote><p>Perhaps this moment comes when data platform catalogs support external writes, and perhaps that will be true in six months. Time will tell!</p><h3>OSS catalogs: important but not for end users</h3><p>Databricks and Snowflake also deserve credit for open sourcing their catalogs: <a href="https://github.com/unitycatalog/unitycatalog">Unity Catalog</a> and <a href="https://github.com/polaris-catalog/polaris">Polaris</a>, respectively. <a href="https://github.com/lakekeeper/lakekeeper">Lakekeeper</a> is another worth calling out for being written in Rust and improving quickly.</p><p>When data teams ask if I recommend self-hosting a catalog, my answer is largely &#8220;No!&#8221;. The exceptions here are teams that have either or both of</p><ul><li><p>enterprise security requirements (think: on-prem, self-managed data centers)</p></li><li><p>a dedicated data platform team with the know-how to deploy critical data infrastructure.</p></li></ul><p>The challenge here is that of uptime and availability. If the IRC is unresponsive, you can&#8217;t query the tables any more. A minority of teams will sign themselves up for this challenge. For most, I think your time is better invested elsewhere.</p><p>Beyond this small minority of data teams, the real value of these projects is for:</p><ul><li><p>data SaaS vendors, who need some catalog functionality</p></li><li><p>prospective data platform customers,
who need help committing to use a proprietary catalog (&#8220;worst case we migrate away and run the OSS catalog ourselves!&#8221;)</p></li></ul><p>I don&#8217;t say this to cast doubt on the technology; in fact, quite the opposite. All of these projects are being used today in production and are &#8220;battle-tested&#8221;. This usage serves to further refine the IRC as a standard. Everyone benefits from this, even users of proprietary catalogs.</p><h2>what might data platforms do differently?</h2><p>IRCs are the clearest option for making Iceberg truly an implementation detail, but adoption is hindered when data platforms don&#8217;t truly integrate the concept into their products. Some examples of this include requiring users to:</p><ul><li><p>create a second catalog within the data platform to make data available elsewhere</p></li><li><p>choose a unique object store path for the data when creating an Iceberg table</p></li><li><p>mount tables individually and manage their refresh</p></li></ul><p>Some data platforms are taking a cautious approach to Iceberg and REST catalogs, worrying that these might create a disjointed experience alongside their native, proprietary table formats. These platforms are instead focused on streamlining their lakehouse experience within their own product suite. While this concern is understandable, this becomes a game of chicken: customers want interoperability, so platforms risk losing customers by maintaining a walled garden. Iceberg has fundamentally changed how data teams evaluate tools&#8212;any platform without a clear Iceberg strategy now receives a "lock-in" red flag during vendor evaluations, even if said team has yet to start using Iceberg.</p><h1>what questions are on my mind for this Summer&#8217;s Summits and beyond?</h1><p><a href="https://www.icebergsummit2025.com/">Iceberg Summit</a> is happening next week, both IRL in SF and virtually.
You should check it out!</p><p>As far as what Iceberg announcements I&#8217;m hoping for and expecting come June, here&#8217;s a list of things that, if announced, would be leading indicators for accelerated Iceberg adoption:</p><ul><li><p>support for query engines writing directly to external Iceberg REST catalogs</p></li><li><p>support for mounting a schema&#8217;s worth of Iceberg tables</p></li><li><p>full support for catalog-vended credentials</p></li><li><p>any differentiated features that go beyond the scope of the actual Iceberg spec and are focused on UX and developer productivity</p></li></ul><p>If we get all of this and more, I still have some open questions:</p><ul><li><p><strong>what&#8217;s the multi-region and/or multi-cloud story of Iceberg catalogs?</strong> Right now everything presumes the same cloud and same region, or you suffer painful egress and latency costs</p></li><li><p><strong>how to federate RBAC across query engines?</strong> we still heavily rely upon databases to <code>GRANT</code> access to data.
If the data and its RBAC are managed in the IRC catalog, how is the query engine configured?</p></li><li><p><strong>what are best practices for working with multiple catalogs?</strong> more on that in a future post &#128521;</p></li></ul><p>Thanks so much for reading &#8212; as always the comments and my DMs are open.</p><p> Should you be left wanting more, there are four more sections that shy away less from the technical weeds.</p><h1>Technical Weeds</h1><h2>What about Delta Lake?</h2><p>Some of you will be frustrated that I didn&#8217;t bring up Delta Lake.</p><p>At the time of the Tabular acquisition I remember some people speculating things like this:</p><blockquote><p>Databricks acquired Tabular to squash Iceberg in favor of their open table format Delta Lake.</p></blockquote><p>It was refreshing to see that cynical take be put to rest so soon when <a href="https://vimeo.com/1012543474">this interview</a> was posted between Michael Armbrust and Ryan Blue (creators of Delta and Iceberg, respectively). I love this quote so much:</p><blockquote><p>It was never our intention to start a "format war" and have people spend so much time thinking about storage. It should just work and very few people should have to think about it. You should be able to focus on doing analytics.</p></blockquote><p>To achieve this north star of "you don't have to think about it," they aim to standardize the two projects as much as possible. This isn't just lip service! One example touched on was their plan to standardize the <code>VARIANT</code> type implementation by <a href="https://github.com/apache/parquet-format/blob/master/VariantEncoding.md">pushing it upstream into parquet itself</a>.</p><p>Another great example came through <a href="https://docs.delta.io/latest/deletion-vectors.html">Deletion Vectors</a> (DVs)&#8212;a feature that Delta tables had but Iceberg lacked.
While Iceberg had a comparable feature called "equality deletes," it wasn't nearly as performant.</p><p>Now this work has been merged into the spec, slated for release with the Iceberg V3 table spec. This work represents a true data industry team effort, with contributions from engineers at Databricks, Snowflake, Netflix, Google, and more. If you're feeling brave and curious, and reading &#8220;roaring bitmap&#8221; doesn&#8217;t send you running for the hills, check out <a href="https://github.com/apache/iceberg/pull/11238">the PR</a> and click around!</p><p>There's been much discussion about technology that converts between table formats, like Databricks' UniForm and Apache XTable. While these tools are essential in the short term, they'll ultimately become redundant. I'm seeing strong signals that the Delta and Iceberg teams agree not only on what the most important problems are, but also on how they should be solved. But maybe I&#8217;m being overly optimistic!</p><h2>What about S3 tables?</h2><p>I&#8217;ve long been bullish on the IRC, but the <a href="https://aws.amazon.com/blogs/aws/new-amazon-s3-tables-storage-optimized-for-analytics-workloads/">announcement of S3 Tables Buckets</a> and <a href="https://meltware.com/2024/12/04/s3-tables.html">Nikhil Benesch&#8217;s analysis</a> made me question that assumption.</p><p>I had been thinking of the IRC as an abstraction over object storage, i.e. the REST Catalog would deal with creating, naming, and finding Iceberg files without you having to think about it.</p><p>With Table Buckets it&#8217;s the converse: you think about S3, but don&#8217;t have to create, manage, or reason about an IRC. This isn&#8217;t necessarily a bad thing for either query engine developers or end users.</p><p>For a query engine developer, you could argue that it&#8217;s easier to integrate with S3 than it is to integrate with a still-evolving OpenAPI spec.
They&#8217;re all already familiar with object storage!</p><p>For end users analytics engineers like us, IRCs can be a hurdle to initial Iceberg adoption because you have to set one up before you can create a single table. S3 Table buckets radically simplifies this, in that they have their own catalog behind the scenes. Not only is this catalog wildly performant like many products out of AWS, it also automatically handles maintenance tasks like file compaction. This approach has already borne fruit imho given there&#8217;s a plethora of Iceberg quickstart tutorials out there now (<a href="https://aws.amazon.com/blogs/storage/connect-snowflake-to-s3-tables-using-the-sagemaker-lakehouse-iceberg-rest-endpoint/">Snowflake</a>, <a href="https://duckdb.org/2025/03/14/preview-amazon-s3-tables.html">DuckDB</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bo9x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F718b1791-62da-4b45-8b20-6f64c8e97ecf_1422x296.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bo9x!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F718b1791-62da-4b45-8b20-6f64c8e97ecf_1422x296.png 424w, https://substackcdn.com/image/fetch/$s_!bo9x!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F718b1791-62da-4b45-8b20-6f64c8e97ecf_1422x296.png 848w, https://substackcdn.com/image/fetch/$s_!bo9x!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F718b1791-62da-4b45-8b20-6f64c8e97ecf_1422x296.png 1272w, 
https://substackcdn.com/image/fetch/$s_!bo9x!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F718b1791-62da-4b45-8b20-6f64c8e97ecf_1422x296.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bo9x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F718b1791-62da-4b45-8b20-6f64c8e97ecf_1422x296.png" width="1422" height="296" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/718b1791-62da-4b45-8b20-6f64c8e97ecf_1422x296.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:296,&quot;width&quot;:1422,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58777,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/160067584?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F718b1791-62da-4b45-8b20-6f64c8e97ecf_1422x296.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bo9x!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F718b1791-62da-4b45-8b20-6f64c8e97ecf_1422x296.png 424w, https://substackcdn.com/image/fetch/$s_!bo9x!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F718b1791-62da-4b45-8b20-6f64c8e97ecf_1422x296.png 848w, https://substackcdn.com/image/fetch/$s_!bo9x!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F718b1791-62da-4b45-8b20-6f64c8e97ecf_1422x296.png 
1272w, https://substackcdn.com/image/fetch/$s_!bo9x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F718b1791-62da-4b45-8b20-6f64c8e97ecf_1422x296.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>I&#8217;m still very wary about asking analytics engineers to think about S3 paths when writing SQL to the extent that I still think it is an anti-pattern. This is why <a href="https://docs.getdbt.com/reference/resource-configs/snowflake-configs#base-location">dbt by default will manage the path for you when materializing a Snowflake-managed Iceberg table</a>. With S3 Table buckets there&#8217;s not a clear notion of a namespace to hierarchically organize tables (think: <code>database.schema.table</code>).</p><p>However, just a few months later AWS S3 announced that Table buckets are available via the IRC API, so S3 Table Buckets have proven not to be opinionated about the API. Perhaps their approach is the correct one in providing both UXes.</p><p>That said, there&#8217;s an opportunity to simplify. While it is only natural that the S3 team would collaborate with the AWS Data Catalog team, the result is a rather disjointed end user experience.</p><p>It should not surprise us that the AWS S3 team wants to bring their expertise to making data lake management easier and cheaper. I&#8217;d count on the team to continue evolving this product over time, so you should keep your eyes peeled as well.</p><h2>IRCs: what specifically do they do?</h2><p>At the risk of oversimplifying, what the IRC does is close some remaining gaps that kept SQL on data lakes from feeling like the SQL you&#8217;d expect.</p><h3><s>Attention</s> Naming is all you need</h3><p>One powerful abstraction of traditional SQL databases: all you need to query a table is its name, and you never have to think about where the table&#8217;s data is stored.
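</p><p>To make this concrete, here&#8217;s a toy sketch in Python (all names are hypothetical; this is an illustration, not a real catalog client) of what the catalog does for you: it maps a three-part name to the table&#8217;s current metadata location, so the path stays an implementation detail.</p>

```python
# Toy illustration of "naming is all you need" (hypothetical names throughout;
# not a real catalog client). The catalog maps a three-part table name to the
# table's current metadata location in object storage.

class ToyCatalog:
    def __init__(self):
        self._tables = {}  # "db.schema.table" -> metadata path

    def register(self, name, metadata_path):
        self._tables[name] = metadata_path

    def resolve(self, name):
        # A query engine asks by name; the object store path never
        # surfaces in the SQL a user writes.
        return self._tables[name]


catalog = ToyCatalog()
catalog.register(
    "my_db.my_schema.my_table",
    "s3://my-data-lake/some/folders/my-table/metadata.json",
)
path = catalog.resolve("my_db.my_schema.my_table")
```

<p>A real IRC does far more than a dictionary lookup, of course, but the user-facing contract is the same: you hand it a name, and it hands the engine what it needs to find the data.</p><p>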
You&#8217;ve likely never even thought about how much easier this makes your life until you don&#8217;t have it anymore. But in data lakes, you often need to know the table&#8217;s path in the object store (e.g. S3) to find its data.</p><p>For example, compare a normal SQL three-part name, <code>my_db.my_schema.my_table</code>, with a data lake object store path, <code>S3://my-data-lake/some/folders/my-table/metadata.json</code>.</p><p>I believe that asking analytics engineers to think about S3 paths when writing SQL is an anti-pattern. This is why <a href="https://docs.getdbt.com/reference/resource-configs/snowflake-configs#base-location">dbt by default will manage the path for you when materializing a Snowflake-managed Iceberg table</a>.</p><h3>sir, were you aware that was a red light you just drove through?</h3><p>The other problem that the IRC solves is more behind-the-scenes. When I run a query in Postgres, I never think:</p><ul><li><p>I hope this file lands on disk successfully</p></li><li><p>I hope no one else is trying to write to this table right now</p></li><li><p>What if someone else deletes the files I'm writing?</p></li></ul><p>We SQL users take this all for granted, but this isn&#8217;t possible with a data lake unless you have a catalog! Postgres and many other DBs play &#8220;traffic cop&#8221; for you so you don&#8217;t have to. The IRC fills this role for you on the lake.</p><h3>one API to rule them all</h3><p>The last problem relates to simplifying how data platforms and query engines integrate with Iceberg. Spark has never had a problem integrating with Iceberg because Iceberg is implemented in Java.
But how do you</p><ul><li><p>integrate the Iceberg Java library if your database is written in Python?</p></li><li><p>read from an Iceberg catalog written in Go with a query engine written in Rust?</p></li></ul><p>The IRC solves this problem by proposing a language-agnostic API and a spec for a backend service that does some work that a query engine developer previously would have had to build. This is great because it lowers the barrier to adoption by reducing the required engineering effort to integrate.</p><h2>What about IRC&#8217;s vended credentials?</h2><p>Once you already have an IRC set up and configured (non-trivial work in its own right), the next step is to give a query engine access to it. To do so, by default the query engine must authenticate to two things in order to be able to read and write to the IRC:</p><ul><li><p>the IRC itself (typically with a personal access token)</p></li><li><p>the object store that has the files associated with the Iceberg table</p></li></ul><p>Not only is this a high-friction set-up, the experience isn&#8217;t very intuitive. For example, in this setup, when you ask the IRC for a particular table that you&#8217;d like to read, it will return to you an object store path for a file that has more info. If you don&#8217;t have access to this file in the object store (e.g. S3), you&#8217;re SoL. That&#8217;s why this pattern also requires that the query engine have direct access to the object store.</p><p>However, it doesn&#8217;t have to be this hard! With Vended Credentials, you only need to authenticate to the IRC, and the IRC will provision you access to the files in the object store.
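</p><p>Here&#8217;s a rough sketch of that handoff in Python (all names are hypothetical; the catalog is mocked, not a real IRC client): the engine presents a single credential to the catalog, and the response carries both the metadata location and short-lived storage credentials scoped to the table.</p>

```python
# Sketch of the vended-credentials flow (hypothetical names; a stand-in for a
# real IRC). The client authenticates to the catalog only; the catalog vends
# temporary object store credentials along with the table metadata.

class FakeRestCatalog:
    """Illustrative stand-in for an Iceberg REST catalog."""

    def load_table(self, name, token):
        if token != "my-irc-token":  # the one credential the engine holds
            raise PermissionError("not authorized to the IRC")
        return {
            "metadata_location": "s3://lake/db/tbl/metadata/v3.metadata.json",
            # short-lived credentials scoped to this table's files
            "storage_credentials": {"access_key": "TEMP-KEY", "expires_in": 3600},
        }


def read_table(catalog, name, irc_token):
    """One auth step: no separate object store credentials needed."""
    response = catalog.load_table(name, token=irc_token)
    return response["metadata_location"], response["storage_credentials"]


metadata_location, creds = read_table(FakeRestCatalog(), "db.tbl", "my-irc-token")
```

<p>Contrast that with the default flow above, where the engine would also need its own object store credentials just to follow the metadata pointer.</p><p>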
This is a much simpler workflow than what I experienced my first time using IRCs over a year ago.</p><p>Vended credentials have been in the Iceberg spec since last June, but only recently have they been supported in platforms like Snowflake, Databricks, and SageMaker Lakehouse, after a number of preview periods.</p><p>One query engine writing directly to an external IRC is also vastly simplified by vended creds. You just connect to their IRC and write the table directly, without ever knowing where the data is stored.</p><p>How great to live in a world where, when another team needs data from you, you never have to connect to their FTP server, Google Drive, or Azure blob storage account to put the data; you just write to their IRC.</p><p>A consequence of vended credentials is that the IRC becomes the critical path for accessing data: you&#8217;ll have to refactor your connection later should you decide to stop using an IRC or select a different one. However, the abstraction is simpler because you only need to tell your query engine about the IRC and not about object storage anymore.</p><p>The bear case for vended credentials is that they introduce a third access model, distinct from the native RBAC of storage (i.e. IAM Policy) and the query engine (think database roles and privileges). However, you can&#8217;t have a catalog without RBAC, and the closer that RBAC lives to the data the better. It doesn&#8217;t make sense that a query engine should have roles for accessing the data, especially in a world where multiple query engines will access it!</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>this post is long enough as is!
but if you&#8217;re mostly up-to-speed, this section might be for you</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>https://xkcd.com/2501/</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>&#8220;Works like a table&#8221; effectively means ACID transactions</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Why the modern data stack matters in the AI age]]></title><description><![CDATA[I'm at the Modern Data Stack. I'm at the Intelligence Explosion. I'm at the combination Modern Data Stack Intelligence Explosion.]]></description><link>https://roundup.getdbt.com/p/why-the-modern-data-stack-matters</link><guid isPermaLink="false">https://roundup.getdbt.com/p/why-the-modern-data-stack-matters</guid><dc:creator><![CDATA[Jason Ganz]]></dc:creator><pubDate>Sun, 23 Mar 2025 14:04:35 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/98967b44-da1d-4e5b-9fe7-9c8bef8b2100_1792x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There&#8217;s a mystery that&#8217;s been rattling around in my head.</p><p>In 2021 through early 2023, the data space and specifically &#8220;The Modern Data Stack&#8221; was arguably the highest energy, most dynamic area of the tech sector, and certainly dominated the discourse.</p><p>In late 2022, the ChatGPT moment happened and all of the oxygen, immediately, became sucked up into AI.</p><p>The mystery on my mind has been - what is the connection here? 
Is it just an accident of history that right before we got AI systems that actually work at scale, we were focusing on centralizing and modeling our data?</p><p>It&#8217;s completely possible that the answer is yes. It&#8217;s felt that way at times, but over the past year and particularly over the past few months, as AI systems move outside of narrow chat windows and become more integrated into our workflows, two things are becoming clear:</p><ol><li><p>Complex AI workflows are going to draw many of the learnings from data engineering</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!He7-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F386f7a9e-5294-40fb-80ed-f42df2c86b7c_1820x652.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!He7-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F386f7a9e-5294-40fb-80ed-f42df2c86b7c_1820x652.png 424w, https://substackcdn.com/image/fetch/$s_!He7-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F386f7a9e-5294-40fb-80ed-f42df2c86b7c_1820x652.png 848w, https://substackcdn.com/image/fetch/$s_!He7-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F386f7a9e-5294-40fb-80ed-f42df2c86b7c_1820x652.png 1272w, https://substackcdn.com/image/fetch/$s_!He7-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F386f7a9e-5294-40fb-80ed-f42df2c86b7c_1820x652.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!He7-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F386f7a9e-5294-40fb-80ed-f42df2c86b7c_1820x652.png" width="1456" height="522" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/386f7a9e-5294-40fb-80ed-f42df2c86b7c_1820x652.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:522,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!He7-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F386f7a9e-5294-40fb-80ed-f42df2c86b7c_1820x652.png 424w, https://substackcdn.com/image/fetch/$s_!He7-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F386f7a9e-5294-40fb-80ed-f42df2c86b7c_1820x652.png 848w, https://substackcdn.com/image/fetch/$s_!He7-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F386f7a9e-5294-40fb-80ed-f42df2c86b7c_1820x652.png 1272w, https://substackcdn.com/image/fetch/$s_!He7-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F386f7a9e-5294-40fb-80ed-f42df2c86b7c_1820x652.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ol start="2"><li><p>LLMs will need access to the data produced by your analytics workflows in order to be truly useful for many use cases</p></li></ol><p>Both of these facts feel relatively obvious now and immediately valuable in the near term, compared to other ideas that felt more &#8230; grasp-at-strawsy like &#8220;data teams will be the keepers of your organizational data to custom fine tune models for your business&#8221;.</p><p>So what changed? 
Why do these feel practical and useful now compared to even a year ago?</p><p>Because we&#8217;re actually starting to roll LLM systems out in the real world - and <em>quickly.</em> Even in their nascent state it&#8217;s clear that this is not hype, that there&#8217;s real value here, today. But it&#8217;s also becoming clear that the problems and lessons that brought us to the modern data stack haven&#8217;t gone away in this brave new world - although the ways those problems are solved and the systems they are being solved within may change dramatically.</p><p>At this point, you are probably familiar with the frustrating experience of going to a new website and being presented with some sort of chatbot interface and not being entirely certain what questions you can ask it. </p><p>The chatbot probably works extremely well for the set of context it has access to. But you probably don&#8217;t know what exactly it has access to, and that underlying guessing game means that what should be (and often are!) incredibly useful interfaces end up feeling slapped on and piecemeal.</p><p>The problem is largely not the models, which have gotten extremely good for most questions you might ask of them. The problem is that they often literally don&#8217;t have the right information to give you the correct answer. </p><p>It does not matter how smart a model is, or how good it is at in-context learning, if the only answer or a path to the answer <em>can&#8217;t be added into its context</em> because it&#8217;s locked somewhere in a single Reddit post from last week, a proprietary document, or your data warehouse. The good news is that we&#8217;re <a href="https://www.anthropic.com/news/model-context-protocol">quickly moving towards</a> a world where that context isn&#8217;t locked up anymore and there are protocols and standards for accessing it. </p><p>But what context should be provided? How do you know it&#8217;s right? Is it accurate?
Who has access to it?</p><p>These are all questions we&#8217;re going to have to learn to answer in our AI system. And it&#8217;s gonna be a doozy.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fwP5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b35d49-5629-4c9a-b361-bcd467ecea94_1726x430.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fwP5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b35d49-5629-4c9a-b361-bcd467ecea94_1726x430.png 424w, https://substackcdn.com/image/fetch/$s_!fwP5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b35d49-5629-4c9a-b361-bcd467ecea94_1726x430.png 848w, https://substackcdn.com/image/fetch/$s_!fwP5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b35d49-5629-4c9a-b361-bcd467ecea94_1726x430.png 1272w, https://substackcdn.com/image/fetch/$s_!fwP5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b35d49-5629-4c9a-b361-bcd467ecea94_1726x430.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fwP5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b35d49-5629-4c9a-b361-bcd467ecea94_1726x430.png" width="1456" height="363" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/97b35d49-5629-4c9a-b361-bcd467ecea94_1726x430.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:363,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:119766,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/159638568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b35d49-5629-4c9a-b361-bcd467ecea94_1726x430.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fwP5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b35d49-5629-4c9a-b361-bcd467ecea94_1726x430.png 424w, https://substackcdn.com/image/fetch/$s_!fwP5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b35d49-5629-4c9a-b361-bcd467ecea94_1726x430.png 848w, https://substackcdn.com/image/fetch/$s_!fwP5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b35d49-5629-4c9a-b361-bcd467ecea94_1726x430.png 1272w, https://substackcdn.com/image/fetch/$s_!fwP5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b35d49-5629-4c9a-b361-bcd467ecea94_1726x430.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><h3>It is extremely non-trivial to feed the right context to LLMs at the right time</h3><p>Probably the most interesting thing about adding new knowledge sources to LLM workflows is 
the way that LLMs magnify both the best and worst aspects of working with a particular system, almost to a fault. The time to answers is quicker, you can pull threads together - you&#8217;re always feeling movement. But any cracks in the system you&#8217;re using to feed the model context become immediately obvious.</p><p>We talk about &#8220;context&#8221; like it&#8217;s a monolith, but the underlying context we&#8217;re feeding is ultimately going to look something like &#8220;all of the mechanisms that humans have created for storing and conveying information&#8221;.</p><p>Let&#8217;s look at how this is going in practice - the good and the challenges:</p><p><strong>LLMs + Internet search: </strong></p><ul><li><p>What it is: Integrate public web data into LLM queries</p></li><li><p>Why it&#8217;s great: This massively broadens the ability of LLMs to pull in information. This is really useful when you require specific, granular information pulled from the real world (I like to use this for finding restaurants).</p></li><li><p>The challenges: SEO-bait works extremely well on LLMs. Under the hood, it&#8217;s still performing some sort of traditional web search - the same tools and tactics that people use to climb to the top of Google searches work on an LLM. Try an experiment right now - go do a Google search for &#8220;Best Pillow 2025&#8221;. Do you have any reasonable way of breaking down the answers and getting to ground truth?<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> If it&#8217;s hard for you, it&#8217;s going to be hard for the LLM.</p></li></ul><p><strong>LLMs + Internal company documentation: </strong></p><ul><li><p>What it is: Search over internal, unstructured data in tools like Notion and Slack</p></li><li><p>Why it&#8217;s great: I want to be incredibly clear - I <strong>love</strong> NotionAI. 
It is the single closest I&#8217;ve ever felt to being able to fully wrap my head around a complex organization of hundreds of people and be able to learn what other teams are working on. It fundamentally broadens my aperture in knowing what is going on at dbt Labs and why - from answering quick policy questions to making sure I can keep track of the latest company objectives.</p></li><li><p>The challenges: Unless you have incredibly strong document hygiene, you&#8217;re going to find messy and conflicting information. A simple question like &#8220;when is Coalesce 2025&#8221; can sometimes end up with 3 results - maybe we were initially thinking a different date and that date still lives in a document somewhere. Maybe someone just accidentally typed in the wrong date and left it there. The models need, as part of their context, not just all of your documentation, but signals as to which documentation is up to date, correct, and organizationally approved. </p></li></ul><p><strong>LLMs + Metrics and structured data: </strong></p><ul><li><p>What it is: All of the data that lives in your data platforms, your key metrics, customer information, business entities</p></li><li><p>Why it&#8217;s great: I don&#8217;t want to be too dramatic here, but being able to analyze a complex dataset using conversational analytics on top of a <a href="https://www.getdbt.com/blog/introducing-dbt-for-snowflake">trustworthy interface</a> feels like magic. Especially for data that you know well, it honestly feels like you are getting superpowers, with the answer to any question you might have available at your fingertips.</p></li><li><p>The challenges: What could possibly go wrong giving an LLM access to your data warehouse? Text-to-SQL is good and getting better, and I have no doubt that we will get to the point where, given a well-structured problem and sufficient context about the underlying data, LLMs are going to be very successful at getting an answer that is reasonable and correct. 
But will it:</p><ul><li><p>Be consistent across an organization? Will it be a single source of truth, based on vetted and well-understood business concepts? Or will it be a very clever, vibe-coded 1,200-line SQL script that no one is realistically ever going to read?</p></li><li><p>Can you put it in the hands of your executives and have them <strong>trust the output?</strong> Not just as interesting anecdata - but as something that they can make decisions and take action off of?</p></li><li><p>Is it going to be governed? Will it know who the end user behind the query is, what data they should have access to and what they shouldn&#8217;t?</p></li></ul></li></ul><p>Each of these information sources adds impressive and interesting capabilities to the underlying power of LLMs. When combined in a single interface, there is a combinatorial explosion of usefulness<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> as the different information sources each unlock new capabilities.</p><p>Imagine you are the CEO of a restaurant chain planning whether to expand into a new territory. With access to these three information sources, you could prepare a deep-research-style report that:</p><ul><li><p>Understands the business context and strategy for a potential expansion based on your internal company documentation</p></li><li><p>Has the ability to query and understand the actual data and financial metrics available to you via your data platform</p></li><li><p>Searches the web for macro-level data and benchmarks about the location you&#8217;re planning to expand into</p></li></ul><p>Sounds incredible, right? This is totally doable today - although it requires a bit of duct-taping systems together to make it work.</p><p>But it also sounds &#8230; daunting. Because each of the failure modes described above can cascade throughout this system. 
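</p><p>To make the governance question above concrete: one cheap guardrail is to check, before a generated query ever runs, that the person behind the request is allowed to see every table it touches. Here is a deliberately naive sketch - the regex-based table extraction and the hard-coded per-user allow-lists are stand-ins for a real SQL parser and your warehouse&#8217;s actual grants:</p><pre><code class="language-python">import re

# Hypothetical per-user allow-lists. In a real system these would come
# from the warehouse's own grants or an access-control service.
ACCESS = {
    "ceo": {"finance.revenue", "ops.locations"},
    "analyst": {"ops.locations"},
}

def referenced_tables(sql):
    # Naive extraction: grab schema.table names that follow FROM or JOIN.
    return set(re.findall(r"(?:from|join)\s+([a-z_]+\.[a-z_]+)", sql, re.I))

def authorize(user, sql):
    # The generated query may run only if every table it references
    # is on the requesting user's allow-list.
    return referenced_tables(sql).issubset(ACCESS.get(user, set()))

query = ("select region, sum(amount) from finance.revenue "
         "join ops.locations on revenue.loc_id = locations.id "
         "group by region")
assert authorize("ceo", query)
assert not authorize("analyst", query)</code></pre><p>A check like this is table stakes rather than a solution, but it illustrates the point: the context an LLM needs includes who is asking, not just what the data means.</p><p>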
What if the model YOLOs a revenue definition that&#8217;s reasonable, but based on information from an out-of-date company document? What if it pulls benchmarked financial data from the web that turns out to be wrong? The real world contains a whole lot of complexity, and our systems need to be designed in a way that captures the benefits here while having the appropriate guardrails to establish some sort of ground truth. </p><h3>So why now?</h3><p>I want to return now to the question that I posed at the start of this - why were we all convinced that building systems for centralizing and managing your company data at scale was the right problem to solve directly before LLMs started to soar?</p><p>The answer lies in recognizing that what seemed like an accident of timing was actually a foundation being laid. The Modern Data Stack is not just about better dashboards&#8212;it is about creating standardized and reliable workflows and interfaces across your entire data ecosystem that can power increasingly sophisticated use cases, at scale. It turns out this is just as necessary for AI as it is for humans.</p><p>We built the modern data stack to address fragmentation, improve data governance, and ensure consistent, reliable data. What we didn&#8217;t fully realize at the time was that this was an essential piece of context for LLM applications as well.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><p>We finally have both the data foundations and the AI models to make genuinely useful, reliable AI-driven and data-enriched workflows possible. 
The explosion of interest in AI didn&#8217;t displace the need for the modern data stack&#8212;it just took some time for these systems to begin to speak to each other.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><p>Moving forward, we&#8217;re going to increasingly see the role of the data practitioner as providing governed, trustworthy data for LLMs and, potentially, as using many of those same systems to enable safe, reliable deployment of AI systems.</p><p>I feel huge opportunity in this area. I also feel a lot of responsibility for us, as a community, to get this right. These systems are being experimented with, and in some cases deployed <strong>today</strong>, and there is real institutional heft behind their rollout. We have the tools and the (forgive the pun) agency to make an impact on how that happens.</p><p>There is a tremendous amount of brainpower that reads this newsletter - people orchestrating the most complex data flows at the largest organizations on the planet and building the tooling that will enable it. There are a lot of unknown unknowns in terms of how we build the bridge from LLMs to our structured data, but I believe we&#8217;ve got the right set of humans in place here to begin meaningfully answering this question.</p><p>Let&#8217;s get after it. Want to talk about any of this? 
<a href="https://www.getdbt.com/community/join-the-community">You know where to find me</a>.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Ok, for this one ground truth is &#8220;it&#8217;s a pillow, stop thinking about it so hard&#8221; but you get the point</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>And danger!</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Actually <a href="https://roundup.getdbt.com/p/analytics-intelligence-everywhere">we did realize it</a>, it just took some time to connect the threads</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>After we ran our first experiment showing the value of a Semantic Layer in natural language questioning - Benn Stancil raised the question of whether we were likely to get &#8220;<a href="https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf">bitter lessoned</a>&#8221; as models improved and cut out the need for intermediary interfaces. It&#8217;s an important question and one we&#8217;ll dive into more in the future. But even systems of arbitrary intelligence need tools! It doesn&#8217;t matter how smart an LLM is, if you want an LLM to unload your dishwasher, you&#8217;re going to have to put in a robot. Will the same hold true for the three methods of gathering information listed above? 
Time will tell, but signs point to yes in the near term.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[How AI will Disrupt Data Engineering As We Know It]]></title><description><![CDATA[It will be hard to compare data engineering in 2024 and data engineering in 2028 and say &#8220;those are the same things.&#8221;]]></description><link>https://roundup.getdbt.com/p/how-ai-will-disrupt-data-engineering</link><guid isPermaLink="false">https://roundup.getdbt.com/p/how-ai-will-disrupt-data-engineering</guid><dc:creator><![CDATA[Tristan Handy]]></dc:creator><pubDate>Sun, 16 Mar 2025 11:02:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!j9HW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j9HW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j9HW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!j9HW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp 848w, 
https://substackcdn.com/image/fetch/$s_!j9HW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!j9HW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j9HW!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp" width="1200" height="685.7142857142857" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:832,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:661200,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://roundup.getdbt.com/i/159149000?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j9HW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp 424w, 
https://substackcdn.com/image/fetch/$s_!j9HW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!j9HW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!j9HW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a805f3-2215-4c7a-a4b9-9b0b4aea3642_1792x1024.webp 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line 
x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Credit: Dall-E</figcaption></figure></div><p><a href="https://roundup.getdbt.com/p/a-year-of-innovation-in-ai-part-1">Last time I wrote</a> I dove into a bunch of AI advancements that have happened over the past year. Reasoning models, chain of thought, inference-time compute, etc. And there&#8217;s more to explore there and I need to return to that series.</p><p>But for this week&#8217;s issue I want to pause on that and talk about AI from a different perspective. I want to think, as rationally as we can about an uncertain future, about how the job of the data engineer will change over the coming 1-2-3 years as a result of AI.</p><p>I am quite confident these changes will be massive. I think the word <em>disrupt</em> is not at all hyperbole&#8212;<strong>I think it will be hard to compare data engineering in 2024 and data engineering in 2028 and say &#8220;those are the same things.&#8221;</strong></p><p>It just turns out that many of the tasks that data engineers do every day are tasks that AI can provide tremendous leverage in. I don&#8217;t know what the % efficiency metric will be&#8212;20%? 50%? 80%?&#8212;but I think it&#8217;s totally possible that it&#8217;s on the higher end of that range.</p><p>I think that will be both good for data engineers and good for the companies they work for. Data engineers will have more work to do than ever (<a href="https://en.wikipedia.org/wiki/Jevons_paradox">Jevons paradox</a> at work), but it will be more strategic, add more value to companies, and will likely see them get raises. 
Companies will get the higher-functioning, higher-ROI, more accessible data systems that have always seemed out of reach.</p><p>In this post I want to look at the specific tasks that data engineers spend their time on, and at how addressable-or-not these tasks are with AI.</p><p>Let&#8217;s dive in.</p><h2>The Tasks of a Data Engineer</h2><p>AI doesn&#8217;t replace jobs; it automates tasks. So let&#8217;s look at the tasks that someone leveled as a Senior Data Engineer most commonly spends time on today. Of course, it should go without saying, YMMV: there is no single canonical job description for a data engineer. But I think we can still get close enough to reason about.</p><p><strong>What does a Senior Data Engineer spend their time on?</strong></p><ul><li><p>Create technical artifacts</p><ul><li><p>Landing new data. Building and maintaining automated data ingestion pipelines.</p></li><li><p>Transforming raw data into bronze, then silver, then gold layers. Includes authoring brand-new pipelines as well as refactoring existing pipelines to handle new business requirements.</p></li><li><p>Defining metrics on top of transformed data.</p></li><li><p>Writing tests and documentation.</p></li><li><p>Monitoring costs of data infrastructure and refactoring code to optimize performance characteristics.</p></li><li><p>Reviewing pull requests from peers.</p></li><li><p>Monitoring production jobs and declaring incidents related to either pipeline failures or observability / quality issues. Investigating and resolving those incidents.</p></li></ul></li><li><p>Liaise with stakeholders and peers</p><ul><li><p>Answering questions about currently-available data assets like &#8220;which data set should I use?&#8221; and &#8220;can I trust this?&#8221;</p></li><li><p>Collaboratively designing changes to existing data assets to accommodate new requirements. 
Conversations like &#8220;what are the edge cases I need to know about when calculating cost of goods sold?&#8221;.</p></li><li><p>Stakeholder enablement and education.</p></li><li><p>Designing the overall architecture of the DAG, including modularization, team boundaries and ownership, modeling best practices, etc.</p></li></ul></li></ul><p>I&#8217;m sure you could find some other things to put on these lists, but I feel like they&#8217;re pretty representative. Feel free to tell me what I&#8217;m forgetting.</p><h2>The Role of Frameworks and Tooling in an AI-centric World</h2><p>Many of the above tasks are already doable with AI. And I want to talk more about that. But before I get there, it&#8217;s important to talk about frameworks, and how important frameworks are to an AI-centric world.</p><p>Claude 3.7 will write you almost any kind of code you could want. You can absolutely build a pipeline from the ground up, building ingestion, transformation, testing, etc. in Lisp. In Assembly. In the style of Guido van Rossum. Whatever. You could even imagine a world in which you had 1,000 distinct pipelines and every one was written in a different language or framework or set of conventions. All reading from and writing to a shared corpus of tabular data.</p><p>But: just because it is now conceivable to create such a codebase, <em>is it a good idea?</em></p><p>The answer is: no. 
<strong>Obviously not.</strong> Just as a team of humans would find it nearly impossible to maintain such a Frankenstein, the heterogeneity would make it intractable for LLMs as well.</p><p>This <em>intuition pump</em> is helpful to get us to an important conclusion: AI will be more effective as an accelerant when:</p><ul><li><p>a code base is fewer lines of code (less room for error)</p></li><li><p>a code base is more consistent rather than less consistent: in languages, in coding conventions, in design</p></li><li><p>a code base uses consistent CI/CD and other developer tooling</p></li><li><p>a code base uses consistent and well-documented logging / observability</p></li><li><p>a code base uses well-documented best practices also employed by a large community of users.</p></li></ul><p>In general: code bases that are more concise, more homogeneous, and built on standard tools that are well-documented in the model training data (i.e. the public internet) will be more comprehensible to AI systems.</p><p>One of the best ways to make all of these things true at the same time is to use frameworks and open standards. Claude 3.7 knows how to reliably build Airbyte ingestion pipelines because the framework is well documented and there are a lot of examples published. It&#8217;s also fantastic at writing dbt code for the same reasons. If you&#8217;re able to give it an environment where it can test its own code and validate downstream models as a part of its CoT, code quality goes up even further. Standardized frameworks also emit well-understood error messages, which pushes code quality up further still.</p><p>In short: good frameworks, tooling, and standards are <em>just as important</em> for AI as they are for humans. And the wonderful thing about AI is: it is infinitely adaptable to whatever frameworks, tooling, and standards you want to use. No learning curves. 
Finally the promise of a consistent code base.</p><h2>How many of these tasks are already doable?</h2><p>Got it, frameworks are powerful in an AI world. Now let&#8217;s look at the individual tasks that data engineers spend time on and try to figure out how tractable they are.</p><p>In answering this question I am <em>not</em> going to assume massive improvements in model capability. Even with modest improvements I believe all of this will become true. What is fundamentally needed is productization of currently-available models directed at the specific needs of data engineers, not the invention of new frontier tech.</p><h3>Creating Technical Artifacts</h3><ul><li><p><strong>Ingestion pipelines</strong> With nothing but Cursor you can already <a href="https://en.wikipedia.org/wiki/Vibe_coding">vibe code</a> your way to a working ingestion pipeline from basically any data source with a publicly-available API. You can already add pagination and solve edge cases and inject instrumentation. It&#8217;s unclear, though, if this is actually what is needed. I still fundamentally don&#8217;t think most data movement code should be written and maintained within the walls of an individual company&#8212;AI or no, I still want to hire a vendor or support a community project. Data engineers shouldn&#8217;t be spending a lot of time on this problem today and likely shouldn&#8217;t be in the future either. When a custom build is required, AI can already do it well; try it yourself in Cursor today.</p></li><li><p><strong>Authoring new data transformation assets</strong> If you&#8217;re using dbt, data transformation is very soon to become <em>heavily</em> AI-enabled. Whether you&#8217;re building models, writing documentation and tests, or defining metrics, this is coming to you <em>very soon</em>. 
We demoed some of these capabilities at Coalesce and will have more to share on Wednesday at our <a href="https://www.getdbt.com/resources/webinars/dbt-developer-day">dbt Developer Day</a>. While we are certainly still in the early stages of where we ultimately want to get to, dbt Copilot is already <em>very</em> good at all of these authoring tasks, and there is a very clear path to getting even better. Nick Schrock, in one of his best posts ever, called dbt and tools like it <a href="https://dagster.io/blog/the-rise-of-medium-code">medium-code frameworks</a>. It turns out that medium-code frameworks are extremely well-suited for AI. Having personally used dbt Copilot, I anticipate that the time required to author new transformation code will drop very significantly for data engineers.</p></li><li><p><strong>Multi-file refactoring</strong> One thing that Cursor now does super-well is stage multi-file edits as a result of a single prompt. You could imagine a similar prompt in dbt: &#8220;refactor code in these two parts of the DAG to minimize duplication; combine models where appropriate.&#8221; Or: &#8220;A new field was added in this data source. Please pull that field all the way through the DAG into [X] final model.&#8221; These types of refactoring tasks are low-creativity but highly time-intensive. Implementing them is product work, not research. The opportunity to get a handle on tech debt with tooling like this makes me giddy.</p></li><li><p><strong>Automated incident resolution</strong> Imagine feeding the entire log output of a <code>dbt run</code> and the associated project code into a context window and getting back a diagnosis and proposed resolution. While we haven&#8217;t productized this experience yet, it&#8217;s not hard to experiment with this yourself hackathon-style. Imagine a world in which, following a pipeline failure, a full PR was queued up and run through CI, with a full report waiting for you and just ready to hit the merge button. 
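</p><p>None of this requires frontier models to prototype. A hackathon-style sketch of the first step - scraping the failure out of a <code>dbt run</code> log and assembling a diagnosis prompt - might look like the code below. Everything here is invented for illustration: the log lines, the model names, and the idea of keying on &#8220;Error in model&#8221; are assumptions, not the real dbt log contract:</p><pre><code class="language-python">def build_diagnosis_prompt(run_log, model_sql):
    # Pull out the lines that report a failing model (format assumed
    # for illustration; real dbt logs carry more structure than this).
    errors = [ln for ln in run_log.splitlines() if "Error in model" in ln]
    # Attach the source of each model named in an error line, so the
    # LLM can propose a concrete fix rather than restate the error.
    failing = [m for m in model_sql if any(m in ln for ln in errors)]
    sources = "\n\n".join(f"-- {m}.sql\n{model_sql[m]}" for m in failing)
    return ("A dbt run failed. Diagnose the failure and propose a fix.\n\n"
            "Errors:\n" + "\n".join(errors) +
            "\n\nRelevant models:\n" + sources)

log = """2 of 2 ERROR creating table model analytics.fct_revenue
Database Error in model fct_revenue (models/marts/fct_revenue.sql)"""
prompt = build_diagnosis_prompt(log, {
    "stg_orders": "select * from raw.orders",
    "fct_revenue": "select sum(order_totl) from stg_orders",
})
# `prompt` now holds the error plus only the failing model's SQL, ready
# to hand to whatever model and PR automation you have wired up.</code></pre><p>The productized version is mostly plumbing around exactly this loop: collect the failing context, ask for a fix, open the PR, run CI.</p><p>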
We should anticipate this type of experience for data engineers in the not-too-distant future. How much time are you currently spending on break/fix? Slash it significantly.</p></li></ul><p>I&#8217;m going to pause there because I&#8217;m at risk of boring you. Suffice it to say that I truly believe that a) much data engineering work has already been framework-ized, and b) AI will now make creation of, iteration on, and maintenance of these technical artifacts <em>far more efficient.</em> And for the aspects of data engineering that are not yet framework-ized (dbt or otherwise), there will be tremendous gravity towards pulling them into a framework because of the leverage that these types of high-quality AI experiences will provide.</p><h3>Liaising with stakeholders and peers</h3><p>There are countless people throughout the business who use data as a core part of their jobs, and data engineers are <em>constantly</em> fielding questions from them. I won&#8217;t re-list them all here, but if you&#8217;re a data engineer you know the drill. Forever, the hope of &#8220;self-service&#8221; has been that these data users would not need to lean on data engineers in this way&#8212;these interactions inject friction and slowness that neither side wants.</p><p>This fully actualized self-service has never materialized, and the status quo has been frustratingly persistent. But I&#8217;m optimistic that we have more of a path today than ever.</p><p>The easiest thing for any technology vendor to do at the very onset of the AI era was to take all of the domain-specific context that you had and surface it to users in a chat interface. And we did the same thing. It was (and is) quite good&#8212;it does a great job of allowing users to ask business questions and answering them with semantic-layer-governed responses.</p><p>The problem with this approach is that users don&#8217;t actually want to interact with dozens of chat interfaces. 
They don&#8217;t want to remember to go to a given tool to get one type of answer and another tool for another type of answer. There will not be 30 chat experiences all with different context. There will be one&#8230;or maybe just a few. But likely a single dominant one.</p><p>This is how <a href="https://stratechery.com/aggregation-theory/">aggregators</a> work. You likely don&#8217;t use a bunch of different search engines&#8212;you probably just use one, and it is probably Google. This is how chat will go as well.</p><p>The problem is, Google could scrape the web and respond to all queries based on that knowledge. But ChatGPT cannot know all of the information you want to ask it questions about (at least, yet). That lack of business context is the problem.</p><p>That&#8217;s where a <em>context protocol</em> comes in. A context protocol&#8212;a somewhat new topic in the public AI conversation&#8212;is a standardized way for services to provide additional context to models via an open protocol. The most promising one today is called <a href="https://modelcontextprotocol.io/introduction">MCP</a>, but whether or not MCP wins, the awareness/excitement/support for this idea has developed a ton of momentum, and I am fairly convinced that <em>something like this</em> will become real and widely supported.</p><p>There will be a large number of context providers (every source of valuable enterprise context) and a large number of context consumers (different products with AI capabilities). There is no way to create point-to-point integrations to facilitate this. A protocol will be needed if we are going to see the right type of advancements, and I think it will happen.</p><p>Imagine that your license to ChatGPT Enterprise or Claude Desktop or whatever <em>already came with</em> a connection to all of the metadata about every piece of structured data you had access to. What was there, how trustworthy it was, how suitable it was for the analysis you were describing, etc. 
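</p><p>Mechanically, a context provider in this mold is not exotic. MCP, for example, is built on JSON-RPC-style requests and responses. The toy dispatcher below mimics that shape to serve dataset trust metadata - the method names echo the public MCP spec, but treat every detail here (the catalog, the tool name, the fields) as illustrative rather than as the real protocol:</p><pre><code class="language-python"># A toy, MCP-flavored "context provider": a JSON-RPC-style dispatcher
# serving trust metadata about datasets. Illustrative only; a real MCP
# server also handles initialization, transports, schemas, and auth.
CATALOG = {
    "analytics.fct_revenue": {"owner": "data-eng", "certified": True},
    "scratch.rev_v2_tmp": {"owner": "unknown", "certified": False},
}

def handle(request):
    method = request["method"]
    params = request.get("params", {})
    if method == "tools/list":
        result = {"tools": [{"name": "describe_dataset",
                             "description": "Return trust metadata for a dataset"}]}
    elif method == "tools/call" and params.get("name") == "describe_dataset":
        dataset = params["arguments"]["dataset"]
        result = CATALOG.get(dataset, {"error": "unknown dataset"})
    else:
        result = {"error": "unsupported method " + method}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}

response = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
                   "params": {"name": "describe_dataset",
                              "arguments": {"dataset": "analytics.fct_revenue"}}})
assert response["result"]["certified"] is True</code></pre><p>The particular shape matters less than the consequence: once a protocol like this wins, trust signals - ownership, certification, freshness - become context that any chat surface can consume on your behalf.</p><p>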
I think that, very quickly, you would find yourself asking questions of your friendly AI rather than shoulder-tapping your colleague in data engineering.</p><p>That&#8217;s not to say that the existing relationship would <em>go away</em>, but I do think that this would represent a true reset of the working relationship between data engineers and downstream business stakeholders&#8212;one that both sides would benefit from.</p><h2>Where does that leave us?</h2><p>Over the past two years, critical innovations have been made in foundational AI technology. Chain of thought, reasoning models, inference-time compute, agentic workflows. These are the ingredients needed to build the AI-enabled data engineering future. They are now here.</p><p>And open frameworks&#8212;from dbt to Spark to Airbyte to others&#8212;have become widely deployed. This makes it possible to create great framework-specific AI tooling, both by the commercial stewards of those frameworks (including us) and by any other vendor.</p><p>The commercial incentive to innovate here is high, and there could not be more attention on delivering these types of benefits within companies of all sizes. This is going to happen, and data engineering as a profession is never going to be the same.</p><p>So what? Time to get a new job? Data engineers are obsolete?</p><p>Hardly. Data engineers, one of the hottest jobs of the last decade, will stay hot. But practitioners will be pushed in one of three directions: towards the business domain, towards automation, or towards the underlying data platform.</p><ul><li><p><strong>Data Platform Engineers</strong> will become ever more important. They don&#8217;t spend their time building pipelines, but rather on the infrastructure that pipelines are built on. 
They are responsible for performance, quality, governance, uptime.</p></li><li><p><strong>Automation Engineers</strong> will sit side-by-side with data teams, taking the insights coming out of the data and building business automations around them. As a data leader recently told me: &#8220;I&#8217;m no longer in the business of insights. I&#8217;m in the business of creating action.&#8221;</p></li><li><p><strong>Data Engineers</strong> who are primarily obsessed with business outcomes will have ample opportunity to act as enablement and support for the insight-generation process, from owning and supporting datasets to liaising with stakeholders. The value to the business won&#8217;t change, but the way the job is done will.</p></li></ul><p>You&#8217;ll hear a lot more from us <a href="https://www.getdbt.com/resources/webinars/dbt-developer-day">on Wednesday</a> about how we&#8217;re making this future a reality for dbt users. I&#8217;m excited to disrupt the decade-long status quo and build something better.</p><p>- Tristan</p>]]></content:encoded></item></channel></rss>