I did something wrong.
I try really hard and go into every conference with an open mind about what I’m going to learn. Tabula rasa. Blank slate. Beginner’s mind. This is a really important part of being able to continually grow and develop your analysis of the industry rather than getting stuck in familiar mental grooves.
But for this year’s Data Council, I have to admit I went in with a preconceived take on the newsletter I wanted to be sending out today.
“I’ve been to a whole lot of data conferences that talk about the intersection of data and generative AI”, I’d write triumphantly, “but this was the first one I’ve been to where data and AI felt truly integrated, where the worlds finally converged”.
And you know what? It was true. You couldn’t throw a stone in the convention hall without hitting a booth for AI-assisted data development or for putting your data to work in agent systems.
GenAI applications, after all, aren’t just running on models trained on massive datasets built and maintained with many of the tools and open source libraries created by the people and organizations at Data Council. Their usage and utility also depends on strong infrastructure, as Martin has told us.
We saw a lot of very cool data + AI infrastructure at Data Council!
- Bauplan, fresh off their recent fundraise, walked us through the minimum viable data platform.
- The Snowflake booth showed how Cortex Agents can sit in your database and perform useful work.
- Lloyd Tabb gave a great walkthrough of Malloy and repeatedly emphasized the benefits of writing LLM-based analytics queries against a semantic layer rather than going straight to SQL.
- Jacob ran a session on vibe-coding your data engineering workflows.
- MCP was the talk of the town, with notable MCP servers being discussed by ClickHouse, MotherDuck, and yours truly.
- And then of course we had Elias discussing SDF + dbt and walking through a new piece of data infrastructure that I believe will play a significant role in the story of how data and generative AI fit together: the new dbt engine - Rust-based, type-aware, and ready to validate that your SQL queries are dialect-accurate and governed, whether they are written by a human or a machine.
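Lloyd’s semantic-layer point is worth making concrete. The intuition is that a raw-SQL interface will accept anything an LLM hallucinates, while a semantic layer gives the model a governed contract to query against. Here’s a minimal, purely hypothetical sketch of that idea in Python - none of these names reflect Malloy’s or dbt’s actual APIs:

```python
# Hypothetical sketch: validating an LLM-proposed analytics query against
# a semantic layer's contract, instead of accepting arbitrary SQL.
# All measure/dimension names are illustrative.

ALLOWED_MEASURES = {"revenue", "order_count"}
ALLOWED_DIMENSIONS = {"region", "order_month"}

def validate_semantic_query(query: dict) -> list[str]:
    """Return a list of contract violations in an LLM-proposed query.

    A raw-SQL interface would accept any string; here the model can only
    combine measures and dimensions the semantic layer has defined.
    """
    errors = []
    for m in query.get("measures", []):
        if m not in ALLOWED_MEASURES:
            errors.append(f"unknown measure: {m}")
    for d in query.get("dimensions", []):
        if d not in ALLOWED_DIMENSIONS:
            errors.append(f"unknown dimension: {d}")
    return errors

# A hallucinated column is caught before any SQL is ever generated:
bad = {"measures": ["revenue", "profit_margin"], "dimensions": ["region"]}
good = {"measures": ["revenue"], "dimensions": ["order_month"]}
print(validate_semantic_query(bad))   # ['unknown measure: profit_margin']
print(validate_semantic_query(good))  # []
```

The design choice this illustrates: push governance upstream of query generation, so a machine-written query fails loudly at the contract boundary rather than silently returning wrong numbers.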
So in a certain sense, I am walking away from this Data Council feeling like the worlds of generative AI and traditional data infra are closer together than ever.
But in another, deeper sense, I’m not.
A familiar kind of weird and a new kind of weird
Three years ago, in his reflections on Data Council, Drew had one request: “Keep Data Council Weird”. At the time, we were wondering if the ecosystem was becoming too vendor- and VC-driven and hoping that we’d still maintain our spunky outsider energy.
Well, I have to be honest with you, this Data Council felt pretty darn weird.
Partly, it felt weird in a familiar way. I asked Drew if this year felt weird and here’s what he told me:
The venue - a masonic temple - was gorgeous and unlike any conference venue I’ve been to before. My legs hurt from walking up and down 4 flights of carpeted stairs. I watched Elias’s talk from a parapet (is that even the word?)1 in a column adorned theater. I think I saw a crucifix. The bathrooms had couches in them. Scott B and I talked about our skincare routines. I saw a lot of old friends and former coworkers. I befriended [redacted]. My beef with [redacted] grew even deeper. I had a top 3 all-time cheeseburger and a bottom 3 all-time dessert (Mango Piggy). Pete and the whole Data Council team put on one hell of an event this year!
If you’ve been around the block enough times, this is a familiar kind of weirdness. Comforting.
It also felt weird in a different way though:
Because fundamentally, even though data infra + AI are moving ever closer together, there are big differences in how each side moves and progresses.
The reason boils down to this:
Data Infra is heavily engineered, based on building well-understood systems and standards.
It moves at the speed of ecosystems and standards. Three years ago at Data Council I’m sure there were people talking about Apache Iceberg and wondering whether it would be adopted across the industry. We’re big believers in Iceberg at dbt Labs and I expect to see strong and meaningful adoption of Iceberg over the next three years. I think an 80th percentile good outcome for Iceberg adoption looks like a world where organizations are not meaningfully constrained by their choice of data platform and are able to use Iceberg to avoid vendor lock-in and have true cross-platform control of how they operate on their data.
Generative AI is built differently, and it moves at a different speed.
The folks at Anthropic like to say that LLMs are grown, not built. Three years ago when Drew said that we should keep Data Council Weird, we were about 9 months out from the release of ChatGPT, and a year away from GPT-4.
Since then, the price of a query to GPT-4 has fallen by somewhere around 100x. OpenAI is projecting $125 billion in revenue by 2029. The latest paradigm shift, reasoning models, is around six months old.
I don’t know what an 80th percentile “good” (meaning fast) outcome looks like here, but there are people a lot closer to this than me saying we’re going to be deploying bio-engineered algae nanobots to fuel the data centers running recursively self-improving AI by the time we hit the Data Council three years from now.
That, to me, is pretty weird.
The weirdness of two worlds, closer than ever before but apparently moving at blindingly different speeds.
The weirdness of sitting in a talk and getting legitimately excited by the idea that we as an ecosystem can robustly adopt the nearly decade-old Apache Arrow, then going into the hall to talk to someone who had just walked out of a talk on Bryan’s Foundation Models track and was wondering to what extent two-year-old LLM-based coding workflows will change whether any of these questions are still relevant.
So what do we do with this?
Look, maybe one day soon, we’ll pinch ourselves, bolt awake and think “man that whole AI thing was crazy”. I’ll look back on this newsletter, cringe a bit about my prognostication and sheepishly admit that maybe I got carried away by drawing out lines on a curve. God knows it’s happened before.
But … maybe not. And in that world, what relevance does data infra have?
I think it means that all of this matters a lot - even more so in this world. It means that pretty soon, the data systems and data infrastructure we build are going to be powering a whole lot of systems that interface more directly with the world than we are used to.
Because my prewritten take about data systems and AI workflows becoming increasingly intertwined and dependent on each other was right. And now we need to figure out how to make engineered data infrastructure that moves at human speed support LLMs that look like they are moving much faster and are still fundamentally mysterious to us.
The real world, and the data we represent it with, has a lot of complexity. And if we’re about to have AI systems that are 100x cheaper and 100x more powerful than what we have today operating on the tools, systems, and standards we build, then they’d better be really good.
I don’t have an exact answer to how we should approach this. I don’t think anyone does.
I do know that I’m looking forward to next year’s Data Council and the one after that and the one after that too. I’m hoping that alongside the new weirdness, we keep the familiar weirdness and that we all continue to share our knowledge, our expertise and perhaps most importantly our mango piggies.
Appendix
As I was writing this, the ever-thoughtful Benn Stancil released a post touching heavily on MCP and the dbt MCP.
As with basically everything Benn writes, it’s worth your time. The post probably deserves a full response, so I’ll save commentary for another day, but I recommend you check it out.
The analytics engineering roundup is sponsored by dbt Labs.
If you want to see what the big kerfuffle about dbt + SDF is all about, plus a whole lot more, join Elias and the dbt team for our Cloud Launch Showcase on 5/28 (parapet not included).
Editor’s note: That is not the word