It is time to take agentic workflows for data work seriously
Mission: 0 to semantic layer in two hours
Last week, I cleared two hours on my calendar to do a deep dive into the current state of agentic development for data work.
Specifically, I gave myself a challenge - could I go from a never-before-seen dataset to a production-ready Semantic Layer using a combination of tools:
An agentic coding CLI (I used Claude Code for this experiment)
The dbt MCP server (a rough configuration sketch follows this list)
A terminal interface (in this case Warp)
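If you want to try the same setup, the rough shape of the wiring is below. This is illustrative, not a verified config: Claude Code can pick up project-scoped MCP servers from a .mcp.json file, and the dbt MCP server ships as the dbt-mcp package, but the exact environment variables depend on whether you're pointing it at a local project, dbt Cloud, or both - treat every value here as a placeholder and check the dbt MCP docs for the specifics.

```json
{
  "mcpServers": {
    "dbt": {
      "command": "uvx",
      "args": ["dbt-mcp"],
      "env": {
        "DBT_PROJECT_DIR": "/path/to/your/dbt/project",
        "DBT_HOST": "cloud.getdbt.com",
        "DBT_TOKEN": "<your-dbt-cloud-service-token>"
      }
    }
  }
}
```

Again, the keys under "env" are placeholders for illustration - the real names come from the dbt MCP README.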
Before we go any further: if this is at all interesting to you, I suggest that instead of reading my findings here, you sit down and try this yourself. I'm quite confident you'll find it both illuminating and worth your time.
We'll get to my findings in a bit. Long story short - it was successful enough that it shifted my thinking about the near-term trajectory of data work.
But first, let's talk about why experiments like this matter so much right now.
Sensemaking in the age of AI
You've probably been hearing some variant of these takes multiple times a day:
"An agent is just an LLM run in a loop"
"AI agents are coming to replace white collar work"
"I don't even know what an AI agent is, this is just marketing hype"
And about a billion more. All of these represent our collective attempts at sensemaking in this unique technological moment. But honestly, the noise can be so overwhelming that it's tempting to just tune it all out and wait for the dust to settle.
I don't think that's an option for data practitioners. Instead, we need to develop our own internal compass for sensemaking - and that means getting our hands dirty.
To do great data work is to be a great sensemaker. My theory of sensemaking requires holding two paradoxical skills in tension:
Build strong mental models about the world and use them to take decisive action
Constantly scan for misalignments between your models and reality, then adjust accordingly
Organizations and institutions need time to metabolize change and adjust their mental models. There's a physics to it. And that physics takes time.
But when the underlying reality is changing rapidly, the best thing you can do is go make direct contact with that reality. Don't wait for the consensus to form - go see for yourself.
Because things are not the same as they were even 6 months ago:
We've gotten the first wave of models optimized for agentic work (OpenAI's o3, Claude 4, and Gemini 2.5)
We've started building real infrastructure to connect these models to our systems (MCP and other emerging protocols)
LLM-based coding has shifted from autocomplete to actual agents (something longtime Roundup readers saw coming)
That’s a bunch of big changes! It can sometimes feel like keeping up with everything here is a full-time job. And with my last couple of months being pretty tied up with other things, I felt like I owed it to myself to set aside some time and go deep here.
The experiment: Two hours from zero to Semantic Layer
I chose the Weather Source dataset on the Snowflake Marketplace precisely because it was both interesting and completely unfamiliar to me. I booted up Warp (dbt MCP server already configured - if you haven't set that up yet, budget some additional time) and got started.
In two hours, I went from raw data to a working dbt project¹ with:
Documented source definitions
Tested data models
A functional Semantic Layer with queryable metrics (sketched below)
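To make "queryable metrics" concrete, here's a hedged sketch of what a Semantic Layer definition looks like in the MetricFlow spec that dbt uses. The model and column names are stand-ins I invented for a daily weather table, not lifted from the actual project:

```yaml
# Illustrative only: stg_daily_weather, observation_id, temperature_f,
# and observation_date are stand-in names, not the real dataset.
semantic_models:
  - name: daily_weather
    description: One row per station per day of observed weather.
    model: ref('stg_daily_weather')
    defaults:
      agg_time_dimension: observation_date
    entities:
      - name: weather_observation
        type: primary
        expr: observation_id        # surrogate key for station + day
    dimensions:
      - name: observation_date
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: avg_temperature_f
        agg: average
        expr: temperature_f

metrics:
  - name: average_temperature
    label: Average temperature (F)
    type: simple
    type_params:
      measure: avg_temperature_f
```

Once something like this compiles, the metric can be queried through the Semantic Layer APIs or the MetricFlow CLI (something like mf query --metrics average_temperature --group-by metric_time).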
It felt incredible. A bit unbelievable. Of course this was just a simple project and nothing in here would be particularly difficult for an experienced analytics engineer - but it would have taken a whole lot of time and effort.
Some observations from the process:
The experience was exhilarating. Watching an abstract goal decompose into concrete tasks, then seeing those tasks execute in real time, feels like witnessing something totally new. It was also addicting - the interface has the “just one more level” feeling of a great video game.
The cognitive load is different. It was cognitively demanding, but not in the way that coding is cognitively demanding - I have a sense that I could sustain longer blocks of “pairing” with Claude Code before getting mentally depleted than I could with normal coding.
The tools aren't optimized for data work yet.
It first attempted to build out a bunch of models that depended on each other without checking whether the first model actually ran. When an error surfaced partway through the dependency chain, we had to do a bunch of untangling.
It’s competent at writing SQL (and dbt-style SQL). I don’t expect this to be the bottleneck for AI-augmented development.
It is not very good at understanding which columns or models it has access to at a given time - I expect this to be an area where the models will be most useful when assisted by deterministic tooling (one example is sketched below).
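To give one example of what that deterministic assist can look like: before asking the agent to write transformations, you can have it run (or paste in the results of) a plain information-schema query, so it works from the columns that actually exist rather than ones it guesses. This is a generic Snowflake sketch - the database and schema names are placeholders, not the real marketplace share:

```sql
-- List every column available in the source schema so the next step
-- is grounded in what actually exists, not what the model guesses.
-- weather_db / WEATHER_SOURCE are placeholder names.
select
    table_name,
    column_name,
    data_type
from weather_db.information_schema.columns
where table_schema = 'WEATHER_SOURCE'
order by table_name, ordinal_position;
```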
What this proved (and didn't prove)
This experiment convinced me that agentic workflows have moved beyond “pure speculation” and into “definitely worth exploring and net useful for many teams today”. It feels pretty similar to the early days of coding assistants like Copilot: not yet for every team, but definitely for some, and on a steep acceleration curve.
This was just a simple experiment and I walked away thinking just as much about what I don’t know as what I learned.
I still don't know if my models are logically sound (validation would take as long as building)
Enterprise-scale datasets might break this approach entirely
The actual utility of what I built remains untested
And even with all of this, data teams face just as many organizational bottlenecks as technical ones. What implications does this have there (if any)?
But here's the thing: in two hours, I accomplished what would have taken me at least a full day manually - not just the modeling, but the documentation, testing, and more. That is worth paying attention to.
Your move
When facing a question as vast as "How will AI reshape data work?", it's easy to get paralyzed. But the answer isn't in think pieces or Twitter debates - it's in running experiments.
My mental model shifted because I made contact with reality. Right now, data teams not using agentic workflows are doing just fine. But things are moving fast. It’s worth it, at the very least, to get a sense of the state of the world here and to think about how you might adapt to it.
So here's my challenge: Block two hours next week. Pick a dataset you don't know. Try to build something real with these tools. Report back and let me know.
The future of data work is being written right now, in thousands of small experiments by practitioners who refuse to wait for the dust to settle. If you're reading this, you have the expertise to contribute to our collective sensemaking.
What will you discover when you stop reading about AI and start building with it?
¹ There’s a lot that I’d improve here for a production project - I’m making this public to show a checkpoint for where I got in a timeboxed experiment.