I built a (very small) agent swarm.
What does the term 'agent swarm' mean? Is it bullshit? Can I build one?
I recently got nerd-sniped while recording a podcast episode. The episode, with Jordan Tigani of MotherDuck, will be out in the coming weeks. Toward the end of our conversation, I quoted a line from one of his recent posts back to him and asked him what he meant by it. Here’s the line:
When it all shakes out, my bet is that there ends up being one form factor that people settle on. It will consist of an agent swarm for data management backed by a query engine for doing the actual analytics. Agents can handle change and adapt the system in real time. They can prepare insights directly for users.
The question I asked him was: “what do you mean by ‘an agent swarm for data management’?” Any time I see the term “agent swarm,” I immediately find myself skeptical. I mean, of course it sounds fun! Who wouldn’t want an agent swarm doing their data management? But … what did Jordan actually mean, and how much substance is there to this?
First, I’ll tell you my paraphrased version of Jordan’s answer. Then I’ll tell you about the (micro) agent swarm that I built and what I learned.
Jordan’s Answer
I’m probably going to get this a bit wrong, but hopefully in the direction of charity. My understanding of Jordan’s answer is essentially this.
Data management is really just a long series of small, varied tasks:
Profile a column
Document an object
Debug a pipeline
Etc. You could probably list 20+ things like this without having to think very hard. What we all do as data practitioners is combine our context and skills and loop through a bunch of tasks that look … basically like this.
The idea of data-management-as-agent-swarm is simple: create a set of agents that have the right skills and context to do each one of these tasks, and then turn them loose on your data stack. Rather than reacting and fixing problems only after they are identified, these agents should always be observing the things they are responsible for, proactively detecting problems, and then proactively fixing them. This likely starts with humans in the loop, but over time it evolves toward full autonomy.
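To make the observe, detect, fix loop a little more concrete, here is a minimal Python sketch of one agent’s cycle with a human approval gate. Everything in it is hypothetical: `Finding`, `Agent`, `run_agent_cycle`, the `require_approval` flag, and the table names are illustrative, not part of any real framework, and the “fix” step only records what would be changed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Finding:
    """A problem an agent noticed on an object it owns (hypothetical type)."""
    object_name: str
    problem: str
    proposed_fix: str


@dataclass
class Agent:
    """One narrowly scoped agent: which objects it watches and how it checks them."""
    name: str
    detect: Callable[[str], Optional[Finding]]  # returns None when all looks fine
    responsible_for: List[str] = field(default_factory=list)


def run_agent_cycle(
    agent: Agent,
    require_approval: bool = True,
    approve: Callable[[Finding], bool] = lambda finding: False,
) -> Tuple[List[Finding], List[Finding]]:
    """One observe -> detect -> fix pass over everything the agent owns.

    With require_approval=True (human in the loop), a fix counts as applied only
    if approve(finding) says yes; everything else is queued for a person.
    """
    applied, queued = [], []
    for obj in agent.responsible_for:  # observe
        finding = agent.detect(obj)  # detect
        if finding is None:
            continue
        if not require_approval or approve(finding):
            applied.append(finding)  # "fix": here we only record it
        else:
            queued.append(finding)
    return applied, queued


# Usage with made-up objects: one documented table, one undocumented one.
DOCS = {"fct_runs": "One row per dbt run.", "dim_customers": ""}


def missing_description(obj: str) -> Optional[Finding]:
    if not DOCS.get(obj):
        return Finding(obj, "blank description", "draft one from profiling context")
    return None


agent = Agent("docs_agent", missing_description, ["fct_runs", "dim_customers"])
applied, queued = run_agent_cycle(agent)
print([f.object_name for f in queued])  # ['dim_customers']
```

Flipping `require_approval` to `False` is the “evolves toward autonomy” step; the detection logic itself does not change.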
Most of you will have one of two immediate gut reactions to this idea. Many of you will think “that’s reductive, there’s more complexity happening in my day-to-day job than this.” But many others will think “yeah that feels obvious, of course that is the way this will go. GLHF to anyone who thinks they’re going to be paid to write column documentation in 2027.”
The difference between those two reactions says more about your default attitude toward AI than about the usefulness of data management agent swarms. I will confess that I am, constitutionally, more on the ‘AI optimist’ side: I find it both compelling and likely that we will live in such a world in the next 12-24 months. Or at least some of us will.
But I am also self-aware enough at this point to not get overly attached to a fleeting obsession. I have been proven wrong by the universe too many times.
So, rather than employing soaring rhetoric to attempt to get you excited about agent swarms, I will resort to pragmatism: let’s experiment!
Nerd-sniped into building a proof-of-concept agent swarm
After this conversation with Jordan, I became somewhat obsessed with this idea. It wasn’t brand new—certainly others have hypothesized the same thing. But this was the time that it got lodged in my brain. I decided that I wanted to see just how hard it would be to at least get started constructing something like this and whether or not it could provide real utility.
Fortunately, I already have the basic infrastructure set up. I have become at least modestly adept with my agentic scaffolding and workflow, so I could get out of the gate quickly. And I didn’t have infinite time for this experiment; I time-boxed it at ~8 hours of calendar time (for now).
But what to tackle first? According to the definition I proposed earlier, ‘data management’ is made up of a whole heterogeneous set of tasks, and you gotta start somewhere.
The place that seemed to make the most sense to me was straight-up data observability.
What I learned
The first thing I learned was that most of the agents in this type of multi-agent system, regardless of what they are responsible for, are going to need a fairly consistent set of context:
standard profiling information (i.e. descriptive statistics for columns)
dbt metadata
query history
(There are probably a few more things you could imagine adding to this list, like git commit history and production run logs, but I didn’t need those yet.)
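One way to picture that shared context is a single bundle per table that every agent receives, whatever its task. This is a sketch under assumptions: the `ObjectContext` and `ColumnContext` classes and their fields are hypothetical, and the stats and query in the usage example are made-up values, not output from a real project.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ColumnContext:
    name: str
    data_type: str
    stats: Dict[str, object]  # descriptive statistics from profiling
    description: str = ""  # current dbt description, possibly blank


@dataclass
class ObjectContext:
    """Context handed to every agent for one table (hypothetical shape)."""
    name: str
    dbt_metadata: Dict[str, object]  # e.g. materialization, tags, upstream refs
    columns: List[ColumnContext]
    recent_queries: List[str] = field(default_factory=list)  # from query history
    # Room to grow later, e.g. git commit history or production run logs.

    def to_markdown(self, max_queries: int = 3) -> str:
        """Render the bundle as compact markdown for an agent's prompt."""
        lines = [f"# {self.name}", "", "## dbt metadata"]
        lines += [f"- {k}: {v}" for k, v in sorted(self.dbt_metadata.items())]
        lines += ["", "## Columns"]
        for col in self.columns:
            stats = ", ".join(f"{k}={v}" for k, v in sorted(col.stats.items()))
            desc = col.description or "(no description)"
            lines.append(f"- {col.name} ({col.data_type}): {desc} [{stats}]")
        lines += ["", "## Recent queries"]
        lines += [f"- {q}" for q in self.recent_queries[:max_queries]]
        return "\n".join(lines)


# Usage with made-up values.
ctx = ObjectContext(
    name="dim_customers",
    dbt_metadata={"materialized": "table"},
    columns=[ColumnContext("is_duplicate_customer", "boolean", {"distinct": 2, "null_pct": 0.0})],
    recent_queries=["select customer_id from dim_customers limit 10"],
)
print(ctx.to_markdown())
```

Keeping one shape for all agents means a new agent only has to decide which parts of the bundle it cares about, not how to gather them.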
Of the 8 hours I allotted to this task overall, I spent a big chunk simply building up this context. I used Codex for this (GPT-5.5, default fast) and pointed it at our internal analytics dbt project. To start, I focused it on two tables: a fact table with one row per run ever executed in the dbt orchestrator, and a dimension table with one row per paying customer.
Building the actual profiler was fairly straightforward, and the quality was high. Codex had no problem building some scaffolding to use dbt show to execute queries (using dbt as a connection broker like this eliminates the need for multiple connection pathways to the data warehouse). GPT-5.5 made good decisions about which standard profiling metrics to collect. It spun up a bunch of workers to execute all of these queries and wrote a markdown file with all of the results (both table- and column-level context). I then enriched this with dbt metadata and query history data. All fairly straightforward, even if it did take a while to get everything wired up.
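The profiling step can be sketched as two parts: generate one small query per column and metric, then fan the queries out to a pool of workers. This is an illustration, not the code Codex wrote; the metric list and function names are assumptions, and `run_query` is injected so the sketch runs without a warehouse. In a real setup it might shell out to `dbt show --inline "<sql>"` so that dbt brokers the connection, but parsing that output is left out here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical metric set; a real profiler would vary metrics by data type.
# row_count is table-level but repeated per column here to keep things simple.
PROFILE_METRICS = {
    "row_count": "count(*)",
    "null_count": "count(*) - count({col})",
    "distinct_count": "count(distinct {col})",
}


def profiling_queries(table: str, columns: List[str]) -> Dict[Tuple[str, str], str]:
    """Build one small read-only query per (column, metric) pair.

    Identifiers are assumed trusted (they come from dbt metadata, not users).
    """
    queries = {}
    for col in columns:
        for metric, expr in PROFILE_METRICS.items():
            queries[(col, metric)] = f"select {expr.format(col=col)} as value from {table}"
    return queries


def run_profile(
    table: str,
    columns: List[str],
    run_query: Callable[[str], object],
    workers: int = 8,
) -> Dict[str, Dict[str, object]]:
    """Execute every profiling query concurrently and group results by column."""
    queries = profiling_queries(table, columns)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {key: pool.submit(run_query, sql) for key, sql in queries.items()}
        results: Dict[str, Dict[str, object]] = {}
        for (col, metric), future in futures.items():
            results.setdefault(col, {})[metric] = future.result()
    return results


# Usage with a stub runner standing in for the warehouse (values are made up).
def fake_runner(sql: str) -> int:
    if "distinct" in sql:
        return 2
    if " - " in sql:
        return 0
    return 100


profile = run_profile("dim_customers", ["is_duplicate_customer"], fake_runner)
print(profile)
# {'is_duplicate_customer': {'row_count': 100, 'null_count': 0, 'distinct_count': 2}}
```

Injecting the runner keeps the profiler testable and makes it easy to swap the dbt-backed broker for something else later.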
Then, while reading through the markdown files it produced, I realized that the very first use case for all of this context gathering could actually be filling in documentation! dbt’s “description” fields have become the standard way of storing narrative descriptions of tables and columns, and having gathered a bunch of information about these objects, it felt obvious that the first step should be distilling all of that (voluminous, often-dry) context into something useful.
So the very first “useful” thing I built was a description agent: find descriptions that are blank (or low quality) and update them using all of this assembled context. The most compelling thing about this use case is that, by improving descriptions, we improve our ability to build subsequent agents, since they will all use these descriptions as their own context.
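Finding the targets for a description agent can start with a plain heuristic. The sketch below walks a dict shaped like a parsed dbt schema.yml (for example, what PyYAML’s `yaml.safe_load` returns: `models`, each with `name`, `description`, and `columns`) and flags descriptions that are blank, very short, or nothing more than the object’s own name plus filler words. The heuristic, the stopword list, and the `min_words` threshold are assumptions for illustration, not a dbt feature, and the schema contents are made up.

```python
import re
from typing import List, Optional, Tuple

# Small, hypothetical filler-word list used to spot "column name turned into a sentence".
STOPWORDS = {"a", "an", "and", "for", "if", "in", "is", "of", "or", "the", "this", "that", "whether"}


def words(text: str) -> List[str]:
    """Lowercase alphanumeric tokens; underscores act as separators."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_low_quality(name: str, description: Optional[str], min_words: int = 6) -> bool:
    """Heuristic: blank, very short, or only the object's own name plus filler."""
    desc_words = words(description or "")
    if len(desc_words) < min_words:
        return True
    name_words = set(words(name))
    informative = [w for w in desc_words if w not in name_words and w not in STOPWORDS]
    return not informative


def find_description_targets(schema: dict) -> List[Tuple[str, Optional[str]]]:
    """Return (model, column) pairs needing a description; column None means the model."""
    targets = []
    for model in schema.get("models", []):
        if is_low_quality(model["name"], model.get("description")):
            targets.append((model["name"], None))
        for col in model.get("columns", []):
            if is_low_quality(col["name"], col.get("description")):
                targets.append((model["name"], col["name"]))
    return targets


# Usage with a made-up schema.
schema = {
    "version": 2,
    "models": [
        {
            "name": "dim_customers",
            "description": "One row per paying customer in our internal analytics project.",
            "columns": [
                {"name": "customer_id", "description": "Customer id."},
                {"name": "is_duplicate_customer", "description": ""},
            ],
        }
    ],
}
print(find_description_targets(schema))
# [('dim_customers', 'customer_id'), ('dim_customers', 'is_duplicate_customer')]
```

A heuristic like this only picks what to work on; the quality of the new description still depends on the context the agent is given.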
The first cut of this was … fine but not great. Straightforward objects got documented well, but that’s not exactly adding a ton of value. So I homed in on a specific column that I personally didn’t know anything about and said “if the agent can successfully describe this column, I’ll be happy.” The column was is_duplicate_customer, a boolean. The first generated description was exactly what you would expect: something like “is this row duplicative of another row in this table.”
Of course, that’s useless. Why would there be duplicate records, and why would we keep both? What should I do with this record? Should true values be filtered out? Tell me something useful!
So I iterated a few times to see if I could do better. First, I made sure that the context markdown file included the associated dbt code for each column. Second, I gave the documentation agent the ability to get curious: ask questions, generate SQL to answer them, and execute that SQL against the warehouse. Here is the response that Codex gave me after this iteration:
In research mode, the agent now proposes focused follow-up questions, writes safe read-only SQL, runs it through dbt show, stores the results, then uses that evidence in the final descriptions.
The IS_DUPLICATE_CUSTOMER result is now much better:
out/documentation/docs_20260522T110012_fc958985/suggestions.md:189
It now explains that duplicates come from concurrent Stripe and Metronome accounts sharing the same customer_id, and that the model filters to keep the non-Metronome record.
I won’t bore you with more details than that because they’re very idiosyncratic to our payments business logic, but the point is that the full text is exactly the type of response I would have hoped for. It’s not simply expanding the column name into a longer string of text; it is gathering a bunch of information, thinking, asking follow-up questions, and synthesizing the results into concise prose that can be stored and used later by subsequent agentic processes.
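The research mode Codex described (propose questions, write read-only SQL, run it, keep the evidence) can be sketched as a small loop with a guard in front of the warehouse. The LLM steps are injected as plain functions here, and `is_read_only` is a deliberately coarse, hypothetical keyword check rather than anything dbt provides; in practice you would also run these queries under a read-only warehouse role. The questions, SQL, and the `42` result in the usage are placeholders.

```python
import re
from typing import Callable, Dict, List, Optional

# Coarse deny-list of words that could change data or state (an assumption).
FORBIDDEN = {"insert", "update", "delete", "merge", "drop", "create", "alter",
             "truncate", "grant", "revoke", "call", "copy"}


def is_read_only(sql: str) -> bool:
    """Accept a single SELECT/WITH statement containing no forbidden keywords."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:  # reject multi-statement input
        return False
    tokens = re.findall(r"[a-z_]+", statement.lower())
    if not tokens or tokens[0] not in ("select", "with"):
        return False
    return not FORBIDDEN.intersection(tokens)


def research(
    column: str,
    propose_questions: Callable[[str], List[str]],  # hypothetical LLM step
    write_sql: Callable[[str], str],  # hypothetical LLM step
    run_query: Callable[[str], object],  # e.g. a dbt show wrapper
    max_questions: int = 3,
) -> List[Dict[str, Optional[object]]]:
    """Ask follow-up questions, answer each with a vetted query, keep the evidence."""
    evidence = []
    for question in propose_questions(column)[:max_questions]:
        sql = write_sql(question)
        ok = is_read_only(sql)
        evidence.append({
            "question": question,
            "sql": sql,
            "result": run_query(sql) if ok else None,  # only vetted SQL is executed
            "skipped": None if ok else "not read-only",
        })
    return evidence


# Usage with stubs standing in for the model and the warehouse.
SQL_FOR = {
    "How many rows are flagged?": "select count(*) from dim_customers where is_duplicate_customer",
    "Should we drop them?": "delete from dim_customers where is_duplicate_customer",
}
evidence = research(
    "is_duplicate_customer",
    propose_questions=lambda col: list(SQL_FOR),
    write_sql=SQL_FOR.__getitem__,
    run_query=lambda sql: 42,
)
for item in evidence:
    print(item["question"], "->", item["result"], item["skipped"])
```

The stored evidence (question, SQL, result) is what the description step would then read, so the final prose can point back to how each claim was checked.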
In the grand scheme of things, this might feel small. I mean, how much time and energy do you really spend on documentation day-to-day?
But I think this is kind of the point. The idea of the data management agent swarm is not to solve some grand challenge, some truly hard thing. It is that all of our jobs are composed of this whole litany of small things, none of which are that hard. And if we decompose them, teach agents to do each one, and then somehow coordinate how these agents work together, well… that could be really useful.
The reason that not every object has a high-quality description today isn’t that we’re incapable of writing them; it’s that there is tremendous contention for the time of the humans with the knowledge and skills to write them. If we eliminate this constraint, task by task, I think we will start to see the data systems we have been building over the past 10+ years skyrocket in quality and utility, which is exactly what we need if they are going to be the foundation for agentic systems.
More optimistic than when I started
In 8 hours I was able to build something actually useful. It’s certainly not ready for production, but it did come with a few key insights and patterns that provided real value. I fully plan on deploying this internally and pulling on the thread further by building out more tasks.
After this experience, I’m more bullish than ever that we will all be living in this reality in the not-too-distant future. I honestly don’t think there are any truly hard computer science problems to solve here, just a combination of existing technology, experimentation, and domain expertise. Very doable.
I’m excited to see more working code and empirical results. If you’re doing work in this space, ping me.
- Tristan
