I’ve been spending a lot of time recently talking to people running very complex dbt projects and data infrastructures. These are companies who are at the intersection of some of the thorniest data problems in the world, requiring not just advanced analytics systems but also the sociotechnical patterns to enable them.
At the same time, we’ve been having discussions internally about what it means to represent “best practices” in dbt. dbt is, always has been and always will be an opinionated product built to help organizations solve hard problems using data. Historically, we’ve also paired that with recommended best practices for how to build and shape your dbt project.
These best practices emerged naturally from the hands-on work of dbt Labs employees implementing dbt, and from the massive distributed intelligence of the meme-making hivemind known as the dbt Community.
There was a point in time when our recommendations were quite prescriptive. The exact commands to run to set up your database, the single style guide you need to know to organize your project.
I love these recommendations. They were deeply transformative to me when I first encountered them, and they remain a key part of the analytics engineering story. There is a real joy to be had in determining the folder structures, the naming guidelines and the design patterns that shape the ways we build our data systems.
But increasingly, when I find myself trying to write up the “best” way to use dbt, I find myself giving that answer, much beloved of senior engineers across the land.
It depends.
Sometimes people think of “it depends” as hedging, but to me it begets a wonderful sense of possibility. It depends on what your goals are. It depends on how you intend to solve the problem. It depends on knowing the problem you are trying to solve.
“It depends” isn’t the end of a conversation - it’s the beginning.
Let’s talk about fashion
Now, dear reader, I’m going to make an ask of you that I believe has never been made in the history of this newsletter. I’m going to ask you to watch a TikTok.[1] It’s two minutes long and I promise you will learn something useful about data work by watching it.
This video is asking about style. About what is the “best” way to do things.
The first thing you’ll notice on watching the video is, not to put too fine a point on it, all of these jackets are extremely cool.
The second is that even though they are all extremely cool, not all of them will work in every situation.
To make them work, you need to understand the context that they arose from - why someone might want to wear them and how to fit them into not just the rest of your wardrobe, but the cultural context in which you’d wear them.
The video is powerful because it so cleanly outlines three incredibly different, but equally compelling answers to what is the best jacket.
The classic heritage piece rooted in tradition
The playful, whimsical jacket pushing the boundaries
The technical synthetic marvel
Each of these is awesome in its own way - you can imagine why someone might be interested in building an outfit around any of them.
That, increasingly, is how I feel when talking about what is the best dbt project.
There is the classic heritage dbt project, built to power the analytics of a mid-sized company off a single data team.
There are dbt projects underlying operational use cases, pushing forward the discipline and blurring the line between analytical and production workloads.
There are dbt projects that are interesting, multi-domain meshes that span teams working across the globe.
Which of these is the right way to use dbt?
All of them.
The implementation specifics of each of these differ. When you ask what the best way to set these up is, you would very reasonably expect to hear “it depends”.
But
There can be a temptation to hear “it depends” and think that this means that all answers are equally good - the truth is far more interesting.
It depends because it matters a lot
It depends because you need to really care about getting it right and try very hard to actually understand the problem in order to solve it the right way. It depends because not every implementation is going to fit every use case.
And critically, “it depends” does not mean “everything depends”. There are contingent factors that change, and then there are underlying truths that don’t change even across wildly divergent scenarios. “It depends” means: let’s pull out a whiteboard and get to the bottom of this.
These are the truths at the deep heart of the craft of analytics. They involve knowing the development cycle of analytics workloads. They involve knowing the personas involved in finding and solving analytical problems.
At the end of the day it’s not that finding best practices across the discipline of analytics has become any less important, less interesting or less possible.
What’s happened is that we as a discipline have found ourselves at a place and time where we must once again move up the stack. It’s time to find the patterns behind the patterns and the recommendations behind the recommendations, draw them out and formalize them.
This work then enables the dbt Community to have not one set of best practices, not one blessed guideline, but many. Not one type of person that contributes to the knowledge loop, but many.
This can only be done with the tooling, the technology and the frameworks to enable it. This is something that we’ve been pushing towards as a community for a while - I expect the upcoming release of Tristan’s Analytics Development Lifecycle Whitepaper to be a moment of particular significance here.
I want to close with a line from the last Roundup that’s been echoing in my head.
People are smart and motivated. Underestimate them at your peril.
Thanks for being smart and motivated. We promise not to underestimate you as we work together to create the next set of design traditions - inspired by the past, influenced by context, and with an eye towards where we’re all trying to go.
P.S.
We’ve got a big party coming up really soon for all of the people solving the hardest and most interesting problems in analytics engineering - the ones at the forefront of building the next evolution of our shared design language. It’s called Coalesce and I hope you’ll join me there.
[1] Shoutout to Twitter celeb The Menswear guy for bringing this to my attention.