dbt Unit Testing - the inside story from first GitHub Discussion to Shipping It
Editor's note - we talk a lot about bringing software engineering concepts into data using dbt. Today's Roundup tells the inside story of making this happen - written by Michelle Ark, a Staff Software Engineer at dbt Labs and one of the key people behind dbt's new Unit Testing Framework.
Finally! After many years of discussions, proprietary frameworks, open-source dbt packages, blog posts, and even more discussion — we have a natively integrated unit testing framework in dbt! 🎉
It’s our first (albeit highly informed) cut of the framework, and as the person who opened the original discussion in dbt-core way back in 2020 before I joined dbt Labs, I couldn’t be happier with the capabilities it’s launching with:
a succinct YAML spec that infers as much schema information as possible, making it possible to write precise, localized, and readable tests
the ability to override any jinja variable or function, which provides the flexibility to test incremental logic (or any other jinja-dependent logic), or fix a non-deterministic value to a reliable, static value
thorough data type support for all dbt Labs’ maintained adapters
multiple mock input formats: CSV, dictionaries, or SQL — whether inlined for readability or tucked away in reusable fixture files
tight integration with the dbt build command — unit tests are run before a model builds, preventing needless builds if a logical regression is introduced in development or CI
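To make those capabilities concrete, here's a sketch of what the spec can look like in dbt 1.8 - the model, column, and test names are hypothetical, and the incremental example assumes a model that only processes rows newer than what already exists:

```yaml
# models/unit_tests.yml (hypothetical names throughout)
unit_tests:
  - name: test_is_valid_email_flag
    description: "Malformed emails should be flagged as invalid."
    model: dim_customers
    given:
      - input: ref('stg_customers')
        rows:
          - {customer_id: 1, email: "alice@example.com"}
          - {customer_id: 2, email: "not-an-email"}
    expect:
      rows:
        - {customer_id: 1, is_valid_email: true}
        - {customer_id: 2, is_valid_email: false}

  - name: test_incremental_only_new_orders
    model: fct_orders
    overrides:
      macros:
        is_incremental: true      # force the incremental code path
    given:
      - input: ref('stg_orders')
        rows:
          - {order_id: 1, ordered_at: "2024-01-01"}
          - {order_id: 2, ordered_at: "2024-01-02"}
      - input: this               # mock the existing table behind {{ this }}
        rows:
          - {order_id: 1, ordered_at: "2024-01-01"}
    expect:
      rows:
        - {order_id: 2, ordered_at: "2024-01-02"}
```

Note that only the columns relevant to each test case appear, no data types are declared, and the `overrides` block pins jinja-dependent behavior (like `is_incremental()`) to a fixed value.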
Each of those bullet points reflects weeks or months of effort by the team(s) at dbt Labs, and we even had a few external code contributions come in during our betas, which was incredibly exciting. I'd also like to shout out and thank all the maintainers of the open-source unit testing packages that led the way here for years and provided a ton of inspiration for the native framework's design.
Rather than diving deep into the details of the framework and its inner workings, I'd love to take a step back and share why this problem has resonated with me, and with many others across the community, for so long.
Software engineering — what’s the fun of it?
If I’m being honest: the fun in software engineering doesn’t come from sourcing product requirements, researching the problem space, or even from releasing to production. Sure — there is a lot of satisfaction in “shipping it” and watching a fix, feature or whole system roll out and into action. But to me, unless I’m the end user of the thing I just shipped, I can’t actually feel any of that.
The part that actually feels fun to do for hours on end begins right after you finish however many magical shell incantations were necessary to set up your environment, get your reproduction case and development data all lined up, and get to that first line of code that signals back to you — “we’re in! we’re actually doing this thing!”.
From that point on, it’s all about iteration. Write some code. Take it for a spin. Oops, that didn’t quite work. Make a tweak. Spin it again. That fixed it. What about this case? Add some handling. Spin. Repeat over and over with your favorite tunes and a nice big pot of tea until eventually… it all works!
In software engineering, this quick and confidence-inspiring iteration is made possible, even in the most complex codebases, by isolating the required production inputs and mocking them out with static, locally-available data. The best way to codify and share this practice is by writing, version-controlling, and automatically running unit tests on every change.
Whether you are adopting test-driven development by writing tests before you write any code, or relying on an existing suite of tests, well-written tests quickly guide you towards a higher quality solution.
Even though I like to think I’m a careful and intentional programmer, it’s extremely rare that rolling up my sleeves and putting a few unit tests in place doesn’t catch at least one bug. In fact, if everything appears like it “just works” — I’m extremely skeptical. I’ve probably missed something and could afford to write a few more tests.
Why dbt unit tests?
As someone who has spent my entire career building development tools for data practitioners, I've always observed this highly iterative, confidence-inspiring flow to be missing from the data development workflow. Getting seemingly anything done, whether a small tweak or a large refactor, is prone to a very slow feedback loop. It's not uncommon to see models take several hours to build due to processing large volumes of data.
If and when the build finally succeeds, the next task boils down to auditing the contents of the resulting dataset to determine its correctness. This is typically done by writing more (untested) logic that makes assertions about the data. You know the ones: Are there any null or non-unique values in this column? Do the values of this column ever exceed a certain threshold? These types of data tests provide a certain level of sanity-checking and do have a place in the overall production deployment of a model, alerting us when something has already gone wrong. But when it comes to testing the logical changes you are trying to iterate quickly on, they're ultimately the wrong tool: crude, error-prone, costly, and slow (ergo: not fun).
What if we could reallocate some of that time spent waiting for builds and audits to finish into crafting a thorough suite of test cases, and spend the leftover time iterating on the robustness of our modelling logic, its documentation, and just generally sleeping better at night?
Unit testing in dbt provides the ability to do just that: the ability to check your modelling logic on a small set of static inputs instead of your full production data - a low-cost method to ship changes with confidence.
It’s not surprising that there were so many custom-built solutions out there on top of dbt, whether open-source or proprietary. Over the years we saw the energy build - a real turning point was at Coalesce in 2022, when Mariah Rogers clearly articulated the need for and benefits of standardization. By then, we’d heard the message loud and clear and the dbt Core engineering, product, and design teams got to work prototyping shortly after!
Design aspirations and challenges
We kept a few key design goals top-of-mind, drawn from Kent Beck’s Test Desiderata, which describes a set of ‘desirable properties’ for unit tests. We wanted the native dbt unit testing framework to enable writing tests that are:
Writable — This felt like the crux of the design challenge for unit testing in the context of analytics engineering, because the input datasets we need to mock are often extremely wide (many columns) which could translate to a lot of unnecessary boilerplate. How did we tackle this?
No need to explicitly specify all columns in the input or expected fixtures - only the ones that are relevant to the test case. The same goes for inputs that aren't relevant to a test case. This enables writing succinct and specific (another desirable property) unit tests.
No need to specify the data types of mocked values. This keeps unit tests succinct and portable (a.k.a structure-insensitive, another desirable property) across adapters, even if the underlying model SQL is not.
I'll caveat here and say that we tried our hardest to standardize the way you'd specify a value of a given data type across warehouses. But as you might imagine, these specifications can get pretty unwieldy, and full standardization wasn't always possible.
Readable — Unit tests should read like stories: “Once upon a time, my model had these inputs, and produced these outputs”. How does dbt’s framework make this possible?
The writable, succinct, schema-inferring spec described above doesn't hurt the readability cause either! Additionally, it is possible to add descriptions to unit tests in dbt.
In my opinion, the best way to accomplish this is to make use of inlined fixtures. However, file-based fixtures are also available, and have the benefit of reusability.
Readability is especially important in larger organizations, where there can be frequent updates and many collaborators to a single model, leading to many readers (and eventual debuggers) of a single test. But it’s still important for the single-person team if you, like me, forget why you bothered to write a piece of code or test from just last week.
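For the file-based flavor, here's a sketch of how a reusable fixture might be wired up - the model and fixture names are hypothetical, and the fixture itself would live at a path like `tests/fixtures/stg_orders_baseline.csv`:

```yaml
# models/unit_tests.yml (hypothetical names)
unit_tests:
  - name: test_orders_rollup
    model: orders_rollup
    given:
      - input: ref('stg_orders')
        format: csv
        fixture: stg_orders_baseline    # tests/fixtures/stg_orders_baseline.csv
    expect:
      format: csv
      fixture: orders_rollup_expected   # tests/fixtures/orders_rollup_expected.csv
```

The same fixture file can then be referenced from multiple unit tests, trading a little readability for reuse.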
Fast — There is inherent latency associated with making requests to a remote data warehouse, which the 1.8 release of the framework requires. How did we do?
A single unit test currently runs at interactive speeds for all dbt Labs’ maintained adapters (~1-5 seconds, depending on the warehouse).
Because of dbt's built-in knowledge of model lineage, it is possible to run the minimal selection of unit tests for a given code change using slim CI.
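As a sketch of what that selection can look like on the command line (the artifacts path is hypothetical; these commands assume a configured dbt project and warehouse connection):

```shell
# Run only the unit tests for one model while iterating locally
dbt test --select "dim_customers,test_type:unit"

# In slim CI, run just the unit tests affected by the current change,
# comparing against production artifacts and deferring unmodified upstreams
dbt test --select "state:modified,test_type:unit" --defer --state ./prod-artifacts
```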
My unit testing hopes and dreams
As we've already mentioned here, there, and everywhere, the dbt unit testing framework is available in our latest 1.8 release, which apparently is "just wow"! We've already gotten some great feedback and are shipping fixes in patches every couple of weeks.
Beyond the bread-and-butter use case of "I need to efficiently test my modelling logic", here are a few additional scenarios where I can imagine unit testing sparking joy in particularly challenging analytics engineering workflows:
Start prototyping and even testing your modelling logic before all your upstream production data even lands.
Let’s say you’re waiting for an upstream development team to launch their product before production data starts flowing in for analytics work. Instead, you could create a column-schema-only model (or source) in dbt, and start referencing it in the model you’re developing.
From there, it’s possible to start iterating on your modelling logic with mocked data for the not-yet-available source data.
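One way to sketch such a column-schema-only model (an assumption on my part, not the only pattern, and with hypothetical names and warehouse-dependent type names) is to select typed nulls with `limit 0`, so the relation builds empty but with the agreed-upon columns:

```sql
-- models/staging/stg_events.sql (hypothetical placeholder model)
-- Builds an empty relation with the agreed-upon column names and types,
-- so downstream models can ref() it before real data lands.
select
    cast(null as integer)   as event_id,
    cast(null as varchar)   as event_type,
    cast(null as timestamp) as occurred_at
limit 0
```

Downstream models can `ref('stg_events')` immediately, and unit tests supply the mocked rows this placeholder doesn't have.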
Apply performance refactors to a business-critical, but costly model with confidence.
Create stable and reliable interfaces for cross-team collaboration at scale.
This is especially important in a multi-project, or dbt Mesh setting. Unit tests are a great signal of quality and reliability for downstream consumers taking a dependency on your model, as models with unit tests are less likely to introduce logical regressions over time.
Use unit tests as developer documentation.
Imagine it’s your first week on the team as its newest analytics engineer. Your task is to extend an existing model to add a new column, but you’ve never even heard of this model before!
If the model has unit tests, you'd be able to ramp up and get a high-level understanding of the model's inputs and how it processes them into an expected output by reading through a handful of unit tests as a primer before diving into the more-complex SQL.
Not everything from software engineering can or should translate to analytics engineering. But if there’s one thing that I hope makes it over through dbt: it’s the value we place on an enjoyable analytics development lifecycle, because it’s my firm belief that happy developers write better code, have more meaningful careers, and generally bring more joy to this world <3
Join data practitioners and leaders in Las Vegas this October at Coalesce, the Analytics Engineering Conference built by data people, for data people. Register now at http://coalesce.getdbt.com/ for early bird tickets to save 50%. The sale ends June 17th, so don’t miss out.