I posted this on the dbt blog earlier today. It’s very big news for the entire dbt Community, so I wanted to make sure you saw it!
- Tristan
I am not generally an excitable person. I do not dance. I try to avoid hyperbole.
And yet. And yet! It is very, very hard for me to avoid literally jumping up and down as I share this news with you.
The TL;DR: today, I have the pleasure of announcing that dbt Labs has acquired SDF Labs. The two teams are already working side-by-side to bring SDF’s SQL comprehension technology into the hands of dbt users everywhere. SDF will be a massive upgrade to the very heart of the dbt user experience moving forward. It will enable faster dbt project compilation (~2 orders of magnitude), amazing developer experience (think: type-ahead in your IDE of choice), the highest-fidelity lineage on the market, and much more.
Let me take a sec to share the story of how we got here. Because I think it’s an interesting one.
A standardized way to author SQL data pipelines
From the very beginning, we wanted writing dbt pipelines to feel as simple as writing SQL. Just write a select statement, in your dialect of choice, and dbt would take care of all of the fiddly bits. This desire arose directly from our focus on empowering more humans—not just highly technical data engineers—to author production-grade data pipelines. So we started with models that were just SQL.
Then it was obvious that we needed some dynamic ability in our pipelines. Declare variables, create dynamic relationships between nodes, etc. So we added Jinja. Initially for just a few functions: ref() being the first and most important, but later config(), var(), and others.
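To make the role of ref() concrete, here is a minimal Python sketch of the idea: a templating step swaps each ref() call for a relation name and, along the way, records which models depend on which. This is a toy built on a regular expression, not dbt's actual Jinja implementation; the function render_model, the analytics schema, and the sample models are all hypothetical.

```python
import re

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def render_model(sql: str, schema: str = "analytics") -> tuple[str, list[str]]:
    """Replace each ref() call with a relation name and collect dependencies."""
    dependencies: list[str] = []

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in dependencies:
            dependencies.append(name)
        # The schema is an assumption; a real project resolves this per target.
        return f"{schema}.{name}"

    return REF_PATTERN.sub(substitute, sql), dependencies


model_sql = (
    "select o.id, c.name from {{ ref('orders') }} o "
    "join {{ ref('customers') }} c on o.customer_id = c.id"
)
rendered, deps = render_model(model_sql)
print(rendered)  # plain SQL text with relation names filled in
print(deps)      # the upstream models this model depends on
```

Note that the output is still just a string. The templating step knows which models are referenced, but nothing about the columns or types inside the SQL.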
These functions were a gateway drug, though, and we quickly became convinced of how important custom macros would be. Users could define their own functions, and as long as they output syntactically valid SQL, they could do whatever they wanted! And thus, dbt utils and the entire dbt package hub were born. Community-wide code reusability for the win!
Through the ensuing years, dbt became an ever-more-complete SQL authoring framework. One of the biggest steps we ever made was to push all materializations into user-space. This allowed advanced users to take complete control of the SQL authored by dbt. From that point forwards, if a data transformation was expressible within a given data platform, dbt users could author it.
Throughout this entire journey, however, dbt could not actually understand the SQL its users were writing! This remains true today. dbt is an incredibly powerful framework for users to author data pipelines, but it fundamentally treats the SQL that you author as text and leaves the evaluation of that text to the database. This gives the user ultimate control—the framework doesn’t get in the way—but it also asks the user to do more work.
For a long time, this felt entirely normal and rational. We had long thought of dbt as “Ruby on Rails for SQL”.
Borrowing from software engineering (again)
For those of you who didn’t do web development in the mid-2000s, Ruby on Rails (RoR) is a web development framework that made it incredibly straightforward to develop web applications. It combined the power of two (!) declarative languages (HTML and CSS) with a templating system (.erb, or ‘embedded ruby’) to enable developers to do things on the web that they could never do with HTML alone. RoR did not, itself, understand the HTML its developers authored; that task was left up to the browser.
The analogy is pretty direct, and it worked for a long time. Everything you can do with dbt today—the entire control plane, from transformation to testing to catalog to orchestration—is powered by this paradigm. dbt helps you author data pipelines using SQL; Ruby on Rails helps you author web applications using HTML and CSS.
But RoR is no longer the dominant framework for building web applications today. Over the last decade, React has taken that mantle. And React does understand the HTML/CSS/JavaScript its users write, and it does that in a very clever way. Rather than writing to a given browser target, React developers code against an intermediate abstraction and the framework takes on the responsibility of ensuring browser-level compatibility. Gone are the days when web developers had to struggle with browser compatibility.
The level up in capability for web applications from the days pre-React and post-React is pretty dramatic. (Think of the web interfaces you used in 2014 and compare them with what you’re using today!) These types of epochal changes in developer tooling can make massive differences in the products that can be built on top.
Enter: SDF
We’ve known for years that this type of step function change was coming for the world of SQL. And different teams have taken some decent swings at it along the way. Unfortunately, every proposed solution had tradeoffs that we considered unacceptable. Limiting the dynamism of Jinja. Unpleasant syntax. Forcing developers to use some intermediate proprietary language. Etc. In our opinion, the right solution just hadn’t emerged yet.
That is…until a little company called SDF Labs came out of stealth in the summer of 2024.
SDF’s story is fascinating. Founded by a father/son duo (Lukas and Wolfram Schulte, CEO and CTO respectively), and with a core team of database researchers from Microsoft Research, Meta, and others, they are among the most qualified humans on the planet to think about the problem of highly reliable SQL comprehension at scale.
Wolfram, in fact, was hired at Meta to build the system that tracked PII throughout all data pipelines at the company across over a million tables. Talk about being battle-tested—I can’t imagine there are many (any?) more demanding use cases for this technology anywhere.
A few years later, the two decided to take the lessons learned from this work and build a company around it. They recruited a team of the best talent in the world and got to work building in stealth for two years, emerging in June of 2024 with a fully-functioning and dbt-integrated product already in production use by customers.
Fatefully, Benn Stancil introduced me to Lukas on the day before their public launch. It was clear to me after a single conversation that this was the future of dbt.
How does SDF work?
So how does SDF actually…work?
SDF is a high-performance toolchain for SQL development packaged into one CLI: a multi-dialect SQL compiler, type system, transformation framework, linter, and language server. It is written in Rust, highly parallelized, and designed for scale.
The toolchain is powered by a state-of-the-art development in SQL understanding. SDF represents each SQL dialect (Snowflake, Redshift, BigQuery, etc.) as a complete ANTLR grammar with definitions for all datatypes, coercion rules, functions, scoping intricacies and more. Unlike dbt historically (which has treated SQL as strings), SDF sees objects and types and syntax and semantics. In the same way that virtual machines (VMs) emulate physical hardware, SDF emulates the SQL compilers native to the data platforms you use.
The result is magical: at every point in time the entirety of the data warehouse is fully defined and statically analyzed as code. A complete understanding of SQL allows the SDF engine to faithfully emulate the behavior of cloud data warehouses, provide feedback before execution, and catch breaking changes during development rather than after deployment.
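To give a feel for what "statically analyzed" means in practice, here is a deliberately tiny Python sketch that checks a query against a known schema without ever contacting a database. It is an illustration of the general idea only, not SDF's implementation: it handles just the simplest select ... from ... shape, and the CATALOG contents and the check_query function are hypothetical.

```python
import re

# Hypothetical catalog: table name -> {column name: type}.
CATALOG = {
    "orders": {"id": "int", "customer_id": "int", "amount": "decimal"},
}

# Only the simplest "select <columns> from <table>" shape is supported here.
SIMPLE_SELECT = re.compile(r"^\s*select\s+(.+?)\s+from\s+(\w+)\s*$", re.IGNORECASE)


def check_query(sql: str, catalog: dict[str, dict[str, str]]) -> list[str]:
    """Return a list of errors found without ever contacting a database."""
    match = SIMPLE_SELECT.match(sql)
    if match is None:
        return ["unsupported or malformed query"]
    column_list, table = match.group(1), match.group(2).lower()
    if table not in catalog:
        return [f"unknown table: {table}"]
    errors = []
    for column in (c.strip().lower() for c in column_list.split(",")):
        if column != "*" and column not in catalog[table]:
            errors.append(f"unknown column: {table}.{column}")
    return errors


print(check_query("select id, amount from orders", CATALOG))
print(check_query("select id, amuont from orders", CATALOG))
```

The second query contains a typo in a column name, and the check reports it locally. A pure text-templating approach would only learn about that mistake when the warehouse rejected the query.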
Best of all, integration is easy. SDF has adopted dbt’s syntax, configuration, libraries, and Jinja natively, as part of the SDF runtime. As a result, for most dbt projects there will be no code changes required to take full advantage of SDF’s capabilities!
The power of a new paradigm
Ok cool, all of this has been interesting. But I'm sure you're wondering...as a user, what does it actually get me? The answer is: quite a lot.
Let’s start with the first, most basic benefit. SDF parses and compiles dbt projects really, really fast. Because it’s built in Rust, it simply runs faster than Python. As a result, SDF compiles the same dbt project multiple orders of magnitude faster than dbt Core. If you’re working in a large dbt project, this will meaningfully impact your productivity.
Next is developer experience. There are many things that will eventually go into this bucket, but here are two great examples. First, SDF’s ability to understand SQL means that it can power IntelliSense in your IDE of choice. With every keystroke, SDF understands what you are typing and can automatically suggest what comes next, including suggesting table and column names. Second, because SDF understands your SQL, it can detect errors without connecting to the remote database. Troubleshooting all of a sudden becomes far faster, as errors get caught as you are typing, not when you do a dbt run.
Third is lineage. SDF has both the highest-fidelity and the highest-performing SQL parsing on the market. And lineage and metadata are, of course, at the heart of the entire data control plane. Understanding how tables and columns flow throughout your entire data estate is what SDF's technology was originally built to do, and it has been proven out in the most complex data environments on the planet.
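As a rough picture of what column-level lineage looks like once the SQL has been understood, consider the Python sketch below. The lineage edges are written by hand here, whereas a real tool would derive them by parsing SQL; the table names, the LINEAGE mapping, and the trace_sources function are all hypothetical, and the sketch assumes the graph has no cycles.

```python
# A column is identified by (table, column).
Column = tuple[str, str]

# Hypothetical lineage edges: each output column maps to the upstream
# columns it is derived from. Written by hand for illustration.
LINEAGE: dict[Column, list[Column]] = {
    ("stg_orders", "amount_usd"): [("raw_orders", "amount"), ("raw_orders", "currency")],
    ("fct_revenue", "revenue"): [("stg_orders", "amount_usd")],
}


def trace_sources(column: Column, lineage: dict[Column, list[Column]]) -> set[Column]:
    """Walk upstream until reaching columns with no recorded parents.

    Assumes the lineage graph is acyclic.
    """
    parents = lineage.get(column, [])
    if not parents:
        return {column}
    sources: set[Column] = set()
    for parent in parents:
        sources |= trace_sources(parent, lineage)
    return sources


print(sorted(trace_sources(("fct_revenue", "revenue"), LINEAGE)))
```

This is the kind of question a PII-tracking system has to answer at scale: given a column far downstream, which raw columns ultimately feed it?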
Finally, there is local execution. Software engineers commonly run development environments on a local machine and keep higher environments in the cloud. The local development environment gives them speed and control that are important in a very tight, iterative development cycle. But that’s not how it has worked in data in the past. Most modern data platforms cannot be ‘run locally’, but that’s one of the superpowers of building a logical plan from the SQL query: you can take that logical plan and execute it in a local environment. And that’s exactly what SDF does in development, making the developer experience that much more responsive and delightful.
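If "logical plan" is an unfamiliar term, the following Python sketch may help. It models a query as a small tree of operations (scan, filter, project) and evaluates that tree against in-memory rows. It is only a sketch under simplifying assumptions, not how SDF represents or executes plans; the node classes, the execute function, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Union

Row = dict[str, object]


@dataclass
class Scan:
    table: str


@dataclass
class Filter:
    source: "Plan"
    predicate: Callable[[Row], bool]


@dataclass
class Project:
    source: "Plan"
    columns: list[str]


Plan = Union[Scan, Filter, Project]


def execute(plan: Plan, tables: dict[str, list[Row]]) -> list[Row]:
    """Evaluate a logical plan against in-memory tables."""
    if isinstance(plan, Scan):
        return list(tables[plan.table])
    if isinstance(plan, Filter):
        return [row for row in execute(plan.source, tables) if plan.predicate(row)]
    if isinstance(plan, Project):
        return [{c: row[c] for c in plan.columns} for row in execute(plan.source, tables)]
    raise TypeError(f"unknown plan node: {plan!r}")


# Roughly equivalent to: select id from orders where amount > 100
plan = Project(Filter(Scan("orders"), lambda row: row["amount"] > 100), ["id"])
local_tables = {"orders": [{"id": 1, "amount": 50}, {"id": 2, "amount": 250}]}
print(execute(plan, local_tables))
```

Because the plan describes what to compute rather than where to compute it, the same tree could in principle be handed to a local engine during development and to the warehouse in production.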
The benefits above are only the start; the ability to deeply understand the SQL authored inside of dbt pipelines will fundamentally transform the experience of every dbt user.
Gimme the details! What do I get?
All of the benefits laid out above are being realized in production today by SDF’s existing customers. But it will take some work to get all of these capabilities integrated into dbt, and this won’t happen overnight.
Our first goal is to get SDF’s SQL parsing capabilities integrated into dbt. These capabilities will enable meaningful improvements to the dbt developer experience, and we want everyone to have access to them. While SDF won't be included as part of the Apache 2.0 code base, we plan to make meaningful parts of SDF’s capabilities available to all dbt users—whether you’re using dbt Core or dbt Cloud.
As we work through integration details we’ll share more about how this will work. But if you use dbt today, you’ll be able to use this new tech.
In a few weeks, we're hosting a webinar between me and SDF's Co-founder and CEO to share more and answer any of your burning questions. Be sure to register. In the meantime, you can learn more about this acquisition and what it means for the bright road ahead for dbt by reading our press release, the follow-up blog post written by dbt Labs Chief Customer Officer Ryan Segar, and SDF's acquisition announcement blog post.
Beyond that, all I can say for now is that the technology that SDF has built is foundational to the entire data control plane, and you should anticipate seeing it show up in more and more dbt experiences over the coming 12 months. We’ll share more in public as soon as we can.
Returning, for a moment, to my statement from the very beginning of this post: I am just so very excited about this development and what it will mean for dbt users everywhere. This is the type of step function change that doesn’t come along in an industry very often, and it is an absolute privilege to be able to share this with the entire dbt Community.