A picture worth 1000 lines of code
Coming to you with a one-day-delayed roundup for a very good reason - the dbt Labs team was at our company kickoff event in Cancun.
I’ll spare you _most_ of the team bonding pictures but you really do need to see Joel’s face as he had his mind absolutely blown while we were getting a walkthrough of how SDF manages Level 3 SQL comprehension.
What’s Level 3 SQL comprehension, you ask? I’m so glad you did! I thought we’d use this roundup to quickly recap some of the recent posts we’ve put out about dbt + SDF and what it means for the industry - as well as a couple of external posts that help draw these concepts out further.
SQL Comprehension - what it is and why it matters
SQL comprehension is a key concept for understanding the next era of dbt and data tooling. At its core - SQL comprehension refers to systems that are able to make sense of SQL code. You’re probably extremely familiar with one system that understands SQL code - we call it a database.
What you need to know is that SQL comprehension is beginning to be decoupled from the database and integrated into other tools and systems.
SQL comprehension isn’t just one thing though - it’s a series of technical unlocks that build on each other.
Having SQL comprehension decoupled from the data warehouse will allow us to unlock a series of key developer experiences - things like column-level lineage, IntelliSense and validation of queries before they hit the data warehouse.
For a full deep dive into these levels and why they matter - check out Joel’s recent post - The Three Levels of SQL Comprehension: What they are and why you need to know about them.
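To make "validation of queries before they hit the data warehouse" a little more concrete, here is a minimal sketch. It is not how SDF or dbt does it. It assumes a local, schema-only SQLite database as a stand-in for a tool that understands your SQL, and the `validate_query` helper and the `orders` table are hypothetical names made up for illustration. SQLite's dialect also differs from most warehouses, so treat this as an analogy only.

```python
import sqlite3


def validate_query(schema_ddl, sql):
    """Check a query against an empty, schema-only local database.

    Hypothetical helper for illustration: SQLite stands in for a
    SQL-aware tool, so no warehouse is ever contacted.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for ddl in schema_ddl:
            conn.execute(ddl)
        # EXPLAIN makes SQLite compile the statement, resolving table and
        # column names, without running the query itself.
        conn.execute("EXPLAIN " + sql)
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()


# Assumed example schema: tables only, no data.
SCHEMA = ["CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"]

good = validate_query(
    SCHEMA, "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
)
bad = validate_query(SCHEMA, "SELECT customer_name FROM orders")

print(good)
print(bad)
```

The second query fails locally because `customer_name` is not a column of `orders`, which is the kind of mistake you would otherwise only discover after sending the query to the warehouse.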
Why did it take this long?
Software developers have had code-aware tooling for ages. What’s been keeping data practitioners in the cold here?
Well, it turns out that you need to build a lot of tech to do this right. Data, as some of you may have determined over the course of your career, can get kind of complicated. Tristan refers to it as having “a physicality” that must be managed - this is pretty different from pure-play code systems.
So to get dev tooling that could comprehend SQL at a deep level, we needed to get dev tooling with some more advanced components. Specifically - these components.
The deeper you dive in this chart, the more SQL comprehension capabilities you’ve unlocked - and the closer you’ve gotten to building a full-on database.
It’s really worth your time to learn the pieces here. I’ll recommend two posts that can be helpful.
The key technologies behind SQL Comprehension by Dave Connors - this is a companion post to Joel’s post above on the Levels of SQL comprehension, unpacking the technologies that underpin each level.
Behind the Scenes of SQL: Understanding SQL Query Execution by SeattleDataGuy - from last September, this is another great walkthrough covering a lot of the same topics from a different angle.
Having a shared language to understand the tooling we’re building and working with is so helpful. It gives data practitioners greater knowledge of how they do their jobs, makes it easier to build tooling and standards across these layers as more and more people get a sense of how the systems work and, perhaps most importantly, uplevels the common understanding that we, as a profession, have of the work we’re doing.
That’s one reason I was so thrilled to see Deepyaman Datta’s post, Does Ibis understand SQL? Deepyaman concludes that:
Ibis doesn’t understand SQL per se, but it does understand what you’re trying to do. Ibis, much like SQL, defines a standardized interface for working with databases. Because Ibis understands queries expressed through this user interface, it also provides users with some of the unique capabilities SDF offers, including the ability to execute said logic on the backend of the user’s choice.
I’m looking forward to a lot more discussion about SQL comprehension over the coming months. But what I’m really looking forward to is you all getting your hands on it and starting to really feel what it’s like working on data tooling operating off of deep SQL comprehension.
End of regularly scheduled roundup, begin Ganz’s AI musings
Anthropic’s MCP appears to be gathering steam and has real potential to be a standard powering AI agents - check out this talk for a deep dive.
No one knows what to make of GPT-4.5. Did we hit the wall for pre-training? Is there a measurement crisis that stops us from being able to spot subtle but powerful improvements? Is the underlying intelligence flywheel that powers post-training and reasoning going to be amplified by the more powerful base model? Is this at or about the level of improvement you’d expect from one OOM of scaleup and we’re on track as ever?
This AI voice demo is freaky good.
OpenAI’s Deep Research is available across all plans now. Go try it on a topic you know well. If you don’t find it fascinating and just a bit unnerving, idk what to tell you.
Dario Amodei doesn’t think we’re hitting a wall - he thinks we’re on track for models smarter than nearly all humans at nearly all tasks by 2027 and that we are not adequately preparing for this eventuality.
Finally - I was incredibly validated to hear Dario say that renaming Sonnet 3.5 (new) to Sonnet 3.6 across their APIs and systems was “harder than training the models”. As someone currently engaged in determining how to name, communicate and update names for technical concepts (Hello dbt compile), it’s both comforting and, I suppose, a bit alarming that the frontier labs are having the same struggles.