Going to spare you all the think piece this week and get back to the roots of the Roundup - great articles written by data practitioners solving problems at the frontiers of data work. Buckle up - these are some important reads:
Reflecting on my tenure at the City of Boston by Jenna Jordan
Jenna Jordan has written a masterpiece of a reflection that is entering my personal canon of great analytics engineering posts. It accomplishes two things extremely well:
A hands-on, practical story of how to upskill in analytics engineering and become an effective data practitioner
A detailed lens into the ways dbt and analytics engineering can be impactful in the public sector
Jenna tells the story of her two years working in analytics at the City of Boston. She breaks it down into her year of learning where she got the foundational skillset in place, and her year of doing where she used those skills to ship major changes to the City’s data infrastructure.
Jenna made so many smart moves, including building up a network of city analytics professionals across the country.
As a member of the City of Boston Analytics Team, I didn’t just want to learn from my fellow Boston teammates - I also wanted to learn from data analytics practitioners in other city governments. Fortunately I was not alone, and other teammates also wanted to cultivate a network of city analytics professionals! We started by reaching out to former teammates who had moved on to work for other cities, who could then reach out to their new teammates. We also started with a couple of motivating topics - specific tools that we all used or initiatives we were all trying to work on (e.g. knack, open data portals).
There’s great information in here about the process element of kicking off a data project - it’s more than just the technical details, the process and people aspects are just as critical.
you could argue that the dbt migration project I kicked off in year 2 was just one very long process improvement project. I was an individual contributor, not a manager, but I was still able to propose and implement an improvement that would substantially change how the team worked with the data warehouse. I had essentially spent a year constructing an internal process map of how the data warehouse functioned, and I could identify many instances of how implementing dbt would reduce waste in that process.
And of course - she wrapped it all up by giving a banger of a talk at Coalesce 2023
Attending Coalesce was one of my favorite experiences of 2023. The community of data practitioners that gathers at Coalesce is truly amazing - the online friends I had not yet met in person (including my cospeakers!) and the new friends I was able to meet through the talks and social events, the impressive speakers sharing their work, the professional support networks (Data Angels!), even the dbt Labs folks who had been providing me assistance via slack… the people are the reason why I want to go back and attend Coalesce next year (and the year after that!).
On that note - the Coalesce call for proposals closes this Tuesday - apply now.
As a native Bostonian, this post was a joy to read. And on the Boston note, it’s high time I publicly share my internal picture from the dbt Labs Employee Directory.
Scaling to Count Billions: How we built a scalable and reliable content usage counting service by Sangzhuoyang Yu
If there’s one thing I love as much as a public sector implementation of dbt, it’s a detailed walkthrough of a data-intensive operational use case running on top of dbt, and the Canva team has delivered that in spades.
The problem: Canva needed a scalable solution to calculate creator payouts on their platform, with billions of events each month and volume doubling every 18 months.
The solution: you’ll have to read the post for the full architecture, but it’s a deep and engaging exploration of how to solve problems at the highest levels of scale and precision.
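The core correctness problem in a payout pipeline like this - counting billions of events without ever double-paying a creator - comes down to deduplication. A toy sketch of idempotent counting (this is not Canva’s implementation; `event_id` and `creator_id` are invented field names, and at their scale the dedup would happen in the warehouse rather than in memory, but the invariant is the same):

```python
from collections import defaultdict


def count_usage(events):
    """Count content usage per creator, deduplicating by event_id.

    Delivery systems retry and replay, so the same event can arrive
    more than once; keying on a unique event ID keeps counts exact.
    """
    seen = set()
    counts = defaultdict(int)
    for event in events:
        if event["event_id"] in seen:
            continue  # duplicate delivery: skip, don't double-count
        seen.add(event["event_id"])
        counts[event["creator_id"]] += 1
    return dict(counts)
```

In a warehouse-based pipeline the same idea shows up as a `row_number()`- or `qualify`-style dedup step keyed on the event ID before aggregation.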
One theme I keep coming back to is Analytics Engineering Everywhere. As dbt continues to build out the data transformation standard, we’re going to see it in more places, performing more business-critical workloads in more production systems.
Oops by Benn Stancil
A common complaint of data professionals is that we do all of this work to create metrics, dashboards, and KPIs, and then decisions often get made based off of “vibes”. This is absolutely pointing at a real thing - and the real thing is that organizations tend to have strong heuristics, shared understandings of how the world works.
We have two options:
Attempt to fundamentally alter how humans process information so that we all become Spock and make decisions purely based off of cold, hard System 2 thinking
Get good at understanding our organizational heuristics and do the work to update them based off the best information we have available to us
Benn argues here that we should do the latter. I’d tend to agree.
Or, to put it in a format more comprehensible for system 1 thinking:
dbt unit-test framework by Matthieu Bonneviot
Did you know that unit tests are coming to dbt? The team at Teads does, and they’ve generously shared their learnings with all of us. They get into some great details in this post, such as the functionality for defining upstream models while writing unit tests and how to use unit tests to handle variable overrides.
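For readers who haven’t seen the framework yet, here’s a minimal sketch of what a dbt unit test looks like. Model and column names are invented, and this follows the YAML syntax that shipped with dbt’s unit testing feature, which may differ slightly from the pre-release version the Teads team worked with:

```yaml
unit_tests:
  - name: test_is_valid_email_address
    model: dim_customers            # the model under test
    given:                          # mock rows for upstream models
      - input: ref('stg_customers')
        rows:
          - {customer_id: 1, email: cool@example.com}
    expect:                         # assert on the transformed output
      rows:
        - {customer_id: 1, is_valid_email_address: true}
    overrides:                      # pin vars/macros for the test run
      vars:
        email_validation_enabled: true
```

The `given` block is what makes these tests atomic: you define the upstream inputs inline rather than depending on warehouse state, which is exactly the faster-to-write, faster-to-debug property the Teads post highlights.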
The conclusion:
This unit-test framework is definitively nicer to use than using dbt_utils.equality: faster to write and debug, more atomic tests, clear output. It has become the new standard at Teads.
The launch of unit tests is the most significant development for testing in dbt … potentially since tests were first introduced. There’s still a ton to learn about how and when to implement these into your projects and how we can make them more useful - please do share any thoughts and feedback you have as you are implementing.
Stay tuned for a future Roundup where a special guest will go deep on dbt unit tests.
This newsletter is sponsored by dbt Labs. Discover why more than 30,000 companies use dbt to accelerate their data development.