Can we just be lazy already?
In this issue: balancing investigation and action. Also: in defense of laziness. Plus tips on refining your data craft and some new dbt extensions.
👋 Happy “can you believe it’s already April?” weekend!
Hope you’ve been enjoying the guest posts from Winnie and Kshitij the past two weeks ✨ They add new dimensions to my thinking on a day to day basis, and I’m excited to see them back on the lineup again in the future!
This week is going to be easy on the thought leadership as y’all simmer and muse on your fresh ideas from DataCouncil, and because I’ve got a nice backlog of interesting writing and software for you 😉
Enjoy the issue!
-Anna
🌌🧠 (Galaxy brain) content
Just…
- by John Cutler
I can always count on John Cutler to make some new neural pathways in my brain. His latest post about the seductive appeal of reducing complexity, often prefixed by the word “just…”, is wonderful meta commentary on 2023 (so far).
Justing kicks in when the tension between the conceptual understanding of a problem's intricacies and the practical need to make decisions and take action reaches a breaking point.
The current macro-environment is a massive catalyst for the tension John describes above. When faced with pressure, we can end up making poorer decisions if we don’t set the dial correctly between exploring the complexity of the problem space and taking an action. Put differently, we’re more likely to fail if we either act too soon or delay action too long. And as the stakes get higher, this has a meaningful impact on how we work together as humans.
Here’s what I mean when I say that: everybody’s dial between research/investigation and action is set differently. Some folks spend more time exploring, others “just” really quickly. Not being aware of where the dial is set for yourself, or that it’s different for others around you, leads to folks talking past each other during problem solving, conflict, and stress on the team/your organization. The higher the stakes, the more folks retreat to their default dial positions.
Read John’s post for some good advice on how to shift more towards exploration when you need it, how to shift gears into action gracefully, and how to slow down with intention.
As you do that, don’t just think about your own predispositions, but also the humans you work with day to day and where their dials might be set. There are some good patterns in John’s article that can be useful when working with someone whose dial is set to a very different default from yours!
Good data engineers are lazy
- by Benoit Pimpaud
I’ve been enjoying Stephen Bailey’s symposium on orchestration a whole lot. The latest post in the series particularly resonates: I went down the path of analytics engineering because I was lazy. Pulling that same ol’ data real quick just didn’t sound like a good use of my time.
Benoit is making the same point about orchestration — how much work do we actually want to do here? This is especially relevant as we leverage more event based and operational workflows, the complexity of what we manage grows, and our definitions of “freshness” change:
Playing with words here, but even if that myriad of tools didn’t replace Airflow, they still highlighted something: we need a new central control plane to deal with our event-based reality and its infinitely running data flows.
Yessssss 👏
Refining your craft
Raise your hand if you have ever used a hexbin plot or a KDE plot 👀👀👀 I’m not sure that I have, but dang, did you know that you can make a much more interpretable scatter plot with one of those options?
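For the curious, here is a minimal sketch of the hexbin idea using matplotlib's `Axes.hexbin`. The data is synthetic and the `gridsize` of 40 is an arbitrary choice for illustration, not a recommendation from the linked post.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, heavily overplotted data (made up for this sketch).
rng = np.random.default_rng(seed=0)
x = rng.normal(size=20_000)
y = 0.6 * x + rng.normal(scale=0.8, size=20_000)

fig, (ax_scatter, ax_hex) = plt.subplots(
    1, 2, figsize=(10, 4), sharex=True, sharey=True
)

# Plain scatter: the dense middle turns into a solid blob.
ax_scatter.scatter(x, y, s=4)
ax_scatter.set_title("Scatter")

# Hexbin: count the points in each hexagon and color by that count.
# mincnt=1 leaves empty hexagons blank.
hexes = ax_hex.hexbin(x, y, gridsize=40, cmap="viridis", mincnt=1)
ax_hex.set_title("Hexbin")
fig.colorbar(hexes, ax=ax_hex, label="points per hexagon")
```

Call `fig.savefig(...)` or `plt.show()` to see the result. A larger `gridsize` gives smaller hexagons, so it is worth trying a few values before trusting the picture.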
Another one from Avi Chawla low key blowing my mind: did you know there’s a zorder parameter (matplotlib’s take on z-index) you can use to order the layers of your matplotlib plots? 🌌🧠💥
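A small sketch of layer ordering, assuming made-up data: in matplotlib the keyword is `zorder`, and artists with a higher value are drawn on top of those with a lower one, regardless of the order in which you call the plotting functions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
fig, ax = plt.subplots()

# Plotted first, but zorder=3 lifts the line above everything else.
(line,) = ax.plot(x, np.sin(x), color="black", linewidth=3, zorder=3)

# Plotted second, but zorder=1 keeps the band in the background.
band = ax.fill_between(x, -0.5, 0.5, color="gold", alpha=0.6, zorder=1)

# Sits between the band and the line.
points = ax.scatter(x[::10], np.cos(x[::10]), color="tab:blue", zorder=2)
```

Without explicit values, matplotlib falls back to per-artist defaults, which is why a filled area can sometimes end up hiding or sitting under something you did not expect.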
Do you have a standard for ordering columns in your dbt style guide? I love this particular example because it considers the UX of the consumer of the table, not necessarily the person building the underlying SQL ;)
It’s really fun to see the dbt Community Forum become a place where folks iterate and come up with neat solutions to thorny problems. If you’ve ever needed to break down a complex DAG that you’re unfamiliar with, this technique to identify root nodes and list out dependencies can help you cut through the complexity and get a handle on what you’re working with!
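To make the general idea concrete, here is a rough sketch in plain Python. It is not the method from the forum post: it assumes you have a mapping from each node to its direct parents (shaped like the `parent_map` section of a dbt `manifest.json`), and the node names below are hypothetical.

```python
from collections import deque

# Hypothetical stand-in for a dbt manifest's parent_map:
# node id -> list of node ids it depends on directly.
parent_map = {
    "source.shop.raw_orders": [],
    "source.shop.raw_customers": [],
    "model.shop.stg_orders": ["source.shop.raw_orders"],
    "model.shop.stg_customers": ["source.shop.raw_customers"],
    "model.shop.customer_orders": [
        "model.shop.stg_orders",
        "model.shop.stg_customers",
    ],
}


def root_nodes(parent_map):
    """Return the nodes with no parents, i.e. where the DAG starts."""
    return sorted(node for node, parents in parent_map.items() if not parents)


def upstream(parent_map, node):
    """Return every ancestor of `node`, found with a breadth-first walk."""
    seen = set()
    queue = deque(parent_map.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parent_map.get(current, []))
    return sorted(seen)


roots = root_nodes(parent_map)
ancestors = upstream(parent_map, "model.shop.customer_orders")
print(roots)
print(ancestors)
```

Starting from the roots and walking one model's ancestors at a time is usually easier to digest than staring at the whole lineage graph at once.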
If you write a lot of Jinja, you probably want to be able to debug the code you’re working on. Enter stage left: the debug macro + a great writeup by Benoit Perigaud.
Neat new dbt integrations and libraries
If you do a lot of work in a command line interface, you will probably enjoy this library that allows you to do a fuzzy interactive search of your dbt models.
fal.ai just announced a cloud based Python runtime that integrates with dbt’s new Python nodes 😍 ⚡
That’s all for this week! See you next time 👋
KDE and hexbin plots are great in two dimensions, but you have to choose a bandwidth/bin width parameter, which you'd be better off avoiding if a scatter plot suffices. Or you can layer the KDE contour lines on top of the scatter plot for the best of both worlds!
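A sketch of that layering idea, under some loud assumptions: the data is synthetic, the KDE is a hand-rolled isotropic Gaussian kernel in NumPy rather than a library estimator, and the bandwidth of 0.4 is picked by eye, which is exactly the kind of choice described above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def kde_on_grid(x, y, xx, yy, bandwidth):
    """Evaluate a 2D Gaussian KDE (same bandwidth on both axes) on a grid."""
    dx = xx[..., None] - x  # shape: (grid_y, grid_x, n_points)
    dy = yy[..., None] - y
    kernel = np.exp(-(dx**2 + dy**2) / (2 * bandwidth**2))
    # Normalize so the estimated density integrates to roughly 1.
    return kernel.sum(axis=-1) / (x.size * 2 * np.pi * bandwidth**2)


# Synthetic data (made up for this sketch).
rng = np.random.default_rng(seed=0)
x = rng.normal(size=500)
y = 0.6 * x + rng.normal(scale=0.8, size=500)

grid = np.linspace(-5, 5, 60)
xx, yy = np.meshgrid(grid, grid)
density = kde_on_grid(x, y, xx, yy, bandwidth=0.4)

fig, ax = plt.subplots()
ax.scatter(x, y, s=8, alpha=0.4)  # raw points stay visible
contours = ax.contour(xx, yy, density, levels=6, colors="black")  # KDE on top
```

Try a smaller or larger `bandwidth` and watch the contours go from spiky to oversmoothed; the scatter underneath keeps the picture honest either way.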