Discover more from The Analytics Engineering Roundup
Data expertise everywhere.
Why—eventually—the analytics organization will melt into the background.
New podcast episode! David Jayatillake spends a lot of time thinking about career progression + data team structure, and in this conversation we dive into the classic individual contributor vs manager conundrum, migrating between warehouses, and reactive vs proactive data workflows. My favorite part was about David’s heroic pre-Black Friday migration to Snowflake—never let a good crisis go to waste!
Get it here. And enjoy the issue!
On Coming Together
Sometimes dbt Community members1 freaking blow my mind. Read this whole thread for the wildest, most wondrous invitation to a conference I have ever witnessed.
If you haven’t yet registered for Coalesce 2021, do it now. ~8,000 folks have already registered, and I guarantee the event will be unlike any other virtual conference you’ve ever attended (unless, that is, you attended last year’s). Come to spend time with and learn from your peers—relationships that start at Coalesce live on in dbt Slack DMs for years.
We have spent the entire past year curating a fantastic list of speakers and working with them in depth on their talks. I seriously cannot wait.
On Data Leadership
Let’s start out, as usual, with a Benn post. The topic of analytics leadership—where it comes from, how to create it, where should it report into, what it should actually deliver to the org—has long been interesting to me. I got to actually live the role of analytics leader to ~30 distinct companies personally over the course of three years as a consultant from 2016 to 2019, and got to interact, in that process, with many folks who were doing the same work in-house. I noticed just how frequently companies were analytically underperforming, even with sophisticated CXOs, talented data practitioners, and good data tooling / hygiene.
That should be all of the things, right? Executive sponsorship, technology, human capital…shouldn’t that be all of the pieces of the puzzle? There’s at least one missing piece from that equation, although that piece often feels invisible.
Imagine it’s England in 1660. King Charles II wants to create the British Royal Society to advance the discovery of new knowledge for the betterment of the realm. Charles rounds up the smartest folks he can find—people like Newton, Boyle, Hooke, etc.—and gives them a pile of silver to fund their operations. There’s plenty of tea, snuff, tools, and reagents for everyone.
But in this alternate version of reality, no one had invented the scientific method. No one understood how to design experiments. No one understood how to create an organization that was tolerant of, and even celebrated, the inevitable failure that arises in the process of attempting to discover new things. In fact, Charles declared that if there were not at least one major discovery a year, heads would be separated from bodies.
Ok—a strange thought experiment, I’ll grant you. But sometimes to see our own situations clearly we must cast them in a brand new light.
My point is that raw ingredients are not enough. People do work. And people must work together in any endeavor of sufficient scope. People working together require a set of shared principles / knowledge / behaviors to enable this collaboration. Call it culture, call it society, call it best practices. Whatever.
I think that most companies today are roughly as good at operationalizing data—at building organizations with a reality-based epistemology—as that fictitious scenario above would indicate. An executive allocates some budget, hires some people, buys some software, and voila! Right?
Benn’s post is poking at this—his point is, essentially, we do not really know how to create organizations that output great analytical work, and this starts with the fact that our model for analytics leadership is wrong. He has some specific advice for how organizations could change their org charts to address this. There are totally good and valid ideas in there, and I don’t honestly have such specific thoughts at the tactical level. It’s a great post.
But there is an assumption that runs through the entire thread of the post, and through the structure of our analytics organizations today, that I want to question. It’s not a useful thought in the short term, but maybe could point to a long-term future that looks fundamentally different. That assumption is this: most professionals, at all levels of the business, are incapable of generating their own insights. And they need our help.
For most organizations, this is largely true today. I’ve written at length about how opaque the modern data stack is to non-analysts, and how it actually disempowers them. Today, organizations clearly do need teams of experts to help them understand the world around them.
But join me on one last trip through time. Imagine Mad-Men-era western businesses. Men (always men) in suits, dictating memos to dictaphones, later typed up by female secretaries.
This was not that long ago—50-60 years. That generation of executives never learned to type. It wasn’t something that they imagined they were supposed to know how to do. But I don’t know a single successful executive today who doesn’t type, and moreover, often it is the executives who are the most aggressive about hotkeys on their email client of choice and about inbox zero.
Most people overestimate what they can do in a day, and underestimate what they can do in a month. We overestimate what we can do in a year, and underestimate what we can accomplish in a decade.2
Things change. Whereas typing used to be a skill taught to a select few specialists, it’s now a requirement for all knowledge workers. It was the switch from hardware-based typewriters—hard-to-use, unforgiving3—to word processing software and the mouse that ushered in this change.
I believe that we are in an era where similarly large changes are afoot. We will look back on the centralized analytics team in much the same way that we today look at typing pools. Every knowledge worker will make decisions informed by large-scale and diverse observations of reality. Partially, this will be the result of advances in tooling that make complicated things simple. But partially this will be because new generations of professionals will see this skillset as foundational to their professional success.
This future, though is already here — it's just not very evenly distributed. One of the reasons I am so tremendously confident of this long-term trajectory is that I get to live in that world today. At dbt Labs, of our eight-member executive team, six of us are world-class analytics engineers. Our internal analytics team of three is primarily dedicated to creating and maintaining shared infrastructure and models used by the rest of the org. Roughly half of our 170 humans are very experienced data professionals. Even our sales org, sometimes the least technical team in a software business, has many folks with impressive analytical chops. One of our sales directors built a customer lookup data app in Python earlier this year and it’s become widely used ever since. (I can’t tell you how happy this made me when it first happened!)
This level of analytical sophistication throughout a company creates a very unusual environment.
Conversations are low-ego (truly).
We have very few conversations about data, it just forms a backdrop for almost every conversation in the business.
We do sometimes struggle to form strong opinions about things for which it is not yet possible to have data on—setting out on brand new strategic directions always feels scary—but we recognize that about ourselves and do our best to counteract it.
We typically expect analytical talent to live inside of operational areas of the business. Data isn’t who you are, it’s a competency you bring to the table.
We often hire generalists with analytical chops and teach them functional skills (marketing, sales, etc.). This has worked very well.
Every executive is expected to be able to personally traverse strategic issues using data. They may not build every dbt model but they must fundamentally drive the insight-generation process.
The overall awareness of how the business is performing on its most important metrics is quite high. Within a given team, the awareness of performance on team-specific metrics is extremely high.
Certainly, there are a lot of things unique to our history and industry that make us an anomaly on this dimension—my point is not look how cool we are. Rather, my point is: this will become commonplace. All of our organizations will look like this. People will evolve, tools will evolve.
I think sometimes we forget the long view, and I think it’s important to keep it in mind. The future we’re all trying to create together is more transformative than is sometimes comfortable to think about.
From elsewhere on the internet…
The whole thread is fantastic!
❄️ Snowflake’s Snowpark now supports Python! It’s in private preview—if you have access I’d love your thoughts. I can’t even see docs yet :( I’m cautiously optimistic…this could go a long way towards simplifying and unifying analytics engineering tech stacks.
💡 Erika Pullum maintains a list of dbt tips! Really really good stuff…
🧑🏫 Sarah Krasnik on why you shouldn’t try to build reverse ETL yourself:
As someone who has built reverse ETL jobs before tooling for this type of work was largely available, I have one important takeaway: When bespoke reverse ETL implementation is done, the ongoing maintenance time will snowball as APIs change, new fields are added, or data grows. In short, while a DIY solution may seem like a great way to move data from A to B, you’ll save some time, sanity, and engineering effort by adopting a pre-built reverse ETL tool to automate an otherwise grueling task.
File this under “blog posts to pull out when someone asks you to build it yourself”.
💵 Lots of fundraising going on:
🏁 The benchmarking wars have escalated! Since the first post, Snowflake ran its own benchmark and then Databricks responded again. The final Databricks post feels like it contains a real smoking gun, and if you’re into this sort of drama (which, sadly, I am), I highly recommend you read it. It’s an unusually spicy take from a generally-staid vendor. Drama aside, it is noteworthy that even in Snowflake’s own claims, Databricks outcompetes it on price-to-performance ratio.
🗄️ Is the data warehouse the new backend? This is a trend that many smart people believe in; I have a hard time grokking how it would actually work in practice. I need to dig in more here and would love your thoughts.
Ok, yes, Jillian is officially a member of the dbt Labs team…but she’s been creating dbt Community magic like this since long before we hired her.
Have you ever typed on a physical typewriter? What a miserable experience.