One of my favorite stories about how dbt came to be, and why I think it has been successful in a counterintuitive way, centers on our understanding of analytics personas.
In this issue I want to tell a bit of that story and then link it back to the writing / thinking I’m doing about the ADLC (analytics development lifecycle) these days.
First, the story.
======
Ten years ago, I was a bit of an unusual creature. Today, there are a lot of people with my particular mix of professional skills, but a decade ago there weren’t.
Specifically, I had three things:
Strong analytical skills
Deep business knowledge in a few domains
Just enough software development skills to be dangerous
A decade ago it was not unusual to have one of those. Two of them could give you an unfair advantage. But if you had all three, you could do some pretty neat things.
So when I went to start a data consultancy called Fishtown Analytics, I started with slightly different priors. All of the work I was going to do was going to adhere to software development best practices. My clients didn’t really care about this at the outset, but I knew it was going to help me deliver a ton of business value per hour (ultimately the most important metric for any consultant). This ended up being true, and it drove our success as a consulting business.
And when I needed to build some tooling to help me implement software engineering best practices in data, I built it assuming that the user was modestly technical. Not highly technical, not primarily interested in thinking about software architecture all day, but technical enough to do some historically-atypical things.
Such as: use git. Such as: use the command line. Such as: write Jinja macros.
If you’ve only been in data for <5 years, these behaviors may seem normal to you. In 2015 they were not. And I know this for two specific reasons:
When I showed data practitioners dbt back in 2016, almost none of them wanted to use it. Data engineers didn’t want to use it because they preferred to just write Spark jobs. Data analysts didn’t want to use it because they found it to be too technical, too hard to use.
When I showed dbt to VCs back in 2016 (I pitched maybe half a dozen folks just out of curiosity), they all told me that they weren’t confident there was a large market for a product like this. Products built for data analysts were typically visual, and products built for data engineers didn’t use boring languages like SQL.
I was totally fine with this. I didn’t intend for dbt to be a widely-used product, I just intended it to be the way I created exceptional client outcomes and supported an awesome hourly rate. I thought it was neat that a few other weird folks like me also found dbt to be useful, but that was never really the point.
The first people that joined the dbt community were risk takers, confident in their own abilities, willing to say ‘fuck it, I’ll learn the CLI’ even if that was a totally new skill. And when this paid dividends for them, they told others. The rest of the story is just an exponential function playing out over time.
Over time, people are like water. Water always travels to the lowest point if it has a path to get there. People always develop the skills that help them be maximally effective…if they have a path to get there!
Betting against people, their initiative, and their intelligence is always a bad bet. If a certain group of people seems like they’re underachieving, it’s not because they’re stupid and unmotivated, it’s because they’re blocked. dbt allowed data analysts to build mature data pipelines and significantly magnify their impact as professionals. It got them noticed, got them promoted. It unblocked them. It never underestimated them.
I always think about those VCs who, in 2016, told me that there was no market for dbt. They weren’t wrong—there was no market for a product like dbt in 2016. The market had to be created. And that required two specific things:
The actual result had to be 10x better. No one is going to learn challenging new skills to be 30% better; that type of hard work has to be rewarded with an order of magnitude improvement. The only way to know whether something is an order of magnitude better at the outset is to experience it for yourself.
There had to be a culture in at least a part of the data practitioner community that valued technical skills. In 2016 there were a LOT of data analysts who were envious of their data scientist peers and wanted to “get more technical” so that they could increase their earning potential. This meme was already present.
Both of these things ended up being true; it was just really hard for VCs to observe this from the outside back in 2016. From the inside, it was obvious.
======
Why do I share this story now? Is senility setting in? Am I longing for the good old days?
You’ve stuck with me this long; indulge me for just a bit longer.
I mentioned in my last issue that I’m writing a rather lengthy whitepaper right now about the ADLC. In it, I write a bit about personas, and I split basically all data work into three of them. Here’s the snippet:
The Engineer
The engineer creates reusable data assets: pipelines, models, metrics, etc. The engineer is primarily focused on creating data assets that will be used by others to create business value.
The Analyst
The analyst performs in-depth analysis that drives decision making. The analyst does not make decisions; their role is quantitative investigation and presenting analysis and/or recommendations to the decision maker.
The Decision Maker
The decision maker is responsible for taking quantitative outputs and translating them into action for the business. This is inclusive of everyone from a campaign manager optimizing segmentation to a CEO directing the resources of an entire company.
These are not job titles, they are roles that humans play in the ADLC.
They’re also not revolutionary. You could maybe pick at the details, but likely this seems fairly uncontroversial. Here’s the important point, though:
Analytics is not an assembly line.
You cannot disassemble an analytical problem and hand it out to a set of different humans and have them all come back together with an answer. Well, you can—but you can’t expect this to get you good outcomes.
Analytics cannot be effectively treated as an assembly line because analytics is an iterative process that involves asking and answering questions, gathering data, poking at it, getting curious, getting stuck in dead ends, and realizing that the fact you learned way over here is actually the answer to this question way over there.
Analytics requires a neural network—currently, a human!—interacting with a really good computer. And neural networks do not cleanly submit to industrial logic, to mechanization.
When you try to treat analytics like an assembly line you do get predictable outcomes (likely dashboards!), but not insights that drive ROI. You certainly don’t get agility or velocity. Insight, agility, and velocity in analytics require curiosity, flexibility, integration.
The best data teams allow talented people to flex between these different roles. They allow them to take an idea and get curious about it, to explore it without needing to file a ticket or wait for anyone else. Getting stuck in someone else’s queue is where non-linear ideas go to die.
Here’s my mental model for how personas work in data.
The above three personas are like hats you put on. You have your primary hat, the one you like wearing best. But over the course of the day as you get pulled into solving real problems, you will be best served by putting on different hats.
The most effective data practitioners can wear all three hats. And the best data tooling enables as many people as possible to wear all three hats. Even with great tooling, you will still have a hat you prefer. But the ability to wear all of them as the situation demands is an absolute superpower—it allows you to complete a single thought yourself, without getting stuck behind someone else’s priority list.
I believe most data tools get this wrong. I believe this is because classic product management teaches us to focus on a single persona and then ask how that persona wants to get a particular job done. This mindset assumes people are fixed.
I prefer to ask the question: how do we create product experiences that have a gentle learning curve, that take every user on a journey that leads them to non-linear outcomes? How do we make it possible for many people, each on their own journeys, to collaborate and achieve something incredible? This mindset assumes people can and will grow.
People are smart and motivated. Underestimate them at your peril.
======
I’m getting close on wrapping the ADLC whitepaper. Expect a link soon. If you’re not subscribed today and want to read it, just subscribe and you’ll get the link as soon as it’s out.
Also: can’t wait to see you at Coalesce :D
- Tristan
Completely agree with everything in here Tristan. AI is going to start blurring the lines even more between these roles since a single person will easily be able to flex into different areas. The best data analytics platforms will be agile data analysis platforms, not the ones that require rigid, linear modeling.