The speed of analytics
Also: deconstructing the Data PM, spillover effect bias, a language for query modification and why Coalesce matters.
🎆🎉 It’s here! Coalesce week is nearly upon us. It is T minus 1 and I. Can’t. Wait. 🎉🎆
Tristan and I are both going to be in person in New Orleans, Connor is going to be representing us this week in Sydney, and Drew will be seeing you in London! All of us (and most of the dbt Labs team!) will also be hanging out at Coalesce Online 💻 in the dbt Community Slack — here you will find all the live session chatter and the best conversations. Wherever you’re joining from, come say hi 👋👋👋
My personal list of must-see sessions this week:
Monday:
Data - The Musical by Tiankai Feng
Empathy-building in data work by Lorena Vasquez
Operations vs. product: The data definition showdown by Nadja Jury
Tuesday: How to build your data team like a community by Kasey Mazza
Wednesday: From worst to first, revamping your dbt project to be world-class by Kelly Burdine & Michelle Ballen
Thursday: How to not be a terrible writer by Justin Gage
In this issue:
Our data assets are broken and data request forms are at the centre of why. By Harshavardhan Mamledesai
What is a Data Product Manager? An example from a role at Apple. By Eric Weber
The case for a query modification language. By Amit Prakash
Why Spillover Effects Bias Your A/B Testing Results and Ways to Overcome Them. By Weonhyeok Chung
Data Orchestration Philosophies and why Coalesce matters. By Mahdi Karabiben
Enjoy the issue!
-Anna
The speed of analytics
We’ve recently started building out our own Analytics function at dbt Labs. This past week, Erica Louie and I had a great time jamming on what it means to hire Analysts alongside Analytics Engineers. I’ll leave her to share that particular set of opinions, but in the process of doing so we also talked a lot about the speed of analytics 🤓
Let’s assume that the objective of an Analytics function in a company is to help make timely business decisions based on good data. If the operative word here is timely, what is the right SLO for an Analytics function? Is it measured in weeks, days, or hours?
The question I asked Erica was:
“What if all of that is too slow? What if the answer needs to already exist by the time the question is asked?”
Though it is 🎃 season, I’m not talking about divination.
I’m talking about shifting from the typically reactive approach to aiding in business decisions (wait for the question, then help answer it as quickly as possible) to a proactive approach — anticipating the sort of questions that are likely to come up based on what is happening in the business (Is it planning season? Is there an upcoming product launch?) as well as based on what questions are not being asked (Is there a part of the business, customer journey, or user problem that is unowned or less talked about? Why? What do we know about it?).
The implication here isn’t that the analytics team should know how to run a business better than the folks, well, running the business 😛 The idea is that an analytics team is really close to the data being generated by the business and spends far more time looking at it than the folks running the business.
If you add to this the curiosity and storytelling capability of a great analytics team, the result you get looks something like this:
“Oh hey, we haven’t really looked holistically at our product funnel in a while. I wonder what this looks like today. Oh interesting — that conversion rate is a lot higher/lower than I expected. I wonder if that’s a data issue or something that changed in the business. Looks like something has changed, and I have hypotheses about why that I’m going to test with some more data. Let me write a quick one-pager and show this to the rest of the team/bring this to product/engineering/company leadership to make sure they’re aware.”
Or maybe something like this:
“Product X is a great and established feature that I expect our customers to get lots of value from. Right now, most of the attention of our product/engineering/design function is going towards new launches. Let me see how this somewhat more established area of the business is doing, and how it’s contributing to the health of the business overall today. If I learn something interesting, I’ll write up some notes about it and share the next time folks are getting together for planning”.
I’m far from the first person to suggest this is needed (👋 Data Twitter). That part, I think, we can all agree on. The harder part is: how do we get there?
Yes, being proactive in analytics requires carving out time for investigations into things that haven’t hit your company’s radar yet. And yes, it also requires having good quality data already available to reference.
What if we could make both of those problems tractable by spending some time developing the right data assets, ones that describe our business the right way?
Harshavardhan Mamledesai has an idea for how to get there: focus on modeling customer touchpoints. In other words, according to Harshavardhan, you should develop your data assets (be they models, dashboards, or a semantic layer) with an orientation towards your business customer and the activities happening in the business that relate to that customer:
Data assets with clear attribution to the customer touchpoints which are themselves part of the larger sales and marketing strategy mean the data assets are relevant as long as the sales and marketing strategy is relevant. As the sales and marketing strategy evolves with the changing needs of the organizations so do the customer touchpoints. The data assets can then be version controlled to match the evolving needs of the strategy.
Here’s an example of what this could look like for a business:
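To make this concrete, here’s a minimal dbt-style sketch of what one such asset could look like. All model, source, and column names below are hypothetical, my own illustration rather than anything from Harshavardhan’s post:

```sql
-- fct_customer_touchpoints.sql (hypothetical dbt model)
-- Each touchpoint is expressed with the same interface and tagged
-- with the stage of the sales and marketing strategy it belongs to.

with trial_signups as (

    select
        account_id,
        signed_up_at as occurred_at,
        'trial_signup' as touchpoint,
        'acquisition' as strategy_stage
    from {{ ref('stg_product__trial_signups') }}

),

demo_requests as (

    select
        account_id,
        requested_at as occurred_at,
        'demo_request' as touchpoint,
        'acquisition' as strategy_stage
    from {{ ref('stg_crm__demo_requests') }}

),

renewals as (

    select
        account_id,
        renewed_at as occurred_at,
        'renewal' as touchpoint,
        'retention' as strategy_stage
    from {{ ref('stg_billing__renewals') }}

)

select * from trial_signups
union all
select * from demo_requests
union all
select * from renewals
```

Because every touchpoint shares the same shape, adding a new one as the strategy evolves is a matter of adding a CTE, not reworking everything downstream.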
We already do a lot of what the author is describing: attribution modeling that enables sellers to take action on incoming data about a prospective customer; measuring the success of the launch of a new product, price plan or feature; understanding customer health.
The difference from what we do today is being more systematic about building out the touchpoints. Very often, data teams have this type of work low on their priority list because it’s a larger lift and the payoff is not immediate. The effort to build out touchpoints in models is erratic at best when it's driven by inbound requests. And in turn, not having your entire journey mapped out in data makes it hard to see the bigger picture of what's happening in the business.
However, if you make the choice to systematically describe your business through your data model in ways that allow you to take action or evaluate the results of business action — then you shift into proactive analytics territory.
That doesn’t necessarily mean dropping everything you’re doing and retiring to a cave for a year to build this out in SQL.
Map it out in a diagram first. Make a plan for the pieces of data you need to be able to express this efficiently — what are the core entities that need to exist that are shared across this customer journey? What instrumentation is missing? What common interfaces can you leverage to describe transitions from one touchpoint to the next?
And then chip away at your plan, one touchpoint at a time.
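The “common interfaces” question above is worth dwelling on. If every touchpoint exposes the same columns, describing transitions from one touchpoint to the next becomes a single generic model. Continuing the hypothetical sketch from earlier (again, illustrative names only, not a prescribed implementation):

```sql
-- fct_touchpoint_transitions.sql (hypothetical dbt model)
-- Assumes every touchpoint shares the interface:
-- account_id, occurred_at, touchpoint.

select
    account_id,
    touchpoint as from_touchpoint,
    occurred_at as from_occurred_at,
    -- the next touchpoint this account reached, in time order
    lead(touchpoint) over (
        partition by account_id
        order by occurred_at
    ) as to_touchpoint,
    lead(occurred_at) over (
        partition by account_id
        order by occurred_at
    ) as to_occurred_at
from {{ ref('fct_customer_touchpoints') }}
```

With a model like this in place, questions like “how long does it take an account to get from demo request to renewal?” become simple selects rather than new projects.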
PS: If you enjoy these “looking under the hood of how dbt Labs does a thing” snippets, you might enjoy Tristan’s recent podcast on 20VC where he talks about the philosophy behind how the company was built, and the way he has put that into practice over the years.
Elsewhere on the internet…
In What is a Data Product Manager? Eric Weber breaks down a real-life Data PM job description, talking through the implicit (and explicit!) assumptions being made about the kind of background the ideal candidate should have and what they will be doing. Eric is inviting conversation in this thread, and I think it will be a very interesting one, so pop on over and leave your thoughts!
Amit Prakash makes the case for a query modification language, and I am into it. Enabling drill downs for meaningful data exploration in a visual way is one of the most time-consuming things a data team has to do, and the time invested rarely yields the desired value for the business because it is, by definition, limited to what can be anticipated ahead of time. Being able to do this on the fly can be life-changing. Amit does a great job describing how their team solved this problem and why they made the product choices they did, so I encourage you to read the original post. It’s a really great behind-the-scenes look at a major new user experience paradigm, and at the thoughtful process that went into creating the right user experience, not just the easiest one.
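For a sense of what “modifying a query” buys you, here’s a toy sketch of my own (purely illustrative SQL, not the language or design described in Amit’s post): a drill down becomes a mechanical rewrite of the query behind a chart, rather than a pre-built view someone had to anticipate.

```sql
-- Toy example only; not the actual query modification language
-- from Amit's post. Table and column names are hypothetical.

-- The base query behind a chart:
select region, sum(revenue) as total_revenue
from sales
group by region;

-- "Drill into region = 'EMEA', broken down by product" is a
-- mechanical rewrite: swap the grouping column and add a filter,
-- with no pre-built view required.
select product, sum(revenue) as total_revenue
from sales
where region = 'EMEA'
group by product;
```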
I love detailed write-ups on everything that can go wrong with experimentation. The latest post by Weonhyeok Chung is full of rich examples demonstrating how the very act of running an experiment can bias what you learn from it. 10/10 must-read before you embark on your next A/B test.
Finally, a shot of data espresso from Mahdi Karabiben, with some spicy opinions on the differing philosophies behind different orchestration tools, and also these very kind words about Coalesce:
If you never attended Coalesce before, I totally recommend doing so this year. You’ll learn quite a lot about the Modern Data Stack and how fellow data practitioners are doing more with third-wave data technologies. You’ll see how fun and engaging the dbt Slack is. You’ll experience how welcoming and diverse the data community is. And most importantly, you’ll feel that you belong - because you do. (emphasis original)
I’m not crying, you’re crying 😭
That’s it for this weekend, folks. SEE YOU AT COALESCE!
I’ll be the one with purple highlights in my hair and a giant grin on my face.