The speed of analytics
Also: deconstructing the Data PM, spillover effect bias, a language for query modification and why Coalesce matters.
It's here! Coalesce week is nearly upon us. It is T minus 1 and I. Can't. Wait.
Tristan and I are both going to be in person in New Orleans, Connor is going to be representing us this week in Sydney, and Drew will be seeing you in London! All of us (and most of the dbt Labs team!) will also be hanging out at Coalesce Online in the dbt Community Slack, where you will find all the live session chatter and the best conversations. Wherever you're joining from, come say hi!
My personal list of must-see sessions this week:
Monday:
Data - The Musical by Tiankai Feng
Empathy-building in data work by Lorena Vasquez
Operations vs. product: The data definition showdown by Nadja Jury
Tuesday: How to build your data team like a community by Kasey Mazza
Wednesday: From worst to first, revamping your dbt project to be world-class by Kelly Burdine & Michelle Ballen
Thursday: How to not be a terrible writer by Justin Gage
In this issue:
Our data assets are broken and data request forms are at the centre of why. By Harshavardhan Mamledesai
What is a Data Product Manager? An example from a role at Apple. By Eric Weber
The case for a query modification language by Amit Prakash
Why Spillover Effects Bias Your A/B Testing Results and Ways to Overcome Them. By Weonhyeok Chung
Data Orchestration Philosophies and why Coalesce matters by Mahdi Karabiben
Enjoy the issue!
-Anna
The speed of analytics
We've recently started building out our own Analytics function at dbt Labs. This past week, Erica Louie and I had a great time jamming on what it means to hire Analysts alongside Analytics Engineers. I'll leave her to share that particular set of opinions, but in the process of doing so we also talked a lot about the speed of analytics.
Let's assume that the objective of an Analytics function in a company is to help make timely business decisions based on good data. If the operative word here is timely, what is the right SLO for an Analytics function? Is it measured in weeks, days, or hours?
The question I asked Erica was:
"What if all of that is too slow? What if the answer needs to already exist by the time the question is asked?"
Though it is spooky season, I'm not talking about divination.
I'm talking about shifting from the typically reactive approach to aiding in business decisions (wait for the question, then help answer it as quickly as possible) to a proactive approach: anticipating the sort of questions that are likely to come up based on what is happening in the business (Is it planning season? Is there an upcoming product launch?), as well as based on what questions are not being asked (Is there a part of the business, customer journey, or user problem that is unowned or less talked about? Why? What do we know about it?).
The implication here isn't that the analytics team should know how to run a business better than the folks, well, running the business. The idea is that an analytics team is really close to the data being generated by the business and spends far more time looking at it than the folks running the business do.
If you add to this the curiosity and storytelling capability of a great analytics team, the result you get looks something like this:
"Oh hey, we haven't really looked holistically at our product funnel in a while. I wonder what this looks like today. Oh interesting: that conversion rate is a lot higher/lower than I expected. I wonder if that's a data issue or something that changed in the business. Looks like something has changed, and I have hypotheses about why that I'm going to test with some more data. Let me write a quick one-pager and show this to the rest of the team/bring this to product/engineering/company leadership to make sure they're aware."
Or maybe something like this:
"Product X is a great and established feature that I expect our customers to get lots of value from. Right now, most of the attention of our product/engineering/design function is going towards new launches. Let me see how this somewhat more established area of the business is doing, and how it's contributing to the health of the business overall today. If I learn something interesting, I'll write up some notes about it and share them the next time folks are getting together for planning."
I'm far from the first person to suggest this is needed (hi, Data Twitter). That part, I think, we all agree on. The harder part is: how do we get there?
Yes, being proactive in analytics requires carving out time for investigations into things that haven't hit your company's radar yet. And yes, it also requires having good-quality data already available to reference.
But what if those problems became much more tractable once we spent some time developing the right data assets, ones that describe our business the right way?
Harshavardhan Mamledesai happens to have an idea for how to make this happen: focus on modeling customer touchpoints. In other words, according to Harshavardhan, you should develop your data assets (be they models, dashboards, or a semantic layer) with an orientation towards your business's customer and the activities happening in the business that relate to that customer:
Data assets with clear attribution to the customer touchpoints, which are themselves part of the larger sales and marketing strategy, mean the data assets are relevant as long as the sales and marketing strategy is relevant. As the sales and marketing strategy evolves with the changing needs of the organization, so do the customer touchpoints. The data assets can then be version controlled to match the evolving needs of the strategy.
Here's an example of what this could look like for a business:

We already do a lot of what the author is describing: attribution modeling that enables sellers to take action on incoming data about a prospective customer; measuring the success of the launch of a new product, price plan or feature; understanding customer health.
The difference from what we do today is being more systematic about building out the touchpoints. Very often, data teams put this type of work low on their priority list because it's a larger lift and the payoff is not immediate. When it's driven by inbound requests, the effort to build out touchpoints in models is erratic at best. And in turn, not having your entire journey mapped out in data makes it hard to see the bigger picture of what's happening in the business.
However, if you make the choice to systematically describe your business through your data model in ways that allow you to take action or evaluate the results of business action, then you shift into proactive analytics territory.
That doesn't necessarily mean dropping everything you're doing and retiring to a cave for a year to build this out in SQL.
Map it out in a diagram first. Make a plan for the pieces of data you need to be able to express this efficiently: what are the core entities that need to exist that are shared across this customer journey? What instrumentation is missing? What common interfaces can you leverage to describe transitions from one touchpoint to the next?
And then chip away at your plan, one touchpoint at a time.
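To make this a bit more concrete, here is a minimal sketch of what a unified touchpoint model could look like in dbt-style SQL. Every name in it (stg_product_signups, stg_support_tickets, customer_touchpoints) is hypothetical and just for illustration; the point is the shape. Each source of customer activity gets normalized into a common interface (customer, touchpoint type, timestamp), so the whole journey can be queried as a single asset:

```sql
-- customer_touchpoints.sql (hypothetical dbt model)
-- Each upstream staging model is normalized into a shared interface:
-- one row per customer interaction, with a common set of columns.

with signups as (

    select
        customer_id,
        'product_signup' as touchpoint_type,
        signed_up_at as occurred_at
    from {{ ref('stg_product_signups') }}

),

support_tickets as (

    select
        customer_id,
        'support_ticket' as touchpoint_type,
        opened_at as occurred_at
    from {{ ref('stg_support_tickets') }}

),

unioned as (

    -- Adding a new touchpoint means adding one more CTE to this union
    select * from signups
    union all
    select * from support_tickets

)

select
    customer_id,
    touchpoint_type,
    occurred_at
from unioned
```

Once a model like this exists, adding a touchpoint is one more CTE in the union, and the proactive questions above ("what does our funnel look like today?") become a group-by rather than a net-new project.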
PS: If you enjoy these "looking under the hood of how dbt Labs does a thing" snippets, you might enjoy Tristan's recent podcast on 20VC, where he talks about the philosophy behind how the company was built and the way he has put that into practice over the years.
Elsewhere on the internet…
In What is a Data Product Manager?, Eric Weber breaks down a real-life Data PM job description and talks through the implicit (and explicit!) assumptions being made about the kind of background the ideal candidate should have and what they will be doing. Eric is inviting conversation in this thread, and I think it will be a very interesting one, so pop on over and leave your thoughts!
Amit Prakash makes the case for a query modification language, and I am into it. Enabling drill-downs for meaningful data exploration in a visual way is one of the most time-consuming things a data team has to do, and the time invested rarely yields the desired value for the business because it is, by definition, limited to what can be anticipated. Being able to do this on the fly can be life-changing. Amit does a great job describing how their team solved this problem and why they made the product choices they did, so I encourage you to read the original post. It's a really great behind-the-scenes look at a major new user experience paradigm, and at the thoughtful process that went into creating the right user experience, not just the easiest one.
I love detailed write-ups on everything that can go wrong with experimentation. The latest post by Weonhyeok Chung is full of rich examples demonstrating how the very act of running an experiment can bias what you learn through that experiment. A 10/10 must-read before you embark on your next A/B test.
Finally, a shot of data espresso from Mahdi Karabiben, with some spicy opinions on the differing philosophies behind different orchestration tools, and also these very kind words about Coalesce:
If you never attended Coalesce before, I totally recommend doing so this year. You'll learn quite a lot about the Modern Data Stack and how fellow data practitioners are doing more with third-wave data technologies. You'll see how fun and engaging the dbt Slack is. You'll experience how welcoming and diverse the data community is. And most importantly, you'll feel that you belong - because you do. (emphasis original)
I'm not crying, you're crying.
That's it for this weekend, folks. SEE YOU AT COALESCE!
I'll be the one with purple highlights in my hair and a giant grin on my face.