Funnel analytics and AI models for event sequences
Misha Panko, co-founder and CEO of Motif Analytics, on moving from "what" questions to "why" and "how"
Misha Panko has worked in data for a long time, including on high performance data teams at Uber and Google. Today, Misha is the co-founder and CEO of Motif Analytics, a product focused on helping growth and ops teams understand their event data.
In this episode, Tristan and Misha nerd out about the state of the art in computational neuroscience, where Misha got his PhD. They then go deep into event stream data and how it differs from classical fact and dimension data, and why it needs different analytical tools. It's possible to answer specific event-based questions using common tooling and practices today, but it always feels like you're only scratching the surface of the questions that you could ask.
Make sure to check out the back half of the episode, where they dive into AI and how Motif is applying breakthroughs in language modeling to train foundation models of event sequences—check out his team’s blog post on their work.
Key takeaways from this episode.
Organizations often derive a lot of the value from analytics by answering basic questions about the state of the business, and even when questions are more exploratory, they are fairly simple. Where do you see differentiation in analytical value between companies like Google and Uber compared to early stage companies?
Misha Panko: I think you might be surprised how much Google and Uber are similar to smaller companies when it comes to where the value comes from. A lot of it is still about getting basic facts to the decision makers when they ask for them.
The very typical loop is a product manager working next to an analyst or data scientist. The product manager keeps asking questions: for a deck, to help analyze an experiment, for some dashboard, or just a custom question. That has been my experience at almost every company.
My understanding is that at Motif, you're significantly less focused on that type of analytical flow and much more focused on a type of exploratory analytics that is very event stream based and helps you optimize the flow of customers through large scale digital systems.
I was a product manager on data platform teams, internally serving a lot of different growth teams and product teams. And often I needed to help frame analytics.
One of the best lenses that I found was talking about analytics as a pyramid. At the bottom, when you establish analytics, people want to answer “what” questions—this is what I think of as reporting. Usually they just want to know the number of users on the platform, the revenue, and they want to build dashboards and monitor that. That's the bread and butter of analytics that we usually think about.
But the north star of analytics is to actually guide decisions in business.
To guide decisions, you need to answer “why” questions and not “what” questions.
Exactly. If I didn't have this data, would I make a different decision about launching this feature versus that feature, or expanding into this country versus that country? I would probably make a decision some way, based on hunches, based on my mental model of the business.
But if I have data coming in, can it influence and change my decision? That's where the real value comes in. And when you start thinking about those types of questions, they are rarely just “give me a number.” They are more about the why.
The next level of the pyramid is answering the why questions or root causes. When I'm looking at my metrics, why did it suddenly go down?
The top level is where analytics is guiding and answering the “how” question. I don't want to just understand why metrics are moving, I want to understand how I can move my metrics in the direction that I want. It goes from what to why to how, but it's always a pyramid.
You can't come into an organization and say, let’s just start guiding, because first you need to get the bottom layer down. And that's where I see 80 percent of companies are. They are still building that foundation, and that's where a lot of data and data tooling work is.
We as a field are making a lot of progress with companies like dbt helping organizations get to the next level to answer why and how questions.
In most data domains, you actually don't have sufficient data to answer why questions very effectively.
That's a good point. I think that you're touching on a lot of things here, including causality. We're after causality. We want to find what drives the business.
Now, as you say, causality is very hard. You usually don't have enough data. You also cannot isolate variables very easily the way an experiment can. That's why A-B testing is the gold standard, but you can't make every decision based on A-B tests. And so what can you do?
I'm a practical guy and I help make practical decisions. PMs very often have to thread the needle. On one end, you have the total causality of A-B tests, where you can make very definitive decisions based on causality. It rightfully became the bread and butter of how analytics needs to be run: once you become mature enough, you set up experimentation infrastructure in your company and run experiments.
But even at huge companies like Google and Uber, only a small fraction of everyday decisions end up being made on A-B tests. And that's because it is pretty expensive to run one, right? You have to actually build the feature that you are testing. And then it usually takes anywhere from a couple of weeks to a month to run the test, analyze the experiment, and make a decision.
It's a slow process. So what do we have on the other end of the spectrum? Correlations. We have an idea, A influences B—say, people seeing a certain feature affects whether they subscribe—so let's just plot the correlation between the two.
Unfortunately, the mantra is correlation is not causation. And there are too many correlations that exist in the data. So can we do something better, between correlation and the A-B test? This is something we call causal opportunities, or practical causality.
And what it means is that you take into account the data that you have to try to de-confound, or look at other possible explanations, and still say that my hypothesis that A influences B holds. It's a much stronger version of correlation. It's not fully causal, but what it allows you to do is really narrow the space of potential ideas or hypotheses to work on.
Confounders are such a powerful concept. There's some underlying characteristic that you're not able to directly observe, but it's kind of showing up in the data. How do you de-confound that?
In general, there are statistical methods to do that. Usually you take the other variables you suspect and consider them together as a group, treating all of them as potential independent variables affecting the dependent variable that you're looking at. If in the presence of that other confounder you still see the effect, then it's probably real; at least it has been de-confounded against that variable.
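One simple way to check a hypothesis against a suspected confounder is stratification: split the data by the confounder and see whether the effect survives within each stratum. Here is a minimal sketch with made-up data, where the hypothesis is that seeing a banner (A) drives subscribing (B) and the user's platform is the suspected confounder. All names and numbers are illustrative.

```python
# Hypothetical records: (saw_banner, subscribed, platform)
from collections import defaultdict

users = [
    (True, True, "ios"), (True, False, "ios"), (False, False, "ios"),
    (True, True, "android"), (False, True, "android"),
    (False, False, "android"), (True, True, "ios"), (False, False, "ios"),
]

def subscribe_rate(rows):
    """Fraction of rows where the user subscribed."""
    return sum(1 for _, sub, _ in rows if sub) / len(rows) if rows else 0.0

# Stratify by the confounder, then compare exposed vs. control
# subscription rates *within* each stratum.
by_platform = defaultdict(list)
for row in users:
    by_platform[row[2]].append(row)

for platform, rows in by_platform.items():
    exposed = [r for r in rows if r[0]]
    control = [r for r in rows if not r[0]]
    print(platform, subscribe_rate(exposed), subscribe_rate(control))
```

If banner viewers subscribe more in every platform stratum, the effect is at least de-confounded against platform; it is still not proof of causality, only a stronger signal than a raw correlation.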
This is the space that Motif operates in. Broadly, you're trying to occupy the space between A-B testing, which is extremely expensive but gives you a tremendous amount of confidence about causality, and general reporting, which looks at correlations but doesn't really try to make statements about causality. What is the Motif product experience?
The Motif experience is actually very different from traditional tools. We do a lot of education; we still don't let people just self-onboard. We have a session with them first to explain the main concepts. We wanted to make the trade-offs differently from how traditional reporting tools make them.
In reporting, you usually pre-think the questions that could be answered. In exploration, what we wanted to unlock is that you don't know the questions a priori. You might have starting questions, but you don't know where they're going to take you, depending on what you find. And so what's critical is a fast feedback loop and the ability to touch a lot of data.
That's why we go to the original raw events that you have, rather than standardized tables.
If you're working with a very large dataset, it will be sampled. Then we use rich visualizations to show it to you. Based on that, you start understanding: is it going where I need it to go? Do I need to modify my query a little? Or maybe I'm ready now to run the query to the end. It’s a three-step process: ask, compute, visualize.
What types of questions are people asking in this interface? Do they use code to express them? Do they use visual experiences to express them?
So you can ask the same questions that you ask with traditional tools. How many users did action A? How many searches per session does a user do? You start with those questions, but sequences also allow you to match patterns on them.
You can start answering questions like: how often does A happen before B? How often do people look at this banner before they subscribe? You start getting into these relationship questions. Does A precede B? How far before? How many times does it happen?
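The "does A precede B" style of question can be sketched as a small pattern match over per-session event lists. This is a minimal, hypothetical illustration (event names like "view_banner" and "subscribe" are made up), not Motif's query engine:

```python
def a_before_b(session, a="view_banner", b="subscribe"):
    """True if event `a` occurs before the first occurrence of `b`."""
    seen_a = False
    for event in session:
        if event == a:
            seen_a = True
        elif event == b:
            return seen_a  # had we seen A by the time B first happened?
    return False  # B never happened in this session

sessions = [
    ["open", "view_banner", "search", "subscribe"],
    ["open", "subscribe"],
    ["open", "view_banner"],
]

# "How often does A happen before B?" among sessions that contain B.
with_b = [s for s in sessions if "subscribe" in s]
rate = sum(a_before_b(s) for s in with_b) / len(with_b)
print(f"A precedes B in {rate:.0%} of sessions containing B")
```

Real sequence queries add time windows, dimensions, and repetition counts, but the core operation is this kind of ordered scan over each session.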
This becomes more about exploring what affects what.
My experience doing event stream analytics is that you're sitting in front of this pile of data, and even if you have the ultimate technical capability to traverse this data, it is actually just challenging to know what hypotheses to test for. I would love for you to give your thinking around AI here.
Clickstream data, or event data, can be overwhelming, and you’ve got to narrow it down. In fact, that's usually why we see people come to us at Motif. They want to see how to approach this. They want to understand their funnels and figure out whether A affects B, but the space is too big or not well defined.
We start with what we call outcomes. Usually you are pretty good at knowing what you care about in the business. Are you trying to grow subscriptions, the hours people spend on the platform, things like that?
That's your outcome. And now what I'm interested in looking at are the predictors. The space is still pretty wide, because you can look at all the events that happened before, at all the dimensions on those events, but also at any sub-sequences of events.
To be able to go through all of that, you have to restrict the search space. And usually, when people go a little beyond correlations, they start building decision tree models, and then they test between features, creating a custom model to see if they're affecting the outcome.
It takes a lot of time to construct these features. That's usually 80 percent of the work: not running the model, but getting the data in place. And then you're still working with a very restricted space. So we're looking at ways to expand that search while still keeping it under control.
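The feature-construction step described above can be sketched as flattening each raw event sequence into a fixed feature vector: per-event counts plus hand-picked sub-sequence indicators. All event names here are hypothetical; the point is how much manual choice goes into the features a downstream model gets to see.

```python
from collections import Counter

# Hypothetical event vocabulary for a product funnel.
VOCAB = ["open", "search", "view_banner", "subscribe"]

def featurize(session):
    """Flatten one event sequence into model-ready features."""
    counts = Counter(session)
    features = {f"count_{e}": counts[e] for e in VOCAB}
    # A hand-crafted sub-sequence feature: did a search happen
    # before the first banner view?
    features["search_before_banner"] = int(
        "search" in session
        and "view_banner" in session
        and session.index("search") < session.index("view_banner")
    )
    return features

print(featurize(["open", "search", "view_banner"]))
```

Every sub-sequence feature like `search_before_banner` has to be imagined and coded by hand, which is why this step dominates the effort and why the resulting search space stays so restricted.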
One way of doing it is just through exploration: giving you access to all of the events that came before. Where AI comes in: underneath the breakthrough of LLMs are transformers, and they happen to encode sequences very well. What we thought is, what if we take that and expand it from sequences of words or tokens to sequences of events with dimensions?
You train your transformer-based model on sequences of events and it inherently understands the structure, which events go together, which dimensions go together or not, and then you fine-tune it based on the outcome that you care about.
This transformer technology, the technology behind LLMs, happens to be good at modeling sequences and compressing the information from sequences into small embedding vectors. And we are trying to use that for this use case of product analytics.
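The first step in treating events the way LLMs treat words is serializing each event, with its dimensions, into tokens a sequence model can consume. This is a hypothetical sketch of that tokenization step only (not Motif's actual pipeline, and the event fields are invented); the transformer itself would then train on these integer sequences.

```python
def tokenize_event(event):
    """Turn one event dict into tokens: the event name first,
    then one token per dimension key/value pair."""
    tokens = [event["name"]]
    for key, value in sorted(event.items()):
        if key != "name":
            tokens.append(f"{key}={value}")
    return tokens

# A hypothetical session: events with per-event dimensions.
session = [
    {"name": "search", "device": "ios", "query_len": 12},
    {"name": "view_banner", "device": "ios"},
    {"name": "subscribe", "device": "ios", "plan": "annual"},
]

token_ids = {}  # vocabulary: token string -> integer id, built on the fly
sequence = []
for event in session:
    for token in tokenize_event(event):
        sequence.append(token_ids.setdefault(token, len(token_ids)))
print(sequence)
```

Because shared dimension values (like `device=ios`) map to the same token id across events, the model can learn which events and dimensions co-occur, which is exactly the structure the interview describes the model picking up before fine-tuning on an outcome.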
What do you hope will be true in the analytics space in five years from now?
I hope that we move beyond reporting. Even Google teams and Uber teams are still spending most of their time setting up reporting and trying to use data that way. While that's necessary, I think the most interesting part comes after that. So you have your data in place, and now you want to find the best way to get practical insights out of it.
I work with growth teams. They ask a lot of these questions in terms of why, and how do I change things. And I think the answers lie in exploring the relationships between different pieces of the data, rather than just counts. And I hope that the field moves more toward that.
This newsletter is sponsored by dbt Labs. Discover why more than 30,000 companies use dbt to accelerate their data development.