Ep 32: Katie Bauer: Data Scientists are not Pizza
Understand the mental model of a data science manager.
Katie was a founding member of Reddit's data science team and, currently, as Twitter’s Data Science Manager, she leads the company’s infrastructure data science and analytics organization.
In this conversation with Tristan and Julia, Katie explores how, as a manager, to help data people (especially those new to the field!) do their best work.
Thanks for reading The Analytics Engineering Roundup! Subscribe for free to receive new posts and support my work.
Key points from Katie in this episode:
Why do you think data teams are necessary? Why is it important for companies to invest in them?
Yeah, that is a topic near and dear to my heart. And I'll also say I don't think it's only accountability. I think data teams have a unique role in that they are solving problems with data. And if you have the right kind of data at your company, you can use that to solve many different types of problems.
Accountability is often the most pressing one. And it intersects with a lot of different things. Some of it is just understanding what's happening, which is where you probably should start if you're running a data team or building a data team, because it's very hard to hold yourself accountable if you can't just objectively state or describe what is happening or what has happened in your organization.
So that's always a first-order concern: just what is going on. For that, starting with concrete and really just basic things you will always need to know is always the right way to start.
There's a lot of discourse on Twitter and Substack and in the community in general about dashboards and how people don't like them. But I'm personally a bit of a dashboard appreciator because they're just such a useful tool for actually making information widely available and concrete to people.
Why are insights actually the wrong goal for data teams?
Yeah. I mean, there are probably all sorts of catchphrases you could coin around this. Like, it's not about insights, it's about knowledge. But this is part of why I say data teams should be solving problems with data. And sometimes the problem is that you need insight. But insight is just such a vague word. I know what people mean by it, but there's so much wiggle room in it. It really should be focused on what people will do, or actions they will take. And people will give the word insight the modifier of "actionable insights". But take something like growth accounting, for example. It's a common framework people use to understand the growth of a business. That's the type of thing that is really useful for data teams to organize around because the action is baked in.
If you've got your DAU number and then you've got a sense of like, "Okay, well, new users are up, but we've had so much churn", you know what to do about that right away. And fundamentally that might be an insight. Our insight is that churn is down this month, but it is tied to a recommendation.
And focusing really on what people are gonna do with the information you give them, or how you expect them to act, or what they're even capable of doing: in some cases, that's the step that's missing. I think a lot of the time, when people talk about insights being the goal of a data team, it just makes you sound like you're doing science fair projects, if all you're doing is coming up with insights.
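To make the growth accounting framework mentioned above a little more concrete, here is a minimal sketch in Python. It is not taken from the episode: the function name `growth_accounting`, the `GrowthAccounting` class, and the four-way split into new, retained, resurrected, and churned users are assumptions based on one common formulation of the framework, and the user IDs are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GrowthAccounting:
    """Counts for one period-over-period comparison (hypothetical structure)."""

    new: int          # active now, never seen before
    retained: int     # active in both the previous and the current period
    resurrected: int  # active now, seen in the past, but not in the previous period
    churned: int      # active in the previous period, not active now

    @property
    def active(self) -> int:
        # Current-period active users (for example DAU or MAU).
        return self.new + self.retained + self.resurrected

    @property
    def net_change(self) -> int:
        # Change in active users versus the previous period.
        return self.new + self.resurrected - self.churned


def growth_accounting(
    seen_before: set[str],
    previous_active: set[str],
    current_active: set[str],
) -> GrowthAccounting:
    # Anyone active in the previous period has, by definition, been seen before.
    ever_seen = seen_before | previous_active
    retained = current_active & previous_active
    resurrected = (current_active & ever_seen) - previous_active
    new = current_active - ever_seen
    churned = previous_active - current_active
    return GrowthAccounting(len(new), len(retained), len(resurrected), len(churned))


if __name__ == "__main__":
    result = growth_accounting(
        seen_before={"a", "b", "c", "d"},
        previous_active={"a", "b", "c"},
        current_active={"a", "d", "e", "f"},
    )
    print(result, result.active, result.net_change)
```

The point of splitting the headline number this way is the one made in the answer: each component suggests its own action, so "new users are up but churn is high" reads as a recommendation rather than a bare observation.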
How do you know a problem is worth solving?
That's a good question. I don't think there should be a blanket recommendation here. I mean, this is like the statistician's answer: it depends. But I really think you should be conditioning it based on what stage your team is in, what kind of relationship you have with that stakeholder, and what state your business is in. When you have a new product or a new area that you're working in, probably it's gonna be a lot more pull. You just don't necessarily know what is necessary yet.
So it's really hard to be proactive. You're still learning about the space. And in order to do that, you have to listen to what the business is telling you, or maybe what your direct stakeholder is telling you and try to react to that probably a bit more than most people think they need to because you're trying to learn and build a relationship with those people so that later you can anticipate their needs.
In the core model, there is also something that helps you to learn the space that you're working in. Maybe you have done data work in a similar domain previously, so you'll have a good sense of the metrics and the rhythms of that type of business. But in many cases you won't, because people like to do new things all the time. And it's important to take direction when you first start working on something, just to get a sense of what's important. And as you learn the space better, as you learn your stakeholder better, eventually you should start pushing more, being more proactive, and anticipating what they need. I do think that stakeholders want this. There's a bit of a narrative of like, "Oh, they just always give us tickets", or "Oh, they give us this vague thing that we can't possibly do", or just not being a service team, for example. That's kind of a meme, that no one wants to be a service team.
But you have to play a role in that yourself. You have to actually want to not be a service team, and you do that by trying to actually focus on driving impact. For many data teams that is focusing analysis on things that would be actionable, or giving them data or models or experiments, whatever your particular deliverable is, and making sure that you are producing things that tell them what to do next and help them contextualize the things they're seeing and learn about the choices that they're making. That's how you get proactive: you make it possible for them to know what to do.
There are platform teams and teams that actually help with analysis. How do you set up these two different types of teams? What's a good way to get started as your data organization?
Yeah. The best thing is probably to figure out which of those types of teams you're building. I say that because there's so much emphasis on things like self-serve analytics that people don't necessarily realize you might need a specific type of person who can do that. And I've certainly made this mistake before, where I've hired analysts knowing that they're gonna have to write ETL.
And they don't realize that they're gonna have to write ETL, and you just realize, oh man, we actually need to do a ton of training here for them to be able to do this job. And that can be very frustrating for those people, to come into that situation thinking they were gonna come and do a bunch of fun analysis.
They probably will still do fun analysis, but they might need to get through some other stuff first. So the first thing is just to figure out what does your organization actually need? In some cases, you'll have very savvy partners who maybe can do some basic analysis on their own. And at some point, they may need help scaling.
But if you've got that first level of defense, where they know how to fend for themselves on basic questions, you should think about what they need in addition to just having information available to them. And that's probably a harder situation, actually, building an analytics-first team, because it's so open-ended. It's an insights team. It's fluffy, it's abstract, what you're gonna do. So really clarifying in that case: what are the research questions? What type of information are we going to be finding and making available, and how are we gonna do it?
And when can you expect us to give it to you? That's really important for building that kind of team, if you are going to build a team like that. And the reason I'm emphasizing the analytics team or the insights team over the platform team is that's more the type of manager that I am. But depending on what your potential scope is in an organization, figure out who your key customers are, because another really huge and frequent trap that I think a lot of data teams fall into is that everyone is excited to have data and everyone loves data. So they just pile on the analytics team, and there are ways of trying to serve everyone, but you probably shouldn't. One of my colleagues has a saying: data scientists are not pizza.
You cannot split them up. And it's good advice. People will need to have a meaningful understanding of the context they're working in. So if they're switching between domains a lot, they're probably not gonna be able to provide very useful analysis or information to their stakeholders.
So they need to be able to focus on something long-term and you cannot support your entire organization very easily if you don't have a person to pair with every potential stakeholder. So being careful about what you commit to supporting is another really important part of starting to build a team like that.
How big should data teams be? Should there be a big diversity of years of experience on the team, so that you can have more mentorship and create more guidance for newer folks? What have been some patterns that you've seen work?
Yeah. On team size, anecdotally, I find that it depends on the scope of the particular manager in question as well. But I have found that four to six is more of a sweet spot for at least data science teams that are focused on analysis.
And I've talked to a bunch of people about this, and that's the number they've given me. And I have managed nine people directly and found it very difficult, because of the detailed knowledge that you need to be able to give meaningful feedback to the people you manage.
It's hard to do that when your scope gets really broad. I think that one of the reasons why embedding is also popular is that it reduces complexity for the people who manage the analysts or scientists or whatever title you happen to be using. On the junior/senior question, you might not always have control over it, because headcount is really just an abstraction for the budget.
And if you don't have a budget to hire a very senior person, sometimes you can't. But if it's a new domain where no one's worked before, you should probably try to hire a senior person first, because of what we talked about earlier: having a junior person who's just learning how to do data work go sit in a domain with no other data people to help them is probably not gonna be a very good situation for them. And if you are the manager of that junior person who doesn't have anyone else in their domain, it's your job to be their senior person, as I said earlier. But in terms of cross-domain collaboration, that's another big problem that happens with these embedded-style teams.
And I don't know that I have a good general purpose solution to it beyond just making sure that people know that they have partners. It's almost like you have like this person's a major in marketing and this person is a major in growth. And the growth person's minor is marketing and the marketing person's minor is growth.
And those two people can provide feedback on each other's work and, hopefully, share context with each other that helps them do it. And if you start having a bunch of sub-data teams in your organization, it's very helpful to organize them in this way, where their scopes are generally aligned and related.
So maybe you do have a GTM data team in your company. And within that, there is a sales data team and a marketing data team and a growth data team, or growth product is what I mean there. Trying to make sure that scopes are aligned up and down management chains, and also stakeholder groups, is just a really good way to make sure that relevant context is shared and that people are able to get support and feedback from people who know enough about their work for it to actually be informative and constructive.
Tell us a little bit more about why you think the conversation around tools is really just another conversation about the craft or the data work itself.
Yeah. So I work with an infrastructure organization, and it means I end up thinking a lot about how infrastructure gets created. And something that was once described to me is that infrastructure is process that everyone agrees on.
But for a lot of data work, we haven't agreed on how we do things. And over time, as we start to recognize common workflows or very standard types of analysis or processes that we do, eventually it's like, "Yeah, we know what we're doing here. There's not any deviation. Let's just make a tool so no one has to make a choice about this, and you can move through it and go to whatever other thing you find more meaningful." That's one reason why I think the tooling discussion is really just about work, because tools are processes that got hardened into infrastructure.
Another aspect of it is that it really reflects that the complexity of problems that data teams are solving is getting bigger, because it requires us to collaborate now. Excel is a great tool. Don't get me wrong. I like Excel a lot. I'm a dashboard appreciator, I'm an Excel appreciator.
But it's really hard to directly collaborate with someone in Excel. If you've ever received a random Excel workbook from someone and had to go through all of the different cells and figure out what's happening and how all the tabs are related, that's just a nightmare. The push to be more like engineers is really just helping us get to more collaborative working styles, because engineers are very collaborative, and there's so much that is baked into tooling that makes it so you don't have to understand detailed information. A really good example of what I'm describing is continuous integration in a large code base.
There are so many other things happening in the code base that you probably don't have direct knowledge of. And having integration tests allows you to test, before you push code into production, whether you're gonna break something you don't know about. We don't necessarily have that with data yet, but we're starting to get there.
We're starting to have tools that reduce the requirement for us to know about every detail of something. And I'm fundamentally a practitioner. The tooling conversations are interesting to me because it's us all agreeing on something.
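As a rough illustration of what an integration-style check for data could look like, here is a minimal sketch in Python. It is not something described in the episode: the function `check_table`, the column names, and the sample rows are all hypothetical, and real tooling would typically run checks like these against a warehouse table rather than an in-memory list.

```python
from __future__ import annotations

from typing import Iterable


def check_table(
    rows: list[dict],
    required_columns: Iterable[str],
    unique_key: str,
) -> list[str]:
    """Return human-readable failures; an empty list means all checks passed."""
    failures: list[str] = []
    required = list(required_columns)

    # Check 1: every required column is present and non-null in every row.
    for i, row in enumerate(rows):
        missing = [col for col in required if row.get(col) is None]
        if missing:
            failures.append(f"row {i}: missing or null {missing}")

    # Check 2: the key column has no repeated values.
    keys = [row.get(unique_key) for row in rows]
    duplicates = sorted({k for k in keys if k is not None and keys.count(k) > 1})
    if duplicates:
        failures.append(f"duplicate {unique_key} values: {duplicates}")

    return failures


if __name__ == "__main__":
    candidate = [
        {"user_id": 1, "signup_date": "2022-01-01"},
        {"user_id": 1, "signup_date": None},
    ]
    # Run before shipping a change; a non-empty result would block the release.
    for failure in check_table(candidate, ["user_id", "signup_date"], "user_id"):
        print(failure)
```

The idea mirrors the continuous integration point above: whoever changes the table does not need to know every downstream consumer, as long as the agreed expectations are written down as checks that run before the change ships.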
More about Katie:
You can also find her on Twitter at @imightbemary.