Discover more from The Analytics Engineering Roundup
Ep 21: Ashley Sherwood (AE @ HubSpot) on Permissionless Innovation for Data Teams
How Ashley kicked off her team’s dbt implementation without disrupting their existing stack.
Ashley is a Principal Analytics Engineer at HubSpot and has helped lead its implementation of dbt.
Ashley makes unique connections in her writing and work. On her Substack, "syntax error at or near ❤️," Ashley might be found comparing growing companies to butterflies, or going deep on how to accommodate sensitive people in the workplace.
In this conversation with Tristan & Julia, Ashley dives into the nuts and bolts of her trajectory pushing data innovation forward at Hubspot.
Listen & Subscribe
Key points from Ashley in this episode:
HubSpot has seen impressive growth in the last few years. How does the role of the data team shift in this growth story?
That's a great question. I think it depends on whether the org is truly data first or not. Especially when you're a data person, data first feels like the right way to do things, but for me, it's much more about what's the right fit for your org. I would say that HubSpot is very human first.
If a problem can be solved by having a slightly longer and more empathetic conversation with a customer, that's how the company wants to handle it, as opposed to trying to automate that away. So we're biased toward keeping human touches in the process, toward really hearing people out and trying to give solutions that benefit that individual customer. And that means data isn't first. That means automation and efficiency aren't first. We're having to really grow out those skills now that we're scaling, so that's one of the big shifts we're seeing. In terms of the evolution of our data org, it's going to feel most familiar to people at companies where data isn't first and something else is. For us, it's that human piece.
In my time at HubSpot, I came in when we were still pretty new, just off of running derived tables on a cron. So orchestration was kind of new, and people were figuring this stuff out. It's pretty scrappy at that stage because you're not hiring data leaders to run your data strategy. You're trying to figure out, "Hey, this looks like a cost center on our balance sheet. How much do we really need to invest here? What are the benefits?" And especially if somebody's been at the same company for a long time, it's hard to bring in that fresh perspective.
As HubSpot grew, and it was kind of in that middle size range when I joined, a lot of people were coming in who'd been at other places and had seen how much benefit you could get out of a well-structured data stack. They could see that as the company scaled, we weren't going to be able to do what we needed to do fast enough if we were still flowing everything through the same central team of five people. Things had been very much: you had to know what you were talking about coming into the conversation. Nobody was going to teach you Git unless you found somebody who could sponsor that for you, and nobody was going to give you the full tour of the back end, because they had truly vital things to do and were really busy. The more you could come in and know for yourself, learn for yourself, and the scrappier you could be, the more success you were going to see.
Then as the company's grown, we've gotten to the point where we need things to run more efficiently. We can't scale just by adding headcount, and people have been articulating the return on investment we'll get from having cleaner data, or from being able to run operations on a broader set of data than was available previously. As those pains grew, more structure became necessary. In my specific role, I've gone from being the one person who was a combo operations analyst, data analyst, analytics engineer, and project manager to each of those roles now having four or five different people filling them.
And so we've gone from individual people being really scrappy, learning stuff quickly, and trying to network with each other, to having to build pipelines for getting this knowledge to the people who need it, so that you don't have to be a Git expert to be successful, so that you don't have to already know everything there is to know about putting a SQL query together in order to solve the problem in front of you. And really make it so you don't need to keep hiring unicorns to run your data org.
Usually, earlier-stage companies don't even think in terms of cost centers. You start thinking about these things when you're a much more mature company, and HubSpot did it in reverse, as you said. How did that change? Was it lots of micro-changes that shifted the mindset?
Yeah, really good question.
A lot of it had gotten into motion before I joined HubSpot, so I can only give my best estimate and my own perspective, right? But I can share that and where I see those things coming from.
So, HubSpot had gotten to a pretty good size mostly with data in Salesforce.com, which is enterprise software; it can do that. As HubSpot the product started to get into the CRM space, we really wanted to build our own CRM, and we certainly learned a lot from the way Salesforce has executed things, both in terms of things to emulate and ways to differentiate. As we got into that CRM space, we wanted to be running our business on our own CRM. But our CRM was targeted at small to medium-sized companies, and we, at this point, were a large company. So in order to run our systems off of our CRM, we needed to build the systems to fill the gaps between what the nascent HubSpot CRM could do and what we had been doing with Salesforce. That got us into the data warehouse in a new way, and it got us into Looker in a new way.
Looker was relatively new to the company when I joined in 2018, and folks were getting their feet under them, figuring out how to actually work with it. The data set I came in owning was for our solutions partner program. It had previously been run exclusively in Salesforce, and over the couple of years before I joined it had been built into its own microservice that HubSpot engineers created so that we could actually run our partner program within our own systems. That meant there were new data sources people hadn't used before, and there was new reporting. We unlocked a lot of stuff we couldn't have done in Salesforce, like custom reports, but it also revealed a lot of the inconsistencies that happen when your sources of truth don't line up with each other, when everybody on this team is running out of Salesforce and everybody on that team is running out of something else.
So for us, moving to the data warehouse came later and was an equalizer in a lot of ways, in that we could finally start to see how these different frames of reference from different parts of the business weren't actually lining up with each other, because we pulled them out of software and into an agnostic data container. That triggered a lot of change in terms of who was doing what work. If you had asked somebody in 2016, "How many partners do we have?", that was a Salesforce.com custom report. If you asked that question in 2019, that was a Looker report pulling from our backend databases. So the skillset changed from "can you use the CRM tool?" to "you need to understand SQL and entity relationships": how do these things work on the backend, how normalized do we want them to be, and how do the pieces connect together?
That created an opening for people with those SQL skills, data skills, and data structure skills, and a necessity for them. The person who could have pulled that report in Salesforce wasn't the same person who could pull it in this new world.
That kicked off a really rapid hiring spree from late 2018 through 2020, bringing in people with these more general data skills to build out our insight into all of our reporting systems and data, because we'd fundamentally shifted the way we were looking at things, from Salesforce and Excel to a centralized data warehouse accessible broadly to the company through Looker.
We see fiefdoms pop up where people get to pick their own tools, and they do. But it's only so powerful unless we can get that interoperability. Do you think we're going to need a bigger push to realize that future?
I think it comes down a lot to how complex and dynamic the space you're trying to tame is, if you will.
So I think in some contexts you do have issues of territorialism, where the problems two teams are solving are similar enough that which solution gets chosen may be more a matter of individuals trying to exercise their own autonomy and preference than in service of what's most efficient or most consistent.
However, I think you also have cases where you're just dealing with two different parts of the organism. How you treat an injury on a finger is different than how you treat an injury on the trunk, and the tools that you use in those cases are going to be different.
So when you have teams that have very strict regulation, or who are mostly looking back at large sets of data and telling stories but not doing much in real time, Tableau can be a better tool for that type of storytelling than Looker is. But when you're operational, when you're dynamic, and when you need to be hitting the data warehouse with whatever you need, Looker is a much better tool for that.
Sometimes it comes down to what the right tools are for those specific contexts. And when those tools are different, that's where you start to get into this interesting question of what the operating system is that lets these different things plug in. That's where, if the metrics layer can be the nervous system that takes a central signal and flows it through to everybody, it can be really powerful. You've got a vertebrate there. But if you need different parts of the org to be really independent of each other, with their own closed loops, you're dealing more with an invertebrate, octopus-tentacle type of situation, where each tentacle has its own brain.
So sometimes the central brain isn't the right way to solve the problem for an org. Different sub-brains can learn from each other, but one of the reasons an octopus doesn't trip over itself is that each of the eight tentacles has its own brain, and those brains talk to each other. Sometimes that central brain works best; sometimes those distributed options are good as well. And I think that in service of trying to optimize for not repeating ourselves, we actually miss some of the opportunities that autonomy can provide.
Say you want two teams to move quickly, react to what their customers are saying, and just go for it and innovate. If you ask them to always check with the central brain, you're going to slow them down and they won't be able to innovate. If you give them free rein to innovate, they will duplicate effort with other teams, but that cost of duplicated effort weighs against the opportunity of truly novel innovation. If you have five new ideas and two repeated ideas, you're way ahead of where you'd be if you'd slowed your process down and only had two new ideas.
So, all of that to say: something like a metrics layer can be that central nervous system if the org is simple enough, or hierarchical enough, that it makes sense for everybody to plug in. Otherwise, you're going to run into other constraints. It works well enough for most of our org to share their metric definitions through Looker that when we think about adding those same metrics for the machine learning team in Python, it might actually be better for us to redefine those metrics in the data science context for that team, separate from the people working out of Looker, because their worlds are so different. You lose the opportunities your BI tool can create when you move those definitions up into another layer that's simpler, even if it's more consistent.
What do you mean by "permissionless innovation"?
I got that from Tristan. In our first call, I told you what I was doing with the really hacky PowerShell script, and he was like, "That sounds like permissionless innovation," and I was like, "Yes! That is exactly what it was." So I can give the very short version of that context, and then we can get into the idea a little bit more.
This was when I was new to HubSpot. Jill had initially learned about dbt, throwback to her, and then shared it with another good friend of ours, Nina, who worked at HubSpot at the time and has since returned, and who had done a proof of concept with it. As I was coming in, it was like, "All right, here's what I was able to get set up." It helped me think about the data modeling I was doing in a better way, and then: "Hey, it templates SQL. All of our current orchestration is raw SQL scripts. You can copy and paste the SQL out of this and into there, and at least have done your development and ideation inside dbt with the tools that are available there." So I was like, "Yeah! Let's get this set up! I've done command line stuff before; I'm not worried about installing things into Python."
So I got that up and running and used it to build out a new data mart for our team. It made it so much easier for me to map out what I was doing, look for places to optimize, and write good SQL by using dbt, and then spit SQL out the other side. And I hate doing the same thing more than once as a matter of principle, so I wrote this really hacky PowerShell script: I used the dbt compiler's output to build a shell for the script so it would fit with our status quo, then used PowerShell arrays to pick up the actual script, stick it into the shell, throw that into the GitHub repo, and make my PRs for me. So I had put this together and was like, "Tristan, you have to see this," and then I explained it. And I really appreciated getting that reflection back from Tristan of, "Yes, this is innovation." So: what can we do without disrupting the existing systems to prove that there's value, with the long game of how we bring this innovation to the rest of the org?
There's a bit of scrappiness and hackiness in knowing your stack really well, so that you know what you can get away with in your stack to prove the value of a new tool. And once you have that value live in practice, it's a lot easier to get other people to invest in it, because the risk is much lower at that point.
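The episode doesn't show Ashley's actual PowerShell script, but the pattern she describes — develop models in dbt, run the compiler, then wrap the compiled SQL in the shell of the status-quo orchestration scripts so nothing downstream has to change — can be sketched roughly like this. This is a minimal illustration in Python; the shell template, directory layout, and `analytics` schema name are assumptions, not HubSpot's real setup:

```python
from pathlib import Path

# Hypothetical shell for the status-quo orchestration scripts;
# the real script format isn't described in the episode.
SHELL_TEMPLATE = """-- AUTOGENERATED from dbt model: {model}
DROP TABLE IF EXISTS analytics.{model};
CREATE TABLE analytics.{model} AS
{compiled_sql}
"""

def wrap_compiled_models(compiled_dir: str, out_dir: str) -> list[str]:
    """Pick up the SQL that `dbt compile` wrote (normally under
    target/compiled/) and wrap each model in the status-quo script
    shell, ready to commit and open a PR against."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for sql_file in sorted(Path(compiled_dir).rglob("*.sql")):
        script = SHELL_TEMPLATE.format(
            model=sql_file.stem,
            compiled_sql=sql_file.read_text().strip(),
        )
        target = out / f"{sql_file.stem}.sql"
        target.write_text(script)
        written.append(target.name)
    return written
```

The point of the design is that development and ideation happen inside dbt, while the artifacts that land in the repo still look exactly like the raw SQL scripts the existing orchestration expects, so nobody had to grant permission for anything to change.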
What happened after you proved the value of the new tool you had developed?
It started as sort of an underground thing, where I was using it to be very efficient in my workflow and sharing it with other people who were interested. And then, as is so often the case, it takes something else shifting in a different place, and you look for that opportunity.
I was on the analyst team for our partner program. HubSpot kind of chunks out into these different domain areas; we were large enough at the time that that's how we were doing a lot of our data analytics and what was analytics engineering, though we didn't know to call it that yet. And the team that does that for our product analytics, like usage and so forth, all that "what are people doing in the app, how are these products performing" work, was also about to build out a new mart and wanted to do it in a scalable way. They wanted to hire a bunch of new analysts. So we had a clean slate, we needed to hire a bunch of new people because we didn't have the bandwidth, and we had this one person's awareness of dbt as a tool that could make all of that easier.
So that was the prompt: if we have a bunch of new people we want to train, we don't want to train them all on the status quo process, we want to train them on something more efficient. And I had this running proof of concept, with all of my models built in dbt and then sort of shoehorned into the existing system. That gave me the opportunity to put my hand up and say, "Hey, I actually have this running. We can cut my schema over from being populated by our status quo process to being populated by dbt, and we can build this net-new team the same way." And I had some specific recommendations for how to hook it into our DAG and how to do some of the orchestration in a way that would be low risk and help mitigate some of the concerns our data team had about spinning this up, as a sort of test experiment, but more like a beta: if this works, we know we want to roll it out.
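One common way to make that kind of schema cutover low risk — a sketch of the general technique, not the specific orchestration Ashley describes — is a parity check: run the legacy process and the dbt models side by side for a while, and verify the two outputs agree before switching consumers over. Table and column names below are made up for illustration, using SQLite so the example is self-contained:

```python
import sqlite3

def tables_match(conn, legacy_table: str, dbt_table: str, key: str) -> bool:
    """Parity check before cutting a schema over from the status-quo
    process to dbt: same row counts, and no key present in one table
    but missing from the other."""
    cur = conn.cursor()
    counts = [
        cur.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
        for t in (legacy_table, dbt_table)
    ]
    if counts[0] != counts[1]:
        return False
    # Check both directions: keys in A missing from B, and vice versa.
    missing = 0
    for a, b in ((legacy_table, dbt_table), (dbt_table, legacy_table)):
        missing += cur.execute(
            f"SELECT COUNT(*) FROM (SELECT {key} FROM {a} "
            f"EXCEPT SELECT {key} FROM {b})"
        ).fetchone()[0]
    return missing == 0
```

Once the dbt-built table passes checks like this for a few runs, pointing the downstream DAG at it instead of the legacy table carries much less risk, which is the "beta, not experiment" posture described above.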
And so that's what tipped us over to being able to get dbt actually running on our servers. And once it was actually running on the servers, there was a lot of organic interest in the tool and the efficiencies that it was providing. I was socializing a lot of that.
And then the final piece was getting the BI team on board. We call our central data team the business intelligence team, though the way that term works in the market more broadly doesn't quite line up with how we're using it at this point. Getting that central data team bought in and able to recommend the tool to all the analyst teams provided the security and guarantee that this was an endorsed tool, that there would be support, that this is what we're committing to in the long run. From there, we were fully committed, and we migrated from what we'd been doing to being fully on dbt.