Ep 11: Should great data people become managers or not? w/ David Jayatillake of Lyst

David is Sr. Director of Data at Lyst, and as leader of their analytics + data science teams he has followed the evolution of data roles closely over the past decade.

David spends a lot of time thinking about career progression + data team structure, and in this conversation with Tristan + Julia he dives into the classic individual contributor vs manager conundrum, migrating between warehouses, and reactive vs proactive data workflows.

If you enjoy this episode, you can catch David’s upcoming talk at Coalesce 2021 in December (or watch the replay if you’re reading this post-Coalesce :).


Show Notes

Key points from David in this episode:

What are some of the benefits that you've got from making these investments in open source technologies within your data stack?

I think with dbt, given we used it for migration from one data warehouse to another, it's obviously in our minds that it would be very straightforward to then migrate all our logic again to another data warehouse if we wanted to, and you can see some burgeoning competitors to Snowflake on the horizon you could consider.

But with Terraform, I guess an element of that is about portability, although I would imagine that if we were to move to another data warehouse, the Terraform work would probably be more onerous to move than the dbt work. The draw of it was actually to automate how resources are created: squads, roles, grants, schemas. We have two or three key modules in Terraform, which will either build data pipelines in Snowflake or actually build teams. So we'll provide a module with the email addresses and names of team members, what team they're in, and what kind of warehouses they need. And it will just create all of those things, all at once, to ensure they have access. So it makes it easy for us to onboard, and it makes it easy for us to pipeline. That's the real draw of Terraform; I hadn't really thought about it as a draw for migration.
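As a rough illustration of the "build a team" idea David describes, here is a minimal Python sketch that expands a team description (names, emails, warehouses) into the Snowflake statements that would give everyone access. This is not Lyst's Terraform module: the `Team` and `Member` types, the `onboarding_statements` function, and the naming scheme (`<TEAM>_ROLE`, one schema per team) are all assumptions made up for the example, and a real setup would normally use Terraform or another infrastructure-as-code tool rather than hand-built SQL.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    email: str


@dataclass
class Team:
    name: str
    members: list[Member]
    warehouses: list[str] = field(default_factory=list)


def _ident(text: str) -> str:
    """Turn free text into an upper-case identifier: non-alphanumerics become underscores."""
    return "".join(c if c.isalnum() else "_" for c in text).upper()


def onboarding_statements(team: Team) -> list[str]:
    """Expand one team description into role, schema, warehouse, and user statements.

    Hypothetical conventions: one role and one schema per team.
    Inputs are assumed to be trusted; no SQL escaping is done here.
    """
    role = f"{_ident(team.name)}_ROLE"
    schema = _ident(team.name)
    stmts = [
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"CREATE SCHEMA IF NOT EXISTS {schema};",
        f"GRANT USAGE ON SCHEMA {schema} TO ROLE {role};",
    ]
    # Each requested warehouse is created and made usable by the team role.
    for warehouse in team.warehouses:
        wh_name = _ident(warehouse)
        stmts.append(f"CREATE WAREHOUSE IF NOT EXISTS {wh_name};")
        stmts.append(f"GRANT USAGE ON WAREHOUSE {wh_name} TO ROLE {role};")
    # Each member gets a user that is granted the team role.
    for member in team.members:
        user = _ident(member.name)
        stmts.append(
            f"CREATE USER IF NOT EXISTS {user} "
            f"EMAIL = '{member.email}' DEFAULT_ROLE = {role};"
        )
        stmts.append(f"GRANT ROLE {role} TO USER {user};")
    return stmts


if __name__ == "__main__":
    example = Team(
        name="Growth Squad",
        members=[Member("Ada Lovelace", "ada@example.com")],
        warehouses=["analytics_wh"],
    )
    for statement in onboarding_statements(example):
        print(statement)
```

The point of the sketch is the shape of the workflow: onboarding becomes a matter of editing one small, declarative description of the team, and everything else is derived from it in a single pass.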

Is there standard advice that you give for helping people take an initial jump in their data journeys?

I think it's so much easier now than when I started out. There are so many options, so many courses, all these places offering ways to interactively learn with provided datasets. In the data science space there are things like Kaggle where you can try your hand with real-world data.

One thing I've noticed, through something I do in my spare time, is that there is a lack of people out there who actually know how to deal with data, especially if you go into smaller worlds like local areas, where it's something they need.

So in the village I live in, they have to do surveys to understand how people feel about development. So one of the things I did when I moved there was volunteer to do the analysis for them. It's a small dataset, but it's an interesting dataset, and actually that could be a way in for someone who's starting out: go and help some kind of community.

As people progress in data careers, is it more technical skills that start differentiating them or is it softer skills?

So, I think that's really interesting. That's quite a deep question because, for me, it depends on your choice. At least I think (and this is true of big tech companies and other tech companies in the U.S.) you don't have to go into management to progress in your career. You can be an individual contributor at quite a high level.

And, as an analyst, there's a point at which your technical skills are good enough to do any role above that, and actually then it's much more about storytelling, influence, and it's almost like being who your business needs you to be at that point in time, looking in the right places, helping them understand their problems, making recommendations to the right teams. I think that's the higher level of being an analyst as an individual contributor.

Whereas when you're on the management side of that progression, I think the difference from being an analyst is that you're trying to be much more generic about understanding what the wider needs are and what everyone needs to know, and trying to translate those requirements into either something that you're going to engineer yourself or something you ask analytics engineering or data engineering to provide for you, to enable you to serve. And then there's also mentoring analysts and looking after their progression.

With analytics engineering, again, I think the individual contributor track is very different from the analyst one. As you go up as an analytics engineer, it's very much a case of your technical skills going up and up.

And I think it's much more similar to becoming a principal software engineer. You just want to get better and better at your tool chain. You want to have mastery of, obviously, git, SQL, dbt, warehouse optimization, queries, probably to a level of software engineering as well, just generically, because it's so useful to then go beyond your data stack and fix things upstream of yourself. It unblocks you, and being able to do that is so powerful.

One of the analytics engineers we've hired used to be a front-end developer, and that's something we really want to leverage.

What have you done on the people side, or culturally or organizationally, to make sure that your data team is thinking about how to create value or create a data product vs. constantly being the recipient of questions?

Yeah. So, there is still a fair bit of reactivity. 

I think the most reactive channel we have is probably our BI Slack channel, where people ask questions, and now we have to be quite strict about it. This is for letting us know you think something is broken, or for asking where something is so you can look at it. It's not for some deeper question which should really be a JIRA ticket or something like that.

But even then, even with the JIRA Kanban workflow we have for our tickets, I still feel that it's reactive, like you said. It's still a bit like being tapped on the shoulder, just in a better, more well-structured way where you can prioritize it.

And what our team has become more focused on is having analytics engineers look at those tickets and, when they build the data models, building them in a generic way to solve many tickets at once.

And then you have analysts. I want them to become much more proactive about writing pieces of research for the business on more complex topics and sharing them, almost in a blog style, on Confluence or wherever else, so that the business can react to those recommendations and insights, and actually change. So I think that's a real product in my mind. It's like a consulting product that internal analytics teams can do for you.

And then if you think about what the analytics engineers are building, it's the data model that enables both that product and self-service analytics for everyone else in the business. And increasingly, you see things like Snowflake marketplace, the rise of data products, ETL, these themes of how people are monetizing their data.

We want to move towards that too. We're trying to build towards a data product at least for our partners, and possibly other third parties as well.

Looking 10 years out, what do you hope to be true for the data industry?

So, I feel like for analytics, we're getting to a point of having a fairly mature toolkit. But I look over at the data science side as well, at what they have, and it's so much less mature. You have some really cool things out there, like Coiled and similar implementations of Dask, but they're not production ready, I'd say.

And I'm seeing some new startups like Layer emerge, and the idea of feature stores, which are similar to metric stores in many ways. And I hope that will get better, because I feel like data science is so valuable if it's deployed and applied properly, and we just spend a lot of time dealing with infrastructure and making a deployment. If data science reached the same level of maturity that analytics has in the last five years, I think that would be amazing.

Also, I feel like we still have to worry about — and this is true of analytics — you still have to worry about sizing your infrastructure and things like that. And I think it's going to become unnecessary to do so in the near future — and I think that would be a really good step forward as well. 

Whereas in the past BI tools were very closed source — you've got Tableau, Power BI, Looker to an extent as well — I'm seeing all parts of the stack becoming more open source — ETL as well — and I think that's an encouraging way forward. 

The remaining piece that isn't open source is the data warehouse itself. I can understand why that's a difficult thing to make open source. There are people trying, and it would be interesting to see if there was a truly usable open source data warehouse in 10 years' time.