Ep 35: The Data Generalist's Vision Quest (LIVE w/ Stephen Bailey)
Stephen joins Tristan for a live follow-up conversation to Stephen’s Coalesce talk, “Excel at nothing: how to be an effective generalist.”
Stephen’s Substack is a must-read if you’re uninitiated; my personal favorite is his September ’22 post on data contracts (Data Person: Attorney at Law).
This conversation w/ Tristan (recorded live at Coalesce!) pulls on many of the threads from his writing, particularly his post Wander Well on the data generalist’s career path.
Key points from Stephen in this episode:
There was a time when the modern data stack was focused exclusively on analytics. Now we are pushing it into other parts of the business. How can we use innovations in the modern data stack to spur growth?
My mind goes to two distinct questions. The first is: what new groups are going to come in and own pieces of the modern data stack? If you take the data team out entirely, can a marketing team set up Snowflake and build some operational analytics pipelines by themselves?
Can engineers who are already very sophisticated start building syncs from their event logging system into Snowflake, and then build data applications that are exposed to end users, without any involvement from the data team? That's the infrastructure layer I think the advances in tooling could enable: more people getting in there, even without centralized coordination.
The second axis I think about is what the data team will look like in two to five years. That's a question I'm thinking about a lot.
Do you need a data team?
I don't know the actual history of this, but I imagine this sort of specialized role had to be developed in response to complexity, with Hadoop and big data lakes, where all of a sudden you needed people to take care of this massive investment, curate and maintain it, and wring value out of it.
I think Joe Reis and Matt Housley's new book, Fundamentals of Data Engineering, expresses this really well. You used to have different types of data engineers, like the big data engineers of the early 2010s, and now what you mostly have from a data engineering perspective are life cycle data engineers, who steward data from ingestion through transformation and on to a domain.
So it's much less about maintaining the system and much more about shepherding: making sure the data gets to where it needs to go. That is a complex process. It requires a lot of tools and knowledge of many different technologies.
But it's a different type of specialization.
It's less about technology ownership and more attached to the business: this area of the business needs certain data flows, and we need to make sure the data life cycle connects all the dots that are needed.
We don't need to spend two weeks optimizing a query and making sure it's efficient.
We can let the tools do that for us, but we do have to make sure that the organization is connected in such a way that the data is reliable and trustworthy, and on time for whatever the use case is.
The modern data stack creates widely divergent reactions from people with the title data engineer. Why is that so?
A bit about my background: I was trained as a scientist and did some biomedical imaging work and cognitive psychology research. Then I was the first data science hire at my last company, Immuta. At Immuta, I bounced around to a bunch of different places, got to see the business from a hundred different angles, and built some of the first dashboards. I ran Salesforce for several months, which was a lot of fun.
In some ways it was a total headache, and in other ways a lot of fun.
But I have always embraced the chaos. I love the chaos and the ambiguity. And when I think about data teams, even the name “data team” has a certain lack of specificity.
There's an ambiguity to it, like we don't know what to call this. The defining factor of what they do is that data is being repurposed for one reason or another. And at a certain point, especially in the engineering world, you reach a certain level of specialization and graduate into something else.
If you're focused on real-time event systems, you're probably going to become more of a software engineer; at some point you might turn into a machine learning engineer, or into an analytics engineer. You're kind of like a Pokémon: you start out as a base-level Pokémon,
and then, depending on what you're exposed to in the business, you evolve into a more specialized one. But I love having that flexibility. And I think there's a very real organizational challenge that people who stay flexible can address, because they're able to plug into different parts of the business.
It seems that generalists like you are often valuable for the business, but many businesses don't know how to promote and compensate generalists that well. Would you agree?
I think one reason is that generalists can see patterns. I would say that in 2019, a lot of the people in the dbt Slack who really glommed onto the tool and saw its potential were generalists, because they'd seen the same patterns over and over again across the different companies they'd worked with. They saw: oh, this is repeatable, and if we get it right, we can reuse it again. And what happens with data teams is that we get spun up in a company, but the company doesn't really understand what we need to do.
In some ways, you can't understand what needs to be done before you're there. You have to react; the work is a little reactive to the needs of the business, and you have to plug into wherever the most value is at the moment.
My experience of being a data professional is that it is the integration of different things that creates the value I am able to bring to an organization. I think career paths in data are endlessly fascinating, and I don't think we have really figured them out yet. Would you agree?
I think that's why you see so much crossover. My first job was as an operations person on a six-person startup team in Alabama, at Teach For America, the teacher recruitment organization, where we were starting a new region. Being the operations person meant I did basically everything to support everybody across all the initiatives from a logistical standpoint.
So I got to see the entire business, whether it was fundraising or teacher support. I worked with external partners on certification processes for teachers. It was like a vision quest that I was being dragged through against my will, bumping my head on everything.
But it showed me the value of those experiences later on, because the value you get out of them is never that clear in the moment.
I think the big value is that you start to see the world and the business from so many different perspectives. You start to see what is common across them, and also what people care about.
What are they incentivized to care about? Whether it's the marketing side of the business, fundraising, or product, you start to understand, and you can start pulling out patterns and seeing the bigger picture.
Your very first post was completely sarcastic. What made you write that first post and adopt this particular writing style?
Yeah, so that first post. My last job was at Immuta, a technology company that makes a data catalog with privacy controls built in.
One of the things I really loved there was working with the product team: giving product feedback, ideating on features we could add, and finding ways to connect with customers and simplify things. I love that process of throwing an idea out there, seeing how well it lives on its own, and curating it a little bit. I would build prototype projects on GitHub over the weekend, things like that. So when I transitioned to my new role at Whatnot, there was this gap: I knew I wanted to get these creative energies out, but I also knew I didn't want to be building prototypes that just got scrapped the Monday after I built them.
Writing is a great outlet for that sort of energy, where you want to funnel a bunch of creative forces into a single thing and then let it go to live or die on its own. That was the motivation to start writing.
The first article was a mock press release, in the style of Amazon's press release and FAQ format, about a metadata platform that looked just like Segment: you could pipe metadata into it, and it would assemble a lineage graph dynamically and integrate with all of your other tools.
The title was “The Metadata Data Platform.” Even though I write a little sarcastically, the ideas are usually ones I actually want to get out into the world, and maybe someone else will build them.