Ep 43: Julia, Pedram Navid + Taylor Murphy recap Data Council
Julia just got back from Data Council in Austin, a conference organized by Pete Soderling, where lots of startups share what they're building, data practitioners go to learn in hands-on workshops, and of course, investors go to spot the next big trend.
In this episode, Taylor Murphy (Head of Product & Data at Meltano) + Pedram Navid (Founder, West Marin Data) join Julia to recap the conference and have a bit of fun. They talked about streaming, how the MDS is growing up, new SQL variants, and, of course, AI.
Key points from Pedram and Taylor in this episode:
What is it that makes streaming so exciting in your mind, and why now?
Pedram:
It's a great question. I think part of it is that when you go up on stage, nobody wants to hear you give a reserved answer about the future. They want bold predictions, so you kind of have to give them one, and I figured, why not streaming this year? But at the same time, I do see a lot of change and innovation happening in the streaming world.
If you think about it, our data and the way we think about the world are really event-based and streaming at the start. Even when we pull data out of a database, a lot of the time it comes out as a stream. It has just traditionally been easier to take that stream of data, cut it up into batches, and process it that way, which is how all of our tools sort of operate. That's how we've thought about data for a long time. If you go back to the Hadoop days of daily batch jobs that ran on yesterday's data, that's been the mentality for a long time.
But it's starting to feel like streaming makes more sense for the types of workflows we have, especially as we're moving towards more operational types of activities where you want to keep data in sync and you need data to flow through these systems. It's perhaps not necessary to do an entire batch job every single day; maybe you can process the data in smaller pieces instead. So if it's possible, if it happens, I'll be super happy. If it does, then everyone will think I'm a genius. If it doesn't, then hopefully we'll forget about it, and we can claim that next year will be the year of streaming.
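To make the batch-versus-streaming contrast concrete, here is a minimal Python sketch. It was not discussed in the episode; the event shape (a customer ID and an amount) and all function names are hypothetical, chosen only for illustration. It shows the same running total computed three ways: once over a full batch, incrementally per event, and over small chunks cut from a stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape for illustration: (customer_id, amount).
Event = Tuple[str, float]


def batch_totals(events: List[Event]) -> Dict[str, float]:
    """Batch style: wait for the whole dataset, then aggregate it in one pass."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
    return dict(totals)


def streaming_totals(events: Iterable[Event]) -> Iterator[Dict[str, float]]:
    """Streaming style: update state per event and emit a snapshot each time."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
        yield dict(totals)  # downstream consumers see fresh totals immediately


def micro_batches(events: Iterable[Event], size: int) -> Iterator[List[Event]]:
    """Middle ground: cut a stream into small chunks instead of one daily batch."""
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, chunk
        yield batch


if __name__ == "__main__":
    sample = [("a", 10.0), ("b", 5.0), ("a", 2.5)]
    print(batch_totals(sample))
    for snapshot in streaming_totals(sample):
        print(snapshot)
```

Both approaches end at the same totals; the difference is when results become available. The streaming version also has to carry state between events, which is one source of the extra complexity that comes up later in the conversation.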
Taylor:
Yeah, I'll fully admit my inexperience here. I think I live kind of in a batch world where the idea of streaming is nice, I think. But I've never been convinced that the complexity makes sense for a lot of the use cases I've seen.
I think there are a lot of boring things that benefit from batch, and that's a really good solution, and I think there will always kind of be that use case. There are just a lot of companies out there, and it's not always the most exciting thing to talk about, but it's kind of fundamental. What originally drew me to data engineering is that it's so fundamental to any of the cool things that we wanna do with data, and batch, for me, is kind of at the heart of that.
So, yeah, I would love to see streaming become a bit more mainstream. For me, that would mean being easier to set up and put into place in existing organizations where you maybe aren't able to think through all your processes beforehand, before you actually set up all of these systems. I'm very much a batch boy, I think.
Could you say a little bit more about the tech industry versus the non-tech industry? Do they have different problems, or should they work in similar ways?
Pedram:
I don't know if there should be, but I think there is, at least for now. There are certainly a lot of companies out there that are still afraid to be on the cloud. They need air-gapped services. They can't use SaaS products. They may be part of the government, or they work with the government, or they're in healthcare or finance, and things are a little bit more difficult.
So I think it's easy for tech companies, especially early on in their lifecycle, to adopt a lot of tools and play around with these things. For a company with 30 to 40 years of baggage, if not more, made up of many different companies that have been put together and fallen apart, running mainframes, all these things that you can't really shed, it's a little bit harder to adopt the latest, greatest thing when you're still trying to keep the lights on with everything.
Taylor:
I would add that the nature of startups and tech generally is the ability to experiment and try out things. A lot of these companies are laboratories for new ways of working, for new modes of operation. Everybody responds to incentives, and for the past several years there has been a strong incentive to grow fast. When you don't have to care too much about costs, it incentivizes a certain set of behaviors. I think this has been really awesome for exploring new tools and finding better, more enjoyable ways of working.
So now we have to respond to a different set of incentives, and cost is obviously one across the board. And then, as Pedram mentioned, customer value: having empathy for what the customer actually needs, wants, and cares about. I definitely think there's an aspect of maturing, but now it's, okay, how can we meet people where they are and bring them along this journey in a way that's cost-efficient? Personally, I think about dbt a lot, where the story is that it's a much more enjoyable workflow. How can we maintain a lot of that efficiency and joy in these new workflows, but also consider cost, and then bring those together?
What are some of the milestones or critical markers of how data teams evolve and mature as an organization?
Taylor:
I'm happy to jump in here. I have one point I want to make about this. Emilie Schario wrote a blog post back in 2019 on kinda the three levels of data analysis, and it's basically reporting, insights, and predictions. So first it's descriptive. Then it's things that you wouldn't otherwise be aware of, and then it's making predictions about the future, which I guess gets more into data science. So that's a nice mental model for thinking about where you are as a business: do I need to understand what is actually happening, versus what new insights can I generate to help the business, versus can I make predictions about the future?
To your point, though, about positioning data teams more strategically, I'm generally with you. I think it can also be true that maybe CEOs and CFOs aren't necessarily thinking about data teams as much as they should. But this is something that I talked about with some folks at Data Council. In particular, Abby was really good at saying that he always positions his data team along with the growth team, so it's something that is driving core metrics for the business, either in terms of raw revenue or cost savings.
I think that's where data teams can really shine: uncovering insights, uncovering kind of the truth of a situation to help you increase growth or save on cost or protect your brand. And that's where I want to see people get plugged in more, and not just as this center of excellence function where you put a request out and something comes back, and then it exists in isolation from any of the larger context. When the team is really plugged in and more aware of the business, I think that's when I get really excited.
Do we think there needs to be more collaboration between the data science and analytics engineering worlds? If yes, are we converging on similar ways of working together, or are we still operating in two different spheres?
Pedram:
I think we're not even close to getting close, if that makes sense. I talked to some data scientists I used to work with, and they still don't even know what dbt is for the most part. They're so far removed from my small bubble of analytics.
So I don't know if anyone's really trying to bring these things closer. At the end of the day, maybe we operate on the same fundamental data, but our concerns and theirs are so different. Even the way they think about metrics and feature stores, or the types of data you wanna aggregate, is different: we want small, bespoke, artisanal data; they want giant factories' worth of data that they can just dump in a warehouse and run massive models on. Do we need to align? I don't know, maybe one day. I haven't really felt a real pain from that lack of alignment, though.
Taylor:
There was a recent article floating around Hacker News; the title was like "Machine learning operations are mostly data engineering", which I thought was a catchy title. So from my perspective, there are definitely a lot of similarities, and it's why data is so cool. There is such a diversity of backgrounds in how people come to this field, and they bring with them their own experiences and their own ideas. I wish there were more spaces where it was easier, and where we were honestly more incentivized, to have these cross-discipline conversations around workflows and how we each do data.
I think back to when I was in grad school and we were running our models and our simulations in MATLAB, dumping the data into Excel. The models would run overnight, triggered via my laptop on four Dell computers in my PI's office, and to make any updates, it was all using Subversion. But it was doing data, right? And so there are so many ways to do it, and that's why it's so good to have conferences. There are so many diverse ways of working that we're not even aware of, and I love hearing about them.
I just want more people to talk about what they're doing and how they're doing it because it's so interesting and I think we have so much to learn from each other. And yeah, I just wanna learn from them because there's probably something we can learn from the problems they're trying to solve and then make all of our other tools better.