Keep Data Council Weird
Data Council changed in part because we changed!
New podcast alert! In season 2 episode 3, I hosted friends-of-the-show Benn Stancil and David Jayatillake where we discussed bundling-vs-unbundling, data lakes and file formats, and data app stores. The format is a bit different than our standard and I had a lot of fun doing it—curious to hear your thoughts.
Data Council took place this week in Austin, TX, and a lot of people came. Has data grown? Was everyone just itching to get together? Have everyone’s event stipends just been piling up over the past two years? Hard to say, but this was the biggest Data Council ever to my knowledge.
I didn’t go, but my cofounder Drew did. I asked him to write up a hot take for this week’s issue. Enjoy :)
Let’s just get it out there: Data Council was weird this year. There are a lot of takes already swirling around the Twittersphere, spanning from concern to appreciation, but at the core of my experience (and a lot of others’, from what I could tell) was a sense that things had changed. Data Council has long billed itself as “the no-bullshit data conference”: it has historically been a conference of, by, and for practitioners featuring deeply technical talks on the state of the art in data. This year was... I don’t know... not that.
The resounding takeaway that I heard from people in the hallways and happy hours was: “There are a lot of vendors and VCs here.” I felt it too; it was palpable and ironic in a “you’re not sitting in traffic, you are the traffic” kind of way, but it was true nonetheless. I am of course affiliated with a vendor, but I’ve been to enough of these events (an attendee since 2017!) to know that this year was different from years past. The historically deeply technical talks from hyper-growth companies have receded and given way to only slightly veiled product pitches from data companies about A Very Hard And Important Problem which, by the way, their product is built from the ground up to solve (and it looks like we’ll have time for a quick demo!). The question I find myself asking and struggling to answer is: is this a bad thing?
I don’t know. Kind of. Maybe not. It depends on how you look at it.
Was it good and right that the state of the art in data tooling has historically come from internal software built at FAANG-style companies? Maybe so if you like doing ETL with Jupyter notebooks, but maybe not if you’re in the 99% of companies that struggle with more fundamental challenges around working with data. I personally never found those kinds of talks super relatable and have always preferred to hear about tool-agnostic concepts that helped me see new points of view.
A lot has changed since the last Data Council. If I had been in a room with folks like Claire, Emilie, Anna, Scott, Taylor, Max, and Julien in 2019, I’d have been in a room full of practitioners. Today, though, that same group is composed of people building or investing in data tools. I don’t think I had internalized exactly how big or efficient the data-expert-to-data-startup pipeline had become, but it was hard to miss at the event this week. I guess the thing that feels resonant to me upon reflection is: Data Council changed in part because we changed, and it’s helpful for me to keep that in mind.
Overall, I think it’s a net positive that this new crop of data tooling is coming out of companies dedicated to solving well-defined and widely experienced problems. These companies are led by industry vets who have put in the work and experienced these problems first-hand. The incentives are just better aligned than they were when this stuff was coming out of FAANG (and similar) companies, and I think it will result in more capable and useful data tools in the long run.
I’ve thought about this a lot over the past couple of days and, ultimately, I think that I’m here for it. This version of Data Council, the one that is more of a trade show for industry people to connect, bond, share, and explore ideas, feels super compelling. It is more or less what we experienced this week, and I personally found it helpful and instructive, even if it diverged from what I was expecting. If we leaned into this, I think we could drop the pretense that the event is exclusively for practitioners, make our product pitches a little bit less thinly veiled, and spend more time collaborating on the future of our ecosystem together. There’s a way to do this that is still “no-bullshit” so to speak, but it would require a little bit of reframing and some different expectation setting to get right. I don’t know if this is something that Pete and the team would be excited about, but I’ll say that I’d be happy to do it again next year if it shook out that way.
Any way you slice it, it was just good to see folks in person again. I’m grateful for all the thoughtful conversations, inspiring visions, and new friends. Catch you at the next one!
From elsewhere on the internet…
💯! Could not match my personal experience more.
The updated version of 2020’s Emerging Architectures for Modern Data Infrastructure is fantastic! From mapping current architecture to diffing what exists today vs. what existed two years ago to musings on platforms—really a home run. I think the section on platform dynamics is particularly interesting and agree directionally with the authors’ take.
At least one counterpoint can be found in the recent post Space Time Tradeoff by Rockset CPO Shruti Bhat. It’s a perfect explanation of why it’s either impossible or incredibly challenging to have a single data platform service all use cases. And “Friends don’t let friends build apps on warehouses” is not a bad tagline.
Spotify is moving off of Luigi…? I mean, this is not entirely surprising, but it also made me a little concerned that the Luigi install base (which I’m sure is non-zero) would be maintainer-less. Turns out I was both right and wrong 🤦:
Datakin, we barely knew you! Will Astronomer still advocate for OpenLineage? (My guess is yes…) The acquisition does seem to make a ton of sense—Astronomer would certainly benefit from a lineage offering given the heterogeneity of the Airflow DAG.
Amit Prakash writes about what AI will and won’t automate in the analytics workflow in the near future. He’s one of the best-positioned humans on the planet to know the answer to this, having worked on search at both MSFT and GOOG and as the CTO of ThoughtSpot for the past decade. The details are worthwhile, but here’s the TL;DR:
a lot of automation is making analysts 10X more powerful. The intellectual parts are not going away, but the elimination of rote tasks will enable analysts to do much more high-value work. It is also creating more room for the analyst to focus on impact and communication. In my experience, this often leads to faster promotion for analysts. As an analogy, we are not deploying self-driving cars which would eliminate cab driver jobs. We are upgrading horse carriages to automobiles, which means a lot less dealing with manure and going much faster to longer distances.
Could not agree more. It’s quite hard to apply ML to the insights work that data analysts do.
Hex raised a lot of money. Congrats team!
re_data is cool!
re_data is an open-source data reliability framework for the modern data stack. 😊 Currently, re_data focuses on observing the dbt project (together with underlying data warehouse - Postgres, BigQuery, Snowflake, Redshift).
I clicked around in the demo and enjoyed myself. Would love to hear from you if you’re using it in your org!
Sarah Krasnik talking frozen yogurt:
If I told a friend we’re going to get froyo and brought them to a farm where they are expected to milk a cow, process the milk into yogurt, wait a few hours until it’s frozen and then flavor it, I’d look crazy.
The analogy actually works quite well. I haven’t thought this much about froyo in…ever. Best line:
Reduce the need for context by democratizing metrics instead of data.