Keep Data Council Weird

Data Council changed in part because we changed!

Mar 27, 2022

New podcast alert! In season 2 episode 3, I hosted friends-of-the-show Benn Stancil and David Jayatillake, and we discussed bundling-vs-unbundling, data lakes and file formats, and data app stores. The format is a bit different than our standard and I had a lot of fun doing it. Curious to hear your thoughts.

- Tristan

Data Council took place this week in Austin, TX, and a lot of people came. Has data grown? Was everyone just itching to get together? Have everyone’s event stipends just been piling up over the past two years? Hard to say, but this was the biggest Data Council ever to my knowledge.

I didn’t go, but my cofounder Drew did. I asked him to write up a hot take for this week’s issue. Enjoy :)

Let’s just get it out there: Data Council was weird this year. There are a lot of takes already swirling around the Twittersphere, spanning from concern to appreciation, but at the core of my experience (and a lot of others’, from what I could tell) was a sense that things had changed. Data Council has long billed itself as “the no-bullshit data conference”: it has historically been a conference of, by, and for practitioners featuring deeply technical talks on the state of the art in data. This year was... I don’t know... not that.

The resounding takeaway that I heard from people in the hallways and happy hours was: “There are a lot of vendors and VCs here.” I felt it too; it was palpable and ironic in a “you’re not sitting in traffic, you are the traffic” kind of way, but it was true nonetheless. I am of course affiliated with a vendor, but I’ve been to enough of these events (an attendee since 2017!) to know that this year was different than in years past. The historically deeply technical talks from hyper-growth companies have receded and given way to only slightly veiled product pitches from data companies about A Very Hard And Important Problem which, by the way, their product is built from the ground up to solve (and it looks like we’ll have time for a quick demo!). The question I find myself asking and struggling to answer is: is this a bad thing?

I don’t know. Kind of. Maybe not. It depends on how you look at it.

Was it good and right that the state of the art in data tooling has historically come from internal software built at FAANG-style companies? Maybe so if you like doing ETL with Jupyter notebooks, but maybe not if you’re in the 99% of companies that struggle with more fundamental challenges around working with data. I personally never found those kinds of talks super relatable and have always preferred to hear about tool-agnostic concepts that helped me see new points of view.

A lot has changed since the last Data Council. If I was in a room with folks like Claire, Emilie, Anna, Scott, Taylor, Max, and Julien in 2019, I’d be in a room full of practitioners. Today though, that same group is now composed of people building or investing in data tools. I don’t think I had internalized exactly how big or efficient the data-expert-to-data-startup pipeline had become, but it was hard to miss it at the event this week. I guess the thing that feels resonant to me upon reflection is: Data Council changed in part because we changed, and it’s helpful for me to keep that in mind.

Overall, I think it’s a net positive that this new crop of data tooling is coming out of companies dedicated to solving well-defined and widely experienced problems. These companies are led by industry vets who have put in the work and experienced these problems first-hand. The incentives are just better aligned than they were when this stuff was coming out of FAANG (and similar) companies, and I think it will result in more capable and useful data tools in the long run.

I’ve thought about this a lot over the past couple of days and, ultimately, I think that I’m here for it. This version of Data Council, the one that is more of a trade show for industry people to connect, bond, share, and explore ideas, feels super compelling. It is more or less what we experienced this week, and I personally found it helpful and instructive, even if it diverged from what I was expecting. If we leaned into this, I think we could drop the pretense that the event is exclusively for practitioners, make our product pitches a little bit less thinly veiled, and spend more time collaborating on the future of our ecosystem together. There’s a way to do this that is still “no-bullshit” so to speak, but it would require a little bit of reframing and some different expectation setting to get right. I don’t know if this is something that Pete and the team would be excited about, but I’ll say that I’d be happy to do it again next year if it shook out that way.

Any way you slice it, it was just good to see folks in person again. I’m grateful for all the thoughtful conversations, inspiring visions, and new friends. Catch you at the next one!

- Drew

From elsewhere on the internet…

Janessa Lantz @janessalantz

"data driven decisions" what we imagine: ✨insights✨ that equal 🚀obvious action🚀 what it feels like: look at dashboard every day for 3 months and one day make a sort of "huh" sound and something shifts in your thinking and you start to talk about it

💯! Could not match my personal experience more.

The updated version of 2020’s Emerging Architectures for Modern Data Infrastructure is fantastic! From mapping current architecture to diffing what exists today vs. what existed two years ago to musings on platforms—really a home run. I think the section on platform dynamics is particularly interesting and agree directionally with the authors’ take.

At least one counterpoint can be found in the recent post Space Time Tradeoff by Rockset CPO Shruti Bhat. It’s a perfect explanation of why it’s either impossible or incredibly challenging to have a single data platform service all use cases. And “Friends don’t let friends build apps on warehouses” is not a bad tagline.

Spotify is moving off of Luigi…? I mean, this is not entirely surprising, but it also made me a little concerned that the Luigi install base (which I’m sure is non-zero) would be maintainer-less. Turns out I was both right and wrong 🤦:

Erik Bernhardsson @bernhardsson

@jthandy No one has maintained it for the last 6 years so I don’t think it matters much in practice

Datakin, we barely knew you! Will Astronomer still advocate for OpenLineage? (My guess is yes…) The acquisition does seem to make a ton of sense—Astronomer would certainly benefit from a lineage offering given the heterogeneity of the Airflow DAG.

Amit Prakash writes about what AI will and won’t automate in the analytics workflow in the near future. He’s one of the best-positioned humans on the planet to know the answer to this, having worked on search at both MSFT and GOOG and as the CTO of ThoughtSpot for the past decade. The details are worthwhile but here’s the TL;DR:

a lot of automation is making analysts 10X more powerful. The intellectual parts are not going away, but the elimination of rote tasks will enable analysts to do much more high-value work. It is also creating more room for the analyst to focus on impact and communication. In my experience, this often leads to faster promotion for analysts. As an analogy, we are not deploying self-driving cars which would eliminate cab driver jobs. We are upgrading horse carriages to automobiles, which means a lot less dealing with manure and going much faster to longer distances.

Could not agree more. It’s quite hard to apply ML to the insights work that data analysts do.

Hex raised a lot of money. Congrats team!

re_data is cool!

re_data is an open-source data reliability framework for the modern data stack. 😊 Currently, re_data focuses on observing the dbt project (together with underlying data warehouse - Postgres, BigQuery, Snowflake, Redshift).

I clicked around in the demo and enjoyed myself. Would love to hear from you if you’re using it in your org!

Sarah Krasnik talking frozen yogurt:

If I told a friend we’re going to get froyo and brought them to a farm where they are expected to milk a cow, process the milk into yogurt, wait a few hours until it’s frozen and then flavor it, I’d look crazy.

The analogy actually works quite well. I haven’t thought this much about froyo in…ever. Best line:

Reduce the need for context by democratizing metrics instead of data.

Stephen McDaniel

Mar 29, 2022 (edited)

Hi Tristan and Drew. It is fascinating how the problems remain the same, but the tools are much nicer and the costs are down 100x over the past two decades! Connecting people, gaining trust and building a culture is critical to achieving ROI! A culture I learned the most from was Netflix, "it isn't wrong to be wrong, it is ONLY wrong to NOT learn and take action!"

I worked at Brio Technology in the 1990s (the easiest BI tool of its day), Loudcloud, SAS, and Netflix in the 2000s, and Tableau in the 2010s. The core problems remain unchanged: connecting data scientists, analysts, and data engineers to the critical issues that have the potential to produce high ROI. Even more important is connecting them to, and building trust with, the business teams that WILL actually act upon insights and iteratively build upon successes and failures.

So, quality data must be collected, socialized, enhanced, analyzed, and optimized. This is the process that is needed. But at those "99% companies," the more pressing need is often to build a culture that can admit and accept being "wrong" and will actively implement changes using the data insights gathered.

Ironically, vendor sales efforts are often key to helping achieve liftoff in these cultural shifts. Why? When executives hear the sales pitch and invest the dollars, it can signal to the line of business managers to use the new systems to drive change. By forcing the new approach, it enables team members to become part of the process as tools designed to be accessible by line-of-business analysts along with advanced data scientists and engineers can help build trust and collaboration.

I appreciate the openness and sharing some of the Data Council highlights!


Matt Arderne

Great snippet on what went down, thanks Drew

> Today though, that same group is now composed of people building or investing in data tools.

As you say, this benefits the broader community, but at the expense of the core community[1], which will have to be considered effectively disbanded due to new allegiances which are probably on average less encouraging of vulnerability.

I guess this is what maturity looks like for the scene, and thus the "scenius"[2] is finished - does this mean the tech and process in this space is reaching its approximate final form?

1- dare I say OG? I only recognise some of those names lol

2- https://perell.com/fellowship/conjuring-scenius/


The Analytics Engineering Roundup
