How big is this wave?
How much more stuff will be built on top of the modern data stack? And how will that impact each of us as practitioners?
New podcast episode! As a product leader at companies like Heroku and Zendesk, DeVaris Brown specialized in building infrastructure-grade products. Now, as the CEO of Meroxa, he helps data teams build real-time data infrastructure with the same ease we now take for granted in batch.
Get it here. And enjoy the issue!
- Tristan
Thinking through my Coalesce talk.
This weekend I’m in a Coalesce state of mind…we just passed 10k registrants and I’m hosting talk #3 of the event. It’s game time.
My session is called “How Big is this Wave?” I originally chose the title months ago, and here’s the blurb I wrote then:
The modern data stack is the third generation of data analysis products to come to prominence since the ’90s. The prior waves, data warehouse appliances and then Hadoop, were both big steps forward but ultimately failed to live up to their initial promise.
Is the modern data stack just another iteration in a long string of “trendy technologies” in data, waves that crash upon the shore but ultimately recede? Or is it somehow more permanent?
The answer to this question drives how we think about the future: how much we invest in skill acquisition and our own career paths, and how companies think about investing in both technology (easy) and organizational change (much harder).
The goal of the session is really to open the conference up with the biggest possible frame, so I think Martin and I will end up posing a lot of questions that we will certainly fail to answer. But I hope that it’ll achieve its core goal and widen your aperture—where is this industry headed?
Here’s where I’ve gotten to while mulling over this topic in the past few months. The modern data stack is fundamentally about a new generation of infrastructure (the cloud data warehouse, ingestion, and transformation). And infrastructure becomes more critical as more things get built upon it. Permanently disable the electric grid in 1920? Life goes on. 100 years later? Societal collapse.
So: the question, for me, has sharpened into “What will ultimately be built on top of this infrastructure?” From what I can tell, there are really three places to go beyond the core BI use case: more pervasive analytics, data science/ML, and application development. Here’s the question each one of those raises for me:
Pervasive analytics: With this new infrastructure, we’ve mostly rebuilt existing analytical experiences on top of it. This is typical with infrastructure shifts: first the industry rebuilds known experiences, then eventually shifts into building fundamentally new things. Can we, over the coming decade, build ways of pushing data out into the edges of an organization? And doing so while maintaining trust, reliability, and context?
Data science / ML: How will the data science and analytics worlds start to blend together? Today, many years into our current epoch in data tech, these two worlds (tools, humans, workflows) remain persistently separate. Who will be responsible for bridging the gap? When will it happen?
Application development: To what extent will the modern data stack move from being primarily focused on internal analytics to being a target for application development? Could Snowflake ever become a multi-modal database and actually be an excellent OLTP database too? If that happened, what if all SaaS products began writing their core data to it? Tom Tunguz wrote about this very effectively earlier this year.
This whole topic is fascinating to me because it ties directly back to the empowerment of data analysts. The modern data stack is already empowering to us in so many ways, but as this S-curve continues to gather steam we may find that we’re still just at the beginning of the impact we can have on our organizations.
I’m looking forward to the conversation, and plan to hang out in Slack afterwards. Hope to see you there :)
From elsewhere on the internet…
⌨️ A very thoughtful post on the future of development environments. The core idea is: what if we offload our local development environment to a remote IDE? There are cost savings, device independence and flexibility, and more advantages. There are also downsides.
What seems to be assumed in this take, and the thing I’d like to challenge, is the idea that your development environment needs to do much heavy lifting. Rather than take your entire IDE and ship it to the cloud to solve these problems, why not just ship the actual computation to the cloud via APIs? Whether that’s K8S or SQL or Spark or whatever… I still feel like the local command line can/should be an important part of the development experience. Just don’t do the data processing locally.
⚡ Fantastic new Erik Bernhardsson post on the evolution of cloud infrastructure over the coming decade+. This is upstream of analytics engineering but impacts our world significantly.
📊 Benn’s weekly post talks about the critical importance of dashboards for a company’s ability to understand itself. I really agree. My weekly dashboard review practice is the single most important hour in my week (Monday @ 8am). There are a lot of cynical data professionals who snipe at dashboards (and certainly, they are used in plenty of problematic ways), but it’s hard to overstate their value when used well. No one wants to go back to 1x/week .pptx email distributions of metrics updates.
⚫ Erika Pullum talks about how to configure your dotfiles.
I really agree with this, although I have to admit that I’m someone who has dramatically under-invested here. Drew always used to watch my terminal workflow and cringe. 🤷
🕵️ Fantastic work from the Monzo team rolling a column-level lineage tool using dbt + ZetaSQL.
📄 Claire Carroll on entropy.
☁️ Redshift goes fully serverless. While I am fully, 100% supportive of this move, I do wonder how impactful it’ll be at this stage of the market. And current Redshift users already had access to some level of elasticity via autoscaling. Regardless, this is an important development and one to keep an eye on.