Lots going on!! Metrics, Malloy, Sanity Checks, CTEs, and Making Good Decisions.

Oct 24, 2021

It’s official: after a couple of months of taking turns publishing every other week, Anna’s stats are better than mine! 😢

Seriously, though, it’s been exciting for me to add a new voice and see your response. I look forward to figuring out how to further expand the tent in coming months and years. Thoughts welcome.

New podcast episode! One of my favorite humans in data, Benn Stancil of Mode, joined Julia and me. Benn and I have been having versions of this conversation for ~7 years now and it was fun to get one on the record. Get it here.

Enjoy the issue!

- Tristan

From around the internet

Anna and I have both pushed the format of this newsletter in recent months towards one long topic plus several short bits. This week I’m gonna skip the deep dive because…well…there’s just too much interesting stuff going on. Here are a bunch of things you should be aware of in no particular order.

😰 My favorite tweet of the week:

Katie Bauer @imightbemary

The most intimidating part of doing data analysis on something that's never been explored before is that you're automatically in an unsupervised setting, and with no ground truth it's difficult to tell if what you've found is brilliant, idiotic or maybe just not that interesting

@Katie: I super identify with this. This is one of the great benefits of long working relationships, IMO…when I run across something like this, I always call Drew and ask him to sanity check me before showing it to anyone else. Having someone who roughly shares your priors but can pass your new thought through their brain is so important.

Do you have a sanity-check-buddy?

🍽️ Benn repurposed his recent Future Data talk into a newsletter, and it’s fantastic. The core of it:

The secret, I believe, is in the subtlety. We don’t immediately notice how much we use data in products like Yelp, Google, and Resy because data isn’t detached from the rest of the experience. These services don’t attempt to make you “data-driven.” Instead, they focus on a bigger problem—choosing a restaurant, a means of transit, and a reservation booking—and integrate data alongside other features, like phone numbers, pictures of food, and links to menu, that help you address it. It’s all a single experience, with no clear line where the product ends and the data begins.

For all Benn’s previous highfalutin’ talk about self-service, this is a concrete answer to how data will migrate to the edges of the org, and I think it’s a very compelling one. It is closely aligned with my own thinking around data products and how there is no “one right way” to analyze data.

🚚 Zapier just released a bulk data movement product called Transfer. In theory, this could compete with products like Census and Hightouch. Will it, in practice? I have a hard time imagining that Zapier is going to be an ideal fit for piping million- or billion-row tables from place to place. 🤷 Happy to be wrong about this, but I just have a hard time seeing it.

💰 Hex, one of my favorite new products in the MDS, raised a Series A and announced an integration with dbt.

⏭️ This could be kind of a big deal:

lloyd tabb @lloydtabb

We've been working on, Malloy, a new experimental data language. If you work in SQL, we'd love your feedback.

GitHub: github.com/looker-open-source/malloy

I have so much to say about this but will keep it brief for the moment. The short version of why this matters: I think SQL is like HTML/CSS: a declarative language to express the desired outcome in some particular domain. It doesn’t say how to process data, only what the result should look like. Languages like this tend to be verbose and ugly for humans to write, but they have wonderful properties insofar as they can be extremely exact (can be reasoned about mathematically), are amenable to standardization, and can abstract away a lot of complexity.
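To make the “what, not how” distinction concrete, here is a minimal sketch in Python, using the standard library’s in-memory SQLite as a stand-in for a warehouse. The table, columns, data, and function names are all hypothetical. The SQL version only states what the answer should look like and leaves the execution strategy to the engine; the loop version spells out every processing step by hand.

```python
import sqlite3

# Hypothetical toy data: (customer, amount) order rows.
ORDERS = [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5)]


def revenue_declarative(rows):
    """Describe the desired result; SQLite decides how to compute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


def revenue_imperative(rows):
    """Spell out each step ourselves: loop, accumulate, then sort."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    return sorted(totals.items())


print(revenue_declarative(ORDERS))  # [('alice', 50.0), ('bob', 12.5), ('carol', 7.5)]
print(revenue_imperative(ORDERS))   # [('alice', 50.0), ('bob', 12.5), ('carol', 7.5)]
```

Both produce the same answer; the difference is that the SQL version can be reasoned about and optimized by the engine, while the loop version fixes one particular execution plan.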

One of the first ways the industry attempted to address the unpleasantness of programming in HTML/CSS was templating. In web programming, that happened in the late ’90s and early 2000s. This is what dbt introduced to SQL in 2016, and it is where the SQL industry still is.
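For readers who haven’t used templated SQL, here is a rough sketch of the idea in plain Python. dbt itself uses Jinja for this; the code below is not dbt’s implementation, just an illustration of the same pattern: a loop renders repetitive SQL so that the list of payment methods lives in one place. The `payments` table, the method names, and `render_pivot_sql` are hypothetical.

```python
import sqlite3

# Hypothetical list of payment methods: the one place to edit when a new one appears.
PAYMENT_METHODS = ["bank_transfer", "credit_card", "gift_card"]


def render_pivot_sql(methods):
    """Render one SUM(CASE ...) column per method instead of hand-writing each.

    Method names are interpolated directly into the SQL text, so they must
    come from trusted configuration, never from user input.
    """
    columns = ",\n    ".join(
        f"SUM(CASE WHEN payment_method = '{m}' THEN amount ELSE 0.0 END) AS {m}_amount"
        for m in methods
    )
    return (
        f"SELECT\n    order_id,\n    {columns}\n"
        "FROM payments\nGROUP BY order_id\nORDER BY order_id"
    )


sql = render_pivot_sql(PAYMENT_METHODS)
print(sql)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (order_id INTEGER, payment_method TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [(1, "credit_card", 10.0), (1, "gift_card", 5.0), (2, "bank_transfer", 8.0)],
)
rows = conn.execute(sql).fetchall()
conn.close()
print(rows)  # [(1, 0.0, 10.0, 5.0), (2, 8.0, 0.0, 0.0)]
```

Templating removes the copy-paste, but the output is still ordinary SQL: the author is generating the declarative code rather than being abstracted away from it.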

The next step in the HTML/CSS stack was abstracting away the underlying declarative code altogether in more powerful frameworks like Angular and then React. It seems (from the outside; I’m not a front-end developer) that React has become a fairly stable point along the trajectory that practitioners are quite happy with.

SQL hasn’t gotten to this place yet. There is no “React of SQL”. I do not yet have strong opinions on specifically what Lloyd and team have built with Malloy, but I think they are asking and answering exactly the right question. I think if there is a React of SQL, it could potentially look something like this.

If you get a chance to download and play around with Malloy, please let me know how you like it!

🧳 JP Monteiro is interested in this trend and proposes that Malloy is an admission that folks at Looker feel the squeeze that their proprietary bundle of functionality (dimensional modeling + metrics modeling + dashboarding) is being unbundled. This may or may not be true—it’s certainly an interesting market dynamics question but totally orthogonal to the ultimate usefulness questions around Malloy. (The industry needs this type of abstraction regardless of who builds it!)

There’s a lot of good stuff in JP’s post, and much of it touches on dbt’s role in the ecosystem: “How far do we go with dbt? Where do we draw its boundaries?” I have a lot to say about that but will hold off for the moment.

🔄 It made me so happy to see someone go this deep on researching the impact of CTEs on Snowflake explain plans. This is actually the post that got the internal conversation started on the topic of the expressiveness of SQL (and then Malloy). We try to write readable SQL using our style guide, but what happens when the style guide comes into conflict with the optimizer? Really, practitioners should be able to separate ergonomics from performance and optimization. Right now these two things are too tightly coupled, leaving little room for choice on the part of practitioners.
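If you want to poke at this tension yourself, here is a minimal sketch of the kind of comparison involved. It uses SQLite from the Python standard library rather than Snowflake, so the plans it prints say nothing about Snowflake’s optimizer; the schema and queries are hypothetical. It checks that a CTE-style query and an inlined equivalent return the same rows, then collects each one’s `EXPLAIN QUERY PLAN` output for side-by-side inspection.

```python
import sqlite3

# CTE-heavy "readable" style (hypothetical schema).
CTE_STYLE = """
WITH paid_orders AS (
    SELECT customer_id, amount FROM orders WHERE status = 'paid'
),
customer_totals AS (
    SELECT customer_id, SUM(amount) AS total FROM paid_orders GROUP BY customer_id
)
SELECT customer_id, total FROM customer_totals WHERE total > 10 ORDER BY customer_id
"""

# The same logic with the intermediate steps inlined as a subquery.
INLINED_STYLE = """
SELECT customer_id, total
FROM (
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'
    GROUP BY customer_id
)
WHERE total > 10
ORDER BY customer_id
"""


def compare(conn):
    """Confirm both styles agree on results, then return each one's plan steps."""
    cte_rows = conn.execute(CTE_STYLE).fetchall()
    inlined_rows = conn.execute(INLINED_STYLE).fetchall()
    if cte_rows != inlined_rows:
        raise ValueError("queries are not logically equivalent")
    plans = {
        # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
        name: [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
        for name, sql in (("cte", CTE_STYLE), ("inlined", INLINED_STYLE))
    }
    return cte_rows, plans


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "paid", 8.0), (1, "paid", 6.0), (2, "paid", 4.0), (2, "refunded", 50.0)],
)
rows, plans = compare(conn)
print(rows)  # [(1, 14.0)]
for name, steps in plans.items():
    print(name, steps)
```

Whether the two plans match depends entirely on the engine and its version, which is exactly the coupling between style and performance described above.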

📈 More on metrics from Robert Yi. A lot to agree with:

…metrics and transformations are philosophically quite elegantly aligned. Both basically enable you to DRY (don't repeat yourself) in SQL. Never transform twice. Never define business logic twice.

Agree. Metric modeling and dimensional modeling are both modalities of data transformation. Dimensional transformations occur before an analytical query is issued. Metric transformations must occur in response to an analytical query.
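The distinction above can be sketched in a few lines of Python over SQLite. A dimensional transformation (filtering raw orders into a `fct_orders` table) runs ahead of time and is materialized. Each metric is written once as a SQL expression and compiled into a query only when someone asks for it, at whatever grain they ask for. `METRICS`, `compile_metric`, and both tables are hypothetical; this is not the API of dbt or of any metrics product.

```python
import sqlite3

# Dimensional transformation: runs and is materialized before any analytical query.
SETUP = """
CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, region TEXT, amount REAL, status TEXT);
INSERT INTO raw_orders VALUES
    (1, 'alice', 'EU', 30.0, 'complete'),
    (2, 'bob',   'US', 12.5, 'complete'),
    (3, 'alice', 'EU', 20.0, 'returned'),
    (4, 'carol', 'US', 7.5,  'complete');
CREATE TABLE fct_orders AS
    SELECT order_id, customer, region, amount
    FROM raw_orders
    WHERE status = 'complete';
"""

# Metric definitions: each piece of business logic is written exactly once.
METRICS = {
    "revenue": "SUM(amount)",
    "order_count": "COUNT(order_id)",
}


def compile_metric(metric, dimensions):
    """Metric transformation: rendered only in response to a query, at the requested grain.

    Dimension names are interpolated directly, so they must be trusted identifiers.
    """
    dims = ", ".join(dimensions)
    select_dims = f"{dims}, " if dims else ""
    group_by = f" GROUP BY {dims} ORDER BY {dims}" if dims else ""
    return f"SELECT {select_dims}{METRICS[metric]} AS {metric} FROM fct_orders{group_by}"


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
by_region = conn.execute(compile_metric("revenue", ["region"])).fetchall()
overall = conn.execute(compile_metric("revenue", [])).fetchall()
conn.close()
print(by_region)  # [('EU', 30.0), ('US', 20.0)]
print(overall)    # [(50.0,)]
```

The “complete orders only” rule lives in the dimensional layer and `SUM(amount)` lives in the metric layer, so neither is repeated no matter how many ways the metric is sliced.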

💭 This newsletter has been a lot of inside baseball, a lot of people opining about the future. Here’s something tangible that I’ve gotten a lot of value from in the past month: How we make decisions at Coinbase. It’s a post from 2018, but it’s resurfaced for me as the team at dbt Labs has grown and we’ve had to contend with some important decisions of late. We’ve evolved our own internal process to look a lot like the recommendations in this post and have been very happy with the results.

Your job as a data practitioner is to facilitate the making of strategic decisions at your organization. Given that, it’s likely important for your career that you not only know how to surface relevant data but that you can shepherd the overall decision-making process forwards. Success == decisions made, not data delivered.

The Analytics Engineering Roundup
