LookML + Tableau. The metrics layer. Don't blame the customer.
Whew! What a week! It's a fun time for the modern data stack right now: Looker announced a partnership with Tableau (!), The Future Data Conference painted the (ahem) future of modern data experience, and the conversation around the next layer of the modern data stack is heating up! So put your feet up, grab a cuppa ☕ and get comfy.
Why LookML is a big deal
Many of us have cut our teeth on LookML, an early entry into the semantic layer of the modern data stack. On the surface, the value proposition of Looker in the burgeoning modern data stack was that it helped lower barriers to data access in organizations. And it did this by allowing a data team to define models that business users could explore safely via a graphical interface. Dashboards were never the intended products for data teams managing Looker instances. Instead, data teams produced models, explores and interactive "applications" (dashboards that could be filtered, sliced, drilled down etc.)
And yet, LookML replacing SQL wasn't the real industry game changer. Not on its own. If your data organization spent a lot of time modeling data in the warehouse to optimize performance, storage and/or cost, adding a LookML layer was yet another thing that needed to be maintained, and actually slowed down moving from question to insight. If a business user needed to make a metric available that didn't yet exist in LookML it often required working with an analytics team to expand the LookML layer and working with a data engineering team to update the data model at the warehouse layer.
LookML (and by extension Looker) was most powerful when used with modern cloud lakehouses like Snowflake, BigQuery and Redshift because it allowed you to do most of your data transformations in LookML, provided you'd already done your EL. Suddenly, transformation had moved up the stack. Analysts were more productive. Data models were more nimble, and kept better pace with the business. And when performance and cost were an issue, Looker allowed you to use Persistent Derived Tables to write materialized versions of your most important models to the warehouse, with smart refresh intervals. LookML helped create cloud warehouse demand. It was just… easier.
Google's acquisition of Looker and its integration into the GCP stack made a lot of sense in this context in 2019 — it lowered barriers to entry for data developers, and opened up a new and growing audience for GCP: the data analyst. The partnership with Tableau is just a reinforcement of something that was already obvious to Google more than two years prior — the value of Looker lies in LookML:
Looker choosing to partner with Tableau makes particular sense because Looker’s always been a transformation tool first, and a consumption tool second. Looker’s crown jewel is LookML; Tableau’s is visualization. I’d speculate that Looker originally built its visualization tooling in large part so that they could market and sell the value of LookML, rather than the other way around. By launching this integration, Looker is simply doubling down on that long-standing identity. -Benn Stancil
Is Google BI Dead?
So, is Google exiting the BI layer game entirely? Not yet. For now, it looks like you need to have both a Tableau and a Looker license to play. But the current direction is actually pretty consistent with GCP strategy more broadly. GCP is a set of cloud development tools. Looker is very much a developer tool first because of LookML, and a BI user experience second. Partnering with Tableau is one step closer to making the GCP data developer stack an industry standard via LookML and making it easier to walk into a customer call and say “porque no los dos?”.
How far LookML can go as a standard remains to be seen, though. Even though Looker was early to the semantic layer party, the LookML developer experience is still maturing. Data testing capability exists, but it is nascent. Robust CI/CD is a third party, open source extension (👋 Spectacles crew 😘). Metrics aren't reusable across models, so it's not a true metrics layer (yet!). Looker’s semantic layer itself conflates SQL and LookML in both table and metric definitions, which makes development and debugging messier.
And finally, LookML is proprietary. Expressing the entirety of your business logic in LookML locks you into Google's Cloud Platform, which is perhaps exactly what Google wants 😉
For more hot takes on this announcement:
Match made in heaven? (Emre Semercioglu)
BI is dead (Benn Stancil)
This thread in the dbt Community Slack
Also this week, our friends over at Sisu hosted their annual Future Data Conference. October is definitely conference season, so ICYMI, there were several talks worth catching up on. Not only are they very very good individually, but they speak to each other (sometimes very directly!) in ways that are delightful at a single track event. Some highlights below.
I haven't laughed out loud at a conference talk in a long time, that is, until Benn decided to mock up what it would feel like if one were to build the Yelp user experience the same way we build internal data products today. I won't spoil it, except to say we, career dashboard-ers, definitely deserved it. 🔪
If you take away nothing else from Benn's talk but one thing, it's this:
don't 👏 blame 👏 the 👏 customer.
Yes, building the muscle to ask good questions of your data is important regardless of your role. And yes, learning SQL will probably go a long way towards enabling you to work with data today, even if you don’t directly work on a data team. Everyone in an organization should be able to make data informed decisions and contribute to the evolution of knowledge at an organization... but not everyone should have to be an experienced data analyst to be a part of the conversation. And that's very much what internal data products expect of business users today. Of course our customers aren't using them!
Benn also reminds us that the products of our work on (with? in?) data are decisions. Whether we are building a recommender system, or preparing for our next monthly business review, our goal as data professionals is to facilitate decision making. And yet, we do a much better job with providing an intuitive user experience today when our data products are designed to help external customers make decisions: Yelp for restaurant recommendations, Booking.com for hotel recommendations, Google Maps for navigation around traffic jams, and so on.
Why is that exactly?
❓Is it because.... it's easier (today) to get investment and resources if you are building external customer facing features rather than internal ones?
❓Is it because... the industry building these external customer facing features is much more mature in terms of know-how, talent pool and software engineering frameworks being used? (TIL that Yelp was founded in 2004 and Booking.com in 1996!)
❓Is it because... the kind of decisions that we need to help facilitate for our internal business customers are more complex? Require more human intervention and rigorous data expertise?
❓Is it because... the costs of a false positive or a false negative when making important business decisions are very different from the cost to a consumer of choosing a suboptimal route home, or the second best taco place? And if yes, where else do we draw our inspiration?
Kill your darlings.
One of the fun things about having a second author on this Roundup, is that now, someone can write about Tristan's writing 🥸 Or in this case, speaking. In a great complement to Benn's talk immediately prior, Tristan this week painted a picture of what the future of the modern data developer experience could look like. And that future looks a lot like ⌨️ code ⌨️
Configuration as code is a road well traveled in DevOps circles because it enables idempotence, which is a fancy word for having the ability to run the same code or configuration multiple times without changing the result beyond the first run. I won't rehash the talk or last week's Roundup in great detail, except to say that idempotency is important in data pipelines because sometimes ... ☠️ they fail ☠️. Just like your HTTP requests may fail intermittently if you've got bad wifi. 📶 Or like writing to disk may fail if there's a bad sector. 💽 Except with data applications, it's kind of extra important that you don't accidentally write 1B extra rows a day if you have to restart your pipeline 👀.
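To make the "restart without extra rows" point concrete, here is a minimal sketch of one common way to get an idempotent load: replace a whole day's partition inside a single transaction instead of blindly appending. Everything in it is made up for illustration (the daily_events table, the load_day helper, the date and the rows), and it uses an in-memory SQLite database so it runs anywhere. Your warehouse will have its own flavor of this, such as a merge or a partition overwrite.

```python
import sqlite3


def load_day(conn, day, rows):
    """Replace one day's rows atomically so a rerun cannot duplicate them.

    Hypothetical helper: delete-then-insert for a single day partition.
    """
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM daily_events WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO daily_events (day, user_id, amount) VALUES (?, ?, ?)",
            [(day, user_id, amount) for user_id, amount in rows],
        )


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM daily_events").fetchone()[0]


# Hypothetical table and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_events (day TEXT, user_id INTEGER, amount REAL)")

batch = [(1, 9.99), (2, 24.50), (3, 5.00)]

load_day(conn, "2021-10-17", batch)
count_after_first_run = row_count(conn)

# Simulate restarting the pipeline and loading the same day again.
load_day(conn, "2021-10-17", batch)
count_after_rerun = row_count(conn)

print(count_after_first_run, count_after_rerun)
```

A plain append-only insert would leave six rows after the rerun. Scoping the delete and the insert to the same day, in the same transaction, is what keeps the second run from changing the result.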
Kill your darlings means idempotency at many different layers of the data stack: working on branches that you can trash, stash or squash regardless of whether you're configuring your data infrastructure, building a metric or configuring a dashboard; it also means the freedom to confidently delete legacy transformation code, and reduce the number of times you repeat yourself in your code by building and safely testing refactors and new abstractions.
Although we're moving in the direction of this being the default data development experience, we're not all the way there yet. A few things we're still missing:
🔁 Standardizing inputs and outputs at different layers of the data stack. Today we already have an ecosystem of ingestion apps that produce standardized, predictable outputs. Soon, we'll have the ability to standardize metrics across a data organization's codebase. Now, we just need the connective tissue between these layers to mature so that Analytics Engineers can stop writing the same transformations that produce these common metrics like retention and ARR, over and over again for every new team at a company, or at every new job.
This is exactly why metadata can’t be siloed and needs to be centralised into a central, metadata lake which can power use cases like data catalogs/ discovery/ governance/ observability +++
Can we pay attention to the fact that between @AtlanHQ, @montecarlodata, @okerainc, @dbt_labs, @SnowflakeDB, @LookerData, etc we ended up with a multitude of places where we can tag/describe models/tables and no centralised way of managing metadata?
-alexandre carvalho (@nervokid)
👀 A rigorous audit trail. Right now, our answer to being able to keep a data audit trail is not only a git log of all of the transformations to the data, it is also to keep a copy of all of the data. For all time. Forever. So if a core company metric like ARR is misstated, it's possible to go back and figure out: when? by how much? why? and what other reports or decisions were affected by this?
And yes, storage is cheap but remember we also care about idempotence. So rebuilding our entire data pipeline might mean recreating a table with years' worth of records if you find a logical error somewhere far enough upstream in your data model. Imagine software engineers having to reconstruct your application's production database from a series of event logs every time they deployed a bug fix to production. 🥜!
What will it take before we can start trusting our data pipelines and applications enough and stop hoarding every bit of data ever emitted about the business? We're used to thinking of logging as an input into the data modeling process. What does an audit log of the data modeling process look like?
Other great talks from the day:
Keeping up with the proliferation of tools in the modern data stack. (Veronika Durgin)
Why it's hard to manage data like a product. (Eric Weber)
Elsewhere on the internet…
Mr. Ben, riffing on our very own Jason Ganz, poignantly sums up why Analytics Engineering is such an exciting space to be in right now:
Just to double down on the point, because it’s so important, “The fun thing about analytics engineering is it’s a discipline we’re all collaboratively building together.”
No company owns the definition for an Analytics Engineer. There is no certification handed down from a board of experts before a data professional can start practicing their craft. Yet, a vibrant community has formed around analytic work, welcoming your ideas, and weaving them into the larger narrative.
A new on-demand dbt Learn course has just dropped courtesy of Christine Berger and Pat Kearns: Refactoring SQL for modularity. Can’t think of better timing for this as we, as an industry, start to think about what are all the other nodes that should exist in our DAGs. 👏
Meltano (who spun out of GitLab, so congrats on the IPO folx!) have just launched a new home for Singer connectors and Meltano plugins… and it’s going to live on GitHub! 🙂
Also, low key love that Yelp was a focal example in a talk directly after a talk from the Head of Data & Experimentation at Yelp 👏