4 Comments

While not “solving the metadata problem,” I think that a few basic standards can enable BI and other tools to leverage dbt as a full semantic layer. I put together a dbt project to show how this would work: https://github.com/flexanalytics/dbt-business-intelligence

In short, it leverages dbt as a semantic layer to define metrics, dimensions, aggregations, calculations, data relationships, business-friendly names and descriptions, synonyms, formatting and more. Then, BI tools can just plug in without needing to create a metadata/semantic model. “Semantic-free BI”:

https://towardsdatascience.com/semantic-free-is-the-future-of-business-intelligence-27aae1d11563

One big problem is that standards need buy-in from big-name vendors to take off, but those vendors don’t necessarily want to buy in, because a shared standard makes it easier to switch to another vendor. Another problem is that these standards are not “baked in” to dbt, and perhaps they shouldn’t be, but this forces more non-standard usage of dbt’s ‘meta’ tag. Also, the metadata needs to be more “active,” with the ability for many roles to update catalogues in a friendly UI (not just YAML).
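To make the ‘meta’ idea a bit more concrete, here is a minimal Python sketch of how a BI tool might read metrics and dimensions out of dbt metadata. It assumes a dictionary shaped like dbt's `manifest.json` (nodes, then columns, then `meta`). The model name, the column names, and the keys inside `meta` (`label`, `metric_type`, `format`) are hypothetical conventions made up for illustration; they are not a dbt standard and not necessarily what the linked project uses.

```python
# Hypothetical excerpt shaped like dbt's manifest.json (nodes -> columns -> meta).
# The keys inside "meta" (label, metric_type, format) are illustrative
# assumptions, not something dbt itself defines.
manifest = {
    "nodes": {
        "model.shop.orders": {
            "name": "orders",
            "columns": {
                "order_total": {
                    "name": "order_total",
                    "description": "Order value in USD",
                    "meta": {
                        "label": "Revenue",
                        "metric_type": "sum",
                        "format": "$#,##0.00",
                    },
                },
                "customer_region": {
                    "name": "customer_region",
                    "description": "Sales region of the customer",
                    "meta": {"label": "Region"},
                },
            },
        }
    }
}


def build_semantic_model(manifest):
    """Split each model's columns into metrics and dimensions based on meta."""
    models = {}
    for node in manifest["nodes"].values():
        metrics, dimensions = [], []
        for col in node.get("columns", {}).values():
            meta = col.get("meta", {})
            field = {
                "column": col["name"],
                # Fall back to the raw column name when no friendly label is set.
                "label": meta.get("label", col["name"]),
                "description": col.get("description", ""),
            }
            if "metric_type" in meta:
                # A column with an aggregation is treated as a metric.
                field["aggregation"] = meta["metric_type"]
                field["format"] = meta.get("format")
                metrics.append(field)
            else:
                dimensions.append(field)
        models[node["name"]] = {"metrics": metrics, "dimensions": dimensions}
    return models


model = build_semantic_model(manifest)
print(model["orders"]["metrics"][0]["label"])
```

The point of the sketch is that the BI tool holds no model of its own. Everything it needs comes from the metadata, which is also why agreeing on the key names matters so much.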


> Maybe the pattern here looks something like...

Sounds a lot like the "emergent layer" pattern! Where something scarce suddenly becomes abundant, solving one set of problems and moving us on to a new set of constraints. https://medium.com/swlh/emergent-layers-chapter-1-scarcity-abstraction-abundance-5705666e4f15


Whoa, this is awesome. Thanks for the link!!


> If we knew a priori that this were going to happen via standards in an open ecosystem, what’s the most likely way in which that would have come to pass?

I still think the best path is for someone to come along and try to do it directly: https://benn.substack.com/p/metadata-money-corporation#:~:text=The%20data%20stack%20could%20use%20a%20similar%20switchboard.

Most standards have some cold start problem, where they're not valuable until they're widely used. But I think a company (or open source thing) could say, we'll do the legwork to read from data tools' APIs. If you want to know what's going on in those tools, you can either go build integrations yourself, or connect to our API. Even if we're the only customer of that tool, it's valuable. But the more people who use it, and the more people who connect into it, the more valuable it becomes.
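As a rough sketch of that switchboard idea, here is what a hub that does the integration legwork once and exposes a single lookup could look like in Python. Everything in it is hypothetical: the `Switchboard` class, the connector functions, and the record fields are stand-ins, and a real connector would call the tool's actual API rather than return hard-coded data.

```python
from typing import Callable, Dict, List

# A connector is any function that returns metadata records from one tool.
Connector = Callable[[], List[dict]]


class Switchboard:
    """Hypothetical hub: reads from each tool, exposes one lookup API."""

    def __init__(self) -> None:
        self._connectors: Dict[str, Connector] = {}

    def register(self, tool: str, connector: Connector) -> None:
        """Add one tool's connector under a name of our choosing."""
        self._connectors[tool] = connector

    def assets(self) -> List[dict]:
        """Collect records from every registered tool, tagged with their source."""
        records = []
        for tool, fetch in self._connectors.items():
            for record in fetch():
                records.append({**record, "source": tool})
        return records

    def find(self, name: str) -> List[dict]:
        """Return every record with this name, across all tools."""
        return [r for r in self.assets() if r["name"] == name]


# Stand-ins for real API calls (assumed data, for illustration only).
def fake_warehouse() -> List[dict]:
    return [{"name": "orders", "kind": "table"}]


def fake_bi_tool() -> List[dict]:
    return [
        {"name": "orders", "kind": "dashboard"},
        {"name": "churn", "kind": "dashboard"},
    ]


hub = Switchboard()
hub.register("warehouse", fake_warehouse)
hub.register("bi", fake_bi_tool)

matches = hub.find("orders")
print([m["source"] for m in matches])
```

With one connector the hub is already useful to whoever queries it, and each extra `register` call makes every existing lookup richer, which is the network effect described above.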
