3 Comments

Hey! Thanks for the post - how the heck do you keep track of all these articles?? Lol

The thing that most resonated with me is at the end of your post:

“The problem becomes neither the technology nor its application, but its lineage: its ancestry, context, reliability, fidelity, centrality. Those who work in this area learn that data is never wrong — only misunderstood.”

On this topic, I think that something data professionals get wrong, and therefore everyone who uses their data gets wrong, is the difference between your code working and your business logic making sense. As in, just because the query works doesn’t mean it’s logically right. And while this may seem super obvious, it’s the reason y’all created dbt metrics - code worked for two different people, but the logic was different.

One thing that I’m curious about in the future is more extensive dbt testing: writing tests that check the logic of interrelated models, not just whether the columns are unique/not null. Or maybe it’s a standard set of logical questions that get asked for each model - how many rows are expected? How did you get the expected number of rows? What are the relationships of the joins and how can we prove the results are intended?
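
To make those questions concrete, here's a rough sketch of the kind of check I have in mind. It is plain Python with an in-memory SQLite database, not dbt's test API, and the table names (`orders`, `customers`), the key `customer_id`, and both helper functions are made up for illustration. The idea: if a model left-joins orders to customers and is supposed to stay at one row per order, the joined row count should equal the order count, and the join key should be unique on the customers side.

```python
import sqlite3


def make_demo_db(duplicate_customer: bool) -> sqlite3.Connection:
    """Build a tiny hypothetical dataset: three orders, two customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
    customers = [(1, "a"), (2, "b")]
    if duplicate_customer:
        # A duplicated key on the right side is what causes join fan-out.
        customers.append((1, "a-dup"))
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    return conn


def check_join_preserves_grain(conn, left, right, key):
    """Compare row counts before and after a LEFT JOIN on `key`.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    left_rows = scalar(f"SELECT COUNT(*) FROM {left}")
    joined_rows = scalar(
        f"SELECT COUNT(*) FROM {left} l LEFT JOIN {right} r ON l.{key} = r.{key}"
    )
    duplicate_right_keys = scalar(
        f"SELECT COUNT(*) FROM "
        f"(SELECT {key} FROM {right} GROUP BY {key} HAVING COUNT(*) > 1)"
    )
    return {
        "left_rows": left_rows,
        "joined_rows": joined_rows,
        "duplicate_right_keys": duplicate_right_keys,
        "ok": left_rows == joined_rows,
    }


if __name__ == "__main__":
    for dup in (False, True):
        result = check_join_preserves_grain(
            make_demo_db(dup), "orders", "customers", "customer_id"
        )
        print("duplicate customer:", dup, result)
```

The query runs fine in both cases, which is kind of the point: only the row-count comparison shows that the second dataset quietly turned three orders into five rows.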

Overall, when comparing an analytics engineer to a software engineer, the AE’s test coverage seems super... bare.

Anyways, thanks again for the post <3

author

Appreciate the comment! Totally agree, the tests that dbt ships with are fairly straightforward. Have you checked out dbt-expectations? It's a dbt package that provides a *lot* more types of tests.

https://github.com/calogica/dbt-expectations

It's pretty awesome and very easy to install via dbt's package manager.


I haven’t! Checking it out now - appreciate it
