
We did your Option #2 model at Uber a few years ago and it was definitely helpful.

Here was the napkin-architecture:

The query-and-dashboard tool (Querybuild/Dashbuilder) talked to the data catalog (Databook) and pulled in metadata about the tables you were working with (schema, column types, etc.), and let you easily open Databook itself to see more about them.
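To make that concrete, here is a minimal sketch of the editor-to-catalog hop; the endpoint path and response shape are made up for illustration, not Databook's actual API:

```python
# Hypothetical sketch: the query editor asks the data catalog for schema
# metadata about a table so it can render column names/types next to the editor.
# The URL, path, and JSON shape below are invented for illustration.
from dataclasses import dataclass

import requests


@dataclass
class ColumnInfo:
    name: str
    dtype: str


def fetch_table_metadata(catalog_url: str, table: str) -> list[ColumnInfo]:
    """Pull schema and column-type metadata for one table from the catalog."""
    resp = requests.get(f"{catalog_url}/api/tables/{table}", timeout=5)
    resp.raise_for_status()
    return [ColumnInfo(c["name"], c["type"]) for c in resp.json()["columns"]]


if __name__ == "__main__":
    for col in fetch_table_metadata("https://databook.example.internal", "warehouse.fact_trips"):
        print(f"{col.name}: {col.dtype}")
```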

The data quality testing harness (Trust) would publish test pass/fail statuses into Databook. So when you were searching for or reading about the tables you planned to use in your query, you could see how many tests each one had and whether they were passing or failing. From the lineage graph you could see whether any parent tables were failing, all the way up to the original Kafka topics / MySQL snapshots.
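A rough sketch of that lineage check, with invented data structures standing in for Trust's statuses and Databook's lineage graph:

```python
# Toy sketch (not Uber's actual Trust/Databook schemas): walk the lineage graph
# upstream from a table and report any ancestor with failing tests, all the way
# back to the source Kafka topics / MySQL snapshots.
from collections import deque

# table -> list of parent tables (edges point upstream)
LINEAGE = {
    "dashboard.trips_daily": ["warehouse.fact_trips"],
    "warehouse.fact_trips": ["raw.kafka_trip_events", "raw.mysql_trips_snapshot"],
}

# table -> pass/fail counts as published by the test harness
TEST_STATUS = {
    "warehouse.fact_trips": {"passing": 12, "failing": 0},
    "raw.kafka_trip_events": {"passing": 3, "failing": 1},
}


def failing_ancestors(table: str) -> list[str]:
    """Return every upstream table with at least one failing test."""
    failing, seen, queue = [], {table}, deque(LINEAGE.get(table, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue
        seen.add(parent)
        if TEST_STATUS.get(parent, {}).get("failing", 0) > 0:
            failing.append(parent)
        queue.extend(LINEAGE.get(parent, []))
    return failing


print(failing_ancestors("dashboard.trips_daily"))  # ['raw.kafka_trip_events']
```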

This way you could create tests in the test harness, see test statuses in the catalog, and expose a very summarized "healthy / not healthy" type indicator right in the query editor. We never got the warnings pushed into the dashboards, at least while I was on the team, but it was an obvious next step.
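The rollup itself can be very small; here is a sketch of the kind of indicator the editor could show (the labels and function signature are illustrative, not what we actually shipped):

```python
# Collapse detailed test results into a single editor-friendly health badge,
# assuming the catalog exposes the table's own failing-test count plus the
# list of failing upstream tables from the lineage check above.
def health_badge(own_failing: int, failing_upstream: list[str]) -> str:
    """Summarize test state as one healthy / not-healthy style indicator."""
    if own_failing > 0:
        return "UNHEALTHY: this table has failing tests"
    if failing_upstream:
        return f"WARNING: upstream failures in {', '.join(failing_upstream)}"
    return "HEALTHY: all tests passing on this table and upstream"


print(health_badge(0, ["raw.kafka_trip_events"]))
```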

We also cloned the existing incident management tool (Commander), but the things you need to do during incident management for a data pipeline problem turn out not to be wildly different from any other incident, so it wasn't a great use of effort. The main leverage for "Data Commander" came from using our lineage graph + query logs to automate communications to downstream affected users, without the incident responder needing to manually figure out who they were.
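That leverage point is basically a downstream graph walk plus a join against query logs; here is a toy sketch with invented structures, not the real Commander/Databook internals:

```python
# Given an incident on one table, walk the lineage graph *downstream* and join
# against recent query logs to find who actually reads the affected tables.
# All data structures here are invented for illustration.
from collections import deque

# table -> list of child (downstream) tables
DOWNSTREAM = {
    "warehouse.fact_trips": ["dashboard.trips_daily", "ml.features_trips"],
    "dashboard.trips_daily": [],
    "ml.features_trips": [],
}

# recent query-log entries: (user, table queried)
QUERY_LOG = [
    ("analyst_a@example.com", "dashboard.trips_daily"),
    ("ds_b@example.com", "ml.features_trips"),
    ("analyst_a@example.com", "warehouse.fact_trips"),
]


def affected_users(incident_table: str) -> set[str]:
    """Everyone who recently queried the incident table or anything downstream of it."""
    impacted, queue = {incident_table}, deque(DOWNSTREAM.get(incident_table, []))
    while queue:
        table = queue.popleft()
        if table in impacted:
            continue
        impacted.add(table)
        queue.extend(DOWNSTREAM.get(table, []))
    return {user for user, table in QUERY_LOG if table in impacted}


print(sorted(affected_users("warehouse.fact_trips")))
```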


Related to your questions about data quality warnings:

I would say your "option 2" kind of already exists, e.g. in Tableau as data quality warnings:

https://help.tableau.com/current/online/en-us/dm_dqw.htm

I agree with you that it should exist at the point in the stack where it directly reaches the users. So there is no final answer; it depends on each company's specific setup and on how its users interact with data.

E.g. in a self-service environment based on Tableau, I think the Tableau solution is pretty good. If there are more users closer to the SQL world, maybe even decentralized users, then having similar data quality warning mechanisms available in a SQL editor like dbt Cloud could be a nice feature, too. Similar to Tableau, I could imagine options to either set such warnings manually or have them created automatically, e.g. based on rules for how to react to certain test results.
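As a sketch of that rule-based idea (the thresholds and warning labels are made up; dbt's warn/error test severities are real, but the warning would ultimately be surfaced through whatever mechanism the BI tool or editor offers):

```python
# Map fresh test results to a data quality warning that a BI tool or SQL editor
# could display. The rule and the labels are invented for illustration.
from dataclasses import dataclass


@dataclass
class TestResult:
    name: str
    passed: bool
    severity: str  # "error" or "warn", as in dbt test configs


def derive_warning(results: list[TestResult]) -> str | None:
    """Return a warning label for the asset, or None if everything passed."""
    failed = [r for r in results if not r.passed]
    if any(r.severity == "error" for r in failed):
        return "Data quality warning: critical tests failing, do not use"
    if failed:
        return "Caution: some non-blocking tests failing"
    return None


results = [
    TestResult("not_null_order_id", passed=True, severity="error"),
    TestResult("accepted_values_status", passed=False, severity="warn"),
]
print(derive_warning(results))  # Caution: some non-blocking tests failing
```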
