HTAP Databases

Feb 13, 2022

Do we actually need so many different databases? Or can we shove them all into a single cloud infrastructure and behind the same SQL API?


1 Comment

Shalabh Chaturvedi

Mar 13, 2022 (edited)

> we can combine these disparate systems and unify them under a common interface for consumers

I strongly agree with this perspective. Any query from a consumer starts as a question in their mind, expressed in the "language of the business" rather than the language of databases. They certainly don't start by thinking "ah, I'd like to query our OLAP database X and write some SQL for table T1 and view V2". It's more like "I wonder which weekday we produce the largest number of widgets in the US". That idea is then translated by hand into a kind of high-level query plan, i.e. which database and tables to query. This translation from business semantics to databases and tables is done manually, but the high-level query itself is never recorded!

I think the key idea that's kind of floating around is to express the query as two distinct parts:

1. the topology-independent query - does not refer to databases, tables, or views, just to some high-level semantic model

2. an elaboration expression that maps query 1 above to an actual query on topological objects
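A rough sketch of what that split could look like, in Python. Everything here is made up for illustration: the `SemanticQuery` and `Elaboration` classes, the table `olap.production_facts`, and its column names are assumptions, not any real product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticQuery:
    """Part 1: the topology-independent question, in business terms only."""
    measure: str                              # e.g. "widgets_produced"
    group_by: str                             # e.g. "weekday"
    filters: tuple = ()                       # pairs of (business term, value)


@dataclass(frozen=True)
class Elaboration:
    """Part 2: maps business terms onto one concrete (hypothetical) table."""
    table: str
    columns: dict                             # business term -> SQL expression

    def to_sql(self, q: SemanticQuery) -> str:
        dim = self.columns[q.group_by]
        measure = self.columns[q.measure]
        sql = f"SELECT {dim} AS {q.group_by}, {measure} AS {q.measure} FROM {self.table}"
        # Values are inlined only to keep the sketch short; real code would
        # use bound parameters instead of string formatting.
        where = " AND ".join(f"{self.columns[k]} = '{v}'" for k, v in q.filters)
        if where:
            sql += f" WHERE {where}"
        return sql + f" GROUP BY {dim} ORDER BY {q.measure} DESC"


# "Which weekday do we produce the largest number of widgets in the US?"
question = SemanticQuery("widgets_produced", "weekday", (("country", "US"),))

# One possible elaboration; swapping it out leaves `question` untouched.
olap = Elaboration(
    table="olap.production_facts",
    columns={
        "widgets_produced": "SUM(widget_count)",
        "weekday": "day_of_week",
        "country": "country_code",
    },
)

sql = olap.to_sql(question)
print(sql)
```

The point of the sketch is only that `question` can be stored and reused on its own, while the table and column names live entirely in the elaboration.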

If the system maintains some semantic metadata, #2 doesn't have to be specified entirely by hand but can be guided. For example, given a query of type 1, the system can show the various db/table/view options for executing it, and the person can pick the one they think makes sense. The main advantage, I think, is decoupling expression #1 from your implementation details. If you add a new db later, or even normalize/denormalize your existing database, you only modify expression 2, while expression 1 stays intact. Doesn't it seem better to separate the business-language expression from the topological-implementation expression?
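To make the "guided" part concrete, here is a minimal sketch in Python, assuming the semantic metadata is nothing more than a catalog recording which business terms each table or view exposes. The catalog contents and the `candidates` helper are hypothetical.

```python
# Hypothetical semantic metadata: physical object -> business terms it exposes.
CATALOG = {
    "olap.production_facts": {"widgets_produced", "weekday", "country"},
    "oltp.work_orders": {"widgets_produced", "weekday", "country", "operator"},
    "olap.sales_facts": {"revenue", "weekday", "country"},
}


def candidates(terms, catalog):
    """Return, sorted by name, the objects that cover every term in the question."""
    needed = set(terms)
    return sorted(name for name, exposed in catalog.items() if needed <= exposed)


# Terms used by "which weekday do we produce the most widgets in the US?"
question_terms = ["widgets_produced", "weekday", "country"]

# The system offers these options; a person picks the one that makes sense.
options = candidates(question_terms, CATALOG)
print(options)

# Adding a new database later changes only the metadata, not the question.
catalog_v2 = {
    **CATALOG,
    "lake.production_daily": {"widgets_produced", "weekday", "country"},
}
options_v2 = candidates(question_terms, catalog_v2)
print(options_v2)
```

Here `question_terms` is identical in both calls; only the catalog grew, which is the decoupling described above.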

I'm not sure this means there is only one database. However, I think it implies there is one high-level, topology-independent data model and query language, and multiple implementation databases that it can be mapped to.

Anyway, this is all "in theory", but quite interesting IMO. I often feel most of our logic is coupled to too many implementation details, while it should only be coupled to a higher level semantic model.


The Analytics Engineering Roundup
