I briefly scanned your GitHub repo, and it seems to me that you adopted the KG but still use SQL queries as the output, unlike the original paper, which uses a KG and produces SPARQL as output. I suppose the key is the KG, not so much whether SQL or SPARQL is used, right?

That's exactly right - we produced SQL, which is how the dbt Semantic Layer interacts with data platforms. The great thing about this benchmark is that it's agnostic to the querying language - it's validating correctness of results. Our findings confirmed the original paper's: layering semantic information into the natural language query yields superior results.

Interestinggggggg....so if I'm following this right, there are three prompts between y'all's tests and Juan et al's paper:

1. "Answer my question, and here's some DDL that defines a (relatively simple) schema. Write a SQL query. Good luck."

2. "Answer my question, and here's some knowledge graph ontology in OWL. Write me some SPARQL."

3. "Answer my question. First, here's some details about how to write a metric query using this SQL-like thing from dbt. Second, here are the current metrics and their associated dimensions available in this dbt project. Write a query in that SQL-like thing I just taught you."
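To make the three prompt shapes concrete, here's a rough sketch; the schema, ontology snippet, and metric names are all invented for illustration, not taken from the actual benchmark:

```python
# Three prompt shapes from the comparison above. All names are made up.

DDL = "CREATE TABLE claims (id INT, filed_by TEXT, amount NUMERIC, filed_at DATE);"
ONTOLOGY = "ins:Claim a owl:Class . ins:filedBy a owl:ObjectProperty ."
METRICS = "metrics: total_claim_amount (dims: filed_by, filed_at__month)"

def prompt_v1(question: str) -> str:
    """Approach 1: raw DDL in, SQL out."""
    return f"Schema:\n{DDL}\n\nQuestion: {question}\nWrite a SQL query."

def prompt_v2(question: str) -> str:
    """Approach 2: OWL ontology in, SPARQL out."""
    return f"Ontology:\n{ONTOLOGY}\n\nQuestion: {question}\nWrite a SPARQL query."

def prompt_v3(question: str) -> str:
    """Approach 3: teach the metric-query syntax, list available metrics,
    then ask for a query in that syntax."""
    return (
        "Here is how to write a dbt Semantic Layer metric query: <syntax docs>\n"
        f"Available metrics and dimensions:\n{METRICS}\n\n"
        f"Question: {question}\nWrite a metric query."
    )
```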

And the results are...

- Approach 1 is trash. Don't do approach 1.

- Approach 2 is like, pretty good? Not great on the hard stuff, but better?

- Approach 3 is also pretty good? Though somewhat limited, in that it can only answer questions that are metric-ified (i.e., it's not going to work for a question like, "List all the claims that were filed by Peyton Manning").

Is that...right? There are so many different approaches to how to do this; I do wonder if we eventually settle on some standard for the right series of steps to go through here...

Yes, that's basically correct! We directly replicated the SQL control. SQL and SPARQL are already known concepts for GPT-4, so they get the tidiness of zero-shot prompts. Its current knowledge cutoff is April 2023, so it just missed the Transform acquisition and the new docs. Fingers crossed that a future variant will be able to generate SL queries without an exemplar, but I don't think it's unfair to use a one-shot prompt in the meantime.
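For what it's worth, a one-shot prompt here just means prepending a single worked exemplar before the real question. A minimal sketch; the exemplar query below is a hypothetical stand-in and not guaranteed to match the Semantic Layer's actual query syntax:

```python
# Minimal one-shot prompt: one worked exemplar, then the real question.
# The exemplar query is illustrative, not verified Semantic Layer syntax.

EXEMPLAR_QUESTION = "What was total revenue by month?"
EXEMPLAR_QUERY = (
    "select * from {{ semantic_layer.query(metrics=['revenue'], "
    "group_by=['metric_time__month']) }}"
)

def one_shot_prompt(question: str) -> str:
    return (
        "You write dbt Semantic Layer queries.\n\n"
        f"Example question: {EXEMPLAR_QUESTION}\n"
        f"Example query: {EXEMPLAR_QUERY}\n\n"
        f"Question: {question}\n"
        "Query:"
    )
```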

"only answer questions that are metric-ified" - my understanding is that MF will soon also be able to do the "Low question complexity" questions (i.e. don't do an aggregation, just return some columns), as long as there is a semantic model defined on the claims model (which is a prereq for adding a metric, so if you're building metrics you should also get "give me some arbitrary columns" for free).

"There are so many different approaches to how to do this" - where "this" is "implement the magic box between 'natural language request in' and 'table of results out for comparison'?"

Yeah, agreed on the one-shot thing for the semantic layer now. That part seems fine to me; it just adds complexity to the text, but doesn't make it "unfair" or anything to me.

The metric-ified thing makes sense to me too, though I guess it does raise a mostly unrelated question for me about how y'all are thinking about the semantic layer. Is the idea to model everything (within reason), or just to do what's necessary for building the models for the metrics you need? There's a kind of subtle difference there, but I realize I've thought about what y'all are building as the latter, and things like knowledge graphs as the former. I'm not sure why that is, but there's something slightly unexpected to me about thinking about dbt's semantic layer as something where you model the whole schema. But maybe that's a me thing.

And yeah, the "this" to me is all of the various approaches that are the middle box between English question and table of numbers. Just a few of the options:

- Feed a bunch of DDL into the prompt and ask it to write SQL.

- Feed a bunch of looser context (other queries, docs, etc) into the prompt and ask it to write SQL.

- Feed a semantic model into the prompt and ask it to write SQL.

- Feed a semantic model into the prompt and ask it to write a semantic layer/KG query.

- Feed it documents and have it tell you how to configure something like Tableau.
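One way to see the common shape across these options: each one just varies the context source and the target query language, with the same "context + question in, query out" skeleton. A toy sketch (all strings invented):

```python
# The five options share one shape: pick a context source, pick a target
# language, hand both to the model. Everything here is illustrative.

STRATEGIES = {
    "ddl_to_sql":       ("database DDL",          "SQL"),
    "loose_ctx_to_sql": ("docs and past queries", "SQL"),
    "semantic_to_sql":  ("semantic model",        "SQL"),
    "semantic_to_sl":   ("semantic model",        "semantic layer / KG query"),
    "docs_to_config":   ("tool documentation",    "BI tool configuration"),
}

def build_prompt(strategy: str, context: str, question: str) -> str:
    source, target = STRATEGIES[strategy]
    return (
        f"Context ({source}):\n{context}\n\n"
        f"Question: {question}\n"
        f"Respond with a {target} query."
    )
```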

It's just a reminder to me that we're very very early in trying to make all this stuff work.

Hey Benn! We've been thinking more along the lines of: just do what's necessary for building the models for the metrics and dimensions you need.

This approach lets you expose metric definitions to stakeholders, along with all the dimensions they need for analysis, without putting an undue configuration burden on the data team, which is actually setting up the semantic layer. As a byproduct, you do generally get pretty good schema coverage, at least for the most-used dimensions and metrics. This dimension/metric set lets you answer the majority of business questions. I think there is a long tail of questions that the semantic layer won't be able to answer, since we don't have the full schema, and that's OK!

Gotcha, that makes sense.

I do wonder, though, how that ultimately plays out in practice. One of the biggest double-edged swords about dbt, to me, is that you can build it iteratively. That's great for getting started, but (at least from what I've seen) it leads people to a lot of architectural quirks, where the overall system/DAG develops some strange shapes. It becomes meandering, like London, as opposed to centrally planned and organized, like DC. I could see that happening here, as you layer metrics on top of each other, always considering what you need in the moment and worrying less about where it's all headed.

Not that that's a bad thing, necessarily (and it can be a very good thing). But it does seem like a thing.

I like the city analogy. To stretch it:

Highways form a backbone: planning and predictions around the most common, frequent, valuable journeys, i.e. 'formal' data modelling. dbt/data projects don't do the planning part because they operate like London, great fires and all. What London got was a retrofitted solution (the Tube). I presume there are more Londons than DCs in data.

The important bit about the Tube is the Tube map. Super simplified, giving you guidance on how to get from Camden to Brixton. Once you are there, it is noisy, literally, as you step out of the Tube, but at least you're in the right part of the city.

Most of the data thinking is from the context of already being in Brixton. Which route to get to the destination, how quickly, walk or bus etc.

My feeling is that LLMs need a tube map. Slanting slightly more towards a human level understanding of the landscape. If 'they' are smarter than a BI tool, why constrain them to a BI interface. They need orientation and a simplistic way to get between key parts of the city.

This gets interesting when thinking of the modality. LLM systems won't only query, but action, update, communicate, ask, refer. Google Maps provides the ability to use the tube map or avoid it. As it stands, it is possibly a pinnacle of human/data/reality interface design.

That's an interesting comparison. Ignoring the LLM for a minute, I wonder if there are ways to build the tube through a London-style dbt project. Like, if you end up with that kind of organically created structure (which is really hard to avoid), how much can you put another layer on top of it?

The semantic layer needs a semantic layer, I guess.

Questions posed to all:

1) Along the lines of what Benn wrote about multiple options:

does it make sense to include another option using RAG?

That is, convert all the structured data back into unstructured text and use it as a vector database.

2) And can we stack these options?

Meaning to say, combine the options that Benn listed for better performance?

Performance in terms of speed and accuracy.

This is how most serious efforts at this work, I think. They're basically:

1. Build a vector DB of all the context you can collect (metadata, semantic model, etc)

2. Map the English question to that vector DB.

3. Use that to find which pieces of context are likely to be most relevant.

4. Prompt the LLM with the question and that context.

5. Ask it to write a SQL query, or semantic query, or whatever.

I'm sure all of them have different flavors there, but I think most models follow roughly that pattern.
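The retrieval steps (1-3) can be sketched in a few lines. A real system would use a vector database and learned embeddings; the bag-of-words cosine similarity below is a toy stand-in, and all the context strings are invented:

```python
import math
from collections import Counter

# Toy retrieval for the pipeline above: rank candidate context snippets
# by similarity to the question, keep the top k for the prompt.

def embed(text: str) -> Counter:
    """Stand-in 'embedding': a bag-of-words token count."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_context(question: str, contexts: list[str], k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(contexts, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

contexts = [
    "metric total claims with dimensions person and month",
    "table customers with columns id name region",
    "doc how to configure the warehouse connection",
]
best = top_context("how many claims did each person file?", contexts)
# `best` now holds the snippets most relevant to the question,
# ready to be pasted into the LLM prompt (steps 4-5).
```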

Thank you for responding, Benn.

I have subscribed to your newsletter for a while. I enjoy reading it.

By the way, just to share, I came across dataherald.ai, which focuses on natural language to SQL (nl2sql), working on structured data alone.

I asked a similar question in their discord and cited this article.

I'm sharing it here.

> I am a web developer so I am not very well versed, but here's my understanding of some of the prominent approaches:


> 1. convert structured data back into unstructured and then into a vector database so as to apply RAG

> 2. use nl2sql tools like dataherald for better context (similar to RAG with embeddings from unstructured data)

> 3. create a knowledge graph out of ontology describing domain and a mapping between the schema and the ontology (see https://roundup.getdbt.com/p/semantic-layer-as-the-data-interface) and then generate the SQL/SPARQL using the KG

> 4. i also see that dataherald also has "Golden Record Management" Technique https://medium.com/dataherald/how-to-connect-your-db-to-an-nl-interface-a-step-by-step-guide-7a9e9621bf1d


> for max performance (accuracy and speed), theoretically there's nothing to stop me from applying all of the techniques together, right?

There's one person who responded, but no one from the dataherald staff.

> I don't think the 1st step is necessary. nl2sql is merely a query engine layer for a structured database, so you can keep the data unchanged, and by using dataherald, you should be able to query the db in NL.

Based on what Benn wrote, I'm thinking of it this way; Excalidraw diagram link here: https://link.excalidraw.com/readonly/9p2IW29wvg0Klrt3EXzR

For sure, and I appreciate that. Apologies for harassing you with dumb ideas every Friday...

And I think this is roughly the state of the art today, though there are so many of these chatbots now, I'm sure lots of people are trying different things. So, I could see some pattern emerge that works way better than others, and everybody moving towards that. But, the main thing people seem to be learning so far is that it's not easy to do. Whatever system you build, there will be lots of moving parts.
