The context engineering playbook (Claire Gouze)
nao co-founder and CEO Claire Gouze shares a practical playbook for building a context layer your agents can rely on.
Context is everything in data right now. Everyone is looking for the same thing: a single place where you can ask a natural-language question and get back a reliable answer. With that, you can build conversational analytics, but you can also build essentially any agent that needs to connect to your organization’s data. As long as you have context.
Claire Gouze is the co-founder and CEO of nao Labs, the company behind nao, an open-source analytics agent built for context engineering, which she started with Christophe Blefari. Her path into data was unconventional: a business school graduate who taught herself to code, she became one of the first business school hires at BCG Gamma, then ran data at sunday, a QR-code payments startup, where she built a data stack from scratch as the company grew from 20 to 300 people. nao came out of calling 80 different data teams and listening to what was slowing them down.
Lots of people are talking about building context layers and hiring context engineers. Claire and her team are doing it: they’ve authored a context engineering playbook with specific guidance on how to build your own context layer and how to create evals. They’ve built a community learning together in the open, and they’ve built tooling to make it easier. What I appreciated most about this conversation was its pragmatism. Rather than talking about context, we should all get to work engineering it. Claire’s playbook is a great place to start.
Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.
Three ideas from the episode
1. Context engineering is the new analytics engineering. The job is the same as it always was: gathering tacit business knowledge and turning it into something structured and trustworthy. The medium is new: markdown and files instead of only models. Claire already knows data people who have been renamed context engineers.
2. The biggest reliability gains are unglamorous. Fancy context sources don’t move the needle as much as you’d hope. Cleaning up her data model and writing good documentation is what took Claire’s agent from 40% to 90% reliability. Anthropic found the same thing: query logs added little; keeping your house in order added a lot.
3. We’re in the “just plug it into production” era of agents. Connecting an agent straight to every raw source is the 2010s mistake of plugging your BI tool into the production database, repeated. Context will need its own stack: a way to ingest it, transform it, resolve contradictions, and expose a single source of truth.
Key takeaways
Lightly edited for clarity.
Tristan Handy: How did you get into data?
Claire Gouze: My background is unconventional for data. I graduated from business school about 10 years ago, but I wanted to learn technical things, so I joined BCG Gamma, the data science arm of BCG. I was the first business school hire there. I spent three years building ML models for clients: forecasting, personalization, optimization.
Then I joined a startup, sunday, doing QR-code payments for restaurants, because I wanted to build my own company one day. They had nothing, just their production database plugged into Metabase. It was 20 people when I joined and 300 a year later, so we had to move fast. I came from consulting, where you build everything custom, so I built the ETL myself: an ingestion pipeline in Python for Salesforce data and a transformation layer, also in Python. Then people told me there were tools called dbt and Airbyte, so I had to migrate all my custom work onto the standard stack. That was a hard lesson, and a useful one.
Why did you move from a “cursor for data” to a context layer?
When we started two years ago, the loudest pain point from the 80 teams we called was that it takes too long to ship dbt models. So our first product was a cursor for data: your IDE plugged into your data, with the agent holding all the context. But as MCP took off, Cursor and Claude were handling that well. The new thing data teams wanted was a way to let anyone at the company use agents on the data.
We just want to work on what excites people. Data people aren’t excited about how they code. They’re excited when they help business users and get valued for it. Data teams carry the trauma of being seen as a support team. If we can help them be valued by the business, that’s the greatest thing we can do for them.
Is there actually a product to build in context, or is it just best practices?
The main value we add is evaluation and governance. Every team I talk to puts something different in their context: some have full documentation, some have a semantic layer, some have almost nothing on their tables. But they all use the framework to test the reliability of the agent and to keep testing it over time in CI/CD. They also study the conversations users have with the agent, which shows them what people care about and where to improve.
We try not to overcomplicate it. Some people want ontologies and semantics. We want you to start simple. The context layer is a file system. We help you build it like a GitHub repo that belongs to you and doesn’t lock you in, and we add evaluation, permissions, and a UI on top.
What does the context engineering playbook involve?
It’s more about method than about exactly what to put in your context, because that differs for a startup with 10 tables and an enterprise with thousands. Start focused. Pick the team that asks for the most analytics, or your main company metrics, and reduce the scope to maybe 10 or 20 tables. Plug in what you already have, usually your dbt docs, and run your tests. That gives you a baseline reliability number. Then you iterate: see where the agent fails, redesign part of the data model, add documentation, profile a table. It’s an iterative loop.
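As an editorial illustration (not part of the conversation), the baseline-then-iterate loop can be sketched in a few lines of Python. This is a minimal sketch, not nao's framework: `EvalCase`, `run_eval`, and the `ask_agent` callable are hypothetical names, the stand-in agent returns canned answers, and correctness is judged by a plain equality check against the answer from a trusted query.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    expected: object  # answer produced by a trusted, hand-written query


def run_eval(cases: list[EvalCase], ask_agent: Callable[[str], object]):
    """Return (reliability, failures) for one version of the context."""
    failures = [c for c in cases if ask_agent(c.question) != c.expected]
    reliability = 1 - len(failures) / len(cases)
    return reliability, failures


# Stand-in agent for illustration only: it gets one question wrong.
def fake_agent(question: str) -> object:
    canned = {"How many orders last week?": 120, "What is weekly revenue?": 999}
    return canned.get(question)


cases = [
    EvalCase("How many orders last week?", 120),
    EvalCase("What is weekly revenue?", 4500),
]
baseline, failed = run_eval(cases, fake_agent)
print(f"baseline reliability: {baseline:.0%}")
for case in failed:
    print("needs work:", case.question)
```

The list of failed questions is the useful output here: it points at the tables or definitions to remodel or document before the next run.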
Where do the eval questions come from?
Either you already know your most important questions and have the queries somewhere in your BI tool, so you collect those, or you use a skill we built that looks at the main metrics of your tables and suggests key questions to test. I still recommend you review them, but it gives you a first basis. Then you can say: on my 50 most important questions, I have 90% accuracy, and it’s going to stay that way. That number is what reassures a data team enough to roll the agent out.
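One way to make "it’s going to stay that way" enforceable is a threshold check that fails the CI job when reliability drops. The sketch below is an assumption about how such a gate could look, not a description of nao's tooling; `check_reliability` and the `results` mapping (question to pass/fail) are hypothetical.

```python
def check_reliability(results: dict[str, bool], threshold: float = 0.9) -> float:
    """Raise if the share of passing questions falls below the threshold."""
    if not results:
        raise ValueError("no eval questions were run")
    score = sum(results.values()) / len(results)
    if score < threshold:
        failed = sorted(q for q, ok in results.items() if not ok)
        raise AssertionError(
            f"reliability {score:.0%} is below {threshold:.0%}; failed: {failed}"
        )
    return score


# Illustrative data only: nine of ten questions pass.
results = {f"q{i}": True for i in range(1, 10)}
results["q10"] = False
score = check_reliability(results)
print(f"{score:.0%} of questions passed")
```

Because an uncaught exception exits with a non-zero status, running a script like this as a CI step would block a merge that degrades the agent on the agreed question set.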
Is “context engineer” a real job title?
Yes, I know data people who were renamed context engineers, so it’s already happening. Data teams are the perfect fit. Analytics engineering was about gathering business knowledge from stakeholders and translating it into something structured and technical. Context engineering is exactly that. Context is just company knowledge that you want structured, optimized so it doesn’t explode your token cost, and treated as a source of truth, the same way you’d want a metric source of truth. Data teams already think this way.
What context actually moves the needle on reliability?
It’s funny, the biggest jump in agent reliability is just your data modeling and your data docs. I ran the experiment: I started from no context and added sources step by step, measuring reliability each time. Profiling, query history, those kinds of things left me stuck around 40%. The agent was failing because of ambiguity between two columns, or a metric that differed slightly across two tables. When I redid parts of the data model and wrote documentation, I got to about 90%. It’s deep work to keep a data model clean and unambiguous, but it pays off with agents.
Where should human context live?
In our framework everything ends up as a markdown file eventually, so the starting format doesn’t matter much. What matters is where it gets maintained. If someone asks whether to document something in the agent’s context or in the dbt docs, I say the dbt docs, because it has to live as close to your daily work as possible. When you change a dbt model, you change the docs, and it syncs to the agent. If your support processes live in Notion, keep them in Notion. A separate markdown file you never touch again is worthless.
We’re in the “just plug it into production” moment. What comes next?
We’re at the phase where people say, let me just connect Claude Code to my Snowflake MCP, what could go wrong? It’s the same as when you plugged your BI tool straight into the production database, before the data stack gave us tools to ingest, transform, and create a source of truth. We’ll need the same thing for context. Right now we’re building the first wave, but we’ll start to see context sprawl and contradictions. So we’ll need a stack to ingest context, transform it, merge old and new and contradictory context, and expose a single source of truth to the agent. Where are the tools to do that? I don’t know yet.
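Since the tooling does not exist yet, the following is only a thought experiment about the "merge contradictory context" step. It assumes context entries are simple records with hypothetical `term`, `definition`, and `source` fields, keeps definitions that all sources agree on, and surfaces conflicting ones for a human to resolve instead of silently picking a winner.

```python
from collections import defaultdict


def merge_context(entries: list[dict]) -> tuple[dict[str, str], dict[str, list[dict]]]:
    """Group entries by term; keep agreed definitions, surface conflicts."""
    by_term: dict[str, list[dict]] = defaultdict(list)
    for entry in entries:
        by_term[entry["term"]].append(entry)

    resolved, conflicts = {}, {}
    for term, group in by_term.items():
        definitions = {e["definition"] for e in group}
        if len(definitions) == 1:
            resolved[term] = definitions.pop()
        else:
            conflicts[term] = group  # needs a human decision
    return resolved, conflicts


# Made-up entries for illustration.
entries = [
    {"term": "active_user", "definition": "logged in within 30 days", "source": "dbt_docs"},
    {"term": "active_user", "definition": "made a purchase within 30 days", "source": "notion"},
    {"term": "churn", "definition": "no order in 90 days", "source": "dbt_docs"},
]
resolved, conflicts = merge_context(entries)
print("resolved:", resolved)
print("conflicts:", sorted(conflicts))
```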
How does this connect to memory?
When an agent corrects itself after eight queries to find the right field, you don’t want it to repeat that next time; you want the fix in its context. The same goes when a user tells the agent, no, this is the real definition. We should learn from all of it. It’s a memory mechanism. But the tricky part is making sure you learn the right memory. Locally with Claude Code it’s just you and the agent, so it can learn whatever you tell it, even if it’s wrong. At the company level you have to make sure people don’t teach the agent wrong things, so the data team still approves what enters the global memory of the company.
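The approval step can be pictured as a two-stage store: anyone can propose a memory, but only reviewed entries reach the shared context. This is a minimal sketch under that assumption; `MemoryStore` and its methods are hypothetical and do not describe how nao implements memory.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    approved: dict[str, str] = field(default_factory=dict)
    pending: list[tuple[str, str, str]] = field(default_factory=list)

    def propose(self, key: str, value: str, author: str) -> None:
        """Anyone (user or agent) can suggest a memory; it is not used yet."""
        self.pending.append((key, value, author))

    def approve(self, index: int, reviewer: str, data_team: set[str]) -> None:
        """Only data team members can promote a suggestion to shared context."""
        if reviewer not in data_team:
            raise PermissionError(f"{reviewer} cannot approve shared memories")
        key, value, _author = self.pending.pop(index)
        self.approved[key] = value

    def context(self) -> dict[str, str]:
        """What the agent actually sees: approved entries only."""
        return dict(self.approved)


store = MemoryStore()
store.propose("active_user", "logged in within the last 30 days", author="sales_user")
print(store.context())  # still empty: nothing has been reviewed
store.approve(0, reviewer="ana", data_team={"ana"})
print(store.context())
```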
Where does MetricFlow fit if “file systems are all you need”?
I tested the skill you all built for querying through the MetricFlow semantic layer. The logic was: query through MetricFlow first, and if you don’t find it, read the dbt docs and write regular SQL. I think that’s exactly right. The metric layer is governance for your most critical, high-value metrics that have to be 100% accurate, but you shouldn’t have to define a metric before you can do anything at all.
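The "metric layer first, SQL as fallback" logic is easy to express as a small router. The sketch below does not call MetricFlow or any real API: `answer`, `query_metric_layer`, and `write_sql_from_docs` are hypothetical callables, and the stand-ins exist only to show the control flow.

```python
from typing import Callable, Optional


def answer(
    question: str,
    query_metric_layer: Callable[[str], Optional[str]],
    write_sql_from_docs: Callable[[str], str],
) -> tuple[str, str]:
    """Try the governed metric layer first; fall back to docs plus SQL."""
    governed = query_metric_layer(question)
    if governed is not None:
        return "metric_layer", governed
    return "sql_from_docs", write_sql_from_docs(question)


# Stand-ins for illustration only.
DEFINED_METRICS = {"revenue": "metric query for revenue"}


def fake_metric_layer(question: str) -> Optional[str]:
    for name, query in DEFINED_METRICS.items():
        if name in question.lower():
            return query
    return None  # no governed metric covers this question


def fake_sql_writer(question: str) -> str:
    return f"-- SQL drafted from dbt docs for: {question}"


print(answer("What was revenue last month?", fake_metric_layer, fake_sql_writer))
print(answer("How many tables changed?", fake_metric_layer, fake_sql_writer))
```

Returning the route alongside the result lets the caller tell a governed answer apart from an ad hoc one, which matters when only the former is expected to be 100% accurate.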
Why open source, and how do you make money on it?
Open source makes sense for a few reasons. You want to be used by agents, not just humans, and if your code is open, an agent building an analytics agent already knows about nao and will build with our framework, which is great distribution. And nobody has the answers on what makes good context yet, so we need to learn together. If startups and enterprises can share what context worked and how the semantic layer affected their reliability, we get a common language and improve reliability across the board. We already know what we sell: our open-source product gives everyone the same data access, which works for a small company but not a big one. The enterprise license handles data permissions, context permissions, and token budgets at scale.
Chapters
Timestamps are approximate.
00:00 — The missing piece is context
01:45 — Welcome, and a few words on the French accent
02:52 — Claire’s path into data: business school to BCG Gamma
04:09 — Joining sunday and building a data stack from scratch
06:39 — Founding nao, an open-source analytics agent
10:42 — The journey so far: 1,300 GitHub stars, 80 companies in production
13:45 — Why pivot from “cursor for data” to the context layer
15:22 — Context layer hype at Snowflake Summit
17:48 — Is there a product in context, or just best practices?
21:23 — The context engineering playbook: start small, iterate
24:45 — Where the 50 eval questions come from
25:40 — The questions data teams never get asked
27:06 — Is “context engineer” a real job title?
28:55 — Flattening roles on the data team
29:56 — Machine vs. human context, and where to keep it
32:43 — The highest-signal context: clean data models and docs
35:36 — Where nao is headed: a source of truth across Slack, MCP, and more
37:41 — From the data lake for analytics to infrastructure for agents
39:05 — The “just plug it into production” moment and the context stack
41:49 — Roadmap: automating context creation
43:33 — Context as organizational memory
46:00 — File systems, MetricFlow, and the semantic layer question
49:05 — Why open source, and the commercial model
51:39 — Wrap-up
This newsletter is sponsored by dbt Labs. Discover why more than 80,000 data teams use dbt to accelerate their data development.

