Inside Snowflake’s AI roadmap (w/ Chris Child)
Snowflake's VP of Product Management on the vision for open table formats, governed agents, and the future of the data engineer
This season of The Analytics Engineering Podcast is focused on how the current data landscape is impacting the developer experience. Snowflake plays a major role in what that developer experience looks like.
In this episode, Snowflake VP of Product Management Chris Child joins Tristan to unpack Snowflake’s AI roadmap and what it means for data teams. They discuss the evolution from Snowpark to Cortex and Snowflake Intelligence, how to govern agents with row- and column-level controls, and why Snowflake is investing in Apache Iceberg and the Open Semantic Interchange (OSI) initiative. dbt Labs recently open-sourced MetricFlow, the technology that powers the dbt Semantic Layer, to align with the goals of OSI.
Chris also shares a vision for the next five years of data engineering: fewer bespoke pipelines, more standardization and semantics, and a bigger focus on business context and data products.
Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.
Key takeaways
Tristan Handy: Where have you spent your time professionally?
Chris Child: I didn’t end up in data on purpose. I found myself here through a series of hops. I was working at Redpoint Ventures and got excited by a company we invested in, RelateIQ. I left to join RelateIQ, building an intelligent CRM. We captured emails and meetings and built profiles of everyone you interacted with. We were acquired by Salesforce. Looking at what sales teams needed, I realized they also needed product usage data, marketing data, and campaign data, with a platform to pull it all together. That led me to Segment. I joined when it was about 50 people. Segment was mostly analytics.js then, loading different JavaScript on your webpage for tracking. We had just built the first warehouse connector to Redshift and got huge usage sending click and user data to Redshift.
The original Redshift connector was a nightmare to work with.
Like many startup things, one engineer built it in a week. Suddenly a ton of people used it, and enterprise customers depended on it. We had to rebuild it several times. You could see the future there. Folks I worked with went on to start companies like Census and Hightouch, thinking the CDP should be built on top of the warehouse, which Segment evolved toward. We also built a Snowflake connector because customers demanded it in addition to Redshift.
It’s funny to think back a decade to how small Snowflake was.
A couple customers demanded it; we built it, and we were sending a ton of data. That led to the realization that a customer data platform is one instance of a data warehouse, and there are others you need. Seeing how fast Snowflake was growing, I wanted to build the next layer of infrastructure.
I joined Snowflake seven and a half years ago. I’ve had three key roles. First, I built areas of the product: the UI, billing, product-led growth engines and free trial infrastructure, and application capabilities for connecting into and building on Snowflake. After Sridhar became CEO, he asked me to reconnect product and sales by leading solutions engineering, reporting to the CRO. Leading a global technical seller org was very different for a product person, but it helped align teams at scale.
About eight months ago, I returned to lead data engineering: how people bring data into Snowflake, how they transform it—spending a lot of time with dbt—and work around Iceberg and interoperability for worlds where not all data sits in Snowflake.
I didn’t realize the path started in investing. Are you a finance person way back?
My undergrad is in computer science. I started programming in fifth grade on an Apple IIe, learned C before high school, and followed that thread. In college I noticed business folks often made the decisions. I wanted to learn that side. After college I joined a consulting firm, then private equity, then an MBA. I realized I didn’t want to be a finance person. I moved to venture as a bridge to building products, but I wanted to build, so I jumped into operating roles.
Tell the story of Snowflake and AI. In the 2010s there was huge demand for easier, scalable, cloud-oriented data solutions. Then 2022 happened, ChatGPT launched, and the world changed. How did Snowflake respond, and where are you today?
Even pre‑2022 we saw customers putting their most important business data into Snowflake, then pulling data out for things they couldn’t do inside: training ML models and other analyses that SQL wasn’t a great fit for. Customers told us they didn’t like losing governance and lineage when data left. We invested in ways to bring more of that work to Snowflake.
Snowpark was the first big step: a runtime for non‑SQL code (Python, Java, Scala) with APIs inspired by Spark, plus capabilities like forecasting. It’s great for some workloads, but most customers don’t train most ML models inside Snowflake yet. We also acquired Applica for document extraction using early LLM techniques, and Neeva for web search based on LLM approaches.
When ChatGPT arrived, we saw two major influences. First, people wanted to chat with data they’d brought into Snowflake and transformed with dbt. That’s hard because LLMs are great with unstructured data and less great at turning business questions into correct SQL. Second, LLMs are very good at writing code, including Python and even dbt code. They’re not perfect for data engineering code yet, but they help.
Our goal is to help customers activate important enterprise data safely in AI models, deploy agents at scale under existing governance, and keep up with exploding data volumes without 10x headcount.
What are the key product pieces—Cortex, Snowflake Intelligence, etc.—in the Snowflake AI stack?
First, you need a great data foundation. That isn’t new: get the data in one place, apply good governance and permissions, know your data, tag PII, and raise the standard of care.
AI raises the bar because agents can expose sensitive data faster than dashboards. OSI (Open Semantic Interchange) work is part of this; LLMs need explicit semantics and cataloging they can consume, not tacit knowledge hidden in downstream tools.
Companies with strong hygiene move faster with AI. Roles matter; if a product manager role has access to certain rows and columns, an agent acting within that role can safely answer questions. Agents can run inside or outside Snowflake, but should assume appropriate roles when querying.
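A minimal sketch of that pattern in Snowpark Python, assuming the agent runs its queries through a deliberately scoped role; the role, warehouse, and table names here are hypothetical.

```python
from snowflake.snowpark import Session

# Hypothetical connection details; substitute your own account settings.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
}

session = Session.builder.configs(connection_parameters).create()

# Run the agent's queries under a scoped role so any row access and column
# masking policies attached to that role apply to everything the agent reads.
session.use_role("PM_AGENT_ROLE")      # hypothetical role
session.use_warehouse("AGENT_WH")      # hypothetical warehouse

# The agent only ever sees the rows and columns this role is allowed to see.
revenue_by_region = session.sql(
    "SELECT region, SUM(revenue) AS revenue "
    "FROM analytics.sales_orders "      # hypothetical table
    "GROUP BY region"
)
print(revenue_by_region.collect())
```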
On the AI stack, after the data foundation, Cortex provides higher-level APIs for unstructured processing, RAG, and structured processing. You can choose models (OpenAI, Anthropic, Mistral, Gemini, Llama, etc.), but most folks don’t want to manage prompts and GPUs. Cortex AI SQL lets you express intent, like sentiment filters or fuzzy joins, directly in a query. It’s powerful for exploration but non-deterministic, so it needs care in production. At these higher abstractions, costs map to tokens consumed, so you set budgets and guardrails much as you do for any variable cloud compute.
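For example, here is a hedged sketch of the kind of intent-driven filter Chris describes, reusing the Snowpark session above; the table and column names are hypothetical, and SNOWFLAKE.CORTEX.SENTIMENT returns a score roughly between -1 (negative) and 1 (positive), so the threshold is arbitrary.

```python
# Filter support tickets by sentiment rather than by a hand-written rule.
negative_tickets = session.sql(
    """
    SELECT ticket_id, ticket_text
    FROM support.tickets
    WHERE SNOWFLAKE.CORTEX.SENTIMENT(ticket_text) < -0.5
    """
)

# The model call is non-deterministic and billed per token, so check result
# stability and cost before promoting a query like this into production.
negative_tickets.show()
```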
At the top, Snowflake Intelligence is a UI and agent framework. You define agents with access to specific datasets and semantic models, plus gold queries and usage guidance. It looks like a chat interface over your governed data. Inside Snowflake, we’ve deployed a GTM assistant that blends product usage, Salesforce, notes, docs, and content—structured and unstructured—respecting row‑level security for every seller while giving leaders broader access.
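To make the ingredients concrete, here is a purely illustrative structure, not the Snowflake Intelligence API: it just names the pieces Chris lists for a governed agent definition, and every value is made up.

```python
# Illustrative only; not an actual Snowflake Intelligence agent definition.
gtm_assistant = {
    "name": "gtm_assistant",
    "datasets": ["sales.opportunities", "product.usage_daily"],  # governed tables
    "semantic_models": ["revenue_semantics"],                    # shared definitions
    "gold_queries": [
        "Top 10 accounts by product usage growth this quarter",
    ],
    "usage_guidance": (
        "Answer seller questions about their own accounts; always query "
        "under the caller's role so row-level security applies."
    ),
}
```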
Let’s talk open formats and Iceberg. Why lean in when it opens up the data?
Our aim isn’t to lock up data, it’s to help customers get value. Snowflake began as a reaction to Hadoop: betting on SQL at cloud scale, with our own file formats and catalog because open equivalents didn’t exist at the time. Those proprietary pieces let us evolve quickly. Iceberg is now almost as good, and we’re contributing to make it better.
Openness is a win for customers and expands the universe of data Snowflake can query, run Cortex on, and power Intelligence with. The tradeoff is standards move slower. Variant type support is a good example—we contributed our approach and shepherded it into the v3 spec.
Next up, the community is wrestling with fine‑grained access control beyond table‑level policies. It’s hard and will take time, but the outcome should be better for everyone.
Give us your view on the future of data engineering.
Data volume is exploding, including unstructured data that’s now usable. You can’t hand‑build every pipeline. Demand is also exploding as agents query more things in more ways. Teams must operate at a higher level: automate, standardize, and reduce bespoke pipelines.
Expect more shared semantic models across consumers and packaged semantics coming from systems like SAP. You’ll also build data‑engineering agents to do work and monitor pipelines. The role looks more like architect and manager, allocating budgets, deduplicating work, and—most importantly—deeply understanding the business. The best data engineers shift from code output to data products, with clear semantics and context.
Talk more about context.
The day‑to‑day activity shifts, but the output is still data products. Great data products come with instructions, definitions, lineage, quality expectations, and how to get correct answers to common questions.
We need that context captured where work happens—models, visualization, quality systems—and made available everywhere: catalogs, agents, and UIs. As you build, you should also document, and those semantics should flow consistently into tools like Snowflake Intelligence so agents can reason correctly.
A big part of the challenge is selecting just‑enough context per question.
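As a toy illustration of the kind of context a data product could ship with, here is a small payload covering the definitions, lineage, quality expectations, and gold questions mentioned above; every field name is made up rather than taken from dbt, OSI, or Snowflake.

```python
# Hypothetical context payload for an "orders" data product.
orders_data_product = {
    "name": "orders",
    "description": "One row per completed customer order.",
    "definitions": {
        "net_revenue": "order_amount minus refunds, in USD",
    },
    "lineage": ["raw.shop_orders", "staging.stg_orders"],
    "quality_expectations": [
        "order_id is unique and not null",
        "net_revenue >= 0",
    ],
    "common_questions": {
        "What was net revenue last month?":
            "Sum net_revenue filtered to the prior calendar month.",
    },
}

# The same payload could be published to a catalog, shown in a BI tool, or
# handed to an agent as grounding context, trimmed to just the fields a
# given question needs.
```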
Chapters
00:01:50 — Chris’s path: RelateIQ, Segment, Snowflake
00:05:40 — Roles at Snowflake: product, solutions engineering, data engineering
00:09:00 — Snowflake and AI: foundations before ChatGPT
00:11:40 — Why keep ML and non-SQL work closer to governed data
00:13:40 — Applica and Neeva acquisitions, enterprise search context
00:14:50 — Two big AI influences: chat with data and code generation
00:16:50 — Scaling agents while preserving governance and cost controls
00:18:40 — Why governance must live at the data layer (roles, rows, columns)
00:22:00 — Inside vs. outside Snowflake: how agents assume roles
00:23:02 — Cortex: higher-level APIs over many LLMs
00:24:06 — AI SQL: joins/where by intent and the non-determinism tradeoff
00:27:40 — Cost models, tokens, and guardrails
00:29:10 — Snowflake Intelligence: agents over a governed foundation
00:32:10 — Open formats and Iceberg: why Snowflake leaned in
00:36:00 — Standards tradeoffs: variant type and community progress
00:38:40 — Fine-grained access control for Iceberg: thorny but necessary
00:40:40 — The future of data engineering: scale, unstructured data, agents
00:43:20 — No more bespoke pipelines; standardized models and semantics
00:44:50 — Data engineers as architects and business partners
00:50:00 — Code vs. context: data products and shared semantics
00:53:10 — Capturing context where work happens (models, viz, quality)
00:55:00 — Selecting just enough context for agent reasoning
00:56:30 — Closing
This newsletter is sponsored by dbt Labs. Discover why more than 60,000 companies use dbt to accelerate their data development.

