2 Comments
Rich O’Brien

Curious how you are handling PII while using Codex and dbt MCP to query customer data. Are you setting up controls at the database level, in dbt, in your AI coding tool, or some combination?

We've tried a number of ways but run into issues with each:

- Snowflake (Snowflake CLASSIFY + Tags + Masking). Drops tags during ETL and dbt reloads

- dbt (Use Snowflake tags or dbt_snow_mask). Missing APPLY TAG posthook and misses raw tables

- Coding Agent (Claude Hooks, .mds). Can't be trusted alone

Tristan Handy

It's a good question, and I recognize that YMMV on this. As long as models are approved for internal usage, we treat them the same way we treat other forms of compute. So: we make Bedrock available internally and do not attempt to prevent it from accessing PII.

I do not have a strong perspective on the broad question of "when is it appropriate to give models access to PII," but my general feeling is that companies share PII with compute providers all the time. We're used to doing this with our data platforms, our cloud vendors, etc. Very normal. Models are just another form of compute, and especially if you're procuring your models directly through your cloud vendor, it's not clear to me exactly why we should think about them differently.

I think the PII attention here has mostly been about employees putting PII directly into consumer AI tools, which is certainly a no-go. But if you have a negotiated agreement with a model provider, it's not clear to me why that should be treated differently from any other cloud compute. Thoughts?