Hunting for Tokens. Snowflake Summit. Agent Use Cases.
Things are not quieting down for the summer, that's for sure.
Oh good lord. The last issue I wrote was two weeks ago. In the meantime:
Anthropic filed a confidential S-1 as the fastest-growing company…ever.
A bunch of dbt stuff launched: State, Wizard, Core 2.0, more. I’m not going to write about them here; those three links cover them better than I could. They’re each a very big deal.
Snowflake Summit happened and I had about 1e4 conversations about agents with data people.
Two of the smartest things I’ve read on AI were published.
I’ve been on a personal token hunt.
Any one of these topics could be a whole post. I’ll do my best.
Hunting for Tokens
Have you heard that Uber is setting token budgets at $1500 / engineer? The 18k / year is 11% of their average annual salary for engineers.
Apparently this was newsworthy…companies are setting budgets for tools. 🤷 Is your company not doing this? I’m legitimately curious, leave a comment.
Anyway, we set an internal limit at $500 / month, although that’s just kind of a number picked out of thin air and we’re not particularly attached to it. Anyone can come back and ask for more tokens as long as they write 1-2 sentences about what they need them for in a public channel. It’s a reasonable social / accountability mechanism to make sure you’re not doing random crap that has no business value. That said, I’m super open to the idea that engineering token spend looks more like Jensen’s 50% of salary rather than Uber’s 11%.
I recently hit my own personal token limit 15 days into May. I had just finished a news-aggregating-and-summarizing agent that I use to follow several hundred news feeds and, between the actual build and then the subsequent API usage for batch inference, I hit $500 pretty quickly. Of course I could have just used my privileged position in the hierarchy to demand more tokens, but I decided to take it as more of a learning opportunity: what would I do if I couldn’t just demand more tokens? Constraints breed creativity.
I’m SO happy I did this. It started me down a fascinating (and fun!) learning journey. Some quick notes:
Local inference is hard. I bought a Mac Mini, ran ollama with GPT-OSS 20b and one of the qwen models (I forget). This was not good. Token rate was super-slow, I had to architect my LLM calls to minimize context because of memory constraints, and model quality was observably much less good. I often just got straight up timeouts. Overall this was a dead-end with this setup. There are now new hardware options tuned for this use case, with MSFT and Nvidia collaborating to ship a pretty meaningful token-producing supercomputer. I’m not convinced that this is really fit-for-purpose for real software engineering work though. They can do a reasonable job for 1 concurrent inference request on a 120b-param open weight model today…but how often are you really just using a single-threaded inference call?
It is easy to build model routing into your own agents. If you build agents that include batch (non-interactive) inference, this is an absolute requirement. I didn’t even implement some package or anything fancy, just a layer that allows you to configure models and model providers for each different type of inference call. I don’t currently love dynamic model routing, although that’s an option…right now I feel like it’s something that the agent developer should control rather than abstract away. But you likely don’t need the same model for every single inference call you make.
Don’t use your Claude Code CLI interface to do batch inference. It injects a bunch of random Claude Code-relevant context that has nothing to do with your batch inference calls. It’s a LOT of context, it makes every batch call slower and MUCH more expensive. Switching to an API key made a dramatic impact on speed and cost. If you are rolling out Claude Code org-wide IMO you have to also give access to API keys because otherwise you are forcing your employees to use the CLI interface as their only way to get tokens, and it’s not a good path. We haven’t done this yet and should.
Bedrock, and equivalent services, are an absolute cheat code. The minute I got myself a Bedrock API key this all just got very easy. For interactive workloads like your actual coding, I can imagine having personal preferences around the model you use…Opus 4.7 and GPT-5.5 have different idiosyncrasies and you tend to learn them and form your habits around them. But for batch it’s just an empirical question: which performs better? Before you productionize something, even something small, you should absolutely test like 10 different models, from different providers. Your harness will make this easy, just ask it to do this and show you example outputs along with its judgment around which the top contenders are on cost and quality. When you have a single Bedrock API key this all becomes effortless.
Building your own inference cost tracking is very easy. You should do it. Just use OTEL to trace every inference call you make and then have your harness analyze some logs for you. Maybe there’s more to do in prod (an exploration for another time) but you should be doing this in the prototyping process too; don’t over-build it.
Inference optimization is a part of the process of learning to build agents. And usage caps help enforce that discipline. But often CXOs think “I don’t want to limit AI adoption, how can I cap token spend?!”
And certainly, with more time and energy I could absolutely blow through any sane token budget with business-valuable spend regardless of any optimization. So ultimately I’m not trying to say “optimize until you can fit into a box”. Rather: inference compute is a resource you’re responsible for just like compute/storage/etc have always been and before you go and solve the problem with $$, you should first do some obvious best practice stuff. YOLO.
Even if you have carte blanche right now, it’s shouldn’t be a point of pride to be profligate. While real engineers ship, real engineers also optimize.
Snowflake Summit and The Data and AI Cloud
I have a couple of things to say here:
thoughts on Snowflake’s strategy
thoughts on vendor strategy
thoughts on enterprise conversations I had
First: Snowflake’s strategy.
My read on the direction that Snowflake is headed is essentially this. They are saying “We have customer trust, access to customer data (which is really hard to get!), and we’ve solved a lot of the thorny data management problems out there. We’ve earned the right to also run AI workloads on top of that data. So, we’re going to build a bunch of the categories of AI tooling that customers clearly want. We’re going to provide inference, an agentic coding harness, agent orchestration, agent sandboxes, etc. And because all of this stuff will already be behind the firewall and have direct access to customer data, it’ll become the default solution for AI that touches our customers’ data. Thus, consumption will go up alongside customers’ AI adoption.”
(To be clear this is not based on any special information I have access to, it’s just my read based on public announcements.)
I am both surprised and not surprised by the approach. It is not crazy, and it may well turn out to be optimal. But it’s so different than what I would do that it took me a second to internalize the logic of it.
I tend to think, by default, in ecosystems. I generally don’t want to try to compete with companies, or certainly entire industries, I’d prefer to align with and build alongside them. It turns out that agent sandboxes are an important piece of AI infrastructure. Cool! My default question would be: which one of the many very successful agent sandbox infra companies do I want to partner with?
Same for inference—the last thing I would want to be doing right now is trying to be a general-purpose inference provider when Anthropic is about to IPO as the fastest-growing at-scale company literally ever…and their main product is inference, powered by their own custom models that are optimized for the exact workload (coding) that is sucking down all of that inference. The full-stack competitive advantages building up inside that organization right now, from applications all the way down to atoms, are truly impressive.
So: what I saw in all of these launches was, largely, Snowflake deciding to have entries into multiple AI categories, each of which has multiple independent billion-to-trillion-dollar companies competing for it, and each of these companies has nearly unlimited access to capital as long as their growth curves keep doing what they’re doing.
Snowflake’s reasons to win customer dollars: they have existing customer paper and pre-existing access to the data. These are very real moats! But the entire AI infra space is gonna try to find ways around them.
If you’re curious, my default belief about what I would do in their shoes is simply try to answer the question: how do we make Snowflake the most AI-native data cloud used by agents? How do we, like Vercel for Javascript apps, become the default agentic choice for the workload we’re already really damn good at? That’s a question that you answer with an ecosystem, not a platform. Maybe it’s a smaller company though? I don’t know.
Most / all of what I said here will likely be just as applicable to the Databricks launches coming up in a couple of weeks—my read of both of their strategies in AI is fairly similar.
Thoughts on vendor strategy
I don’t have super-sophisticated thoughts here, but I will say two things.
I believe that the marketing copy on every single vendor booth at the show used the word “agent”. I am not exaggerating, I think this is a literally true statement. Of course, our booth did too! The shift to agent-focused messaging was overwhelming; while this word existed at the event last year, it was nowhere near this prevalent. Last year everyone wanted to say “AI”.
For those that I engaged with more than superficially, I was unable to grasp what anyone was doing that mapped to anything I understand as “agentic”. That is not a criticism, just a statement of where the ecosystem is. Our (dbt + Fivetran) agentic story is real, but it’s still early, and I don’t know whether you would have fully understood if you randomly showed up at our booth unless you spoke directly to a small handful of PMs and engineers. Charitably, that may be where everyone else is too. Regardless, it’s clear that there is a LOT of vendor attention on agents and the stories are all in early stages.
Thoughts from customer conversations
I had maybe ~15 30m+ meetings with customers while I was in town, skewing towards larger enterprises, but inclusive of multiple at-scale digital natives. I also had dozens of in-passing interactions. Not a huge sample, but the signal I got was fairly consistent. A few thoughts:
First, the cost of migration from legacy systems isn’t theoretically going down, it is actually going down. There are now lots of case studies of this. Projects that were taking 18 months are now taking 4-6 weeks due to coding agents, and customers are happy with the results. This success is becoming widely-understood and baked into customer expectations.
Migrations have been the biggest blocker to modernization over the last ten years within the enterprise; I expect enterprises to be a lot closer to best-practice in the next few years as they un-anchor themselves from multi-decade-old legacy data systems.
Second, true agentic use cases in the enterprise are still somewhat rare outside of the well-paved paths. Coding agents are obviously here, although still have high variance of deployment inside the enterprise. But coding agents do not actually need access to your data lake. Customer support agents and conversational analytics do, and those are widely-understood and often being built or deployed.
But when people talk about “agents powered by your data” I don’t think they mean these three categories, they are referring to the rapidly-arriving agentic reinvention of all knowledge work. Thing is, while it seems to be fairly conventional wisdom that this is the direction of travel, production use cases are sparse, and even meaningful ideas seem sparse! I did find them, certainly. But the ones I did find didn’t bowl me over as ‘wow this is going to reshape that business’.
The funny thing is…having built some myself, I can attest to the fact that there really aren’t a ton of purely technical barriers to deployment today.
So: the “build agents on your data lake” thesis is either just really early, and enterprises are just putting the requisite pieces together and figuring out how to think about this category of things, or the thing that basically every single vendor is oriented around is vaporware.
I’m sure some of you will constitutionally be inclined to believe the latter. I disagree, and have become somewhat obsessed with this topic. What agents will we all build on top of our data lake? How quickly will this deployment happen in practice?
Two really good posts
I’ll wrap by linking you to two of the best strategic / structural posts I’ve read about AI. They’re both by Azeem Azhar at Exponential View, and are paywalled. You can read one of them for free if you subscribe to his newsletter.
The first one talks about inference business models: seat-based, usage, etc. It’s a great frame for how inference business models will evolve over the coming few years and has direct implications for how you’ll get access to tokens.
The second one is about how value is actually created within a business via AI, and why often times the impressive person-level efficiency gains don’t transfer to company-level gains. Hint: the answer is optimizing for entire loops!
Are you having fun yet?
- Tristan
