The rapid experimentation of AI agents
Yohei Nakajima, general partner at Untapped Capital and creator of BabyAGI, on AI agents, and where they might take us
Yohei Nakajima is an investor by day and coder by night. In his day job, he invests in early-stage companies as a general partner at Untapped Capital. As a hacker, he's been focused of late on the applications of AI.
In particular, one of his projects, called BabyAGI, got a ton of attention about a year ago. BabyAGI is an AI agent framework, creating a plan-execute loop. If you give it a goal, it will create a plan to achieve this goal, and then go execute on this plan. All of this plays out in a long chain of LLM API calls, and you can observe every step along the way.
The truth is that this is an extremely experimental space, and depending on how strict you want to be with your definition, there aren't many production use cases of AI agents to point to today. But when you watch the demo videos on Yohei's Twitter feed, you can immediately see the promise.
Join data practitioners and leaders in Las Vegas this October at Coalesce, the Analytics Engineering Conference built by data people, for data people. Register now for early-bird tickets to save 50%. The sale ends June 17th, so don’t miss out.
Key takeaways from this episode.
What do you spend your time on these days?
Yohei Nakajima: I'm a VC by day, builder by night. Building has always been a hobby I've had on the side. I've never been an engineer by trade, but I've been coding as a hobby since high school. I became more of a no-coder for a while, just because with limited time to build, no-code was easier. But when I started using AI, I realized that I could pump out some pretty cool code in a matter of two to three hours, so I switched back to code about a year after I started using OpenAI. That's where all my GitHub projects come from.
Do you want to take a minute to plug a company or two that you've invested in recently, for the audience to see what you're seeing on the VC side?
One of the tools that I use every day, which is the easiest plug, is a company called Wokelo.ai. They do AI research and due diligence. When I'm interested in a company, I can just type in the company name and click Generate Report. Thirty minutes later, I get a 20-50 page report that includes market insights, recent news, competitors, management profiles, all that kind of stuff. It's the kind of work you'd ask analysts to do, and it would take them two weeks, but here it's done in 30 minutes.
Based on the deal flow that you're seeing, what are some underreported things that will be true about the world in two or three years that no one's paying attention to?
I've done a couple of talks on autonomous agents, and one of the ideas that gets the biggest reaction is an agent as CEO. When people think about AI right now, they imagine AI on the bottom rung. But an example I give is Amazon Mechanical Turk or Upwork. These are technology platforms that help you manage people. Now imagine taking the capabilities of LLMs and rebuilding something like Amazon Mechanical Turk.
You can imagine building an agent that manages a college ambassador program, and in theory that AI can do it more efficiently than a human. If you extend that, why can't you have a CEO who's available 24/7, has access to every single piece of data in the company, can take feedback from every single employee and synthesize it, and has bias that's at least measurable and transparent, so you can adjust it instead of wondering whether they're being unfair?
One of our values as a company is we move up the stack. It's an invitation to always replace yourself. And if there is a very effective AI CEO, I'm ready.
The initial idea around BabyAGI was to prototype an autonomous startup founder.
I think AI is far from being able to build an innovative startup that becomes a unicorn. But if we pick something that's a little bit more straightforward and purely digital, like an e-commerce dropshipping business, I think it's within reach to build an AI system that can run and manage something like that.
Tell us about BabyAGI. What is it? And what have people built with it?
I think the reason BabyAGI went viral was that at that point, ChatGPT was still relatively new. People were still building on top of that chat interface. BabyAGI was one of the first popular open-source projects to run an LLM in a loop.
I added some capabilities so that, given an objective, a task-creation agent would first generate a task list based on that objective. Then I would use code to parse that out, and send each task, one by one, to an execution agent to execute it.
When generating new tasks, I had a task-prioritization agent, which would review past results and update the task list. It would check for the most similar existing tasks first, so that it would try to generate genuinely new tasks. And when you press run on an objective like "build a business," it would just come up with things to do.
At this point it was just LLM calls; it was just generating tasks. But when I asked it to start a business, it kept going, one by one, through the things we'd actually do to build a business: I need a marketing plan, I need to build a website, I need to come up with product ideas.
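The loop Yohei describes (a task-creation agent, an execution agent, and ordinary code gluing them together) can be sketched in a few lines. This is an illustrative sketch rather than BabyAGI's actual code; `call_llm` is a stub standing in for a real LLM API call so the example runs offline.

```python
from collections import deque

def call_llm(prompt: str) -> str:
    # Stub for a real LLM API call (e.g. OpenAI's chat endpoint);
    # canned replies keep the sketch self-contained.
    if "break the objective" in prompt:
        return "1. Research the market\n2. Draft a marketing plan"
    return f"(result for: {prompt})"

def plan(objective: str) -> deque:
    # Task-creation agent: ask the model for a numbered task list,
    # then parse it with ordinary code.
    raw = call_llm(f"break the objective into tasks: {objective}")
    return deque(line.split(". ", 1)[1] for line in raw.splitlines())

def run(objective: str, max_steps: int = 10) -> list:
    tasks, results = plan(objective), []
    while tasks and len(results) < max_steps:
        task = tasks.popleft()  # execution agent handles tasks one by one
        results.append(call_llm(f"complete this task: {task}"))
        # BabyAGI adds a step here: a task-prioritization agent reviews
        # past results and rewrites the task list before the next pass.
    return results

print(len(run("start a dropshipping business")))  # prints 2
```

In the real system each `call_llm` is a network call, which is why every planning, execution, and prioritization step is observable as a separate API request.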
This has become commonly referred to as an agent: the idea that there's an AI that operates in a loop, plans, executes, and can, to a certain extent, take actions in the real world. I think this is not a new idea, but it is an idea whose time has maybe come. Is that a fair way to think about it?
I agree. It's not a new idea to let a software program autonomously run. I think with LLM capabilities you could finally get it to reason, do similarity search, and do it in a way that was much more robust.
Can you talk at all about agent architectures? Are there better and worse ways to construct agents?
When new technology emerges, there's a period of rapid experimentation. If you look at old cars, there were three-wheeled cars and steam-engine cars.
But today, cars, like phones, all look mostly the same. With autonomous agents, we're in the rapid experimentation phase, where there's a whole bunch of different frameworks, architectures, and approaches.
It's hard to say which one's going to stick, but ultimately there'll be some consolidation. So if you're thinking about architecture, it helps to look at the components one by one. Task planning is probably one of the main ones. The two major approaches are the ReAct style, which does one thing at a time and reflects on it, and the BabyAGI style, plan-and-execute, which generates a task list first and then works through it.
But of course there's a fuzzy line because you can generate a task list first and then reflect on each result to update the task list. It's not one or the other.
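The distinction can be made concrete with a minimal ReAct-style loop: instead of planning everything upfront, the model picks one next action, acts, observes the result, and repeats. Again a sketch with a stubbed `call_llm`, not any particular framework's API:

```python
def call_llm(prompt: str) -> str:
    # Stub for a real LLM call; the canned logic declares the objective
    # done after three steps so the loop terminates.
    if "next step" in prompt:
        return "DONE" if "step 3" in prompt else "take another step"
    return "observation"

def react_loop(objective: str, max_steps: int = 5) -> int:
    # ReAct style: choose ONE next action, act, reflect on the result,
    # then choose again; there is no upfront task list.
    history = []
    for step in range(max_steps):
        thought = call_llm(f"next step for {objective}, given step {step} history: {history}")
        if thought == "DONE":
            return step
        history.append(call_llm(f"act: {thought}"))  # execute and observe
    return max_steps

print(react_loop("research a market"))  # prints 3
```

The fuzzy line Yohei mentions shows up naturally here: a plan-and-execute loop that rewrites its task list after every result is effectively doing both.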
How should we think about where we will see agents impacting our lives first?
I think about it in three buckets. The first bucket is what I call handcrafted agents, where you're writing each prompt and chaining API calls in a very specific flow. Some people would say that's not autonomous, because it's a human generating the task list rather than the AI running autonomously. That being said, take Wokelo: if I can give an AI a company name and it gives me back a 30-50 page report, from a usage standpoint it feels very autonomous to me.
And then there's what I call the specialized agent, which is, I think, where it becomes truly autonomous: it dynamically generates its own task list or recursively figures out what to do, but within a specific set of skills and tools. So you can imagine a coding-specialized agent, or a VC-specialized agent that just knows how to look up CrunchBase.
And then there's the general-purpose, fully generalized autonomous agent.
Handcrafted agents are useful today. Wokelo is one specific example, but I know many companies that are doing handcrafted agents. Humans generate the task list, and these companies are charging lots of money with low churn. They're creating value.
With specialized agents, what I'm seeing right now is pretty interesting demos with a lot of promise. These companies are raising capital; they're starting conversations with enterprises that are interested, and they're starting to line up pilots because those enterprises are willing to test it.
But I haven't seen anything that's blowing me away in terms of reliability and value creation.
And then when it comes to fully generalized agents, I haven't seen anything remotely close to reliable. I also don't think of them as three fully separate buckets; it's more that you start in one and slowly move toward the next.
Can you talk at all about why agent designers switch between different models for different tasks?
Different models have different strengths is the shortest answer. Some models are better than others at writing code. Some models are better than others at writing long pieces in an eloquent manner. And it's constantly shifting.
Developers are looking at the cost, the speed, and the quality of each model, and when I say quality, specific to the use case. So with an agent system, you have parts that need to write code, parts that need to respond to the user, so depending on the specific need you're swapping out models to see which one is optimal.
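A minimal version of that swapping is just a routing table keyed by the kind of sub-task. The model names and task kinds below are hypothetical placeholders, not recommendations:

```python
# Hypothetical routing table: each kind of sub-task goes to whichever
# model currently wins on cost, speed, and quality for that job.
ROUTES = {
    "write_code": {"model": "code-model-large", "temperature": 0.0},
    "reply_to_user": {"model": "chat-model-fast", "temperature": 0.7},
    "summarize": {"model": "cheap-model-small", "temperature": 0.3},
}

def pick_model(task_kind: str) -> dict:
    # Fall back to a general-purpose default for unrouted task kinds.
    return ROUTES.get(task_kind, {"model": "general-model", "temperature": 0.5})

print(pick_model("write_code")["model"])  # prints code-model-large
```

Because the landscape shifts constantly, keeping a table like this in configuration rather than scattered through code makes re-benchmarking and swapping cheap.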
Now that language models can write code, can formulate their outputs and inputs as JSON or other structured data, is there any particular limit on what types of actions they could take in the world?
The first thing that popped to mind is that anything requiring a physical body will be hard for a digital agent to do unless you give it the robotic parts. But I wouldn't call that a limit; we can build the robotic parts, we can give them vision capabilities, and we can do those things.
What's so interesting to me is messaging. Are there agents in the world today that are connected to the Twilio API or to SMTP and sending emails?
Yeah, definitely. There are tons of emails that are automatically sent. But I think the question you're asking is: are there any that are fully, dynamically managing it on their own? Probably.
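Structured output is what makes such actions mechanical: the model emits JSON, and ordinary code validates it and routes it to a tool. A sketch, where `send_email` is a stub standing in for a Twilio or SMTP call:

```python
import json

def send_email(to: str, body: str) -> str:
    # Stub: a real implementation would call SMTP or an API like Twilio.
    return f"queued email to {to}"

TOOLS = {"send_email": send_email}

def dispatch(model_output: str) -> str:
    action = json.loads(model_output)  # structured output from the LLM
    tool = TOOLS[action["tool"]]       # KeyError on unknown tools, by design
    return tool(**action["args"])

print(dispatch('{"tool": "send_email", "args": {"to": "a@b.co", "body": "hi"}}'))
# prints: queued email to a@b.co
```

Restricting the agent to a fixed tool table like `TOOLS` is also the simplest safety boundary: the model can only request actions the code explicitly exposes.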
Do you think that most AI workflows are going to be created by software engineers? Or because the capabilities of AI are so powerful in the realm of writing code, are they actually going to be created by less technical folks who are closer to the business?
This is a little bit more of a hypothesis; I don't know is the short answer. But based on my experience building it, I would guess that you need a really good core framework, and that has to be built by engineers. And the whole point of the framework should be that the more you use the agent, the better it gets.
In the future, when you ask who's building the task list, I don't think it's the engineers. It's the users, who are going to ask an AI to do something; when the AI doesn't do it the right way, they'll give it guidance, and the AI will remember how to do it and keep getting better.
The engineers will essentially be building how the brain itself works, but just like you and I have gotten better at things, the AI is going to get better by working for somebody and getting feedback from that person.
What do you hope is true about the data or in this case, the AI space in five years?
I hope that we’ll see more use cases of AI with the goal of helping people better understand each other. I actually did a TED talk on it, the idea that AI can help us better understand ourselves and each other, both through the usage and building of it.
This newsletter is sponsored by dbt Labs. Discover why more than 30,000 companies use dbt to accelerate their data development.