The current state of the AI ecosystem (w/ Julia Schottenstein)
Former Analytics Engineering Podcast co-host Julia Schottenstein, now at LangChain, returns to the show
Welcome to Season 6 of The Analytics Engineering Podcast. Thank you for listening, and please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.
Host now guest! Former co-host Julia Schottenstein returns to the show to go deep into the world of LLMs. Julia joined LangChain as an early employee, in Tristan’s words, to “Basically solve all of the problems that aren't specifically in product and engineering.”
LangChain has become one of the primary frameworks, if not the primary framework, for developing applications using large language models. There are over a million developers using LangChain today, building everything from prototypes to production AI applications.
If you're looking for a comprehensive overview of the state of the AI ecosystem today, this is the episode for you.
Join data practitioners and data leaders in Las Vegas this October at Coalesce—the analytics engineering conference built by data people, for data people. Use the code podcast20 for a 20% discount.
Key takeaways from this episode
What is LangChain and what kind of impact is it having?
Julia Schottenstein: The two co-founders, Harrison and Ankush, got started on LangChain, an open-source project, back at the end of 2022, right before ChatGPT became a household name.
And what LangChain does is give you building blocks so that you can build your own applications much more easily. It's a framework that can be used in Python or TypeScript. It helps you bring your company's or your personal information, your APIs, or up-to-date information into the context of the LLM reasoning API, so it can do more useful things.
As listeners probably know, LLMs can do a lot. They've been trained on an enormous corpus of information, but they have no idea about your private data or your private information. To do useful things for your business, you need to prompt the LLM in very specific ways, and what LangChain does is help bring that pipeline of data and computation to the LLM, so you can build apps and do things like customer support bots, personal assistants, copilots, things of that nature.
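To make that concrete, here's a minimal sketch of the pattern Julia describes: fetch private data the model was never trained on, then hand it to the LLM inside the prompt. Everything here is hypothetical stand-in code (`fetch_open_tickets`, `call_llm`), not LangChain's API.

```python
# A minimal sketch of "bringing your data to the LLM": retrieve private
# context, then include it in the prompt so the model can reason over it.

def fetch_open_tickets(customer_id: str) -> list[str]:
    # Hypothetical: in practice this would query your support database or an API.
    return [
        "Ticket #4512: refund still pending after 10 days",
        "Ticket #4588: login fails on the mobile app",
    ]

def call_llm(prompt: str) -> str:
    # Placeholder for a real chat-model call (OpenAI, Anthropic, etc.).
    return "(model response)"

def answer_support_question(customer_id: str, question: str) -> str:
    context = "\n".join(fetch_open_tickets(customer_id))
    prompt = (
        "You are a customer support assistant.\n"
        f"Open tickets for this customer:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the tickets above."
    )
    return call_llm(prompt)
```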
This is a lossy analogy, but I often think about LangChain as the dbt for AI.
I think about it the same.
With dbt, we said there's this fundamentally new compute construct—the cloud data platform—and it needs a programming framework on top of it to enable users to do the things that they want to do.
I think LangChain came out of that same kind of need. There's a new kind of fundamental compute construct—the large language model—so what are the things that you need to do to make it useful? Is that right?
Obviously I have similar biases to you, having been a part of the dbt journey, but I think it's pretty similar in the sense that LangChain has become a way for people to learn about generative AI and how to build applications.
There are these common building blocks. With dbt you have your models, tests, documentation, all of these components as a recipe to do work. It's pretty similar with LangChain. We have the components you need to build an LLM application, and they're pretty simple. A lot of times it's a prompt plus a chat model, plus an output parser that takes the raw text and makes it structured, so it can be useful in downstream workflows.
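As a rough illustration, assuming LangChain's Python packages (`langchain-core` and `langchain-openai`) are installed, those three components compose into a chain like this; the model name is just an example, and any chat model could be swapped in:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# The three building blocks: a prompt, a chat model, and an output parser
# that turns the model's raw message into a plain string for downstream use.
prompt = ChatPromptTemplate.from_template(
    "Summarize this support ticket in one sentence:\n\n{ticket}"
)
model = ChatOpenAI(model="gpt-4o-mini")  # example model; swappable
parser = StrOutputParser()

chain = prompt | model | parser
print(chain.invoke({"ticket": "Customer reports login fails on mobile."}))
```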
It's a framework. It's open source. There's a big ecosystem around it. I think we have a million developers now who use LangChain, and our partner ecosystem is really important for making LangChain as special as it is. There are the vector databases—dozens of those—and hundreds of different models, and people will swap these components in and out as they need.
You can define your application logic once, and then we've abstracted away a lot of the commonality among the underlying technologies.
LangSmith is our commercial offering, and it's a companion product. The reality with all LLM applications is that the hardest part is quality. It's easy to build a proof of concept in an afternoon. But it's hard to make it work consistently well at large scale and not have embarrassing moments when it's in production.
A lot of engineering work goes into polishing those rough edges and making it ready for end users to interact with. And so LangSmith helps solve that problem with testing and observability. It gives you tools so that when you're making a prototype and moving to production, you can experiment with changing logic and see how it affects metrics you care about.
Once it's live, it's a monitoring and observability platform, so you know what people are asking of your application and how well it's responding to questions. LangSmith supports you in making your application higher quality, getting to production faster, and giving you the visibility you need once it's live.
I wonder if the industry knows what it means to test or observe an LLM application. Software engineers have been testing and observing classic software applications forever. But I feel like we're trying to invent the tools and processes that we need to make our AI applications high-quality.
Do you know enough about what your users need to build what they need?
We're definitely learning as we go. The big advantage of being open source is that we get to learn from cutting-edge teams around the world, and we can build alongside them.
We try our best to move quickly and build product to help solve it, but it is an evolving practice. We do wonky things in the LLM app world, like using an LLM as a judge to run your tests. You just ask another LLM.
That feels really awkward and clunky at first, until you realize that LLMs are fantastic at grading the responses of other LLMs. You get this new way of bringing engineering best practices, adapted to this non-deterministic, highly variable type of application.
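A minimal sketch of that LLM-as-judge idea, reusing the same chain components; the grading prompt and the CORRECT/INCORRECT convention are illustrative choices here, not a specific LangSmith evaluator API:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# A second model grades the first model's answer against a reference.
judge_prompt = ChatPromptTemplate.from_template(
    "You are grading an AI assistant's answer.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Assistant's answer: {answer}\n"
    "Reply with exactly one word: CORRECT or INCORRECT."
)
judge = judge_prompt | ChatOpenAI(model="gpt-4o", temperature=0) | StrOutputParser()

def grade(question: str, reference: str, answer: str) -> bool:
    verdict = judge.invoke(
        {"question": question, "reference": reference, "answer": answer}
    )
    return verdict.strip().upper().startswith("CORRECT")
```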
I interviewed Yohei Nakajima, the creator of BabyAGI, and he talked about an agent that he built with a prompt to create a business—it decomposed the steps needed to start a business, and then it decomposed those steps, and then it started doing the thing. And you could just leave it running for an arbitrarily long time.
When people build applications with LangChain, are they generally trying to build things where there's a request-response? Or is it more building an agent that's behind the scenes?
There's a reality of what the technology can do today versus what we want it to eventually do. The majority of use cases in LangChain, just because of where people are, tend to be chat applications and RAG, as we discussed.
What you're describing is what we call agentic applications, though people have different definitions for it. Some people have the definition that agents are LLM apps that act on your behalf, so things get automated. Our definition is more that you use the LLM to decide which steps to take.
In this example of “build me a business,” you don't have to predefine all of the steps of first incorporate, then hire your employees, and so on. You let the LLM decide the routes to take and you can let it run. Both of those worlds are hard, as you can imagine. I don't think the business that this LLM built is going to be very successful.
What we've been trying to do is find a nice balance where you give the LLM the ability to route and make decisions. But if you fall into certain states, we define in code which actions it's allowed to take. And so that's where conditional edges come into play.
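Conditional edges are a LangGraph concept. Here's a minimal sketch, assuming the `langgraph` package, with the router and node bodies stubbed out; in a real app an LLM call would decide the route, but it can only reach the states allowed in the mapping:

```python
from typing import TypedDict
from langgraph.graph import END, StateGraph

class State(TypedDict):
    question: str
    route: str
    answer: str

def classify(state: State) -> State:
    # Stub: in practice an LLM call decides the route here.
    route = "search" if "find" in state["question"].lower() else "respond"
    return {**state, "route": route}

def search(state: State) -> State:
    return {**state, "answer": "search results ..."}

def respond(state: State) -> State:
    return {**state, "answer": "direct answer ..."}

builder = StateGraph(State)
builder.add_node("classify", classify)
builder.add_node("search", search)
builder.add_node("respond", respond)
builder.set_entry_point("classify")
# The conditional edge: the router picks, but only among allowed targets.
builder.add_conditional_edges(
    "classify", lambda s: s["route"], {"search": "search", "respond": "respond"}
)
builder.add_edge("search", END)
builder.add_edge("respond", END)

graph = builder.compile()
result = graph.invoke({"question": "find cruise options", "route": "", "answer": ""})
```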
People end up building things that fall into a few categories. There's concierge search, a new way to discover information instead of clicking on a bunch of filters. You can chat and have the experience of, “I'd like to go on a cruise with my family of 10, can you send me some options?”
There are also a lot of copilots built into applications to help with tasks for operational efficiency—things that are repetitive or time-consuming—to automate with an LLM.
Anytime there's a new technology, it tends to be adopted first by a cohort of folks who work at small, digital-native companies.
There has been a tremendous amount of attention on the AI trend in the enterprise. I'm not clear whether the enterprise is AI-curious or is really putting applications into production. Do you have insight into the relative stages of digital-native versus traditional enterprise?
We've seen a lot of activity in the enterprise, a lot of investment. They usually start with an internal application.
The first thing that people will build is productivity tooling for their employees. We've seen really large organizations with a mandate for their employees to get trained and learn about AI. And it could be as simple as “how do you interact with an LLM?”
Rakuten, which is one of our customers, built an internal GPT where all of their employees can build mini apps. The most common app that people build or use at Rakuten is just for practicing their English. We frequently see mandates from the top for employees to have access to an internal chat app powered by generative AI, so people can learn to prompt it and get their work done faster.
I'm curious about the industry structure of the AI space. The cloud data space evolved in the way that it did almost completely because there was a standard to unite around, which is SQL.
Do you have beliefs about what the emerging categories are and how the lines are being drawn?
Obviously, the model layer is the one with a tremendous amount of interest. You're not an LLM application without an LLM. It's the one part of the stack that you don't get to avoid.
There's a big question about whether a lot of the value accrues to the application layer or to the picks-and-shovels layer. Picks and shovels would be LangChain, because we're an orchestrator. We build tools for developers to build end applications.
Then there's the application layer, where you see a tremendous amount of interest: AI SDRs, AI marketing assistants, customer support.
Customer support is an enormous category. These companies are doing really, really well overnight. You'll hear numbers like going from zero to $20 million ARR in one year. It's an open question where all the value will accrue and how sustainable those revenue streams are.
You'll always have both. You'll have fewer players on the picks-and-shovels side that accrue a ton of value, and a lot of people will try to go after that to be the currency of the new industry. Then you'll have many, many more successes at the application layer. I think you'll have companies that rise and fall pretty quickly there.
One of the criticisms of the modern data stack has been that if you want all of the appropriate picks and shovels, you have to go to 12 different hardware stores to buy them all. There are players in the AI space that are a little bit more narrowly scoped than the stuff that you folks are building.
But my understanding is that your goal is to provide all of the tooling required to put applications in production. Is that right?
It's part of our philosophy that the data you collect in production is so important to help you improve your application that it needs to be all in one.
To put a finer point on that, it's really hard to come up with realistic examples or edge cases that you need to test against to improve your application. And so it's important to be able to observe production traffic, understand where your app is falling short, and collect those examples of where you need to improve, and then feed that back into your testing and design process so you can iterate faster.
The other cool thing that we're starting to experiment with is feedback from production. People liked your responses; can we use that to influence and improve your prompting strategy, or improve the way that you design your application? All in one is our approach.
Some people are more narrowly scoped and they do testing for financial data really, really well, or they provide an LLM that's really good at SQL or data analytics.
It's just where people are picking their lanes. We have aspirations to support you throughout your entire development life cycle.
Let’s talk about design. We all have built this idea of what good design looks like over the last 30 years of the internet, but we just don't know a lot about what good design looks like for these systems yet, do we?
Are there people who consider themselves, quote unquote, AI designers, and what does that mean?
There are two types of interesting design happening.
There's the user experience—how an end user interacts with this application. Oftentimes it's chat. We have companies that are shopping online where you're now chatting instead of browsing. It's a totally new user interface.
There are other, more creative user experiences we're seeing as well that are ambient, where there's just an agent listening and observing, and it knows when to take action.
As you start to dissect and think about the different types of user experiences that are now possible, how do we design for them? It's a totally new world compared to what I took for granted. There are all these small new ways to interact with your application and software.
In the run-up to Snowflake Summit, we released a native app, and it had a feature in it called ask dbt, powered by the dbt Semantic Layer. You can ask natural language questions, and it compiles them into semantic layer requests and gives you governed answers back in text.
But the interesting thing was not the user interface layer that you're talking about. The user experience that we were trying to think about was the types of questions people will ask.
That's the second design pattern that I was going to go into. Your application logic has to reflect how people are going to use your app. And so if they're coming in with certain types of problems that they need to get solved, you're going to design your app in a different way.
How much do you think the creativity being unlocked in the LangChain ecosystem is still modest relative to what will happen with the next major innovations in the underlying model architecture? Are we seeing the thing or are we just seeing a tiny little precursor to the thing?
It has to be the latter. I've been in this space now for a year, and the types of applications people are building today versus last year are very different.
Because the models have gotten better, or because people have gotten better at using them?
Skillset, mostly. People have figured it out. They've gotten more of the basics down. They've figured out how to push the technology further. Yes, the models are getting better as well, and that helps. But as people learn best practices for prompting, for fine-tuning, for testing, we can just go a lot further.
This newsletter is sponsored by dbt Labs. Discover why more than 30,000 companies use dbt to accelerate their data development.