Ep 53: Navigating AI Complexity (w/ Jonathan Frankle of MosaicML)
Jonathan Frankle is the Chief Scientist at MosaicML, which was recently bought by Databricks for $1.3 billion.
MosaicML helps customers train generative AI models on their data. Lots of companies are excited about gen AI, and the hope is that their company data and information will be what sets them apart from the competition.
In this conversation with Tristan and Julia, Jonathan discusses a potential future where you can train specialized, purpose-built models, the future of MosaicML inside of Databricks, and the importance of responsible AI practices.
Key points from Jonathan in this episode:
How many LLM based systems are actually in production today? Where are they being used?
Jonathan: A lot of systems and chatbots are meant to generate potential content that a human will eventually review. GitHub Copilot is like that: it's generating potential content, it's very short amounts of content, and it's reviewed very quickly. Autocomplete is one example of that; Jasper doing marketing copy is another. I don't really trust these models to make big decisions on their own. I don't think anyone should right now.
But there are a lot of applications where it's expensive for someone to write something and cheap for someone to check it. Those are the great applications for LLMs right now. You do want a human in the loop, at least for most things.
At least anything that has stakes attached to it, you probably want a human in the loop, but there's a lot of stuff where you're generating content that might be expensive.
That's marketing copy, that's ideation, and that's even code snippets. It's really cheap for a human to check. Copilot is a great example of a very tight loop. A lot of applications are a much bigger, kind of slower loop because there's more content to generate, but it's the same idea. It's a lot of summarization, a lot of content extraction; they're definitely in use.
I don't think the case has been proven that they deserve all the hype they're getting, but they're contributing to production.
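The "expensive to write, cheap to check" pattern above can be sketched as a review gate: the model drafts, and nothing ships until a human approves. This is a minimal illustration, not anything MosaicML-specific; `generate_draft` and `human_approves` are hypothetical stand-ins for a real model call and a real review step.

```python
def review_gated_generate(prompt, generate_draft, human_approves, max_attempts=3):
    """Generate drafts until a human approves one, or give up.

    generate_draft: callable that turns a prompt into candidate content
                    (a hypothetical stand-in for an LLM call).
    human_approves: callable that returns True if a reviewer accepts the draft
                    (a hypothetical stand-in for a review UI).
    """
    for _ in range(max_attempts):
        draft = generate_draft(prompt)
        if human_approves(draft):
            return draft   # only human-approved content leaves the loop
    return None            # nothing ships without human sign-off
```

The key design choice is that the approval check sits between generation and use, so the model never makes the final call on its own.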
What does success look like in your business case?
Jonathan: When you work with us, you also end up working with my research team. Customers like the honesty and the clarity of thought about what we can and can't know and how we work around that. There are a lot of folks who are promising too much clarity right now, and it will always disappoint you. I'd much rather give people direct scientific honesty and help them work from a place of understanding of what we do and don't know.
My big belief about the field, kind of my philosophy, is that everything we think we know about neural networks is probably wrong. And the first step on your journey to knowledge is accepting that you know nothing. So when people come in with a lot of knowledge or a lot of frameworks or a lot of belief about exactly what to do, or how things work, they're going to be wrong.
And when customers come in with that belief, they're going to be wrong. My job is to help them get to a point of greater knowledge and help them get to a point where they can make sophisticated decisions about really weird, complex, messy systems. That means being really honest about what I know as a person, what my team knows, what we know as a field. People appreciate that a lot.
How often do you have to update your training?
Jonathan: It depends on what you're trying to do and whether time is a factor. There are a lot of situations where time just doesn't matter. Say you're a bank and you've got a chatbot doing customer service or something like that. Time may not be a factor; it may not matter who won the Super Bowl this year. We over-indexed a little bit on the idea of time and factual knowledge because of something like GPT-4, which is trying to know everything about the world.
However, when you're doing code generation or marketing copy or what have you, time isn't really a factor. You update your model when you have new data, or you think you can build a better model, or the paradigm changes. But at the end of the day, time isn't that much of a factor. Then there are other situations where time matters. Say the New York Times wrote an article bot, kind of the epitome of time mattering, or you're Meta and you have recommendation models that need to be updated as the trends change on a daily basis, or perhaps even more often than that.
Time does matter. We've learned from the recommender system world that people do retrain really frequently where timeliness is important, and I would expect the same thing here. Now, the question is whether you have to redo all of the training or whether you can get by with a shorter training run, building on top of a model and appropriately mixing all the data together so the model doesn't over-index on recent data. That is still an open question right now. I'd give the same answer I'd give anywhere: try it and find out. There's a little bit of tinkering, and it's going to differ by dataset and by application. It's really easy to test: if you have your dataset for the past five years, just cut off 2023, train the model up to 2022, and then try a bunch of different approaches for incorporating 2023.
Be it training further on 2023, be it doing retrieval over 2023, be it training further on every year including 2023, and maybe oversampling 2023. There are lots of hypotheses you could pose about the mechanism going on in the model. Then you can test it and see what kind of thing seems to work.
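The holdout experiment described here boils down to building different training mixes from the same time-stamped data and comparing the resulting models. A minimal sketch of that data-side setup, assuming records are simple (year, example) pairs and the strategy names are my own labels:

```python
def build_training_mix(records, cutoff_year, strategy, oversample=3):
    """Return the training set for one model-update strategy.

    records: list of (year, example) pairs.
    strategy (hypothetical labels for the options in the text):
      'baseline'   - train only on data up to the cutoff year
      'continue'   - further training on the new year's data only
      'mix'        - retrain on every year, including the new one
      'oversample' - like 'mix', but repeat the new year's data
    """
    old = [r for r in records if r[0] <= cutoff_year]
    new = [r for r in records if r[0] > cutoff_year]
    if strategy == "baseline":
        return old
    if strategy == "continue":
        return new                    # fine-tune the existing model on new data
    if strategy == "mix":
        return old + new              # full retrain over all years
    if strategy == "oversample":
        return old + new * oversample # upweight recent data in the mix
    raise ValueError(f"unknown strategy: {strategy}")
```

Each mix would then be used to train (or continue training) a model, with evaluation on held-out 2023 data telling you which strategy avoids both staleness and over-indexing on recent data.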
What do you hope will be true for the data industry looking 10 years out?
Jonathan: I'm not going to make any predictions about the future. It's not a prediction; it's just a hope, just a wish. My hope is that we'll all be a little more careful and thoughtful about how we relate to this technology. We will all respect it for its potency, but also for its sheer complexity, for the impossibility of really understanding what's happening. And I think that's intrinsic.
I really do think that the complexity in these systems is such that it is intrinsically difficult or impossible to understand what's happening inside. So with that in mind, my hope is that we'll all be a little wiser and a little more thoughtful about what's happening and how to navigate this, both in day-to-day operations, what the tricks are for getting it to work properly, and what the best practices are.
In general, about how to be good scientists and how to be thoughtful and careful about what we're working with and how to understand the limitations of what we're dealing with. That's really my hope. And I hope MosaicML and Databricks have a part in making that a reality.