Ep 59: The 2024 Machine Learning, AI & Data Landscape
Matt Turck of venture capital firm FirstMark joins to discuss his annual view of the data, analytics, machine learning and AI ecosystem.
Matt Turck has been publishing his ecosystem map since 2012. It was first called the Big Data Landscape. Now it’s the Machine Learning, AI & Data (MAD) Landscape.
The 2024 MAD Landscape includes 2,011(!) logos, which Matt attributes first to a data infrastructure cycle and now an ML/AI cycle. As Matt writes, “Those two waves are intimately related. A core idea of the MAD Landscape every year has been to show the symbiotic relationship between data infrastructure, analytics/BI, ML/AI, and applications.”
Matt and Tristan discuss themes in Matt’s report: generative AI’s impact on data analytics, the modern AI stack compared to the modern data stack, and Databricks vs. Snowflake (plus Microsoft Fabric).
dbt Labs just published the 2024 State of Analytics Engineering—the annual survey of data practitioners and leaders. This year’s report delves into:
The top priorities and challenges of data teams
How data team objectives align to those of the broader organization
How data teams are investing for the future—including how they think about AI
Key takeaways from this episode.
If I were going to author a brilliant takedown of your graphic, I would complain about the side of the graphic where you have a bunch of AI applications. If everybody's claiming to be an AI application now, it's hard to know who to include. Why include those types of companies?
Matt Turck: That's a great question. The core principle of what we're trying to do here is to show that data is a continuum. And you start with how to get data, how to store data, how to process data. And then somewhere in the middle, you got data analytics, which is a whole BI route. Or you have machine learning and AI, which is creating predictive analytics based on the data. But all of this ultimately is for the sole purpose of creating applications that enable end users, mostly business users, but sometimes data analysts, sometimes more technical users, to do something. Right?
So there absolutely has been a temptation to break down the landscape into multiple landscapes, one for data infrastructure, one for machine learning, one for applications. But the core design principle, and it's very much an editorial choice, is to show that symbiotic relationship between all those moving parts. So in some ways, it is combining multiple landscapes into one landscape for the same price.
Do you have a take on the statement that basically every software company is a data company and that on some level, if you're building a software product and you don’t allow your users to take advantage of some type of data analytics, then you're probably missing some type of opportunity?
I think there’s a lot of truth to that, especially when it comes to AI. That’s more of an end result than the current state of things. There’s a wide spectrum today in terms of how companies use and leverage data, machine learning, and AI for their applications.
Ultimately, nobody's going to talk about being a database-driven company. Maybe in a few years, nobody will talk about being an AI-driven company because the reward for success and ubiquity becomes invisibility. I think that's been true for databases or code or whatever, and eventually it will be true for AI, but not just yet.
I have a very different take on this, which is that I think for the most part, AI is still in the general territory of deep tech. And you read in the press that it's all been commoditized and look at these large language models and they don't build any kind of competitive advantage or any of the things. And I just don't think that's really true as of now.
Yes, you can build one of those thin wrappers on top of OpenAI to do something. But the second you want to try to do something that's more interesting, more specialized, more customized to your needs, you have to go into the more technical stuff. And as you go into the technical stuff, I think it's a lot more complicated than people think and requires specialized skills and specialized people.
There’s a difference between building Pinecone and using Pinecone or building Mixtral versus using Mixtral. And to the credit of the folks building these products, they are significantly easier to use than they were a year ago.
But your contention is that in order to get good results, you have to dip into the tech; it feels at odds with the developer experience, the level of accessibility of some of these things today. Do you think that to build the next Jasper AIs of the world, that you really need to dip that deep into the tech?
I think it's a spectrum. And there are a lot of products on the market. If there's one thing that the MAD Landscape shows, it's that there are a lot of companies, lots of products. And different products are going to be at different levels of accessibility and abstraction.
From the perspective of a venture investor in companies that are trying to do something with AI to deliver a product for their end customers, it seems that a lot of the value comes both from the application but also the ability to go into the tech and do interesting things with it.
So for those companies, I think this isn’t just prompt engineering. This is providing interesting context. Yes, fine tuning, playing with the weights, all sorts of different variations. And certainly RAG and bringing different types of data to customize all of it. But I think of those companies as full-stack companies. My personal interest as an investor is to gravitate towards companies that have that full-stack approach.
One of your themes was focused on Snowflake and Databricks. You ended that section by talking about the entry of Microsoft’s Fabric into the equation. What's your take on how this plays into this two-party hegemony?
It's early. It feels mostly like an announcement. I personally don't have deep visibility into the reality of the product other than what I hear. Is it a kludgy effort that’s stitching together different products that aren't really integrated, and all the things? Possibly, probably. That's how things tend to work, and in large companies the reality of a product tends to be two years behind the announcement. I would suspect that there is some element of that going on.
But equally, when a Microsoft comes in with a full-platform approach and already has the formidable advantage of having some massively successful and ubiquitous products like Power BI, in addition to the footprint they have, that’s a major change for anybody that's in the space.
I do think the data infrastructure world is going towards some level of consolidation in an environment where customers have less budget, fewer resources, less time, and fewer people. And they are evolving away from a best-of-breed approach to more of a platform approach where it may not be the best thing for all the different parts of what I need to do, but I only have one vendor, one procurement process, one contract, one rep, and everything kind of works together.
I think customers want that and are likely to want that for the next few years, and I think you're going to see companies big and small trying to evolve towards that, towards providing a wider footprint.
And then there's the question of the dbts of the world, where I see you guys are expanding your footprint in analytics and becoming this platform that's going to cover lots of different things, which I think is an exciting play.
We touched on consolidation a little bit. How far into this trend do you think we are? Do you think we’re just seeing the tip of the iceberg? What are the forces that are going to make the 2,000 logos on your graphic go down to 1,500 logos next year?
I think one of them is the cash runway. A lot of the logos are startups. And I'm always hesitant to get into that conversation because I'm a huge fan and supporter of the startup ecosystem. That's why I chose to do the job I do. So talking about the deaths of startups and mass extinction and all the things always makes me somewhat uncomfortable.
But having said that, if you look at the world of data infrastructure, during the whole excitement around the modern data stack, which was really 2018, 2019, 2020, 2021, many companies were created, and so many companies raised money very quickly at very high valuations. Fast forward to today, it's been sort of crickets in that world.
The heat has really moved away from data infrastructure into machine learning and engineering AI. And there's a separate conversation about whether we're making the same mistakes as an ecosystem in AI now as we did in the past. But look, a lot of companies raised money, and only a few will be able to truly grow into the valuations and build meaningful businesses.
There needs to be functional consolidation. Customers are going to be looking for platforms that do more. So the companies that don't grow into the valuations, don't build a platform, and all those things. There are a lot of smart people out there that saw it coming and then cut their costs pretty heavily in 2021 and 2022.
But at some point, even after cost cutting, you can only manage the runway for so long. At some point, you just run out of money. And maybe your internal investors, your insiders, put in more money. Maybe you find somebody else. Maybe you do a down round, that type of thing. But even after that, there's going to be a large category of companies that I think are just not going to make it.
In data infrastructure, we’re heading in that general direction. And there's been a little bit of this happening. It's sadly something that happens behind the scenes. Obviously, you tend not to have the front page of TechCrunch talking about smaller companies going out of business. But some of this has already been happening. There have been some acqui-hires, some soft landings, and that type of thing. But there's going to be a lot more.
The big opportunity for those companies, I think, is to then position away from data infrastructure and become part of the AI stack, which a lot of companies are trying to do.
This newsletter is sponsored by dbt Labs. Discover why more than 30,000 companies use dbt to accelerate their data development.