Discover more from The Analytics Engineering Roundup
Ep 45: The arc of data innovation (w/ Bob Muglia, former CEO of Snowflake)
You probably know Bob Muglia. The former CEO of Snowflake led the company during its early, transformational years after a long career at Microsoft and then Juniper.
Bob recently released the book The Datapreneurs about the arc of innovation in the data industry, starting with the first relational databases all the way to the present craze of LLMs and beyond.
In this conversation with Tristan and Julia, Bob shares insights into the future of data engineering and its potential business impact, while offering a glimpse into his professional journey.
Key points from Bob Muglia in this episode:
You started your career in databases; has that been a constant for you?
Mostly, but I've also done many other things. When I was at Microsoft, I was a pinch-hitting player that would go into organizations that needed transition in some form and would work on them for a few years, and then move on to another role.
I started in SQL Server, but then shortly after that became the OS/2 program manager, and my primary role there was to kill it. I was part of the firing squad, but I didn't really fire the gun. To be fair, it was senior management at Microsoft that really did that.
My boss, Paul Maritz, and Bill and Steve really made those decisions, but I was very much involved in the transition to Windows NT and then ran the Windows NT Program Management Team. After that, I took on Visual Studio. In '95 or '96 I was running that team, and that was the year we created Visual Studio. It didn't exist before that. My job was to make the C++ team and the Visual Basic team work together. They were fighting like cats and dogs! Getting them to pull in one direction was the job, and it was the creation of Visual Studio that really did that.
Tristan: Were you involved in .NET?
Bob: I was the poor dude that signed the Java contract for Microsoft. So, I was the one that went to court and dealt with all of the court cases on Java, including the Silicon Valley court case with Sun, and I was the primary witness for Microsoft.
Then I also played that miserable role in the DOJ trial that came shortly afterward. I often say that at Microsoft I saw the good, the bad, and the ugly, and that would be the ugly. I was very involved in the antitrust work that happened in the early 2000s. And in fact, in 2005, I was traveling to DC once a quarter to meet the DOJ to help dig us out of a protocol documentation hole that we had dug ourselves into. Microsoft is amazing. They've done such an amazing job becoming a responsible company, and I think many of the lessons that happened in that timeframe helped to teach the senior executives of the company an awful lot. It's really quite impressive to see how far they've come.
When did the dominoes start falling in the market where enterprises got comfortable with moving data to the cloud?
I would say it depended on the enterprise. Our first customers at Snowflake were customers that had a lot of data on the cloud but did not have a lot of regulatory issues and had fewer privacy concerns. This meant the categories of customers were marketing, advertising, and online gaming. Those were our first customers, and their requirements for security and compliance, while very much present, were not nearly at the level a financial organization would have.
People had a lot of data. Often they were working with semi-structured data and if you go back to 2014, 2015, that was a misery for people. I mean, they had massive amounts of data being thrown off by internet and cloud-based systems.
The only tool they had to work with at the time was Hadoop, which is a very difficult product to manage, and very few customers were successful at deploying Hadoop and getting good results from it. We were able to go in and find customers that had real data problems that they couldn't solve.
Snowflake was almost a miracle to them because it solved the problems in a way that nothing else did. The first of those places was this set of customers with semi-structured data. The other important case was Redshift, because Amazon did amazing things.
I give Amazon so much credit for the cloud. They paved the roads for the cloud, but they also paved the roads for the data warehouse in the cloud because they released in 2012 a decent product. Not a great product, but a decent product. Redshift works quite well until it hits a scale limitation, at which point it starts working very poorly. What we found was that many of our first customers were Redshift customers who would hit the wall and needed an escape hatch. Snowflake was the escape hatch for them.
Do you think the phrase “artificial general intelligence” (AGI) has any place in pragmatic, business-focused conversation today, or is it still a scientific curiosity that maybe one day we'll get there?
I don't think it's a short-term issue that business people need to be focusing on. To answer your question: no, I don't think so. However, the horizon for it has moved in dramatically from where I used to sit.
If you'd asked me in 2020 when we would have something like artificial general intelligence, I would've said maybe 2100; I don't care about that because I won't be here. But now I think it may be 2030 and I'm like, "Holy cow! I hope to be here for that." The awakening to me is that the horizon has moved in by a huge amount, at least from my perspective.
Tristan: So it’s still maybe too far away to put it into your financial plans, but something that's not totally in the realm of science fiction anymore.
Bob: Definitely not. Not anymore. Here's the big thing, Tristan, which is just brand new for the first time in my entire career: we have intelligence in these machines that has characteristics like the ability to reason and think and think through things, which was never present before. It was never there before. In essence, my history and focus has been on taking data and turning it into knowledge. And knowledge is basically data that has been analyzed and conclusions reached from it. That's what data analytics is all about. To build knowledge about an organization. When you can take knowledge that you've built up through the data collection that you have, and then you can combine the analysis that you've done from it with the intelligence of these models, the potential for business is quite dramatic.
I think that's what we're going to see in the next five years. We don't need AGI to have massive business transformation, because we now have the ability to more easily encapsulate knowledge and then take intelligence, which can run in a machine, and apply that intelligence to knowledge, including potentially the analysis of data, which is one of the most interesting characteristics of this.
Finally, machines can actually do the data analysis for us. We'll start to see that. I think BI is going to change dramatically in the next three years. We're going to move from the primary language of BI being SQL to the primary language of BI being English, with SQL being the intermediate language that determines the actual query that's running.
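As a rough sketch of that SQL-as-intermediate-language idea: in the snippet below, a hard-coded lookup stands in for the LLM that would actually translate the English question into SQL, and the `orders` table and its contents are invented purely for illustration.

```python
import sqlite3

# Build a toy warehouse table (invented data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("EMEA", 120.0), ("AMER", 80.0), ("EMEA", 40.0)])

def english_to_sql(question: str) -> str:
    """Stand-in for an LLM translator: maps a known English question
    to the SQL that actually runs. In a real system a model would
    generate this SQL from the question plus a schema description."""
    translations = {
        "total sales by region":
            "SELECT region, SUM(amount) AS total FROM orders "
            "GROUP BY region ORDER BY region",
    }
    return translations[question]

# English is the interface; SQL is the intermediate language that
# determines the actual query that runs.
sql = english_to_sql("total sales by region")
result = conn.execute(sql).fetchall()
print(result)  # [('AMER', 80.0), ('EMEA', 160.0)]
```

In a real pipeline, `english_to_sql` would call a model and validate the generated SQL before executing it against the warehouse.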
Enterprises today seem a bit fearful of sending their data to third-party cloud-hosted LLMs, like OpenAI's. Is that reminiscent of the early days of Snowflake, where Goldman Sachs said no and eventually came around? What will it take?
What it took, literally: to win the financial services industry, we set out to win Capital One. They were the most advanced financial organization on the cloud, the first to adopt it and certainly the most committed. They were on AWS, which was our primary cloud. Working with Capital One was incredible because, at first, they were very strong Amazon customers, so they were very strong on Redshift.
We were aware they were having some challenges on Redshift, so we gave them a little time where they had to figure out what was right for their company. I wound up together with a number of my engineers meeting with Capital One.
They specified what turned out to be the product they deployed: the Virtual Private Snowflake product. I can tell you that product was spec'd on an airplane from San Francisco. I was flying out to DC to meet with them, and I had the PowerPoint slides. I described the whole product because I'd met with Capital One once and learned what they really needed, and as program manager I specified it all out. When I landed in DC I called my team and said, "Okay, this is what I'm gonna present tomorrow. Tell me if I'm way off base." They corrected me on a few things, and that became the initial definition of the specification.
There were some key things: any place where a key exists in memory, it has to exist in a virtual machine that is isolated, not shared in any way. All the data had to be stored in a place where Capital One had control, and all the metadata as well. The other big thing was multi-region failover, the ability to support business continuity. That eventually became the product spec that turned into Virtual Private Snowflake; it was also fundamental to the Business Critical product, and that defined what enterprises could use.
Now, what's interesting is that the architecture of Snowflake, in which you run Snowflake inside a Snowflake VPC, is somewhat problematic for some classes of customers with the highest level of security requirements, where they want to run it inside their own environment.
That's certainly true for the federal government, but it's also true for some enterprises. I've learned there's another class of enterprise that's maybe even more sensitive than financials: enterprises like Raytheon and Northrop Grumman, all the companies that do government contracting.
People are moving more and more to an architecture where you run inside the customer's VPC. I think Snowflake will do more of that too now that they support Iceberg, where the data is sitting inside a customer environment.
What all of this means is: you talk to your customers and you learn from them. That's really all I just said. I met with my customers, I listened to them, and we learned from them.
There's a quote in Datapreneurs, “SQL served us well, but the time has come to augment the new approach. The days of SQL dominance are over.” So what does “augment” look like to you?
It’s important to go back and look at where SQL came from. It was a database language focused on a new relational model, a mathematical model that Edgar Codd created. There's an algebra, a set of ordered statements that you run in sequence to perform a series of operations, and there's a calculus, a set of unordered, declarative statements that describe what you want to do. Codd's theorem says that those two things are semantically equivalent. That's essentially why we have these query processors. SQL is not pure calculus, but it is a database language that acts like a calculus: you describe what you want to do, and the query optimizer figures out how. Now, that language is not complete for relational semantics, and in particular, you cannot model much.
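A toy illustration of that equivalence, using a made-up `emp` table: the declarative SQL query describes what we want, while the second computation reaches the same answer by applying selection and then projection as explicit, ordered algebra-style steps.

```python
import sqlite3

# A tiny relation of employees (invented for the example).
rows = [("ann", "eng", 10), ("bob", "eng", 7), ("cyd", "ops", 9)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, yrs INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

# Calculus-flavored (declarative): describe WHAT you want;
# the query optimizer decides how to execute it.
declarative = conn.execute(
    "SELECT name FROM emp WHERE dept = 'eng' AND yrs > 8"
).fetchall()

# Algebra-flavored (ordered): selection, then projection,
# applied step by step in an explicit order.
selected = [r for r in rows if r[1] == "eng" and r[2] > 8]   # sigma
projected = [(r[0],) for r in selected]                       # pi

print(declarative, projected)  # both [('ann',)]
```

Both routes produce the same relation, which is the equivalence the query optimizer exploits when it turns a declarative query into an ordered execution plan.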
What you can model in SQL is tables, data arranged in tables. But if you look at the real world, a table is not necessarily the best representation for data. In fact, people are talking about different kinds of databases, knowledge graphs, where you abstractly define all of the different concepts and their relationships, and also potentially the equations that connect them together.
In other words, the business model. I think we're going to start to see languages that allow you to model the full business and the full business semantics. That will become the underlying semantic model that allows us to create a data model from that.
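A minimal sketch of that idea, with entirely hypothetical entity and relationship names: instead of rows in a table, the business is described as triples of concepts and relationships, and questions are answered by traversing those relationships.

```python
# Triples (subject, predicate, object): concepts and their
# relationships, rather than rows in a table. All names invented.
triples = [
    ("acme", "is_a", "customer"),
    ("order_1", "placed_by", "acme"),
    ("order_1", "contains", "widget"),
    ("widget", "is_a", "product"),
]

def objects(subject: str, predicate: str) -> list[str]:
    """All objects linked from `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(predicate: str, obj: str) -> list[str]:
    """All subjects linked to `obj` via `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]

# Traverse the graph: which products appear on orders placed by acme?
orders = subjects("placed_by", "acme")
products = [item for order in orders for item in objects(order, "contains")]
print(products)  # ['widget']
```

A full semantic model would add the constraints and equations connecting these concepts; the point here is only that relationships, not tables, are the primitive.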
I will go further and say that I think data engineering will evolve over time to become business engineering. And that will become really the new discipline. By 2030 that will be the discipline for sure. Probably sooner.