Discover more from The Analytics Engineering Roundup
Ep 26: What’s the role of AI in BI?
What human-in-the-loop AI looks like in practice, with Amit Prakash of Thoughtspot.
Amit Prakash is Co-founder and CTO at ThoughtSpot. He has a deep background in search, having previously led the AdSense engineering team at Google and served on the early Bing team at Microsoft.
In this conversation with Tristan and Julia, Amit gets real about the promise of AI in data: which applications are being widely used today, and which are still a few years out.
Key points from Amit in this episode:
What was the vision for the company when you decided to found ThoughtSpot and how has it evolved?
So, when we were getting started, we looked at the market, and what we saw (and not much has changed, actually) is that when you talk to data teams, they're inundated with ad hoc requests for data and analytics. These requests don't add much value in the long term; they're just point-in-time requests, and they keep adding up and accumulating, right? And that keeps the data teams away from doing real, meaningful work. And if you talk to any of the execution teams, most people don't really engage with data in a meaningful way.
Beyond what's there in a static dashboard, they're not asking the why questions, they're not asking how they can improve what they do by leveraging data. They're just looking at a dashboard periodically and saying, yep, things are good or things are bad, something like that.
And so we wanted to solve both of these problems, so that you can truly get the entire company to be data-driven. And when I say the entire company, that goes all the way from the CEO to a frontline merchandiser or a frontline customer success person, and in some cases even to customers: truly having access to data and being able to ask their questions and get answers. So that was the vision.
The way we solved it, I think, is a unique combination of UX, AI, and system design to build something that can really operate on granular data and doesn't require a lot of different data sets being created from the same data to feed different requests. And on the frontend side we built something that looks and feels like Google, yet on the backend side is completely governed by the data teams, gives you precise answers to precise questions, and is intuitive and easy to use.
Where's the overlap in your mind between AI work and data work in the data consumption space?
I think, definitely on the modeling side, you mostly want humans to do the right thing. On the consumption side, I think there's a lot of merit in human-in-the-loop AI, essentially, right? You don't want AI to run wild on its own because, like you said, it's a black box, just not explainable. And most importantly, no matter how many advances have happened in AI, it's really hard to teach an AI algorithm the business context and the business meaning of data.
There are two places where we've found AI to be super useful. The first one is more like how Google uses AI. The example I like to give is that if you go and search for pain in the bottom of your foot, the first result is going to be plantar fasciitis, right? And how does Google know that? It's not just one example; there are millions of such examples where Google knows exactly the right thing to surface, so that when you see it, you know, yeah, that's the thing I needed.
Right. And the answer is that Google is learning this from all of us. We are the ones making Google smart, because somebody knew that plantar fasciitis means pain in the bottom of my foot. They probably searched for pain at the bottom of my foot, then subsequently searched for plantar fasciitis, got some good results, and spent time on them. And that told Google that this is a good thing to surface. And if enough people do it, then Google gets smarter.
So the same kind of thing happens in our search. We're building a machine learning model that's personalized to you but learning from everybody else. So if someone asks a question like, how much revenue did we have this quarter? When you're counting the revenue from different Salesforce opportunities, you care about the close date being in this quarter.
But if someone asks, what's the total pipeline created this quarter? Now you're counting the same opportunities, but you're looking at the creation date for each opportunity. And this is the kind of thing that a novice user might miss, but if the right recommendation is in front of them, then they'll know that this is what they needed to do. And so this is one place where we found AI to be super powerful.
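To make the distinction concrete, here is a minimal sketch (not ThoughtSpot code; the table and column names are invented for illustration) of how the same opportunities table answers two different questions depending on which date column you filter:

```python
import pandas as pd

# A toy Salesforce-style opportunities table.
opps = pd.DataFrame({
    "opportunity": ["A", "B", "C"],
    "amount": [100, 200, 300],
    "created_date": pd.to_datetime(["2021-01-15", "2021-04-02", "2021-04-20"]),
    "close_date":   pd.to_datetime(["2021-04-10", "2021-06-30", "2021-07-05"]),
})

q2_start, q2_end = pd.Timestamp("2021-04-01"), pd.Timestamp("2021-06-30")

# "Revenue this quarter": filter the same rows on close_date.
revenue = opps.loc[opps["close_date"].between(q2_start, q2_end), "amount"].sum()

# "Pipeline created this quarter": filter on created_date instead.
pipeline = opps.loc[opps["created_date"].between(q2_start, q2_end), "amount"].sum()

print(revenue, pipeline)  # 300 500
```

Same rows, same measure, different answer: the only thing that changed is which date column carries the business meaning of "this quarter."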
The traditional drag and drop interface that a million different products have gives you a menu to choose from. Do you have a way of showing people rather than just like dumping them on a blank page and saying like, go to town?
Yeah, so there are two ways in which users interact. There's the pure Google-like search experience, and what we do is put your entire data model in the left panel. Think of it almost like an IDE, where the left panel gives you all the variables and class names you have, or sometimes all the files you have.
So it's the same kind of thing. And if you don't like typing, you can just double-click on revenue and sales and things like that, right? Then the next question is, why is that any different? Why is that any different than dragging it versus double-clicking?
The reason it's easier is that we take away a lot of intellectual overhead and formulate the question for you through a combination of things. So for example, if you say Texas, you don't need to know that Texas is sitting in some customer table in the customer state column, or that maybe the customer ID links to a geo table and that's where Texas lives. We've indexed everything, and we already know what the possible meanings of Texas in this context are going to be. And the other thing is that you talked about options: there's an auto-completion engine with a lot of smarts that's trying to give you the five most likely options you're going to be interested in.
So if you just click on the blank search bar, the auto-completion engine is probably going to give you the three most popular metrics and the two most popular attributes that people have been asking questions about, and then you go on from there.
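The blank-search-bar suggestion idea can be sketched very simply. This is an illustrative toy, not the real ranking model: the query log, metric names, and attribute names are all made up, and the ranking here is plain popularity counting.

```python
from collections import Counter

# Past queries, tokenized into the model columns they referenced (invented data).
query_log = [
    ["revenue", "region"], ["revenue", "month"], ["pipeline", "region"],
    ["revenue", "rep"], ["active_users", "month"], ["pipeline", "month"],
]
metrics = {"revenue", "pipeline", "active_users"}
attributes = {"region", "month", "rep"}

# Count how often each column appears across everyone's queries.
counts = Counter(tok for q in query_log for tok in q)

def suggest(n_metrics=3, n_attrs=2):
    """Top metrics and attributes by crowd popularity, for an empty search bar."""
    def top(pool, n):
        return [t for t, _ in counts.most_common() if t in pool][:n]
    return top(metrics, n_metrics) + top(attributes, n_attrs)

print(suggest())  # ['revenue', 'pipeline', 'active_users', 'month', 'region']
```

A production system would personalize this per user and per data model, but the "wisdom of the crowd" core is the same: rank what everyone else found useful.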
The other place where this becomes super useful is something where our messaging is sometimes a little confusing. What most people call a dashboard, we used to call a Pinboard, and we renamed them Liveboards, and people say, what's the difference? Why are you taking a familiar paradigm and calling it something else?
And the reason for that is that when people design a traditional dashboard, someone has to really think through all the different paths the user is going to take: what kind of filters we're going to allow on top of it, what kind of drill-downs I'm going to allow, and things like that. The way our boards are designed is that everything is backed by a search, and there's a concept of closure in going from one search to another.
So if you're looking at a chart, let's say your monthly active users, at every point you can do whatever you can think of in terms of its neighborhood and just get to another search through the UI. And that's what really takes away that fear of the blank screen, because now you're in the zone, you're in the context, and you're being suggested all sorts of neighboring questions right there in the context, and then you can go from there.
Is it possible to get an insight like, oh, actually our website was down on Tuesday at 2:00 PM and that's why my signups were down? Auto-surfacing those correlations is a very, very hard problem to solve in AI and data. Do you think it's possible for computers to do it better than humans can in creating these hypotheses?
I don't think you can fully automate these things, but AI can help a lot. So again, I come back to my thesis that human-in-the-loop AI is the most powerful thing to essentially elevate people to where they can have a meaningful discourse with data.
So in this particular example, what we end up doing is say, okay, you've seen two points that should have been close together, but they are far apart. You just select those two points and let our engine rip, and what it's going to do is the same thing that any human being would do: okay, I'm going to compare these two points along all the dimensions that I know to be interesting. So maybe I'll see how it splits by geo. If I compare my Monday and Tuesday activations, how do they differ? How does it split by different referral sources, or by the different data centers that my users are hitting?
And if it finds a statistical anomaly that says, okay, this data center used to bring 20% of the new users, and on Tuesday it brought zero or maybe 1%, that means the data center was most likely down or something is going wrong. Or maybe if you split it hourly and one hour has a statistical anomaly, then it can surface that.
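A hedged sketch of that idea: compare the two selected points along one known dimension and flag the splits whose share of the total shifted the most. The data, dimension names, and threshold below are invented for illustration; a real engine would run this across many dimensions and use a proper statistical test rather than a fixed cutoff.

```python
# Signups by data center on two days (made-up numbers).
monday  = {"dc_east": 200, "dc_west": 500, "dc_south": 300}
tuesday = {"dc_east":   5, "dc_west": 480, "dc_south": 290}

def share(d):
    """Each value's fraction of the day's total."""
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

def anomalies(before, after, threshold=0.10):
    """Dimension values whose share of the total moved by more than threshold."""
    b, a = share(before), share(after)
    return {k: round(a[k] - b[k], 3) for k in b if abs(a[k] - b[k]) > threshold}

print(anomalies(monday, tuesday))  # {'dc_east': -0.194, 'dc_west': 0.119}
```

The output is exactly the kind of dumb-but-fast hypothesis Amit describes: dc_east's share collapsed, which a human can then interpret as "that data center was probably down."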
Now, what happens when you do this? The algorithms are relatively dumb; they don't really understand the business meaning of the data. All they see is a statistical anomaly, and this is where the human needs to come in and say, huh, this is the one that makes a lot of sense.
But what we did was save you a ton of time and stress by bringing all the possible hypotheses in front of you in the next 10 or 20 seconds. So this is, again, something very close to my heart. I've spent a lot of time working on it.
One of the things that tends to work in this space is, again, learning from the wisdom of the crowd. So the same algorithm that's suggesting new questions to ask and driving the auto-completion is actually guiding the exploration in this case. It's saying that when people talk about active users, they tend to split this number by state, by activation time, by data center, or by referral sources. So that's what I'm going to do, because I've learned it from all these humans who knew the data, and fast.
Now, if the humans haven't done it, then this guidance is somewhat weaker, and you might need to intervene and direct the exploration in different directions.
Once you've done the hard work of getting into these larger organizations, or you've figured out how to deal with complex setups, what is the result? Is the time spent with the data team going to be focused on other activities now that more of your organization can be data analysts themselves and explore their own data?
So, both of these things are happening. What I see is that the time to make decisions goes down dramatically, and the agility with which you make them goes up. It's no longer the case that you were sitting in a meeting, some question came up, and you said, okay, we'll come back next week.
And then we'll ask the data team to bring back the data, and then we'll look at it. You just open it up right there and get it. One of the most iconic technology companies in the Valley is known for very strong central leadership. What their culture is like is that mid-level managers come to a meeting and get grilled and grilled and grilled by the leadership about every aspect of their job, every possible piece of data from operations. And what they used to do was literally print out 70 pages of charts and go to the meeting, and when someone asked a question, they'd be flipping through pages trying to answer it. And so all of that transforms into: okay, here's the data. Yep, you've got a question, let's drill down here.
And what's happening to the data teams is that they are now able to engage in much more meaningful, long-term-value work. One of the things I routinely see is that when we go to a company, somebody or other will have this kind of question: are you trying to take away my job, are you trying to replace part of my job? What ends up happening after deployment is that they actually get promoted, because now they're able to attach themselves to much higher-value work and be more strategic.