Ep 3: Brian Amadio On Experimentation @ Stitch Fix

Dive with Brian into the world of experimentation at Stitch Fix, and learn how they execute experiments even with limited data.

Jul 29, 2021

Brian Amadio is a Data Platform Engineer at Stitch Fix, where experimentation underpins everything they do across merchandising, planning, forecasting, operations and more.

In this conversation with Tristan, Julia, and Brian you’ll get into the weeds of executing multi-armed bandit experiments and learn how you can perform experiments even with limited data.

Listen & Subscribe

Listen to the full episode from the player below, or find your player of choice from the links beneath it.

Show Notes

Key points from Brian on the practice of experimentation at Stitch Fix:

How is data used at Stitch Fix?

Data science, from the beginning, has been core to the Stitch Fix business model. So, it's to this day really at the heart of everything that we do. The entire business has some kind of algorithm behind it.

You can start with merchandising, planning, and buying all the clothes that we need to buy, forecasting for that, demand modeling, even developing new styles - we use data science for that. And then you have things like operations, shipping, logistics, inventory management, and from there, it's the more sort of traditional data science that people might be familiar with. Of course, we have a recommendation engine for our styling, so the way that styling for fixes works is that we have an algorithm that will narrow down the inventory to a curated set, and then we have a human stylist at the end who always makes the final decision.

And now, of course, we have data science behind lots of experimentation on our client-facing website, which has been growing and is expected to keep growing.

What is a multi-armed bandit? Why is it useful?

A multi-armed bandit is an algorithm to experimentally discover the highest ROI choice among a set of possible choices that all have some kind of random output.

The classic example, and the reason it has the name multi-armed bandit, is a bunch of slot machines: you have a limited amount of money that you can put into these machines, and your goal is to figure out which machine pays out the most on average while spending as little money as possible. So, the multi-armed bandit helps you trade off between the exploration that's needed to gather evidence about each of the options and the exploit side, which is: you want to maximize your winnings by picking the one that seems to be the best.

I think it's maybe a little more than an optimization problem. I would even say it's kind of the first introductory reinforcement learning problem where you're incorporating feedback from some system that you're interacting with in order to improve your models as you go.
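To make the slot machine picture concrete, here is a minimal sketch of the explore/exploit trade-off in Python. It uses a deliberately simple "explore first, then commit" rule, which is an illustrative assumption and not the strategy described later in the conversation. The function name `run_bandit`, the payout probabilities, and the budget are all hypothetical.

```python
import random


def run_bandit(payout_probs, budget, explore_pulls, seed=0):
    """Explore-then-exploit on simulated slot machines.

    payout_probs  -- hypothetical win probability of each machine
    budget        -- total number of pulls we can afford
    explore_pulls -- pulls spent on each machine before committing
    """
    if explore_pulls < 1:
        raise ValueError("explore_pulls must be at least 1")
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    pulls = [0] * n_arms
    wins = [0] * n_arms
    for t in range(budget):
        if t < explore_pulls * n_arms:
            # Explore: cycle through the machines to gather evidence.
            arm = t % n_arms
        else:
            # Exploit: pick the machine with the best observed win rate.
            arm = max(range(n_arms), key=lambda a: wins[a] / pulls[a])
        reward = 1 if rng.random() < payout_probs[arm] else 0
        # Feedback from the environment updates our estimates.
        pulls[arm] += 1
        wins[arm] += reward
    return pulls, wins


if __name__ == "__main__":
    pulls, wins = run_bandit([0.02, 0.05, 0.03], budget=5000, explore_pulls=200)
    print("pulls per machine:", pulls)
    print("wins per machine: ", wins)
```

The weakness of this rule is that the exploration budget is fixed up front: too few exploratory pulls and it can commit to the wrong machine, too many and money is wasted on machines that are clearly worse.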

What does a multi-armed bandit look like in practice at Stitch Fix?

Recently we started using a multi-armed bandit for landing page optimization. So, this was actually a pretty big win - we just had the results announced about a week ago.

The idea is that when a visitor comes to stitchfix.com through some external source - say they click on an ad for Stitch Fix - the first page that we show them is probably going to have a pretty big impact on how they perceive our brand and how they interact with us. So, we had a huge variety of options for these landing pages, lots of different creatives that we could be using, and we needed a way to figure out which one is best.

Because we had a big variety of options to try out, and because the traffic we get is not Google or Facebook scale, we had a limited amount of data to work with. A multi-armed bandit really became the right choice for this problem.

What do you do about false starts if you've sent a lot of the traffic to a landing page which looks really promising?

So, there's a few ways you can handle that. The first is by choosing the right strategy for selecting arms. Thompson sampling is the one we use for this particular experiment, and it sort of naturally handles this for you as you go.

In the early part of the experiment, you don't have much data. The way Thompson sampling works is it'll tend to select more randomly in the early stages. And then as you go through the experiment and collect more evidence, the bandit will naturally tend to start preferring the ones that appear to be better and selecting them more often.
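The behavior described above can be sketched with a small Beta-Bernoulli Thompson sampler. This is only an illustration: the episode does not say which reward model or priors Stitch Fix uses, so the binary "converted or not" outcome, the uniform Beta(1, 1) prior, the class name `ThompsonSampler`, and the simulated conversion rates are all assumptions.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms."""

    def __init__(self, n_arms, seed=None):
        self.rng = random.Random(seed)
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms

    def select_arm(self):
        # Draw one plausible conversion rate per arm from its posterior,
        # Beta(1 + successes, 1 + failures), and play the largest draw.
        # With little data the posteriors are wide, so picks look random;
        # with more data they narrow and the better arms win more often.
        draws = [
            self.rng.betavariate(1 + s, 1 + f)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, converted):
        if converted:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates, n_visitors, seed=0):
    """Send simulated visitors to landing pages with hypothetical true rates."""
    rng = random.Random(seed)
    sampler = ThompsonSampler(len(true_rates), seed=seed + 1)
    counts = [0] * len(true_rates)
    for _ in range(n_visitors):
        arm = sampler.select_arm()
        converted = rng.random() < true_rates[arm]
        sampler.update(arm, converted)
        counts[arm] += 1
    return counts


if __name__ == "__main__":
    print(simulate([0.04, 0.05, 0.08], n_visitors=20000))
```

Because each pick is a random draw from the posterior, an arm that got unlucky early still has a chance of being selected again until the evidence against it is strong.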

So it takes into account the amount of evidence that you have - that helps. Also, something we haven't done, but I think is an interesting idea, is to kind of stack bandits. You could have something like Thompson sampling, which could converge, but you want to make sure you're still doing a little bit of exploring even after you seem to have converged. You could add a little bit of epsilon-greedy in there just to make sure you're still picking the other arms at random, for example.
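One way the stacking idea could look in code is below. Brian says this is something they have not done, so this is purely a hypothetical sketch: the function name `select_arm_stacked`, the default `epsilon` of 0.05, and the Beta(1, 1) prior are all assumptions made for illustration.

```python
import random


def select_arm_stacked(successes, failures, epsilon=0.05, rng=random):
    """Epsilon-greedy layered on top of Beta-Bernoulli Thompson sampling.

    successes, failures -- per-arm counts observed so far
    epsilon             -- share of traffic reserved for uniform exploration
    rng                 -- the random module or a random.Random instance
    """
    n_arms = len(successes)
    if rng.random() < epsilon:
        # Forced exploration: pick any arm uniformly at random, even if
        # Thompson sampling has effectively converged on a winner.
        return rng.randrange(n_arms)
    # Otherwise fall back to a normal Thompson sampling draw.
    draws = [
        rng.betavariate(1 + s, 1 + f)
        for s, f in zip(successes, failures)
    ]
    return max(range(n_arms), key=draws.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    # A converged experiment: arm 0 looks far better than arm 1.
    picks = [
        select_arm_stacked([1000, 0], [0, 1000], epsilon=0.2, rng=rng)
        for _ in range(5000)
    ]
    print("share of traffic still sent to arm 1:", picks.count(1) / len(picks))
```

The cost of the extra layer is that a fixed share of traffic keeps going to arms that may be known losers, so `epsilon` trades a little reward for protection against a false start.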

What can you do if you're trying to make sales and marketing experiments but have a limited amount of data?

There's a couple of things you can do. You can run A/B tests and just reduce your power requirements by a lot, on the idea that being underpowered, as some people argue, is not that big of a deal.

If you're looking for big effects or if you're just trying a lot of different things, it may not be a problem. And if you have false positives, you could even argue that that's also not a big problem, right? That 5% threshold that people always say for false positive rate is pretty arbitrary and depending on how much risk you're willing to take you can play with that quite a lot.
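To see how much relaxing power and the false positive threshold can shrink an A/B test, here is a rough sketch using the standard normal-approximation sample size formula for comparing two conversion rates. The function name `sample_size_per_arm`, the 10% and 12% conversion rates, and the specific thresholds are hypothetical numbers chosen for illustration, not figures from the episode.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate visitors needed per arm for a two-sided two-proportion test.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1-p2)^2.
    """
    if p_control == p_variant:
        raise ValueError("the two rates must differ")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_control - p_variant
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    strict = sample_size_per_arm(0.10, 0.12, alpha=0.05, power=0.8)
    relaxed = sample_size_per_arm(0.10, 0.12, alpha=0.20, power=0.6)
    print("conventional thresholds:", strict, "per arm")
    print("relaxed thresholds:     ", relaxed, "per arm")
```

Loosening either threshold lowers the required sample size, at the price of more false positives or more missed effects, which is the risk trade-off described above.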

And then, yeah, of course, multi-armed bandits, which we've already mentioned, are another way to get more out of a limited amount of data, which is pretty important at Stitch Fix as well. Like I mentioned, we're not huge scale like Google or Facebook, and we are often experimenting on shipments, which take a couple of weeks to turn around and get outcome metrics on. So it's pretty common for us to have an experiment that lasts months.

We have shipments that are pretty big, so we can still have hundreds of thousands or millions, but it can take a long time to get there. And yes, if you do have a limited sample size, a bandit, especially for something like marketing, I think could be a really valuable thing to use.

Looking 10 years out, what do you hope to be true for the data industry?

What I hope is that people, data scientists and engineers in general, start caring more about the impact of their work. I think it's something that's really not focused on much. People care a lot about what you can do and how you can do it, and less about whether you should be doing it and what broader impact your work can have on the people using your software or just on society in general. I do see some signs that people are aware of this and care about it. But I hope that continues to grow over the next 10 years.

If you ask whether academics have a responsibility to the public, at least from my experience, most academics would say "probably not". They're focused on their work, and their responsibility is to their research, because they view the work they do, at least in science, as being about advancing human knowledge for posterity. That's the goal, right?

But I think there has to be a little bit more responsibility, because if you're going to argue that your research is for the benefit of society, but it's actually not benefiting anyone because it hasn't been communicated, or people are not being educated correctly to understand it or to get the value out of it, then maybe that is a responsibility of academics as well.

Links mentioned in the post:

Brian’s blog post about Multi-Armed Bandits and the Stitch Fix Experimentation Platform.
Beware the data science pin factory: The power of the full-stack data science generalist and the perils of division of labor through function by Eric Colson.

More from Brian Amadio:

You can also find Brian on Twitter @BrianAmadio and take a look at Brian’s blog to find all of his posts in one place.

The Analytics Engineering Podcast features conversations with practitioners inventing the future of analytics engineering.

New episodes are published every 2 weeks, along with the companion Analytics Engineering Roundup newsletter.
