I Was Wrong About Data Science
Also in this issue: why the cost of human intervention is a better ML model metric than accuracy, how consumption-based pricing gets overcomplicated, and how to properly dogfood your product.
This week I’m doing something a little different. I was inspired by this week’s op-ed series in The New York Times called “I was wrong…” and wanted to take a stab at talking about something I was wrong about in data.
Also featured in this issue:
The Open Loop of ML by Devesh Rajadhyax
DALL·E 2 from OpenAI is in Open Beta by Alberto Romero
Self-narrating some usability testing for others by Randy Au
Enjoy the issue!
-Anna
PS: Pst! 🤫 Time to catch up on the Coalesce website: lots of speaker names have already dropped (I’m SO excited about this list of humans — this is going to be so much fun!!), with more to come alongside an agenda. Can’t wait to see you in October 💜
I was wrong about data science
This week, the New York Times Opinion section did something quite unusual: every regular columnist published a piece about a time they convinced others of something they were, in retrospect, very wrong about.
I found this inspirational. I’m wrong all the time. I can write about that?? Dope.
It was 2017 and I was on the tail end of academia burnout. I remember sitting down with one of my amazing then colleagues and talking about the slog of writing papers.
“What do you mean?” they asked me. “I love writing papers!” That’s when I knew this wasn’t for me.
I knew I wanted to make an impact with the knowledge I gained in academia over nearly 7 years. I had so much theoretical experience to turn into practice! And then a couple of opportunities presented themselves, and I picked one. I became a Data Scientist.
This isn’t a story about that. It’s the story about what I thought Data Science would be. What I thought it should be. And what I convinced others it should be. And it’s a story of how I was wrong about all of that.
In 2017, Data Science was what Analytics Engineering was in 2020: hot. Humans with experience working on complex problems participating in the development of software solutions in exciting tech companies? Sign me up!
It was a decent exit out of academia that I couldn’t feel bad about — good enough that my postdoc advisor got me a t-shirt: “I’m a data scientist so let’s assume I’m always right.”
I took that kind of literally.
The mandate of the team I joined was something like this:
“Analytics helps the business make decisions on problems they’re already aware of. Data Science looks 6-12 months into the future to surface problems and opportunities that the business should focus on, but doesn’t yet.”
Sounds great! Except for some reason, it seemed necessary to draw a line between this and Analytics. This feels wrong, but OK, other people are into it, and I’m getting to work on interesting stuff, so let’s go with it.
There were also other Data Scientists who actually built machine learning solutions that integrated into the product of the tech company. It was made very clear I didn’t have the background or experience for that, but we’ll call me a Data Scientist anyway.
Cool.
The thing they don’t tell you about leaving academia is how much of your identity is wrapped up in the work you do. If you’re leaving, it must be to make an equivalent impact on the world through other means. So you latch on to whatever you end up doing next as your new identity and sense of self-worth. And you rationalize its value no matter how it is designed, or what it is called.
For me, that identity became being the person who sees things through data and research that others don’t, and then tells them about it.
I spent a lot of time justifying (mostly to myself) that I’m doing different work than an analytics team because I’m leveraging the experience I’ve built, more advanced methods, etc etc.
Needless to say, this didn’t land super well with the people actually making strategic decisions (or the Analytics team!). Research-based suggestions for quarterly prioritization were examined with modest curiosity and promptly forgotten.
Six months later, the team was no more, and I was getting folded into the Analytics function. That outcome was my choice, but boy was it scary joining a team you’ve spent the bulk of your time distancing yourself from in terms of mandate and day-to-day operations.
Getting folded into the analytics function also happened to become one of the best experiences of my career ❤️.
I was very wrong about data science. I needed it to fill a space in my identity and to give my career choices meaning, and I did not question at all what the business needed from my experience in that moment.
What the business needed from me wasn’t my opinions on what decision to make next. What the business needed from me was:
due diligence on metrics ahead of an exit event
understanding user sentiment as a result of business changes
developing mechanisms for data compliance
understanding product usage and helping calibrate pricing, packaging and continued investment
I now know how important those problems were to solve, and how valuable the experience of solving them was. These are problems I’ve come to genuinely enjoy because they have tremendous business impact. But looking at them through the lens of what I thought Data Science was, none of these were in my job description, and that was entirely my loss.
Elsewhere on the internet…
The Open Loop of ML
By Devesh Rajadhyax
Devesh has shared the final article in a three-part series, and it is 🔥.
The series describes the gap between ML model accuracy in development and the real-world use and performance of those same models. You should read the entire series because Devesh says this better than I can, but in essence: model accuracy is commonly used to describe how good an ML model is, but it is often misleading. Model accuracy only tells you how well your model performs against already observed events (i.e., the data you have). It tells you nothing about the validity of your model in solving the problem you’re actually trying to solve in the real world.
Parts 1 and 2 are excellent and go into the details of why this occurs and why model accuracy isn’t helpful. But part 3 is where the series gets very interesting: Devesh proposes a different metric for evaluating ML model performance. It is quite literally a measure of the cost of human intervention in your ML loop. The lower the cost, the better your model is at solving a real-world problem.
I love this so much. It’s so hard to define metrics and it’s so satisfying to see a metric definition that makes you immediately go: “YES!”. Nicely done!
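If you want to make the idea concrete, here’s a rough toy sketch of what a “cost of human intervention” calculation could look like. To be clear, this is my own illustration, not Devesh’s actual formulation: the confidence threshold, the review and correction costs, and the function itself are all assumptions.

```python
# A toy illustration of a "cost of human intervention" metric.
# This is NOT Devesh's formulation, just one way to make the idea concrete.

def human_intervention_cost(predictions, outcomes, review_cost=2.0, correction_cost=10.0):
    """Estimate how much human effort a model puts back into the loop.

    predictions: list of (label, confidence) tuples from the model
    outcomes: list of true labels observed after the fact
    review_cost: assumed cost (minutes, dollars, ...) to review a low-confidence prediction
    correction_cost: assumed cost to catch and fix a confident-but-wrong prediction
    """
    total = 0.0
    for (label, confidence), truth in zip(predictions, outcomes):
        if confidence < 0.8:       # model defers to a human reviewer
            total += review_cost
        elif label != truth:       # confident but wrong: a human has to clean up
            total += correction_cost
    return total


# Two models with the same accuracy can put very different work back on humans:
model_a = [("spam", 0.95), ("ham", 0.60), ("spam", 0.91)]
model_b = [("spam", 0.99), ("ham", 0.97), ("spam", 0.92)]
truth = ["spam", "ham", "ham"]

print(human_intervention_cost(model_a, truth))  # defers once, wrong once
print(human_intervention_cost(model_b, truth))  # never defers, wrong once
```

That difference in human effort, rather than the accuracy number itself, is exactly the gap the series is pointing at.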
DALL·E 2 from OpenAI is in Open Beta
By Alberto Romero
I enjoyed this article not because of DALL·E 2 itself, but because of the way Alberto talks about the pricing of the service. And the challenges he points out are not unique to AI products — they’re important for any consumption-based pricing system.
Most consumption-based pricing models are designed with a price point that moves most closely with the cost of providing the service. This makes a ton of sense — it’s easier to reason about business metrics like margins with a model like this, and it is often one of the higher hypothetical ways of pricing your consumption product.
Except a lot of the time (just like in the case of DALL·E 2’s open beta), this pricing structure is quite misaligned with the actual value someone gets from the service. This 1) makes pricing hard to predict for a user or business, 2) encourages the development of lots of complex forecasting and estimation tooling for customers that is only marginally helpful, and 3) mostly results in sticker shock and consumer wariness of surprise bills. Imagine if you couldn’t predict how your electricity consumption would translate to your monthly bill.
Alberto argues a better price structure would be tied to the value of the thing DALL·E 2 is used to produce — in this case, a “good [image] result”.
I love this conclusion from Alberto because it lands on the same outcome other industries have already arrived at, such as the digital advertising world. When advertising on a social platform, the cost of the service to the customer has less to do with the cost of providing the service, and much more to do with the value being driven — usually impressions or clicks. We don’t have to look too far for examples like these, and they work very well for a reason :)
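To put some numbers on that contrast, here’s a tiny back-of-the-envelope sketch. The figures are entirely made up (they are not DALL·E 2’s actual prices); the point is just that a bill tied to generation attempts tracks the provider’s cost, while a bill tied to kept results tracks the value the user actually walked away with.

```python
# Made-up numbers to contrast cost-based vs. value-based consumption pricing.
# None of these figures reflect DALL·E 2's actual prices.

generations_attempted = 400   # the user iterates a lot to get what they want
results_kept = 25             # images they actually end up using

price_per_generation = 0.05   # cost-based: tracks compute spent per attempt
price_per_kept_result = 0.60  # value-based: tracks the "good [image] result"

cost_based_bill = generations_attempted * price_per_generation
value_based_bill = results_kept * price_per_kept_result

print(f"Cost-based bill:  ${cost_based_bill:.2f}")   # $20.00, hard to predict up front
print(f"Value-based bill: ${value_based_bill:.2f}")  # $15.00, scales with results kept
```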
Self-narrating some usability testing for others
By Randy Au
I’m a big fan of dogfooding products internally. What could be better than using your own product and giving your development team feedback about it to shorten the iteration and innovation cycle? Except it doesn’t always work out quite this successfully, and a lot of this has to do with how feedback is presented. There’s a good reason user experience research exists as its own function — part of the talent and skill involved in the work is knowing when to ask your users to double click on something (metaphorically and literally) and show you something unexpected. You end up learning a lot not just about what’s broken, but what people are trying to do with your product. That last one may or may not be at all correlated with your actual feature set ;)
That said, dogfooding isn’t dead quite yet. Like I said, I love it. I also love Randy’s post this week because it does such a good job breaking down what you, the internal human working with pieces of the company’s product, need to do to communicate the best information about your product experience in the absence of a great UX researcher helping you do so.
It turns out (and I completely agree with Randy here) that it’s better to show than tell: describe exactly what you were trying to do (what problem you were trying to solve for yourself), the steps you took to get there, and what happened. This gives the team working with this information SO much context: what if they hadn’t even considered the problem you’re trying to solve? What if the journey you took isn’t what was designed? Etc.
Putting this descriptive information into the hands of your product and engineering teams is far better than the alternative of “I think you should improve X thing by doing Y thing”. The latter results in a pile of tickets for your product team with little context to help them prioritize what you propose. Instead, do your best to let your experts (your product managers and your engineering and design teams) figure it out.
Great post, Randy!
That’s it for this week! 👋