Orchestration. Multi-Armed Bandits. Knowledge Sharing. Getting your Data Back Out. [DSR #234]
This week's best data science articles
Data Orchestration — A Primer.
Until recently, data teams used cron to schedule data jobs. However, as data teams began writing more cron jobs the growing number and complexity became hard to manage. In particular, managing dependencies between jobs was difficult. Second, failure handling and alerting had to be managed by the job so the job or an on-call engineer had to handle retries and upstream failures, a pain. Finally, for retrospection teams had to manually sift through logs to check how a job performed on a certain day, a time sink. Because of these challenges data orchestration solutions emerged.
Last issue I dug deep into orchestration and talked about why I’m excited about Dagster, but if you haven’t been living in orchestration-land for a long time that post may not have hit home. This is a fantastic overview of the space if you’re new to it.
The author, Astasia Myers, is a VC @ Redpoint, and she spends a lot of time in data. Not a bad feed to subscribe to; I read everything she writes.
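To make the primer's three pain points concrete (dependencies between jobs, retries, and a record of what happened), here is a minimal Python sketch of what an orchestrator does at its core. It is an illustration under my own assumptions, not how Dagster, Airflow, or any real tool is implemented: `run_pipeline`, the job names, and the `max_retries` parameter are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(jobs, dependencies, max_retries=2):
    """Run jobs in dependency order, retry failures, skip blocked downstream jobs.

    jobs:         dict of job name -> zero-argument callable (hypothetical jobs)
    dependencies: dict of job name -> list of upstream job names
                  (every upstream name is assumed to also be a key in `jobs`)
    Returns a dict of job name -> "success" | "failed" | "skipped".
    """
    graph = {name: dependencies.get(name, ()) for name in jobs}
    status = {}
    # static_order() yields each job only after all of its upstream jobs.
    for name in TopologicalSorter(graph).static_order():
        if any(status.get(dep) != "success" for dep in graph[name]):
            status[name] = "skipped"  # an upstream job did not succeed
            continue
        for attempt in range(1, max_retries + 2):  # first try plus retries
            try:
                jobs[name]()
                status[name] = "success"
                break
            except Exception as exc:
                status[name] = "failed"
                print(f"{name}: attempt {attempt} failed ({exc})")
    return status


# Hypothetical jobs: `load` fails once and then recovers, `report` always fails.
attempts = {"load": 0}


def extract():
    pass


def load():
    attempts["load"] += 1
    if attempts["load"] < 2:
        raise RuntimeError("transient error")


def report():
    raise ValueError("bad query")


def email():
    pass


status = run_pipeline(
    jobs={"extract": extract, "load": load, "report": report, "email": email},
    dependencies={"load": ["extract"], "report": ["load"], "email": ["report"]},
)
print(status)
```

With plain cron, each of these jobs would have to carry its own retry logic and its own knowledge of what ran before it. Here that logic lives in one place, and the returned status dict doubles as the run history you would otherwise dig out of logs.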
Multi-Armed Bandits and the Stitch Fix Experimentation Platform
Just…wow. I really couldn’t be a bigger fan of this post. It combines whole-company experimentation efforts + custom in-house data tech + multi-armed bandits, plus…data culture.
Really, this is such a fascinating seam for me. The hard part isn’t data tooling or algorithms or experimentation… It’s the systematic application of tooling and algorithms and experimentation within the operating system of the company. Building processes and technology and culture that actually enable successful experimentation at scale:
The idea is to have #oneway to run and analyze experiments across the entire business. The same platform is used by front-end engineers, back-end engineers, product managers, and data scientists. And it’s flexible enough to be used for experiments on inventory management and forecasting, warehouse operations, outfit recommendations, marketing, and everything in between.
So yeah, this post is kind of about MABs (which are interesting in their own right) but it’s really about the massive difference between doing some cute A/B tests on your home page vs. enabling mass experimentation across your entire organization. The former might get you a short-term boost in conversion rate; the latter requires alignment and investment across your entire org but can create a long-term sustainable advantage.
multithreaded.stitchfix.com
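For anyone who hasn't met a multi-armed bandit before, here is a minimal Thompson sampling sketch in Python. It is not taken from the Stitch Fix post and I'm not claiming it matches their platform; the function name, the conversion rates, and the round count are made-up assumptions for illustration.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate a Bernoulli bandit and return how often each arm was pulled.

    true_rates: hypothetical conversion rate of each variant (unknown to the
                algorithm; used here only to simulate user responses)
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw a plausible conversion rate for each arm from its Beta posterior
        # (a uniform Beta(1, 1) prior updated with the observed outcomes).
        samples = [
            rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(len(samples)), key=samples.__getitem__)
        # Simulate whether this user converted.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


# Three hypothetical variants with deliberately well-separated rates.
pulls = thompson_sampling([0.2, 0.5, 0.8], rounds=3000)
print(pulls)
```

Unlike a fixed 50/50 A/B split, the bandit shifts traffic toward the variant that appears to be winning while it is still learning, which is why most of the pulls end up on the strongest arm.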
…the problem that I actually feel is most pressing but that very few people seem to be working on is a consistent means of publishing, reproducing, and iterating on knowledge within an organization.
This has been a long-time hobby horse of mine, but Kaminsky takes more time to fill out the idea and explain why it’s a good one. I don’t have a lot to add—he says it perfectly.
I do want to say, though, that it’s really irksome that those of us who aren’t super-interested in working inside of bigtechcos are on such an intense delay to get access to stuff like this. Airbnb published on this topic in 2016 and there’s still no corresponding commercial product. I want to live in this world today :(
Making your dbt models more useful with Census
Do more with your dbt models. With Census you now materialize them directly into your external tools like Salesforce, Marketo, Customer.io, etc.
Whoa. This is super-fucking-cool. We knew that the folks at Census were hip to dbt, but we didn’t know they were planning on writing a first-class product integration!! We’ve started playing around with it recently and 100% believe that this is going to be a massive new set of capabilities for the modern data stack.
I think it's clear that for many smaller companies that invested in deep learning, it turned out not to be essential and got cut post-Covid as part of downsizings. There are somewhat fewer people doing deep learning now than half a year ago, for the first time since at least 2010
Good thread—@fchollet presents data a couple of tweets down if you click through. I’m in agreement with his measured take; I think this is more about Covid than it is about another “AI winter,” although there are certainly lots of folks in this thread on HN who disagree.
New Tools I'm Watching
I’m always monitoring the data tooling landscape, and I figured I’d start sharing some of the more interesting products I come across.
Continual IQ: Super cool! The product is very early so there’s nothing to look at or try, but I got a demo last week. It’s auto-ML in the modern data stack. Plays very nicely with Fivetran / Snowflake / dbt. This solves a real need—hope the team releases something I can use soon!
Transform: Founded by some of the folks who made the Airbnb metrics platform, Transform aims to solve many of the same problems. I’m very interested in this theme, although I have no insider knowledge of what exact approach they’re taking.
Thanks to our sponsors!
dbt: Your Entire Analytics Engineering Workflow
Analytics engineering is the data transformation work that happens between loading data into your warehouse and analyzing it. dbt allows anyone comfortable with SQL to own that workflow.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.