Seasonality @ Lyft. Bandits @ Stitch Fix. Data Testing. Hiring for SQL. Retail Calendars. [DSR #162]
This week's best data science articles
…a contextual bandit is a framework that allows you to use algorithms to learn the most effective strategy for each individual client, while simultaneously using randomization to continuously track how successful each of your different action choices are.
Wow. I love this post. There are plenty of posts on How to do This Thing in R! but very few good posts on how to actually change the way your organization works using data. This post does the latter.
Stitch Fix has baked multi-armed bandits deep into the core of their customer engagement processes. Instead of marketers doing static segmentation and making static campaign decisions, the team designs a series of campaigns and then feeds customers through an algorithm that decides which campaign is most appropriate for each individual customer.
This is not how most marketing teams operate today. It’s a vision of how organizations will, in the future, run differently with data at their core.
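To make the idea concrete, here is a minimal sketch of a per-segment epsilon-greedy bandit. It is a deliberately simple stand-in, not Stitch Fix's actual method: the segment names, campaign names, and response rates are all invented for illustration, and a real contextual bandit would typically model rewards from richer customer features.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Per-context epsilon-greedy bandit (illustrative only)."""

    def __init__(self, actions, epsilon=0.2, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # (context, action) -> [pull count, running mean reward]
        self.stats = defaultdict(lambda: [0, 0.0])

    def best(self, context):
        # Campaign with the highest observed mean reward for this context.
        return max(self.actions, key=lambda a: self.stats[(context, a)][1])

    def choose(self, context):
        # Explore: with probability epsilon pick a random campaign, so every
        # campaign keeps generating randomized data about how well it works.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        # Exploit: otherwise use the best campaign seen so far.
        return self.best(context)

    def update(self, context, action, reward):
        stat = self.stats[(context, action)]
        stat[0] += 1
        stat[1] += (reward - stat[1]) / stat[0]  # incremental mean


def simulate(rounds=20000, seed=0):
    # Hypothetical response rates per (segment, campaign); made up for the demo.
    true_rates = {
        ("new", "discount"): 0.6, ("new", "style_guide"): 0.1,
        ("loyal", "discount"): 0.1, ("loyal", "style_guide"): 0.6,
    }
    bandit = EpsilonGreedyBandit(["discount", "style_guide"], seed=seed)
    env = random.Random(seed + 1)
    for _ in range(rounds):
        context = env.choice(["new", "loyal"])
        action = bandit.choose(context)
        reward = 1.0 if env.random() < true_rates[(context, action)] else 0.0
        bandit.update(context, action, reward)
    return bandit
```

Calling `simulate()` and then `bandit.best("new")` or `bandit.best("loyal")` shows which campaign the bandit has settled on for each segment. The exploration step is what lets it keep measuring every campaign while still sending most customers to the current winner.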
How can we predict the daily demand and supply a few weeks in advance? Before starting to predict a raw time series, we need to understand how people ride and drive, and what affects their patterns of behavior; what we call seasonality. Only then will we predict the underlying evolution of the trend, the overall growth of driver hours and passengers ride requests.
If you work at a business that experiences seasonality and haven’t yet done the work to decompose your time series into seasonality and underlying trend, this is a great blog post on how Lyft does exactly that. This is the first step in any forecasting work for a seasonality-driven business.
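If you haven't done this kind of decomposition before, the sketch below shows the classical additive version on a daily series with a weekly cycle. It is not Lyft's approach, just the textbook starting point: a centered moving average estimates the trend, and the average leftover per day of week estimates the seasonality. The function name and the odd-period restriction are my own simplifications.

```python
from statistics import mean


def decompose_weekly(series, period=7):
    """Additive decomposition: series ~ trend + seasonal + residual.

    Assumes an odd period (such as 7 days) so the moving-average
    window can be centered on each point.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half = period // 2
    n = len(series)

    # A centered moving average over one full cycle smooths out the
    # weekly pattern and leaves the trend. The edges stay None.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(series[i - half : i + half + 1])

    # Average the detrended values by position in the cycle.
    buckets = [[] for _ in range(period)]
    for i in range(half, n - half):
        buckets[i % period].append(series[i] - trend[i])
    raw = [mean(b) for b in buckets]

    # Center the seasonal effects so they sum to (approximately) zero.
    offset = mean(raw)
    seasonal = [r - offset for r in raw]
    return trend, seasonal


# Synthetic example: linear growth plus a made-up weekly pattern.
pattern = [-10.0, -5.0, 0.0, 5.0, 20.0, 10.0, -20.0]
demand = [100.0 + 2.0 * t + pattern[t % 7] for t in range(56)]
trend, seasonal = decompose_weekly(demand)
```

On this synthetic series the recovered `seasonal` values match `pattern` and the trend comes back as the underlying straight line. Real data will be noisier, and holidays or events need separate treatment.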
In my experience, a unified testing philosophy is missing in the data world. If data issues have lasting consequences, why are analysts so much less sophisticated at testing than our software engineering counterparts?
This is one of my absolute favorite topics. There are literally millions of data analysts writing code today and almost none of them test their code to the same standards that software engineers test theirs.
Author Josh Temple, data analyst at Milk Bar, walks through how he implemented data testing, including his CI setup using GitLab CI/CD.
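For a flavor of what a data test can look like, here is a small sketch. It uses one common convention, not necessarily the one in the post: each test is a SQL query that returns the rows violating an expectation, so an empty result means the test passes. The `orders` table, its columns, and the in-memory SQLite database are assumptions for the demo.

```python
import sqlite3

# Each test is a query that selects the offending rows.
# No rows returned means the expectation holds.
TESTS = {
    "order_id is unique":
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1",
    "amount is not null":
        "SELECT order_id FROM orders WHERE amount IS NULL",
    "amount is non-negative":
        "SELECT order_id FROM orders WHERE amount < 0",
}


def run_tests(conn, tests=TESTS):
    """Return {test name: number of failing rows} for each failing test."""
    failures = {}
    for name, sql in tests.items():
        rows = conn.execute(sql).fetchall()
        if rows:
            failures[name] = len(rows)
    return failures


def make_demo_db(rows):
    """Build a throwaway in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


clean = make_demo_db([(1, 10.0), (2, 25.5)])
dirty = make_demo_db([(1, 10.0), (1, 5.0), (2, None), (3, -1.0)])
```

`run_tests(clean)` returns an empty dict, while `run_tests(dirty)` reports one failing row for each of the three tests. In a CI job, a script like this would exit with a non-zero status whenever the dict is non-empty, which is enough to fail the pipeline.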
The author is currently in the process of building a data team from scratch at Sawyer; this post is about his recommendations if you find yourself in the same position. Key recommendations:
Don’t worry about making things fancy. Do the simplest thing that works now.
Keep an eye on how things will scale, but rein in your impulses to optimize them. Analytics should be lean and agile, too.
Documentation, transparency, and reproducibility are interrelated and fundamental. Start good habits in these areas now, but expect to iterate on them and change how things are done as the organization matures and grows.
The entire post is great.
At Help Scout, we recently went through the process of hiring a new senior data analyst. In order to apply for the position, we asked anyone interested to answer a few short screener questions including one to help evaluate their SQL skills. Here’s the SQL screener question we asked.
Great SQL brain teaser if you want to do it and then check your answer! The entire post is an awesome dissection of choices analysts make and their ramifications.
Also: if you’re not currently using a technical screener like this in your hiring process, I’m a big believer in it. We have one for all positions at Fishtown Analytics and it’s worked great. We don’t set the bar too high: the goal is not to identify the top 5% of applicants, it’s to winnow out the bottom 50%. Once we apply that filter, we can then interview for a broader set of attributes.
The 4-5-4 calendar is a guide for retailers that ensures sales comparability between years by dividing the year into months based on a 4 weeks – 5 weeks – 4 weeks format. The layout of the calendar lines up holidays and ensures the same number of Saturdays and Sundays in comparable months.
If you work at an ecommerce business, this is a must-read. Retail businesses thrive on their ability to compare sales to some prior period (month, quarter, year…) and the year-to-year variability of the Gregorian calendar makes this challenging. The 4-5-4 calendar is a well-established answer to this problem in retail, and the author goes deep on how to implement one in your data warehouse.
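The core mapping is small enough to sketch. The version below assumes you already know the first day of the fiscal year (a Sunday) and only handles a plain 52-week year; the occasional 53-week year is not covered. The function name is mine, and the February 3, 2019 start date is used purely as an example, so check it against your own retail calendar.

```python
from datetime import date, timedelta

# Four quarters of 4, 5 and 4 weeks: 12 fiscal months, 52 weeks in total.
WEEKS_PER_MONTH = [4, 5, 4] * 4


def fiscal_period(day, year_start):
    """Map a date to (fiscal_month, fiscal_week) in a 52-week 4-5-4 year.

    year_start is the first day of the fiscal year and is supplied by
    the caller. 53-week years are out of scope for this sketch.
    """
    offset = (day - year_start).days
    if not 0 <= offset < 52 * 7:
        raise ValueError("date falls outside this 52-week fiscal year")
    week = offset // 7 + 1  # 1-based week of the fiscal year
    weeks_so_far = 0
    for month, n_weeks in enumerate(WEEKS_PER_MONTH, start=1):
        weeks_so_far += n_weeks
        if week <= weeks_so_far:
            return month, week


# Example start date (an assumption; verify against your own calendar).
YEAR_START = date(2019, 2, 3)

first_day = fiscal_period(YEAR_START, YEAR_START)
last_day = fiscal_period(YEAR_START + timedelta(days=363), YEAR_START)
```

Because every fiscal month is made of whole weeks, each one contains the same number of Saturdays and Sundays as the matching month in any other year built the same way. In a warehouse, the same logic usually ends up as a date dimension table with one row per day.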
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.