Seasonality @ Lyft. Bandits @ Stitch Fix. Data Testing. Hiring for SQL. Retail Calendars. [DSR #162]

This week's best data science articles

Stitch Fix: Your Client Engagement Program Isn't Doing What You Think It Is.

…a contextual bandit is a framework that allows you to use algorithms to learn the most effective strategy for each individual client, while simultaneously using randomization to continuously track how successful each of your different action choices are.

Wow. I love this post. There are plenty of posts on How to do This Thing in R! but very few good posts on how to actually change the way your organization works using data. This post does the latter.

Stitch Fix has baked multi-armed bandits deep into the core of their customer engagement processes. Instead of marketers doing static segmentation and making static campaign decisions, a series of campaigns is designed and then customers are fed through an algorithm that decides which campaign is most appropriate for each individual customer.

This is not how most marketing teams operate today. It’s a vision of how organizations will, in the future, run differently with data at their core.
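
To make the idea concrete, here is a minimal sketch of a bandit that picks a campaign per customer segment. It uses epsilon-greedy selection with a separate set of estimates for each segment, which is a much simpler stand-in for a true contextual bandit and is not the method Stitch Fix describes. The class name, the segments, the campaigns, and the response rates are all hypothetical, invented for illustration.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit that keeps separate estimates per context (segment)."""

    def __init__(self, campaigns, epsilon=0.1, seed=None):
        self.campaigns = list(campaigns)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # counts[context][campaign] and rewards[context][campaign]
        self.counts = defaultdict(lambda: {c: 0 for c in self.campaigns})
        self.rewards = defaultdict(lambda: {c: 0.0 for c in self.campaigns})

    def mean_reward(self, context, campaign):
        n = self.counts[context][campaign]
        # Untried campaigns look infinitely good, so each gets tried at least once.
        return self.rewards[context][campaign] / n if n else float("inf")

    def choose(self, context):
        # With probability epsilon, explore at random; this randomization is
        # what keeps the estimates for every campaign up to date.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.campaigns)
        # Otherwise exploit the campaign with the best observed mean reward.
        return max(self.campaigns, key=lambda c: self.mean_reward(context, c))

    def update(self, context, campaign, reward):
        self.counts[context][campaign] += 1
        self.rewards[context][campaign] += reward


# Hypothetical response rates, made up for this sketch.
TRUE_RATES = {
    "new": {"email": 0.05, "sms": 0.15},
    "lapsed": {"email": 0.20, "sms": 0.02},
}

sim = random.Random(0)
bandit = EpsilonGreedyBandit(["email", "sms"], epsilon=0.1, seed=1)
for _ in range(5000):
    segment = sim.choice(list(TRUE_RATES))
    campaign = bandit.choose(segment)
    reward = 1.0 if sim.random() < TRUE_RATES[segment][campaign] else 0.0
    bandit.update(segment, campaign, reward)

for segment in TRUE_RATES:
    print(segment, {c: round(bandit.mean_reward(segment, c), 3) for c in bandit.campaigns})
```

The exploration step is the part worth noticing: because some customers always receive a randomly chosen campaign, you keep measuring every option rather than freezing an early decision.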


How Lyft Deals with Seasonality

How can we predict the daily demand and supply a few weeks in advance? Before starting to predict a raw time series, we need to understand how people ride and drive, and what affects their patterns of behavior; what we call seasonality. Only then will we predict the underlying evolution of the trend, the overall growth of driver hours and passengers ride requests.

If you work at a business that experiences seasonality and haven’t yet done the work to decompose your time series into seasonality and underlying trend, this is a great blog post on how Lyft does exactly that. This is the first step in any forecasting work for a seasonality-driven business.
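
If you have never done this kind of decomposition, the sketch below shows the basic idea on a synthetic daily series with a weekly pattern. It is a classical additive decomposition (centered moving average for the trend, per-weekday averages for the seasonal component), not Lyft's method, and the function names and numbers are made up for illustration. It assumes an odd period and at least two full periods of data.

```python
def moving_average(values, window):
    """Centered moving average; None where the window does not fit (odd window assumed)."""
    half = window // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = sum(values[i - half:i + half + 1]) / window
    return out


def decompose(values, period=7):
    """Split a series into a trend and a zero-mean seasonal pattern of length `period`."""
    trend = moving_average(values, period)
    # Collect detrended values by position within the period (e.g. day of week).
    buckets = [[] for _ in range(period)]
    for i, (value, t) in enumerate(zip(values, trend)):
        if t is not None:
            buckets[i % period].append(value - t)
    seasonal = [sum(b) / len(b) for b in buckets]
    # Center the seasonal component so it only describes deviations from trend.
    mean = sum(seasonal) / period
    seasonal = [s - mean for s in seasonal]
    return trend, seasonal


# Synthetic data: linear growth plus a repeating weekly pattern (made-up numbers).
WEEKLY_PATTERN = [-5, -5, -5, 0, 10, 15, -10]
series = [100 + 2 * day + WEEKLY_PATTERN[day % 7] for day in range(28)]

trend, seasonal = decompose(series, period=7)
print("seasonal:", [round(s, 2) for s in seasonal])
print("trend (first valid points):", [round(t, 2) for t in trend[3:8]])
```

On real data the residual (value minus trend minus seasonal) is where holidays, weather, and one-off events show up, so it is worth plotting too.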


Automated Testing in the Modern Data Warehouse

In my experience, a unified testing philosophy is missing in the data world. If data issues have lasting consequences, why are analysts so much less sophisticated at testing than our software engineering counterparts?

This is one of my absolute favorite topics. There are literally millions of data analysts writing code today, and almost none of them test their code to the same standards that software engineers do.

Author Josh Temple, data analyst at Milk Bar, walks through how he implemented data testing, including his CI setup using GitLab CI/CD.
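
For a feel of what a data test looks like, here is a minimal sketch using Python's built-in sqlite3 module. It follows a common convention where each test is a query that returns the offending rows, so an empty result means the test passes. This is not the author's setup; the `orders` table, its columns, and the test names are hypothetical.

```python
import sqlite3

# In-memory database with a small, deliberately flawed table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, 11, 40.0), (2, 12, 15.0), (3, None, 9.5)],
)

# Each test is a query that selects rows violating an assumption.
TESTS = {
    "order_id is unique": (
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1"
    ),
    "customer_id is not null": (
        "SELECT order_id FROM orders WHERE customer_id IS NULL"
    ),
    "amount is non-negative": (
        "SELECT order_id FROM orders WHERE amount < 0"
    ),
}


def run_tests(connection, tests):
    """Run every test query and return the failing rows keyed by test name."""
    return {name: connection.execute(sql).fetchall() for name, sql in tests.items()}


results = run_tests(conn, TESTS)
for name, rows in results.items():
    status = "PASS" if not rows else f"FAIL ({len(rows)} rows)"
    print(f"{name}: {status}")
```

In a CI job you would exit with a non-zero status whenever any test returns rows, which is what blocks a bad change from being merged.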



Building a Data Practice from Scratch

The author is currently building a data team from scratch at Sawyer; this post covers his recommendations for anyone in the same position. Key recommendations:

  • Don’t worry about making things fancy. Do the simplest thing that works now.

  • Keep an eye on how things will scale, but rein in your impulses to optimize them. Analytics should be lean and agile, too.

  • Documentation, transparency, and reproducibility are interrelated and fundamental. Start good habits in these areas now, but expect to iterate on them and change how things are done as the organization matures and grows.

The entire post is great.


Analyzing 89 Responses to a SQL Screener Question for a Senior Data Analyst Position

At Help Scout, we recently went through the process of hiring a new senior data analyst. In order to apply for the position, we asked anyone interested to answer a few short screener questions including one to help evaluate their SQL skills. Here’s the SQL screener question we asked.

Great SQL brain teaser if you want to do it and then check your answer! The entire post is an awesome dissection of choices analysts make and their ramifications.

Also: if you’re not currently using a technical screener like this in your hiring process, I’m a big believer in it. We have one for all of our positions at Fishtown Analytics and it’s worked great. We don’t set the bar too high: the goal is not to identify the top 5% of applicants, it’s to winnow out the bottom 50%. Once we apply that filter, we can interview for a broader set of attributes.


Creating a 4-5-4 Retail Calendar using SQL and dbt

The 4-5-4 calendar is a guide for retailers that ensures sales comparability between years by dividing the year into months based on a 4 weeks – 5 weeks – 4 weeks format. The layout of the calendar lines up holidays and ensures the same number of Saturdays and Sundays in comparable months.

If you work at an ecommerce business, this is a must-read. Retail businesses thrive on their ability to compare sales to some prior period (month, quarter, year…) and the year-to-year variability of the Gregorian calendar makes this challenging. The 4-5-4 calendar is a well-established answer to this problem in retail, and the author goes deep on how to implement one in your data warehouse.
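
The article builds its calendar in SQL and dbt; the sketch below shows only the core week-to-month mapping, in Python, so you can see how little logic is involved. It assumes a 52-week fiscal year and a start date that you supply (the Sunday used here was chosen for illustration), and it does not handle 53-week years. The function and field names are hypothetical.

```python
from datetime import date, timedelta

# Each quarter is 4 + 5 + 4 = 13 weeks, so four quarters give 52 weeks.
WEEKS_PER_MONTH = [4, 5, 4] * 4


def retail_calendar(fiscal_year_start):
    """Return one row per fiscal week for a 52-week 4-5-4 year."""
    rows = []
    week_start = fiscal_year_start
    week_of_year = 1
    for fiscal_month, n_weeks in enumerate(WEEKS_PER_MONTH, start=1):
        for week_of_month in range(1, n_weeks + 1):
            rows.append(
                {
                    "week_start": week_start,
                    "week_end": week_start + timedelta(days=6),
                    "week_of_year": week_of_year,
                    "week_of_month": week_of_month,
                    "fiscal_month": fiscal_month,
                    "fiscal_quarter": (fiscal_month - 1) // 3 + 1,
                }
            )
            week_start += timedelta(weeks=1)
            week_of_year += 1
    return rows


# Assumed start date: Sunday, February 3, 2019 (illustrative only).
START = date(2019, 2, 3)
calendar = retail_calendar(START)

for row in calendar[:6]:
    print(row["week_start"], "month", row["fiscal_month"], "week", row["week_of_month"])
```

Because every fiscal month starts on the same weekday and has a whole number of weeks, joining sales to this table by date range gives you periods with matching weekend counts year over year.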


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.


915 Spring Garden St., Suite 500, Philadelphia, PA 19123