Data in Tech: Insights from Spotify, Airbnb, Instacart, Zulily, Soundcloud, and Lyft [DSR #107]

Oct 15, 2017

There were tons of excellent posts about data from innovative tech companies this week. Each article tells a story of a data team operating at the top of its game, achieving truly unique things.

If you’re in or around Philly, we’d love to have you join us, Stitch, and Snowflake at an event we’re hosting on 10/25. Free drinks and plenty of data folks to hang out with. Register here.

Enjoy!

- Tristan

❤️ Want to support us? Forward this email to three friends!

🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.

Focus on: Data in Tech

Spotify’s Discover Weekly: How machine learning finds your new music

Whoa. I’ve wanted to read this article for two years. I’ve been a Discover Weekly fan since they launched the feature in 2015, and this is the first writeup I’ve seen of how it works. My favorite surprise: it incorporates a raw audio CNN to find tracks with directly similar characteristics, skipping the human element completely.

Really excellent read.

hackernoon.com

How R Helps Airbnb Make the Most of Its Data

At Airbnb, R has been amongst the most popular tools for doing data science in many different contexts, including generating product insights, interpreting experiments, and building predictive models. Airbnb supports R usage by creating internal R tools and by creating a community of R users. At the end of the post, the authors provide some specific advice for practitioners who wish to incorporate R into their day-to-day workflow.

This is the deepest writing to date on one of the most innovative, collaborative data teams in the world. Must read.

peerj.com

Instacart: No order left behind; no shopper left idle.

Using Monte Carlo simulations to balance supply & demand in a marketplace.

tech.instacart.com
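
For readers who haven't used the technique, here is a minimal sketch of a Monte Carlo staffing simulation. It is not Instacart's model: the Poisson demand assumption, the function names (`sample_poisson`, `simulate_staffing`), and all parameters are hypothetical and chosen only for illustration. The idea is to draw many random demand scenarios and average how many orders go unfilled and how much shopper capacity sits idle at a given staffing level.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_staffing(num_shoppers, mean_orders, orders_per_shopper,
                      trials=10_000, seed=0):
    """Estimate average unfilled orders and idle capacity per period.

    Assumption (illustrative only): demand per period is Poisson with the
    given mean, and each shopper can fulfil a fixed number of orders.
    """
    rng = random.Random(seed)
    capacity = num_shoppers * orders_per_shopper
    unfilled = idle = 0
    for _ in range(trials):
        demand = sample_poisson(rng, mean_orders)
        unfilled += max(demand - capacity, 0)  # orders left behind
        idle += max(capacity - demand, 0)      # shopper capacity left idle
    return unfilled / trials, idle / trials


if __name__ == "__main__":
    # Compare a few hypothetical staffing levels against the same demand draws.
    for shoppers in (6, 8, 10):
        lost, spare = simulate_staffing(shoppers, mean_orders=40,
                                        orders_per_shopper=5)
        print(f"{shoppers} shoppers: {lost:.2f} unfilled, {spare:.2f} idle")
```

Because every staffing level is evaluated with the same seed, the comparison across levels uses identical demand scenarios, which makes the tradeoff between unfilled orders and idle shoppers easier to read.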

Zulily: How we used Kubernetes and Google Cloud to expose our Big Data platform as a set of RESTful web services

Zulily’s data infrastructure focuses on building a wrapper around the entire stack so that they can deliver heterogeneous services with a homogeneous API. Powerful, but quite a challenging task.

Unique read.

zulily-tech.com

SoundCloud's Data Science Process

Based on the experiences of our Data Scientists, we distilled a set of steps, tips and general guidance representing the best practices that we collectively know of and agree to as a community of practitioners.

developers.soundcloud.com

Lyft: A case study in multivariable testing

After figuring out what the issue was, we revamped the experimental design and learned that what we intuited was right — user churn did increase with increased auth frequency.

Fascinating story of a subtle flaw in experimental design.

eng.lyft.com

Other great posts this week

Segment vs Fivetran vs Stitch: Which data ingest should you use?

Data Pipelines can be time-consuming to build. Should you buy Segment, Stitch, or Fivetran to do it for you?

Stephen Levin, the author, is an analyst @ Zapier, member of the dbt community, and super-smart guy. I don’t 100% agree with everything in his post (I’m less of a Segment fan), but this is an excellent take on an important topic.

www.stephenlevin.co

What Makes A Good Data Scientist At A Small Company

I’ve noticed that the things that I value and the things that work for me aren’t always the same things that big companies tend to recruit for. So in this post, I wanted to list a few things that I think make a really good data scientist [at a small company].

This is an excellent post. Relevant if you’re seeking or hiring for a role at a startup.

medium.com

How to teach technical concepts with cartoons

Incorporate cartoons into your presentations and writing more. Bring your iPad to work, download Paper, and have at it.

jvns.ca

Python 2 vs Python 3

“Should I learn Python 2 or Python 3?” Here’s the definitive answer, based on the opinion of practicing data scientists.

The multiple Python versions confused me when I first learned the language. This is worth a read for other new Python devs.

data36.com

Data viz of the week

Simple, very powerful.

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.

fishtownanalytics.com

Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123
