Optimization @ Stitch Fix. Bias-Variance Tradeoff. Facebook's DensePose. [DSR #141]

❤️ Want to support us? Forward this email to three friends!

🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.

The Week's Most Useful Posts

Stitch Fix: Add Constrained Optimization To Your Toolbelt

This post is an introduction to constrained optimization aimed at data scientists and developers fluent in Python, but without any background in operations research or applied math. We’ll demonstrate how optimization modeling can be applied to real problems at Stitch Fix. At the end of this article, you should be able to start modeling your own business problems.

This post is a great intro to a surprisingly underutilized technique. I personally don’t see many companies attempting to answer “how can I perform this process X% better” using data science—I see data science being more commonly used today to classify and predict.

Think about the problems in your org. Can you frame any of them as constrained optimization problems?
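Here is what "framing" can look like in practice: a minimal sketch of a tiny constrained optimization problem. The scenario, the numbers, and the `best_plan` helper are hypothetical and do not come from the Stitch Fix post. Choose how many units `x` and `y` of two products to make so as to maximize profit `3x + 5y`, subject to `x + 2y <= 10` (labor hours) and `3x + y <= 15` (material). Because the problem is tiny, the sketch simply enumerates every integer plan instead of calling a real solver.

```python
from itertools import product


def best_plan(profit, constraints, max_units):
    """Return (profit, x, y) for the feasible integer plan with the highest profit.

    Brute force: try every (x, y) with 0 <= x, y <= max_units and keep
    only plans that satisfy every constraint.
    """
    best = None
    for x, y in product(range(max_units + 1), repeat=2):
        if all(constraint(x, y) for constraint in constraints):
            value = profit(x, y)
            if best is None or value > best[0]:
                best = (value, x, y)
    return best


# Hypothetical problem data (not from the original post).
def profit(x, y):
    return 3 * x + 5 * y


def labor_hours(x, y):
    return x + 2 * y <= 10


def material(x, y):
    return 3 * x + y <= 15


print(best_plan(profit, [labor_hours, material], max_units=15))  # (27, 4, 3)
```

Enumeration only works because this search space has 256 points. The point of a modeling library plus a solver is that the same objective-plus-constraints formulation scales to problems far too large to brute-force.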


Model Tuning and the Bias-Variance Tradeoff

The authors’ original piece, from 2015, was gorgeous, and I linked to it when it came out. Part two, just released, focuses on the inherent tradeoff between bias and variance. Very impressive D3.
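For reference, the tradeoff comes from the standard decomposition of expected squared error. This assumes data generated as $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$, $\operatorname{Var}(\varepsilon) = \sigma^2$, and noise independent of the fitted model $\hat f$, with the expectation taken over both training sets and noise:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

More flexible models typically shrink the bias term while growing the variance term. Tuning looks for the model complexity that minimizes their sum. The $\sigma^2$ term is out of reach for any model.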


Facebook Research Open Sources DensePose

Recent research in human understanding aims primarily at localizing a sparse set of joints, like the wrists, or elbows of humans. This may suffice for applications like gesture or action recognition, but it delivers a reduced image interpretation. We wanted to go further. Imagine trying new clothes on via a photo, or putting costumes on your friend’s photos. For these tasks, a more complete, surface-based image interpretation is required.

The article is very accessible, and DensePose seems to be quite impressive:

Earlier works on this problem would require computation in the order of minutes. DensePose operates at multiple frames per second on a single GPU and can handle tens or even hundreds of humans simultaneously.


Agile Analytics, Part 2: The Bad Stuff

I linked to the first part in this series back in May; it went through all of the great things about organizing your analytics team around Agile. The author came back with part two: all the bad things. Here’s my favorite bit:

I often say that analytics is a discipline that is half software engineering and half research. The aspects of analytics that don’t work particularly well with scrum are the parts that are more aligned with the “research” half of the analytics discipline.

I completely agree with this. We do almost all of our client work in Agile, and we do find it awkward to write up stories that are focused on data exploration instead of answering well-defined questions. We’ve found ways to shoehorn this type of research into the process, but it’s clearly not a natural fit. Overall, we feel like the strengths of Agile still outweigh this weakness, but your mileage may vary.

Great, nuanced perspective.


Visualizing Social Network Data with NetworkX and Basemap

One of the most common ways data scientists are introduced to graphs is via Airflow, which helps you build a directed acyclic graph (DAG) for your data pipeline. Graphs are at the heart of our data modeling tool, dbt, as well. Graphs are much more broadly relevant than simply constructing data pipelines, however: nodes and edges turn out to be a great way to model data.
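The DAG idea behind tools like Airflow and dbt boils down to ordering nodes so that every dependency runs before anything that depends on it. Below is a minimal pure-Python sketch of that ordering using Kahn's algorithm. The model names are hypothetical, and this is not how Airflow or dbt implement it internally. In NetworkX itself, you would typically call `nx.topological_sort` on a `DiGraph` rather than write this by hand.

```python
from collections import deque


def topological_order(edges):
    """Order nodes so each edge (u, v) has u before v (Kahn's algorithm).

    Raises ValueError if the graph contains a cycle, i.e. is not a DAG.
    """
    successors = {}
    indegree = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
        successors.setdefault(v, [])
        indegree.setdefault(u, 0)
        indegree[v] = indegree.get(v, 0) + 1

    # Start from nodes with no upstream dependencies (sorted for a stable result).
    queue = deque(sorted(node for node, deg in indegree.items() if deg == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:  # all of nxt's dependencies are now placed
                queue.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order


# Hypothetical pipeline: raw tables -> staging models -> a fact table.
edges = [
    ("raw_orders", "stg_orders"),
    ("raw_customers", "stg_customers"),
    ("stg_orders", "fct_orders"),
    ("stg_customers", "fct_orders"),
]
print(topological_order(edges))
# ['raw_customers', 'raw_orders', 'stg_customers', 'stg_orders', 'fct_orders']
```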

This post uses a common Python graph processing library, NetworkX, to create and draw graphs. NetworkX is an awesome library: we use it to do all of dbt’s graph processing. This post is a great intro.

Very useful tool in your tool belt.


Programming Best Practices For Data Science

Often, the entire data science life cycle ends up as an arbitrary mess of notebook cells in either a Jupyter Notebook or a single messy script. In addition, most data science problems require us to switch between data retrieval, data cleaning, data exploration, data visualization, and statistical / predictive modeling.

But there’s a better way! In this post, I’ll go over the two mindsets most people switch between when doing programming work specifically for data science: the prototype mindset and the production mindset.

If you find yourself writing cell after cell of Jupyter code, this is for you.
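One common way the "production mindset" shows up is pulling each notebook stage (retrieval, cleaning, summarizing) into a small function that can be tested on its own. The following is a minimal sketch under that assumption. The CSV columns and function names are hypothetical and are not taken from the post.

```python
import csv
import io
from statistics import mean


def load_rows(text):
    """Parse CSV text into a list of dicts (stands in for a data-retrieval cell)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Drop rows with a missing amount and convert amounts to float."""
    return [{**row, "amount": float(row["amount"])} for row in rows if row["amount"].strip()]


def summarize(rows):
    """Average amount per customer."""
    by_customer = {}
    for row in rows:
        by_customer.setdefault(row["customer"], []).append(row["amount"])
    return {customer: mean(amounts) for customer, amounts in by_customer.items()}


def run_pipeline(text):
    """Chain the stages; a notebook can now just call this one function."""
    return summarize(clean_rows(load_rows(text)))


if __name__ == "__main__":
    sample = "customer,amount\na,10\na,20\nb,\nb,5\n"
    print(run_pipeline(sample))  # {'a': 15.0, 'b': 5.0}
```

Each stage can be unit-tested in isolation. The notebook keeps the exploratory work and imports the functions instead of redefining logic cell by cell.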


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123