ETL @ StitchFix. DS Career Progression. Advice from a DS @ Lyft. Semi-Supervised Learning. [DSR #187]

This week's best data science articles

Becoming a Level 3.0 Data Scientist

This post aims to shed light on what’s expected and what’s outside of the scope of each Data Science Career Level.

This is the clearest discussion of the data scientist career path that I’ve seen. The core insight comes from the post’s diagram: the Junior > Senior > Principal path adds first statistics, then engineering, then business skills. I could not agree more, and could not have said it better myself.

Must read.


StitchFix | Maintainable ETLs: Tips for Making Your Pipelines Easier to Support and Extend

This post gives practical advice that will help make your ETL pipelines easier to debug, maintain, and extend.

These themes probably won’t be new to you if you’ve read the Roundup for a while: test your production data pipelines, express as much logic as possible in SQL, focus on modularity…

The best part of the post is the clear, simple way the author walks through each recommendation and explains how it is applied at StitchFix.

Read this, then set yourself a reminder to read it again in a year.
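
To make those three themes concrete, here is a minimal sketch of my own, not code from the StitchFix post. It uses Python's built-in sqlite3 module as a stand-in warehouse. The table names (raw_orders, daily_revenue), the columns, and the three checks are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
# The business logic lives in SQL rather than in Python loops.
TRANSFORM_SQL = """
CREATE TABLE daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM raw_orders
WHERE status = 'complete'
GROUP BY order_date
"""

# Each check is a query that counts offending rows; zero means the check passes.
CHECKS = {
    "order_date_not_null":
        "SELECT COUNT(*) FROM daily_revenue WHERE order_date IS NULL",
    "order_date_unique":
        "SELECT COUNT(*) FROM (SELECT order_date FROM daily_revenue "
        "GROUP BY order_date HAVING COUNT(*) > 1)",
    "revenue_non_negative":
        "SELECT COUNT(*) FROM daily_revenue WHERE revenue < 0",
}


def load_raw(conn, rows):
    """Load step: create the raw table and insert the source rows."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id INTEGER, order_date TEXT, amount REAL, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)


def transform(conn):
    """Transform step: a single SQL statement, kept separate from loading."""
    conn.execute(TRANSFORM_SQL)


def check_output(conn):
    """Test step: return the names of failed checks (empty list = all passed)."""
    return [
        name for name, sql in CHECKS.items()
        if conn.execute(sql).fetchone()[0] > 0
    ]


def run_pipeline(rows):
    """Run load, transform, and checks against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load_raw(conn, rows)
    transform(conn)
    failures = check_output(conn)
    result = conn.execute(
        "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
    ).fetchall()
    conn.close()
    return result, failures
```

Calling run_pipeline with a handful of order tuples returns the aggregated rows plus a list of failed check names. In a real pipeline you would likely halt or alert when that list is non-empty, but that part is left out of this sketch.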


Data Scientists Are Thinkers

Data scientists serve a very technical purpose, but one that is vastly different from other individual contributors. Unlike engineers, designers, and project managers, data scientists are exploration-first, rather than execution-first.

Short and useful. If you’re a data scientist and have a hard time getting out from behind your Jira backlog, this is highly recommended. It’ll reconnect you to your real mandate and give you tips for balancing exploration and execution in your day-to-day.


The Quiet Semi-Supervised Revolution

One fascinating trend is that the landscape of semi-supervised learning may be changing to something that looks more like [the above graph]. And that would change everything. First, these curves match one’s mental model of what semi-supervised approaches should do: more data should always be better. The gap between semi-supervised and supervised should be strictly positive even for data regimes where supervised learning does well. And increasingly this is happening at no cost and remarkably little additional complexity. The ‘magic zone’ starts lower, and equally importantly, it isn’t bounded in high data regimes.

The post is short and frames recent research in semi-supervised learning effectively. There are links at the end if you want to dig in further.


Lyft Data Scientist Shares Five Pieces of Career Advice

Solid advice from a data scientist at Lyft. I especially like his thoughts about starting a new job: read documentation, schedule meetings with a broad range of people, identify quick wins, and develop trust.


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
