Data Science Roundup #93: Women in Data, Viz @ Instacart, Modular SQL @ Stitch Fix, and more!
Lots of great stuff happened in the past two weeks! Hope you’re ready to settle in and do some reading…
Enjoy :)
- Tristan
Two Posts You Can't Miss
The past month has seen the topic of women in tech really take center stage. One of my favorite articles written on this topic is Let’s Stop Talking About the Sexism in Venture, where Claudia Iannazzo, Managing Partner at AlphaPrime (a VC), recommends concrete actions to improve the status quo that go beyond just talk. Many of those recommendations are centered around correcting the gender imbalance in the first place—in our employees, panels, columns, and more.
This post highlights some of the most influential women in data. If you’re not following them, you should start. Hilary Mason and Cathy O'Neil in particular have two of my favorite Twitter feeds.
This is a topic I care a lot about, and one that everyone in the data community should be deeply invested in.
www.datasciencecentral.com
At Instacart, we deliver a lot of groceries. By the end of next year, 80% of American households will be able to use Instacart. Our challenge: complete every delivery on-time, with the right groceries as fast as possible. How do we bring order to the chaos?
In the remainder of this post, we’ll first introduce the logistics problem Instacart is solving, outline the architecture of our systems and describe the GPS data we collect. Then we will conclude by touring a series of datashader visualizations.
This post is very impressive. It lays out the (quite challenging) problem the Instacart team faces, then takes the reader step by step through the visualizations the team has developed to solve it.
Most expertise associated with solutions of this scale and complexity is locked up in the brains of people at a small number of companies. This knowledge needs to become more widely dispersed, and I love that the Instacart team is helping make this happen.
This Week's Top Posts
Building Out Analytics Functions in Startups
Mark Rittman, host of the Drill to Detail podcast, was kind enough to invite me to be a guest on the most recent episode. In it, we talked BI, data warehouse tech, and team building. I enjoyed doing the episode and think we covered a lot of ground. If you’re not already a D2D subscriber, I highly recommend it.
This one weird trick will simplify your ETL workflow
The data engineers at Stitch Fix use Jinja2 templating on top of SQL to keep their code DRY and their ETL workloads in SQL. This resonated very strongly, as our open-source tool dbt uses Jinja at its core and we’ve used a lot of these same techniques ourselves.
Excellent read.
multithreaded.stitchfix.com
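To make the templating idea concrete, here is a minimal sketch of Jinja2 on top of SQL. It is not Stitch Fix's code or dbt's internals, just an illustration of how a loop in a template can replace a stack of copy-pasted SQL columns. The table name, column names, and the render_payments_sql helper are all hypothetical.

```python
from jinja2 import Template

# Hypothetical schema for illustration: a payments table with
# order_date, payment_method, and amount columns.
SQL_TEMPLATE = Template("""\
select
    order_date,
{%- for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end) as {{ method }}_amount{{ "," if not loop.last }}
{%- endfor %}
from {{ source_table }}
group by order_date
""")


def render_payments_sql(source_table, payment_methods):
    """Render one pivot column per payment method instead of hand-writing each."""
    return SQL_TEMPLATE.render(
        source_table=source_table,
        payment_methods=payment_methods,
    )


sql = render_payments_sql("payments", ["credit_card", "gift_card"])
print(sql)
```

Adding a third payment method means adding one item to the list rather than another hand-written line of SQL, which is the DRY benefit the post describes. Values are interpolated as plain text here, so a template like this should only ever be rendered with trusted inputs.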
Two Decades of Recommender Systems at Amazon.com
Amazon is well-known for personalization and recommendations, which help customers discover items they might otherwise not have found. In this update to our original paper, we discuss some of the changes as Amazon has grown.
No one knows recommenders better than Amazon.
How HBO’s Silicon Valley built “Not Hotdog” with mobile TensorFlow, Keras & React Native
If you don’t know about “Not Hotdog”, well, you should probably catch up on Season 4 of HBO’s Silicon Valley. This post is a really excellent walkthrough of how the real-life app was built, including some very interesting work on getting their TensorFlow network to run locally on-device.
(Complete with hilarious hotdog detection fail pics, of course.)
Two years as a Data Scientist at Stack Overflow
A year ago, David Robinson wrote a post about his first year as the first data scientist at Stack Overflow. It was awesome. Another year in, he’s posted an update focused on spreading R throughout Stack Overflow, working in teams, writing production code, and more.
Highly recommended for practicing data scientists.
In a few years, no investors are going to be looking for AI startups
…investors will stop looking for AI-powered startups in exactly the same way they don’t look for database-inside or cloud-native or mobile-first startups anymore. All those things are just assumed.
Agree.
Bar Plots and Modern Alternatives
This post presents some excellent variations on the standard bar chart, including relevant R code to produce them.
If the above post is an exploration of subtle differences and clever ways to use different versions of a bar chart, this does the same for a line chart that is comparing two populations. There are a surprising number of options and each brings out subtly different perspectives on the data.
The title speaks for itself! Solid analysis, great visual exploration.
Data viz of the week
Animated viz of subway development in China. *Very* impressive.
Thanks to our sponsors!
Fishtown Analytics: Analytics Consulting for Startups
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Stitch: Simple, Powerful ETL Built for Developers
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123