There were tons of excellent posts about data from innovative tech companies this week. Each article tells a story of a data team operating at the top of its game, achieving truly unique things.
If you’re in or around Philly, we’d love to have you join us, Stitch, and Snowflake at an event we’re hosting on 10/25. Free drinks and plenty of data folks to hang out with. Register here.
❤️ Want to support us? Forward this email to three friends!
🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.
Focus on: Data in Tech
Whoa. I’ve wanted to read this article for two years. I’ve been a Discover Weekly fan since they launched the feature in 2015, and this is the first writeup I’ve seen of how it works. My favorite surprise: it incorporates a raw audio CNN to find tracks with directly similar characteristics, skipping the human element completely.
Really excellent read.
At Airbnb, R has been amongst the most popular tools for doing data science in many different contexts, including generating product insights, interpreting experiments, and building predictive models. Airbnb supports R usage by creating internal R tools and by creating a community of R users. At the end of the post, the authors provide some specific advice for practitioners who wish to incorporate R into their day-to-day workflow.
This is the deepest writing to date on one of the most innovative, collaborative data teams in the world. Must read.
Using Monte Carlo simulations to balance supply & demand in a marketplace.
Zulily’s data infrastructure focuses on building a wrapper around the entire stack so that they can deliver heterogeneous services with a homogeneous API. Powerful, but quite a challenging task.
Based on the experiences of our Data Scientists, we distilled a set of steps, tips and general guidance representing the best practices that we collectively know of and agree to as a community of practitioners.
After figuring out what the issue was, we revamped the experimental design and learned that what we intuited was right — user churn did increase with increased auth frequency.
Fascinating story of a subtle flaw in experimental design.
Other great posts this week
Data Pipelines can be time-consuming to build. Should you buy Segment, Stitch, or Fivetran to do it for you?
Stephen Levin, the author, is an analyst @ Zapier, member of the dbt community, and super-smart guy. I don’t 100% agree with everything in his post (I’m less of a Segment fan), but this is an excellent take on an important topic.
I’ve noticed that the things that I value and the things that work for me aren’t always the same things that big companies tend to recruit for. So in this post, I wanted to list a few things that I think make a really good data scientist [at a small company].
This is an excellent post. Relevant if you’re seeking or hiring for a role at a startup.
Incorporate more cartoons into your presentations and writing. Bring your iPad to work, download Paper, and have at it.
“Should I learn Python 2 or Python 3?” Here’s the definitive answer, based on the opinion of practicing data scientists.
The multiple Python versions confused me when I first learned the language. This is worth a read for other new Python devs.
Data viz of the week
Simple, very powerful.
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123