100s of PBs @ Uber. 5 Stat Concepts. KPIs @ Airbnb. A Massive Border Flyover. My Little Ponies. [DSR #159]

❤️ Want to support us? Forward this email to three friends!

🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.

This week's best data science articles

Uber's Big Data Platform: 100+ Petabytes with Minute Latency

Uber’s data platform has evolved significantly as the company went from processing 100s of GB to 10s of PB to 100s of PB. This in-depth post walks through each stage of that evolution and the tools and technologies associated with it.

There are few companies operating at this level and I liked seeing the choices that Uber has made along the way. Impressive work.

eng.uber.com

The 5 Basic Statistics Concepts Data Scientists Need to Know

If you skip the intro (which is a bit of buzzword salad), the content is quite good, if basic. It might be relevant for you, or for someone you work with. 4800 claps on Medium in the past week says it’s worth it.

towardsdatascience.com

An interactive look at the barriers that divide the US and Mexico

What is along the nearly 2,000 miles of border that divides the U.S. from Mexico?

I’ve never seen anything like this—it’s a truly impressive piece of interactive content from the Washington Post. It’s worth a look purely as an experimental piece of data viz whether or not you’re personally invested in the topic.

www.washingtonpost.com

How Leaving Data Science Made Me a Better Data Scientist

This slide deck from a recent talk tells the story of Joel Grus’ path into, and then out of, data science. His takeaways are familiar ones: data scientists should focus on the readability, reproducibility, and test coverage of their code. If you’ve been reading the Roundup for long, you’ll know that this viewpoint resonates deeply with me.

If you read through the talk (which I did; it’s worth it), make sure to read the slide notes.

docs.google.com

My Internship at Airbnb Plus

12 weeks, 1 internship, 1 very good blog post. The post is an interesting overview of working in data @ Airbnb, and also a very insightful look at how Airbnb trains employees to think about and design KPIs. KPI design is covered in the later parts of the post, so make sure to get there; it’s a great treatment of the topic.

medium.com

New My Little Ponies, Designed by Neural Network

The author used a neural network to create new My Little Pony names. The names, and the author’s commentary on them, are hilarious. I actually LOLed at several of them. Some of my favorites:

  • Sob Dancer

  • Tardy Pony

  • Princess Sweat

…ok that’s probably enough. If you think this is the dumbest thing I’ve ever linked to, well, to each their own I suppose.

aiweirdness.com

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.

www.fishtownanalytics.com

Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123