Data Science Roundup #61: Automating your Workflows! Plus: Uber's new mapping library (+ more!)

Nov 20, 2016

Why is machine learning hard? Designing physical spaces with ML. Automating your data science workflows. Uber releases a high-performance mapping library. Running your DS team with Agile. A deep dive into gradient descent.

Enjoy! 😁 😁

- Tristan

This week's best data science articles

Why is machine learning 'hard'?

I love this post so much. Here’s the spoiler: machine learning is hard because it “is a fundamentally hard debugging problem. Debugging for machine learning happens in two cases: 1) your algorithm doesn’t work or 2) your algorithm doesn’t work well enough. What is unique about machine learning is that it is ‘exponentially’ harder to figure out what is wrong when things don’t work as expected. Compounding this debugging difficulty, there is often a delay in debugging cycles between implementing a fix or upgrade and seeing the result.”

ai.stanford.edu

Designing with Machine Learning

ML is infecting the design of software and networks around the world, but WeWork is taking it further, into physical space. The company now has enough real estate and usage data to start using ML to understand how people behave in their built environment, and it plans to use those insights as an input when designing future locations. This post is a quick read, but it’ll leave you thinking deep thoughts about what the world will really look like once AI infects everything.

www.wework.com

🤖 Data Scientists Need More Automation

One of the defining characteristics of a good software engineer is her laziness. Lazy software engineers hate doing things that could be automated and spend significant chunks of their time building tooling to make themselves more efficient. Data scientists often lag on the automation front, and this post is both a call-to-action and a list of suggestions.

stiglerdiet.com

Visualize Data Sets on the Web with Uber Engineering’s deck.gl Framework

Uber has one of the most important datasets in the world, and its data team just released the tool they use to visualize it. deck.gl includes rendering optimizations that let it handle massive map-based datasets that previously couldn’t be visualized in the browser. This is a project to watch.

eng.uber.com

Managing Shifting Priorities in Exploratory Data Science Projects

Data science project methodology is a fascinating topic, and one that often gets lost in the buzz about libraries and algorithms. This post discusses applying the Agile software development process to exploratory data analysis. In my opinion, this is the way to do data science. Too many teams either over-plan (wasting time in areas that turn out to be useless) or under-plan (little to no direction). Agile allows a group to learn quickly and converge on solutions aggressively.

www.predictiveanalyticsworld.com

An overview of gradient descent optimization algorithms

Warning: drink ☕ before reading!

Gradient descent is the grandfather of all optimization algorithms. The fundamental insight is fairly straightforward and falls directly out of calculus: find the slope (the gradient) at your current point, then take a small step downhill, in the opposite direction. But there are many algorithmic variants (momentum, Adagrad, RMSprop, Adam, and more), and each has its own trade-offs.

This is an important topic, and I’ve never seen a better primer.
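To make the core idea concrete, here’s a minimal sketch (my own toy example, not code from the linked article) comparing plain gradient descent with one of its variants, heavy-ball momentum. The objective f(x, y) = (x − 3)² + 10(y + 1)², the function names, and the settings (learning rate 0.01, momentum 0.9, 200 steps) are all illustrative assumptions.

```python
# Minimal sketch: vanilla gradient descent vs. gradient descent with momentum
# on a toy 2-D quadratic. The objective, learning rate, and momentum value
# are illustrative assumptions, not taken from the linked article.


def f(x, y):
    """Toy objective: an elongated bowl with its minimum at (3, -1)."""
    return (x - 3) ** 2 + 10 * (y + 1) ** 2


def grad_f(x, y):
    """Analytic gradient of f: the 'slope at a point' along each axis."""
    return 2 * (x - 3), 20 * (y + 1)


def gradient_descent(start, lr=0.01, steps=200):
    """Vanilla gradient descent: step opposite the gradient each iteration."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x -= lr * gx
        y -= lr * gy
    return x, y


def momentum_descent(start, lr=0.01, beta=0.9, steps=200):
    """Gradient descent with (heavy-ball) momentum.

    The velocity accumulates an exponentially decaying sum of past
    gradients, which speeds up progress along the shallow x direction.
    """
    x, y = start
    vx, vy = 0.0, 0.0
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        vx = beta * vx + lr * gx
        vy = beta * vy + lr * gy
        x -= vx
        y -= vy
    return x, y


if __name__ == "__main__":
    start = (0.0, 0.0)
    for name, opt in [("vanilla", gradient_descent), ("momentum", momentum_descent)]:
        x, y = opt(start)
        print(f"{name:>8}: x={x:.4f}, y={y:.4f}, f={f(x, y):.2e}")
```

With these particular settings, momentum lands much closer to the minimum in 200 steps. But with a larger learning rate (say 0.04) on this same bowl, plain gradient descent can converge faster. That is exactly the kind of trade-off the post walks through.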

sebastianruder.com

Data viz of the week

A lot of artistic license went into this. Click through to appreciate in full!

Made possible by our sponsors:

Fishtown Analytics: Analytics Consulting for Startups

Fishtown Analytics works with venture-funded startups to implement Redshift, BigQuery, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.

fishtownanalytics.com

Stitch: Simple, powerful ETL built for developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123

The Analytics Engineering Roundup