Data Science Roundup #61: Automating your Workflows! Plus: Uber's new mapping library (+ more!)
Why is machine learning hard? Designing physical spaces with ML. Automating your data science workflows. Uber releases a high-performance mapping library. Running your DS team with Agile. A deep dive into gradient descent.
Enjoy! 😁 😁
This week's best data science articles
I love this post so much. Here’s the spoiler: machine learning is hard because it “is a fundamentally hard debugging problem. Debugging for machine learning happens in two cases: 1) your algorithm doesn’t work or 2) your algorithm doesn’t work well enough. What is unique about machine learning is that it is ‘exponentially’ harder to figure out what is wrong when things don’t work as expected. Compounding this debugging difficulty, there is often a delay in debugging cycles between implementing a fix or upgrade and seeing the result.”
ML is infecting the design of software and networks around the world, but WeWork is going further. The company now has enough real estate and usage data to begin using ML to understand the behavior of humans in their built environment, and will be using this insight as an input into plans for future locations. This post is a quick read, but it’ll leave you thinking deep thoughts about what the world will really look like once AI infects everything.
One of the defining characteristics of a good software engineer is her laziness. Lazy software engineers hate doing things that could be automated and spend significant chunks of their time building tooling to make themselves more efficient. Data scientists often lag on the automation front, and this post is both a call-to-action and a list of suggestions.
Uber has one of the most important datasets in the world, and their data team just released the tool they use to visualize it. deck.gl contains optimizations allowing the rendering of massive map-based datasets that previously couldn’t be handled in-browser. This is a project to watch.
Data science project methodology is a fascinating topic, and one that often gets lost in the buzz about libraries and algorithms. This post discusses applying the Agile software development process to exploratory data analysis. In my opinion, this is the way to do data science. Too many teams either over-plan (wasting time in areas that turn out to be useless) or under-plan (little to no direction). Agile allows a group to learn quickly and converge on solutions aggressively.
Warning: drink ☕ before reading!
Gradient descent is the grandfather of all optimization algorithms. The fundamental insight is fairly straightforward and falls directly out of calculus (find the slope at a point on a curve, then step downhill), but there are many algorithmic variants, and each has its own trade-offs.
This is an important topic, and I’ve never seen a better primer.
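If you want to see the core idea before clicking through, here is a minimal sketch in Python. It is not taken from the linked post: the function name `gradient_descent`, the toy objective f(x) = (x - 3)^2, and the learning rate and step count are all assumptions picked for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Plain (batch) gradient descent on a single variable.

    grad: function returning the slope of the objective at x.
    x0, learning_rate, and steps are illustrative defaults, not tuned values.
    """
    x = x0
    for _ in range(steps):
        # Step against the slope: downhill on the curve.
        x = x - learning_rate * grad(x)
    return x


# Toy objective (an assumption for this sketch): f(x) = (x - 3)^2,
# whose slope is f'(x) = 2 * (x - 3) and whose minimum sits at x = 3.
def slope(x):
    return 2 * (x - 3)


minimum = gradient_descent(slope, x0=0.0)
print(round(minimum, 4))
```

The variants the post covers mostly differ in how the slope is estimated and how the step size is chosen, which is where the trade-offs come from.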
Data viz of the week
A lot of artistic license went into this. Click through to appreciate in full!
Made possible by our sponsors:
Fishtown Analytics works with venture-funded startups to implement Redshift, BigQuery, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.