Forecasting at Uber. Reproducible ML. Optimizers on AI. Column-Store Databases. GDPR. [DSR #155]

❤️ Want to support us? Forward this email to three friends!

🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.

This Week's Most Useful Posts

Forecasting at Uber: An Introduction

In this article, we provide a general overview of how our teams leverage forecasting to build better products and maintain the health of the Uber marketplace.

This is an amazing behind-the-scenes look at how Uber does forecasting. Includes detailed thoughts on model selection, training approach, uncertainty estimation, and more. Must-read.
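Model selection is one of the topics the Uber post covers. As a generic illustration only, here is a minimal sketch of rolling-origin backtesting, a common way to compare forecasting models. This is not Uber's code or method. The two baseline forecasters, the helper names, and the toy weekly series are all hypothetical. At each cutoff, every model forecasts the next window from a fixed-length training window, and the model with the lowest average error wins.

```python
# Rolling-origin (sliding-window) backtest comparing two baseline forecasters.
# Illustrative only: the models, helper names, and data are hypothetical.

def naive_forecast(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon

def seasonal_naive_forecast(history, horizon, season=7):
    """Repeat the value from one season earlier (e.g. same weekday last week)."""
    return [history[-season + (h % season)] for h in range(horizon)]

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def backtest(series, forecaster, window, horizon, step):
    """Average MAE over successive cutoffs, training on a fixed-length window."""
    errors = []
    for cutoff in range(window, len(series) - horizon + 1, step):
        train = series[cutoff - window:cutoff]      # most recent `window` points
        actual = series[cutoff:cutoff + horizon]    # the points we try to predict
        errors.append(mean_absolute_error(actual, forecaster(train, horizon)))
    return sum(errors) / len(errors)

# Toy series with a weekly pattern plus a slow upward trend (hypothetical data).
weekly = [10, 12, 14, 13, 20, 30, 25]
series = [v + week for week in range(6) for v in weekly]

for name, model in [("naive", naive_forecast), ("seasonal naive", seasonal_naive_forecast)]:
    print(name, round(backtest(series, model, window=14, horizon=7, step=7), 2))
```

On this strongly weekly series, the seasonal baseline wins by a wide margin. Sliding a fixed window rather than always training on all history is a design choice; an expanding window is the other common option.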


Help! I can’t reproduce a machine learning project!

Data science code doesn’t always produce the same results when you run it multiple times, which can make debugging a nightmare. This article goes into the three areas that cause reproducibility problems in code—non-deterministic code, changes in data, and changes in environment—and gives practical tips for improving reproducibility.

I love this post. Such an important topic, and I’ve never actually seen a good post that covers it before.
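For the first of those three areas, non-deterministic code, a common fix is to make randomness explicit and seeded. The sketch below assumes a hypothetical `train_test_split` helper built on Python's standard `random` module. It uses a dedicated `random.Random(seed)` instance, so other code calling the global `random.*` functions cannot change the split. It does nothing for the other two areas (changing data and changing environments), which need things like data snapshots and pinned dependencies.

```python
import random

def train_test_split(rows, test_fraction, seed):
    """Hypothetical helper: shuffle with a dedicated, seeded generator so the split is repeatable."""
    rng = random.Random(seed)  # local generator, isolated from global random state
    shuffled = list(rows)      # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(20))
first = train_test_split(rows, 0.25, seed=42)
second = train_test_split(rows, 0.25, seed=42)
print(first == second)  # True: same seed, same split
```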


Illustrated Guide to LSTM’s and GRU’s


In this post, we’ll start with the intuition behind LSTM’s and GRU’s. Then I’ll explain the internal mechanisms that allow LSTM’s and GRU’s to perform so well. If you want to understand what’s happening under the hood for these two networks, then this post is for you.

There’s a ton of effort put into the illustrations in this post, and it pays off. New to GRUs? Gated Recurrent Units are a variant of recurrent neural networks (RNNs) that, like LSTMs, use gates to control what information is kept and what is discarded.
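To complement the illustrations, here is a minimal pure-Python sketch of a single GRU time step. It follows one common formulation: an update gate z, a reset gate r, a candidate state, and a blend of old and candidate states. Papers and libraries differ slightly, for example in where the reset gate is applied and whether z multiplies the old state or the candidate. The parameter names (`W_z`, `U_z`, `b_z`, and so on) and the tiny weights are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def gru_step(x, h, p):
    """One GRU step. p holds input weights W_*, recurrent weights U_*, and biases b_*."""
    # Update gate: how much of the hidden state gets overwritten.
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["W_z"], x), matvec(p["U_z"], h), p["b_z"])]
    # Reset gate: how much of the old state to use when proposing new content.
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["W_r"], x), matvec(p["U_r"], h), p["b_r"])]
    # Candidate state from the input and the reset-scaled previous state.
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + b + c) for a, b, c in zip(matvec(p["W_h"], x), matvec(p["U_h"], rh), p["b_h"])]
    # Blend: z near 1 means "mostly take the candidate", z near 0 means "keep the old state".
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

# Hypothetical tiny parameters: input size 1, hidden size 2.
params = {
    "W_z": [[0.5], [-0.3]], "U_z": [[0.1, 0.0], [0.0, 0.1]], "b_z": [0.0, 0.0],
    "W_r": [[0.2], [0.4]],  "U_r": [[0.0, 0.1], [0.1, 0.0]], "b_r": [0.0, 0.0],
    "W_h": [[1.0], [-1.0]], "U_h": [[0.3, 0.0], [0.0, 0.3]], "b_h": [0.0, 0.0],
}
h = [0.0, 0.0]
for x_t in [1.0, 0.5, -1.0]:
    h = gru_step([x_t], h, params)
print(h)
```

Because the new state is a convex blend of the old state and a tanh-bounded candidate, it stays within (-1, 1) when it starts at zero.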


The Crossroads of AI and Database Algorithms: Query Optimization

tl;dr: We observed that Dynamic Programming is the common base of both database query optimization and reinforcement learning. Based on this, we designed a deep reinforcement learning algorithm for database query optimization we call DQ. We show that DQ is highly effective and more generally adaptable than any of the prior approaches in the database literature.

Database research meets AI. Academic, but it points toward the future.
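For readers who haven't seen the database side of that analogy, here is a minimal sketch of classic dynamic-programming join ordering (System R / Selinger style, restricted to left-deep plans). It is the traditional baseline, not DQ. The idea is that the best plan for a set of tables is built from the best plans for its subsets, which is the same principle of optimality behind the Bellman equation in RL. The cost model is a deliberately crude assumption: a single uniform join selectivity, with cost equal to the sum of intermediate result sizes. The table names and sizes are hypothetical.

```python
from itertools import combinations

def best_join_order(cardinalities, selectivity):
    """DP over subsets of tables (left-deep plans). Toy cost = sum of intermediate result sizes."""
    best = {}  # frozenset of tables -> (cost, estimated rows, plan string)
    for name, rows in cardinalities.items():
        best[frozenset([name])] = (0, rows, name)
    names = list(cardinalities)
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            s = frozenset(subset)
            for last in subset:
                # Reuse the already-optimal plan for the smaller subset, then join `last`.
                left_cost, left_rows, left_plan = best[s - {last}]
                out_rows = left_rows * cardinalities[last] * selectivity
                cost = left_cost + out_rows
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, out_rows, f"({left_plan} JOIN {last})")
    return best[frozenset(names)]

# Hypothetical tables and a made-up uniform selectivity.
plan_cost, plan_rows, plan = best_join_order(
    {"orders": 1000, "customers": 100, "nations": 25}, selectivity=0.01
)
print(plan, plan_cost)
```

Exhaustive enumeration like this grows exponentially with the number of tables, which is part of why a learned approach is attractive.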


The design and implementation of modern column-oriented database systems


Have you ever found yourself answering the question “But why is Redshift (or Snowflake, or BigQuery, etc.) faster than Postgres?” I answer this question a lot: multiple times a week. This is the single best answer to that question I’ve ever seen, and it goes deep into topics like column pruning and compression. Dense; very worthwhile.
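As a toy, plain-Python sketch of two ideas the paper covers (not how any real engine is implemented), the code below stores the same hypothetical orders table row-wise and column-wise. A query that needs only two columns reads only those two (column pruning). A sorted, low-cardinality column compresses well with run-length encoding, and a count can be computed directly on the compressed runs. Real systems do this over disk blocks with vectorized execution. The table, column names, and helper functions are all hypothetical.

```python
# Hypothetical orders table, sorted by status.
records = [(1, "shipped", 20.0), (2, "shipped", 35.5), (3, "shipped", 12.5),
           (4, "returned", 40.0), (5, "pending", 8.0), (6, "pending", 15.0)]
fields = ("order_id", "status", "amount")

# Row layout: each record's fields sit together (good for fetching whole records).
rows = [dict(zip(fields, r)) for r in records]
# Column layout: each field's values sit together (good for scanning a few columns).
columns = {f: [row[f] for row in rows] for f in fields}

def sum_where(columns, filter_col, filter_val, agg_col):
    """Column pruning: only the two referenced columns are touched."""
    return sum(a for f, a in zip(columns[filter_col], columns[agg_col]) if f == filter_val)

def run_length_encode(values):
    """Compress consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

def count_by_value_rle(runs):
    """Aggregate directly on compressed runs, without decompressing."""
    counts = {}
    for value, length in runs:
        counts[value] = counts.get(value, 0) + length
    return counts

status_runs = run_length_encode(columns["status"])
print(status_runs)
print(sum_where(columns, "status", "shipped", "amount"))
print(count_by_value_rle(status_runs))
```

Sorting by the compressed column is what makes the runs long. This is one reason column stores care about sort order.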


Machine learning and the right to explanation in GDPR

…the right to an explanation as defined in GDPR may be harder than expected to implement. This does not invalidate the basic premise that individuals have a right to know what is being done with their data, but – particularly with novel machine learning techniques – it means that we need to look beyond simple calls for transparency.

Since GDPR fully went into effect, there has been an ongoing conversation about exactly what it implies for data processors and controllers. One point of contention is the “right to an explanation”: there is debate as to whether GDPR provides such a right at all. The main article (linked in the headline) discusses the difficulties of providing such a right in the context of modern ML.

This is not a settled topic, and it probably doesn’t affect you today. But it’s an important topic to follow, and the authors at Open Rights Group are the most authoritative voices I’ve seen on it.


Data Viz of the Week

Unusually good map: immediately draws attention to the salient points.


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.




Powered by Revue

915 Spring Garden St., Suite 500, Philadelphia, PA 19123