Forecasting at Uber. Reproducible ML. Optimizers on AI. Column-Store Databases. GDPR. [DSR #155]

Sep 30, 2018


This Week's Most Useful Posts

Forecasting at Uber: An Introduction

In this article, we provide a general overview of how our teams leverage forecasting to build better products and maintain the health of the Uber marketplace.

This is an amazing behind-the-scenes look at how Uber does forecasting. Includes detailed thoughts on model selection, training approach, uncertainty estimation, and more. Must-read.

eng.uber.com

Help! I can’t reproduce a machine learning project!

Data science code doesn’t always produce the same results when you run it multiple times, which can make debugging a nightmare. This article goes into the three areas that cause reproducibility problems in code—non-deterministic code, changes in data, and changes in environment—and gives practical tips for improving reproducibility.

I love this post. Such an important topic, and I’ve never actually seen a good post that covers it before.

blog.kaggle.com
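To make the first of those three causes concrete, here is a minimal sketch of taming non-deterministic code with a fixed seed. It is not taken from the linked post; the `sample_split` helper and its parameters are hypothetical, and it uses only Python's standard library.

```python
import random


def sample_split(n_rows, test_fraction, seed):
    """Pick test-set row indices using a private, seeded generator.

    Hypothetical helper for illustration only.
    """
    # A local Random instance is not disturbed by other code calling random.*
    rng = random.Random(seed)
    k = int(n_rows * test_fraction)
    return sorted(rng.sample(range(n_rows), k))


first = sample_split(100, 0.2, seed=42)
second = sample_split(100, 0.2, seed=42)
print(first == second)  # the same seed gives the same split on every run
```

A seed only addresses the first cause. Changes in data and changes in environment need their own fixes, such as versioned datasets and pinned dependencies.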

Illustrated Guide to LSTM’s and GRU’s

In this post, we’ll start with the intuition behind LSTM’s and GRU’s. Then I’ll explain the internal mechanisms that allow LSTM’s and GRU’s to perform so well. If you want to understand what’s happening under the hood for these two networks, then this post is for you.

A ton of effort went into the illustrations in this post, and they add a lot. New to GRUs? Gated Recurrent Unit networks are an innovation on top of RNNs.

towardsdatascience.com

The Crossroads of AI and Database Algorithms: Query Optimization

tl;dr: We observed that Dynamic Programming is the common base of both database query optimization and reinforcement learning. Based on this, we designed a deep reinforcement learning algorithm for database query optimization we call DQ. We show that DQ is highly effective and more generally adaptable than any of the prior approaches in the database literature.

Database research meets AI. Academic; points towards the future.

databeta.wordpress.com

The design and implementation of modern column-oriented database systems

Have you ever found yourself answering the question “But why is Redshift (or Snowflake or BigQuery, etc.) faster than Postgres?” I answer this question a lot, multiple times a week. This is the single best answer to that question I’ve ever seen, and it goes deep into topics like column pruning and compression. Dense; very worthwhile.

blog.acolyer.org
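For a rough feel of what column pruning and compression mean, here is a toy sketch. It is not from the linked paper; the table, the column names, and the `run_length_encode` helper are all made up for illustration, and real column stores are far more sophisticated.

```python
from itertools import groupby

# Hypothetical toy table in row-oriented form (one dict per row).
rows = [
    {"country": "US", "plan": "free", "amount": 0},
    {"country": "US", "plan": "pro", "amount": 10},
    {"country": "US", "plan": "pro", "amount": 10},
    {"country": "DE", "plan": "free", "amount": 0},
    {"country": "DE", "plan": "pro", "amount": 25},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# Column pruning: a query like SELECT sum(amount) reads one column only.
total = sum(columns["amount"])

# Compression: sorted, low-cardinality columns shrink to a few pairs.
encoded = run_length_encode(columns["country"])

print(total)    # 45
print(encoded)  # [('US', 3), ('DE', 2)]
```

A row store has to read every field of every row to answer the same query, which is one intuition for the speed difference on analytical workloads.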

Machine learning and the right to explanation in GDPR

…the right to an explanation as defined in GDPR may be harder than expected to implement. This does not invalidate the basic premise that individuals have a right to know what is being done with their data, but – particularly with novel machine learning techniques – it means that we need to look beyond simple calls for transparency.

Since GDPR went fully into effect, conversations about its exact implications for data processors and controllers have been ongoing. One of them concerns the “right to an explanation”: there is debate as to whether GDPR provides such a right at all. The main article (linked in the headline) discusses the difficulties in providing such a right in the context of modern ML.

This is not a settled topic, and probably doesn’t impact you today. But it’s an important topic to follow, and the authors at Open Rights Group are the most authoritative voices I’ve seen on it.

www.openrightsgroup.org

Data Viz of the Week

Unusually good map: immediately draws attention to the salient points.

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.

www.fishtownanalytics.com

Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.


