SEO Experimentation @ Airbnb. AI Safety @ DeepMind. Hexagons(!?) Building Your Own GPU Rig. [DSR #156]

❤️ Want to support us? Forward this email to three friends!

🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.

This Week's Most Useful Posts

Airbnb: Experimentation & Measurement for Search Engine Optimization

This is an insanely good post—by far the smartest thing I’ve ever seen written about SEO experimentation.

SEO (search engine optimization) is an incredibly hard problem to deal with quantitatively. It doesn’t lend itself to traditional digital experimentation approaches because you can’t serve two different versions of a page to a search engine and see which version results in more traffic! Because of this inability to apply standard experimentation approaches, most companies just don’t really try very hard to treat SEO with any quantitative rigor.

Airbnb solves this problem with clever experimental design and thoughtful stats. Must read—I’ve really never read such a thoughtful post on this topic.

medium.com

Building safe artificial intelligence: specification, robustness, and assurance

In this inaugural post, we discuss three areas of technical AI safety: specification, robustness, and assurance.

Written by the DeepMind safety team, some of the smartest people thinking about this topic today. They present an excellent structure within which AI safety can be studied and worked on.

medium.com

Prefect: Positive and Negative Data Engineering

One of the leaders of the Apache Airflow project has been working on a (potentially?) new and improved data engineering platform called Prefect. Its focus: detect when pipeline issues arise without forcing you to write a million conditionals to detect every possible error scenario.

The post is interesting, but it doesn’t really explain how that secret sauce actually works. I’m a little skeptical at this juncture, but at the same time I have a lot of respect for the author’s background in the problem space. I’d recommend following Prefect if you’re an Airflow user today.

medium.com

H3: Uber’s Hexagonal Hierarchical Spatial Index

Uber developed H3, our open source grid system for optimizing ride pricing and dispatch, to make geospatial data visualization and exploration easier and more efficient.

This is super-cool. Uber built an entirely new way to look at geospatial data to meet the specific needs of the ride-hailing business. Some of the decisions they had to make along the way and how they thought about them are really fascinating. Especially: why hexagons?? Games and GPUs love triangles, so…why hexagons?! Turns out, there’s actually a really good answer to that.
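
One property often cited in favor of hexagonal grids is that every neighbor of a cell sits at the same center-to-center distance, whereas a square grid has two distinct neighbor distances (edge neighbors and corner neighbors). The sketch below is my own toy illustration on a flat plane, not Uber's code and not the H3 library; the function names and the unit cell sizes are assumptions made for the example.

```python
import math


def square_neighbor_distances():
    """Distinct center-to-center distances from a unit-square cell to its 8 neighbors."""
    offsets = [
        (dx, dy)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    ]
    return sorted({round(math.hypot(dx, dy), 6) for dx, dy in offsets})


def hex_neighbor_distances():
    """Distinct center-to-center distances from a hexagonal cell to its 6 neighbors.

    Uses axial coordinates (q, r) on a flat, pointy-top hexagon grid whose
    center-to-corner size is 1 (an assumption for this toy example).
    """
    axial_neighbors = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
    distances = set()
    for q, r in axial_neighbors:
        # Standard axial-to-Cartesian conversion for pointy-top hexagons.
        x = math.sqrt(3) * (q + r / 2)
        y = 1.5 * r
        distances.add(round(math.hypot(x, y), 6))
    return sorted(distances)


if __name__ == "__main__":
    print("square grid neighbor distances:", square_neighbor_distances())
    print("hex grid neighbor distances:   ", hex_neighbor_distances())
```

With a single neighbor distance, "one ring out" means the same thing in every direction, which is convenient for smoothing and movement analysis. The real H3 grid lives on a sphere, so treat this only as intuition.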

eng.uber.com

Why building your own Deep Learning Computer is 10x cheaper than AWS

If you’ve used, or are considering, AWS/Azure/GCloud for Machine Learning, you know how crazy expensive GPU time is. And turning machines on and off is a major disruption to your workflow. There’s a better way. Just build your own Deep Learning Computer. It’s 10x cheaper and also easier to use. Let’s take a closer look below.

Whoa, building your own is actually cheaper? I haven’t built my own computer since…college? The fascinating part of the article is why this is true:

There’s a reason why datacenters are expensive: they are not using the GeForce 1080 Ti. Nvidia contractually prohibits the use of GeForce and Titan cards in datacenters. So Amazon and other providers have to use the $8,500 datacenter version of the GPUs, and they have to charge a lot for renting it. This is customer segmentation at its finest, folks!

Huh. I’d be a little surprised—assuming this is in fact true—if this situation continued for long. Even if it actually does make economic sense to build (vs rent) for the moment, my guess is that won’t be true in 2 years.
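
The build-vs-rent question comes down to a break-even calculation: how many GPU-hours do you need to use before the upfront hardware cost is paid back by the hourly cloud fees you avoid? Here is a minimal sketch of that arithmetic. The function name and every dollar figure in it are hypothetical placeholders of mine, not numbers from the article, so plug in your own.

```python
def break_even_hours(build_cost, cloud_rate_per_hour, power_cost_per_hour=0.0):
    """Return the GPU-hours of use after which owning is cheaper than renting.

    build_cost:           one-time cost of the self-built machine
    cloud_rate_per_hour:  hourly price of a comparable cloud GPU instance
    power_cost_per_hour:  electricity cost of running your own machine
    """
    saving_per_hour = cloud_rate_per_hour - power_cost_per_hour
    if saving_per_hour <= 0:
        # Owning never pays back if each hour at home costs as much as renting.
        raise ValueError("no break-even: renting is not more expensive per hour")
    return build_cost / saving_per_hour


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show how the function is used.
    hours = break_even_hours(
        build_cost=3000.0,
        cloud_rate_per_hour=3.0,
        power_cost_per_hour=0.5,
    )
    print(f"Break-even after roughly {hours:.0f} GPU-hours")
```

The result is very sensitive to the cloud rate, which is why a drop in datacenter GPU pricing could flip the conclusion.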

medium.com

xkcd: Data Pipeline

…hits a little close to home, eh? This got posted in every Slack channel that I participate in over the past week 😛

xkcd.com

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.

www.fishtownanalytics.com

Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123