Learning Dexterity. Data Ethics. Black Box Models. Land Usage. [DSR #147]

Tristan Handy

Aug 05, 2018


This Week's Most Useful Posts

OpenAI: Learning Dexterity

We’ve trained a human-like robot hand to manipulate physical objects with unprecedented dexterity.

This publication got a ton of press over the past week, but the writeup on the OpenAI blog is much more substantive than what you’ll read elsewhere. The demonstrations are immediately impressive and well worth clicking through to see.

blog.openai.com

Data's Day of Reckoning

We can build a future we want to live in, or we can build a nightmare. The choice is up to us.

This article has been making the rounds this week, so I wanted to make sure to link to it. It’s a worthwhile read, and the topic is very important to the industry. I have a somewhat contrarian take on it, so please bear with me if I say a bit more about this piece than usual.

Let me first say that the article’s three authors (Hillary Mason, DJ Patil, and Mike Loukides) are all much smarter and more plugged-in than I am. The article is thoughtful, and it rightly points out the futures that our increasing data sophistication makes possible. I agree with their sense of urgency.

Even so, I’m not sure I agree with the article’s prescriptions. It is easy to recommend that training programs teach data ethics and that companies adopt guiding principles for data ethics and bake them into their corporate cultures. The issue is not that these recommendations are wrong; it is that they are fundamentally insufficient, so much so that they don’t meaningfully address the problem.

The issue here is incentives. Data tech today is asymmetric and opaque: a single party owns a data set and applies it in ways that are largely unknown to outsiders. This asymmetry paired with data’s economies of scale has created one of the most valuable sources of competitive differentiation in the history of capitalism. The incentives for the data owner to use their data with dubious ethical standards—or to fail to consider ethics altogether—are tremendous.

The power of John D. Rockefeller wasn’t counterbalanced by a growing culture of ethics. It was eventually restrained by powerful anti-trust legislation and the breakup of Standard Oil. It’s not clear (to me) exactly what is required to constrain the actions of data owners, but I’m fairly sure it will be rather more extreme than this post suggests.

www.oreilly.com

The Blacker the Box

Another amazing post by Michael Kaminsky:

There has been a lot of discussion in the data science community about the use of black-box models, and there is lots of really fascinating ongoing research into methods, algorithms, and tools to help data scientists better introspect their models. While those discussions and that research are important, in this post I discuss the macro-framework I use for evaluating how black the box can be for a prediction product.

The faster the feedback on prediction accuracy, the blacker the box can be. The slower the feedback, the more your models should be explicit and formal.

Practical.

www.locallyoptimistic.com

New Data Podcast! — In Context

A podcast exploring the latest developments in Artificial Intelligence, within the context of the personal biographies, motivations, world views and beliefs of the leading researchers, practitioners, and entrepreneurs in the field.

So far they’ve released 9 episodes, and I just finished listening to the two most recent. In episode 9, Stitch Fix Chief Algorithms Officer Eric Colson gives a great interview.

I’m hopeful that this will become one of my regular weekly listens.

soundcloud.com

L1: Tensor Studio: A playground for tensor computations

From the project description:

L1 is a playground for differentiable linear algebra, heavily used in Machine Learning. This project is a combination of a programming language, interpreter, standard library and IDE in one unified experience.

The goal? Only to “become the standard tool for prototyping new Machine Learning ideas.” You know, nothing too ambitious :)

We are still in the very early days of providing good tooling for machine learning. Algorithms, data, and compute resources make machine learning possible, while good tooling makes it accessible and efficient.

github.com

Data Viz of the Week

“Here’s How America Uses Its Land.” The article goes much deeper.

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.

fishtownanalytics.com

Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com


The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.


915 Spring Garden St., Suite 500, Philadelphia, PA 19123

The Analytics Engineering Roundup
