Data Science Roundup #64: 2016 Deep Learning Advances, Major OpenAI Release & more!

Interesting week! Major advancements in deep learning in 2016; OpenAI makes a major new release; catching aberrant trains with data; an extensive guide to data pipelines; dimensionality reduction techniques; how people like you spend their time.

If you’ve been sent this newsletter by a friend, do me a favor and sign up. It’s your subscriptions that keep The Data Science Roundup growing!

Thanks 😁 😁

- Tristan

PS: If you’re in Philly, come join me this Wednesday as I teach a session called SQL for Analysts! Sign up here.

This week's best data science articles

The Major Advancements in Deep Learning in 2016

This is a stellar post, well worth reading. I’m including the introduction in full here in the hopes that you’ll take the time to read the entire post.

Deep Learning has been the core topic in the Machine Learning community the last couple of years and 2016 was not the exception. In this article, we will go through the advancements we think have contributed the most (or have the potential) to move the field forward and how organizations and the community are making sure that these powerful technologies are going to be used in a way that is beneficial for all.

One of the main challenges researchers have historically struggled with has been unsupervised learning. We think 2016 has been a great year for this area, mainly because of the vast amount of work on Generative Models.

Moreover, the ability to naturally communicate with machines has been also one of the dream goals and several approaches have been presented by giants like Google and Facebook. In this context, 2016 was all about innovation in Natural Language Processing (NLP) problems which are crucial to reach this goal.


Introducing Universe

Wow. OpenAI is living up to the hype in its first year. Its most recent release, Universe, is an environment that allows “an AI agent to use a computer like a human does: by looking at screen pixels and operating a virtual keyboard and mouse. We must train AI systems on the full range of tasks we expect them to solve, and Universe lets us train a single agent on any task a human can complete with a computer.”

This is a very directed step by OpenAI towards Artificial General Intelligence: “If we are to make progress towards generally intelligent agents, we must allow them to experience a wide repertoire of tasks so they can develop world knowledge and problem solving strategies that can be efficiently reused in a new task.”


How the Circle Line rogue train was caught with data

This post is insanely cool. It’s a data “whodunnit” featuring a team of analysts trying to get to the bottom of a problem plaguing Singapore’s train system. The plot starts with a spate of emergency braking incidents that had travelers on the edge of their seats, then adds a couple of CSV files and a Jupyter Notebook…

I’m not going to tell you what happens from there because, honestly, I don’t want to spoil the plot! It’s that cool.

Why you need a data pipeline and which one you should choose

Data pipelines have become an increasingly important part of the modern analytics stack, and this post is a much-needed resource in deciding on yours. If you need a primer on what a data pipeline is, why you need one, or how to choose one, read the post itself. Or, skip right to the amazing matrix at the end that shows today’s top data pipelines and compares them feature-by-feature.

Thanks to Fletcher @ Galvanize for pulling this together! 👏 👏


Dimensionality Reduction and Intuition

Stellar article by the team @ Fast Forward Labs on dimensionality reduction. Any time you’re dealing with data that has more than three dimensions, dimensionality reduction techniques are critical to making sense of it. Included in the post is a link to a new Google tool called Embedding Projector that is well worth a look.

If you’ve never run a PCA this post is a must-read.
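
If you’d like to see the idea in miniature first, here is a minimal sketch of PCA using NumPy’s SVD. It is not taken from the Fast Forward Labs post; the `pca` helper and the toy data (200 points in five dimensions that mostly vary along two hidden directions) are made up purely for illustration.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto their top principal components (via SVD)."""
    # PCA works on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by variance explained.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Share of total variance captured by each component.
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained_ratio[:n_components]


# Hypothetical toy data: two latent directions mixed into five dimensions,
# plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

projected, ratio = pca(X, n_components=2)
print(projected.shape)  # shape of the reduced data
print(ratio.sum())      # fraction of variance kept by two components
```

Because the toy data really only has two underlying directions, two components should keep nearly all of the variance. Real data is rarely that tidy, which is where the intuition in the post comes in.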


🕒 How People Like You Spend Their Time

A short read to wrap up this week’s issue: this is a neat tool to see how your time usage compares with others in your demographic groups. Yay open data!


Data viz of the week

The US needs new bridges 🙁  Click through for more impressive infrastructure viz.


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

Fishtown Analytics works with venture-funded startups to implement Redshift, BigQuery, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, powerful ETL built for developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.



915 Spring Garden St., Suite 500, Philadelphia, PA 19123