ML @ Facebook. AI Year in Review. Structuring the data team @ Coursera. Feature Engineering. [DSR #118]

❤️ Want to support us? Forward this email to three friends!

🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.

AI and Deep Learning in 2017 – A Year in Review

If you read a single “2017 year in review” post, this is the one. It’s incredibly exhaustive. Here is just a selection of the topic headings:

  • Reinforcement Learning beats humans at their own games

  • Evolution Algorithms make a Comeback

  • WaveNets, CNNs, and Attention Mechanisms

  • Applications: AI & Medicine, Art & GANs, Self-driving cars

  • Deep Learning, Reproducibility, and Alchemy

  • Artificial Intelligence made in Canada and China

  • Hardware Wars: Nvidia, Intel, Google, Tesla

  • Hype and Failures

  • Startup Investments and Acquisitions


What is the most effective way to structure a data science team?

From 2012 to 2017, I had the privilege to build the Data and Analytics organization at Coursera from scratch. Over that period of time, we experimented with a variety of different team structures as the company grew in size and the business evolved.

This is a common pain point for leaders growing teams, and this is the single most well-informed post I’ve seen on the topic.


Transitioning From Academia to Industry: Perspectives from Indeed’s Data Scientists

Since I left the academy two years ago, some of the more common questions I receive are: “Why did you leave academia?” “What should I do to make myself more hireable?” And the most existential question of all: “Will everything be OK if I leave?”

The most significant difference highlighted by the post, from which all others flowed:

One data scientist commented that her “pay doubled and the amount of work required halved” and another about “how good it feels to not be struggling financially!”

If you’re making the transition, this post is golden. If you know someone who is, share this with them.


Understanding Feature Engineering (Part 1): Continuous Numeric Data


Feature engineering is an essential part of building any intelligent system. Even though newer methodologies like deep learning and meta-heuristics aid in automated machine learning, each problem is domain specific, and better features (suited to the problem) are often the deciding factor in the performance of your system. Feature engineering is an art as well as a science, and this is the reason data scientists often spend 70% of their time in the data preparation phase before modeling.

This post is a great playbook for executing on the feature engineering phase on your next project.
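For readers new to the topic, here is a minimal sketch of two families of transformations commonly applied to continuous numeric features: statistical transforms (here, a log transform) and binning (fixed-width and quantile-based). It is plain Python, and the helper names `log_transform`, `fixed_width_bins`, and `quantile_bins` are hypothetical; this is an illustration, not the post’s own code.

```python
import math

def log_transform(values):
    # Hypothetical helper: log(1 + x) compresses a long right tail.
    # math.log1p raises ValueError for x <= -1, so inputs must exceed -1.
    return [math.log1p(v) for v in values]

def fixed_width_bins(values, width):
    # Hypothetical helper: bin index by integer division,
    # e.g. with width=10, ages 0-9 -> 0, 10-19 -> 1, and so on.
    return [int(v // width) for v in values]

def quantile_bins(values, q):
    # Hypothetical helper: choose q - 1 cut points from the sorted data
    # (nearest-rank style) so each bin holds roughly the same number of
    # values; ties in the data can make bins uneven.
    ranked = sorted(values)
    n = len(ranked)
    cuts = [ranked[min(n - 1, (k * n) // q)] for k in range(1, q)]
    # A value's bin index is the number of cut points it meets or exceeds.
    return [sum(v >= c for c in cuts) for v in values]
```

For example, `fixed_width_bins([3, 15, 27, 40], 10)` puts ages into decades and gives `[0, 1, 2, 4]`, while `quantile_bins` derives its cut points from the data itself, which keeps bin sizes balanced when the distribution is skewed.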


An Introduction to GPU Optimization

I’ll admit it: I’ve never written a line of GPU-accelerated code. I understand the concept (many cores, slower clock speed), but have never had a reason to dive in myself. This post is an awesome look at the basics of writing GPU-accelerated code. Whether or not you plan on needing this tool in your box, this is a worthwhile read—it’ll demystify the topic.



10 Best Data Visualization Projects of 2017


Nathan Yau of FlowingData is the single voice I credit most when it comes to assessing data viz. Here’s his 2017 list—if you’ve been following the Roundup through the year you’ll recognize several of these.

I need to start putting more effort into making my own custom diagrams; this is great inspiration. What’s your data New Year’s resolution?


Analyze one year of radio station songs with SQL, Spark, Spotify, and Databricks

One crazy data scientist set out to answer the question we’ve all wondered about: didn’t I just hear that song two hours ago? The author scraped the websites of four French radio stations, then enriched the data via the Spotify API. The results are really fascinating. The #1 French radio station repeats its songs an average of 3.5 times per day and allocates fully 15 minutes out of every hour to commercials.

This research is exhaustive, and was definitely a ton of work to produce. Very impressive.


Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective

I’ve never seen anything quite like this: this paper is a walkthrough of how ML @ Facebook works, all the way down to server designs. It’s a dense read, but pretty damn unique—it’s not often that one of the giants actually shares this much info on their internal platforms.


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.


915 Spring Garden St., Suite 500, Philadelphia, PA 19123