ML @ Facebook. AI Year in Review. Structuring the data team @ Coursera. Feature Engineering. [DSR #118]
If you read a single “2017 year in review” post, this is the one; it’s incredibly exhaustive. Here is just a selection of the topic headings:
Reinforcement Learning beats humans at their own games
Evolution Algorithms make a Comeback
WaveNets, CNNs, and Attention Mechanisms
Applications: AI & Medicine, Art & GANs, Self-driving cars
Deep Learning, Reproducibility, and Alchemy
Artificial Intelligence made in Canada and China
Hardware Wars: Nvidia, Intel, Google, Tesla
Hype and Failures
Startup Investments and Acquisitions
From 2012 to 2017, I had the privilege to build the Data and Analytics organization at Coursera from scratch. Over that period of time, we experimented with a variety of different team structures as the company grew in size and the business evolved.
This is a common pain point for leaders growing teams, and this is the single most well-informed post I’ve seen on the topic.
Since I left the academy two years ago, some of the more common questions I receive are: “Why did you leave academia?” “What should I do to make myself more hireable?” And the most existential question of all: “Will everything be OK if I leave?”
The most significant difference highlighted by the post, from which all others flowed:
One data scientist commented that her “pay doubled and the amount of work required halved” and another about “how good it feels to not be struggling financially!”
If you’re making the transition, this post is golden. If you know someone who is, share this with them.
Feature engineering is an essential part of building any intelligent system. Even though newer methodologies like deep learning and meta-heuristics aid in automated machine learning, each problem is domain specific, and better features (suited to the problem) are often the deciding factor in the performance of your system. Feature engineering is an art as well as a science, and this is the reason data scientists often spend 70% of their time in the data preparation phase before modeling.
This post is a great playbook for executing the feature engineering phase of your next project.
I’ll admit it: I’ve never written a line of GPU-accelerated code. I understand the concept (many cores, slower clock speed), but have never had a reason to dive in myself. This post is an awesome look at the basics of writing GPU-accelerated code. Whether or not you expect to need this tool in your toolbox, it’s a worthwhile read that will demystify the topic.
Nathan Yau of FlowingData is the single voice I credit most when it comes to assessing data viz. Here’s his 2017 list—if you’ve been following the Roundup through the year you’ll recognize several of these.
I need to start putting more effort into making my own custom diagrams; this is great inspiration. What’s your data New Year’s resolution?
One crazy data scientist set out to answer the question we’ve all wondered about: didn’t I just hear that song two hours ago? The author scraped the websites of four French radio stations, then enriched the data via the Spotify API. The results are fascinating. The #1 French radio station repeats its songs an average of 3.5 times per day and allocates fully 15 minutes out of every hour to commercials.
This research is exhaustive, and was definitely a ton of work to produce. Very impressive.
I’ve never seen anything quite like this: this paper is a walkthrough of how ML @ Facebook works, all the way down to server designs. It’s a dense read, but pretty damn unique—it’s not often that one of the giants actually shares this much info on their internal platforms.
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.