What's Next in Analytics Tech. ML @ Spotify. Developing Talent. [DSR #116]

Welcome to the last new issue of 2017! I’ll be taking a break next week and then returning the following week with the top posts from the past year. Enjoy the issue and I’ll see you in a couple of weeks! Happiness and health to you and yours.

- Tristan

The Missing Layers of the Analytics Stack

I’ve been thinking a lot about the analytics stack recently. Last week I linked to a blog post about how the modern analytics stack is modular, which started me thinking about what layers we’re still missing. This is my list:

  • Automated data cleansing

  • Column- and row-level data security

  • Automated entity resolution

  • Data enrichment

  • Anomaly detection and notification

  • Data (re-)integration

  • Data provenance

Read the full article for my complete thinking. I’d love any and all feedback.


Grow Your Own Experts

This post provides one possible answer to your company’s shortage of data talent: grow it internally from adjacent fields. We’re pursuing this strategy in hiring for Fishtown Analytics. After realizing just how few people have the hard skills we need, we’re putting a ton of time and energy into our talent pipeline and training. Building out the team is going to be a process, but we’re confident we’ll be happy with the results.

If you need data talent, you might want to do the same—just realize that this will require lots of work and long-term planning.


Opening the Google AI China Center

Speaking of talent, Google just opened an AI research center in China. From Fei-Fei Li, Chief Scientist @ Google Cloud:

I believe AI and its benefits have no borders. Whether a breakthrough occurs in Silicon Valley, Beijing or anywhere else, it has the potential to make everyone’s life better for the entire world. As an AI first company, this is an important part of our collective mission. And we want to work with the best AI talent, wherever that talent is, to achieve it.

This has become a serious arms race. See chart below.


Machine Learning 101

I don’t link to a lot of “Neural Networks Explained” content because I assume most of you, by now, have covered the basics. This slide deck really made the rounds this week, though; it’s by a designer at Google and is an excellent intro. Go through it yourself or share it with the uncle who asked what you do for a living when you saw him at Thanksgiving.


Machine learning at Spotify: You are what you stream

I’ve been listening to O’Reilly’s Data Show Podcast for over two years at this point. It’s consistently solid, but I thought this past week’s episode was particularly good. Ben interviewed Christine Hung from Spotify, who shared excellent practical advice on how she runs the data org there. Skip the summary and head straight to the episode.


Introducing the SWD Podcast: Storytelling with Data

Sorry to be a bit of a podcast index this week, but this is very cool: I like the SWD blog and am very excited to check out the podcast. Visualization is still an under-exposed topic.


Multivariate Map Collection

Here is my attempt to collect examples of multivariate maps I’ve found and organize them into a loose categorization. With this collection, I am just trying to enumerate the various methods that have been attempted, without too much judgement as to whether it is a ‘good’ or ‘bad’ encoding.

Many of these maps are fairly…hideous…but I do appreciate the opportunity to see so many takes on the question “how do I display multi-dimensional geographical data?” The above is my favorite.


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
