Data Science Roundup #84: How to Build your Data Team, 2 ML Products & an Intro to Spatial Algos

This week includes lots of advice on building your data team, a topic that I’ve become obsessed with. Plus a couple of useful applications of ML and some solid data journalism from the NYT.

Enjoy 😀

- Tristan

Referred by a friend? Sign up here!

Two Posts You Can't Miss

What are the challenges of building a data team at a startup?

This is a topic I think a lot about, and I happened across this fantastic set of Quora answers today. All three of the top responses are incredible. Some selected quotes:

Don’t expect anyone in the organization to understand exactly the value you will bring and the specific things you are going to work on to bring that value. It is your job to shine a light on the areas you can help the company innovate.

You have to make sure you are working on what will actually drive the business forward - objectively, what is best for the business? This means learning how to say “No”.

The thing that has become incredibly clear to me over the past year is that there is also a severe talent shortage in the industry: not a lack of people who know R, Python, or SQL, but a lack of people who have experience using modern tools to solve real problems. Right now I believe that sourcing and training talent is the hardest problem in growing a startup analytics team. I’m working on a post about this now; if you have thoughts on it, please email me.

I originally found this article linked from Monica Rogati’s recent and wonderful piece How Not to Hire Your First Data Scientist, which is also a must-read.


Every single Machine Learning course on the internet, ranked by your reviews

Want to choose an ML course? This is the single best resource on the internet.

For this guide, I spent a dozen hours trying to identify every online machine learning course offered as of May 2017, extracting key bits of information from their syllabi and reviews, and compiling their ratings. My end goal was to identify the three best courses available and present them to you, below.

For this task, I turned to none other than the open source Class Central community, and its database of thousands of course ratings and reviews.


This Week's Top Posts

A Dive into Spatial Search Algorithms

Given thousands of points, such as city locations, how do we retrieve the closest points to a given query point? An intuitive way to do this is:

  • Calculate the distances from the query point to every other point.

  • Sort those points by distance.

  • Return the first K items.
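The three-step approach above translates almost directly into code. This is a minimal sketch, assuming points are planar (x, y) tuples and using straight-line Euclidean distance via `math.dist`; real city coordinates given as latitude/longitude would need a great-circle distance instead. The function name `knn_brute_force` is hypothetical and not taken from the linked post.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def knn_brute_force(points: List[Point], query: Point, k: int) -> List[Point]:
    """Return the k points closest to `query` using the naive three-step approach."""
    # Step 1: compute the distance from the query to every point (O(n)).
    with_dist = [(math.dist(query, p), p) for p in points]
    # Step 2: sort all points by that distance (O(n log n)).
    with_dist.sort(key=lambda dp: dp[0])
    # Step 3: return the first k items.
    return [p for _, p in with_dist[:k]]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (2.0, 2.0)]
    print(knn_brute_force(pts, (0.0, 0.0), 2))
```

Swapping the full sort for `heapq.nsmallest(k, points, key=...)` cuts the sort cost to O(n log k), but every query still has to touch every point, which is what makes this approach break down at scale.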

This is fine if we have a few hundred points. But if we have millions, these queries will be too slow to use in practice.
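The usual fix is a spatial index that lets a query skip most of the points. As one illustration of the idea (a textbook 2-D k-d tree, not the linked post’s own code), here is a sketch that assumes planar (x, y) tuples and Euclidean distance. The names `Node`, `build`, and `knn_kdtree` are hypothetical. The key step is pruning: a subtree on the far side of a splitting line is only searched if that line is closer to the query than the current k-th best distance.

```python
import heapq
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class Node:
    """One k-d tree node: a point that splits space along `axis` (0 = x, 1 = y)."""
    __slots__ = ("point", "axis", "left", "right")

    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting on the median, alternating x and y per level."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])  # simple, not the fastest build
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def knn_kdtree(root: Optional[Node], query: Point, k: int) -> List[Point]:
    """Return up to k points nearest to `query`, closest first."""
    # Max-heap of the best k so far, stored as (-distance, point).
    best: List[Tuple[float, Point]] = []

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d = math.dist(query, node.point)
        if len(best) < k:
            heapq.heappush(best, (-d, node.point))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, node.point))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Points on the far side are at least |diff| away, so skip them
        # unless the heap is not yet full or |diff| beats the k-th best distance.
        if len(best) < k or abs(diff) < -best[0][0]:
            visit(far)

    visit(root)
    return [p for _, p in sorted(best, key=lambda t: -t[0])]


if __name__ == "__main__":
    tree = build([(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (2.0, 2.0)])
    print(knn_kdtree(tree, (4.0, 4.0), 2))
```

Building the tree is a one-time cost; after that, queries on well-distributed data typically visit only a small fraction of the nodes instead of all of them.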

I had never had any need to dive into spatial search; this post is an excellent introduction. Deep yet approachable.


What do you look for when hiring an entry-level data scientist?

Amazing Quora response that could save you a lot of pain. CliffsNotes:

  • Drive and determination to be a self-directed learner

  • Fundamentals of “enough” programming

  • Ability to analyze data when the goals and metrics are not explicit or time-boxed

Highly recommended.



Take a picture, get recipe suggestions. This and the next link are both examples of what can happen when ML tech becomes more widely available. This probably isn’t a full product, but as more and more people experiment with interesting consumer applications like this, more of them will grow into entire businesses.

Mobile-first had its decade; we’re in an ML-first world today.


Font Map · An AI Experiment by IDEO

Hundreds of fonts arranged using machine learning. Narrow but super-useful application of a CNN. Here’s how it was made.


Data Visualization “Versus” UI and Data Science

The data viz community has been going through some public introspection in recent months with a series of widely-read blog posts. This is the most recent, and worth the read.

I’m very interested in seeing “designer” become accepted as a top-level data role, alongside analyst, scientist, and engineer, although I also think it’s the most specialized and least common of the four. Curious to hear disagreement with this.


Why We Feel So Squeezed When We Fly

Great NYTimes exploration of recent airline trends. Simple viz, straightforward storyline: the brilliance of this piece is picking exactly the right data to tell the story and then getting out of the way.


No Feigning Surprise

You’ve almost definitely done it: “You didn’t know that?!” This is a short and sweet post (with two comics!) to prevent you from ending up on /r/iamverysmart.

I do this too much and this post has me determined to stop. Included here as a public service announcement for any of you who might be in the same boat.


Data viz of the week

Depressing yet effective: worth more than an entire article on the topic.

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.


If you don't want these updates anymore, please unsubscribe here.

If you were forwarded this newsletter and you like it, you can subscribe here.

Powered by Revue

915 Spring Garden St., Suite 500, Philadelphia, PA 19123