Data Science Roundup #84: How to Build your Data Team, 2 ML Products & an Intro to Spatial Algos
This week includes lots of advice on building your data team, a topic that I’ve become obsessed with. Plus a couple of useful applications of ML and some solid data journalism from the NYT.
Two Posts You Can't Miss
This is a topic I think a lot about, and I happened across this fantastic set of Quora answers today. All three of the top responses are incredible. Some selected quotes:
Don’t expect anyone in the organization to understand exactly the value you will bring and the specific things you are going to work on to bring that value. It is your job to shine a light on the areas you can help the company innovate.
You have to make sure you are working on what will actually drive the business forward - objectively, what is best for the business? This means learning how to say “No”.
The thing that has become incredibly clear to me over the past year is that there is also a severe talent shortage in the industry. Not a lack of people who know R or Python or SQL, but a lack of people who have experience using modern tools to solve real problems. Right now I believe that sourcing and training talent is the hardest problem in growing a startup analytics team. I’m working on a post about this now; if you have thoughts, please email me.
I originally found this article linked from Monica Rogati’s recent and wonderful piece, How Not to Hire Your First Data Scientist, which is also a must-read.
Want to choose an ML course? This is the single best resource on the internet.
For this guide, I spent a dozen hours trying to identify every online machine learning course offered as of May 2017, extracting key bits of information from their syllabi and reviews, and compiling their ratings. My end goal was to identify the three best courses available and present them to you, below.
For this task, I turned to none other than the open source Class Central community, and its database of thousands of course ratings and reviews.
This Week's Top Posts
Given thousands of points, such as city locations, how do we retrieve the closest points to a given query point? An intuitive way to do this is:
Calculate the distances from the query point to every other point.
Sort those points by distance.
Return the first K items.
This is fine if we have a few hundred points. But if we have millions, these queries will be too slow to use in practice.
I had never had any need to dive into spatial search; this post is an excellent introduction. Deep yet approachable.
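For a feel of the intuitive approach quoted above, here is a minimal sketch in Python. It assumes points are plain (x, y) tuples on a flat plane and uses straight-line distance; real city coordinates would need a great-circle distance instead. The function name and sample data are made up for illustration and are not taken from the linked post.

```python
import math


def k_nearest_brute_force(points, query, k):
    """Return the k points closest to `query` using the naive method."""
    # Step 1: calculate the distance from the query point to every point.
    with_distance = [(math.dist(p, query), p) for p in points]
    # Step 2: sort those points by distance.
    with_distance.sort(key=lambda pair: pair[0])
    # Step 3: return the first k items.
    return [p for _, p in with_distance[:k]]


# Hypothetical sample data: five points on a plane.
cities = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (10.0, 10.0), (-2.0, 0.5)]
print(k_nearest_brute_force(cities, query=(0.2, 0.1), k=2))
```

Every query touches every point and then sorts the whole list, which is why this stops being practical at millions of points and why spatial indexes are worth learning about.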
Amazing Quora response that could save you a lot of pain. Cliff notes:
Drive and determination to be a self-directed learner
Fundamentals of “enough” programming
Ability to analyze data when the goals and metrics are not explicit or time-boxed
Take a picture, get recipe suggestions. This and the next link are both examples of what can happen when ML tech gets more broadly dispersed. This probably isn’t a full product, but as more and more folks play around with interesting consumer applications, more and more will support entire businesses.
Mobile-first had its decade; we’re in an ML-first world today.
Hundreds of fonts arranged using machine learning. Narrow but super-useful application of a CNN. Here’s how it was made.
The data viz community has been going through some public introspection in recent months with a series of widely-read blog posts. This is the most recent, and worth the read.
I’m very interested in seeing “designer” become accepted as a top-level data role, alongside analyst, scientist, and engineer, although I also think it’s the most specialized and least common of the four. Curious to hear disagreement with this.
Great NYTimes exploration of recent airline trends. Simple viz, straightforward storyline: the brilliance of this piece is picking exactly the right data to tell the story and then getting out of the way.
You’ve almost definitely done it: “You didn’t know that?!” This is a short and sweet post (with two comics!) to prevent you from ending up on /r/iamverysmart.
I do this too much and this post has me determined to stop. Included here as a public service announcement for any of you who might be in the same boat.
Data viz of the week
Depressing yet effective: worth more than an entire article on the topic.
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.