Data Science Roundup #57: The SQL Issue!

There was a ton of great writing on the SQL analytics ecosystem this week! If SQL isn’t a huge part of your analytics stack, take this issue as a kick in the pants to change that :)

- Tristan

This week's best data science articles

Picking a cloud database for analytics: the SQL options

This is the single best overview I’ve read of the database technologies that are driving the massive changes in analytics today. The piece goes into the technology under the hood of MySQL, Postgres, Redshift, and BigQuery from the perspective of an analyst who wants to analyze large datasets. If you’re not already on Redshift or BigQuery, you should read this.


15 Awesome things you probably didn’t know about BigQuery

I recently attended a talk at Strata where the data team at the NY Times described their usage of BigQuery in glowing terms. They made a compelling case, and the product continues to make aggressive updates to its core tech. If you’re making a choice of data warehouse tech right now (or think you might be in the future), you should read this.


Explores GitHub Reliability with SQL

GitHub, like much of the internet, was down yesterday, which makes this post by a Google Site Reliability Engineer just uncannily timely. In it, he uses the GitHub dataset available in BigQuery to examine GitHub downtime and comes up with some really solid conclusions. I’m always impressed when bringing a new lens to an existing dataset produces novel results.

Don’t Become a Victim of One Key Metric

As long as we’re talking about BI, let’s talk about methodology for a second. There is a huge draw in BI towards having “one big number” to optimize towards; take a look at this dashboard for a good example. This is almost always bad practice, even if it’s good at grabbing attention, as reality is more nuanced than a single metric can describe. This post has excellent real-world examples from Pinterest and GrubHub on what kind of damage this can do.


Tracking Customer Service Metrics With SQL

This guide includes a dozen SQL queries for calculating customer service metrics with raw Intercom data so that you can:

  • Increase customer retention

  • Understand customer needs

  • Create brand evangelists

  • Increase revenue through upsells

My only editorialization on this piece is…don’t use Blendo to get your data into Redshift. Use Stitch or Fivetran.
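
To give a flavor of what a query like these might look like, here is a minimal sketch of one common customer service metric: average time to first response, per day. It is not taken from the guide. The `conversations` table and its `created_at` / `first_reply_at` columns are hypothetical stand-ins, not Intercom's actual schema, and it runs against Python's built-in SQLite so you can try it anywhere. On Redshift the date arithmetic would be written differently (`julianday` is SQLite-specific).

```python
import sqlite3


def avg_first_response_minutes(rows):
    """Average minutes to first reply, grouped by the day the conversation started.

    rows: iterable of (id, created_at, first_reply_at) tuples, with timestamps
    as 'YYYY-MM-DD HH:MM:SS' strings. first_reply_at may be None for
    conversations that have not been answered yet.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per support conversation.
    conn.execute(
        "CREATE TABLE conversations ("
        "id INTEGER PRIMARY KEY, created_at TEXT, first_reply_at TEXT)"
    )
    conn.executemany("INSERT INTO conversations VALUES (?, ?, ?)", rows)

    query = """
        SELECT date(created_at) AS day,
               AVG((julianday(first_reply_at) - julianday(created_at)) * 24 * 60)
                   AS avg_minutes
        FROM conversations
        WHERE first_reply_at IS NOT NULL   -- skip unanswered conversations
        GROUP BY day
        ORDER BY day
    """
    result = [(day, round(avg, 1)) for day, avg in conn.execute(query)]
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, "2017-03-01 09:00:00", "2017-03-01 09:10:00"),
        (2, "2017-03-01 10:00:00", "2017-03-01 10:30:00"),
        (3, "2017-03-02 08:00:00", None),
        (4, "2017-03-02 12:00:00", "2017-03-02 12:05:00"),
    ]
    for day, minutes in avg_first_response_minutes(sample):
        print(day, minutes)
```

An average is used here only because it is the simplest aggregate to show. A median or percentile is usually a more honest summary of response times, since a handful of very slow replies can drag the mean around.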


📙 Amazon Web Services — A Practical Guide

This is a great resource for anyone using any AWS service, and if you’re working with data you’re likely using at least one of them. The Redshift section is spot on and includes several tips that I hadn’t come across even after being a Redshift user for years. The sections on EMR, Kinesis, and S3 are extensive. Give this a scan and then bookmark it as a reference.


Data viz of the week

Amazing information density!


Thanks to our sponsors!

Fishtown Analytics

Fishtown Analytics is a boutique analytics consultancy serving high-growth, venture-funded startups. Have analytics questions? Let’s chat.



Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
