Data Science Roundup #57: The SQL Issue!
There was a ton of great writing on the SQL analytics ecosystem this week! If SQL isn’t a huge part of your analytics stack, take this issue as a kick in the pants to change that :)
This week's best data science articles
This is the single best overview I’ve read of the database technologies that are driving the massive changes in analytics today. The piece goes into the technology under the hood of MySQL, Postgres, Redshift, and BigQuery from the perspective of an analyst who wants to analyze large datasets. If you’re not already on Redshift or BigQuery, you should read this.
I recently attended a talk at Strata where the data team at the NY Times described their use of BigQuery in glowing terms. They made a compelling case, and the product continues to ship aggressive updates to its core tech. If you’re choosing data warehouse tech right now (or think you might be in the future), you should read this.
GitHub, like much of the internet, was down yesterday, which makes this post by a Google Site Reliability Engineer uncannily timely. In it, he uses the GitHub dataset available in BigQuery to examine GitHub downtime and comes up with some really solid conclusions. I’m always impressed when bringing a new lens to an existing dataset produces novel results.
As long as we’re talking about BI, let’s talk about methodology for a second. There is a huge draw in BI toward having “one big number” to optimize for; take a look at this dashboard for a good example. This is almost always bad practice, even if it’s good at grabbing attention, because reality is more nuanced than a single metric can describe. This post has excellent real-world examples from Pinterest and GrubHub of the kind of damage this can do.
This guide includes a dozen SQL queries for calculating customer service metrics with raw Intercom data so that you can:
Increase customer retention
Understand customer needs
Create brand evangelists
Increase revenue through upsells
This is a great resource for anyone using any AWS service, and if you’re working with data you’re likely using at least one of them. Their Redshift section is spot on and includes several tips that I had never known even after being a Redshift user for years. The sections on EMR, Kinesis, and S3 are extensive. Give this a scan and then bookmark it as a reference.
Data viz of the week
Amazing information density!
Thanks to our sponsors!
Fishtown Analytics is a boutique analytics consultancy serving high-growth, venture-funded startups. Have analytics questions? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
Powered by Revue
915 Spring Garden St., Suite 500, Philadelphia, PA 19123