Data Science Roundup #66: The 6 Top Data Science Articles from 2016

Jan 01, 2017

Happy New Year! This issue is a Roundup of Roundups; I went back through the 2016 archives and dug up the most-clicked headlines. Make sure you didn’t miss any of these posts.

On a personal note: thanks to every one of you for making this newsletter a part of your week. Your time and attention are valuable and I appreciate you sharing them with me.

I would love to make 2017 our best year yet. If you find the Roundup useful, the biggest way you can help is by sharing with your friends and colleagues. Thanks so much, and happy 2017! 😂 🎉 🍾

- Tristan


2016's Most Popular Data Science Articles

#1: What's your analytics tech stack?

Business intelligence tech has changed dramatically over the past 3–4 years, and the most common question I get from folks in the industry is “What’s your analytics tech stack?” This post lays out my recommendations, from ETL to data warehousing to data modeling to analysis. There are surprisingly few people doing this right.

blog.fishtownanalytics.com

#2: It’s time for science to abandon the term ‘statistically significant’

I’ve been covering the replication crisis for over a year at this point and believe it is profoundly important. Most articles on the topic point to systemic problems, but I’ve never seen someone go so far as to assert that we’re fundamentally doing statistics incorrectly. Here’s a quote: “Even quite respectable sources will tell you that the p-value is the probability that your observations occurred by chance. And that is plain wrong.”

aeon.co
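
To make the quoted point concrete, here is a minimal sketch (my own illustration, not taken from the Aeon article). It simulates fair coins only, so every "significant" result is due to chance by construction, yet a few percent of experiments still come in under p < 0.05. The p-value is the probability of data at least this extreme assuming the null hypothesis is true; it is not the probability that a given result arose by chance. The function names and the defaults (2,000 experiments of 100 flips) are assumptions chosen for the example.

```python
import math
import random


def two_sided_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value under the null hypothesis of a fair coin.

    Sums the probability of every outcome that is no more likely than the
    observed one, i.e. "data at least this extreme, assuming the null is true".
    """
    pmf = [math.comb(flips, k) * 0.5 ** flips for k in range(flips + 1)]
    cutoff = pmf[heads] * (1 + 1e-9)  # small tolerance for float round-off
    return min(1.0, sum(p for p in pmf if p <= cutoff))


def share_significant(experiments: int = 2000, flips: int = 100,
                      alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of fair-coin experiments that still reach p < alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(experiments):
        # The null is true here by construction: the coin is always fair.
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if two_sided_p_value(heads, flips) < alpha:
            significant += 1
    return significant / experiments


if __name__ == "__main__":
    print(f"p-value for 60 heads in 100 flips: {two_sided_p_value(60, 100):.3f}")
    print(f"share of pure-chance experiments with p < 0.05: {share_significant():.3f}")
```

Every experiment that clears the threshold in this simulation is a false positive, which is why a small p-value on its own says nothing about how likely the finding is to be real.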

#3: Machine Learning in a Year

This is perhaps my favorite “how I taught myself machine learning” post, specifically because it also highlights the author’s failures. Learning ML isn’t easy, especially for someone with a fairly light technical background, and learning from someone else’s mistakes is invaluable.

medium.com

#4: 21 Must-Know Data Science Interview Questions and Answers

Q1. Explain what regularization is and why it is useful.

Q3. How would you validate a model you created to generate a predictive model of a quantitative outcome variable using multiple regression.

This might be the best data science study guide out there. It’s also a great personal check to find your own blind spots 🙈

www.kdnuggets.com
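
If you want a feel for the two sample questions before reading the answers, here is a rough sketch (my own, not the KDnuggets answers). It uses a one-feature ridge regression, where the penalty `lam` shrinks the fitted slope toward zero, and k-fold cross-validation as one common way to validate a regression model on held-out data. The helper names and the synthetic data (true slope of 2 plus noise) are assumptions for illustration.

```python
import random


def make_data(n: int = 50, seed: int = 0):
    """Hypothetical synthetic data: y = 2x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def fit_ridge_1d(xs, ys, lam: float) -> float:
    """Closed-form ridge slope for y ≈ w * x (no intercept).

    w = sum(x*y) / (sum(x*x) + lam); lam = 0 is ordinary least squares.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def k_fold_mse(xs, ys, lam: float, k: int = 5) -> float:
    """Mean squared error on held-out folds, averaged over all points."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th point goes to this fold
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        w = fit_ridge_1d(train_x, train_y, lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in held_out)
    return total / n


if __name__ == "__main__":
    xs, ys = make_data()
    for lam in (0.0, 1.0, 10.0, 1000.0):
        print(lam, round(fit_ridge_1d(xs, ys, lam), 3), round(k_fold_mse(xs, ys, lam), 3))
```

On clean, low-dimensional data like this, heavy regularization mostly hurts; the penalty earns its keep when there are many features relative to the number of observations.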

#5: Practical advice for analysis of large, complex data sets

This post, originally written for internal consumption at Google, is absolute gold. In it, the author lays out specific technical, procedural, and social guidance for how to go about conducting analytics. My favorite quote: “Credibility is the key social value for any data scientist.” If you spend your days doing analytics, this is a must-read.

www.unofficialgoogledatascience.com

#6: Design Better Data Tables

We’ve all seen poor visual design of tables: left-aligned numbers? Tons of useless formatting? There’s a lot that goes into making tabular data easy to consume, and with all the attention that goes into data viz today, the UI of tabular data often gets overlooked. No longer.

medium.com
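
As a small illustration of the alignment point (my own sketch, not code from the article), the snippet below renders a plain-text table with text columns left-aligned and numeric columns right-aligned, so digits line up by place value. The `render_table` helper, the two-decimal format for floats, and the sample rows are all assumptions made for the example.

```python
def render_table(headers, rows):
    """Render rows as plain text: text left-aligned, numbers right-aligned."""
    def fmt(value):
        if isinstance(value, float):
            return f"{value:,.2f}"  # fixed decimals keep decimal points aligned
        if isinstance(value, int):
            return f"{value:,}"     # thousands separators aid scanning
        return str(value)

    cells = [[fmt(v) for v in row] for row in rows]
    n_cols = len(headers)
    # A column counts as numeric only if every value in it is a number.
    numeric = [all(isinstance(row[i], (int, float)) for row in rows)
               for i in range(n_cols)]
    widths = [max([len(headers[i])] + [len(r[i]) for r in cells])
              for i in range(n_cols)]

    def line(values):
        return "  ".join(
            v.rjust(w) if num else v.ljust(w)
            for v, w, num in zip(values, widths, numeric)
        )

    return "\n".join([line(headers)] + [line(r) for r in cells])


if __name__ == "__main__":
    print(render_table(
        ["Region", "Orders", "Revenue"],
        [("North", 1200, 98765.5), ("South", 87, 1234.0)],
    ))
```

Headers follow the alignment of their column, which is one of the conventions that makes a table easy to scan.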

#7 (Bonus!): Handy Python Libraries for Formatting and Cleaning Data

These Python libraries will make the crucial task of data cleaning a bit more bearable—from anonymizing datasets to wrangling dates and times. I’m personally going to check out PrettyPandas, as I definitely need more formatting control over the data tables I output for my clients.

blog.modeanalytics.com

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

Fishtown Analytics works with venture-funded startups to implement Redshift, BigQuery, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.

fishtownanalytics.com

Stitch: Simple, powerful ETL built for developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.


