Data Science Roundup #66: The 6 Top Data Science Articles from 2016
Happy New Year! This issue is a Roundup of Roundups; I went back through the 2016 archives and dug up the most-clicked headlines. Make sure you didn’t miss any of these posts.
On a personal note: thanks to every one of you for making this newsletter a part of your week. Your time and attention are valuable and I appreciate you sharing them with me.
I would love to make 2017 our best year yet. If you find the Roundup useful, the biggest way you can help is by sharing with your friends and colleagues. Thanks so much, and happy 2017! 😂 🎉 🍾
2016's Most Popular Data Science Articles
Business intelligence tech has changed dramatically over the past 3–4 years, and the most common question I get from folks in the industry is “What’s your analytics tech stack?” This post lays out my recommendations, from ETL to data warehousing to data modeling to analysis. There are surprisingly few people doing this right.
I’ve been covering the replication crisis for over a year at this point and believe it is profoundly important. Most articles on the topic point to systemic problems, but I’ve never seen someone go so far as to assert that we’re fundamentally doing statistics incorrectly. Here’s a quote: “Even quite respectable sources will tell you that the p-value is the probability that your observations occurred by chance. And that is plain wrong.”
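To make the quoted point concrete, here is a minimal simulation sketch (not taken from the linked article). It assumes a fair coin, so the null hypothesis is true by construction and every result really is "just chance". The function names, the number of trials, and the 0.05 cutoff are illustrative choices. The p-value is computed assuming the null is true: it is the probability of data at least as extreme as what was observed, not the probability that the observations occurred by chance.

```python
import random
from math import comb


def two_sided_p_value(heads: int, n: int) -> float:
    """Exact binomial p-value for a fair coin.

    Probability, assuming the coin is fair, of a head count at least as far
    from n / 2 as the one observed.
    """
    observed_gap = abs(heads - n / 2)
    extreme = sum(
        comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= observed_gap
    )
    return extreme / 2 ** n


def false_positive_rate(trials: int = 2000, n: int = 100,
                        alpha: float = 0.05, seed: int = 0) -> float:
    """Share of fair-coin experiments that still come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The null is true here: each flip is heads with probability 0.5.
        heads = sum(rng.random() < 0.5 for _ in range(n))
        if two_sided_p_value(heads, n) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rate = false_positive_rate()
    print(f"Share of null experiments with p < 0.05: {rate:.3f}")
```

Every experiment in this simulation is pure chance, yet only a small share of them produce p < 0.05. A small p-value therefore cannot be read as "the probability this happened by chance"; it only says the data would be unusual if the null were true.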
This is perhaps my favorite “how I taught myself machine learning” post, specifically because it also highlights the author’s failures. Learning ML isn’t easy, especially for someone with a fairly light technical background, and learning from someone else’s mistakes is invaluable.
Q1. Explain what regularization is and why it is useful.
Q3. How would you validate a model you created to generate a predictive model of a quantitative outcome variable using multiple regression?
This might be the best data science study guide out there. It’s also a great personal check to find your own blind spots 🙈
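For a feel of what an answer to Q1 might look like, here is a minimal sketch of ridge (L2) regularization for a one-feature regression with no intercept. It is my own illustration, not taken from the study guide; the function name and the toy data are hypothetical. The penalty strength `lam` is added to the denominator of the least-squares slope, which pulls the estimate toward zero and makes it less sensitive to noise in small samples.

```python
def ridge_slope(xs, ys, lam):
    """Slope b minimizing sum((y - b * x) ** 2) + lam * b ** 2.

    Setting the derivative to zero gives b = sum(x * y) / (sum(x * x) + lam).
    With lam = 0 this is ordinary least squares.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]
    ys = [2.0, 4.0, 6.0]  # toy data lying exactly on y = 2x
    for lam in (0.0, 1.0, 14.0):
        print(f"lam={lam:>5}: slope={ridge_slope(xs, ys, lam):.3f}")
```

Larger values of `lam` shrink the slope further from the unpenalized fit. In practice the penalty is usually chosen by checking performance on held-out data.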
This post, originally written for internal consumption at Google, is absolute gold. In it, the author lays out specific technical, procedural, and social guidance for how to go about conducting analytics. My favorite quote: “Credibility is the key social value for any data scientist.” If you spend your days doing analytics, this is a must-read.
We’ve all seen poor visual design of tables: left-aligned numbers? Tons of useless formatting? There’s a lot that goes into making tabular data easy to consume, and with all the attention that goes into data viz today, the UI of tabular data often gets overlooked. No longer.
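As a small illustration of the alignment point, here is a sketch of a plain-text table renderer that left-aligns text and right-aligns numbers with a fixed number of decimals, so digits line up by place value. The function name `render_table` and the sample rows are made up for this example and do not come from the linked post.

```python
def render_table(headers, rows, decimals=2):
    """Render rows as text: text columns left-aligned, numeric columns right-aligned."""

    def fmt(value):
        # Numbers get thousands separators and a fixed number of decimals.
        if isinstance(value, (int, float)):
            return f"{value:,.{decimals}f}"
        return str(value)

    cells = [[fmt(v) for v in row] for row in rows]
    n_cols = len(headers)
    # A column counts as numeric only if every value in it is a number.
    numeric = [
        all(isinstance(row[i], (int, float)) for row in rows)
        for i in range(n_cols)
    ]
    widths = [
        max([len(headers[i])] + [len(r[i]) for r in cells])
        for i in range(n_cols)
    ]

    def line(values):
        parts = [
            v.rjust(w) if is_num else v.ljust(w)
            for v, w, is_num in zip(values, widths, numeric)
        ]
        return "  ".join(parts).rstrip()

    return "\n".join([line(headers)] + [line(r) for r in cells])


if __name__ == "__main__":
    print(render_table(["Region", "Revenue"],
                       [("East", 1234.5), ("West", 98.25)]))
```

Right-aligning the numeric column (and its header) lets a reader compare magnitudes at a glance, which is hard to do when the same values are left-aligned.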
These Python libraries will make the crucial task of data cleaning a bit more bearable—from anonymizing datasets to wrangling dates and times. I’m personally going to check out PrettyPandas, as I definitely need more formatting control over the data tables I output for my clients.
Thanks to our sponsors!
Fishtown Analytics works with venture-funded startups to implement Redshift, BigQuery, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123